Monitoring setup on supported systems

This tutorial walks you through installing a sample OpenTelemetry-based monitoring setup on a VM. The setup can be used together with the Observability patterns provided by nevisAdmin 4.

Caution

The configuration presented here is intended for use with Product Analytics and retains data for 24 hours only.

OpenTelemetry agent

Download opentelemetry-extensions-all-<version>.jar from the Nevis portal and save it as /opt/agent/opentelemetry-javaagent.jar.
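The copy step can be sketched as a small shell helper. This is only an illustration: the function name install_agent and its optional target directory argument are hypothetical, and the sketch assumes the JAR has already been downloaded to the VM.

```shell
#!/bin/bash
# Sketch: copy a downloaded agent JAR to the path used in this tutorial.
# install_agent and its optional second argument (target directory, default
# /opt/agent) are hypothetical and exist only for illustration.
install_agent() {
    local source_jar="$1"
    local target_dir="${2:-/opt/agent}"

    if [ ! -f "${source_jar}" ]; then
        echo "ERROR: ${source_jar} not found" >&2
        return 1
    fi
    mkdir -p "${target_dir}"
    cp "${source_jar}" "${target_dir}/opentelemetry-javaagent.jar"
    # Make the JAR readable for the service users that load the agent
    chmod 0644 "${target_dir}/opentelemetry-javaagent.jar"
}

# Example, run as root (replace <version> with the downloaded version):
# install_agent "opentelemetry-extensions-all-<version>.jar"
```

Passing a different target directory as the second argument allows trying the helper without root permissions.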

Prometheus

Installer

The following example script can be used to install Prometheus and start it as a systemd service:

#!/bin/bash
set -e
prometheus_version=2.52.0

echo "Adding Prometheus user and group"
cat >> /etc/passwd <<EOF
prometheus:x:31000:31000:Prometheus server:/var/opt/prometheus:/sbin/nologin
EOF
cat >> /etc/group <<EOF
prometheus:x:31000:
EOF

echo "Adding Prometheus configuration"
mkdir -p /var/opt/prometheus/conf
cat > /var/opt/prometheus/conf/prometheus.yml << EOF
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# No scrape targets are configured: the OpenTelemetry collector pushes metrics via remote write.
scrape_configs: []
EOF
mkdir -p /var/opt/prometheus/data
chown -R prometheus:prometheus /var/opt/prometheus

echo "Install Prometheus"
prometheus_archive="prometheus-${prometheus_version}.linux-amd64.tar.gz"
prometheus_url="https://github.com/prometheus/prometheus/releases/download/v${prometheus_version}/${prometheus_archive}"
echo "fetching ${prometheus_url}"
# -f makes curl fail on HTTP errors instead of saving an error page
curl -fL -o "${prometheus_archive}" "${prometheus_url}"
# With "set -e" a failing command aborts the script, so test tar directly
if ! tar -zxf "${prometheus_archive}"; then
    echo "ERROR: ${prometheus_archive} corrupt"
    exit 1
fi
mv -Z "prometheus-${prometheus_version}.linux-amd64" /opt
ln -s "/opt/prometheus-${prometheus_version}.linux-amd64" /opt/prometheus
chown -R prometheus:prometheus "/opt/prometheus-${prometheus_version}.linux-amd64"

echo "Create Prometheus systemd file"
cat > /etc/systemd/system/prometheus.service << EOF
[Unit]
Description=Prometheus Monitoring
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Restart=on-failure

ExecStart=/opt/prometheus/prometheus \
    --config.file=/var/opt/prometheus/conf/prometheus.yml \
    --storage.tsdb.path=/var/opt/prometheus/data \
    --storage.tsdb.retention.time=24h \
    --web.enable-remote-write-receiver \
    --log.level=warn

[Install]
WantedBy=multi-user.target
EOF

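echo "Enable and start Prometheus systemd service"
# Reload unit files so systemd picks up the new service, then enable and start it
systemctl daemon-reload
systemctl enable --now prometheus.service
echo "Installation finished"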
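After the script has run, it can be useful to confirm that Prometheus is up. The following is a minimal sketch, assuming Prometheus listens on its default port 9090 on the same machine; wait_for_url is a hypothetical helper and not part of Prometheus.

```shell
#!/bin/bash
# Sketch: poll a URL until it answers successfully or the attempts run out.
# wait_for_url is a hypothetical helper introduced for illustration.
wait_for_url() {
    local url="$1"
    local attempts="${2:-10}"
    local i=0
    while [ "${i}" -lt "${attempts}" ]; do
        # -f turns HTTP errors into a non-zero exit code, -s hides progress output
        if curl -fs -o /dev/null "${url}"; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}

# Example, run as root on the VM:
# systemctl start prometheus.service   # no-op if the service is already running
# wait_for_url http://localhost:9090/-/ready && echo "Prometheus is ready"
```

The /-/ready path is the readiness endpoint of Prometheus. If the check fails, the service log (journalctl -u prometheus.service) is the first place to look.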

Manual installation

Follow the official installation guide: https://prometheus.io/docs/prometheus/latest/installation/

Use the following configuration:

global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# No scrape targets are configured: the OpenTelemetry collector pushes metrics via remote write.
scrape_configs: []

Start Prometheus with the --web.enable-remote-write-receiver flag so that it accepts the metrics pushed by the OpenTelemetry collector.
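To confirm that a manually started Prometheus runs with the flag, the flags reported by its status API can be inspected. The sketch below assumes that the API lists the flag under the name web.enable-remote-write-receiver with the string value "true"; remote_write_enabled is a hypothetical helper that reads the JSON response from standard input.

```shell
#!/bin/bash
# Sketch: check whether a Prometheus flags response reports the remote write
# receiver as enabled. remote_write_enabled is a hypothetical helper; the
# exact JSON layout is an assumption, so whitespace around ":" is tolerated.
remote_write_enabled() {
    grep -Eq '"web\.enable-remote-write-receiver"[[:space:]]*:[[:space:]]*"true"'
}

# Example, assuming Prometheus runs locally on its default port 9090:
# prometheus --config.file=prometheus.yml --web.enable-remote-write-receiver &
# curl -s http://localhost:9090/api/v1/status/flags | remote_write_enabled \
#     && echo "remote write receiver enabled"
```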

OpenTelemetry collector

Installer

The following example script can be used to install the OpenTelemetry collector and start it as a systemd service:

#!/bin/bash
set -e
echo "Adding otelcol user and group"
otelcol_version=0.93.0
cat >> /etc/passwd <<EOF
otelcol:x:31001:31001:OpenTelemetry collector:/var/opt/otelcol:/sbin/nologin
EOF
cat >> /etc/group <<EOF
otelcol:x:31001:
EOF

echo "Adding OpenTelemetry configuration"
mkdir -p /var/opt/otelcol/conf
cat > /var/opt/otelcol/conf/otelcol-config.yml << EOF
extensions:
  zpages:
    endpoint: localhost:55679

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
  memory_limiter:
    # 75% of maximum memory up to 2G
    limit_mib: 1536
    # 25% of limit up to 2G
    spike_limit_mib: 512
    check_interval: 5s

exporters:
  # The logging exporter is deprecated in favor of the debug exporter defined below
  logging: {}
  prometheusremotewrite:
    endpoint: http://localhost:9090/api/v1/write
  debug:
    verbosity: basic

service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [logging]
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [logging]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [prometheusremotewrite]
  extensions: [zpages]
EOF

echo "Installing OpenTelemetry"
chown -R otelcol:otelcol /var/opt/otelcol
otelcol_archive="otelcol-${otelcol_version}.linux-amd64.tar.gz"
otelcol_url="https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v${otelcol_version}/otelcol_${otelcol_version}_linux_amd64.tar.gz"
echo "fetching ${otelcol_url}"
# -f makes curl fail on HTTP errors instead of saving an error page
curl -fL -o "${otelcol_archive}" "${otelcol_url}"
mkdir -p "otelcol-${otelcol_version}"
# With "set -e" a failing command aborts the script, so test tar directly
if ! tar -zxf "${otelcol_archive}" -C "otelcol-${otelcol_version}"; then
    echo "ERROR: ${otelcol_archive} corrupt"
    exit 1
fi
mv -Z "otelcol-${otelcol_version}" /opt
ln -s "/opt/otelcol-${otelcol_version}" /opt/otelcol
# The collector runs as the otelcol user, so that user owns the installation
chown -R otelcol:otelcol "/opt/otelcol-${otelcol_version}"

echo "Create OpenTelemetry collector systemd file"
cat > /etc/systemd/system/otelcol.service << EOF
[Unit]
Description=OpenTelemetry collector
Wants=network-online.target
After=network-online.target

[Service]
User=otelcol
Group=otelcol
Restart=on-failure

ExecStart=/opt/otelcol/otelcol \
    --config=/var/opt/otelcol/conf/otelcol-config.yml

[Install]
WantedBy=multi-user.target
EOF

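echo "Enable and start otelcol systemd service"
# Reload unit files so systemd picks up the new service, then enable and start it
systemctl daemon-reload
systemctl enable --now otelcol.service
echo "Installation finished"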
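A simple way to check that the collector accepts data is to post an empty OTLP/HTTP metrics request to it. This is a sketch under the assumption that the collector runs with the configuration above and is reachable on port 4318; send_empty_metrics is a hypothetical helper.

```shell
#!/bin/bash
# Sketch: send an empty OTLP/HTTP metrics request to the collector.
# send_empty_metrics is a hypothetical helper; the base URL defaults to the
# HTTP receiver endpoint configured in this tutorial.
send_empty_metrics() {
    local base_url="${1:-http://localhost:4318}"
    # -f makes curl fail on HTTP error codes, -sS hides progress but shows errors
    curl -fsS -X POST "${base_url}/v1/metrics" \
        -H 'Content-Type: application/json' \
        -d '{"resourceMetrics":[]}'
}

# Example, run on the VM:
# systemctl start otelcol.service   # no-op if the service is already running
# send_empty_metrics && echo "collector accepted the request"
# The zpages extension additionally serves diagnostics at
# http://localhost:55679/debug/servicez
```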

Manual installation

Follow the official installation guide: https://opentelemetry.io/docs/collector/installation/

Use the following configuration:

extensions:
  zpages:
    endpoint: localhost:55679

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
  memory_limiter:
    # 75% of maximum memory up to 2G
    limit_mib: 1536
    # 25% of limit up to 2G
    spike_limit_mib: 512
    check_interval: 5s

exporters:
  # The logging exporter is deprecated in favor of the debug exporter defined below
  logging: {}
  prometheusremotewrite:
    endpoint: http://localhost:9090/api/v1/write
  debug:
    verbosity: basic

service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [logging]
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [logging]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [prometheusremotewrite]
  extensions: [zpages]
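Before starting the collector, the configuration file can be checked with the validate subcommand of the collector binary. The sketch below assumes the paths used elsewhere in this tutorial as defaults; validate_collector_config is a hypothetical wrapper, and both paths should be adjusted for a manual installation.

```shell
#!/bin/bash
# Sketch: validate a collector configuration file without starting the collector.
# validate_collector_config is a hypothetical wrapper; the default paths are the
# ones used by the installer script and are assumptions for a manual setup.
validate_collector_config() {
    local config="${1:-/var/opt/otelcol/conf/otelcol-config.yml}"
    local otelcol_bin="${2:-/opt/otelcol/otelcol}"
    "${otelcol_bin}" validate --config="${config}"
}

# Example:
# validate_collector_config /path/to/otelcol-config.yml /path/to/otelcol \
#     && echo "configuration is valid"
```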

Endpoints

The setup above exposes the following endpoints. Replace <vm-address> with the address of the VM:

otelUrl: http://<vm-address>:4318
tracesEndpoint: http://<vm-address>:4318/v1/traces
metricsEndpoint: http://<vm-address>:4318/v1/metrics
logsEndpoint: http://<vm-address>:4318/v1/logs
prometheusUrl: http://<vm-address>:9090
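The values above all derive from the VM address and the two ports used in this setup (4318 for OTLP over HTTP, 9090 for Prometheus). As an illustration, a hypothetical helper print_endpoints could generate them for a given address:

```shell
#!/bin/bash
# Sketch: print the endpoint values for a given VM address.
# print_endpoints is a hypothetical helper; the ports match the
# configuration used in this tutorial.
print_endpoints() {
    local vm_address="$1"
    local otel_url="http://${vm_address}:4318"
    printf 'otelUrl: %s\n' "${otel_url}"
    printf 'tracesEndpoint: %s/v1/traces\n' "${otel_url}"
    printf 'metricsEndpoint: %s/v1/metrics\n' "${otel_url}"
    printf 'logsEndpoint: %s/v1/logs\n' "${otel_url}"
    printf 'prometheusUrl: http://%s:9090\n' "${vm_address}"
}

# Example:
# print_endpoints 192.0.2.10
```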