Monitoring setup on Supported systems
This tutorial walks you through installing a sample OpenTelemetry-based monitoring setup on a VM. The setup can be used together with the Observability patterns provided by nevisAdmin 4.
The configuration presented here is intended for use with Product Analytics and retains all data for 24 hours.
OpenTelemetry agent
Download opentelemetry-extensions-all-<version>.jar from the Nevis portal and save it as /opt/agent/opentelemetry-javaagent.jar.
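Before the agent is attached to a Java process, it can help to confirm that the file landed where expected. The following is a minimal sketch and not part of the official setup: the function name `check_agent_jar` is hypothetical, and the check only looks at the ZIP signature that JAR files normally start with.

```shell
#!/bin/bash
# Hypothetical helper: verify that the agent JAR exists, is readable,
# and starts with the ZIP signature "PK" (JAR files are ZIP archives).
check_agent_jar() {
  local jar="${1:-/opt/agent/opentelemetry-javaagent.jar}"
  if [ ! -r "$jar" ]; then
    echo "ERROR: $jar is missing or not readable" >&2
    return 1
  fi
  # Read the first two bytes; a regular JAR begins with "PK".
  if [ "$(head -c 2 "$jar")" != "PK" ]; then
    echo "ERROR: $jar does not look like a JAR file" >&2
    return 1
  fi
  echo "OK: $jar"
}

# Example: check_agent_jar /opt/agent/opentelemetry-javaagent.jar
```

A failed check usually points to an interrupted download or an HTML error page saved under the JAR name.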
Prometheus
Installer
The following example script can be used to install Prometheus and enable it as a systemd service that starts at boot:
#!/bin/bash
set -e
prometheus_version=2.52.0
echo "Adding Prometheus user and group"
cat >> /etc/passwd <<EOF
prometheus:x:31000:31000:Prometheus server:/var/opt/prometheus:/sbin/nologin
EOF
cat >> /etc/group <<EOF
prometheus:x:31000:
EOF
echo "Adding Prometheus configuration"
mkdir -p /var/opt/prometheus/conf
cat > /var/opt/prometheus/conf/prometheus.yml << EOF
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
# No scrape targets are configured: metrics arrive through the remote-write receiver.
scrape_configs: []
EOF
mkdir -p /var/opt/prometheus/data
chown -R prometheus:prometheus /var/opt/prometheus
chgrp -R prometheus /var/opt/prometheus
echo "Install Prometheus"
echo "fetching https://github.com/prometheus/prometheus/releases/download/v${prometheus_version}/prometheus-${prometheus_version}.linux-amd64.tar.gz"
curl -L -o prometheus-${prometheus_version}.linux-amd64.tar.gz "https://github.com/prometheus/prometheus/releases/download/v${prometheus_version}/prometheus-${prometheus_version}.linux-amd64.tar.gz"
# Check the extraction itself: with "set -e", a test of \$? after a failed command is never reached.
if ! tar -zxf prometheus-${prometheus_version}.linux-amd64.tar.gz; then
  echo "ERROR: prometheus-${prometheus_version}.linux-amd64.tar.gz corrupt"
  exit 1
fi
mv -Z prometheus-${prometheus_version}.linux-amd64 /opt
ln -s /opt/prometheus-${prometheus_version}.linux-amd64 /opt/prometheus
chown -R prometheus:prometheus /opt/prometheus-${prometheus_version}.linux-amd64
chgrp -R prometheus /opt/prometheus-${prometheus_version}.linux-amd64
echo "Create Prometheus systemd file"
cat > /etc/systemd/system/prometheus.service << EOF
[Unit]
Description=Prometheus Monitoring
Wants=network-online.target
After=network-online.target
[Service]
User=prometheus
Group=prometheus
Restart=on-failure
ExecStart=/opt/prometheus/prometheus \
  --config.file=/var/opt/prometheus/conf/prometheus.yml \
  --storage.tsdb.path=/var/opt/prometheus/data \
  --storage.tsdb.retention.time=24h \
  --web.enable-remote-write-receiver \
  --log.level=warn
[Install]
WantedBy=multi-user.target
EOF
echo "Enable prometheus systemd service"
systemctl enable prometheus.service
echo "Installation finished"
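The installer script enables the service but does not start it. As a sketch of the follow-up step, the snippet below polls a readiness URL after the service has been started. It assumes Prometheus listens on its default port 9090 and uses the `/-/ready` management endpoint; `wait_for_ready` is a hypothetical helper name, and the attempt count and delay are arbitrary defaults.

```shell
#!/bin/bash
# Hypothetical helper: poll a readiness URL until it answers with a
# successful HTTP status or the number of attempts is exhausted.
wait_for_ready() {
  local url="$1" attempts="${2:-10}" delay="${3:-2}"
  local i
  for ((i = 1; i <= attempts; i++)); do
    # -f makes curl exit non-zero on HTTP errors such as 503.
    if curl -fsS -o /dev/null "$url"; then
      echo "ready after ${i} attempt(s): ${url}"
      return 0
    fi
    sleep "$delay"
  done
  echo "ERROR: ${url} not ready after ${attempts} attempts" >&2
  return 1
}

# Example (run as root on the VM):
#   systemctl start prometheus.service
#   wait_for_ready http://localhost:9090/-/ready
```

Polling rather than checking once accounts for the short period in which Prometheus is running but still replaying its write-ahead log.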
Manual installation
Follow the official installation guide at https://prometheus.io/docs/prometheus/latest/installation/ and use the following configuration:
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
# No scrape targets are configured: metrics arrive through the remote-write receiver.
scrape_configs: []
Start Prometheus with the --web.enable-remote-write-receiver command-line flag so that it accepts the metrics pushed by the OpenTelemetry collector.
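For a manual installation, the start command can reuse the flags from the systemd unit in the installer script. The following sketch assumes the same directory layout as that script; `start_prometheus` and the `PROMETHEUS_BIN` and `PROMETHEUS_BASE` variables are hypothetical names introduced only for illustration, so adjust the paths to wherever Prometheus was installed.

```shell
#!/bin/bash
# Hypothetical helper: start Prometheus in the foreground with the
# remote-write receiver enabled and a 24-hour retention period.
start_prometheus() {
  local bin="${PROMETHEUS_BIN:-/opt/prometheus/prometheus}"
  local base="${PROMETHEUS_BASE:-/var/opt/prometheus}"
  "$bin" \
    --config.file="${base}/conf/prometheus.yml" \
    --storage.tsdb.path="${base}/data" \
    --storage.tsdb.retention.time=24h \
    --web.enable-remote-write-receiver
}

# Example: start_prometheus
```

Without --web.enable-remote-write-receiver, Prometheus rejects requests to /api/v1/write and the collector's metrics pipeline cannot deliver data.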
OpenTelemetry collector
Installer
The following example script can be used to install the OpenTelemetry collector and enable it as a systemd service that starts at boot:
#!/bin/bash
set -e
echo "Adding otelcol user and group"
otelcol_version=0.93.0
cat >> /etc/passwd <<EOF
otelcol:x:31001:31001:OpenTelemetry collector:/var/opt/otelcol:/sbin/nologin
EOF
cat >> /etc/group <<EOF
otelcol:x:31001:
EOF
echo "Adding OpenTelemetry configuration"
mkdir -p /var/opt/otelcol/conf
cat > /var/opt/otelcol/conf/otelcol-config.yml << EOF
extensions:
  zpages:
    endpoint: localhost:55679
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
processors:
  batch:
  memory_limiter:
    # 75% of maximum memory up to 2G
    limit_mib: 1536
    # 25% of limit up to 2G
    spike_limit_mib: 512
    check_interval: 5s
exporters:
  logging: {}
  prometheusremotewrite:
    endpoint: http://localhost:9090/api/v1/write
  debug:
    verbosity: basic
service:
  pipelines:
    logs:
      processors: [memory_limiter, batch]
      exporters: [logging]
      receivers: [otlp]
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [logging]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [prometheusremotewrite]
  extensions: [zpages]
EOF
echo "Installing OpenTelemetry"
chown -R otelcol:otelcol /var/opt/otelcol
mkdir -p /opt/otelcol-${otelcol_version}
echo "fetching https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v${otelcol_version}/otelcol_${otelcol_version}_linux_amd64.tar.gz"
curl -L -o otelcol-${otelcol_version}.linux-amd64.tar.gz "https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v${otelcol_version}/otelcol_${otelcol_version}_linux_amd64.tar.gz"
mkdir -p otelcol-${otelcol_version}
# Check the extraction itself: with "set -e", a test of $? after a failed command is never reached.
if ! tar -zxf otelcol-${otelcol_version}.linux-amd64.tar.gz -C otelcol-${otelcol_version}; then
  echo "ERROR: otelcol-${otelcol_version}.linux-amd64.tar.gz corrupt"
  exit 1
fi
mv -Z otelcol-${otelcol_version} /opt
ln -s /opt/otelcol-${otelcol_version} /opt/otelcol
chown -R otelcol:otelcol /opt/otelcol-${otelcol_version}
chgrp -R otelcol /opt/otelcol-${otelcol_version}
echo "Create OpenTelemetry collector systemd file"
cat > /etc/systemd/system/otelcol.service << EOF
[Unit]
Description=OpenTelemetry collector
Wants=network-online.target
After=network-online.target
[Service]
User=otelcol
Group=otelcol
Restart=on-failure
ExecStart=/opt/otelcol/otelcol \
  --config=/var/opt/otelcol/conf/otelcol-config.yml
[Install]
WantedBy=multi-user.target
EOF
echo "Enable otelcol systemd service"
systemctl enable otelcol.service
echo "Installation finished"
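Because the collector refuses to start on an invalid configuration, it can be useful to check the file before starting the service. The sketch below assumes that the installed collector release provides the `validate` subcommand; `check_otelcol_config` and the `OTELCOL_BIN` variable are hypothetical names, and the default paths are the ones used by the installer script.

```shell
#!/bin/bash
# Hypothetical helper: validate the collector configuration file
# using the collector binary itself.
check_otelcol_config() {
  local bin="${OTELCOL_BIN:-/opt/otelcol/otelcol}"
  local config="${1:-/var/opt/otelcol/conf/otelcol-config.yml}"
  if "$bin" validate --config="$config"; then
    echo "config OK: $config"
  else
    echo "ERROR: invalid collector config: $config" >&2
    return 1
  fi
}

# Example (run as root on the VM):
#   check_otelcol_config && systemctl start otelcol.service
```

Once the service is running, the zpages extension configured above serves diagnostic pages on localhost:55679.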
Manual installation
Follow the official installation guide at https://opentelemetry.io/docs/collector/installation/ and use the following configuration:
extensions:
  zpages:
    endpoint: localhost:55679
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
processors:
  batch:
  memory_limiter:
    # 75% of maximum memory up to 2G
    limit_mib: 1536
    # 25% of limit up to 2G
    spike_limit_mib: 512
    check_interval: 5s
exporters:
  logging: {}
  prometheusremotewrite:
    endpoint: http://localhost:9090/api/v1/write
  debug:
    verbosity: basic
service:
  pipelines:
    logs:
      processors: [memory_limiter, batch]
      exporters: [logging]
      receivers: [otlp]
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [logging]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [prometheusremotewrite]
  extensions: [zpages]
Endpoints
otelUrl: http://<vm-address>:4318
tracesEndpoint: http://<vm-address>:4318/v1/traces
metricsEndpoint: http://<vm-address>:4318/v1/metrics
logsEndpoint: http://<vm-address>:4318/v1/logs
prometheusUrl: http://<vm-address>:9090
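These values follow directly from the ports configured above: 4318 for OTLP over HTTP on the collector and 9090 for Prometheus. As a small sketch, the hypothetical helper `print_endpoints` fills in a VM address; the address in the example is a placeholder.

```shell
#!/bin/bash
# Hypothetical helper: print the endpoint values for a given VM address,
# using the ports from the collector and Prometheus configuration.
print_endpoints() {
  local host="$1"
  local otel="http://${host}:4318"
  cat <<EOF
otelUrl: ${otel}
tracesEndpoint: ${otel}/v1/traces
metricsEndpoint: ${otel}/v1/metrics
logsEndpoint: ${otel}/v1/logs
prometheusUrl: http://${host}:9090
EOF
}

# Example: print_endpoints monitoring.example.com
```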