#+date: <2024-09-20 Fri 13:38:52>
#+title: Linux Observability with Self-Hosted Prometheus and Grafana Cloud
#+description: Learn how to self-host a Prometheus data collection tool with Docker and visualize the results with Grafana Cloud.
#+filetags: :linux:grafana:
#+slug: prometheus-grafana-cloud

This tutorial will guide you through the process of:

1. Configuring a free Grafana Cloud account.
2. Installing Prometheus to store metrics.
3. Installing Node Exporter to export machine metrics for Prometheus.
4. Installing Nginx Exporter to export Nginx metrics for Prometheus.
5. Visualizing data in Grafana dashboards.
6. Configuring alerts based on Grafana metrics.

* Grafana Cloud

To get started, visit the [[https://grafana.com/auth/sign-up/create-user][Grafana website]] and create a free account.

** Prometheus Data Source

By default, a Prometheus data source should exist in your data sources page
(=$yourOrg.grafana.net/connections/datasources=). If not, add a new data source
using the Prometheus type.

Once you have a valid Prometheus data source, open it and note the following
items:

| Data                  | Example                                                        |
|-----------------------+----------------------------------------------------------------|
| Prometheus Server URL | https://prometheus-prod-13-prod-us-east-0.grafana.net/api/prom |
| User                  | 1234567                                                        |
| Password              | configured                                                     |

** Cloud Access Policy Token

Now let's create an access token in Grafana. Navigate to the Administration >
Users and Access > Cloud Access Policies page and create an access policy.

The =metrics > write= scope must be enabled within the access policy you choose.

Once you have an access policy with the correct scope, click the Add Token
button and be sure to copy and save the token, since it will disappear once the
modal window is closed.

** Dashboards

Finally, let's import a couple of dashboards so that we can easily explore the
data that we will be sending from the server.

I recommend importing the following dashboards:

- [[https://grafana.com/grafana/dashboards/1860-node-exporter-full/][Node Exporter Full]]
- [[https://github.com/nginxinc/nginx-prometheus-exporter/blob/main/grafana][nginx-prometheus-exporter]]
- Prometheus 2.0 Stats

Refer to the bottom of the post for dashboard screenshots!

* Docker

On the machine that you want to observe, make sure Docker and Docker Compose
are installed. This tutorial uses Docker Compose to create a group of
containers that work together to send metrics to Grafana.
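Optionally, you can double-check that both tools are available with a quick
version check; the exact output will vary by version and distribution.

#+begin_src shell
# Confirm that the Docker Engine and the Compose plugin are installed
docker --version
docker compose version
#+end_src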
Let's start by creating a working directory.

#+begin_src shell
mkdir ~/prometheus && \
cd ~/prometheus && \
nano compose.yml
#+end_src

Within the =compose.yml= file, let's paste the following:

#+begin_src yaml
# compose.yml

networks:
  monitoring:
    driver: bridge

volumes:
  prometheus_data: {}

services:
  nginx-exporter:
    image: nginx/nginx-prometheus-exporter
    container_name: nginx-exporter
    restart: unless-stopped
    command:
      - '--nginx.scrape-uri=http://host.docker.internal:8080/stub_status'
    expose:
      - 9113
    networks:
      - monitoring
    extra_hosts:
      - host.docker.internal:host-gateway

  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    restart: unless-stopped
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.rootfs=/rootfs'
      - '--path.sysfs=/host/sys'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
    expose:
      - 9100
    networks:
      - monitoring

  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    restart: unless-stopped
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--web.enable-lifecycle'
    expose:
      - 9090
    networks:
      - monitoring
#+end_src

Next, let's create a =prometheus.yml= configuration file. This file must exist
before the containers start, since the Prometheus container mounts it directly.

#+begin_src shell
nano prometheus.yml
#+end_src

#+begin_src yaml
# prometheus.yml

global:
  scrape_interval: 1m

scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 1m
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node'
    static_configs:
      - targets: ['node-exporter:9100']

  - job_name: 'nginx'
    scrape_interval: 5s
    static_configs:
      - targets: ['nginx-exporter:9113']

remote_write:
  # The "Prometheus Server URL" from the Grafana data source, with /push appended
  - url: 'https://prometheus-prod-13-prod-us-east-0.grafana.net/api/prom/push'
    basic_auth:
      # The "User" from the Grafana data source and the Cloud Access Policy token
      username: 'prometheus-grafana-username'
      password: 'access-policy-token'
#+end_src

With both files in place, start the containers:

#+begin_src shell
sudo docker compose up -d
#+end_src

> I'm not sure if it made a difference, but I also whitelisted port 8080 on my
> local firewall with =sudo ufw allow 8080=.

** Nginx

To expose the Nginx statistics that the nginx-exporter container needs, we have
to modify the Nginx configuration on the host.

More specifically, we need to create a location that returns =stub_status= when
we query port 8080 on localhost.

#+begin_src shell
sudo nano /etc/nginx/conf.d/default.conf
#+end_src

#+begin_src conf
server {
    listen 8080;
    listen [::]:8080;

    location /stub_status {
        stub_status;
    }
}
#+end_src

#+begin_src shell
sudo systemctl restart nginx.service
#+end_src

** Debugging

At this point, everything should be running smoothly. If not, here are a few
areas to check for obvious errors.

Nginx: Curl =stub_status= from the Nginx web server on the host machine to see
if Nginx and =stub_status= are working properly.

#+begin_src shell
curl http://127.0.0.1:8080/stub_status

# EXPECTED RESULTS:
Active connections: 101
server accepts handled requests
 7510 7510 9654
Reading: 0 Writing: 1 Waiting: 93
#+end_src
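Containers: Another quick check is to confirm that all three containers are
actually running before curling the individual exporters.

#+begin_src shell
# Run from the ~/prometheus directory that contains compose.yml;
# prometheus, node-exporter, and nginx-exporter should all report a running status
sudo docker compose ps
#+end_src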
Nginx-Exporter: Curl the exported Nginx metrics.

#+begin_src shell
# Figure out the IP address of the Docker container
# (the network name is prefixed with the Compose project name, i.e. the directory name)
sudo docker network inspect prometheus_monitoring

...
"Name": "nginx-exporter",
"EndpointID": "ef999a53eb9e0753199a680f8d78db7c2a8d5f442626df0b1bb945f03b73dcdd",
"MacAddress": "02:42:c0:a8:40:02",
"IPv4Address": "192.168.64.2/20",
...

# Curl the exported Nginx metrics
curl 192.168.64.2:9113/metrics

# EXPECTED RESULTS:
...
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 2.9927e-05
go_gc_duration_seconds{quantile="0.25"} 4.24e-05
go_gc_duration_seconds{quantile="0.5"} 4.8531e-05
...
#+end_src

Node-Exporter: Curl the exported node machine metrics.

#+begin_src shell
# Curl the exported Node metrics
curl 192.168.64.3:9100/metrics

# EXPECTED RESULTS:
...
# HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
# TYPE promhttp_metric_handler_requests_total counter
promhttp_metric_handler_requests_total{code="200"} 47
promhttp_metric_handler_requests_total{code="500"} 0
promhttp_metric_handler_requests_total{code="503"} 0
...
#+end_src

Grafana: Open the Explore panel and check whether any metrics are coming through
the Prometheus data source. If not, something on the machine is preventing data
from flowing through.

* Alerts & IRM

Now that we have our data connected and visualized, we can define alerting rules
and determine what Grafana should do when an alert is triggered.

** OnCall

#+caption: OnCall
[[https://img.cleberg.net/blog/20240920-prometheus-grafana-cloud/oncall.png]]

Within the Alerts & IRM section of Grafana (=/alerts-and-incidents=), open the
Users page.

The Users page allows you to configure user connections such as:

- Mobile App
- Slack
- Telegram
- MS Teams
- iCal
- Google Calendar

In addition to the connections of each user, you can specify how each user or
team is alerted for Default Notifications and Important Notifications.

Finally, you can access the Schedules page within the OnCall module to schedule
users and teams to be on call for specific date and time ranges. For my
purposes, I put myself on call 24/7 so that I receive all alerts.

#+caption: User Information
[[https://img.cleberg.net/blog/20240920-prometheus-grafana-cloud/irm_user_info.png]]

** Alerting

#+caption: Alerting Insights
[[https://img.cleberg.net/blog/20240920-prometheus-grafana-cloud/alerting_insights.png]]

Now that we have defined users and teams, associated them with an on-call
schedule, and configured them to receive the proper alerts, let's define a rule
that will generate alerts.

Within the Alerting section of the Alerts & IRM module, you can create alert
rules, contact points, and notification policies.

Let's start by opening the Alert Rules page and clicking the New Alert Rule
button.

As shown in the image below, we will create an alert for high CPU temperature by
querying the =node_hwmon_temp_celsius= metric from our Prometheus data source.
Next, we will set the threshold to anything above 50 degrees Celsius. Finally,
we will tell Grafana to evaluate this rule every minute via our Default
evaluation group. The rule is connected to our Grafana email contact point, but
it can be associated with any notification policy.

#+caption: New Alert Rule
[[https://img.cleberg.net/blog/20240920-prometheus-grafana-cloud/new_alert.png]]
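If you want to sanity-check the rule before waiting for it to fire, you can
evaluate the same expression by hand against the Prometheus HTTP API. The sketch
below assumes your hardware actually exposes hwmon temperature sensors, and
=192.168.64.4= is only a placeholder for the Prometheus container IP found with
=docker network inspect= during debugging.

#+begin_src shell
# Ask Prometheus to evaluate the alert condition directly; an empty result list
# means no temperature sensor is currently reporting above 50 degrees Celsius
curl -s '192.168.64.4:9090/api/v1/query' \
  --data-urlencode 'query=node_hwmon_temp_celsius > 50'
#+end_src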
When the alert fires, it will generate an email (or use whatever notification
policy you assigned) that looks something like the following image.

#+caption: Alerting Example
[[https://img.cleberg.net/blog/20240920-prometheus-grafana-cloud/email_alert.png]]

** Dashboards

As promised, here are some dashboard screenshots based on the configuration
above.

#+caption: Nginx Dashboard
[[https://img.cleberg.net/blog/20240920-prometheus-grafana-cloud/dashboard_nginx.png]]

#+caption: Node Dashboard
[[https://img.cleberg.net/blog/20240920-prometheus-grafana-cloud/dashboard_node.png]]

#+caption: OnCall Dashboard
[[https://img.cleberg.net/blog/20240920-prometheus-grafana-cloud/dashboard_oncall.png]]

#+caption: Prometheus Dashboard
[[https://img.cleberg.net/blog/20240920-prometheus-grafana-cloud/dashboard_prometheus.png]]