From 5b92b44405d854ace236fe36884827eb14b5c853 Mon Sep 17 00:00:00 2001
From: Christian Cleberg
Date: Fri, 20 Sep 2024 14:01:38 -0500
Subject: fix broken src block and src titles in latest post

---
 .../blog/2024-09-19-prometheus-grafana-cloud.org | 349 ---------------------
 1 file changed, 349 deletions(-)
 delete mode 100644 content/blog/2024-09-19-prometheus-grafana-cloud.org

diff --git a/content/blog/2024-09-19-prometheus-grafana-cloud.org b/content/blog/2024-09-19-prometheus-grafana-cloud.org
deleted file mode 100644
index aa70f14..0000000
--- a/content/blog/2024-09-19-prometheus-grafana-cloud.org
+++ /dev/null
@@ -1,349 +0,0 @@
#+date: <2024-09-20 Fri 13:38:52>
#+title: Linux Observability with Self-Hosted Prometheus and Grafana Cloud
#+description: Learn how to self-host a Prometheus data collection tool with Docker and visualize the results with Grafana Cloud.
#+filetags: :linux:grafana:
#+slug: prometheus-grafana-cloud

This tutorial will guide you through the process of:

1. Configuring a free Grafana Cloud account.
2. Installing Prometheus to store metrics.
3. Installing Node Exporter to export machine metrics for Prometheus.
4. Installing Nginx Exporter to export Nginx metrics for Prometheus.
5. Visualizing data in Grafana dashboards.
6. Configuring alerts based on Grafana metrics.

* Grafana Cloud

To get started, visit the [[https://grafana.com/auth/sign-up/create-user][Grafana website]] and create a free account.

** Prometheus Data Source

By default, a Prometheus data source should exist in your data sources page
(=$yourOrg.grafana.net/connections/datasources=). If not, add a new data source
using the Prometheus type.

Once you have a valid Prometheus data source, open the data source and note the
following items:

| Data                  | Example                                                              |
|-----------------------+----------------------------------------------------------------------|
| Prometheus Server URL | https://prometheus-prod-13-prod-us-east-0.grafana.net/api/prom/push  |
| User                  | 1234567                                                              |
| Password              | configured                                                           |

** Cloud Access Policy Token

Now let's create an access token in Grafana. Navigate to the Administration >
Users and Access > Cloud Access Policies page and create an access policy.

The =metrics:write= scope must be enabled within the access policy you choose.

Once you have an access policy with the correct scope, click the Add Token
button and be sure to copy and save the token, since it will disappear once the
modal window is closed.

** Dashboards

Finally, let's create a couple of dashboards so that we can easily explore the
data that we will be importing from the server.

I recommend importing the following dashboards:

- [[https://grafana.com/grafana/dashboards/1860-node-exporter-full/][Node Exporter Full]]
- [[https://github.com/nginxinc/nginx-prometheus-exporter/blob/main/grafana][nginx-prometheus-exporter]]
- Prometheus 2.0 Stats

Refer to the bottom of the post for dashboard screenshots!

* Docker

On the machine that you want to observe, make sure Docker and Docker Compose are
installed. This tutorial will use Docker Compose to create a group of containers
that work together to send metrics to Grafana.

Let's start by verifying the prerequisites, creating a working directory, and
creating a =compose.yml= file.
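A quick way to confirm both prerequisites, assuming the newer Compose v2 plugin
(older installs may ship a standalone =docker-compose= binary instead):

#+begin_src shell
# Verify that the Docker Engine and the Compose plugin are installed
docker --version
docker compose version
#+end_src

With Docker confirmed, create the working directory and open a new =compose.yml=
file: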
#+begin_src shell
mkdir ~/prometheus && \
cd ~/prometheus && \
nano compose.yml
#+end_src

Within the =compose.yml= file, let's paste the following:

#+begin_src yaml
# compose.yml

networks:
  monitoring:
    driver: bridge

volumes:
  prometheus_data: {}

services:
  nginx-exporter:
    image: nginx/nginx-prometheus-exporter
    container_name: nginx-exporter
    restart: unless-stopped
    command:
      - '--nginx.scrape-uri=http://host.docker.internal:8080/stub_status'
    expose:
      - 9113
    networks:
      - monitoring
    extra_hosts:
      - host.docker.internal:host-gateway

  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    restart: unless-stopped
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.rootfs=/rootfs'
      - '--path.sysfs=/host/sys'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
    expose:
      - 9100
    networks:
      - monitoring

  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    restart: unless-stopped
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--web.enable-lifecycle'
    expose:
      - 9090
    networks:
      - monitoring
#+end_src

Next, let's create the =prometheus.yml= configuration file referenced in the
volumes above. Prometheus loads this file at startup, and the compose file
bind-mounts it into the container, so it needs to exist before we start the
stack.

#+begin_src shell
nano prometheus.yml
#+end_src

#+begin_src yaml
# prometheus.yml

global:
  scrape_interval: 1m

scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 1m
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node'
    static_configs:
      - targets: ['node-exporter:9100']

  - job_name: 'nginx'
    scrape_interval: 5s
    static_configs:
      - targets: ['nginx-exporter:9113']

remote_write:
  - url: 'https://prometheus-prod-13-prod-us-east-0.grafana.net/api/prom/push'
    basic_auth:
      username: 'prometheus-grafana-username'
      password: 'access-policy-token'
#+end_src

The =remote_write= block is what ships the locally scraped metrics to Grafana
Cloud: use the Prometheus Server URL noted earlier as the =url=, the numeric
User as the =username=, and the Cloud Access Policy token as the =password=.

With both files in place, start the containers:

#+begin_src shell
sudo docker compose up -d
#+end_src

#+begin_quote
I'm not sure if it made a difference, but I also whitelisted port 8080 on my
local firewall with =sudo ufw allow 8080=.
#+end_quote

** Nginx

To enable the Nginx statistics we need for the nginx-exporter container, we need
to modify the Nginx configuration on the host.

More specifically, we need to create a location that returns =stub_status= when
we query port 8080 on localhost.

#+begin_src shell
sudo nano /etc/nginx/conf.d/default.conf
#+end_src

#+begin_src conf
server {
    listen 8080;
    listen [::]:8080;

    location /stub_status {
        stub_status;
    }
}
#+end_src

#+begin_src shell
sudo systemctl restart nginx.service
#+end_src

** Debugging

At this point, everything should be running smoothly. If not, here are a few
areas to check for obvious errors.

Nginx: Curl =stub_status= from the Nginx web server on the host machine to see
if Nginx and =stub_status= are working properly.

#+begin_src shell
curl http://127.0.0.1:8080/stub_status

# EXPECTED RESULTS:
Active connections: 101
server accepts handled requests
 7510 7510 9654
Reading: 0 Writing: 1 Waiting: 93
#+end_src

Nginx-Exporter: Curl the exported Nginx metrics. The exporter's port is not
published to the host, so you will need the container's IP address on the Docker
bridge network; confirm the containers are running with the quick check below,
then look up the address.
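Assuming the Compose v2 plugin and that the commands are run from the
=~/prometheus= directory created earlier, a status and log check might look like
this:

#+begin_src shell
# List the services defined in compose.yml and their current state
sudo docker compose ps

# Tail the logs of a single service (Prometheus here) to spot config errors
sudo docker compose logs prometheus
#+end_src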
#+begin_src shell
# Figure out the IP address of the Docker container
sudo docker network inspect grafana_monitoring

...
"Name": "nginx-exporter",
"EndpointID": "ef999a53eb9e0753199a680f8d78db7c2a8d5f442626df0b1bb945f03b73dcdd",
"MacAddress": "02:42:c0:a8:40:02",
"IPv4Address": "192.168.64.2/20",
...

# Curl the exported Nginx metrics
curl 192.168.64.2:9113/metrics

# EXPECTED RESULTS:
...
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 2.9927e-05
go_gc_duration_seconds{quantile="0.25"} 4.24e-05
go_gc_duration_seconds{quantile="0.5"} 4.8531e-05
...
#+end_src

Node-Exporter: Curl the exported node machine metrics.

#+begin_src shell
# Curl the exported Node metrics
curl 192.168.64.3:9100/metrics

# EXPECTED RESULTS:
...
# HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
# TYPE promhttp_metric_handler_requests_total counter
promhttp_metric_handler_requests_total{code="200"} 47
promhttp_metric_handler_requests_total{code="500"} 0
promhttp_metric_handler_requests_total{code="503"} 0
...
#+end_src

Grafana: Open the Explore panel and look to see if any metrics are coming
through the Prometheus data source. If not, something on the machine is
preventing data from flowing through.

* Alerts & IRM

Now that we have our data connected and visualized, we can define alerting rules
and determine what Grafana should do when an alert is triggered.

** OnCall

#+caption: OnCall
[[https://img.cleberg.net/blog/20240920-prometheus-grafana-cloud/oncall.png]]

Within the Alerts & IRM section of Grafana (=/alerts-and-incidents=), open the
Users page.

The Users page allows you to configure user connections such as:

- Mobile App
- Slack
- Telegram
- MS Teams
- iCal
- Google Calendar

In addition to the connections of each user, you can specify how each user or
team is alerted for Default Notifications and Important Notifications.

Finally, you can access the Schedules page within the OnCall module to schedule
users and teams to be on call for specific date and time ranges. For my
purposes, I put myself on call 24/7 so that I receive all alerts.

#+caption: User Information
[[https://img.cleberg.net/blog/20240920-prometheus-grafana-cloud/irm_user_info.png]]

** Alerting

#+caption: Alerting Insights
[[https://img.cleberg.net/blog/20240920-prometheus-grafana-cloud/alerting_insights.png]]

Now that we have users and a team associated with an on-call schedule and
configured to receive the proper alerts, let's define a rule that will generate
alerts.

Within the Alerting section of the Alerts & IRM module, you can create alert
rules, contact points, and notification policies.

Let's start by opening the Alert Rules page and clicking the New Alert Rule
button.

As shown in the image below, we will create an alert for high CPU temperature by
querying the =node_hwmon_temp_celsius= metric from our Prometheus data source.

Next, we will set the threshold to anything above 50 (degrees Celsius). Finally,
we will tell Grafana to evaluate this rule every minute via the Default
evaluation group. The rule is connected to our Grafana email, but it can be
associated with any notification policy.
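Before relying on the rule, you can sanity-check the underlying query directly
against the local Prometheus instance. This is only a sketch: it assumes your
hardware actually exposes temperature sensors via hwmon, and it assumes the
Prometheus container's address on the Docker network is =192.168.64.4= (look up
the real address with the same =docker network inspect= command used in the
Debugging section):

#+begin_src shell
# Ask Prometheus which sensors, if any, are currently above 50 degrees Celsius
curl -s http://192.168.64.4:9090/api/v1/query \
  --data-urlencode 'query=node_hwmon_temp_celsius > 50'

# An empty "result" array in the JSON response means nothing is currently
# breaching the threshold, so the alert should stay in its normal state.
#+end_src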
#+caption: New Alert Rule
[[https://img.cleberg.net/blog/20240920-prometheus-grafana-cloud/new_alert.png]]

When the alert fires, it will generate an email (or whichever notification
policy you assigned) that will look something like the following image.

#+caption: Alerting Example
[[https://img.cleberg.net/blog/20240920-prometheus-grafana-cloud/email_alert.png]]

** Dashboards

As promised, here are some dashboard screenshots based on the configuration
described above.

#+caption: Nginx Dashboard
[[https://img.cleberg.net/blog/20240920-prometheus-grafana-cloud/dashboard_nginx.png]]

#+caption: Node Dashboard
[[https://img.cleberg.net/blog/20240920-prometheus-grafana-cloud/dashboard_node.png]]

#+caption: OnCall Dashboard
[[https://img.cleberg.net/blog/20240920-prometheus-grafana-cloud/dashboard_oncall.png]]

#+caption: Prometheus Dashboard
[[https://img.cleberg.net/blog/20240920-prometheus-grafana-cloud/dashboard_prometheus.png]]