#+date: <2024-09-20 Fri 13:38:52>
#+title: Linux Observability with Self-Hosted Prometheus and Grafana Cloud
#+description: Learn how to self-host a Prometheus data collection tool with Docker and visualize the results with Grafana Cloud.
#+filetags: :linux:grafana:
#+slug: prometheus-grafana-cloud

This tutorial will guide you through the process of:

1. Configuring a free Grafana Cloud account.
2. Installing Prometheus to store metrics.
3. Installing Node Exporter to export machine metrics for Prometheus.
4. Installing Nginx Exporter to export Nginx metrics for Prometheus.
5. Visualizing data in Grafana dashboards.
6. Configuring alerts based on Grafana metrics.

* Grafana Cloud

To get started, visit the [[https://grafana.com/auth/sign-up/create-user][Grafana website]] and create a free account.

** Prometheus Data Source

By default, a Prometheus data source should exist in your data sources page
(=$yourOrg.grafana.net/connections/datasources=). If not, add a new data source
using the Prometheus type.

Once you have a valid Prometheus data source, open the data source and note the
following items:

| Data                  | Example                                                             |
|-----------------------+---------------------------------------------------------------------|
| Prometheus Server URL | https://prometheus-prod-13-prod-us-east-0.grafana.net/api/prom/push |
|-----------------------+---------------------------------------------------------------------|
| User                  | 1234567                                                             |
|-----------------------+---------------------------------------------------------------------|
| Password              | configured                                                          |

** Cloud Access Policy Token

Now let's create an access token in Grafana. Navigate to the Administration
> Users and Access > Cloud Access Policies page and create an access policy.

The =metrics > write= scope must be enabled within the access policy you choose.

Once you have an access policy with the correct scope, click the Add Token
button and be sure to copy and save the token since it will disappear once the
modal window is closed.

** Dashboards

Finally, let's create a couple of dashboards so that we can easily explore the
data that we will be importing from the server.

I recommend importing the following dashboards:

- [[https://grafana.com/grafana/dashboards/1860-node-exporter-full/][Node Exporter Full]]
- [[https://github.com/nginxinc/nginx-prometheus-exporter/blob/main/grafana][nginx-prometheus-exporter]]
- Prometheus 2.0 Stats

Refer to the bottom of the post for dashboard screenshots!

* Docker

On the machine that you want to observe, make sure Docker and Docker Compose are
installed. This tutorial will be using Docker Compose to create a group of
containers that will work together to send metrics to Grafana.
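
A quick way to confirm both tools are present is to look for them on the =PATH=
before going further. The following is a minimal sketch; =require_cmd= is a
hypothetical helper name, not part of Docker.

#+begin_src sh
# require_cmd (hypothetical helper): succeed if a command is on PATH,
# otherwise print a message to stderr and return non-zero.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "missing command: $1" >&2
  return 1
}

# Usage: check for Docker, then ask the Compose plugin for its version.
#   require_cmd docker && sudo docker compose version
#+end_src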

Let's start by creating a working directory.

#+begin_src sh
mkdir ~/prometheus && \
cd ~/prometheus    && \
nano compose.yml
#+end_src

Within the =compose.yml= file, let's paste the following:

#+begin_src yaml
# compose.yml

networks:
  monitoring:
    driver: bridge

volumes:
  prometheus_data: {}

services:
  nginx-exporter:
    image: nginx/nginx-prometheus-exporter
    container_name: nginx-exporter
    restart: unless-stopped
    command:
      - '--nginx.scrape-uri=http://host.docker.internal:8080/stub_status'
    expose:
      - 9113
    networks:
      - monitoring
    extra_hosts:
      - host.docker.internal:host-gateway

  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    restart: unless-stopped
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.rootfs=/rootfs'
      - '--path.sysfs=/host/sys'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
    expose:
      - 9100
    networks:
      - monitoring

  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    restart: unless-stopped
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--web.enable-lifecycle'
    expose:
      - 9090
    networks:
      - monitoring
#+end_src

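#+begin_src sh
# Run this only after prometheus.yml exists (see the next step); otherwise
# Docker creates a directory at ./prometheus.yml for the bind mount.
sudo docker compose up -d
#+end_src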

#+begin_quote
I'm not sure if it made a difference but I also whitelisted port 8080 on my
local firewall with =sudo ufw allow 8080=.
#+end_quote

Next, let's create a =prometheus.yml= configuration file.

#+begin_src sh
nano prometheus.yml
#+end_src

#+begin_src yaml
# prometheus.yml

global:
  scrape_interval: 1m

scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 1m
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node'
    static_configs:
      - targets: ['node-exporter:9100']

  - job_name: 'nginx'
    scrape_interval: 5s
    static_configs:
      - targets: ['nginx-exporter:9113']

remote_write:
  - url: 'https://prometheus-prod-13-prod-us-east-0.grafana.net/api/prom/push'
    basic_auth:
      username: 'prometheus-grafana-username'
      password: 'access-policy-token'
#+end_src
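
Before starting or restarting the stack, it can help to confirm that
=prometheus.yml= exists as a regular file. With the bind mount in =compose.yml=,
Docker typically creates a directory at that path if the file is missing, and
Prometheus then fails to load its configuration. A minimal sketch, where
=config_ready= is a hypothetical helper name:

#+begin_src sh
# config_ready (hypothetical helper): succeed only if the given path
# (default: ./prometheus.yml) is a regular, non-empty file.
config_ready() {
  file="${1:-./prometheus.yml}"
  if [ -f "$file" ] && [ -s "$file" ]; then
    return 0
  fi
  echo "not a usable config file: $file" >&2
  return 1
}

# Usage, from ~/prometheus:
#   config_ready && sudo docker compose up -d
#+end_src

For a syntax check, the =promtool check config= command bundled with Prometheus
can be pointed at the same file.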

** Nginx

To enable the Nginx statistics that the nginx-exporter container needs, we need
to modify the Nginx configuration on the host.

More specifically, we need to create a path for the =stub_status= to be returned
when we query port 8080 on our localhost.

#+begin_src sh
sudo nano /etc/nginx/conf.d/default.conf
#+end_src

#+begin_src conf
server {
        listen 8080;
        listen [::]:8080;

        location /stub_status {
                stub_status;
        }
}
#+end_src

#+begin_src sh
# Validate the configuration first; restart only if the test passes
sudo nginx -t && sudo systemctl restart nginx.service
#+end_src

** Debugging

At this point, everything should be running smoothly. If not, here are a few
areas to check and see if any obvious errors exist.

Nginx: Curl the stub_status from the Nginx web server on the host machine to see
if Nginx and stub_status are working properly.

#+begin_src sh
curl http://127.0.0.1:8080/stub_status

# EXPECTED RESULTS:
Active connections: 101
server accepts handled requests
 7510 7510 9654
Reading: 0 Writing: 1 Waiting: 93
#+end_src
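
To check this from a script rather than by eye, the connection count can be
pulled out of the response. This is a minimal sketch; =active_connections= is a
hypothetical helper name.

#+begin_src sh
# active_connections (hypothetical helper): read stub_status output on stdin
# and print the number from the "Active connections:" line.
active_connections() {
  awk '/^Active connections:/ { print $3 }'
}

# Usage against the endpoint configured above:
#   curl -s http://127.0.0.1:8080/stub_status | active_connections
#+end_src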

Nginx-Exporter: Curl the exported Nginx metrics.

#+begin_src sh
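# Figure out the IP address of the Docker container. Compose prefixes the
# network name with the project (directory) name, e.g. prometheus_monitoring.
sudo docker network inspect prometheus_monitoring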

...
"Name": "nginx-exporter",
"EndpointID": "ef999a53eb9e0753199a680f8d78db7c2a8d5f442626df0b1bb945f03b73dcdd",
"MacAddress": "02:42:c0:a8:40:02",
"IPv4Address": "192.168.64.2/20",
...

# Curl the exported Nginx metrics
curl 192.168.64.2:9113/metrics

# EXPECTED RESULTS:
...
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 2.9927e-05
go_gc_duration_seconds{quantile="0.25"} 4.24e-05
go_gc_duration_seconds{quantile="0.5"} 4.8531e-05
...
#+end_src
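
The container IP can also be read directly with =docker inspect= and a Go
template, and a single metric can be picked out of the response. The sketch
below assumes the exporter reports an unlabelled =nginx_up= gauge (1 when its
last scrape of Nginx succeeded); =metric_value= is a hypothetical helper name.

#+begin_src sh
# metric_value (hypothetical helper): read Prometheus text-format metrics
# on stdin and print the value of the unlabelled metric named in $1.
metric_value() {
  awk -v name="$1" '$1 == name { print $2 }'
}

# Usage (the IP lookup relies on docker inspect's Go-template output):
#   ip=$(sudo docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' nginx-exporter)
#   curl -s "$ip:9113/metrics" | metric_value nginx_up
#+end_src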

Node-Exporter: Curl the exported machine metrics.

#+begin_src sh
# Curl the exported Node metrics
curl 192.168.64.3:9100/metrics

# EXPECTED RESULTS:
...
# HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
# TYPE promhttp_metric_handler_requests_total counter
promhttp_metric_handler_requests_total{code="200"} 47
promhttp_metric_handler_requests_total{code="500"} 0
promhttp_metric_handler_requests_total{code="503"} 0
...
#+end_src

Grafana: Open the Explore panel and look to see if any metrics are coming
through the Prometheus data source. If not, something on the machine is
preventing data from flowing through.
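
On the machine side, the Prometheus container's own logs are usually the
quickest place to look for remote-write problems. A minimal sketch, assuming the
log lines carry a =level== field as in Prometheus's logfmt-style output;
=log_problems= is a hypothetical helper name.

#+begin_src sh
# log_problems (hypothetical helper): keep only warning and error lines
# from logfmt-style log output on stdin (case-insensitive match).
# The trailing "|| true" keeps the exit status zero when nothing matches.
log_problems() {
  grep -iE 'level=(warn|error)' || true
}

# Usage, from ~/prometheus:
#   sudo docker compose logs prometheus 2>&1 | log_problems
#+end_src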

* Alerts & IRM

Now that we have our data connected and visualized, we can define alerting rules
and determine what Grafana should do when an alert is triggered.

** OnCall

#+caption: OnCall
[[https://img.cleberg.net/blog/20240920-prometheus-grafana-cloud/oncall.png]]

Within the Alerts & IRM section of Grafana (=/alerts-and-incidents=), open the
Users page.

The Users page allows you to configure user connections such as:

- Mobile App
- Slack
- Telegram
- MS Teams
- iCal
- Google Calendar

In addition to the connections of each user, you can specify how each user or
team is alerted for Default Notifications and Important Notifications.

Finally, you can access the Schedules page within the OnCall module to schedule
users and teams to be on call for specific date and time ranges. For my
purposes, I put myself on-call 24/7 so that I receive all alerts.

#+caption: User Information
[[https://img.cleberg.net/blog/20240920-prometheus-grafana-cloud/irm_user_info.png]]

** Alerting

#+caption: Alerting Insights
[[https://img.cleberg.net/blog/20240920-prometheus-grafana-cloud/alerting_insights.png]]

Now that we have defined users and teams, associated them with an on-call
schedule, and configured them to receive the proper alerts, let's define a rule
that will generate alerts.

Within the Alerting section of the Alerts & IRM module, you can create alert
rules, contact points, and notification policies.

Let's start by opening the Alert Rules page and clicking the New Alert Rule
button.

As shown in the image below, we will create an alert for high CPU temperature by querying the =node_hwmon_temp_celsius= metric from our Prometheus data source.

Next, we will set the threshold to be anything above 50 (degrees Celsius).
Finally, we will tell Grafana to evaluate this every 1 minute via our Default
evaluation group. This is connected to our Grafana email, but can be associated
with any notification policy.

#+caption: New Alert Rule
[[https://img.cleberg.net/blog/20240920-prometheus-grafana-cloud/new_alert.png]]
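
The threshold condition from this rule can also be checked by hand against the
raw Node Exporter output, which is handy for confirming that the metric exists
on your hardware before building the rule. A minimal sketch; =hot_sensors= is a
hypothetical helper name, and the threshold of 50 mirrors the rule described
here.

#+begin_src sh
# hot_sensors (hypothetical helper): read Node Exporter metrics on stdin
# and print node_hwmon_temp_celsius series whose value exceeds $1.
hot_sensors() {
  awk -v limit="$1" \
    'index($1, "node_hwmon_temp_celsius{") == 1 && $NF + 0 > limit + 0 { print }'
}

# Usage (replace the IP with your node-exporter container address):
#   curl -s 192.168.64.3:9100/metrics | hot_sensors 50
#+end_src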

When the alert fires, it will generate an email (or whatever notification policy
you assigned) and will look something like the following image.

#+caption: Alerting Example
[[https://img.cleberg.net/blog/20240920-prometheus-grafana-cloud/email_alert.png]]

** Dashboards

As promised, here are some dashboard screenshots based on the configurations
above.

#+caption: Nginx Dashboard
[[https://img.cleberg.net/blog/20240920-prometheus-grafana-cloud/dashboard_nginx.png]]

#+caption: Node Dashboard
[[https://img.cleberg.net/blog/20240920-prometheus-grafana-cloud/dashboard_node.png]]

#+caption: OnCall Dashboard
[[https://img.cleberg.net/blog/20240920-prometheus-grafana-cloud/dashboard_oncall.png]]

#+caption: Prometheus Dashboard
[[https://img.cleberg.net/blog/20240920-prometheus-grafana-cloud/dashboard_prometheus.png]]