Stay Ahead of Issues: Setting Up Alerts with Prometheus and Grafana

In today's fast-paced tech environment, staying ahead of issues is critical to maintaining system reliability and performance. This involves not just monitoring but also setting up robust alerting and notification systems. Prometheus, with its powerful querying capabilities, and Grafana, with its intuitive visualization tools, make a formidable pair for this purpose. In this blog we will explore how to set up alerting rules in Prometheus, integrate these alerts with Grafana for visual monitoring, configure notifications via channels like email and Slack, and walk through real-world examples of effective alerting strategies.

Setting Up Alerting Rules in Prometheus

Prometheus is a powerful open-source monitoring and alerting toolkit designed for reliability and scalability. Setting up alerting rules in Prometheus involves defining conditions under which alerts should be triggered.

Step 1: Define Alerting Rules

Alerting rules are defined in rule files that Prometheus loads through its configuration. These rules specify the conditions under which alerts should fire.

Example: Basic CPU Usage Alert

Create or edit your Prometheus alerting rules file (e.g. alert.rules):

groups:
- name: dev-alerts
  rules:
  - alert: HighCPUUsage
    expr: avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance) < 0.2
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "High CPU usage detected on instance {{ $labels.instance }}"
      description: "CPU usage is above 80% for more than 2 minutes."

  • alert: The name of the alert.

  • expr: The PromQL expression that evaluates the condition.

  • for: How long the condition must hold before the alert fires.

  • labels: Additional metadata, such as severity.

  • annotations: Human-readable details about the alert.
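
As a second example, the rule below can be added to the same group. It relies on the built-in up metric, which Prometheus sets to 0 when a scrape target stops responding; the duration and severity here are illustrative choices, not requirements:

  - alert: InstanceDown
    expr: up == 0
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Instance {{ $labels.instance }} is down"
      description: "{{ $labels.instance }} has failed scrapes for more than 1 minute."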

Step 2: Load Alerting Rules

Make sure Prometheus loads the alerting rules by referencing them in the Prometheus configuration file (prometheus.yml):

rule_files:
  - "alert.rules"

Restart Prometheus to apply the changes.

systemctl restart prometheus
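
If you want to catch syntax mistakes before restarting, promtool (shipped alongside Prometheus) can validate the rule file; the path below assumes alert.rules sits in the current directory:

promtool check rules alert.rules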

Step 3: Test Alerting Rules

Prometheus provides a web UI to test and view alerting rules. Access it at http://<prometheus-server>:9090/rules; currently pending and firing alerts are listed at http://<prometheus-server>:9090/alerts.
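
The same rule and alert state is also exposed over Prometheus' HTTP API, which is convenient for scripted checks (default port assumed):

curl http://<prometheus-server>:9090/api/v1/rules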

Integrating Alerting with Grafana for Visual Alerts

While Prometheus handles the backend alerting logic, Grafana provides a powerful and user-friendly interface to visualize and manage these alerts.

Step 1: Add Prometheus as a Data Source

  1. Navigate to Grafana: Go to your Grafana instance.

  2. Add Data Source: Go to Configuration > Data Sources > Add data source.

  3. Select Prometheus: Choose Prometheus from the list of available data sources.

  4. Configure Prometheus: Enter your Prometheus server URL (e.g. http://<prometheus-server>:9090) and save the configuration.
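
If you prefer to manage Grafana as code, the same data source can be provisioned from a YAML file placed in Grafana's provisioning/datasources directory. The snippet below is a minimal sketch; the URL is assumed to match your Prometheus server:

apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://<prometheus-server>:9090
    isDefault: true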

Step 2: Create a Dashboard Panel

  1. Create Dashboard: Create a new dashboard or edit an existing one.

  2. Add Panel: Add a new panel and configure it to use Prometheus as the data source.

  3. Query Data: Use PromQL queries to fetch the data you want to monitor. For instance, the query below charts the idle CPU fraction per instance (a usage-percentage variant follows the list):

     avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance)
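
To show CPU usage as a percentage instead, which is often easier to read on a dashboard, you can invert the idle fraction (same metric and labels as above):

100 * (1 - avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance))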
    

Step 3: Configure Grafana Alerts

  1. Alert Tab: In the panel settings, go to the "Alert" tab.

  2. Create Alert: Click "Create Alert" and define the alert conditions.

  3. Set Conditions: Specify the alert conditions using PromQL queries. For example:

     avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance) < 0.2
    
  4. Notification Channels: Configure where the alerts should be sent (email, Slack, etc.).

  5. Save Panel: Save the panel to apply the alert configuration.

Example: CPU Usage Alert in Grafana

Here’s an example of how to configure a CPU usage alert in Grafana:

  1. Create a Panel: Add a new graph panel.

  2. Prometheus Query: Use the query:

     avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance)
    
  3. Alert Configuration:

    • Create Alert: In the Alert tab, create a new alert.

    • Conditions: Set the condition to trigger an alert when CPU idle time is below 20% for 2 minutes.

    • Notifications: Add notification channels like email or Slack.

Configuring Notifications via Email, Slack and Other Channels

Once alerts are set up, configuring notifications ensures that the right people are notified promptly.

Step 1: Set Up Alertmanager

Prometheus uses Alertmanager to handle notifications. Install and configure Alertmanager to manage alerts and notifications.

Install Alertmanager

Download and extract Alertmanager:

wget https://github.com/prometheus/alertmanager/releases/download/v0.21.0/alertmanager-0.21.0.linux-amd64.tar.gz
tar xvfz alertmanager-0.21.0.linux-amd64.tar.gz
cd alertmanager-0.21.0.linux-amd64

Configure Alertmanager

Create a configuration file (alertmanager.yml):

global:
  resolve_timeout: 5m

route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'email-receiver'

receivers:
- name: 'email-receiver'
  email_configs:
  - to: 'alerts@company.com'
    from: 'alertmanager@company.com'
    smarthost: 'smtp.company.com:587'
    auth_username: 'username'
    auth_password: 'securepassword'

Start Alertmanager:

./alertmanager --config.file=alertmanager.yml
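
The archive also ships amtool, which can validate the configuration before you start or reload Alertmanager (run from the extracted directory):

./amtool check-config alertmanager.yml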

Step 2: Configure Prometheus to Use Alertmanager

Update the Prometheus configuration (prometheus.yml) to use Alertmanager:

alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 'localhost:9093'

Step 3: Configure Notification Channels

Email Notifications

Ensure your Alertmanager configuration includes email settings as shown above.

Slack Notifications

To configure Slack notifications, add a Slack receiver to alertmanager.yml:

receivers:
- name: 'slack-receiver'
  slack_configs:
  - api_url: 'https://hooks.slack.com/services/your/slack/hook'
    channel: '#alerts'
    send_resolved: true
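
If you keep both the email and Slack receivers, the route section decides where each alert goes. The sketch below routes critical alerts to Slack and everything else to email; the severity values are assumed to match the labels used in the rules earlier in this post:

route:
  receiver: 'email-receiver'
  group_by: ['alertname']
  routes:
  - match:
      severity: critical
    receiver: 'slack-receiver'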

Restart Alertmanager to apply the changes.

Step 4: Add Notification Channels in Grafana

In Grafana, set up notification channels to integrate with Alertmanager, email, Slack and other services.

Adding Notification Channels

  1. Navigate to Notification Channels: Go to Alerting > Notification Channels.

  2. Add Channel: Click "New Channel" and select the type (Email, Slack, etc.).

  3. Configure: Enter the necessary details like email addresses, Slack webhook URLs, etc.

  4. Test: Test the notification channel to ensure it is working correctly.

Real-World Examples of Alerting Strategies

Effective alerting strategies help minimize downtime and ensure quick resolution of issues. Here are some real-world examples:

Example 1: E-commerce Website Monitoring

For an e-commerce website, monitoring the performance and availability of critical services is crucial.

Alert: High Error Rate on Login Service

Prometheus Rule:

- alert: HighErrorRate
  expr: sum(rate(http_requests_total{job="login_service", status="5xx"}[1m])) / sum(rate(http_requests_total{job="login_service"}[1m])) > 0.05
  for: 1m
  labels:
    severity: critical
  annotations:
    summary: "High error rate on login service"
    description: "Error rate is above 5% for more than 1 minute."

Alertmanager Configuration:

receivers:
- name: 'slack-receiver'
  slack_configs:
  - api_url: 'https://hooks.slack.com/services/your/slack/hook'
    channel: '#alerts'
    send_resolved: true

Example 2: Database Performance Monitoring

For a database system, monitoring query performance and resource utilization is essential.

Alert: High Query Latency

Prometheus Rule:

- alert: HighQueryLatency
  expr: histogram_quantile(0.95, rate(query_duration_seconds_bucket[5m])) > 1
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "High query latency"
    description: "95th percentile query latency is above 1 second for more than 5 minutes."

Alertmanager Configuration:

receivers:
- name: 'email-receiver'
  email_configs:
  - to: 'dba-team@company.com'
    from: 'alertmanager@company.com'
    smarthost: 'smtp.company.com:587'
    auth_username: 'username'
    auth_password: 'securepassword'

Example 3: Infrastructure Monitoring

Monitoring the health of infrastructure components like servers and network devices is fundamental.

Alert: Disk Space Usage

Prometheus Rule:

- alert: HighDiskUsage
  expr: (node_filesystem_size_bytes{job="node", mountpoint="/"} - node_filesystem_free_bytes{job="node", mountpoint="/"}) / node_filesystem_size_bytes{job="node", mountpoint="/"} > 0.8
  for: 10m
  labels:
    severity: critical
  annotations:
    summary: "High disk space usage"
    description: "Disk usage is above 80% for more than 10 minutes on instance {{ $labels.instance }}."

Alertmanager Configuration:

receivers:
- name: 'email-receiver'
  email_configs:
  - to: 'sysadmin@company.com'
    from: 'alertmanager@company.com'
    smarthost: 'smtp.company.com:587'
    auth_username: 'username'
    auth_password: 'securepassword'

Conclusion

Setting up alerts with Prometheus and Grafana enables you to stay ahead of issues by proactively monitoring your systems and notifying the right people when problems arise. By defining clear alerting rules, integrating with Grafana for visualization, and configuring notifications via various channels, you can ensure that your team is always informed and ready to act. The examples provided demonstrate how to apply these concepts to real-world scenarios, helping you design effective alerting strategies tailored to your specific needs.

By leveraging the capabilities of Prometheus and Grafana, you can enhance the reliability and performance of your systems, minimizing downtime and improving user satisfaction. Start implementing these strategies today to take your monitoring and alerting to the next level.

Happy Monitoring!!!

Happy Reading!!!

Sudha Yadav