Alerts are a critical feature of your server monitoring system, ensuring that you are notified promptly when conditions on your servers require attention. This guide will help you understand how alerts work, how to configure them, and what each alert type means.
How Alerts Work
- Triggering Conditions: Alerts are triggered when specific conditions, defined in the alert configuration, are met and persist for a set period (in minutes). For example, setting "5 minutes" means the condition must be true for five consecutive minutes before the alert is triggered.
- Group-Based Notifications:
  - Alerts are associated with server groups.
  - Every user or integration linked to that group will receive notifications based on their individual preferences (e.g., SMS, email, Slack).
  - Notification settings are displayed in the Notifications box within the alerting tab.
- Escalations: Alerts are only sent if the condition persists, preventing unnecessary notifications for transient issues.
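
The "consecutive minutes" rule can be sketched as a small counter that resets whenever the condition stops being true. This is only an illustration, not the monitoring system's actual implementation: it assumes one sample per minute, and the `DurationGate` class and its `observe` method are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class DurationGate:
    """Fires only after a condition has held for `duration_minutes` samples in a row.

    Assumption: one sample arrives per minute. The class and field names
    are hypothetical and not part of the monitoring product.
    """
    duration_minutes: int
    consecutive: int = 0

    def observe(self, condition_met: bool) -> bool:
        # Any sample that fails the condition resets the streak to zero.
        self.consecutive = self.consecutive + 1 if condition_met else 0
        return self.consecutive >= self.duration_minutes


gate = DurationGate(duration_minutes=5)
samples = [True, True, False, True, True, True, True, True]
fired = [gate.observe(s) for s in samples]
# The dip at the third sample resets the count, so the gate
# first fires on the eighth sample (five true samples in a row).
```

A short spike, like the first two samples here, never reaches the required streak, which is how a duration filters out transient issues.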
Setting Up Alerts
Alerts are configured using a simple modal form. Here’s a step-by-step guide:
- Choose Alert Type:
  - Select the condition you want to monitor (e.g., CPU usage, memory usage, or process status).
- Set the Condition:
  - Specify how the alert will trigger (e.g., "greater than" a certain threshold or "equal to" a specific value).
- Define the Threshold:
  - Set the exact value that will trigger the alert. For example:
    - CPU Usage > 90%
    - Memory Usage > 80%
    - Process = nginx
- Set Duration:
  - Choose how long the condition must persist (in minutes) before triggering an alert. A value of 5 means the alert triggers if the condition is sustained for 5 consecutive minutes.
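
The four form fields map naturally onto a small record. The sketch below is a hypothetical model of such a rule, not the product's real schema or API: `AlertRule`, `CONDITIONS`, and the field names are assumptions made for illustration, and only numeric thresholds are modelled.

```python
import operator
from dataclasses import dataclass

# Hypothetical mapping from the form's condition labels to comparisons.
CONDITIONS = {
    "greater than": operator.gt,
    "equals": operator.eq,
}


@dataclass(frozen=True)
class AlertRule:
    alert_type: str        # what to monitor, e.g. "cpu_usage"
    condition: str         # a key of CONDITIONS
    threshold: float       # value the metric is compared against
    duration_minutes: int  # how long the condition must persist

    def __post_init__(self):
        # Reject rules the form would not allow.
        if self.condition not in CONDITIONS:
            raise ValueError(f"unknown condition: {self.condition!r}")
        if self.duration_minutes < 1:
            raise ValueError("duration_minutes must be at least 1")

    def matches(self, value: float) -> bool:
        """Check a single sample against the threshold (duration is not applied here)."""
        return CONDITIONS[self.condition](value, self.threshold)


# "CPU Usage > 90%" sustained for 5 minutes.
cpu_rule = AlertRule("cpu_usage", "greater than", 90.0, 5)
```

Note that `matches` only judges one sample; whether an alert is actually sent also depends on the condition holding for the full duration.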
Alert Types and Their Meaning
Here are the available alert types, with detailed explanations:
- CPU Usage
  - Triggers when the server’s CPU usage exceeds the defined threshold for the specified duration.
  - Example: Alert if CPU usage is >90% for 5 minutes.
  - Use Case: High CPU usage can indicate resource exhaustion or inefficient processes.
- Memory Usage
  - Triggers when the memory usage (RAM) exceeds the specified percentage.
  - Example: Alert if memory usage is >85% for 3 minutes.
  - Use Case: High memory usage can cause performance degradation and application crashes.
- Disk Usage
  - Triggers when disk usage exceeds the specified threshold.
  - Example: Alert if disk usage is >90% for 5 minutes.
  - Use Case: A nearly full disk can cause system errors or service disruptions.
- Disk IO
  - Triggers when disk read/write operations exceed the set threshold.
  - Example: Alert if disk IO is >500 MB/s for 10 minutes.
  - Use Case: High disk IO could indicate heavy database operations or backup processes.
- Network Traffic
  - Triggers when network traffic exceeds a specified rate.
  - Example: Alert if outbound traffic is >100 Mbps for 5 minutes.
  - Use Case: Unusually high network traffic might indicate a DDoS attack or data leak.
- Process Not Running
  - Triggers when a specific process is not running.
  - Important Configuration Notes:
    - Condition: Must be set to "equals" (==).
    - Threshold: Enter the exact process name as it appears in the "Process" tab under the cmd column (e.g., nginx or apache2).
  - Example: Alert if nginx is not running for 2 minutes.
  - Use Case: Ensures critical services (e.g., web servers, databases) are always operational.
- Linux Load Average
  - Triggers when the load average exceeds the specified threshold.
  - Example: Alert if load average is >4 for 5 minutes.
  - Use Case: High load averages indicate that the server’s processing capacity is under strain.
- Custom Metric (if applicable)
  - Allows users to define custom alert conditions based on available metrics.
  - Example: Alert if a specific application metric exceeds a user-defined threshold.
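
When choosing a Linux Load Average threshold, it helps to remember that a value such as 4 means different things on different machines, because load is usually read relative to the number of CPU cores. The sketch below normalizes the 1-minute load using Python's standard `os.getloadavg()` and `os.cpu_count()`. The helper names are hypothetical, and this is not necessarily how the monitoring agent collects or evaluates the metric.

```python
import os


def load_per_core(load_average: float, cores: int) -> float:
    """Divide a load average by the core count.

    Values near or above 1.0 suggest the CPUs are saturated.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load_average / cores


def current_load_per_core() -> float:
    """Read this machine's 1-minute load average (Unix-like systems only)."""
    one_minute, _five_minute, _fifteen_minute = os.getloadavg()
    # os.cpu_count() can return None, so fall back to a single core.
    return load_per_core(one_minute, os.cpu_count() or 1)
```

For example, a load average of 4 on an 8-core server gives `load_per_core(4.0, 8) == 0.5`, while the same reading on a 2-core server gives 2.0, a much stronger sign of strain.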
Understanding Notifications
- Who Gets Notified:
  - Alerts are sent to all users and integrations associated with the server group.
  - Individual notification preferences determine how alerts are received (e.g., Email, SMS, Slack).
- Notifications Box:
  - The Notifications section on the alerting tab shows:
    - Recipients of the alerts.
    - Notification types for each recipient.
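
The group-based delivery described above amounts to a fan-out: one alert, delivered once per recipient per enabled channel. The following sketch illustrates that idea only. `Recipient`, `plan_notifications`, and the channel strings are hypothetical and do not reflect the product's internal data model.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Recipient:
    name: str
    # Channels this user or integration has enabled, e.g. ["email", "sms"].
    channels: List[str] = field(default_factory=list)


def plan_notifications(group: List[Recipient], message: str) -> List[Tuple[str, str, str]]:
    """Return one (recipient, channel, message) entry per enabled channel."""
    return [(r.name, channel, message) for r in group for channel in r.channels]


# A hypothetical server group with two users and one integration.
web_servers = [
    Recipient("alice", ["email", "sms"]),
    Recipient("bob", []),               # no channel enabled, so nothing is sent
    Recipient("ops-slack", ["slack"]),
]
deliveries = plan_notifications(web_servers, "CPU usage >90% for 5 minutes")
```

A recipient with no enabled channel, like `bob` here, is in the group but receives nothing, which is worth checking first when an expected alert does not arrive.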
Common Scenarios and Tips
- Scenario 1: High CPU Usage on a Web Server
  - Configure an alert for CPU usage >85% for 5 minutes.
  - Investigate spikes by checking active processes and web traffic.
- Scenario 2: Disk Space Running Out
  - Set an alert for Disk Usage >90% for 5 minutes.
  - Proactively delete unused files or increase storage.
- Scenario 3: Critical Process Stopped
  - Create an alert for Process = nginx to ensure your web server is always running.
  - Use the exact process name from the Process tab.
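
The reason the exact process name matters in Scenario 3 can be shown with a plain string comparison. This sketch assumes the check is a case-sensitive equality test against the values in the cmd column. The guide only says the name must be exact, so the case sensitivity, the function name, and the sample values are all assumptions.

```python
from typing import List


def process_is_running(expected: str, cmd_values: List[str]) -> bool:
    """Exact match against the cmd column values (assumed case-sensitive)."""
    return expected in cmd_values


# Hypothetical contents of the cmd column on the Process tab.
cmd_column = ["nginx", "sshd", "postgres"]

exact = process_is_running("nginx", cmd_column)     # matches
wrong_case = process_is_running("Nginx", cmd_column)  # different capitalization
partial = process_is_running("ngin", cmd_column)    # partial name
```

Under this assumption only the first lookup succeeds, so a misspelled or partial name would make the process look permanently stopped, or never match at all.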
FAQs
- Why does my alert trigger only after the set duration (e.g., 5 minutes)?
  - Alerts are designed to trigger only if the condition persists for the specified duration. This helps prevent false alarms for transient issues.
- Why can’t I see alerts for my process?
  - Ensure you’ve entered the exact process name as it appears in the Process tab under the cmd column. Also, verify that the condition is set to "equals" (==).
- Can I send alerts to multiple recipients?
  - Yes, alerts are sent to all users and integrations in the server group. Each recipient will receive alerts according to their preferred notification method.
- What if I don’t receive alerts?
  - Check your notification settings and ensure your preferred method (e.g., Email, SMS) is enabled. Use the Send Test Notification feature to verify.
- Can I change how long a condition must persist before triggering an alert?
  - Yes, during setup, you can adjust the duration (in minutes) in the alert configuration modal.
- How do I test an alert setup?
  - Use the Send Test Notification feature on the Notifications tab to confirm that alerts are properly configured.
Best Practices for Alerts
- Set Realistic Thresholds: Avoid overly sensitive alerts that may lead to alert fatigue.
- Use Duration to Filter Noise: Set longer durations (e.g., 10 minutes) for less critical conditions to minimize false positives.
- Test Regularly: Use the test notification feature to ensure recipients are correctly configured.
- Monitor Critical Processes: Always set up alerts for essential services like web servers and databases.
With these guidelines, you can effectively configure alerts to ensure your server operations run smoothly. For further assistance, feel free to contact support!