A Log Search Alert Rule in Azure Monitor is made up of several components that together define how log data is queried, when the alert should trigger, and what actions are taken once the condition is met. Here is a breakdown of the main elements that make up a Log Search Alert Rule in Azure:
Log Search Query
What it is:
This is the core part of the rule where you define the log query using Kusto Query Language (KQL). The query retrieves data from the logs generated by Azure resources, applications, or other log sources like Azure Activity Logs, Diagnostic Logs, or Application Insights.
Purpose:
The query is used to search for specific events or patterns in the log data (e.g., error messages, failed login attempts, or performance issues).
Example:
A query might look for failed login attempts in security logs:
SecurityEvent
| where EventID == 4625
| where TimeGenerated > ago(1h)
Alert Condition
What it is:
The alert condition defines the threshold or criteria that trigger the alert, based on the results of the KQL query: for example, the number of results returned by the query or the presence of specific log patterns.
Purpose:
To define when the alert should fire based on the data returned from the query. This can include conditions like:
Number of results: The alert triggers when the query returns a certain number of log entries (e.g., more than 10 failed logins within 5 minutes).
No results: The alert fires when no results match the query over a defined period.
Aggregation: The alert could be based on aggregated results, such as the total number of error events in the last 10 minutes.
Example:
Trigger the alert if more than 5 failed login attempts (EventID 4625) occur within a 10-minute window:
SecurityEvent
| where EventID == 4625           // failed logon events
| where TimeGenerated > ago(10m)  // only the last 10 minutes
// Count the whole window; bin(TimeGenerated, 10m) could split it across two buckets.
| summarize count()
| where count_ > 5
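To make the threshold logic concrete, here is a minimal Python sketch of a "number of results" condition. It is not how Azure Monitor is implemented; the function name `condition_met` and the sample timestamps are assumptions made up for illustration.

```python
from datetime import datetime, timedelta

def condition_met(event_times, now, window, threshold):
    """Return True when more than `threshold` events fall inside (now - window, now]."""
    window_start = now - window
    # Mirrors `TimeGenerated > ago(window)`: strictly newer than the window start.
    matches = [t for t in event_times if window_start < t <= now]
    return len(matches) > threshold

# Hypothetical sample data: six failed logons spread over the last ten minutes.
now = datetime(2024, 1, 1, 12, 0)
failed_logons = [now - timedelta(minutes=m) for m in (1, 2, 3, 5, 7, 9)]

fired = condition_met(failed_logons, now, timedelta(minutes=10), threshold=5)
```

With six events and a threshold of 5, `fired` is true; with exactly five events the condition is not met, because the comparison is "more than", not "at least".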
Alert Evaluation Frequency
What it is:
The evaluation frequency defines how often the system runs the query to evaluate whether the alert condition is met. This is typically defined as a time interval, such as every 5 minutes, 1 hour, etc.
Purpose:
To control how frequently Azure Monitor checks for the condition in your logs and evaluates whether it should trigger an alert.
Example:
Setting the evaluation frequency to 5 minutes means that the log query will be evaluated every 5 minutes to check if the defined condition is met.
Alert Window
What it is:
The alert window defines the time span over which the query is executed. It specifies how far back in time the query should look when evaluating the logs. This is different from the evaluation frequency, which controls how often the rule runs.
Purpose:
To determine the time period that the log search query covers. For example, you might want to check for events that occurred in the last 5 minutes or events that have happened over the past hour.
Example:
The alert rule could be set to look at logs for the last 5 minutes for each evaluation cycle.
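The difference between evaluation frequency and alert window is easier to see with numbers. The sketch below is only an illustration under assumed values (a 5-minute frequency and a 10-minute window); `evaluation_windows` is a hypothetical helper, not an Azure API.

```python
from datetime import datetime, timedelta

def evaluation_windows(start, frequency, window, runs):
    """Return the (window_start, window_end) pair covered by each scheduled run."""
    windows = []
    for i in range(runs):
        run_time = start + i * frequency          # when the rule runs
        windows.append((run_time - window, run_time))  # what the query looks at
    return windows

start = datetime(2024, 1, 1, 12, 0)
windows = evaluation_windows(
    start, frequency=timedelta(minutes=5), window=timedelta(minutes=10), runs=3
)
```

Because the window here is longer than the frequency, consecutive evaluations overlap and the same log entry can be counted in more than one run. If the window were shorter than the frequency, some log data would never be evaluated.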
Alert Severity
What it is:
The severity level indicates the importance of the alert, which helps prioritize the response. Azure Monitor provides five severity levels:
Sev 0 (Critical): Immediate attention needed, severe issues.
Sev 1 (Error): A failure that needs prompt attention.
Sev 2 (Warning): Needs attention but not immediately critical.
Sev 3 (Informational): For informational purposes, may not require immediate action.
Sev 4 (Verbose): Detailed, low-priority information.
Purpose:
To help categorize and prioritize alerts, allowing for different response protocols based on the severity.
Example:
A high number of failed login attempts might be classified as Severity 0 (Critical), while a less frequent event might be Severity 2 (Warning).
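As a rough illustration of how severity supports prioritization, the following Python sketch models Azure Monitor's severity numbering (0 through 4, where 0 is the most severe) as an enum and sorts fired alerts for triage. The alert names in the list are hypothetical examples.

```python
from enum import IntEnum

class Severity(IntEnum):
    """Azure Monitor severity numbering: a lower number is more severe."""
    CRITICAL = 0
    ERROR = 1
    WARNING = 2
    INFORMATIONAL = 3
    VERBOSE = 4

# Hypothetical fired alerts, listed in the order they arrived.
fired_alerts = [
    ("Disk space low", Severity.WARNING),
    ("Failed Login Attempts Alert", Severity.CRITICAL),
    ("Nightly job finished", Severity.INFORMATIONAL),
]

# Sorting by severity puts the most urgent alert first.
triage_order = sorted(fired_alerts, key=lambda alert: alert[1])
```

Using an integer enum keeps the numeric ordering and the readable label together, so a response process can branch on either.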
Action Groups
What it is:
Action Groups define what actions should be triggered when an alert is fired. These can include sending notifications, invoking automation, or calling external systems.
Purpose:
To specify what happens after the alert condition is met, such as sending an email, triggering a webhook, or running an Azure Logic App, Function, or Automation Runbook.
Example:
You can send an email notification to an administrator or trigger a webhook to an external monitoring system when an alert is triggered.
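Conceptually, an action group is a named list of actions that all run when an alert fires. The sketch below models that idea in plain Python; the `ActionGroup` class, the two action functions, and the group name "admin-team" are assumptions for illustration and do not call any Azure service.

```python
from typing import Callable, Dict, List

Action = Callable[[Dict[str, str]], None]

class ActionGroup:
    """A named collection of actions that run whenever an alert fires."""

    def __init__(self, name: str, actions: List[Action]) -> None:
        self.name = name
        self.actions = actions

    def notify(self, alert: Dict[str, str]) -> None:
        # Every action in the group receives the same alert payload.
        for action in self.actions:
            action(alert)

sent: List[str] = []  # stands in for real email and webhook delivery

def email_admin(alert: Dict[str, str]) -> None:
    sent.append(f"email: {alert['rule']}")

def call_webhook(alert: Dict[str, str]) -> None:
    sent.append(f"webhook: {alert['rule']}")

group = ActionGroup("admin-team", [email_admin, call_webhook])
group.notify({"rule": "Failed Login Attempts Alert", "severity": "Sev 0"})
```

Keeping the actions separate from the rule means the same group can be attached to several alert rules, which is how action groups are typically reused.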
Alert Rule Name and Description
What it is:
This is the metadata for the alert rule, where you define the name and description of the alert.
Purpose:
To provide a clear identifier and description of the alert rule. This helps administrators and users understand what the rule monitors and its intended purpose.
Example:
An alert rule for monitoring failed login attempts might be named "Failed Login Attempts Alert" with a description like "Alerts when more than 5 failed login attempts occur within 10 minutes."
Alert Status
What it is:
An alert rule is either Enabled or Disabled. You can disable a rule when it is no longer needed, or temporarily while the issue it reports is under investigation.
Purpose:
To control whether the alert rule is active and monitoring log data.
Example:
If you’re troubleshooting a known issue, you might temporarily disable the alert rule.
Full Composition of a Log Search Alert Rule
A log search alert rule consists of:
Log search query: A KQL query to search log data.
Alert condition: A threshold or criteria that triggers the alert.
Evaluation frequency: How often the query is evaluated (e.g., every 5 minutes).
Alert window: The time span for which the query is evaluated (e.g., last 5 minutes).
Alert severity: The priority level of the alert (e.g., Critical, Warning, Informational).
Action Groups: Actions taken when the alert triggers (e.g., notifications, webhooks).
Name and description: A unique identifier and description of the alert rule.
Alert status: Whether the alert rule is enabled or disabled.
Example of a Log Search Alert Rule in Azure
Here is a concrete example:
Log query: Search for failed login attempts in security logs.
SecurityEvent
| where EventID == 4625
| where TimeGenerated > ago(1h)
Alert condition: Trigger if there are more than 10 failed login attempts in the last 10 minutes.
SecurityEvent
| where EventID == 4625           // failed logon events
| where TimeGenerated > ago(10m)  // only the last 10 minutes
// Count the whole window; bin(TimeGenerated, 10m) could split it across two buckets.
| summarize count()
| where count_ > 10
Evaluation frequency: Every 5 minutes.
Alert window: Last 10 minutes.
Alert severity: Severity 0 (Critical).
Action group: Send an email to the admin team.
Name: "Failed Login Attempts Alert".
Description: "Alert when more than 10 failed login attempts are detected in the last 10 minutes."
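The example above can be summarized as a single data structure. This is a minimal Python sketch, not the Azure resource schema: the class `LogSearchAlertRule`, its field names, and the action group name "admin-team" are assumptions chosen to mirror the list above.

```python
from dataclasses import dataclass, replace
from datetime import timedelta

@dataclass
class LogSearchAlertRule:
    """Illustrative container for the parts of a log search alert rule."""
    name: str
    description: str
    query: str
    threshold: int
    evaluation_frequency: timedelta
    window: timedelta
    severity: int
    action_group: str
    enabled: bool = True

    def should_fire(self, result_count: int) -> bool:
        """Fire only when the rule is enabled and the count exceeds the threshold."""
        return self.enabled and result_count > self.threshold

rule = LogSearchAlertRule(
    name="Failed Login Attempts Alert",
    description="Alert when more than 10 failed login attempts are detected in the last 10 minutes.",
    query="SecurityEvent | where EventID == 4625",
    threshold=10,
    evaluation_frequency=timedelta(minutes=5),
    window=timedelta(minutes=10),
    severity=0,
    action_group="admin-team",
)

# A disabled copy of the same rule never fires, whatever the query returns.
paused = replace(rule, enabled=False)
```

Here `rule.should_fire(11)` is true and `rule.should_fire(10)` is false, while `paused.should_fire(11)` is false because the rule status is checked before the threshold.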
Summary
By composing these elements effectively, you can create an alert rule in Azure that helps you monitor and respond to specific log-based events in your environment.