Azure Centralized Monitoring is structured to provide consistent observability and management across diverse Azure resources, workloads, and environments. The architecture is organized into three layers: Data Sources, Data Platform, and Consumption.
Data Sources
The Data Sources layer identifies the origin of telemetry data. It encompasses Apps/Workloads, Infrastructure, Azure Platform, and Custom Sources. These sources produce the raw telemetry (metrics, logs, traces, and changes) required for effective monitoring.
Key Categories of Data Sources
1.1 Applications and Workloads
Definition:
Represents user-built or managed applications and services running on Azure or on-premises.
Telemetry Examples:
Application logs for APIs or web apps.
Dependency tracking for microservices (e.g., calls to databases, storage).
User telemetry like response times and session activity.
Tools:
Application Insights SDKs: Embedded in applications to capture telemetry.
Azure Monitor Agents: Collect telemetry from hosting infrastructure.
1.2 Infrastructure
Definition:
Includes the virtualized and physical components supporting workloads.
Telemetry Examples:
Virtual Machines (VMs): CPU, memory, disk usage.
Networks: Bandwidth, latency, dropped packets.
Containers and Kubernetes clusters: Pod health, resource allocation, and network policies.
Tools:
Azure Monitor Agent: Captures metrics and logs from VMs and Kubernetes.
Container Insights: Provides telemetry for AKS (Azure Kubernetes Service) and Docker workloads.
1.3 Azure Platform
Definition:
Monitoring telemetry generated directly by Azure’s built-in services and management layers.
Telemetry Examples:
Azure Resource Manager (ARM) activity logs.
Azure service-specific logs (e.g., SQL Database performance metrics).
Microsoft Entra ID (formerly Azure AD) events (e.g., sign-ins, audit logs).
Tools:
Activity Logs: Track management operations on Azure resources.
Diagnostics Settings: Enable logs for specific Azure services like Event Hub, Cosmos DB, and Key Vault.
1.4 Custom Sources
Definition:
Telemetry sources that Azure does not support natively, including external systems integrated into the monitoring pipeline.
Telemetry Examples:
Legacy on-premises systems or private cloud resources.
Third-party SaaS integrations (e.g., Salesforce or AWS resources).
Tools:
Azure Arc: Extends Azure monitoring capabilities to on-premises and multi-cloud environments.
Event Hub or REST APIs: Ingest telemetry from external systems.
Data Platform
The Data Platform layer processes, stores, and analyzes telemetry from the Data Sources. It organizes telemetry into Metrics, Logs, Traces, and Changes, each serving a specific monitoring purpose.
Key Telemetry Categories
2.1 Metrics
Definition:
Time-series data that measures resource performance or utilization.
Use Cases:
Monitoring real-time CPU or memory usage.
Tracking application response latency trends.
Granularity and Retention:
Granular (1-minute interval default).
Retained for 93 days by Azure Monitor.
Example Metrics:
VM CPU utilization.
API Gateway request counts and errors.
2.2 Logs
Definition:
Event-driven, detailed records capturing operations and transactions.
Log Types:
Activity Logs: Track management-level actions on Azure resources (e.g., creating a VM).
Resource Logs: Capture operational events specific to a resource (e.g., SQL query performance).
Diagnostic Logs: Enable detailed monitoring for troubleshooting.
Custom Logs: User-defined application-specific logs.
Retention and Querying:
Logs are retained in Log Analytics for 30 days by default; retention is configurable up to 730 days.
Queryable using KQL in Log Analytics Workspaces.
2.3 Traces
Definition:
Captures the flow of individual operations, often used for debugging distributed systems.
Purpose:
Tracks how requests traverse different services.
Identifies bottlenecks or performance issues in microservices.
Tools:
Application Insights Distributed Tracing.
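As a rough illustration of how a trace exposes a bottleneck, the sketch below links spans from one request by parent ID and computes the time each service spends outside its downstream calls. The field names (`operation_id`, `span_id`, `parent_id`, `service`, `duration_ms`) and the sample data are hypothetical and are not the Application Insights schema; the sketch also assumes child calls run one after another.

```python
from collections import defaultdict

# Hypothetical spans for a single request; field names are illustrative only.
SPANS = [
    {"operation_id": "op-1", "span_id": "a", "parent_id": None, "service": "web-frontend", "duration_ms": 420},
    {"operation_id": "op-1", "span_id": "b", "parent_id": "a", "service": "orders-api", "duration_ms": 380},
    {"operation_id": "op-1", "span_id": "c", "parent_id": "b", "service": "sql-database", "duration_ms": 310},
    {"operation_id": "op-1", "span_id": "d", "parent_id": "b", "service": "blob-storage", "duration_ms": 40},
]


def self_times(spans):
    """Return the time each service spends excluding its direct child calls."""
    children = defaultdict(list)
    for span in spans:
        children[span["parent_id"]].append(span)
    result = {}
    for span in spans:
        # Assumes children execute sequentially, so their durations add up.
        child_total = sum(c["duration_ms"] for c in children[span["span_id"]])
        result[span["service"]] = span["duration_ms"] - child_total
    return result


def bottleneck(spans):
    """Return the service with the largest self time."""
    times = self_times(spans)
    return max(times, key=times.get)
```

With this sample data, `bottleneck(SPANS)` points at the database call, even though the front-end span has the longest total duration.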
2.4 Changes
Definition:
Tracks changes in configuration, state, or infrastructure.
Examples:
Software installed or uninstalled on VMs.
Updates to Network Security Group (NSG) rules.
Use Cases:
Configuration drift detection.
Identifying unauthorized modifications.
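Configuration drift detection comes down to comparing a known-good snapshot with the current state. The following is a minimal sketch in plain Python, assuming configuration can be flattened into key/value pairs; the `detect_drift` helper and the rule snapshots are hypothetical and do not represent an Azure API.

```python
def detect_drift(baseline, current):
    """Compare two flat configuration snapshots and report the differences."""
    added = {k: current[k] for k in current.keys() - baseline.keys()}
    removed = {k: baseline[k] for k in baseline.keys() - current.keys()}
    changed = {
        k: (baseline[k], current[k])
        for k in baseline.keys() & current.keys()
        if baseline[k] != current[k]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical snapshots of security rules, keyed by rule name.
BASELINE = {
    "allow-https": "443/tcp from Internet",
    "allow-ssh": "22/tcp from 10.0.0.0/8",
}
CURRENT = {
    "allow-https": "443/tcp from Internet",
    "allow-ssh": "22/tcp from Internet",
    "allow-rdp": "3389/tcp from Internet",
}

drift = detect_drift(BASELINE, CURRENT)
```

Here `drift["changed"]` flags the widened SSH rule and `drift["added"]` flags the new RDP rule, which are the kinds of modifications a reviewer would want to confirm were authorized.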
Core Components of the Data Platform
Data Collection and Ingestion
Agents and Tools:
Azure Monitor Agent (AMA): Unified data collection across resources.
Application Insights SDKs: Captures application-level telemetry.
Azure Diagnostics Extension: Legacy agent for VM diagnostics and logs.
Real-Time Ingestion:
Azure Event Hubs or Azure Stream Analytics for streaming telemetry.
Data Storage
Log Analytics Workspace: Centralized repository for telemetry with high-performance querying capabilities.
Azure Blob Storage: Long-term archival for diagnostic data.
Azure Data Explorer: Optimized for large-scale log querying and analytics.
Data Enrichment
Kusto Query Language (KQL):
Powers queries and analytics within Log Analytics.
Enables advanced queries like trend analysis and anomaly detection.
Azure Stream Analytics:
Real-time telemetry processing for use cases like IoT monitoring.
Data Integration
Event Hubs: Streams telemetry to external SIEM tools.
Logic Apps: Automates responses to telemetry events.
Data Factory: For periodic export to big data platforms.
Consumption Layers
The Consumption Layer transforms processed telemetry into actionable insights for visualization, alerts, and automation.
Key Features of the Consumption Layer
3.1 Visualization and Dashboards
Azure Dashboards:
Aggregates key metrics and logs.
Customizable for monitoring at-a-glance.
Workbooks:
Advanced visualizations built using KQL queries.
Examples: Dependency maps, application performance dashboards.
Power BI Integration:
Business-ready reporting with real-time telemetry visualizations.
3.2 Alerting and Automation
Azure Monitor Alerts: Rule-based notifications triggered when metrics or log query results cross defined thresholds.
Examples:
Alert when VM CPU exceeds 80% for 10 minutes.
Notify on anomalous API latency patterns.
Notification Channels: Email, SMS, webhooks, and ITSM tools like ServiceNow.
Automation Tools:
Azure Automation Runbooks for corrective actions (e.g., restart services).
Logic Apps for ticket creation and escalation.
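The "CPU exceeds 80% for 10 minutes" example can be read as a sustained-threshold check. The sketch below shows that logic in plain Python under simplifying assumptions: samples arrive as `(timestamp, value)` pairs, and the rule fires only when every sample in the trailing window is above the threshold. The `should_fire` function is hypothetical; Azure Monitor evaluates aggregated values through configured alert rules, so treat this only as an illustration of the idea.

```python
from datetime import datetime, timedelta


def should_fire(samples, threshold=80.0, window=timedelta(minutes=10)):
    """Return True if every sample in the trailing window exceeds threshold.

    samples: list of (timestamp, value) tuples sorted by timestamp.
    """
    if not samples:
        return False
    end = samples[-1][0]
    start = end - window
    # The history must reach back at least one full window.
    if samples[0][0] > start:
        return False
    recent = [value for ts, value in samples if ts > start]
    return bool(recent) and all(value > threshold for value in recent)


# Hypothetical one-minute CPU samples: 5 quiet minutes, then 10 busy ones.
T0 = datetime(2024, 1, 1, 12, 0)
SUSTAINED = [(T0 + timedelta(minutes=i), 50.0 if i < 5 else 90.0) for i in range(15)]
# A single spike at the end should not trigger the alert.
SPIKE = [(T0 + timedelta(minutes=i), 95.0 if i == 14 else 50.0) for i in range(15)]
```

Requiring the whole window to breach the threshold is what separates a sustained condition from a momentary spike, which keeps notification channels from being flooded.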
3.3 Querying and Analysis
Log Analytics: Query telemetry data using KQL.
Example Query:
InsightsMetrics
| where Namespace == "Processor" and Name == "UtilizationPercentage"
| summarize AvgCPU = avg(Val) by bin(TimeGenerated, 1h)
Application Insights Analytics: Optimized for application performance queries and distributed tracing.
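For readers less familiar with KQL, the `summarize ... by bin(TimeGenerated, 1h)` step in the example query groups rows into one-hour buckets and averages each bucket. A rough plain-Python equivalent, using hypothetical sample rows rather than real workspace data, might look like this:

```python
from collections import defaultdict
from datetime import datetime


def hourly_average(rows):
    """Group (timestamp, value) rows into 1-hour bins and average each bin."""
    bins = defaultdict(list)
    for ts, value in rows:
        # Truncate the timestamp to the start of its hour, like bin(..., 1h).
        bucket = ts.replace(minute=0, second=0, microsecond=0)
        bins[bucket].append(value)
    return {bucket: sum(vals) / len(vals) for bucket, vals in sorted(bins.items())}


# Hypothetical CPU readings; not taken from a real workspace.
ROWS = [
    (datetime(2024, 1, 1, 9, 5), 40.0),
    (datetime(2024, 1, 1, 9, 35), 60.0),
    (datetime(2024, 1, 1, 10, 10), 90.0),
]

averages = hourly_average(ROWS)
```

The result has one entry per hour, which is the same shape the KQL query returns as `AvgCPU` per `TimeGenerated` bin.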
3.4 Recommendations
Azure Monitor Insights: Provides workload-specific monitoring solutions:
VM Insights: Tracks health and dependencies for virtual machines.
Container Insights: Monitors Kubernetes clusters.
SQL Insights: Analyzes database performance.
Azure Advisor:
Optimization recommendations for cost, security, and performance.
Workflow Example
Data Sources: Applications and VMs emit logs and metrics via Azure Monitor Agent and Application Insights.
Data Platform: Metrics are processed in near real-time; logs are stored in Log Analytics for querying and correlation.
Consumption Layer:
Custom dashboards visualize application health.
Alerts notify admins of CPU spikes; Logic Apps trigger automated scale-out.
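The workflow above can be traced end to end with a toy model. This is only a sketch of how the three layers hand data to one another: the `Platform` class, the record fields, and the action names (`notify_admins`, `scale_out`) are all hypothetical stand-ins, not Azure Monitor, Log Analytics, or Logic Apps APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Platform:
    """Toy stand-in for the Data Platform: one store per telemetry type."""
    metrics: list = field(default_factory=list)
    logs: list = field(default_factory=list)

    def ingest(self, record):
        # Route each record by its declared type.
        if record["type"] == "metric":
            self.metrics.append(record)
        else:
            self.logs.append(record)


def consume(platform, cpu_threshold=80.0):
    """Consumption layer: turn stored metrics into a list of actions."""
    actions = []
    for m in platform.metrics:
        if m["name"] == "cpu_percent" and m["value"] > cpu_threshold:
            actions.append(("notify_admins", m["source"]))
            actions.append(("scale_out", m["source"]))
    return actions


# Data Sources: hypothetical records emitted by an application and two VMs.
RECORDS = [
    {"type": "metric", "source": "vm-01", "name": "cpu_percent", "value": 93.0},
    {"type": "log", "source": "web-app", "message": "GET /orders 200"},
    {"type": "metric", "source": "vm-02", "name": "cpu_percent", "value": 35.0},
]

platform = Platform()
for record in RECORDS:
    platform.ingest(record)
actions = consume(platform)
```

Only `vm-01` crosses the threshold, so `actions` contains a notification and a scale-out request for that machine while the log record stays available for querying.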