Observability Agents
This document provides a comprehensive overview of Observability agents in the ConnectSoft AI Software Factory. It is written for DevOps engineers, platform owners, and architects who need to understand how the Factory ensures systems are observable, monitorable, and debuggable.
Observability agents configure monitoring, logging, tracing, and alerting for deployed systems. They operate post-deployment, ensuring that all systems have comprehensive observability from day one.
Important
Observability agents ensure that every deployed system is fully observable. They configure logging, tracing, metrics, dashboards, and alerts so that teams can understand system behavior, diagnose issues, and optimize performance.
Agent Cluster Composition
The Observability cluster consists of a single specialized agent:
| Agent | Core Function | Primary Output |
|---|---|---|
| Observability Engineer Agent | Configures monitoring, logging, tracing, and alerting | Observability configurations, dashboards, alert rules |
Mission and Scope
Observability agents are responsible for:
- Monitoring Configuration - Setting up metrics collection and monitoring dashboards
- Logging Setup - Configuring structured logging and log aggregation
- Distributed Tracing - Setting up OpenTelemetry tracing and span collection
- Alert Configuration - Creating alert rules and notification channels
- Dashboard Creation - Building monitoring dashboards for system health
- SLO Definition - Defining Service Level Objectives and monitoring them
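As an illustration of what defining and monitoring an SLO can involve, the following minimal Python sketch checks compliance and computes the remaining error budget for an availability objective. The `Slo` class, the function names, and the 99% target used in the usage note are hypothetical; they are not the Factory's actual SLO format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """A hypothetical availability SLO evaluated over a window of requests."""
    name: str
    target: float  # e.g. 0.999 means 99.9% of requests must succeed


def is_compliant(slo: Slo, total: int, failed: int) -> bool:
    """True while the observed success ratio meets the target."""
    if total <= 0:
        return True  # no traffic yet, so nothing has violated the objective
    return (total - failed) / total >= slo.target


def error_budget_remaining(slo: Slo, total: int, failed: int) -> float:
    """Fraction of the error budget still unspent (negative once overspent)."""
    if total <= 0:
        return 1.0
    allowed_failures = (1.0 - slo.target) * total
    if allowed_failures == 0:
        # A 100% target leaves no budget: any failure overspends it.
        return 1.0 if failed == 0 else float("-inf")
    return 1.0 - failed / allowed_failures
```

With a 99% target, 1,000 requests, and 5 failures, about half of the budget remains; at 20 failures the objective is violated and the remaining budget is negative, which is a natural trigger for an alert.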
What They Do:
- Configure logging, tracing, and metrics
- Set up monitoring dashboards
- Define alert rules
- Integrate with observability platforms (Application Insights, Prometheus, Grafana)
- Analyze system health and performance
- Generate observability reports
What They Do NOT Do:
- Generate application code (Engineering agents)
- Deploy systems (DevOps agents)
- Fix bugs (Engineering/Bug Resolver agents)
Position in Factory Lifecycle
Observability agents operate post-deployment:
```mermaid
flowchart LR
    Deployment[Deployment Orchestrator Agent] --> Observability[Observability Engineer Agent]
    Observability --> Monitoring[Monitoring Active]
    Observability --> Alerts[Alerts Configured]
    Observability --> Dashboards[Dashboards Created]
    style Deployment fill:#e1f5ff
    style Observability fill:#fff4e1
    style Monitoring fill:#e8f5e9
```
Observability Engineer Agent Details
Observability Engineer Agent
Role: Configures monitoring, logging, tracing, and alerting
Responsibilities:
- Configure structured logging (Serilog, Application Insights)
- Set up distributed tracing (OpenTelemetry)
- Configure metrics collection (Prometheus, Application Insights)
- Create monitoring dashboards (Grafana, Application Insights)
- Define alert rules and notification channels
- Set up health check endpoints
- Configure log aggregation and retention
- Define SLOs and monitor compliance
- Generate observability documentation
Inputs:
- Deployed services
- Monitoring requirements
- SLO definitions
- Alert requirements
- Observability platform preferences
Outputs:
- Observability configurations
- Dashboard definitions
- Alert rule configurations
- Metrics and tracing setup
- Observability documentation
- Event: ObservabilityConfigured
Key Deliverables:
- Logging Configuration - Structured logging setup with correlation IDs
- Tracing Configuration - OpenTelemetry instrumentation and span collection
- Metrics Configuration - Performance and business metrics collection
- Dashboards - Monitoring dashboards for system health
- Alert Rules - Alert definitions for critical issues
- SLO Monitoring - Service Level Objective tracking
Observability Pillars:
- Logs - Structured logging with correlation IDs, log levels, and context
- Traces - Distributed tracing with OpenTelemetry spans
- Metrics - Performance metrics, business metrics, and custom metrics
- Dashboards - Visual representations of system health
- Alerts - Proactive notifications for issues
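To make the Logs pillar more concrete, here is a minimal sketch of structured JSON logging with a correlation ID, using only the Python standard library. The document names Serilog and Application Insights as the actual logging stack, so this is an analogy rather than the Factory's implementation; `correlation_id`, `new_correlation_id`, `JsonFormatter`, and `build_logger` are hypothetical names.

```python
import contextvars
import json
import logging
import uuid

# Carries the correlation ID for the current request or task (hypothetical name).
correlation_id = contextvars.ContextVar("correlation_id", default="-")


def new_correlation_id() -> str:
    """Generate a fresh correlation ID and bind it to the current context."""
    value = str(uuid.uuid4())
    correlation_id.set(value)
    return value


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "correlation_id": correlation_id.get(),
        })


def build_logger(name: str) -> logging.Logger:
    """Return a logger that writes JSON lines to standard error."""
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

A request handler would call `new_correlation_id()` once on entry (or reuse an ID received from the caller) and then log through `build_logger("orders")`. Every line emitted while handling that request carries the same `correlation_id`, which is what allows logs to be joined across services.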
See: Detailed Observability Engineer Agent specification in Factory Documentation
Observability Integration Flow
```mermaid
flowchart TD
    Deploy[Deployment Completed] --> Obs[Observability Engineer Agent]
    Obs --> Logs[Configure Logging]
    Obs --> Traces[Configure Tracing]
    Obs --> Metrics[Configure Metrics]
    Obs --> Dashboards[Create Dashboards]
    Obs --> Alerts[Configure Alerts]
    Logs --> Platform[Observability Platform]
    Traces --> Platform
    Metrics --> Platform
    Dashboards --> Platform
    Alerts --> Platform
    Platform --> Monitoring[Active Monitoring]
    style Deploy fill:#e1f5ff
    style Obs fill:#fff4e1
    style Platform fill:#e8f5e9
    style Monitoring fill:#f3e5f5
```
Typical Workflows
Workflow: Post-Deployment Observability Setup
- Deployment Orchestrator Agent completes deployment
- Observability Engineer Agent receives deployment event
- Configure Logging - Set up structured logging with correlation IDs
- Configure Tracing - Set up OpenTelemetry instrumentation
- Configure Metrics - Set up metrics collection
- Create Dashboards - Build monitoring dashboards
- Configure Alerts - Define alert rules and notifications
- Validate Setup - Verify observability is working
- Emit Event - Signal observability is configured
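The steps above can be sketched as a small orchestration loop. This is an illustrative outline only: the step callables and the event payload shape are assumptions, and only the event name `ObservabilityConfigured` comes from this document.

```python
from typing import Callable, Dict, List

# Step order mirrors the workflow above; the callables passed in are hypothetical.
STEPS = ["logging", "tracing", "metrics", "dashboards", "alerts"]


def run_setup(
    service: str,
    configure: Dict[str, Callable[[str], bool]],
    emit: Callable[[dict], None],
) -> List[str]:
    """Run every configuration step and return the names of the steps that failed.

    The ObservabilityConfigured event is emitted only when all steps succeed,
    which stands in for the "Validate Setup" step.
    """
    failed = [step for step in STEPS if not configure[step](service)]
    if not failed:
        emit({
            "type": "ObservabilityConfigured",
            "service": service,
            "steps": list(STEPS),
        })
    return failed
```

Collecting failures instead of stopping at the first one lets a single run report every gap, at the cost of continuing to configure later steps after an earlier one has failed.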
Workflow: Observability Enhancement
- Receive Feedback - Get observability requirements or issues
- Analyze Current Setup - Review existing observability configuration
- Enhance Configuration - Add missing metrics, logs, or traces
- Update Dashboards - Enhance or create new dashboards
- Refine Alerts - Adjust alert rules based on feedback
- Document Changes - Update observability documentation
Collaboration with Other Agents
With DevOps Agents
Receive:
- Deployment events
- Infrastructure metadata
- Service endpoints
Provide:
- Observability requirements
- Monitoring configurations
- Alert definitions
With Engineering Agents
Receive:
- Service code structure
- API endpoints
- Business logic details
Provide:
- Observability instrumentation requirements
- Logging and tracing guidelines
- Metrics collection patterns
With QA Agents
Receive:
- Test execution results
- Performance test data
- Quality metrics
Use For:
- Setting performance baselines
- Defining SLOs
- Configuring performance alerts
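One plausible way to turn performance test data into a baseline and an alert threshold is sketched below. The nearest-rank percentile method, the choice of the 95th percentile, and the 20% headroom factor are assumptions made for illustration, not values prescribed by the Factory.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of the samples; pct must be in (0, 100]."""
    if not samples:
        raise ValueError("at least one sample is required")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Multiply before dividing to keep the rank exact for integer inputs.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def latency_alert_threshold(samples: Sequence[float], headroom: float = 1.2) -> float:
    """Baseline p95 latency scaled by a headroom factor (assumed default: 20%)."""
    return percentile(samples, 95) * headroom
```

For example, if a load test records latencies of 1 ms through 100 ms, the p95 baseline is 95 ms and the resulting alert threshold is roughly 114 ms.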
With Post-Production Agents
Provide:
- System health metrics
- Performance data
- User behavior analytics
Receive:
- Optimization requirements
- Growth metrics needs
- Customer success data requirements
Observability Standards
All observability configurations follow ConnectSoft standards:
- Structured Logging - JSON format with correlation IDs
- OpenTelemetry - Standard tracing and metrics
- Correlation IDs - Trace requests across services
- Health Checks - Standard health check endpoints
- Metrics Naming - Consistent metric naming conventions
- Alert Severity - Standardized alert severity levels
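Standards such as metric naming and alert severity are easiest to enforce when they can be checked automatically. The sketch below shows one way to do that. The naming pattern (lowercase snake_case ending in a unit or `total` suffix), the three severity levels, and the error-ratio thresholds are all hypothetical, since this document does not spell out the actual ConnectSoft conventions.

```python
import re
from enum import IntEnum

# Assumed convention for illustration: lowercase snake_case with a unit suffix.
METRIC_NAME = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*_(seconds|bytes|total|ratio)")


class Severity(IntEnum):
    """Hypothetical standardized alert severity levels, ordered by urgency."""
    INFO = 1
    WARNING = 2
    CRITICAL = 3


def is_valid_metric_name(name: str) -> bool:
    """True if the whole name matches the assumed naming convention."""
    return METRIC_NAME.fullmatch(name) is not None


def severity_for(error_ratio: float) -> Severity:
    """Map an observed error ratio to a severity using assumed thresholds."""
    if error_ratio >= 0.05:
        return Severity.CRITICAL
    if error_ratio >= 0.01:
        return Severity.WARNING
    return Severity.INFO
```

A check like `is_valid_metric_name` could run while configurations are generated, so that a name such as `HttpRequests` is rejected before it reaches a dashboard or alert rule.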
Related Documents
- Agent System Overview - How agents work together
- DevOps, Deployment, and Delivery Agents - Upstream agents
- Growth, Marketing, and Customer Success Agents - Post-production agents
- Observability-Driven Design - Observability principles
- Agent Execution Flow - Execution flow details