Observability Agents

This document provides a comprehensive overview of Observability agents in the ConnectSoft AI Software Factory. It is written for DevOps engineers, platform owners, and architects who need to understand how the Factory ensures systems are observable, monitorable, and debuggable.

Observability agents configure monitoring, logging, tracing, and alerting for deployed systems. They operate post-deployment, ensuring that all systems have comprehensive observability from day one.

Important

Observability agents ensure that every deployed system is fully observable. They configure logging, tracing, metrics, dashboards, and alerts—enabling teams to understand system behavior, diagnose issues, and optimize performance.

Agent Cluster Composition

The Observability cluster consists of a single specialized agent:

| Agent | Core Function | Primary Output |
|---|---|---|
| Observability Engineer Agent | Configures monitoring, logging, tracing, and alerting | Observability configurations, dashboards, alert rules |

Mission and Scope

Observability agents are responsible for:

  • Monitoring Configuration - Setting up metrics collection and monitoring dashboards
  • Logging Setup - Configuring structured logging and log aggregation
  • Distributed Tracing - Setting up OpenTelemetry tracing and span collection
  • Alert Configuration - Creating alert rules and notification channels
  • Dashboard Creation - Building monitoring dashboards for system health
  • SLO Definition - Defining Service Level Objectives and monitoring them
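Defining an SLO is only useful if its compliance is tracked, and the usual way to do that is through an error budget: the number of failures the objective still tolerates in a window. The sketch below assumes a request-based availability SLO; the `SloStatus` type and its fields are hypothetical illustrations, not a Factory API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SloStatus:
    """Compliance snapshot for one hypothetical availability SLO over a window."""
    target: float            # e.g. 0.999 means 99.9% of requests must succeed
    total_requests: int
    failed_requests: int

    @property
    def availability(self) -> float:
        # Treat an idle window as fully available instead of dividing by zero.
        if self.total_requests == 0:
            return 1.0
        return 1.0 - self.failed_requests / self.total_requests

    @property
    def error_budget(self) -> float:
        # Number of failed requests the SLO tolerates in this window.
        return (1.0 - self.target) * self.total_requests

    @property
    def budget_remaining(self) -> float:
        # Fraction of the budget still unspent; negative once the SLO is breached.
        if self.error_budget == 0:
            return 1.0 if self.failed_requests == 0 else float("-inf")
        return 1.0 - self.failed_requests / self.error_budget

    @property
    def compliant(self) -> bool:
        return self.availability >= self.target
```

For example, a 99.9% target over 1,000,000 requests allows about 1,000 failures; 250 failures leave roughly 75% of the budget, which is a more actionable signal for alerting than the raw availability figure.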

What They Do:

  • Configure logging, tracing, and metrics
  • Set up monitoring dashboards
  • Define alert rules
  • Integrate with observability platforms (Application Insights, Prometheus, Grafana)
  • Analyze system health and performance
  • Generate observability reports

What They Do NOT Do:

  • Generate application code (Engineering agents)
  • Deploy systems (DevOps agents)
  • Fix bugs (Engineering/Bug Resolver agents)

Position in Factory Lifecycle

Observability agents operate post-deployment:

```mermaid
flowchart LR
    Deployment[Deployment Orchestrator Agent] --> Observability[Observability Engineer Agent]
    Observability --> Monitoring[Monitoring Active]
    Observability --> Alerts[Alerts Configured]
    Observability --> Dashboards[Dashboards Created]

    style Deployment fill:#e1f5ff
    style Observability fill:#fff4e1
    style Monitoring fill:#e8f5e9
```
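The "Dashboards Created" outcome implies that dashboards are generated as data rather than clicked together by hand. The sketch below builds a per-service dashboard definition loosely modeled on Grafana's dashboard JSON (a `title`, a list of `panels`, each with a `gridPos` and `targets` holding a PromQL `expr`). The function name, panel choices, and the metric names `http_requests_total` and `http_request_duration_seconds_bucket` are assumptions for illustration.

```python
import json


def build_service_dashboard(service: str) -> dict:
    """Return a hypothetical dashboard definition for one service."""
    # Each tuple is (panel title, PromQL expression); metric names are assumed.
    panels = [
        ("Request rate",
         f'sum(rate(http_requests_total{{service="{service}"}}[5m]))'),
        ("Error rate",
         f'sum(rate(http_requests_total{{service="{service}",status=~"5.."}}[5m]))'),
        ("p95 latency",
         f'histogram_quantile(0.95, sum(rate('
         f'http_request_duration_seconds_bucket{{service="{service}"}}[5m])) by (le))'),
    ]
    return {
        "title": f"{service} health",
        "panels": [
            {
                "type": "timeseries",
                "title": title,
                # Three panels, 8 units wide each, fill one 24-unit grid row.
                "gridPos": {"h": 8, "w": 8, "x": 8 * i, "y": 0},
                "targets": [{"expr": expr}],
            }
            for i, (title, expr) in enumerate(panels)
        ],
    }


print(json.dumps(build_service_dashboard("orders"), indent=2))
```

Generating dashboards from a function keeps every service's layout identical, so a new deployment gets the same health view as existing ones without manual work.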

Observability Engineer Agent Details

Observability Engineer Agent

Role: Configures monitoring, logging, tracing, and alerting

Responsibilities:

  • Configure structured logging (Serilog, Application Insights)
  • Set up distributed tracing (OpenTelemetry)
  • Configure metrics collection (Prometheus, Application Insights)
  • Create monitoring dashboards (Grafana, Application Insights)
  • Define alert rules and notification channels
  • Set up health check endpoints
  • Configure log aggregation and retention
  • Define SLOs and monitor compliance
  • Generate observability documentation
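One of the responsibilities above is setting up health check endpoints. A common pattern is to run each dependency probe, return 200 when everything is healthy and 503 otherwise, and include per-check detail in a JSON body. The sketch below uses only the Python standard library; the `/health` path, the `Healthy`/`Unhealthy` status strings, and the probe names are assumptions, not the Factory's actual contract.

```python
import json
from http.server import BaseHTTPRequestHandler
from typing import Callable, Dict, Tuple

# A probe returns True when its dependency (database, queue, ...) is reachable.
HealthCheck = Callable[[], bool]


def evaluate_health(checks: Dict[str, HealthCheck]) -> Tuple[int, dict]:
    """Run every probe and map the results to an HTTP status code and JSON body."""
    results = {}
    for name, probe in checks.items():
        try:
            results[name] = "Healthy" if probe() else "Unhealthy"
        except Exception:
            # A probe that raises counts as unhealthy instead of crashing the endpoint.
            results[name] = "Unhealthy"
    healthy = all(state == "Healthy" for state in results.values())
    body = {"status": "Healthy" if healthy else "Unhealthy", "checks": results}
    # 200 lets load balancers keep routing traffic; 503 asks them to stop.
    return (200 if healthy else 503), body


def make_handler(checks: Dict[str, HealthCheck]):
    """Build a request handler class that serves /health from the given probes."""
    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/health":
                self.send_error(404)
                return
            status, body = evaluate_health(checks)
            payload = json.dumps(body).encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)
    return HealthHandler
```

To serve it, pass the handler to `http.server.HTTPServer`, for example `HTTPServer(("", 8080), make_handler({"database": lambda: True})).serve_forever()`. Keeping the evaluation logic separate from the HTTP layer makes it testable without opening a socket.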

Inputs:

  • Deployed services
  • Monitoring requirements
  • SLO definitions
  • Alert requirements
  • Observability platform preferences

Outputs:

  • Observability configurations
  • Dashboard definitions
  • Alert rule configurations
  • Metrics and tracing setup
  • Observability documentation
  • Event: ObservabilityConfigured
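The Inputs and Outputs lists describe a contract between the agent and the rest of the Factory. One way to make that contract explicit is with typed records. In the sketch below, `ObservabilityRequest`, the fields of the `ObservabilityConfigured` payload, the default platform value, and the naming scheme in `configure` are all hypothetical; only the event name comes from this page.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass
class ObservabilityRequest:
    """Hypothetical input record mirroring the Inputs list."""
    services: List[str]                                         # deployed services
    monitoring_requirements: List[str] = field(default_factory=list)
    slo_targets: Dict[str, float] = field(default_factory=dict)  # service -> target
    alert_requirements: List[str] = field(default_factory=list)
    platform: str = "ApplicationInsights"                       # platform preference


@dataclass
class ObservabilityConfigured:
    """Hypothetical payload for the ObservabilityConfigured event."""
    services: List[str]
    dashboards: List[str]
    alert_rules: List[str]
    platform: str
    occurred_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def configure(request: ObservabilityRequest) -> ObservabilityConfigured:
    # Stand-in for real generation: name one dashboard and one rule per service.
    return ObservabilityConfigured(
        services=request.services,
        dashboards=[f"{s}-health" for s in request.services],
        alert_rules=[f"{s}-availability" for s in request.services],
        platform=request.platform,
    )
```

`asdict(configure(request))` yields a plain dictionary that can be serialized to JSON and published on whatever event bus the Factory uses.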

Key Deliverables:

  • Logging Configuration - Structured logging setup with correlation IDs
  • Tracing Configuration - OpenTelemetry instrumentation and span collection
  • Metrics Configuration - Performance and business metrics collection
  • Dashboards - Monitoring dashboards for system health
  • Alert Rules - Alert definitions for critical issues
  • SLO Monitoring - Service Level Objective tracking

Observability Pillars:

  1. Logs - Structured logging with correlation IDs, log levels, and context
  2. Traces - Distributed tracing with OpenTelemetry spans
  3. Metrics - Performance metrics, business metrics, and custom metrics
  4. Dashboards - Visual representations of system health
  5. Alerts - Proactive notifications for issues
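The first pillar, structured logging with correlation IDs, comes down to emitting each log entry as a machine-readable object that carries the ID of the request it belongs to. The Factory's own stack names Serilog and Application Insights; the Python sketch below only illustrates the record shape using the standard `logging` and `contextvars` modules. The field names (`correlationId`, `orderId`) and the `handle_request` function are assumptions.

```python
import contextvars
import json
import logging
import uuid

# Correlation ID of the request currently being handled; "-" when none is set.
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "correlationId": correlation_id.get(),
        }
        # Merge structured context passed as extra={"context": {...}}.
        entry.update(getattr(record, "context", {}))
        return json.dumps(entry)


def get_logger(name: str) -> logging.Logger:
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger


def handle_request(order_id: str) -> None:
    # A real service would reuse an incoming ID; this sketch mints one per request.
    token = correlation_id.set(str(uuid.uuid4()))
    try:
        get_logger("orders").info("Order received", extra={"context": {"orderId": order_id}})
    finally:
        correlation_id.reset(token)


handle_request("A-1001")
```

Using a context variable rather than passing the ID through every function keeps call sites clean, and it stays correct under `asyncio`, where each task sees its own value.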

See: Detailed Observability Engineer Agent specification in Factory Documentation

Observability Integration Flow

```mermaid
flowchart TD
    Deploy[Deployment Completed] --> Obs[Observability Engineer Agent]
    Obs --> Logs[Configure Logging]
    Obs --> Traces[Configure Tracing]
    Obs --> Metrics[Configure Metrics]
    Obs --> Dashboards[Create Dashboards]
    Obs --> Alerts[Configure Alerts]

    Logs --> Platform[Observability Platform]
    Traces --> Platform
    Metrics --> Platform
    Dashboards --> Platform
    Alerts --> Platform

    Platform --> Monitoring[Active Monitoring]

    style Deploy fill:#e1f5ff
    style Obs fill:#fff4e1
    style Platform fill:#e8f5e9
    style Monitoring fill:#f3e5f5
```

Typical Workflows

Workflow: Post-Deployment Observability Setup

  1. Deployment Orchestrator Agent completes deployment
  2. Observability Engineer Agent receives the deployment event
  3. Configure Logging - Set up structured logging with correlation IDs
  4. Configure Tracing - Set up OpenTelemetry instrumentation
  5. Configure Metrics - Set up metrics collection
  6. Create Dashboards - Build monitoring dashboards
  7. Configure Alerts - Define alert rules and notifications
  8. Validate Setup - Verify that logs, traces, and metrics reach the observability platform
  9. Emit Event - Publish the ObservabilityConfigured event to signal that setup is complete
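The post-deployment workflow is an ordered, event-triggered pipeline. The sketch below models steps 2 through 9 as a handler that runs one configuration step per pillar, refuses to report success if a pillar is missing, and then publishes the event. The step settings, the `publish` callback, and the event payload are hypothetical; note that this validation only checks that each pillar was configured, whereas verifying live telemetry would require querying the platform itself.

```python
from typing import Callable, Dict, List

Config = Dict[str, dict]
Step = Callable[[str], tuple]            # service -> (pillar name, settings)
Publish = Callable[[str, dict], None]    # (event name, payload) -> None

REQUIRED_PILLARS = ("logging", "tracing", "metrics", "dashboards", "alerts")


def default_steps() -> List[Step]:
    """Hypothetical steps 3-7; each returns the pillar it configured and its settings."""
    return [
        lambda svc: ("logging", {"format": "json", "correlation_ids": True}),
        lambda svc: ("tracing", {"instrumentation": "opentelemetry"}),
        lambda svc: ("metrics", {"scrape_interval_s": 15}),
        lambda svc: ("dashboards", {"names": [f"{svc}-health"]}),
        lambda svc: ("alerts", {"rules": [f"{svc}-availability"]}),
    ]


def on_deployment_completed(event: dict, publish: Publish,
                            steps: List[Step]) -> Config:
    """Handle a deployment event (step 2) and run steps 3-9 in order."""
    service = event["service"]
    config: Config = dict(step(service) for step in steps)
    # Step 8: do not announce success if any pillar is missing.
    missing = [p for p in REQUIRED_PILLARS if p not in config]
    if missing:
        raise RuntimeError(f"{service}: observability incomplete, missing {missing}")
    # Step 9: signal downstream agents.
    publish("ObservabilityConfigured", {"service": service, "pillars": sorted(config)})
    return config
```

Injecting `publish` and `steps` keeps the handler independent of any particular message bus and lets each step be tested or replaced on its own.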

Workflow: Observability Enhancement

  1. Receive Feedback - Collect new observability requirements or reported issues
  2. Analyze Current Setup - Review existing observability configuration
  3. Enhance Configuration - Add missing metrics, logs, or traces
  4. Update Dashboards - Enhance or create new dashboards
  5. Refine Alerts - Adjust alert rules based on feedback
  6. Document Changes - Update observability documentation
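Steps 2 and 3 of the enhancement workflow amount to a gap analysis: compare what is configured with what the feedback requires, then add only the difference. The sketch below models a service's signals as sets; the `ObservabilitySetup` type, its three categories, and the example signal names are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ObservabilitySetup:
    """Signals configured (or required) for one service; a hypothetical model."""
    metrics: Set[str] = field(default_factory=set)
    log_fields: Set[str] = field(default_factory=set)
    traced_operations: Set[str] = field(default_factory=set)


def find_gaps(current: ObservabilitySetup,
              required: ObservabilitySetup) -> Dict[str, Set[str]]:
    """List what the requirements ask for that is not yet configured."""
    gaps = {
        "metrics": required.metrics - current.metrics,
        "log_fields": required.log_fields - current.log_fields,
        "traced_operations": required.traced_operations - current.traced_operations,
    }
    # Drop empty categories so the report contains only actionable items.
    return {kind: missing for kind, missing in gaps.items() if missing}


def apply_gaps(current: ObservabilitySetup, gaps: Dict[str, Set[str]]) -> None:
    """Extend the configuration in place with the missing signals."""
    for kind, missing in gaps.items():
        getattr(current, kind).update(missing)
```

Because the gap report is computed before anything changes, it doubles as the input for step 6: the documented change set is exactly the dictionary that `find_gaps` returned.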

Collaboration with Other Agents

With DevOps Agents

Receive:

  • Deployment events
  • Infrastructure metadata
  • Service endpoints

Provide:

  • Observability requirements
  • Monitoring configurations
  • Alert definitions
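The alert definitions handed to DevOps can be expressed as data and evaluated against metric samples. The sketch below requires a threshold to be breached for several consecutive samples before a rule fires, which suppresses flapping, and sorts firing rules by severity. The three-level `Severity` scale, the thresholds, and the metric names are assumptions; this page does not list the Factory's actual severity levels.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, List


class Severity(IntEnum):
    """Hypothetical severity scale; higher values are more urgent."""
    INFO = 1
    WARNING = 2
    CRITICAL = 3


@dataclass(frozen=True)
class AlertRule:
    name: str
    metric: str
    threshold: float
    severity: Severity
    for_samples: int = 3      # consecutive breaches required before firing


def alert_rules_for(service: str) -> List[AlertRule]:
    """Hypothetical default alert definitions for one service."""
    return [
        AlertRule(f"{service}-high-error-rate", f"{service}.error_rate", 0.05, Severity.CRITICAL),
        AlertRule(f"{service}-slow-p95", f"{service}.latency_p95_ms", 500.0, Severity.WARNING),
    ]


def firing(rules: List[AlertRule], samples: Dict[str, List[float]]) -> List[AlertRule]:
    """Return rules whose metric exceeded the threshold for the last `for_samples` points."""
    fired = []
    for rule in rules:
        recent = samples.get(rule.metric, [])[-rule.for_samples:]
        if len(recent) == rule.for_samples and all(v > rule.threshold for v in recent):
            fired.append(rule)
    # Most severe first, so notification routing can page on the top entry.
    return sorted(fired, key=lambda r: r.severity, reverse=True)
```

A single spike (for example an error rate of 0.07 between two normal samples) does not fire the rule; three consecutive breaches do.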

With Engineering Agents

Receive:

  • Service code structure
  • API endpoints
  • Business logic details

Provide:

  • Observability instrumentation requirements
  • Logging and tracing guidelines
  • Metrics collection patterns
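A core tracing guideline for Engineering agents is to propagate trace context on every outbound call so spans from different services join the same trace. OpenTelemetry's standard propagator uses the W3C Trace Context `traceparent` header, formatted as `version-traceid-parentid-flags` in lowercase hex. The sketch below parses and generates that header by hand to show the format; it is not the OpenTelemetry API, and it handles only version `00`.

```python
import re
import secrets
from dataclasses import dataclass
from typing import Optional

# Version 00 layout: 32-hex trace ID, 16-hex parent span ID, 2-hex flags.
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


@dataclass(frozen=True)
class SpanContext:
    trace_id: str
    span_id: str
    sampled: bool = True

    def to_traceparent(self) -> str:
        flags = "01" if self.sampled else "00"
        return f"00-{self.trace_id}-{self.span_id}-{flags}"


def parse_traceparent(header: Optional[str]) -> Optional[SpanContext]:
    """Return the caller's context, or None if the header is absent or malformed."""
    match = _TRACEPARENT.match(header.strip()) if header else None
    if not match:
        return None
    trace_id, span_id, flags = match.groups()
    # All-zero IDs are invalid in W3C Trace Context.
    if trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return SpanContext(trace_id, span_id, sampled=bool(int(flags, 16) & 0x01))


def child_of(parent: Optional[SpanContext]) -> SpanContext:
    """Start a span: keep the incoming trace ID if present, otherwise begin a new trace."""
    trace_id = parent.trace_id if parent else secrets.token_hex(16)
    sampled = parent.sampled if parent else True
    return SpanContext(trace_id, secrets.token_hex(8), sampled)
```

A service would call `parse_traceparent` on the incoming request header, create its own span with `child_of`, and send `to_traceparent()` on downstream calls; the shared trace ID is what lets the platform stitch the spans together, much like a correlation ID does for logs.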

With QA Agents

Receive:

  • Test execution results
  • Performance test data
  • Quality metrics

Use For:

  • Setting performance baselines
  • Defining SLOs
  • Configuring performance alerts
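Turning QA performance data into baselines and alert thresholds usually starts from latency percentiles. The sketch below uses the nearest-rank percentile method and sets the alert threshold a fixed headroom above the measured p95 so normal variance does not trigger alerts. The 1.2 headroom factor and the returned field names are assumptions, not Factory defaults.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample covering `pct` percent of the data."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Multiply before dividing to avoid floating-point drift in the rank.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def latency_baseline(samples_ms: Sequence[float], headroom: float = 1.2) -> dict:
    """Derive a latency baseline and an alert threshold from performance test data."""
    p95 = percentile(samples_ms, 95)
    return {
        "p50_ms": percentile(samples_ms, 50),
        "p95_ms": p95,
        # Alert somewhat above the tested p95 so ordinary variance does not page anyone.
        "alert_threshold_ms": round(p95 * headroom, 1),
    }
```

Nearest-rank always returns an observed sample, which keeps the baseline explainable ("95% of test requests finished within this time"); an interpolating method such as `statistics.quantiles` would be an equally valid choice.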

With Post-Production Agents

Provide:

  • System health metrics
  • Performance data
  • User behavior analytics

Receive:

  • Optimization requirements
  • Growth metric needs
  • Customer success data requirements

Observability Standards

All observability configurations follow ConnectSoft standards:

  • Structured Logging - JSON format with correlation IDs
  • OpenTelemetry - Standard tracing and metrics
  • Correlation IDs - Trace requests across services
  • Health Checks - Standard health check endpoints
  • Metrics Naming - Consistent metric naming conventions
  • Alert Severity - Standardized alert severity levels
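The ConnectSoft metric naming conventions are not spelled out on this page. As an assumption, the sketch below checks names against Prometheus's published naming guidance: a restricted character set, lowercase snake_case, a `_total` suffix reserved for counters, and base units such as seconds and bytes. A lint like this can run in CI so naming stays consistent across services.

```python
import re
from typing import List

# Prometheus metric name grammar, minus colons (reserved for recording rules).
_NAME = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]*$")
# Non-base unit suffixes that should be converted (an illustrative, partial list).
_NON_BASE_UNITS = re.compile(r"_(ms|milliseconds|kb|mb)(_|$)")


def naming_violations(name: str, metric_type: str) -> List[str]:
    """Check one metric name against assumed Prometheus-style conventions."""
    problems = []
    if not _NAME.match(name):
        problems.append("must match [a-zA-Z_][a-zA-Z0-9_]*")
    if name != name.lower():
        problems.append("use lowercase snake_case")
    if metric_type == "counter" and not name.endswith("_total"):
        problems.append("counters should end in _total")
    if metric_type != "counter" and name.endswith("_total"):
        problems.append("_total is reserved for counters")
    if _NON_BASE_UNITS.search(name):
        problems.append("use base units such as _seconds or _bytes")
    return problems
```

For example, `naming_violations("http_request_duration_seconds", "histogram")` returns an empty list, while `naming_violations("latency_ms", "gauge")` flags the millisecond unit.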