Observability Guide
What is CVT Observability?
CVT exposes observability features for monitoring the health, performance, and usage of your contract validation infrastructure: Prometheus metrics for real-time monitoring, Grafana dashboards for visualization, and structured logging for debugging and audit trails.
For system architecture context, including how observability integrates with the CVT server, see the Architecture Overview.
Overview
The observability stack has three components:
- Prometheus Metrics: Real-time metrics collection and storage
- Grafana Dashboards: Visual monitoring and analytics
- Structured Logging: Detailed operation logs with Zap logger
Quick Start
Start the Observability Stack
# Start CVT server, Prometheus, and Grafana
make up
# Check status
make observability-status
Access the UIs
- Grafana: http://localhost:3000 (admin/admin)
- Prometheus: http://localhost:9091
- Metrics Endpoint: http://localhost:9551/metrics
Quick Commands
# View metrics in terminal
make metrics
# Open Grafana dashboard
make grafana
# Open Prometheus UI
make prometheus
# View observability logs
make observability-logs
Metrics Collected
Schema Registration Metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
| cvt_schemas_registered_total | Counter | status (success/failure) | Total number of schemas registered |
| cvt_schema_registration_errors_total | Counter | error_type | Schema registration errors by type |
Validation Metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
| cvt_validations_total | Counter | schema_id, method, result | Total validations performed |
| cvt_validation_duration_seconds | Histogram | schema_id, method | Validation operation duration |
| cvt_validation_errors_total | Counter | error_category | Validation errors by category |
Error Categories:
- input_validation: Invalid request parameters
- schema_not_found: Schema not found in cache
- http_request_creation: Failed to create HTTP request for validation
- router_creation: Failed to create OpenAPI router
- route_not_found: Route not found in OpenAPI spec
- request_invalid: HTTP request validation failed
- response_invalid: HTTP response validation failed
Cache Metrics
| Metric | Type | Description |
|---|---|---|
| cvt_cache_hits_total | Counter | Total cache hits |
| cvt_cache_misses_total | Counter | Total cache misses |
| cvt_cache_size_bytes | Gauge | Current cache size in bytes |
| cvt_cache_items_total | Gauge | Current number of cached items |
gRPC Metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
| cvt_grpc_requests_total | Counter | method, status | Total gRPC requests |
| cvt_grpc_request_duration_seconds | Histogram | method | gRPC request duration |
Methods:
- RegisterSchema
- ValidateInteraction
- GetSchema
- ListSchemas
- ValidateProducerResponse
- CompareSchemas
- GenerateFixture
- ListEndpoints
- RegisterConsumer
- ListConsumers
- DeregisterConsumer
- CanIDeploy
Schema Versioning Metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
| cvt_breaking_changes_detected_total | Counter | change_type | Breaking changes detected by type |
| cvt_schema_versions_total | Gauge | schema_id | Number of versions per schema |
Change Types:
- ENDPOINT_REMOVED
- REQUIRED_FIELD_ADDED
- TYPE_CHANGED
- REQUIRED_PARAMETER_ADDED
- RESPONSE_SCHEMA_CHANGED
- ENUM_VALUE_REMOVED
Authentication Metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
| cvt_auth_success_total | Counter | - | Total successful authentications |
| cvt_auth_failure_total | Counter | reason | Authentication failures by reason |
Failure Reasons:
- missing_key: No API key provided
- invalid_key: Invalid API key
Governance Metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
| cvt_schemas_by_owner | Gauge | owner | Number of schemas per owner |
| cvt_schemas_by_team | Gauge | team | Number of schemas per team |
| cvt_read_only_violations_total | Counter | - | Attempts to modify read-only schemas |
Audit Metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
| cvt_audit_events_total | Counter | event_type | Total audit events by type |
Grafana Dashboard
The CVT Grafana dashboard visualizes these metrics in real time.
Panels
- Validations per Second (Stat)
  - Current rate of validations
  - Threshold indicators for performance
- Validation Results Over Time (Time Series)
  - Valid vs Invalid vs Error validations
  - Trends and patterns
- Validation Latency (Percentiles) (Time Series)
  - p50, p95, p99 latency
  - Performance SLOs
- Cache Hit Rate (Gauge)
  - Visual indicator of cache effectiveness
  - Color-coded thresholds:
    - Red: < 50%
    - Yellow: 50-80%
    - Green: > 80%
- Validation Errors by Category (Time Series)
  - Error distribution over time
  - Helps identify problem areas
- gRPC Requests by Method (Time Series)
  - Request distribution
  - Usage patterns
- Summary Stats (Stats)
  - Total Schemas Registered
  - Cache Hits/sec
  - Cache Misses/sec
Accessing the Dashboard
- Open Grafana: http://localhost:3000
- Log in with admin/admin
- Navigate to Dashboards > CVT - Contract Validator Toolkit
Prometheus Configuration
The Prometheus configuration scrapes metrics from the CVT server every 10 seconds:
scrape_configs:
  - job_name: 'cvt-server'
    scrape_interval: 10s
    static_configs:
      - targets: ['cvt-server:9551']
Useful Prometheus Queries
Validation Rate
sum(rate(cvt_validations_total[5m]))
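To make the rate() call above concrete, here is a minimal Go sketch of a per-second rate computed from raw counter samples. It is a simplification: PromQL's rate() also extrapolates to the edges of the window, which this sketch does not do. The Point type, the PerSecondRate function, and the sample values are hypothetical and not part of CVT.

```go
package main

import "fmt"

// Point is one scraped counter sample (hypothetical type for this sketch).
type Point struct {
	T float64 // scrape time in Unix seconds
	V float64 // counter value at that time
}

// PerSecondRate returns the average per-second increase across the samples.
// A drop in value is treated as a counter reset (for example, a server
// restart): the counter restarted from zero, so the new value is the increase.
// ok is false when there are fewer than two samples or no time has elapsed.
func PerSecondRate(points []Point) (rate float64, ok bool) {
	if len(points) < 2 {
		return 0, false
	}
	elapsed := points[len(points)-1].T - points[0].T
	if elapsed <= 0 {
		return 0, false
	}
	var increase float64
	for i := 1; i < len(points); i++ {
		delta := points[i].V - points[i-1].V
		if delta < 0 {
			delta = points[i].V // counter reset
		}
		increase += delta
	}
	return increase / elapsed, true
}

func main() {
	// Hypothetical samples taken 10s apart, with a reset before the third one.
	samples := []Point{{0, 100}, {10, 120}, {20, 5}, {30, 25}}
	if r, ok := PerSecondRate(samples); ok {
		fmt.Println(r) // 1.5
	}
}
```

The reset handling is why counters such as cvt_validations_total are queried through rate() rather than read directly: a restart does not produce a negative spike.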
Cache Hit Rate
sum(rate(cvt_cache_hits_total[5m])) /
(sum(rate(cvt_cache_hits_total[5m])) + sum(rate(cvt_cache_misses_total[5m])))
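The same ratio can be sketched in Go, together with the color bands listed for the Cache Hit Rate panel. HitRate and ThresholdColor are hypothetical helper names for illustration, not CVT functions, and the input values are made up.

```go
package main

import "fmt"

// HitRate returns hits / (hits + misses), mirroring the PromQL expression.
// ok is false when there were no lookups at all, because the ratio is then
// undefined (PromQL evaluates 0/0 to NaN).
func HitRate(hits, misses float64) (rate float64, ok bool) {
	total := hits + misses
	if total == 0 {
		return 0, false
	}
	return hits / total, true
}

// ThresholdColor maps a hit rate to the dashboard bands:
// red below 50%, yellow from 50% to 80%, green above 80%.
func ThresholdColor(rate float64) string {
	switch {
	case rate < 0.5:
		return "red"
	case rate <= 0.8:
		return "yellow"
	default:
		return "green"
	}
}

func main() {
	// Hypothetical per-second rates of cache hits and misses.
	if rate, ok := HitRate(90, 10); ok {
		fmt.Printf("hit rate %.0f%% -> %s\n", rate*100, ThresholdColor(rate))
	}
}
```

Returning a separate ok flag keeps the "no traffic yet" case distinct from a genuine 0% hit rate, which matters when the value feeds a threshold.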
p95 Latency
histogram_quantile(0.95, sum(rate(cvt_validation_duration_seconds_bucket[5m])) by (le))
Error Rate by Category
sum by (error_category) (rate(cvt_validation_errors_total[5m]))
Breaking Changes by Type
sum by (change_type) (rate(cvt_breaking_changes_detected_total[5m]))
Authentication Failure Rate
sum by (reason) (rate(cvt_auth_failure_total[5m]))
Custom Metrics Endpoint
The CVT server exposes a Prometheus-compatible /metrics endpoint on port 9551:
curl http://localhost:9551/metrics
Example Output
# HELP cvt_validations_total Total number of validations performed
# TYPE cvt_validations_total counter
cvt_validations_total{method="POST",result="valid",schema_id="petstore-v3"} 42
# HELP cvt_validation_duration_seconds Duration of validation operations in seconds
# TYPE cvt_validation_duration_seconds histogram
cvt_validation_duration_seconds_bucket{le="0.001",method="POST",schema_id="petstore-v3"} 10
cvt_validation_duration_seconds_bucket{le="0.005",method="POST",schema_id="petstore-v3"} 35
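If you want to inspect this output without Prometheus, a small parser is enough for quick checks. The Go sketch below sums every sample of one metric across its label combinations. It is a simplified reader of the text exposition format, not a full parser (it assumes label values contain no "}" after the last label), and SumMetric is a hypothetical helper. In the sample text, only the "valid" line comes from the example output; the "invalid" line is made up.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// SumMetric adds up all samples of the named metric in Prometheus text
// exposition output, ignoring comment lines and other metrics.
func SumMetric(text, name string) (float64, error) {
	var sum float64
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // blank line or HELP/TYPE comment
		}
		// The metric name ends at the first '{' (labels) or space (value).
		end := strings.IndexAny(line, "{ ")
		if end < 0 || line[:end] != name {
			continue
		}
		rest := line[end:]
		if i := strings.LastIndex(rest, "}"); i >= 0 {
			rest = rest[i+1:] // skip the label set
		}
		// First field is the value; an optional timestamp may follow.
		fields := strings.Fields(rest)
		if len(fields) == 0 {
			return 0, fmt.Errorf("no value in line %q", line)
		}
		v, err := strconv.ParseFloat(fields[0], 64)
		if err != nil {
			return 0, fmt.Errorf("bad value in line %q: %w", line, err)
		}
		sum += v
	}
	return sum, sc.Err()
}

func main() {
	text := `# TYPE cvt_validations_total counter
cvt_validations_total{method="POST",result="valid",schema_id="petstore-v3"} 42
cvt_validations_total{method="POST",result="invalid",schema_id="petstore-v3"} 3
cvt_validation_duration_seconds_bucket{le="0.001",method="POST",schema_id="petstore-v3"} 10
`
	total, err := SumMetric(text, "cvt_validations_total")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(total) // 45
}
```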
Structured Logging
CVT uses Zap for structured logging.
Log Levels
Set the log level with the LOG_LEVEL environment variable:
- debug: Verbose logging (development)
- info: Standard logging (production)
- warn: Warnings and errors only
- error: Errors only
Example Logs
{
  "level": "info",
  "ts": 1638360000.123,
  "caller": "server/cvtservice/service.go:110",
  "msg": "Schema registered successfully",
  "schemaId": "petstore-v3"
}
{
  "level": "info",
  "ts": 1638360001.456,
  "caller": "server/cvtservice/service.go:237",
  "msg": "Interaction validated successfully",
  "schemaId": "petstore-v3",
  "method": "POST",
  "path": "/pets"
}
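Because the logs are JSON, they are straightforward to consume from tooling. The following Go sketch decodes an entry with the fields shown above, using only the standard library. The LogEntry type and ParseLogLine function are hypothetical names, and the field set is inferred from these two examples only, so other log lines may carry additional keys.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// LogEntry holds the fields visible in the example logs.
type LogEntry struct {
	Level    string  `json:"level"`
	TS       float64 `json:"ts"` // Unix time in seconds, with a fractional part
	Caller   string  `json:"caller"`
	Msg      string  `json:"msg"`
	SchemaID string  `json:"schemaId"`
	Method   string  `json:"method,omitempty"`
	Path     string  `json:"path,omitempty"`
}

// Time converts the floating-point "ts" field to a UTC time.Time.
func (e LogEntry) Time() time.Time {
	sec := int64(e.TS)
	nsec := int64((e.TS - float64(sec)) * 1e9)
	return time.Unix(sec, nsec).UTC()
}

// ParseLogLine decodes one JSON log entry. Unknown keys are ignored.
func ParseLogLine(line string) (LogEntry, error) {
	var e LogEntry
	err := json.Unmarshal([]byte(line), &e)
	return e, err
}

func main() {
	line := `{"level":"info","ts":1638360001.456,"caller":"server/cvtservice/service.go:237","msg":"Interaction validated successfully","schemaId":"petstore-v3","method":"POST","path":"/pets"}`
	e, err := ParseLogLine(line)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(e.Time().Format(time.RFC3339), e.Level, e.Msg, e.SchemaID, e.Method, e.Path)
}
```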
Production Deployment
Recommended Configuration
# docker-compose.yml
services:
  cvt-server:
    environment:
      - LOG_LEVEL=info
      - CVT_PORT=9550
      - CVT_METRICS_PORT=9551
  prometheus:
    volumes:
      - prometheus-data:/prometheus
    restart: unless-stopped
  grafana:
    volumes:
      - grafana-data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD}
    restart: unless-stopped
Security Considerations
- Change default credentials: Update Grafana admin password
- Network isolation: Use Docker networks to isolate services
- TLS encryption: Enable HTTPS for Grafana in production
- Authentication: Configure Grafana OAuth or LDAP
- Firewall rules: Restrict access to metrics and monitoring ports
Retention and Storage
- Prometheus: Default retention is 15 days
- Grafana: Stores dashboard configurations only
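- Metrics cardinality: Every distinct label combination creates a separate time series, so watch high-cardinality labels such as schema_id to keep the series count from growing unbounded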
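# Increase Prometheus retention
# Note: `command` replaces the image's default flags, so keep any flags you
# already rely on (for example --config.file) in this list.
prometheus:
  command:
    - '--storage.tsdb.retention.time=30d'
    - '--storage.tsdb.retention.size=10GB'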
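Storage needs depend on how many time series exist as well as on retention. As a rough worked example, the Go sketch below computes the worst-case series count for a metric as the product of its label cardinalities. The function names and the figures (200 schemas, 7 HTTP methods, 12 buckets) are hypothetical; the three results correspond to the valid, invalid, and error outcomes shown on the dashboard.

```go
package main

import "fmt"

// SeriesCount returns the worst-case number of time series for one metric:
// the product of the number of distinct values of each label.
func SeriesCount(labelValues ...int) int {
	n := 1
	for _, v := range labelValues {
		n *= v
	}
	return n
}

// HistogramSeriesCount accounts for a classic histogram exposing one _bucket
// series per bucket (including +Inf) plus _sum and _count per label set.
func HistogramSeriesCount(buckets int, labelValues ...int) int {
	return (buckets + 2) * SeriesCount(labelValues...)
}

func main() {
	// cvt_validations_total has labels schema_id, method, result.
	fmt.Println(SeriesCount(200, 7, 3)) // 4200

	// cvt_validation_duration_seconds has labels schema_id, method.
	fmt.Println(HistogramSeriesCount(12, 200, 7)) // 19600
}
```

The histogram dominates in this example, which is why a per-schema label on a duration metric deserves the most attention when the number of schemas grows.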
Alerting
Sample Alert Rules
Create observability/alert-rules.yml:
groups:
  - name: cvt_alerts
    interval: 30s
    rules:
      - alert: HighErrorRate
        expr: sum(rate(cvt_validation_errors_total[5m])) > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: High validation error rate
          description: Error rate is {{ $value }} errors/sec
      - alert: LowCacheHitRate
        expr: |
          sum(rate(cvt_cache_hits_total[5m])) /
          (sum(rate(cvt_cache_hits_total[5m])) + sum(rate(cvt_cache_misses_total[5m]))) < 0.5
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: Cache hit rate is low
          description: Cache hit rate is {{ $value | humanizePercentage }}
      - alert: HighLatency
        expr: histogram_quantile(0.95, sum(rate(cvt_validation_duration_seconds_bucket[5m])) by (le)) > 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: High validation latency
          description: p95 latency is {{ $value }}s
Troubleshooting
Metrics Not Appearing
- Check CVT server is running:
  make status
- Verify metrics endpoint is accessible:
  curl http://localhost:9551/metrics
- Check Prometheus is scraping:
  - Open http://localhost:9091
  - Go to Status > Targets
  - Verify cvt-server target is UP
Grafana Dashboard Not Loading
- Check Grafana is running:
  docker ps | grep grafana
- Verify datasource configuration:
  - Open http://localhost:3000
  - Go to Configuration > Data Sources
  - Verify Prometheus is configured and working
- Check logs:
  make observability-logs
High Memory Usage
Track cache growth with the cvt_cache_size_bytes and cvt_cache_items_total metrics, then adjust the cache limits:
const (
	MaxSchemas = 1000           // Reduce if memory is constrained
	SchemaTTL  = 24 * time.Hour // Reduce to free memory faster
)
Further Reading
- Prometheus Best Practices
- Grafana Dashboards
- OpenTelemetry (future enhancement)
- Zap Logging
Related Documentation
- Configuration Reference - Environment variables
- Development Guide - Local development setup