Goal
Build an alert pipeline that evaluates metrics against thresholds, triggers notifications through multiple channels, and gracefully degrades when a notifier fails.
Step 1: Define a trigger
A trigger watches a metric and fires when a condition is met.
#include <kcenon/monitoring/alerts/trigger.h>

#include <chrono>
#include <memory>

// Fire when cpu.usage stays above 80% for 30 consecutive seconds.
auto high_cpu = std::make_shared<threshold_trigger>(
    "cpu.usage",                 // metric to watch
    comparison::greater_than,    // condition
    80.0,                        // threshold
    std::chrono::seconds(30)     // sustained duration
);
The trigger fires only if the condition holds for the entire sustained duration (30 seconds here), which avoids pager fatigue from transient spikes.
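For contrast, here is a sketch of a spike trigger with no debounce window, reusing the same constructor. Whether a zero duration is accepted is an assumption about the library, not something this guide confirms:

// Assumption: seconds(0) means "fire on the first out-of-range sample".
// Faster detection, but a one-sample blip will page someone.
auto cpu_spike = std::make_shared<threshold_trigger>(
    "cpu.usage",
    comparison::greater_than,
    95.0,
    std::chrono::seconds(0)
);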
Step 2: Attach notifiers
Notifiers deliver alerts to external systems (Slack, email, PagerDuty). Multiple notifiers can subscribe to the same trigger.
#include <kcenon/monitoring/alerts/notifiers/slack_notifier.h>
// Assumed header path for email_notifier, following the slack_notifier layout:
#include <kcenon/monitoring/alerts/notifiers/email_notifier.h>

auto slack = std::make_shared<slack_notifier>("https://hooks.slack.com/...");
auto email = std::make_shared<email_notifier>("alerts@example.com");

// Both channels receive every alert this trigger emits.
high_cpu->add_notifier(slack);
high_cpu->add_notifier(email);
Step 3: Register with the pipeline
The alert pipeline evaluates all registered triggers against each incoming metric sample.
auto pipeline = std::make_shared<alert_pipeline>();
pipeline->add_trigger(high_cpu);

// The pipeline consumes samples as a sink; `registry` is the metric
// registry set up earlier in this guide.
registry.register_sink(pipeline);
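To smoke-test the wiring, feed a sample above the threshold. record() below is a hypothetical helper, not confirmed library API; substitute whatever ingestion call your registry actually exposes:

// Hypothetical ingestion call, for illustration only.
registry.record("cpu.usage", 92.0);  // above the 80.0 threshold
// If samples like this keep arriving for 30 s, high_cpu fires and
// both slack and email are notified.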
Graceful Degradation
If a notifier fails (e.g., the Slack API is down), the pipeline records the failure via the circuit breaker and keeps delivering through the remaining channels. Use the on_notifier_failure callback to record these events:
pipeline->on_notifier_failure([](const notifier& n, const error_info& err) {
    // Delivery continues on the other channels; this hook exists for observability.
    logger->warn("Notifier {} failed: {}", n.name(), err.message);
});
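A common extension of the handler above is to count consecutive failures per channel and escalate through one that is still healthy. The sketch below assumes a notify(message) method on notifiers, which is hypothetical; swap in the library's real delivery call:

#include <string>
#include <unordered_map>

// Sketch: consecutive-failure counts per notifier (never reset here; a
// real version would clear the count on a successful delivery).
std::unordered_map<std::string, int> failure_counts;

pipeline->on_notifier_failure([&](const notifier& n, const error_info& err) {
    logger->warn("Notifier {} failed: {}", n.name(), err.message);
    if (++failure_counts[n.name()] == 3) {
        // Hypothetical escalation call: route a meta-alert through a
        // channel that has not been failing.
        email->notify("Notifier " + n.name() + " failed 3 times in a row");
    }
});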
Common Mistakes
- Alerting on noisy metrics without aggregation. Raw per-request latencies flap. Use p99 over a window instead.
- Too-sensitive thresholds. Alerts that fire dozens of times a day train operators to ignore them. Tune for signal, not noise.
- Single notifier channel. If your only channel is down, you miss the very alert telling you it's down. Always attach a fallback (see the sketch below).
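A minimal fallback sketch: attach a second, independent channel to the same trigger. pagerduty_notifier and its constructor argument are illustrative names, not confirmed library API:

// Illustrative only: any second independent channel works here.
auto pager = std::make_shared<pagerduty_notifier>("service-integration-key");
high_cpu->add_notifier(pager);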
Next Steps