Collector Questions
- How do I add a custom collector?
- How often should collectors run?
Tracing Questions
- How do I configure OTLP export?
- Should I sample traces?
Alert Questions
- Circuit breaker vs alert — when should I use which?
- How do I prevent flapping alerts?
Integration Questions
- How does monitoring_system integrate with logger_system?
- Bidirectional DI — what is it?
Performance Questions
- What's the overhead of instrumentation?
- Is the registry thread-safe?
Plugin Questions
- Can I load collectors dynamically?
Storage Questions
- Which storage backends are supported?

Collector Questions

How do I add a custom collector?

Derive from metric_base, implement collect(), and register the type with the factory:

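class my_metric : public metric_base {
public:
    // Produce one sample for this metric.
    metric_sample collect() override { ... }
};

// Make the type available to the factory under this name.
metric_factory::instance().register_type<my_metric>("my.metric");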

See Metrics Tutorial for a full example.

How often should collectors run?

It depends on how expensive the metric is to collect and how quickly the underlying signal changes. Polling system metrics (CPU, RSS) every 5–10 s is usually fine. High-rate counters (request count) should be updated by the events themselves rather than polled.
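
As a rough illustration of the event-driven approach, the sketch below keeps the count in an atomic that the request path increments, while the collector only reads it at its own interval. The class request_counter and its methods are hypothetical names for this example and are not part of monitoring_system.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical event-driven counter (illustrative, not a library type).
class request_counter {
public:
    // Called from the request path on every event.
    // Relaxed ordering is enough for a pure count.
    void on_request() noexcept {
        count_.fetch_add(1, std::memory_order_relaxed);
    }

    // Called by the collector at its own interval (for example every 10 s).
    std::uint64_t snapshot() const noexcept {
        return count_.load(std::memory_order_relaxed);
    }

    // Number of events observed since an earlier snapshot.
    std::uint64_t delta_since(std::uint64_t previous) const noexcept {
        const std::uint64_t now = snapshot();
        assert(now >= previous);  // the counter only ever grows
        return now - previous;
    }

private:
    std::atomic<std::uint64_t> count_{0};
};
```

The collector can store the last snapshot and report delta_since(last) each time it runs, so no event is missed between polls.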

Tracing Questions

How do I configure OTLP export?

Create an otlp_exporter_config, set the endpoint and service name, construct an otlp_exporter, and register it with the tracer:

otlp_exporter_config cfg;
cfg.endpoint = "http://collector:4318/v1/traces";
cfg.service_name = "my-service";
tracer->register_exporter(std::make_shared<otlp_exporter>(cfg));

Should I sample traces?

Yes. At scale, exporting every span is expensive and most spans carry little signal. Start with probabilistic sampling (e.g., 1%), and add rules that always sample errors and slow requests.
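
One way to combine the two rules is sketched below. It assumes the decision is made after the request finishes, so the error flag and duration are known. The names span_summary and rule_based_sampler are hypothetical and do not come from the library, and the class is not thread-safe as written.

```cpp
#include <cassert>
#include <chrono>
#include <random>

// Hypothetical summary of a finished request (illustrative only).
struct span_summary {
    bool is_error;
    std::chrono::milliseconds duration;
};

// Hypothetical sampler: always keep errors and slow requests,
// keep everything else with a fixed probability.
class rule_based_sampler {
public:
    rule_based_sampler(double probability,
                       std::chrono::milliseconds slow_threshold,
                       unsigned seed = std::random_device{}())
        : probability_(probability),
          slow_threshold_(slow_threshold),
          rng_(seed) {
        assert(probability >= 0.0 && probability <= 1.0);
    }

    bool should_sample(const span_summary& s) {
        if (s.is_error) return true;                     // rule 1: errors
        if (s.duration >= slow_threshold_) return true;  // rule 2: slow requests
        return dist_(rng_) < probability_;               // otherwise probabilistic
    }

private:
    double probability_;
    std::chrono::milliseconds slow_threshold_;
    std::mt19937 rng_;
    std::uniform_real_distribution<double> dist_{0.0, 1.0};
};
```

With probability set to 0.01, roughly one ordinary request in a hundred is kept, while every error and every request at or above the slow threshold is kept regardless.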

Alert Questions

Circuit breaker vs alert — when should I use which?

Circuit breaker: Automatically stops calls to a failing dependency so the caller doesn't cascade failure. Runs in-process, protects latency budgets.
Alert: Notifies a human that something is wrong. Runs out-of-process, drives investigation.

Use both. The breaker keeps the system running; the alert tells you to fix the root cause.
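
To make the in-process half concrete, here is a minimal circuit breaker sketch. It is an assumption-laden illustration, not the library's breaker: the class name, the consecutive-failure rule, and the cooldown behavior are all chosen for the example, and the class is not thread-safe. The caller passes the current time in, which keeps the logic easy to test.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical minimal circuit breaker (illustrative only).
class circuit_breaker {
public:
    using clock = std::chrono::steady_clock;

    circuit_breaker(int failure_threshold, std::chrono::milliseconds cooldown)
        : failure_threshold_(failure_threshold), cooldown_(cooldown) {
        assert(failure_threshold > 0);
    }

    // True if a call to the dependency may proceed at time `now`.
    bool allow(clock::time_point now) const {
        if (consecutive_failures_ < failure_threshold_) return true;  // closed
        // Open: reject until the cooldown has elapsed, then let probes through.
        return now - opened_at_ >= cooldown_;
    }

    // A successful call closes the breaker again.
    void record_success() { consecutive_failures_ = 0; }

    // A failed call counts toward the threshold; once the threshold is
    // reached, every further failure restarts the cooldown.
    void record_failure(clock::time_point now) {
        if (++consecutive_failures_ >= failure_threshold_) opened_at_ = now;
    }

private:
    int failure_threshold_;
    std::chrono::milliseconds cooldown_;
    int consecutive_failures_ = 0;
    clock::time_point opened_at_{};
};
```

An alert on the same dependency would typically watch how often the breaker opens, which is the signal that a human needs to look at the root cause.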

How do I prevent flapping alerts?

Set the sustained duration on threshold_trigger so that the condition must hold for that long before the alert fires; transient spikes are then ignored. Combine it with the suppression_window to prevent re-alerts on the same condition within a cooldown period.
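
The interaction between the two settings can be sketched as follows. This is not the implementation of threshold_trigger; debounced_threshold and its parameters are hypothetical names used only to show the logic, with the current time passed in by the caller.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical debounce logic (illustrative, not the library's trigger).
class debounced_threshold {
public:
    using clock = std::chrono::steady_clock;

    debounced_threshold(double threshold,
                        std::chrono::seconds sustained,
                        std::chrono::seconds suppression)
        : threshold_(threshold), sustained_(sustained), suppression_(suppression) {
        assert(sustained.count() >= 0 && suppression.count() >= 0);
    }

    // Feed one sample; returns true when an alert should fire.
    bool observe(double value, clock::time_point now) {
        if (value <= threshold_) {
            breaching_ = false;  // spike ended: restart the sustained timer
            return false;
        }
        if (!breaching_) {
            breaching_ = true;
            breach_start_ = now;
        }
        if (now - breach_start_ < sustained_) return false;  // not sustained yet
        if (has_fired_ && now - last_fired_ < suppression_) return false;  // cooldown
        has_fired_ = true;
        last_fired_ = now;
        return true;
    }

private:
    double threshold_;
    std::chrono::seconds sustained_;
    std::chrono::seconds suppression_;
    bool breaching_ = false;
    bool has_fired_ = false;
    clock::time_point breach_start_{};
    clock::time_point last_fired_{};
};
```

With a 30 s sustained duration and a 300 s suppression window, a 10 s spike never fires, and a condition that stays bad fires once and then stays quiet for five minutes.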

Integration Questions

How does monitoring_system integrate with logger_system?

Register the logger as a notification sink, and the alert pipeline will write alert events to the log in addition to external notifiers. See examples/logger_di_integration_example.cpp.

Bidirectional DI — what is it?

monitoring_system can be both a service that other systems consume (they push metrics into it) and a consumer of other services (it pulls context from logger_system, etc.). The DI container supports both directions without creating a circular dependency. See examples/bidirectional_di_example.cpp.
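
The general idea can be sketched without the real container. In the example below each side depends only on an interface owned by neither, and the two objects are wired together after construction through weak references so there is no ownership cycle. All names (log_sink, metric_sink, demo_logger, demo_monitor, wire) are hypothetical; the shipped example file shows the actual API.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interfaces; neither system includes the other's concrete type.
struct log_sink {
    virtual ~log_sink() = default;
    virtual void write(const std::string& line) = 0;
};
struct metric_sink {
    virtual ~metric_sink() = default;
    virtual void record(const std::string& name, double value) = 0;
};

// The logger consumes the monitor: it reports how many lines it wrote.
class demo_logger : public log_sink {
public:
    void set_metrics(std::weak_ptr<metric_sink> m) { metrics_ = std::move(m); }
    void write(const std::string& line) override {
        lines.push_back(line);
        if (auto m = metrics_.lock()) m->record("log.lines", 1.0);
    }
    std::vector<std::string> lines;
private:
    std::weak_ptr<metric_sink> metrics_;  // weak: no ownership cycle
};

// The monitor consumes the logger: it writes alert events to the log.
class demo_monitor : public metric_sink {
public:
    void set_logger(std::weak_ptr<log_sink> l) { logger_ = std::move(l); }
    void record(const std::string& name, double value) override {
        last_name = name;
        total += value;
        ++samples;
    }
    void raise_alert(const std::string& msg) {
        if (auto l = logger_.lock()) l->write("ALERT: " + msg);
    }
    std::string last_name;
    double total = 0.0;
    int samples = 0;
private:
    std::weak_ptr<log_sink> logger_;
};

// Wiring happens once, after both objects exist.
void wire(const std::shared_ptr<demo_logger>& logger,
          const std::shared_ptr<demo_monitor>& monitor) {
    assert(logger && monitor);
    logger->set_metrics(monitor);
    monitor->set_logger(logger);
}
```

Raising an alert on the monitor writes one line to the logger, which in turn records one sample on the monitor, and the call chain stops there.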

Performance Questions

What's the overhead of instrumentation?

Creating a span costs roughly 200 ns when well tuned, and pushing a metric sample roughly 50 ns. The dominant cost is the exporter I/O, which runs asynchronously. Budget for 1–3% overhead in production workloads, and more if you over-instrument.
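
For a back-of-the-envelope check, the per-operation figures above can be plugged into a small helper. The function name and the example workload are assumptions made for illustration, and the estimate deliberately ignores exporter I/O, so treat it as a lower bound on the real overhead.

```cpp
#include <cassert>

// Rough estimate of the fraction of request time spent on in-line
// instrumentation, using ~200 ns per span and ~50 ns per metric sample.
// Exporter I/O is not included.
double overhead_fraction(int spans, int samples, double request_ns) {
    assert(request_ns > 0.0);
    const double span_ns = 200.0;
    const double sample_ns = 50.0;
    return (spans * span_ns + samples * sample_ns) / request_ns;
}
```

For example, a 1 ms request (1,000,000 ns) with 10 spans and 20 metric samples spends 10 × 200 + 20 × 50 = 3,000 ns on instrumentation, or about 0.3% of the request.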

Is the registry thread-safe?

Yes. Collectors, metric registration, and trigger evaluation are all thread-safe. Registering on hot paths is discouraged for performance reasons, not for correctness.
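
The usual way to follow that advice is to register once at startup and keep a handle for the hot path. The sketch below illustrates the pattern with a hypothetical demo_registry; it is not the library's registry and only assumes that registration takes a lock while updates through a cached handle do not.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical registry (illustrative only).
class demo_registry {
public:
    using counter = std::atomic<std::uint64_t>;

    // Takes a lock: safe from any thread, but too costly to call per request.
    std::shared_ptr<counter> get_or_register(const std::string& name) {
        assert(!name.empty());
        std::lock_guard<std::mutex> lock(mutex_);
        auto& slot = counters_[name];
        if (!slot) slot = std::make_shared<counter>(0);
        return slot;
    }

private:
    std::mutex mutex_;
    std::map<std::string, std::shared_ptr<counter>> counters_;
};
```

A component would call get_or_register during initialization, store the returned handle, and then use only handle->fetch_add(1) on the request path, which never touches the lock.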

Plugin Questions

Can I load collectors dynamically?

Yes — the plugin system supports dynamic loading. See examples/plugin_collector_example.cpp and examples/plugin_example/. Dynamic plugins are useful for optional telemetry that shouldn't link into the main binary.

Storage Questions

Which storage backends are supported?

In-memory ring buffer (default), file-backed ring buffer, and OTLP export. For long-term retention, export to a proper time-series database via OTLP and let the backend handle storage.
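
The reason a ring buffer suits short-term storage is that it holds a fixed number of recent samples and overwrites the oldest one when full. A minimal sketch of that behavior follows; the ring_buffer template here is a hypothetical illustration and not the library's storage class.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-capacity ring buffer (illustrative only).
template <typename T>
class ring_buffer {
public:
    explicit ring_buffer(std::size_t capacity) : data_(capacity) {
        assert(capacity > 0);
    }

    // Stores a value, overwriting the oldest one once the buffer is full.
    void push(const T& value) {
        data_[head_] = value;
        head_ = (head_ + 1) % data_.size();
        if (size_ < data_.size()) ++size_;
    }

    std::size_t size() const { return size_; }

    // Index 0 is the oldest retained value, size() - 1 the newest.
    const T& at(std::size_t i) const {
        assert(i < size_);
        const std::size_t start = (head_ + data_.size() - size_) % data_.size();
        return data_[(start + i) % data_.size()];
    }

private:
    std::vector<T> data_;
    std::size_t head_ = 0;  // next slot to write
    std::size_t size_ = 0;  // number of valid entries
};
```

Memory use stays constant no matter how long the process runs, which is also why anything that must be kept long term has to be exported elsewhere.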
