Monitoring System 0.1.0
System resource monitoring with pluggable collectors and alerting
kcenon::monitoring::health_monitor Class Reference

Health monitor with dependency management, auto-recovery, and statistics. More...

#include <health_monitor.h>

Public Member Functions

 health_monitor ()=default
 Default constructor with default configuration.
 
 health_monitor (const health_monitor_config &config)
 Construct with custom configuration.
 
virtual ~health_monitor ()
 Destructor. Stops the monitoring loop if running.
 
common::Result< bool > register_check (const std::string &name, std::shared_ptr< health_check > check)
 Register a named health check.
 
common::Result< bool > unregister_check (const std::string &name)
 Remove a previously registered health check.
 
common::Result< health_check_result > check (const std::string &name)
 Execute a single named health check (with dependency verification).
 
std::unordered_map< std::string, health_check_result > check_all ()
 Execute all registered health checks.
 
common::Result< bool > add_dependency (const std::string &dependent, const std::string &dependency)
 Add a dependency between two registered health checks.
 
common::VoidResult start ()
 Start the periodic health monitoring background thread.
 
common::VoidResult stop ()
 Stop the periodic health monitoring background thread.
 
bool is_running () const
 Check whether the monitoring background thread is running.
 
void refresh ()
 Manually refresh all health checks and trigger recovery if needed.
 
void register_recovery_handler (const std::string &check_name, std::function< bool()> handler)
 Register a recovery handler for a named health check.
 
health_status get_overall_status ()
 Get the aggregate health status across all cached results.
 
health_monitor_stats get_stats () const
 Get accumulated health monitoring statistics.
 
std::string get_health_report ()
 Generate a human-readable health report.
 
health_check_result check_health () const
 Quick self-check of the health monitor itself.
 

Private Member Functions

void run_monitoring_loop ()
 
void update_stats (const health_check_result &result)
 

Private Attributes

health_monitor_config config_
 
health_monitor_stats stats_
 
health_dependency_graph dependency_graph_
 
std::shared_mutex mutex_
 
std::mutex lifecycle_mutex_
 
std::mutex cv_mutex_
 
std::condition_variable cv_
 
std::unordered_map< std::string, std::shared_ptr< health_check > > checks_
 
std::unordered_map< std::string, std::function< bool()> > recovery_handlers_
 
std::unordered_map< std::string, health_check_result > cached_results_
 
std::atomic< bool > running_ {false}
 
std::thread monitor_thread_
 

Detailed Description

Health monitor with dependency management, auto-recovery, and statistics.

Manages a collection of named health checks, runs them periodically on a background thread, maintains cached results, and optionally invokes recovery handlers when checks fail.

Lifecycle

  1. Create monitor with optional configuration
  2. Register health checks and optional recovery handlers
  3. Call start() to begin periodic monitoring
  4. Query status via get_overall_status(), check(), or get_health_report()
  5. Call stop() or let the destructor handle cleanup

Thread Safety

All public methods are thread-safe. Internal state is protected by std::shared_mutex (data) and std::mutex (lifecycle).
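The reader/writer split described above can be sketched in isolation. This is a minimal illustration, not the library's code: `result_cache` and its string-valued results are hypothetical stand-ins for the monitor's cached results. Writers take the `std::shared_mutex` exclusively, while readers take it in shared mode so concurrent queries do not block one another.

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a store of cached check results.
class result_cache {
public:
    // Writers hold the lock exclusively while mutating the map.
    void put(const std::string& name, const std::string& status) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        results_[name] = status;
    }

    // Readers hold the lock in shared mode; several may read at once.
    std::string get(const std::string& name) const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        auto it = results_.find(name);
        return it == results_.end() ? std::string("unknown") : it->second;
    }

    std::size_t size() const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        return results_.size();
    }

private:
    mutable std::shared_mutex mutex_;  // mutable so const readers can lock it
    std::unordered_map<std::string, std::string> results_;
};
```

Declaring the mutex `mutable` is what allows `const` accessors to lock it, which matches the `mutable private` qualifier listed for `mutex_` on this page.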

See also
health_check For the check interface
health_check_builder For creating checks
health_dependency_graph For dependency management
Examples
health_reliability_example.cpp and production_monitoring_example.cpp.

Definition at line 696 of file health_monitor.h.

Constructor & Destructor Documentation

◆ health_monitor() [1/2]

kcenon::monitoring::health_monitor::health_monitor ( )
default

Default constructor with default configuration.

◆ health_monitor() [2/2]

kcenon::monitoring::health_monitor::health_monitor ( const health_monitor_config & config)
inline explicit

Construct with custom configuration.

Parameters
config    Health monitor configuration settings

Definition at line 705 of file health_monitor.h.

: config_(config) {}

◆ ~health_monitor()

virtual kcenon::monitoring::health_monitor::~health_monitor ( )
inline virtual

Destructor. Stops the monitoring loop if running.

Definition at line 708 of file health_monitor.h.

{ stop(); }

References stop().

Member Function Documentation

◆ add_dependency()

common::Result< bool > kcenon::monitoring::health_monitor::add_dependency ( const std::string & dependent,
const std::string & dependency )
inline

Add a dependency between two registered health checks.

Parameters
dependent     Name of the check that depends on another
dependency    Name of the check being depended upon
Returns
Ok(true) on success, or error if checks not found or cycle detected

Definition at line 795 of file health_monitor.h.

{
    std::lock_guard<std::shared_mutex> lock(mutex_);
    return dependency_graph_.add_dependency(dependent, dependency);
}

References kcenon::monitoring::health_dependency_graph::add_dependency(), dependency_graph_, and mutex_.
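How health_dependency_graph detects cycles is not shown on this page. As a rough sketch of one common approach, the hypothetical `mini_graph` below rejects an edge when the dependency can already reach the dependent, because adding that edge would close a loop. All names here are illustrative assumptions, not the library's API.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical miniature graph; not the library's health_dependency_graph.
class mini_graph {
public:
    void add_node(const std::string& name) { edges_[name]; }

    // Returns false if either node is unknown or the edge would create a cycle.
    bool add_dependency(const std::string& dependent, const std::string& dependency) {
        if (edges_.count(dependent) == 0 || edges_.count(dependency) == 0) {
            return false;
        }
        // If `dependency` already depends (transitively) on `dependent`,
        // the new edge would close a cycle. This also rejects self-edges.
        if (reaches(dependency, dependent)) {
            return false;
        }
        edges_[dependent].insert(dependency);
        return true;
    }

private:
    // Iterative depth-first search along "depends on" edges.
    bool reaches(const std::string& from, const std::string& target) const {
        std::vector<std::string> stack{from};
        std::unordered_set<std::string> seen;
        while (!stack.empty()) {
            std::string node = stack.back();
            stack.pop_back();
            if (node == target) return true;
            if (!seen.insert(node).second) continue;  // already visited
            auto it = edges_.find(node);
            if (it == edges_.end()) continue;
            for (const auto& next : it->second) stack.push_back(next);
        }
        return false;
    }

    std::unordered_map<std::string, std::unordered_set<std::string>> edges_;
};
```

With nodes `api`, `cache`, and `db`, adding `api -> cache` and `cache -> db` succeeds, while a later `db -> api` is refused.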

◆ check()

common::Result< health_check_result > kcenon::monitoring::health_monitor::check ( const std::string & name)
inline

Execute a single named health check (with dependency verification).

Parameters
name    Name of the check to execute
Returns
Ok(result) on success, or error if the check was not found

Definition at line 756 of file health_monitor.h.

{
    std::lock_guard<std::shared_mutex> lock(mutex_);

    auto it = checks_.find(name);
    if (it == checks_.end()) {
        error_info err(monitoring_error_code::not_found,
                       "Check '" + name + "' not found");
        return common::Result<health_check_result>::err(err.to_common_error());
    }

    auto result = dependency_graph_.check_with_dependencies(name);
    update_stats(result);
    cached_results_[name] = result;
    return common::ok(result);
}

References cached_results_, kcenon::monitoring::health_dependency_graph::check_with_dependencies(), checks_, dependency_graph_, mutex_, kcenon::monitoring::not_found, kcenon::monitoring::error_info::to_common_error(), and update_stats().

Referenced by check_all(), refresh(), and register_check().

◆ check_all()

std::unordered_map< std::string, health_check_result > kcenon::monitoring::health_monitor::check_all ( )
inline

Execute all registered health checks.

Returns
Map of check name to result for every registered check

Definition at line 776 of file health_monitor.h.

{
    std::lock_guard<std::shared_mutex> lock(mutex_);

    std::unordered_map<std::string, health_check_result> results;
    for (const auto& [name, check] : checks_) {
        auto result = check->check();
        results[name] = result;
        cached_results_[name] = result;
        update_stats(result);
    }
    return results;
}

References cached_results_, check(), checks_, mutex_, and update_stats().

Referenced by demonstrate_health_monitoring().

◆ check_health()

health_check_result kcenon::monitoring::health_monitor::check_health ( ) const
inline

Quick self-check of the health monitor itself.

Returns
Always returns healthy with "Health monitor operational" message

Definition at line 961 of file health_monitor.h.

{
    health_check_result result;
    result.status = health_status::healthy;
    result.message = "Health monitor operational";
    result.timestamp = std::chrono::system_clock::now();
    return result;
}

References kcenon::monitoring::healthy, kcenon::monitoring::health_check_result::message, kcenon::monitoring::health_check_result::status, and kcenon::monitoring::health_check_result::timestamp.

Referenced by main().

◆ get_health_report()

std::string kcenon::monitoring::health_monitor::get_health_report ( )
inline

Generate a human-readable health report.

Returns
Multi-line string summarizing the status of all cached checks

Definition at line 925 of file health_monitor.h.

{
    std::shared_lock<std::shared_mutex> lock(mutex_);

    std::string report = "Health Report:\n";

    if (cached_results_.empty()) {
        report += " No health checks have been performed yet.\n";
        return report;
    }

    for (const auto& [name, result] : cached_results_) {
        report += " " + name + ": ";
        switch (result.status) {
            case health_status::healthy:
                report += "HEALTHY";
                break;
            case health_status::degraded:
                report += "DEGRADED";
                break;
            case health_status::unhealthy:
                report += "UNHEALTHY";
                break;
            default:
                report += "UNKNOWN";
                break;
        }
        report += " - " + result.message + "\n";
    }

    return report;
}

References cached_results_, kcenon::monitoring::degraded, kcenon::monitoring::healthy, mutex_, and kcenon::monitoring::unhealthy.

Referenced by demonstrate_health_monitoring().
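The report layout can be tried out with a small stand-alone sketch. The types `check_state` and `check_result` and the function `format_report` are hypothetical names introduced for illustration. The sketch also assumes a `std::map`, so lines come out sorted by name; the class itself stores results in a `std::unordered_map`, whose iteration order is unspecified.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-ins for the library's status enum and result struct.
enum class check_state { healthy, degraded, unhealthy, unknown };

struct check_result {
    check_state state;
    std::string message;
};

// Maps a state to the upper-case label used in the report.
const char* to_label(check_state s) {
    switch (s) {
        case check_state::healthy:   return "HEALTHY";
        case check_state::degraded:  return "DEGRADED";
        case check_state::unhealthy: return "UNHEALTHY";
        default:                     return "UNKNOWN";
    }
}

// Builds one "name: LABEL - message" line per cached result.
std::string format_report(const std::map<std::string, check_result>& results) {
    std::string report = "Health Report:\n";
    if (results.empty()) {
        report += "  No health checks have been performed yet.\n";
        return report;
    }
    for (const auto& entry : results) {
        report += "  " + entry.first + ": " + to_label(entry.second.state) +
                  " - " + entry.second.message + "\n";
    }
    return report;
}
```

Callers that parse or diff the real report should not rely on line order unless they sort the entries themselves.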

◆ get_overall_status()

health_status kcenon::monitoring::health_monitor::get_overall_status ( )
inline

Get the aggregate health status across all cached results.

Returns
unknown if no checks are registered; otherwise unhealthy if any cached result is unhealthy, degraded if any cached result is degraded and none is unhealthy, and healthy in all other cases

Definition at line 889 of file health_monitor.h.

{
    std::shared_lock<std::shared_mutex> lock(mutex_);

    if (checks_.empty()) {
        return health_status::unknown;
    }

    bool has_unhealthy = false;
    bool has_degraded = false;

    for (const auto& [name, result] : cached_results_) {
        if (result.status == health_status::unhealthy) {
            has_unhealthy = true;
        } else if (result.status == health_status::degraded) {
            has_degraded = true;
        }
    }

    if (has_unhealthy) return health_status::unhealthy;
    if (has_degraded) return health_status::degraded;
    return health_status::healthy;
}

References cached_results_, checks_, kcenon::monitoring::degraded, kcenon::monitoring::healthy, mutex_, kcenon::monitoring::unhealthy, and kcenon::monitoring::unknown.

Referenced by demonstrate_health_monitoring().
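The "worst status wins" rule can be written as a small free function. This is a sketch under stated assumptions: `status` and `aggregate` are hypothetical names, and the function returns unknown when the result map is empty, whereas the listing above guards on the set of registered checks.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the library's health_status enum.
enum class status { healthy, degraded, unhealthy, unknown };

// Precedence: unhealthy beats degraded, degraded beats healthy.
// An empty map yields unknown (an assumption of this sketch).
status aggregate(const std::unordered_map<std::string, status>& results) {
    if (results.empty()) {
        return status::unknown;
    }
    bool has_unhealthy = false;
    bool has_degraded = false;
    for (const auto& entry : results) {
        if (entry.second == status::unhealthy) {
            has_unhealthy = true;
        } else if (entry.second == status::degraded) {
            has_degraded = true;
        }
    }
    if (has_unhealthy) return status::unhealthy;
    if (has_degraded) return status::degraded;
    return status::healthy;
}
```

Note that an entry whose own state is unknown does not lower the aggregate in this scheme; it is treated like a healthy entry.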

◆ get_stats()

health_monitor_stats kcenon::monitoring::health_monitor::get_stats ( ) const
inline

Get accumulated health monitoring statistics.

Returns
Copy of the current statistics

Definition at line 916 of file health_monitor.h.

{
    std::shared_lock<std::shared_mutex> lock(mutex_);
    return stats_;
}

References mutex_, and stats_.

◆ is_running()

bool kcenon::monitoring::health_monitor::is_running ( ) const
inline

Check whether the monitoring background thread is running.

Returns
true if the monitoring loop is active

Definition at line 841 of file health_monitor.h.

{
    return running_.load();
}

References running_.

◆ refresh()

void kcenon::monitoring::health_monitor::refresh ( )
inline

Manually refresh all health checks and trigger recovery if needed.

Runs every registered check, updates cached results and statistics, and invokes recovery handlers for unhealthy checks when auto-recovery is enabled.

Definition at line 851 of file health_monitor.h.

{
    std::lock_guard<std::shared_mutex> lock(mutex_);

    for (const auto& [name, check] : checks_) {
        auto result = check->check();
        cached_results_[name] = result;
        update_stats(result);

        if (result.status == health_status::unhealthy) {
            auto it = recovery_handlers_.find(name);
            if (it != recovery_handlers_.end() && config_.enable_auto_recovery) {
                stats_.recovery_attempts++;
                if (it->second()) {
                    stats_.successful_recoveries++;
                }
            }
        }
    }

    stats_.last_check_time = std::chrono::system_clock::now();
}
References cached_results_, check(), checks_, config_, kcenon::monitoring::health_monitor_config::enable_auto_recovery, kcenon::monitoring::health_monitor_stats::last_check_time, mutex_, kcenon::monitoring::health_monitor_stats::recovery_attempts, recovery_handlers_, stats_, kcenon::monitoring::health_monitor_stats::successful_recoveries, kcenon::monitoring::unhealthy, and update_stats().

Referenced by demonstrate_health_monitoring(), and run_monitoring_loop().
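The recovery step can be isolated into a helper to make the bookkeeping explicit. The sketch below is illustrative only: `recovery_stats`, `handler_map`, and `try_recover` are hypothetical names, and the extra guard against an empty `std::function` is an assumption of this sketch rather than something the page documents.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical counters mirroring recovery_attempts / successful_recoveries.
struct recovery_stats {
    std::size_t attempts = 0;
    std::size_t successes = 0;
};

using handler_map = std::unordered_map<std::string, std::function<bool()>>;

// Runs the handler registered for `name`, if any, and records the outcome.
// Returns true only when a handler ran and reported success.
bool try_recover(const std::string& name,
                 const handler_map& handlers,
                 bool auto_recovery_enabled,
                 recovery_stats& stats) {
    if (!auto_recovery_enabled) return false;
    auto it = handlers.find(name);
    if (it == handlers.end() || !it->second) return false;  // none, or empty callable
    ++stats.attempts;
    if (it->second()) {
        ++stats.successes;
        return true;
    }
    return false;
}
```

Because refresh() holds the exclusive lock on mutex_ for the whole pass, a handler that calls back into the monitor (for example check() or register_check()) would try to re-lock a non-recursive mutex, so handlers are best kept self-contained.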

◆ register_check()

common::Result< bool > kcenon::monitoring::health_monitor::register_check ( const std::string & name,
std::shared_ptr< health_check > check )
inline

Register a named health check.

Parameters
name     Unique name for this check
check    The health check implementation
Returns
Ok(true) on success, or error if the name already exists

Definition at line 716 of file health_monitor.h.

{
    std::lock_guard<std::shared_mutex> lock(mutex_);

    if (checks_.find(name) != checks_.end()) {
        return common::Result<bool>::err(error_info(monitoring_error_code::already_exists, "Check '" + name + "' already registered").to_common_error());
    }

    checks_[name] = std::move(check);
    auto graph_result = dependency_graph_.add_node(name, checks_[name]);
    if (graph_result.is_err()) {
        checks_.erase(name);
        return common::Result<bool>::err(graph_result.error());
    }
    return common::ok(true);
}

References kcenon::monitoring::health_dependency_graph::add_node(), kcenon::monitoring::already_exists, check(), checks_, dependency_graph_, and mutex_.

Referenced by demonstrate_health_monitoring(), and main().

◆ register_recovery_handler()

void kcenon::monitoring::health_monitor::register_recovery_handler ( const std::string & check_name,
std::function< bool()> handler )
inline

Register a recovery handler for a named health check.

Parameters
check_name    Name of the health check this handler is for
handler       Callable that attempts recovery; returns true on success

Definition at line 878 of file health_monitor.h.

{
    std::lock_guard<std::shared_mutex> lock(mutex_);
    recovery_handlers_[check_name] = std::move(handler);
}

References mutex_, and recovery_handlers_.

Referenced by demonstrate_health_monitoring().

◆ run_monitoring_loop()

void kcenon::monitoring::health_monitor::run_monitoring_loop ( )
inline private

Definition at line 970 of file health_monitor.h.

{
    while (running_.load()) {
        refresh();

        std::unique_lock<std::mutex> lock(cv_mutex_);
        cv_.wait_for(lock, config_.check_interval, [this]() {
            return !running_.load();
        });
    }
}

References kcenon::monitoring::health_monitor_config::check_interval, config_, cv_, cv_mutex_, refresh(), and running_.

Referenced by start().
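The loop above is an instance of an interruptible periodic worker: run the task, then sleep on a condition variable until either the interval elapses or a stop is requested. The following self-contained sketch shows the pattern with hypothetical names (`periodic_worker`, `task_`, `interval_`). One deliberate difference, labeled as this sketch's choice, is that the flag is cleared while holding the condition variable's mutex, so a stop request cannot slip in between the predicate check and the wait.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical periodic worker; names are illustrative, not the library's.
class periodic_worker {
public:
    periodic_worker(std::chrono::milliseconds interval, std::function<void()> task)
        : interval_(interval), task_(std::move(task)) {}

    ~periodic_worker() { stop(); }  // join the thread before members are destroyed

    void start() {
        std::lock_guard<std::mutex> lock(lifecycle_mutex_);
        if (running_.load()) return;  // already running: no-op
        running_.store(true);
        thread_ = std::thread([this] { run(); });
    }

    void stop() {
        std::lock_guard<std::mutex> lock(lifecycle_mutex_);
        if (!running_.load()) return;  // not running: no-op
        {
            // Clear the flag under cv_mutex_ so the waiter cannot miss the wake-up.
            std::lock_guard<std::mutex> cv_lock(cv_mutex_);
            running_.store(false);
        }
        cv_.notify_all();
        if (thread_.joinable()) thread_.join();
    }

    bool is_running() const { return running_.load(); }

private:
    void run() {
        while (running_.load()) {
            task_();
            std::unique_lock<std::mutex> lock(cv_mutex_);
            // Wakes early when stop() clears the flag; otherwise times out.
            cv_.wait_for(lock, interval_, [this] { return !running_.load(); });
        }
    }

    std::chrono::milliseconds interval_;
    std::function<void()> task_;
    std::mutex lifecycle_mutex_;
    std::mutex cv_mutex_;
    std::condition_variable cv_;
    std::atomic<bool> running_{false};
    std::thread thread_;
};
```

Waiting on a condition variable rather than calling `std::this_thread::sleep_for` is what lets stop() return promptly even when the interval is long.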

◆ start()

common::VoidResult kcenon::monitoring::health_monitor::start ( )
inline

Start the periodic health monitoring background thread.

Returns
Ok on success; no-op if already running

Definition at line 804 of file health_monitor.h.

{
    std::lock_guard<std::mutex> lock(lifecycle_mutex_);

    if (running_.load()) {
        return common::ok();
    }

    running_.store(true);
    monitor_thread_ = std::thread([this]() { run_monitoring_loop(); });
    return common::ok();
}

References lifecycle_mutex_, monitor_thread_, run_monitoring_loop(), and running_.

Referenced by demonstrate_health_monitoring(), and main().

◆ stop()

common::VoidResult kcenon::monitoring::health_monitor::stop ( )
inline

Stop the periodic health monitoring background thread.

Returns
Ok on success; no-op if not running. Blocks until the thread joins.

Definition at line 820 of file health_monitor.h.

{
    std::lock_guard<std::mutex> lock(lifecycle_mutex_);

    if (!running_.load()) {
        return common::ok();
    }

    running_.store(false);
    cv_.notify_all();

    if (monitor_thread_.joinable()) {
        monitor_thread_.join();
    }

    return common::ok();
}

References cv_, lifecycle_mutex_, monitor_thread_, and running_.

Referenced by demonstrate_health_monitoring(), main(), HealthMonitoringTest::SetUp(), HealthMonitoringTest::TearDown(), and ~health_monitor().

◆ unregister_check()

common::Result< bool > kcenon::monitoring::health_monitor::unregister_check ( const std::string & name)
inline

Remove a previously registered health check.

Parameters
name    Name of the check to remove
Returns
Ok(true) on success, or error if the check was not found

Definition at line 737 of file health_monitor.h.

{
    std::lock_guard<std::shared_mutex> lock(mutex_);

    if (checks_.find(name) == checks_.end()) {
        error_info err(monitoring_error_code::not_found,
                       "Check '" + name + "' not found");
        return common::Result<bool>::err(err.to_common_error());
    }

    checks_.erase(name);
    recovery_handlers_.erase(name);
    return common::ok(true);
}

References checks_, mutex_, kcenon::monitoring::not_found, recovery_handlers_, and kcenon::monitoring::error_info::to_common_error().

◆ update_stats()

void kcenon::monitoring::health_monitor::update_stats ( const health_check_result & result)
inline private

Definition at line 981 of file health_monitor.h.

{
    stats_.total_checks++;
    switch (result.status) {
        case health_status::healthy:
            stats_.healthy_checks++;
            break;
        case health_status::degraded:
            stats_.degraded_checks++;
            break;
        case health_status::unhealthy:
            stats_.unhealthy_checks++;
            break;
        default:
            break;
    }
}

References kcenon::monitoring::degraded, kcenon::monitoring::health_monitor_stats::degraded_checks, kcenon::monitoring::healthy, kcenon::monitoring::health_monitor_stats::healthy_checks, stats_, kcenon::monitoring::health_check_result::status, kcenon::monitoring::health_monitor_stats::total_checks, kcenon::monitoring::unhealthy, and kcenon::monitoring::health_monitor_stats::unhealthy_checks.

Referenced by check(), check_all(), and refresh().

Member Data Documentation

◆ cached_results_

std::unordered_map<std::string, health_check_result> kcenon::monitoring::health_monitor::cached_results_
private

Definition at line 1009 of file health_monitor.h.

Referenced by check(), check_all(), get_health_report(), get_overall_status(), and refresh().

◆ checks_

std::unordered_map<std::string, std::shared_ptr<health_check> > kcenon::monitoring::health_monitor::checks_
private

◆ config_

health_monitor_config kcenon::monitoring::health_monitor::config_
private

Definition at line 998 of file health_monitor.h.

Referenced by refresh(), and run_monitoring_loop().

◆ cv_

std::condition_variable kcenon::monitoring::health_monitor::cv_
private

Definition at line 1005 of file health_monitor.h.

Referenced by run_monitoring_loop(), and stop().

◆ cv_mutex_

std::mutex kcenon::monitoring::health_monitor::cv_mutex_
private

Definition at line 1004 of file health_monitor.h.

Referenced by run_monitoring_loop().

◆ dependency_graph_

health_dependency_graph kcenon::monitoring::health_monitor::dependency_graph_
private

Definition at line 1000 of file health_monitor.h.

Referenced by add_dependency(), check(), and register_check().

◆ lifecycle_mutex_

std::mutex kcenon::monitoring::health_monitor::lifecycle_mutex_
private

Definition at line 1003 of file health_monitor.h.

Referenced by start(), and stop().

◆ monitor_thread_

std::thread kcenon::monitoring::health_monitor::monitor_thread_
private

Definition at line 1012 of file health_monitor.h.

Referenced by start(), and stop().

◆ mutex_

std::shared_mutex kcenon::monitoring::health_monitor::mutex_
mutable private

◆ recovery_handlers_

std::unordered_map<std::string, std::function<bool()> > kcenon::monitoring::health_monitor::recovery_handlers_
private

Definition at line 1008 of file health_monitor.h.

Referenced by refresh(), register_recovery_handler(), and unregister_check().

◆ running_

std::atomic<bool> kcenon::monitoring::health_monitor::running_ {false}
private

Definition at line 1011 of file health_monitor.h.

{false};

Referenced by is_running(), run_monitoring_loop(), start(), and stop().

◆ stats_

health_monitor_stats kcenon::monitoring::health_monitor::stats_
private

Definition at line 999 of file health_monitor.h.

Referenced by get_stats(), refresh(), and update_stats().


The documentation for this class was generated from the following file: