Comprehensive diagnostics API for thread pool monitoring.

#include <thread_pool_diagnostics.h>

Public Member Functions
	thread_pool_diagnostics (thread_pool &pool, const diagnostics_config &config={})
	Constructs diagnostics for a thread pool.

	~thread_pool_diagnostics ()
	Destructor.

	thread_pool_diagnostics (const thread_pool_diagnostics &)=delete

thread_pool_diagnostics &	operator= (const thread_pool_diagnostics &)=delete

	thread_pool_diagnostics (thread_pool_diagnostics &&)=delete

thread_pool_diagnostics &	operator= (thread_pool_diagnostics &&)=delete

auto	dump_thread_states () const -> std::vector< thread_info >
	Gets current state of all worker threads.

auto	format_thread_dump () const -> std::string
	Gets formatted thread dump (human-readable).

auto	get_active_jobs () const -> std::vector< job_info >
	Gets currently executing jobs.

auto	get_pending_jobs (std::size_t limit=100) const -> std::vector< job_info >
	Gets pending jobs in queue.

auto	get_recent_jobs (std::size_t limit=100) const -> std::vector< job_info >
	Gets recent completed/failed jobs.

void	record_job_completion (const job_info &info)
	Records a job completion for history tracking.

auto	detect_bottlenecks () const -> bottleneck_report
	Analyzes the pool for bottlenecks.

auto	health_check () const -> health_status
	Performs comprehensive health check.

auto	is_healthy () const -> bool
	Quick check if pool is healthy.

void	enable_tracing (bool enable, std::size_t history_size=1000)
	Enables or disables job execution tracing.

auto	is_tracing_enabled () const -> bool
	Checks if tracing is enabled.

void	add_event_listener (std::shared_ptr< execution_event_listener > listener)
	Adds an event listener.

void	remove_event_listener (std::shared_ptr< execution_event_listener > listener)
	Removes an event listener.

void	record_event (const job_execution_event &event)
	Records a job execution event.

auto	get_recent_events (std::size_t limit=100) const -> std::vector< job_execution_event >
	Gets recent execution events.

auto	to_json () const -> std::string
	Exports diagnostics as JSON.

auto	to_string () const -> std::string
	Exports diagnostics as formatted string.

auto	to_prometheus () const -> std::string
	Exports diagnostics as Prometheus-compatible metrics.

auto	get_config () const -> diagnostics_config
	Gets the current configuration.

void	set_config (const diagnostics_config &config)
	Updates the configuration.

Private Member Functions
auto	get_worker_info (const thread_worker &worker, std::size_t index) const -> thread_info
	Gets thread info for a single worker.

void	notify_listeners (const job_execution_event &event)
	Notifies all event listeners.

void	generate_recommendations (bottleneck_report &report) const
	Generates recommendations for a bottleneck.

auto	check_worker_health () const -> component_health
	Checks worker component health.

auto	check_queue_health () const -> component_health
	Checks queue component health.

auto	check_metrics_health (double avg_latency_ms, double success_rate) const -> component_health
	Checks metrics component health.

Private Attributes
thread_pool &	pool_
	Reference to the monitored thread pool.

diagnostics_config	config_
	Configuration for diagnostics.

std::atomic< bool >	tracing_enabled_ {false}
	Whether event tracing is enabled.

std::mutex	events_mutex_
	Mutex for event history access.

std::deque< job_execution_event >	event_history_
	Ring buffer for event history.

std::mutex	jobs_mutex_
	Mutex for recent jobs access.

std::deque< job_info >	recent_jobs_
	Ring buffer for recent job completions.

std::mutex	listeners_mutex_
	Mutex for event listeners.

std::vector< std::shared_ptr< execution_event_listener > >	listeners_
	Event listeners.

std::atomic< std::uint64_t >	next_event_id_ {0}
	Counter for event IDs.

std::chrono::steady_clock::time_point	start_time_
	Time when this diagnostics object was constructed; used as the start time for uptime.

Detailed Description

Comprehensive diagnostics API for thread pool monitoring.

Provides thread dump capabilities, job tracing, bottleneck detection, and health check integration for thread pools.

Design Principles

Non-intrusive: Minimal overhead when not actively used
Thread-safe: All methods can be called from any thread
Read-only: Never modifies thread pool state
Snapshot-based: Returns point-in-time snapshots

Thread Safety

All public methods are thread-safe and can be called concurrently. Internal state is protected by appropriate synchronization.

Performance Considerations

Thread dump: O(n) where n is worker count
Job inspection: O(n) in worker count for active jobs, O(k) for history where k is the requested limit
Bottleneck detection: O(n) where n is worker count
Health check: O(n) including all component checks
Event tracing: < 1μs overhead per event when enabled

Usage Example

auto pool = std::make_shared<thread_pool>("MyPool");
pool->start();
 
// Get thread dump
std::cout << pool->diagnostics().format_thread_dump() << std::endl;
 
// Check for bottlenecks
auto report = pool->diagnostics().detect_bottlenecks();
if (report.has_bottleneck) {
    LOG_WARN("Bottleneck: {}", report.description);
}
 
// Health check for HTTP endpoint
auto health = pool->diagnostics().health_check();
return http_response(health.http_status_code(), health.to_json());

Definition at line 142 of file thread_pool_diagnostics.h.

Constructor & Destructor Documentation

◆ thread_pool_diagnostics() [1/3]

kcenon::thread::diagnostics::thread_pool_diagnostics::thread_pool_diagnostics	(	thread_pool &	pool,
		const diagnostics_config &	config = {} )

explicit

Constructs diagnostics for a thread pool.

Parameters

pool	Reference to the thread pool to diagnose.
config	Optional configuration for diagnostics.

Definition at line 21 of file thread_pool_diagnostics.cpp.

        : pool_(pool)
        , config_(config)
        , tracing_enabled_(config.enable_tracing)
        , start_time_(std::chrono::steady_clock::now())
    {
    }

◆ ~thread_pool_diagnostics()

kcenon::thread::diagnostics::thread_pool_diagnostics::~thread_pool_diagnostics ( )

default

Destructor.

◆ thread_pool_diagnostics() [2/3]

kcenon::thread::diagnostics::thread_pool_diagnostics::thread_pool_diagnostics ( const thread_pool_diagnostics & )

delete

◆ thread_pool_diagnostics() [3/3]

kcenon::thread::diagnostics::thread_pool_diagnostics::thread_pool_diagnostics ( thread_pool_diagnostics && )

delete

Member Function Documentation

◆ add_event_listener()

void kcenon::thread::diagnostics::thread_pool_diagnostics::add_event_listener ( std::shared_ptr< execution_event_listener > listener )

Adds an event listener.

Parameters

listener Listener to add.

Definition at line 620 of file thread_pool_diagnostics.cpp.

    {
        if (!listener) return;
 
        std::lock_guard<std::mutex> lock(listeners_mutex_);
        listeners_.push_back(std::move(listener));
    }

References listeners_, and listeners_mutex_.
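
The registration pattern above (reject null, then append under a lock) can be sketched in isolation. This is a minimal illustration, not the library's code: `listener_stub` and `listener_registry` are hypothetical names standing in for `execution_event_listener` and the diagnostics class.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for execution_event_listener.
struct listener_stub {
    int events_seen = 0;
};

// Hypothetical registry that mirrors the null check and the
// lock-then-append sequence shown in add_event_listener().
class listener_registry {
public:
    void add(std::shared_ptr<listener_stub> listener) {
        if (!listener) return;  // null listeners are silently ignored
        std::lock_guard<std::mutex> lock(mutex_);
        listeners_.push_back(std::move(listener));
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return listeners_.size();
    }

private:
    mutable std::mutex mutex_;  // mutable so const readers can lock
    std::vector<std::shared_ptr<listener_stub>> listeners_;
};
```

Passing a null pointer leaves the registry unchanged, so callers do not need to guard the call themselves.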

◆ check_metrics_health()

auto kcenon::thread::diagnostics::thread_pool_diagnostics::check_metrics_health	(	double	avg_latency_ms,
		double	success_rate ) const -> component_health

nodiscard private

Checks metrics component health.

Parameters

avg_latency_ms	Current average latency.
success_rate	Current success rate.

Returns: Component health status for metrics.

Definition at line 546 of file thread_pool_diagnostics.cpp.

    {
        component_health health;
        health.name = "metrics";
 
        health.details["avg_latency_ms"] = std::format("{:.3f}", avg_latency_ms);
        health.details["success_rate"] = std::format("{:.4f}", success_rate);
 
        const auto& thresholds = config_.health_thresholds_config;
 
        // Check success rate first (more critical)
        if (success_rate < thresholds.unhealthy_success_rate)
        {
            health.state = health_state::unhealthy;
            health.message = "Success rate critically low: " +
                             std::format("{:.1f}%", success_rate * 100.0);
        }
        else if (success_rate < thresholds.min_success_rate)
        {
            health.state = health_state::degraded;
            health.message = "Success rate below threshold: " +
                             std::format("{:.1f}%", success_rate * 100.0);
        }
        // Check latency
        else if (avg_latency_ms > thresholds.degraded_latency_ms)
        {
            health.state = health_state::degraded;
            health.message = "High average latency: " +
                             std::format("{:.2f}ms", avg_latency_ms);
        }
        else if (avg_latency_ms > thresholds.max_healthy_latency_ms)
        {
            health.state = health_state::degraded;
            health.message = "Elevated latency: " +
                             std::format("{:.2f}ms", avg_latency_ms);
        }
        else
        {
            health.state = health_state::healthy;
            health.message = "Performance metrics within normal range";
        }
 
        return health;
    }

References kcenon::thread::diagnostics::degraded, kcenon::thread::diagnostics::component_health::details, kcenon::thread::diagnostics::healthy, kcenon::thread::diagnostics::component_health::message, kcenon::thread::diagnostics::component_health::name, kcenon::thread::diagnostics::component_health::state, and kcenon::thread::diagnostics::unhealthy.

Referenced by health_check().
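
The branch order in the listing (success rate first, then latency) can be sketched as a pure function. The threshold field names follow the listing, but the default values below are assumptions chosen for illustration; the real defaults live in `diagnostics_config`. `thresholds_stub` and `classify_metrics` are hypothetical names.

```cpp
enum class health_state { healthy, degraded, unhealthy };

// Field names follow the listing; the default values are assumed.
struct thresholds_stub {
    double unhealthy_success_rate = 0.80;
    double min_success_rate = 0.95;
    double degraded_latency_ms = 500.0;
    double max_healthy_latency_ms = 100.0;
};

// Hypothetical helper: the first matching branch wins, so a low success
// rate is reported even when latency is also out of range.
health_state classify_metrics(double avg_latency_ms, double success_rate,
                              const thresholds_stub& t) {
    if (success_rate < t.unhealthy_success_rate) return health_state::unhealthy;
    if (success_rate < t.min_success_rate) return health_state::degraded;
    if (avg_latency_ms > t.degraded_latency_ms) return health_state::degraded;
    if (avg_latency_ms > t.max_healthy_latency_ms) return health_state::degraded;
    return health_state::healthy;
}
```

As in the listing, latency alone never produces `unhealthy`: both latency branches report `degraded` and differ only in the message text.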

◆ check_queue_health()

auto kcenon::thread::diagnostics::thread_pool_diagnostics::check_queue_health ( ) const -> component_health

nodiscard private

Checks queue component health.

Returns: Component health status for queue.

Definition at line 491 of file thread_pool_diagnostics.cpp.

    {
        component_health health;
        health.name = "queue";
 
        auto depth = pool_.get_pending_task_count();
        health.details["depth"] = std::to_string(depth);
 
        // Get queue capacity and calculate saturation
        auto queue = pool_.get_job_queue();
        double saturation = 0.0;
        if (queue)
        {
            auto max_size = queue->get_max_size();
            if (max_size.has_value() && max_size.value() > 0)
            {
                health.details["capacity"] = std::to_string(max_size.value());
                saturation = static_cast<double>(depth) / static_cast<double>(max_size.value());
                health.details["saturation"] = std::format("{:.2f}", saturation);
            }
        }
 
        // Note: Job rejection tracking requires backpressure queue
        // For basic queue, assume no rejections
        std::uint64_t rejected = 0;
        health.details["rejected"] = std::to_string(rejected);
 
        const auto& thresholds = config_.health_thresholds_config;
 
        if (saturation >= thresholds.queue_saturation_critical)
        {
            health.state = health_state::unhealthy;
            health.message = "Queue at critical capacity";
        }
        else if (saturation >= thresholds.queue_saturation_warning || rejected > 0)
        {
            health.state = health_state::degraded;
            if (rejected > 0)
            {
                health.message = std::to_string(rejected) + " jobs rejected due to backpressure";
            }
            else
            {
                health.message = "Queue saturation above warning threshold";
            }
        }
        else
        {
            health.state = health_state::healthy;
            health.message = "Queue operational";
        }
 
        return health;
    }

Referenced by health_check().
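
The saturation calculation above can be sketched on its own. `queue_saturation` is a hypothetical helper, and the `std::optional` capacity is an assumption inferred from the `has_value()` / `value()` calls in the listing.

```cpp
#include <cstddef>
#include <optional>

// Returns depth / capacity for a bounded queue, or 0.0 when the queue has
// no capacity limit or reports a capacity of zero.
double queue_saturation(std::size_t depth, std::optional<std::size_t> max_size) {
    if (max_size.has_value() && max_size.value() > 0) {
        return static_cast<double>(depth) / static_cast<double>(max_size.value());
    }
    return 0.0;
}
```

One consequence worth noting: in this health check an unbounded queue always has saturation 0.0, so it stays `healthy` regardless of depth. `detect_bottlenecks()` applies a separate depth-versus-workers heuristic for that case.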

◆ check_worker_health()

auto kcenon::thread::diagnostics::thread_pool_diagnostics::check_worker_health ( ) const -> component_health

nodiscard private

Checks worker component health.

Returns: Component health status for workers.

Definition at line 450 of file thread_pool_diagnostics.cpp.

    {
        component_health health;
        health.name = "workers";
 
        std::size_t total;
        {
            std::scoped_lock<std::mutex> lock(pool_.workers_mutex_);
            total = pool_.workers_.size();
        }
        auto active = pool_.get_active_worker_count();
        auto idle = pool_.get_idle_worker_count();
 
        health.details["total"] = std::to_string(total);
        health.details["active"] = std::to_string(active);
        health.details["idle"] = std::to_string(idle);
 
        if (!pool_.is_running())
        {
            health.state = health_state::unhealthy;
            health.message = "Thread pool is not running";
        }
        else if (total == 0)
        {
            health.state = health_state::unhealthy;
            health.message = "No workers available";
        }
        else if (active == total)
        {
            health.state = health_state::degraded;
            health.message = "All workers are busy";
        }
        else
        {
            health.state = health_state::healthy;
            health.message = std::to_string(idle) + " workers available";
        }
 
        return health;
    }

Referenced by health_check().

◆ detect_bottlenecks()

auto kcenon::thread::diagnostics::thread_pool_diagnostics::detect_bottlenecks ( ) const -> bottleneck_report

nodiscard

Analyzes the pool for bottlenecks.

Returns: Bottleneck analysis report.

Definition at line 161 of file thread_pool_diagnostics.cpp.

    {
        bottleneck_report report;
 
        // Gather metrics
        auto metrics_snap = pool_.metrics().snapshot();
        std::size_t worker_count;
        {
            std::scoped_lock<std::mutex> lock(pool_.workers_mutex_);
            worker_count = pool_.workers_.size();
        }
        auto active_count = pool_.get_active_worker_count();
        auto idle_count = pool_.get_idle_worker_count();
        auto queue_depth = pool_.get_pending_task_count();
 
        report.queue_depth = queue_depth;
        report.idle_workers = idle_count;
        report.total_workers = worker_count;
 
        // Calculate queue saturation
        auto queue = pool_.get_job_queue();
        if (queue)
        {
            auto max_size = queue->get_max_size();
            if (max_size.has_value() && max_size.value() > 0)
            {
                report.queue_saturation = static_cast<double>(queue_depth) /
                                          static_cast<double>(max_size.value());
            }
            else if (queue_depth > 0)
            {
                // For unbounded queues, use heuristic: saturation based on queue depth vs workers
                // High queue depth relative to workers indicates potential saturation
                report.queue_saturation = std::min(1.0,
                    static_cast<double>(queue_depth) / static_cast<double>(worker_count * 10));
            }
        }
 
        // Calculate worker utilization (instantaneous)
        if (worker_count > 0)
        {
            report.worker_utilization = static_cast<double>(active_count) /
                                        static_cast<double>(worker_count);
        }
 
        // Get per-worker utilization for variance calculation
        auto thread_states = pool_.collect_worker_diagnostics();
        if (!thread_states.empty())
        {
            // Calculate mean utilization from worker stats
            double sum_utilization = 0.0;
            for (const auto& t : thread_states)
            {
                sum_utilization += t.utilization;
            }
            double mean_utilization = sum_utilization / static_cast<double>(thread_states.size());
 
            // Calculate variance
            double variance_sum = 0.0;
            for (const auto& t : thread_states)
            {
                double diff = t.utilization - mean_utilization;
                variance_sum += diff * diff;
            }
            report.utilization_variance = variance_sum / static_cast<double>(thread_states.size());
 
            // Use mean utilization from actual worker stats if available
            if (mean_utilization > 0.0)
            {
                report.worker_utilization = mean_utilization;
            }
        }
 
        // Calculate average wait time from metrics
        auto total_jobs = metrics_snap.tasks_executed + metrics_snap.tasks_failed;
        if (total_jobs > 0)
        {
            // Estimate wait time from idle time (approximation)
            auto avg_idle_ns = metrics_snap.total_idle_time_ns / total_jobs;
            report.avg_wait_time_ms = static_cast<double>(avg_idle_ns) / 1e6;
 
            // Calculate estimated backlog time
            // Average execution time per job
            double avg_exec_time_ms = 0.0;
            if (metrics_snap.total_busy_time_ns > 0 && total_jobs > 0)
            {
                avg_exec_time_ms = static_cast<double>(metrics_snap.total_busy_time_ns) /
                                   static_cast<double>(total_jobs) / 1e6;
            }
 
            // Estimated time to clear backlog = (queue_depth * avg_exec_time) / active_workers
            if (active_count > 0 && avg_exec_time_ms > 0)
            {
                report.estimated_backlog_time_ms = static_cast<std::size_t>(
                    (static_cast<double>(queue_depth) * avg_exec_time_ms) /
                    static_cast<double>(active_count));
            }
            else if (worker_count > 0 && avg_exec_time_ms > 0)
            {
                report.estimated_backlog_time_ms = static_cast<std::size_t>(
                    (static_cast<double>(queue_depth) * avg_exec_time_ms) /
                    static_cast<double>(worker_count));
            }
        }
 
        // Jobs rejected tracking not available in basic metrics
        report.jobs_rejected = 0;
 
        // Detect bottleneck type (ordered by severity)
        // 1. Queue full - most critical
        if (report.queue_saturation > 0.95 || report.jobs_rejected > 0)
        {
            report.has_bottleneck = true;
            report.type = bottleneck_type::queue_full;
            report.description = "Queue is at or near capacity, jobs are being rejected";
        }
        // 2. Worker starvation - high utilization with growing backlog
        else if (report.worker_utilization > 0.95 && queue_depth > worker_count * 2)
        {
            report.has_bottleneck = true;
            report.type = bottleneck_type::worker_starvation;
            report.description = "Not enough workers to handle the workload";
        }
        // 3. Slow consumer - high wait time with high utilization
        else if (report.avg_wait_time_ms > config_.wait_time_threshold_ms &&
                 report.worker_utilization > config_.utilization_high_threshold)
        {
            report.has_bottleneck = true;
            report.type = bottleneck_type::slow_consumer;
            report.description = "Workers cannot keep up with job submission rate";
        }
        // 4. Uneven distribution - high variance in worker utilization
        else if (report.utilization_variance > 0.1 && worker_count > 1)
        {
            // Variance > 0.1 means standard deviation > ~0.32 which is significant
            report.has_bottleneck = true;
            report.type = bottleneck_type::uneven_distribution;
            report.description = "Work is not evenly distributed across workers";
        }
        // 5. Lock contention - high wait time but low utilization (workers waiting on locks)
        else if (report.avg_wait_time_ms > config_.wait_time_threshold_ms * 2 &&
                 report.worker_utilization < 0.5 && active_count > 0)
        {
            report.has_bottleneck = true;
            report.type = bottleneck_type::lock_contention;
            report.description = "High wait times with low utilization suggests lock contention";
        }
        // 6. Memory pressure - check queue memory usage
        else if (queue)
        {
            auto mem_stats = queue->get_memory_stats();
            // Consider memory pressure if queue uses more than 100MB
            constexpr std::size_t memory_threshold = 100 * 1024 * 1024;
            if (mem_stats.queue_size_bytes > memory_threshold)
            {
                report.has_bottleneck = true;
                report.type = bottleneck_type::memory_pressure;
                report.description = "Excessive memory usage in job queue";
            }
        }
 
        // Generate recommendations if bottleneck detected
        if (report.has_bottleneck)
        {
            generate_recommendations(report);
        }
 
        return report;
    }

Referenced by to_json().
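
Two of the calculations in the listing are easier to follow with numbers: the population variance of per-worker utilization, and the backlog estimate (queue depth times average execution time, divided by worker count). The sketch below uses hypothetical helper names, `utilization_variance` and `backlog_ms`, and is an illustration rather than the library's code.

```cpp
#include <cstddef>
#include <vector>

// Population variance (divide by N), matching the listing's calculation.
double utilization_variance(const std::vector<double>& utilizations) {
    if (utilizations.empty()) return 0.0;
    double sum = 0.0;
    for (double u : utilizations) sum += u;
    const double mean = sum / static_cast<double>(utilizations.size());
    double squared = 0.0;
    for (double u : utilizations) squared += (u - mean) * (u - mean);
    return squared / static_cast<double>(utilizations.size());
}

// Estimated milliseconds to drain the queue with the given worker count.
// Returns 0.0 when there is nothing to divide by or no timing data.
double backlog_ms(std::size_t queue_depth, double avg_exec_ms, std::size_t workers) {
    if (workers == 0 || avg_exec_ms <= 0.0) return 0.0;
    return static_cast<double>(queue_depth) * avg_exec_ms /
           static_cast<double>(workers);
}
```

For example, two workers at 0% and 100% utilization give a variance of 0.25, which exceeds the 0.1 threshold used for `uneven_distribution`; 100 queued jobs at 2 ms each across 4 workers give an estimated backlog of 50 ms.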

◆ dump_thread_states()

auto kcenon::thread::diagnostics::thread_pool_diagnostics::dump_thread_states ( ) const -> std::vector<thread_info>

nodiscard

Gets current state of all worker threads.

Returns: Vector of thread information.

Thread-safe: Can be called from any thread.

Definition at line 36 of file thread_pool_diagnostics.cpp.

    {
        // Delegate to thread_pool's collect_worker_diagnostics for actual worker info
        return pool_.collect_worker_diagnostics();
    }

References kcenon::thread::thread_pool::collect_worker_diagnostics(), and pool_.

Referenced by format_thread_dump(), and get_active_jobs().

◆ enable_tracing()

void kcenon::thread::diagnostics::thread_pool_diagnostics::enable_tracing	(	bool	enable,
		std::size_t	history_size = 1000 )

Enables or disables job execution tracing.

Parameters

enable	Enable or disable tracing.
history_size	Number of events to retain.

Definition at line 596 of file thread_pool_diagnostics.cpp.

    {
        tracing_enabled_.store(enable, std::memory_order_relaxed);
 
        if (enable)
        {
            std::lock_guard<std::mutex> lock(events_mutex_);
            // Trim the oldest events so the history fits the new size
            while (event_history_.size() > history_size)
            {
                event_history_.pop_front();
            }
        }
 
        // Update config
        config_.event_history_size = history_size;
        config_.enable_tracing = enable;
    }

References config_, kcenon::thread::diagnostics::diagnostics_config::enable_tracing, event_history_, kcenon::thread::diagnostics::diagnostics_config::event_history_size, events_mutex_, and tracing_enabled_.
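
The trimming loop above only discards the oldest entries; it never clears the buffer. A minimal sketch of that step, using a hypothetical `trim_history` helper over a `std::deque`:

```cpp
#include <cstddef>
#include <deque>

// Drops the oldest entries (front of the deque) until at most max_size remain.
template <typename T>
void trim_history(std::deque<T>& history, std::size_t max_size) {
    while (history.size() > max_size) {
        history.pop_front();
    }
}
```

Shrinking `history_size` therefore keeps the most recent events, and growing it leaves the existing history untouched.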

◆ format_thread_dump()

auto kcenon::thread::diagnostics::thread_pool_diagnostics::format_thread_dump ( ) const -> std::string

nodiscard

Gets formatted thread dump (human-readable).

Returns: Multi-line string with thread dump.

Output format:

=== Thread Pool Dump: MyPool ===
Time: 2025-01-08T10:30:00Z
Workers: 8, Active: 5, Idle: 3
 
Worker-0 [tid:12345] ACTIVE (2.5s)
  Current Job: ProcessOrder#1234 (running 150ms)
  Jobs: 1523 completed, 2 failed
  Utilization: 87.3%
...

Definition at line 42 of file thread_pool_diagnostics.cpp.

    {
        std::ostringstream oss;
 
        auto threads = dump_thread_states();
        auto now = std::chrono::system_clock::now();
        auto time_t = std::chrono::system_clock::to_time_t(now);
 
        std::size_t worker_count;
        {
            std::scoped_lock<std::mutex> lock(pool_.workers_mutex_);
            worker_count = pool_.workers_.size();
        }
        auto active_count = pool_.get_active_worker_count();
        auto idle_count = pool_.get_idle_worker_count();
 
        // Header
        oss << "=== Thread Pool Dump: " << pool_.to_string() << " ===\n";
        oss << "Time: " << std::put_time(std::gmtime(&time_t), "%Y-%m-%dT%H:%M:%SZ") << "\n";
        oss << "Workers: " << worker_count << ", Active: " << active_count
            << ", Idle: " << idle_count << "\n\n";
 
        // Worker details
        for (const auto& t : threads)
        {
            auto state_duration = t.state_duration();
            auto duration_sec = std::chrono::duration<double>(state_duration).count();
 
            oss << t.thread_name << " [tid:" << t.thread_id << "] "
                << worker_state_to_string(t.state)
                << " (" << std::fixed << std::setprecision(1) << duration_sec << "s)\n";
 
            if (t.current_job.has_value())
            {
                const auto& job = t.current_job.value();
                auto exec_time_ms = std::chrono::duration<double, std::milli>(
                    job.execution_time).count();
                oss << "  Current Job: " << job.job_name << "#" << job.job_id
                    << " (running " << std::fixed << std::setprecision(0)
                    << exec_time_ms << "ms)\n";
            }
 
            oss << "  Jobs: " << t.jobs_completed << " completed, "
                << t.jobs_failed << " failed\n";
            oss << "  Utilization: " << std::fixed << std::setprecision(1)
                << (t.utilization * 100.0) << "%\n\n";
        }
 
        return oss.str();
    }

References dump_thread_states(), kcenon::thread::thread_pool::get_active_worker_count(), kcenon::thread::thread_pool::get_idle_worker_count(), pool_, kcenon::thread::thread_pool::to_string(), kcenon::thread::diagnostics::worker_state_to_string(), kcenon::thread::thread_pool::workers_, and kcenon::thread::thread_pool::workers_mutex_.

Referenced by to_string().
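
The `Time:` header line is produced with `std::put_time` and `std::gmtime`. The sketch below isolates that step in a hypothetical `format_utc` helper so the format string can be checked against a known timestamp. Note that `std::gmtime` returns a pointer to shared static storage, so concurrent callers may want a platform-specific reentrant variant; that choice is outside this sketch.

```cpp
#include <ctime>
#include <iomanip>
#include <sstream>
#include <string>

// Formats a time_t as an ISO 8601 UTC timestamp, e.g. 2025-01-08T10:30:00Z.
std::string format_utc(std::time_t t) {
    const std::tm* utc = std::gmtime(&t);  // shared static storage
    if (utc == nullptr) return {};         // conversion failed
    std::ostringstream oss;
    oss << std::put_time(utc, "%Y-%m-%dT%H:%M:%SZ");
    return oss.str();
}
```

Calling `format_utc(0)` yields `1970-01-01T00:00:00Z` on systems where `time_t` counts seconds since the Unix epoch.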

◆ generate_recommendations()

void kcenon::thread::diagnostics::thread_pool_diagnostics::generate_recommendations ( bottleneck_report & report ) const

private

Generates recommendations for a bottleneck.

Parameters

report The bottleneck report to add recommendations to.

Definition at line 331 of file thread_pool_diagnostics.cpp.

    {
        switch (report.type)
        {
            case bottleneck_type::queue_full:
                report.recommendations.push_back("Consider increasing queue capacity");
                report.recommendations.push_back("Enable backpressure with adaptive policy");
                report.recommendations.push_back("Add more worker threads if CPU permits");
                break;
 
            case bottleneck_type::slow_consumer:
                report.recommendations.push_back("Add more worker threads");
                report.recommendations.push_back("Optimize job execution time");
                report.recommendations.push_back("Consider job batching for small tasks");
                break;
 
            case bottleneck_type::worker_starvation:
                report.recommendations.push_back("Increase worker thread count");
                report.recommendations.push_back("Consider scaling based on hardware cores");
                report.recommendations.push_back("Enable autoscaling for dynamic adjustment");
                break;
 
            case bottleneck_type::uneven_distribution:
                report.recommendations.push_back("Enable work stealing if not already");
                report.recommendations.push_back("Review job distribution patterns");
                report.recommendations.push_back("Consider using priority-based scheduling");
                break;
 
            case bottleneck_type::lock_contention:
                report.recommendations.push_back("Review shared resource access patterns");
                report.recommendations.push_back("Consider using lock-free data structures");
                report.recommendations.push_back("Reduce critical section scope");
                report.recommendations.push_back("Use finer-grained locking strategies");
                break;
 
            case bottleneck_type::memory_pressure:
                report.recommendations.push_back("Reduce queue capacity or enable backpressure");
                report.recommendations.push_back("Optimize job object size");
                report.recommendations.push_back("Add more workers to process jobs faster");
                report.recommendations.push_back("Consider job prioritization to clear backlog");
                break;
 
            case bottleneck_type::none:
            default:
                break;
        }
    }

References kcenon::thread::diagnostics::lock_contention, kcenon::thread::diagnostics::memory_pressure, kcenon::thread::diagnostics::none, kcenon::thread::diagnostics::queue_full, kcenon::thread::diagnostics::bottleneck_report::recommendations, kcenon::thread::diagnostics::slow_consumer, kcenon::thread::diagnostics::bottleneck_report::type, kcenon::thread::diagnostics::uneven_distribution, and kcenon::thread::diagnostics::worker_starvation.

Referenced by detect_bottlenecks().

◆ get_active_jobs()

auto kcenon::thread::diagnostics::thread_pool_diagnostics::get_active_jobs ( ) const -> std::vector<job_info>

nodiscard

Gets currently executing jobs.

Returns: Vector of active job information.

Definition at line 97 of file thread_pool_diagnostics.cpp.

    {
        std::vector<job_info> result;
 
        // Get thread states which include current job info
        auto threads = dump_thread_states();
 
        for (const auto& thread : threads)
        {
            if (thread.current_job.has_value())
            {
                result.push_back(thread.current_job.value());
            }
        }
 
        return result;
    }

References dump_thread_states().

◆ get_config()

auto kcenon::thread::diagnostics::thread_pool_diagnostics::get_config ( ) const -> diagnostics_config

nodiscard

Gets the current configuration.

Returns: Current diagnostics configuration.

Definition at line 758 of file thread_pool_diagnostics.cpp.

    {
        return config_;
    }

References config_.

◆ get_pending_jobs()

auto kcenon::thread::diagnostics::thread_pool_diagnostics::get_pending_jobs ( std::size_t limit = 100 ) const -> std::vector<job_info>

nodiscard

Gets pending jobs in queue.

Parameters

limit Maximum number to return (0 = all).

Returns: Vector of pending job information.

Definition at line 115 of file thread_pool_diagnostics.cpp.

    {
        // Delegate to job_queue's inspect_pending_jobs
        auto queue = pool_.get_job_queue();
        if (!queue)
        {
            return {};
        }
 
        return queue->inspect_pending_jobs(limit);
    }

◆ get_recent_events()

auto kcenon::thread::diagnostics::thread_pool_diagnostics::get_recent_events ( std::size_t limit = 100 ) const -> std::vector<job_execution_event>

nodiscard

Gets recent execution events.

Parameters

limit Maximum events to return.

Returns: Vector of recent events.

Definition at line 680 of file thread_pool_diagnostics.cpp.

    {
        std::lock_guard<std::mutex> lock(events_mutex_);
 
        std::vector<job_execution_event> result;
        auto count = std::min(limit, event_history_.size());
        result.reserve(count);
 
        auto it = event_history_.rbegin();
        for (std::size_t i = 0; i < count && it != event_history_.rend(); ++i, ++it)
        {
            result.push_back(*it);
        }
 
        return result;
    }
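
The loop above walks the history from newest to oldest and stops at `limit`. The same behaviour can be sketched with reverse iterators; `newest_first` is a hypothetical helper, not part of the library. `get_recent_jobs()` follows the same pattern over `recent_jobs_`.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <vector>

// Copies up to `limit` entries, newest first. New entries are assumed to be
// appended at the back of the deque.
template <typename T>
std::vector<T> newest_first(const std::deque<T>& history, std::size_t limit) {
    const std::size_t count = std::min(limit, history.size());
    return std::vector<T>(history.rbegin(),
                          history.rbegin() + static_cast<std::ptrdiff_t>(count));
}
```

With this logic a `limit` of 0 returns an empty vector, unlike `get_pending_jobs()`, whose documentation states that 0 means all.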

◆ get_recent_jobs()

auto kcenon::thread::diagnostics::thread_pool_diagnostics::get_recent_jobs ( std::size_t limit = 100 ) const -> std::vector<job_info>

nodiscard

Gets recent completed/failed jobs.

Parameters

limit Maximum number to return.

Returns: Vector of recent job information.

Definition at line 128 of file thread_pool_diagnostics.cpp.

    {
        std::lock_guard<std::mutex> lock(jobs_mutex_);
 
        std::vector<job_info> result;
        auto count = std::min(limit, recent_jobs_.size());
        result.reserve(count);
 
        auto it = recent_jobs_.rbegin();
        for (std::size_t i = 0; i < count && it != recent_jobs_.rend(); ++i, ++it)
        {
            result.push_back(*it);
        }
 
        return result;
    }

◆ get_worker_info()

auto kcenon::thread::diagnostics::thread_pool_diagnostics::get_worker_info ( const thread_worker & worker, std::size_t index ) const -> thread_info

nodiscard private

Gets thread info for a single worker.

Parameters

worker	The worker to query.
index	Worker index in the pool.

Returns: Thread information.

Definition at line 769 of file thread_pool_diagnostics.cpp.

    {
        thread_info info;
        info.worker_id = worker.get_worker_id();
        info.thread_name = "Worker-" + std::to_string(index);
        info.state = worker.is_idle() ? worker_state::idle : worker_state::active;
        info.state_since = std::chrono::steady_clock::now();
        return info;
    }

References kcenon::thread::diagnostics::active, kcenon::thread::diagnostics::idle, kcenon::thread::info, and kcenon::thread::diagnostics::thread_info::worker_id.

◆ health_check()

auto kcenon::thread::diagnostics::thread_pool_diagnostics::health_check ( ) const -> health_status

nodiscard

Performs comprehensive health check.

Returns: Health status with all component states.

Definition at line 383 of file thread_pool_diagnostics.cpp.

    {
        health_status status;
        status.check_time = std::chrono::steady_clock::now();
 
        // Calculate uptime
        auto uptime = status.check_time - start_time_;
        status.uptime_seconds = std::chrono::duration<double>(uptime).count();
 
        // Get metrics
        auto metrics_snap = pool_.metrics().snapshot();
        status.total_jobs_processed = metrics_snap.tasks_executed +
                                      metrics_snap.tasks_failed;
 
        if (status.total_jobs_processed > 0)
        {
            status.success_rate = static_cast<double>(metrics_snap.tasks_executed) /
                                  static_cast<double>(status.total_jobs_processed);
 
            // Calculate average latency (total execution time / total jobs)
            // busy_time represents total execution time across all workers
            double total_exec_time_ms = static_cast<double>(metrics_snap.total_busy_time_ns) / 1e6;
            status.avg_latency_ms = total_exec_time_ms /
                                    static_cast<double>(status.total_jobs_processed);
        }
 
        // Worker stats
        {
            std::scoped_lock<std::mutex> lock(pool_.workers_mutex_);
            status.total_workers = pool_.workers_.size();
        }
        status.active_workers = pool_.get_active_worker_count();
        status.queue_depth = pool_.get_pending_task_count();
 
        // Get queue capacity
        auto queue = pool_.get_job_queue();
        if (queue)
        {
            auto max_size = queue->get_max_size();
            if (max_size.has_value())
            {
                status.queue_capacity = max_size.value();
            }
        }
 
        // Check components
        status.components.push_back(check_worker_health());
        status.components.push_back(check_queue_health());
        status.components.push_back(check_metrics_health(status.avg_latency_ms,
                                                          status.success_rate));
 
        // Calculate overall status
        status.calculate_overall_status();
 
        return status;
    }

Referenced by to_json(), and to_prometheus().
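
The derived figures in the listing are plain arithmetic on the metrics snapshot. The sketch below reproduces that arithmetic in isolation. The `metrics_sample` and `derived_stats` types and the `derive_stats` function are hypothetical stand-ins, and the zero defaults used when no job has run are an assumption, not the library's documented behavior:

```cpp
#include <cstdint>

// Hypothetical stand-in for the snapshot fields read by health_check().
struct metrics_sample
{
    std::uint64_t tasks_executed;
    std::uint64_t tasks_failed;
    std::uint64_t total_busy_time_ns;
};

// Hypothetical result type; the zero defaults are an assumption.
struct derived_stats
{
    std::uint64_t total_jobs = 0;
    double success_rate = 0.0;
    double avg_latency_ms = 0.0;
};

derived_stats derive_stats(const metrics_sample& m)
{
    derived_stats s;
    s.total_jobs = m.tasks_executed + m.tasks_failed;

    // Guard against division by zero before any job has finished.
    if (s.total_jobs > 0)
    {
        const double total = static_cast<double>(s.total_jobs);
        s.success_rate = static_cast<double>(m.tasks_executed) / total;

        // Nanoseconds to milliseconds, then averaged over all jobs.
        const double busy_ms = static_cast<double>(m.total_busy_time_ns) / 1e6;
        s.avg_latency_ms = busy_ms / total;
    }
    return s;
}
```

Going by the comment in the listing, the numerator is total execution time, so the average reflects time spent running jobs and not time spent waiting in the queue.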

◆ is_healthy()

auto kcenon::thread::diagnostics::thread_pool_diagnostics::is_healthy ( ) const -> bool

nodiscard

Quick check if pool is healthy.

Returns: true if the pool is running and has at least one worker.

Definition at line 440 of file thread_pool_diagnostics.cpp.

    {
        std::size_t worker_count;
        {
            std::scoped_lock<std::mutex> lock(pool_.workers_mutex_);
            worker_count = pool_.workers_.size();
        }
        return pool_.is_running() && worker_count > 0;
    }

References kcenon::thread::thread_pool::is_running(), pool_, kcenon::thread::thread_pool::workers_, and kcenon::thread::thread_pool::workers_mutex_.

◆ is_tracing_enabled()

auto kcenon::thread::diagnostics::thread_pool_diagnostics::is_tracing_enabled ( ) const -> bool

nodiscard

Checks if tracing is enabled.

Returns: true if tracing is enabled.

Definition at line 615 of file thread_pool_diagnostics.cpp.

    {
        return tracing_enabled_.load(std::memory_order_relaxed);
    }

References tracing_enabled_.

◆ notify_listeners()

void kcenon::thread::diagnostics::thread_pool_diagnostics::notify_listeners ( const job_execution_event & event )

private

Notifies all event listeners.

Parameters

event The event to broadcast.

Definition at line 663 of file thread_pool_diagnostics.cpp.

    {
        std::vector<std::shared_ptr<execution_event_listener>> listeners_copy;
        {
            std::lock_guard<std::mutex> lock(listeners_mutex_);
            listeners_copy = listeners_;
        }
 
        for (const auto& listener : listeners_copy)
        {
            if (listener)
            {
                listener->on_event(event);
            }
        }
    }

References listeners_, and listeners_mutex_.

Referenced by record_event().
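
The listing copies the listener list while holding the mutex and invokes the callbacks after releasing it. A self-contained sketch of that copy-then-notify pattern is shown here; `listener`, `counting_listener`, and `broadcaster` are hypothetical names with an `int` standing in for the event type:

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for execution_event_listener.
struct listener
{
    virtual ~listener() = default;
    virtual void on_event(int event) = 0;
};

// Simple listener that counts how many events it has seen.
struct counting_listener : listener
{
    int seen = 0;
    void on_event(int) override { ++seen; }
};

class broadcaster
{
public:
    void add(std::shared_ptr<listener> l)
    {
        if (!l) return;
        std::lock_guard<std::mutex> lock(mutex_);
        listeners_.push_back(std::move(l));
    }

    void notify(int event)
    {
        // Copy under the lock so callbacks run without holding it.
        std::vector<std::shared_ptr<listener>> snapshot;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            snapshot = listeners_;
        }
        for (const auto& l : snapshot)
        {
            l->on_event(event);
        }
    }

    std::size_t size() const
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return listeners_.size();
    }

private:
    mutable std::mutex mutex_;
    std::vector<std::shared_ptr<listener>> listeners_;
};
```

One consequence of this design is that a callback may register or remove listeners without deadlocking on the non-recursive mutex. The trade-off is that a listener removed during a broadcast can still receive the event already in flight.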

◆ operator=() [1/2]

thread_pool_diagnostics & kcenon::thread::diagnostics::thread_pool_diagnostics::operator= ( const thread_pool_diagnostics & )

delete

◆ operator=() [2/2]

thread_pool_diagnostics & kcenon::thread::diagnostics::thread_pool_diagnostics::operator= ( thread_pool_diagnostics && )

delete

◆ record_event()

void kcenon::thread::diagnostics::thread_pool_diagnostics::record_event ( const job_execution_event & event )

Records a job execution event.

Parameters

event The event to record.

Called internally by the thread pool on job lifecycle events.

Definition at line 642 of file thread_pool_diagnostics.cpp.

    {
        if (!tracing_enabled_.load(std::memory_order_relaxed))
        {
            return;
        }
 
        // Store in history
        {
            std::lock_guard<std::mutex> lock(events_mutex_);
            event_history_.push_back(event);
            while (event_history_.size() > config_.event_history_size)
            {
                event_history_.pop_front();
            }
        }
 
        // Notify listeners
        notify_listeners(event);
    }

References config_, event_history_, kcenon::thread::diagnostics::diagnostics_config::event_history_size, events_mutex_, notify_listeners(), and tracing_enabled_.
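
The history is kept bounded by appending at the back and trimming from the front. The following sketch isolates that technique; `bounded_history` is a hypothetical class, and locking is left out to keep the example short:

```cpp
#include <cstddef>
#include <deque>

// Hypothetical bounded history: keeps at most `capacity` newest items.
// Not thread-safe; the library listing guards the same steps with a mutex.
template <typename T>
class bounded_history
{
public:
    explicit bounded_history(std::size_t capacity) : capacity_(capacity) {}

    void push(const T& value)
    {
        items_.push_back(value);
        // Drop the oldest entries until the size fits the capacity.
        while (items_.size() > capacity_)
        {
            items_.pop_front();
        }
    }

    const std::deque<T>& items() const { return items_; }

private:
    std::size_t capacity_;
    std::deque<T> items_;
};
```

Using a loop for trimming, as the listing does, also covers the case where the capacity shrinks between two pushes. A capacity of 0 leaves the history permanently empty.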

◆ record_job_completion()

void kcenon::thread::diagnostics::thread_pool_diagnostics::record_job_completion ( const job_info & info )

Records a job completion for history tracking.

Parameters

info	The job information to record.

Called internally by the thread pool when jobs complete.

Definition at line 146 of file thread_pool_diagnostics.cpp.

    {
        std::lock_guard<std::mutex> lock(jobs_mutex_);
 
        recent_jobs_.push_back(info);
        while (recent_jobs_.size() > config_.recent_jobs_capacity)
        {
            recent_jobs_.pop_front();
        }
    }

References config_, kcenon::thread::info, jobs_mutex_, recent_jobs_, and kcenon::thread::diagnostics::diagnostics_config::recent_jobs_capacity.

◆ remove_event_listener()

void kcenon::thread::diagnostics::thread_pool_diagnostics::remove_event_listener ( std::shared_ptr< execution_event_listener > listener )

Removes an event listener.

Parameters

listener Listener to remove.

Definition at line 629 of file thread_pool_diagnostics.cpp.

    {
        if (!listener) return;
 
        std::lock_guard<std::mutex> lock(listeners_mutex_);
        auto it = std::find(listeners_.begin(), listeners_.end(), listener);
        if (it != listeners_.end())
        {
            listeners_.erase(it);
        }
    }

References listeners_, and listeners_mutex_.

◆ set_config()

void kcenon::thread::diagnostics::thread_pool_diagnostics::set_config ( const diagnostics_config & config )

Updates the configuration.

Parameters

config New configuration to apply.

Definition at line 763 of file thread_pool_diagnostics.cpp.

    {
        config_ = config;
        tracing_enabled_.store(config.enable_tracing, std::memory_order_relaxed);
    }

References config_, kcenon::thread::diagnostics::diagnostics_config::enable_tracing, and tracing_enabled_.

◆ to_json()

auto kcenon::thread::diagnostics::thread_pool_diagnostics::to_json ( ) const -> std::string

nodiscard

Exports diagnostics as JSON.

Returns: JSON string with all diagnostic data.

Definition at line 702 of file thread_pool_diagnostics.cpp.

    {
        std::ostringstream oss;
        oss << "{\n";
 
        // Health status
        auto health = health_check();
        oss << "  \"health\": {\n";
        oss << "    \"status\": \"" << health_state_to_string(health.overall_status) << "\",\n";
        oss << "    \"message\": \"" << health.status_message << "\",\n";
        oss << "    \"uptime_seconds\": " << std::fixed << std::setprecision(2)
            << health.uptime_seconds << ",\n";
        oss << "    \"total_jobs_processed\": " << health.total_jobs_processed << ",\n";
        oss << "    \"success_rate\": " << std::fixed << std::setprecision(4)
            << health.success_rate << "\n";
        oss << "  },\n";
 
        // Workers
        oss << "  \"workers\": {\n";
        oss << "    \"total\": " << health.total_workers << ",\n";
        oss << "    \"active\": " << health.active_workers << ",\n";
        oss << "    \"idle\": " << (health.total_workers - health.active_workers) << "\n";
        oss << "  },\n";
 
        // Queue
        oss << "  \"queue\": {\n";
        oss << "    \"depth\": " << health.queue_depth << "\n";
        oss << "  },\n";
 
        // Bottleneck
        auto bottleneck = detect_bottlenecks();
        oss << "  \"bottleneck\": {\n";
        oss << "    \"detected\": " << (bottleneck.has_bottleneck ? "true" : "false") << ",\n";
        oss << "    \"type\": \"" << bottleneck_type_to_string(bottleneck.type) << "\",\n";
        oss << "    \"severity\": \"" << bottleneck.severity_string() << "\"\n";
        oss << "  }\n";
 
        oss << "}";
        return oss.str();
    }

References kcenon::thread::diagnostics::bottleneck_type_to_string(), detect_bottlenecks(), health_check(), and kcenon::thread::diagnostics::health_state_to_string().
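
The listing writes string values such as the status message into the output verbatim. If such a string ever contained a double quote, backslash, or control character, the result would not be valid JSON. A hedged sketch of an escaping step is given below; `escape_json` is a hypothetical helper, not a function the library is known to provide:

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: escapes a string for use inside a JSON string literal.
// Bytes >= 0x20 other than '"' and '\\' are passed through unchanged.
std::string escape_json(const std::string& in)
{
    std::string out;
    out.reserve(in.size());

    for (unsigned char c : in)
    {
        switch (c)
        {
        case '"':  out += "\\\""; break;
        case '\\': out += "\\\\"; break;
        case '\n': out += "\\n";  break;
        case '\r': out += "\\r";  break;
        case '\t': out += "\\t";  break;
        default:
            if (c < 0x20)
            {
                // Remaining control characters use the \u00XX form.
                char buf[7];
                std::snprintf(buf, sizeof buf, "\\u%04x",
                              static_cast<unsigned>(c));
                out += buf;
            }
            else
            {
                out += static_cast<char>(c);
            }
        }
    }
    return out;
}
```

In a serializer like the one above, such a helper would wrap each string value before it is streamed, for example `oss << escape_json(message)`.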

◆ to_prometheus()

auto kcenon::thread::diagnostics::thread_pool_diagnostics::to_prometheus ( ) const -> std::string

nodiscard

Exports diagnostics as Prometheus-compatible metrics.

Returns: Prometheus exposition format string.

Produces metrics suitable for scraping by Prometheus or compatible monitoring systems. Includes health status, worker metrics, queue metrics, and job statistics.

Definition at line 748 of file thread_pool_diagnostics.cpp.

    {
        auto health = health_check();
        return health.to_prometheus(pool_.to_string());
    }

References health_check(), pool_, and kcenon::thread::thread_pool::to_string().
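
The body of health_status::to_prometheus() is not shown here. As an illustration of the Prometheus text exposition format only, the sketch below formats a single gauge. The `format_gauge` helper, the metric name used in the usage note, and the use of the pool name as a `pool` label are all assumptions:

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: emits one gauge sample in Prometheus text format.
// Assumes `pool_label` needs no escaping (no '"', '\\', or newline).
std::string format_gauge(const std::string& name,
                         const std::string& help,
                         const std::string& pool_label,
                         double value)
{
    std::ostringstream oss;
    oss << "# HELP " << name << ' ' << help << '\n'
        << "# TYPE " << name << " gauge\n"
        << name << "{pool=\"" << pool_label << "\"} " << value << '\n';
    return oss.str();
}
```

For example, `format_gauge("thread_pool_queue_depth", "Pending jobs in the queue.", "main", 3)` produces a HELP line, a TYPE line, and one sample line of the form `thread_pool_queue_depth{pool="main"} 3`.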

◆ to_string()

auto kcenon::thread::diagnostics::thread_pool_diagnostics::to_string ( ) const -> std::string

nodiscard

Exports diagnostics as formatted string.

Returns: Human-readable string.

Definition at line 743 of file thread_pool_diagnostics.cpp.

    {
        return format_thread_dump();
    }

References format_thread_dump().

Member Data Documentation

◆ config_

diagnostics_config kcenon::thread::diagnostics::thread_pool_diagnostics::config_

private

Configuration for diagnostics.

Definition at line 351 of file thread_pool_diagnostics.h.

Referenced by check_queue_health(), detect_bottlenecks(), enable_tracing(), get_config(), record_event(), record_job_completion(), and set_config().

◆ event_history_

std::deque<job_execution_event> kcenon::thread::diagnostics::thread_pool_diagnostics::event_history_

private

Ring buffer for event history.

Definition at line 366 of file thread_pool_diagnostics.h.

Referenced by enable_tracing(), and record_event().

◆ events_mutex_

std::mutex kcenon::thread::diagnostics::thread_pool_diagnostics::events_mutex_

mutable private

Mutex for event history access.

Definition at line 361 of file thread_pool_diagnostics.h.

Referenced by enable_tracing(), and record_event().

◆ jobs_mutex_

std::mutex kcenon::thread::diagnostics::thread_pool_diagnostics::jobs_mutex_

mutable private

Mutex for recent jobs access.

Definition at line 371 of file thread_pool_diagnostics.h.

Referenced by record_job_completion().

◆ listeners_

std::vector<std::shared_ptr<execution_event_listener> > kcenon::thread::diagnostics::thread_pool_diagnostics::listeners_

private

Event listeners.

Definition at line 386 of file thread_pool_diagnostics.h.

Referenced by add_event_listener(), notify_listeners(), and remove_event_listener().

◆ listeners_mutex_

std::mutex kcenon::thread::diagnostics::thread_pool_diagnostics::listeners_mutex_

mutable private

Mutex for event listeners.

Definition at line 381 of file thread_pool_diagnostics.h.

Referenced by add_event_listener(), notify_listeners(), and remove_event_listener().

◆ next_event_id_

std::atomic<std::uint64_t> kcenon::thread::diagnostics::thread_pool_diagnostics::next_event_id_ {0}

private

Counter for event IDs.

Definition at line 391 of file thread_pool_diagnostics.h.

◆ pool_

thread_pool& kcenon::thread::diagnostics::thread_pool_diagnostics::pool_

private

Reference to the monitored thread pool.

Definition at line 346 of file thread_pool_diagnostics.h.

Referenced by check_queue_health(), check_worker_health(), detect_bottlenecks(), dump_thread_states(), format_thread_dump(), health_check(), is_healthy(), and to_prometheus().

◆ recent_jobs_

std::deque<job_info> kcenon::thread::diagnostics::thread_pool_diagnostics::recent_jobs_

private

Ring buffer for recent job completions.

Definition at line 376 of file thread_pool_diagnostics.h.

Referenced by record_job_completion().

◆ start_time_

std::chrono::steady_clock::time_point kcenon::thread::diagnostics::thread_pool_diagnostics::start_time_

private

Time when the pool was started.

Definition at line 396 of file thread_pool_diagnostics.h.

Referenced by health_check().

◆ tracing_enabled_

std::atomic<bool> kcenon::thread::diagnostics::thread_pool_diagnostics::tracing_enabled_ {false}

private

Whether event tracing is enabled.

Definition at line 356 of file thread_pool_diagnostics.h.

Referenced by enable_tracing(), is_tracing_enabled(), record_event(), and set_config().

The documentation for this class was generated from the following files:

include/kcenon/thread/diagnostics/thread_pool_diagnostics.h
src/diagnostics/thread_pool_diagnostics.cpp
