Thread System 0.3.1
High-performance C++20 thread pool with work stealing and DAG scheduling
Troubleshooting Guide

When something goes wrong with concurrent code, the symptoms are often vague — a hang, a crash without a stack trace, or a slowdown that only appears under load. This page collects the most frequent issues we see with Thread System and the diagnostic steps that resolve them.

1. Deadlock detection

**Symptoms**: The pool stops making progress, queue depth grows, futures never complete.

**Common causes**:

  • A job calls future.get() on another job submitted to the same pool, but every worker is blocked waiting on a future, so nothing is left to run the awaited job (see the sketch after this list).
  • Two jobs acquire the same locks in different orders.
  • A callback blocks on a mutex held elsewhere in user code.
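
The first cause is worth seeing concretely. The sketch below assumes a submit_task() returning std::future (the name this page uses elsewhere); thread_pool is a hypothetical stand-in for the pool type, not the library's real one. With a single-worker pool the hang is deterministic.

```cpp
// Hypothetical sketch: submit_task() returning std::future, as named on
// this page; thread_pool is a stand-in type for illustration only.
#include <future>

int in_pool_deadlock(thread_pool& pool) {
    auto outer = pool.submit_task([&pool] {
        // The inner job lands in the queue this very worker serves...
        auto inner = pool.submit_task([] { return 42; });
        // ...so if every worker is parked on a get(), nothing can run it.
        return inner.get();             // worker blocks inside the pool
    });
    return outer.get();                 // hangs whenever all workers block
}
```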

**Diagnostic steps**:

  1. Use kcenon::thread::diagnostics::dump_pool_state() (or your debugger) to list worker stack traces. If every worker is parked in std::future::wait, you have an in-pool dependency cycle.
  2. Run under gdb / lldb and inspect each worker's backtrace. Look for matching mutex addresses across two threads.
  3. Enable ThreadSanitizer (cmake --preset tsan) and re-run the failing test. TSan detects most lock-order inversions automatically.

**Fix**: Use the DAG scheduler for in-pool dependencies. Adopt std::scoped_lock to lock multiple mutexes atomically, as sketched below. Never hold a mutex across submit_task.
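
A minimal sketch of the multi-lock fix. std::scoped_lock with several mutexes applies std::lock's deadlock-avoidance algorithm, so two jobs can name the same mutexes in opposite textual order safely:

```cpp
#include <mutex>

std::mutex m1, m2;

void job_a() {
    std::scoped_lock lock(m1, m2);  // locks both atomically, deadlock-free
    // ... mutate shared state ...
}

void job_b() {
    std::scoped_lock lock(m2, m1);  // opposite textual order, still safe:
    // the constructor uses std::lock's ordering algorithm under the hood
}
```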

2. Memory leak with futures

**Symptoms**: RSS grows steadily under load even though jobs complete.

**Common causes**:

  • The pool returns a future from submit_task and the caller never reads or drops it. The shared state stays alive until the future is destroyed.
  • Lambdas capture large objects by value; the lambda lives until the job finishes, pinning the captures.
  • The hazard pointer retire list never reaches the reclamation threshold because one thread rarely runs.

**Diagnostic steps**:

  1. Run under valgrind --tool=memcheck or AddressSanitizer (cmake --preset asan).
  2. Use heaptrack or jemalloc statistics to find the call site that allocates the leaked memory.
  3. Check job lifetimes — long-running jobs delay cleanup of everything they capture.

**Fix**: Drop futures you do not need (or call .wait() and let them go). Capture large state by std::shared_ptr or move it into the lambda, as sketched below. For hazard pointer buildup, occasionally call hazard_domain::scan_now() from a maintenance thread.
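
A sketch of the capture fixes, again assuming this page's submit_task() API; pool is any object exposing it:

```cpp
#include <memory>
#include <utility>
#include <vector>

void submit_without_pinning(auto& pool) {
    auto big = std::make_shared<std::vector<char>>(64 << 20);  // 64 MiB

    // One shared copy, released when the last job referencing it finishes.
    (void)pool.submit_task([big] { /* read *big */ });

    std::vector<char> scratch(4096);
    // Move ownership into the lambda instead of copying per job.
    (void)pool.submit_task([s = std::move(scratch)]() mutable { /* use s */ });
}
```

The (void) casts make the fire-and-forget intent explicit. A plain std::future's destructor does not block (only std::async's does), though your pool's future type may behave differently.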

3. Platform-specific threading issues

**Symptoms**: The code runs on Linux but hangs on Windows, or works on x86_64 but crashes on AArch64.

**Common causes**:

  • Relying on a particular memory ordering that is enforced on x86 but not on weakly ordered architectures.
  • Using thread-local storage that is destroyed in a different order on different platforms during shutdown.
  • Differences in std::thread::hardware_concurrency() reporting (cgroups on Linux containers, hybrid cores on Windows).

**Diagnostic steps**:

  1. Reproduce on the failing platform under TSan if available.
  2. Audit any custom std::atomic usage; default to std::memory_order_seq_cst until proven slow.
  3. On Linux containers, check /sys/fs/cgroup limits — they affect what hardware_concurrency() returns (see the sketch below).
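
A hedged sketch of step 3 for cgroup v2 hosts, where /sys/fs/cgroup/cpu.max holds "<quota> <period>" (or "max" when unlimited) and dividing the two gives the effective CPU budget. Paths and formats differ under cgroup v1 and some container runtimes, so treat this as illustrative:

```cpp
#include <algorithm>
#include <cmath>
#include <fstream>
#include <string>
#include <thread>

// Effective CPU count under a cgroup v2 quota, falling back to
// hardware_concurrency() when no quota applies.
unsigned effective_cpus() {
    std::ifstream f("/sys/fs/cgroup/cpu.max");
    std::string quota;
    long period = 0;
    if (f >> quota >> period && quota != "max" && period > 0)
        return static_cast<unsigned>(std::ceil(std::stod(quota) / period));
    return std::max(1u, std::thread::hardware_concurrency());
}
```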

**Fix**: Use std::memory_order_seq_cst by default, then relax with care after profiling, as sketched below. Pin thread-local cleanup order using explicit thread_local destructors. Configure the pool's worker count from a runtime setting instead of trusting platform defaults.
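
A minimal flag-and-payload sketch of that default-then-relax discipline; the release/acquire pair in the comments is the usual first relaxation once profiling justifies it:

```cpp
#include <atomic>

std::atomic<bool> ready{false};
int payload = 0;            // plain data published through the flag

void producer() {
    payload = 42;
    ready.store(true);      // seq_cst by default: correct on every platform
    // After profiling: ready.store(true, std::memory_order_release);
}

void consumer() {
    while (!ready.load()) { /* spin or yield */ }
    // After profiling: ready.load(std::memory_order_acquire) in the loop
    int v = payload;        // visible because the store ordered it
    (void)v;
}
```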

4. Performance problems

**Symptoms**: Throughput is lower than expected, tail latency spikes, CPU utilization is high without forward progress.

**Common causes**:

  • Workers spend most time in the kernel waking up from condition variables.
  • A single hot lock inside a callback bottlenecks every worker.
  • The job queue is mutex-backed under high contention; switching to lock-free helps.
  • False sharing between counters or job control blocks placed on the same cache line.

**Diagnostic steps**:

  1. Run the benchmarks/thread_pool_benchmark target and compare against the shipped baseline numbers.
  2. Use perf record -F 999 -g to find the hottest functions.
  3. Check thread_pool_diagnostics::queue_depth and the worker idle counters — if workers are idle while the queue is non-empty, there is a wakeup or stealing problem (see the watchdog sketch after this list).
  4. Run the autoscaler in observation-only mode and compare its recommendation to your static configuration.
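
To make step 3 continuous, a watchdog thread can poll the counters. queue_depth is named on this page; idle_workers() is a hypothetical accessor standing in for whatever idle counter your diagnostics build exposes:

```cpp
#include <chrono>
#include <iostream>
#include <stop_token>
#include <thread>

// Hypothetical watchdog: queue_depth() is named on this page,
// idle_workers() is assumed -- adapt both to your diagnostics API.
void watch_pool(auto& diag, std::stop_token st) {
    while (!st.stop_requested()) {
        auto depth = diag.queue_depth();
        auto idle  = diag.idle_workers();
        if (depth > 0 && idle > 0)
            std::cerr << "possible wakeup/stealing stall: depth=" << depth
                      << " idle=" << idle << '\n';
        std::this_thread::sleep_for(std::chrono::seconds(1));
    }
}
```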

**Fix**: Switch to adaptive_job_queue, enable work stealing for skewed loads, split CPU and I/O work into separate pools, and align hot counters to cache lines with alignas(64), as sketched below.
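
A sketch of the cache-line fix: each hot counter gets its own 64-byte line, so per-worker increments stop invalidating neighboring workers' lines. 64 is the common line size; std::hardware_destructive_interference_size is the portable spelling where your toolchain provides it:

```cpp
#include <atomic>
#include <cstdint>

// One counter per cache line: worker i's increments no longer cause
// cache-line ping-pong with worker i+1's.
struct alignas(64) padded_counter {
    std::atomic<std::uint64_t> value{0};
};

inline padded_counter jobs_completed[16];   // one slot per worker

void on_job_done(unsigned worker_id) {
    jobs_completed[worker_id].value.fetch_add(1, std::memory_order_relaxed);
}
```

Relaxed ordering is enough here because the counters are statistics, not synchronization.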

5. Hang on shutdown

**Symptoms**: The program reaches pool->stop() but never returns.

**Common causes**:

  • Pending jobs that wait on a cancellation token nobody cancelled.
  • Futures still held by the caller; the pool destructor blocks until the shared state is released.
  • A worker thread holds a hazard pointer to a node and never reaches the reclamation point.
  • Nested pools — pool A's worker waits on a future from pool B, which is already shutting down.

**Diagnostic steps**:

  1. Attach a debugger and dump every thread's backtrace. Workers parked in condition_variable::wait inside stop() indicate jobs that never completed.
  2. Confirm cancellation tokens are actually cancelled before stop().
  3. Look for futures captured by other long-lived objects.

**Fix**: Cancel any cancellation tokens before stopping the pool. Use stop(std::chrono::seconds(N)) to bound the wait. Shut down dependent pools in reverse dependency order, outermost first, as sketched below.
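
A hedged sketch of that ordering, using std::stop_source as a stand-in for the library's cancellation token and the bounded stop(std::chrono::seconds(N)) overload quoted above:

```cpp
#include <chrono>
#include <stop_token>

// outer_pool's workers may wait on futures produced by inner_pool, so
// outer_pool drains first while inner_pool can still deliver results.
void orderly_shutdown(auto& outer_pool, auto& inner_pool,
                      std::stop_source& cancel) {
    cancel.request_stop();                     // unblock waiting jobs first
    outer_pool.stop(std::chrono::seconds(5));  // bounded, per the fix above
    inner_pool.stop(std::chrono::seconds(5));
}
```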

More help

If these steps do not isolate the issue, open a GitHub issue with:

  • The exact build configuration (compiler, version, CMake preset).
  • A minimal reproduction (the smaller the better).
  • Sanitizer output if available (TSan, ASan, UBSan).
  • thread_pool_diagnostics dump captured at the failure point.

See also Frequently Asked Questions for quick answers and Tutorial: Thread Pool for usage patterns that prevent these issues in the first place.