Thread System 0.3.1
High-performance C++20 thread pool with work stealing and DAG scheduling
This FAQ collects the most common questions from developers integrating Thread System into their projects. For longer walkthroughs, see the Tutorial: Thread Pool, Tutorial: DAG Scheduling, and Tutorial: Lock-Free Queue Patterns pages.
Start with std::thread::hardware_concurrency() for CPU-bound work. For I/O-bound or mixed workloads, oversubscribe by a factor proportional to the average wait time over compute time — a 2x to 4x multiplier is a reasonable starting point. Avoid setting a fixed worker count without measuring; the optimal number depends on the workload, the host, and other processes contending for the same cores. The autoscaler can adjust the count over time based on queue depth and observed latency.
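The sizing heuristic above can be sketched as a small helper. This is an illustrative function, not part of the Thread System API; the name and clamping range are assumptions based on the 2x-4x guidance above.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>

// Illustrative sizing heuristic (not a Thread System API):
// CPU-bound work -> one worker per hardware thread;
// I/O-bound work -> oversubscribe by roughly (1 + wait/compute),
// clamped to the 2x-4x range suggested above.
std::size_t suggested_workers(double wait_over_compute) {
    std::size_t hw = std::thread::hardware_concurrency();
    if (hw == 0) hw = 1;  // hardware_concurrency() may legally return 0
    double multiplier = std::clamp(1.0 + wait_over_compute, 1.0, 4.0);
    return static_cast<std::size_t>(hw * multiplier);
}
```

Treat the result as a starting point for measurement, not a final answer; the autoscaler can refine it at runtime.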
Use a kcenon::thread::cancellation_token. Pass the token to long-running jobs and check is_cancellation_requested() between work units. Cancellation is cooperative — the runtime cannot interrupt arbitrary code, so jobs must poll periodically. Tokens form a hierarchy: cancelling a parent cancels all linked children, which is useful for shutting down a request and all of its background fan-out.
std::async on most implementations either spawns a new thread per call or uses an opaque process-wide pool with no scheduling controls. A dedicated thread pool gives you a fixed, observable worker count, control over queue selection, built-in diagnostics, and priority scheduling through typed_thread_pool. Use std::async only for one-off background work in small programs. Anything that runs in production with predictable load should use a thread pool.
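For contrast, here is the one niche where std::async is a reasonable fit: a single one-off background task whose result you collect through a future. This is plain standard C++ with no Thread System involvement.

```cpp
#include <future>

// std::async's sweet spot: one-off background work in a small program.
// No pool sizing, no queue tuning; the result arrives via a future.
int compute_in_background() {
    auto fut = std::async(std::launch::async, [] { return 6 * 7; });
    // ... the caller can do other work here ...
    return fut.get();  // blocks until the one-off task finishes
}
```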
Thread System exposes its diagnostic counters through the diagnostics and metrics modules. monitoring_system depends on Thread System and consumes these counters directly — you do not need to wire anything by hand. If you want to publish custom metrics, register a callback with thread_pool_diagnostics::observe(). Counters include queue depth, worker utilization, completed jobs, and per-priority latency histograms.
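The observe() callback pattern described above can be sketched with a tiny stand-in. To be clear, this mock is entirely hypothetical: the real thread_pool_diagnostics signature and counter names live in the diagnostics module, and only the register-a-callback shape is taken from the text.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the observe() callback pattern; NOT the
// real thread_pool_diagnostics API. Observers register a callback and
// receive (counter name, value) pairs whenever a counter is published.
struct diagnostics_mock {
    using callback = std::function<void(const std::string&, double)>;
    std::vector<callback> observers;

    void observe(callback cb) { observers.push_back(std::move(cb)); }

    void publish(const std::string& name, double value) {
        for (auto& cb : observers) cb(name, value);  // fan out to observers
    }
};
```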
A few rules prevent the most common deadlock and contention problems:

- Never call future.get() from inside a job submitted to the same pool if the awaited job depends on a worker from that pool — this can starve the pool. Use the DAG scheduler for in-pool dependencies instead.
- Use std::scoped_lock to lock multiple mutexes atomically.
- Do not hold a lock across submit_task — release first, then submit.
- Prefer adaptive_job_queue (the default) before forcing a specific queue.
- Use typed_thread_pool to give latency-sensitive work a dedicated worker.
- Profile with perf or vtune to spot accidentally shared state inside callbacks.

Thread System works on Linux, macOS, and Windows. Notable differences:
- std::format support varies by toolchain; check your compiler version.
- On macOS, workers are plain std::thread, but the thread pool itself does not depend on GCD.

Jobs are heap-allocated by the queue (typically as std::unique_ptr nodes). Lock-free queues defer node deletion until hazard-pointer scanning confirms no thread is reading them. As a user you do not need to manage queue node lifetimes; just pass a callable into submit_task or build a typed job, and the framework owns it from there. Avoid capturing very large objects by value in the lambda — capture a std::shared_ptr or move the object into the job.
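The capture guidance above can be shown with two small sketches. The submit_task_stub here is a stand-in that simply invokes the callable; only the capture style (move vs. shared_ptr, instead of a by-value copy) is the point.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Stand-in for submit_task: just invokes the callable synchronously.
// The real submit_task queues the job; the capture rules are the same.
template <typename F>
auto submit_task_stub(F&& f) { return std::forward<F>(f)(); }

std::size_t process_moved() {
    std::vector<std::string> big(10'000, "payload");
    // Move the payload into the job: it now owns the data, no copy made.
    return submit_task_stub([data = std::move(big)] { return data.size(); });
}

std::size_t process_shared() {
    auto big = std::make_shared<std::vector<int>>(10'000, 1);
    // Shared ownership: copying the lambda copies a pointer, never the payload.
    return submit_task_stub([big] { return big->size(); });
}
```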
Jobs return kcenon::thread::result_void or result<T>, which wrap the common_system Result type. Failures surface through the future returned by submit_task — calling .get() does not rethrow; instead, inspect the returned result and call get_error() (note: get_error(), not error()) to read the failure detail. The DAG scheduler aggregates failures across nodes and exposes them through the run result.
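The inspect-then-get_error() flow can be sketched with a minimal stand-in. This mock is illustrative only: the real kcenon::thread::result<T> wraps common_system's Result type, and only the accessor name get_error() is taken from the text.

```cpp
#include <optional>
#include <string>

// Minimal stand-in for the result<T> shape described above (NOT the real
// type): truthy when it holds a value, get_error() reads the failure.
template <typename T>
struct result_mock {
    std::optional<T> value;
    std::string error;

    explicit operator bool() const { return value.has_value(); }
    // Note the accessor is get_error(), mirroring the text's warning
    // that the real type uses get_error(), not error().
    const std::string& get_error() const { return error; }
};

result_mock<int> divide(int a, int b) {
    if (b == 0) return {std::nullopt, "division by zero"};
    return {a / b, {}};
}
```

The caller checks the result instead of wrapping .get() in a try/catch, since failures do not arrive as exceptions.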
- Use the minimal_thread_pool example as a lightweight fixture: it has predictable startup and shutdown costs.
- Synchronize on the futures returned by submit_task; never sleep-and-hope.
- Run the suite under ThreadSanitizer (cmake --preset tsan) and under Valgrind on Linux (valgrind --tool=helgrind).