This tutorial walks through the practical use of Thread System's two thread pool flavors: the basic thread_pool and the priority-aware typed_thread_pool. It covers when to choose each, how to configure work-stealing, and how to size the pool appropriately for the workload.
Introduction
A thread pool amortizes the cost of thread creation by reusing a fixed (or adaptive) set of worker threads. Submitting a task to the pool is significantly cheaper than spawning a new thread per task, and the pool can apply scheduling strategies that improve throughput, fairness, and cache locality.
Thread System provides two pool flavors:
- kcenon::thread::thread_pool — FIFO scheduling with an adaptive job queue.
- kcenon::thread::typed_thread_pool_t<JobType> — priority/type-aware scheduling with per-type workers and aging to prevent starvation.
When to Use thread_pool
Reach for the basic thread_pool when:
- All tasks are roughly equal in importance and latency requirements.
- You need a simple submit-and-forget API with std::future return values.
- The workload does not benefit from priority differentiation.
- Throughput is the primary metric, not tail latency for hot work classes.
The basic pool is the right default for most applications.
When to Use typed_thread_pool
Use typed_thread_pool_t when:
- Tasks have distinct priority classes (e.g., interactive vs. background).
- Some workers should be dedicated to a single priority to bound latency.
- You want priority aging to prevent starvation of low-priority work.
- You need explicit job type routing for instrumentation or rate limiting.
The cost is a slightly more complex API and per-type queues, which trade throughput for predictable latency.
Work-Stealing Configuration
Work stealing lets idle workers pull jobs from busy peers. It is opt-in because the implementation introduces additional synchronization and is most useful for highly imbalanced workloads.
Build with THREAD_ENABLE_WORK_STEALING=ON, then configure the pool:
#include <kcenon/thread/stealing/work_stealing_pool.h>
#include <thread>  // std::thread::hardware_concurrency

auto pool = kcenon::thread::stealing::work_stealing_pool::builder{}
    .worker_count(std::thread::hardware_concurrency())
    .numa_aware(true)
    .steal_attempts(4)
    .build();
pool->start();
Guidelines:
- Enable NUMA awareness only on multi-socket Linux hosts; the topology probe is cheap but unnecessary on consumer hardware.
- Keep steal_attempts in the 2 to 8 range; higher values waste cycles, lower values miss steal opportunities.
- Profile both modes (mutex queue vs. work stealing) — work stealing is not always faster for short, evenly distributed jobs.
Sizing the Pool
The optimal worker count depends on whether the workload is CPU-bound or I/O-bound.
- CPU-bound: Use std::thread::hardware_concurrency(). Adding more workers than physical (or logical) cores rarely helps and increases context switching.
- I/O-bound: A larger pool is fine. A common starting point is hardware_concurrency() * (1 + average_wait_time / average_compute_time).
- Mixed: Split the workload across two pools — one CPU-sized for compute tasks and one larger pool for blocking I/O — to keep the compute pool free of blocked workers.
When in doubt, measure with your actual workload. The autoscaler can adjust the worker count over time based on queue depth and latency.
Example 1: Basic Thread Pool
#include <iostream>
#include <kcenon/thread/thread_pool.h>  // stable public include for thread_pool (path assumed)

using namespace kcenon::thread;

int main() {
    auto pool = thread_pool::create();
    pool->start();
    // enqueue is assumed to hand back a std::future for the job's result.
    auto fut = pool->enqueue([]() -> common::VoidResult {
        std::cout << "Hello from a worker thread\n";
        return {};
    });
    fut.wait();
    pool->stop();
    return 0;
}
Example 2: Typed Thread Pool with Priorities
#include <kcenon/thread/typed_thread_pool.h>  // stable public include for typed_thread_pool_t (path assumed)
#include <memory>

using namespace kcenon::thread;

int main() {
    auto pool = std::make_shared<typed_thread_pool_t<job_types>>();
    pool->start();
    pool->add_worker(job_types::High);                            // dedicated high-priority worker
    pool->add_worker({job_types::Normal, job_types::Background}); // shared worker for the rest
    // callback_typed_job (name assumed) wraps a callback as a typed job.
    pool->enqueue(callback_typed_job([]() -> common::VoidResult {
        return {};  // latency-sensitive work goes here
    }, job_types::High));
    pool->enqueue(callback_typed_job([]() -> common::VoidResult {
        return {};  // background work; aging prevents starvation
    }, job_types::Background));
    pool->stop();
    return 0;
}
Example 3: Sizing for an I/O-Bound Workload
#include <algorithm>  // std::max
#include <cstddef>
#include <thread>
#include <kcenon/thread/thread_pool.h>  // stable public include for thread_pool (path assumed)

int main() {
    const std::size_t cores = std::thread::hardware_concurrency();
    const std::size_t workers = std::max<std::size_t>(cores * 4, 8);  // oversubscribe for blocking I/O
    auto pool = kcenon::thread::thread_pool::builder{}
        .worker_count(workers)
        .build();
    pool->start();
    for (int i = 0; i < 100; ++i) {
        pool->enqueue([]() -> kcenon::thread::common::VoidResult {
            return {};  // blocking I/O call goes here
        });
    }
    pool->stop();
    return 0;
}
Next Steps