doc_id: "LOG-GUID-002b" doc_title: "Crash-Safe Logger Guide" doc_version: "1.0.0" doc_date: "2026-04-04" doc_status: "Released" project: "logger_system"
category: "GUID"
Crash-Safe Logger Guide
Version: 1.0.0 Last Updated: 2025-02-09 Status: Production Ready Split from: LOG_SERVER_AND_CRASH_SAFETY.md
Table of Contents
- Overview
- Crash Safety Mechanism
- Configuration
- API Reference
- Performance Overhead
- When to Use
- Combined Usage
- Local Crash Safety + Network Forwarding
- Server-Side Crash Safety
- Complete Production Topology
- Best Practices
- Troubleshooting
- Related Documentation
Overview
The crash-safe logger wraps a standard logger and flushes buffered messages to persistent storage when the application crashes, so the log entries leading up to a failure are not lost.
| Feature | Purpose | Key Benefit |
| Crash-Safe Logger | Guaranteed log persistence during crashes | Zero data loss during application failures |
Key Benefits:
- Crash Resilience: Logs survive application crashes via signal handlers
- Zero Data Loss: Emergency flush ensures critical logs reach persistent storage
- Production Ready: Thread-safe, performant, and battle-tested
Crash Safety Mechanism
The crash-safe logger guarantees log persistence during application crashes through signal-based emergency flushing.
How It Works
┌─────────────────────────────────────────────────────────┐
│ Normal Operation Flow │
│ │
│ Application Code │
│ │ │
│ ▼ │
│ logger->log(...) │
│ │ │
│ ▼ │
│ crash_safe_logger │
│ │ │
│ ▼ │
│ underlying_logger (buffered) │
│ │ │
│ ▼ │
│ Periodic Flush (auto_flush_interval) │
│ │ │
│ ▼ │
│ Disk Storage │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ Crash Scenario Flow │
│ │
│ Application Code │
│ │ │
│ ▼ │
│ CRASH (SIGSEGV/SIGABRT) │
│ │ │
│ ▼ │
│ Signal Handler (async-signal-safe) │
│ │ │
│ ▼ │
│ emergency_flush() │
│ │ │
│ ├─ Set emergency flag (atomic) │
│ │ │
│ ├─ Try flush_mutex_.try_lock() │
│ │ │
│ └─ Flush buffered logs (best-effort) │
│ │ │
│ ▼ │
│ Disk Storage (logs saved!) │
│ │ │
│ ▼ │
│ Chain to original signal handler or exit │
└─────────────────────────────────────────────────────────┘
Header File: include/kcenon/logger/safety/crash_safe_logger.h
Signal Handlers
The crash-safe logger installs handlers for the following signals:
| Signal | Description | Action |
| SIGSEGV | Segmentation fault (invalid memory access) | Emergency flush + chain to old handler |
| SIGABRT | Abort signal (assert failures, std::terminate()) | Emergency flush + chain to old handler |
| SIGTERM | Termination request (graceful shutdown) | Emergency flush + exit |
| SIGINT | Interrupt signal (Ctrl+C) | Emergency flush + exit |
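The "emergency flush + chain to old handler" action in the table can be illustrated with plain POSIX `sigaction`. The following is a minimal sketch and not the library's implementation: the names `install_chained`, `chained_handler`, `g_previous`, and `g_flush_count` are hypothetical, and the counter merely stands in for the real emergency flush.

```cpp
#include <atomic>
#include <cassert>
#include <signal.h>

namespace sketch {

constexpr int kMaxSignal = 64;                 // assumption: signal numbers below 64
struct sigaction g_previous[kMaxSignal];       // actions that were active before ours
std::atomic<int> g_flush_count{0};             // stand-in for the emergency flush

void chained_handler(int signo) {
    // 1. Do the (placeholder) emergency work first.
    g_flush_count.fetch_add(1, std::memory_order_relaxed);

    // 2. Hand the signal on to whatever was installed before us.
    const struct sigaction& old = g_previous[signo];
    const bool uses_siginfo = (old.sa_flags & SA_SIGINFO) != 0;
    if (!uses_siginfo && old.sa_handler == SIG_IGN) {
        return;                                // previously ignored: nothing to chain to
    }
    if (!uses_siginfo && old.sa_handler != SIG_DFL) {
        old.sa_handler(signo);                 // plain handler: call it directly
        return;
    }
    // Default action or SA_SIGINFO handler: restore it and re-raise, so the
    // process ends (or the old handler runs) as if we had never been installed.
    ::sigaction(signo, &old, nullptr);
    ::raise(signo);
}

bool install_chained(int signo) {
    assert(signo > 0 && signo < kMaxSignal);
    struct sigaction sa{};
    sa.sa_handler = chained_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    // The third argument receives the previous action for later chaining.
    return ::sigaction(signo, &sa, &g_previous[signo]) == 0;
}

}  // namespace sketch
```

Only `sigaction`, `raise`, and lock-free atomics are used inside the handler, all of which are safe to call from signal context on mainstream platforms.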
Async-Signal-Safe Constraints
Signal handlers must follow strict async-signal-safe rules:
Allowed Operations:
- Atomic operations (std::atomic)
- try_lock() (non-blocking mutex attempt)
- Direct system calls (write(), fsync())
Forbidden Operations:
- Memory allocation (new, malloc())
- lock() (blocking mutex)
- Exception throwing
- Non-reentrant library calls
Rationale: The application may crash while holding locks or during memory allocation, so the signal handler cannot rely on these mechanisms.
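To make these constraints concrete, here is a minimal sketch of a flush routine that stays within them: it touches only a preallocated buffer, lock-free atomics, and the `write()`/`fsync()` system calls. It is an illustration under stated assumptions, not the library's code. The fixed-size buffer, the single-producer `append()`, and all names in the `sketch` namespace are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <unistd.h>

namespace sketch {

constexpr std::size_t kCapacity = 4096;
char g_buffer[kCapacity];                       // preallocated: no malloc in the handler
std::atomic<std::size_t> g_size{0};             // bytes currently buffered
std::atomic<bool> g_emergency{false};           // "emergency flag" from the diagram
std::atomic_flag g_flush_busy = ATOMIC_FLAG_INIT;  // non-blocking try-lock substitute

// Normal logging path (assumes a single producer thread). Not called from the handler.
bool append(const char* data, std::size_t len) {
    assert(data != nullptr);
    const std::size_t used = g_size.load(std::memory_order_relaxed);
    if (len > kCapacity - used) return false;   // buffer full
    std::memcpy(g_buffer + used, data, len);
    g_size.store(used + len, std::memory_order_release);
    return true;
}

// Handler-safe flush: never blocks, never allocates, never throws.
bool emergency_flush_to(int fd) {
    g_emergency.store(true, std::memory_order_relaxed);
    if (g_flush_busy.test_and_set(std::memory_order_acquire)) {
        return false;                           // a flush is already in progress: give up
    }
    const std::size_t len = g_size.load(std::memory_order_acquire);
    std::size_t off = 0;
    while (off < len) {
        const ssize_t n = ::write(fd, g_buffer + off, len - off);
        if (n <= 0) break;                      // best-effort: stop on any error
        off += static_cast<std::size_t>(n);
    }
    ::fsync(fd);                                // result ignored; fd may not support it
    g_flush_busy.clear(std::memory_order_release);
    return off == len;
}

}  // namespace sketch
```

The `atomic_flag` plays the role that `try_lock()` plays in the diagram: if the crash interrupted a flush that was already running, the handler skips its own flush instead of waiting.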
Recovery Procedure
After an application restart:
- Detection: Check for emergency flush flag or incomplete log files
- Recovery: Read any crash logs from emergency flush location
- Analysis: Parse crash logs to determine failure cause
- Cleanup: Remove emergency markers
Note: The current implementation covers emergency flushing only. A full recovery API may be added in a future version; until then, the steps above must be implemented by the application.
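As an illustration of the detection, recovery, and cleanup steps, the sketch below checks for a marker file at startup, reads it, and removes it. Everything here is an assumption made for the example: the library does not document a marker file, and `recover_crash_marker` and the marker path are hypothetical names.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <optional>
#include <sstream>
#include <string>
#include <system_error>

namespace sketch {

namespace fs = std::filesystem;

// Returns the marker's contents if a previous run left one behind,
// and removes the marker so the next start is clean.
std::optional<std::string> recover_crash_marker(const fs::path& marker) {
    std::error_code ec;
    if (!fs::exists(marker, ec)) {
        return std::nullopt;                    // Detection: no crash recorded
    }
    std::ostringstream contents;
    {
        std::ifstream in(marker, std::ios::binary);
        contents << in.rdbuf();                 // Recovery: read what was saved
    }
    fs::remove(marker, ec);                     // Cleanup: drop the marker
    return contents.str();                      // Analysis is left to the caller
}

}  // namespace sketch
```

An application would call this once at startup, before installing crash handlers, and log or report the returned text.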
Configuration
Basic Setup
auto safe_logger = crash_safe_logger::create();
if (!safe_logger->install_crash_handlers()) {
std::cerr << "Failed to install crash handlers\n";
return 1;
}
safe_logger->start();
safe_logger->log(log_level::info, "Application started with crash protection");
Advanced Configuration
auto base_logger = std::make_shared<logger>(
    true,   // async mode
    32768   // buffer size in bytes (32KB)
);
auto safe_logger = crash_safe_logger::create(base_logger);
safe_logger->install_crash_handlers();
safe_logger->set_auto_flush_interval(std::chrono::seconds(5));
safe_logger->set_min_level(log_level::debug);
safe_logger->start();
API Reference
crash_safe_logger Class
class crash_safe_logger {
public:
    // Create crash-safe logger (a default underlying logger is created if nullptr).
    static std::shared_ptr<crash_safe_logger> create(
        std::shared_ptr<logger> underlying_logger = nullptr);

    // Install signal handlers for crash detection.
    bool install_crash_handlers();
    // Remove signal handlers.
    void uninstall_crash_handlers();

    // Start the underlying logger.
    common::VoidResult start();
    // Stop the underlying logger.
    common::VoidResult stop();

    // Log message (delegates to underlying logger).
    void log(log_level level, const std::string& message);
    void log(log_level level,
             const std::string& message,
             const std::string& file,
             int line,
             const std::string& function);

    // Flush with timeout to prevent deadlocks.
    bool flush_with_timeout(std::chrono::milliseconds timeout);
    // Emergency flush (async-signal-safe).
    void emergency_flush();
    // Enable auto-flush at regular intervals.
    void set_auto_flush_interval(std::chrono::milliseconds interval);

    // Set minimum log level (thread-safe).
    void set_min_level(log_level level);
    // Get minimum log level (thread-safe).
    log_level get_min_level() const;

    // Get the underlying logger.
    std::shared_ptr<logger> get_underlying_logger();
};
Method Details
create()
Description: Factory method to create a crash-safe logger instance.
Parameters:
underlying_logger – Base logger to wrap (creates default if nullptr)
Returns: std::shared_ptr<crash_safe_logger>
Default Underlying Logger:
- Async mode: true
- Buffer size: 16384 bytes (16KB)
Example:
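// Option 1: wrap a default underlying logger
auto safe_logger = crash_safe_logger::create();
// Option 2: wrap a custom logger (async mode, 64KB buffer)
auto custom_logger = std::make_shared<logger>(true, 65536);
auto custom_safe_logger = crash_safe_logger::create(custom_logger);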
install_crash_handlers()
Description: Installs signal handlers for crash detection.
Returns: bool – true if all handlers installed successfully, false otherwise.
Signals Handled: SIGSEGV, SIGABRT, SIGTERM, SIGINT
Idempotent: Safe to call multiple times (subsequent calls return true without reinstalling).
Example:
auto safe_logger = crash_safe_logger::create();
if (!safe_logger->install_crash_handlers()) {
std::cerr << "Failed to install signal handlers\n";
return 1;
}
uninstall_crash_handlers()
Description: Removes signal handlers and restores original handlers.
Note: Automatically called in destructor.
Example:
safe_logger->uninstall_crash_handlers();
flush_with_timeout()
Description: Flushes logs with a timeout to prevent deadlocks.
Parameters:
timeout – Maximum time to wait for flush completion
Returns: bool – true if flushed successfully, false on timeout.
Use Case: Graceful shutdown where you want to ensure logs are flushed but don't want to wait indefinitely.
Example:
if (!safe_logger->flush_with_timeout(std::chrono::seconds(2))) {
std::cerr << "Warning: Flush timed out\n";
}
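One way a timed flush of this kind could be structured is with `std::timed_mutex::try_lock_for`, which gives up after the deadline instead of blocking forever. This is a sketch under assumptions, not the library's implementation: `timed_flusher` and its callback are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <mutex>
#include <utility>

namespace sketch {

class timed_flusher {
public:
    explicit timed_flusher(std::function<void()> flush_fn)
        : flush_fn_(std::move(flush_fn)) {
        assert(flush_fn_ != nullptr);
    }

    // Returns true if the flush ran, false if the lock could not be
    // acquired within `timeout` (for example, another thread is stuck flushing).
    bool flush_with_timeout(std::chrono::milliseconds timeout) {
        std::unique_lock<std::timed_mutex> lock(flush_mutex_, std::defer_lock);
        if (!lock.try_lock_for(timeout)) {
            return false;
        }
        flush_fn_();
        return true;
    }

private:
    std::timed_mutex flush_mutex_;
    std::function<void()> flush_fn_;
};

}  // namespace sketch
```

Note that the timeout here bounds only the wait for the lock; a flush callback that itself blocks would still need its own deadline.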
emergency_flush()
Description: Performs best-effort flush in signal handler context.
Async-Signal-Safe: Yes (no allocations, non-blocking locks only).
Returns: void. The call never reports failure, but the flush may be incomplete if another thread holds the flush lock at crash time.
Note: Typically called automatically by signal handler, not by user code.
Example:
safe_logger->emergency_flush();
set_auto_flush_interval()
Description: Configures automatic background flushing for data durability.
Parameters:
interval – Time between auto-flushes (0 disables auto-flush)
Spawns Thread: Yes (background thread for periodic flushing).
Example:
// Flush every 5 seconds
safe_logger->set_auto_flush_interval(std::chrono::seconds(5));
// Disable auto-flush
safe_logger->set_auto_flush_interval(std::chrono::milliseconds(0));
Trade-off: More frequent flushes increase durability but may reduce performance.
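The background-thread behaviour described above can be sketched with a condition variable that wakes either when the interval elapses or when shutdown is requested. This is a minimal illustration, assuming a hypothetical `periodic_flusher` class and flush callback; the library's own thread may be organised differently.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

namespace sketch {

class periodic_flusher {
public:
    // An interval of 0 disables auto-flush: no thread is started.
    periodic_flusher(std::chrono::milliseconds interval, std::function<void()> flush_fn)
        : interval_(interval), flush_fn_(std::move(flush_fn)) {
        assert(flush_fn_ != nullptr);
        if (interval_.count() > 0) {
            worker_ = std::thread([this] { run(); });
        }
    }

    ~periodic_flusher() { stop(); }
    periodic_flusher(const periodic_flusher&) = delete;
    periodic_flusher& operator=(const periodic_flusher&) = delete;

    // Safe to call more than once.
    void stop() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_all();
        if (worker_.joinable()) worker_.join();
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        while (!stopping_) {
            // wait_for returns true when the predicate holds, i.e. stop was requested.
            if (wake_.wait_for(lock, interval_, [this] { return stopping_; })) break;
            lock.unlock();
            flush_fn_();        // flush outside the lock so stop() is never blocked by I/O
            lock.lock();
        }
    }

    std::chrono::milliseconds interval_;
    std::function<void()> flush_fn_;
    std::mutex mutex_;
    std::condition_variable wake_;
    bool stopping_ = false;
    std::thread worker_;
};

}  // namespace sketch
```

Waiting on a condition variable rather than sleeping lets shutdown complete immediately instead of waiting out the remainder of the interval.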
set_min_level() / get_min_level()
Description: Thread-safe log level management using atomic operations.
Parameters:
level – Minimum log level to record
Thread-Safe: Yes (uses std::atomic<log_level>).
Example:
safe_logger->set_min_level(log_level::warning);
auto current_level = safe_logger->get_min_level();
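A level filter backed by `std::atomic` can be sketched as follows. The enumerator order shown for `log_level` is an assumption (the guide only names `debug`, `info`, `warning`, `error`, and `fatal`), and `level_gate` is a hypothetical stand-in for the relevant part of the logger.

```cpp
#include <atomic>
#include <cassert>

namespace sketch {

// Assumed ordering: later enumerators are more severe.
enum class log_level : int { debug, info, warning, error, fatal };

class level_gate {
public:
    void set_min_level(log_level level) {
        min_level_.store(level, std::memory_order_relaxed);
    }

    log_level get_min_level() const {
        return min_level_.load(std::memory_order_relaxed);
    }

    // A message is recorded only if it is at least as severe as the minimum.
    bool should_log(log_level level) const {
        return level >= get_min_level();
    }

private:
    std::atomic<log_level> min_level_{log_level::info};
};

}  // namespace sketch
```

Because the level is a single atomic value, readers on the logging hot path never take a lock, and a level change made by one thread becomes visible to others without further synchronization.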
Performance Overhead
Benchmarks
| Configuration | Throughput | Latency | Overhead vs Standard Logger |
| Standard logger | 4.34M msg/sec | 148ns | Baseline |
| Crash-safe (no auto-flush) | 4.28M msg/sec | 151ns | +2% (signal handler check) |
| Crash-safe (5s auto-flush) | 4.15M msg/sec | 155ns | +4.4% (periodic flush) |
| Crash-safe (1s auto-flush) | 3.89M msg/sec | 166ns | +10.4% (frequent flush) |
Test Environment:
- CPU: Intel Xeon E5-2686 v4 (2.3 GHz)
- Compiler: GCC 11.4.0
- Optimization: -O3 -march=native
- Buffer size: 16KB
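The overhead percentages can be re-derived from the table's own figures. The helpers below are a small sketch (the function names are illustrative) that computes relative change against the standard-logger baseline of 4.34M msg/sec and 148ns.

```cpp
#include <cassert>

namespace sketch {

// Percentage of throughput lost relative to the baseline.
inline double throughput_drop_pct(double baseline, double measured) {
    assert(baseline > 0.0);
    return (baseline - measured) / baseline * 100.0;
}

// Percentage by which latency increased relative to the baseline.
inline double latency_increase_pct(double baseline_ns, double measured_ns) {
    assert(baseline_ns > 0.0);
    return (measured_ns - baseline_ns) / baseline_ns * 100.0;
}

}  // namespace sketch

// Using the table's figures:
//   throughput_drop_pct(4.34, 4.28)  is about  1.4
//   throughput_drop_pct(4.34, 4.15)  is about  4.4
//   throughput_drop_pct(4.34, 3.89)  is about 10.4
//   latency_increase_pct(148, 151)   is about  2.0
//   latency_increase_pct(148, 155)   is about  4.7
//   latency_increase_pct(148, 166)   is about 12.2
```

Read this way, the Overhead column appears to mix two bases: the +2% entry matches the latency increase, while the +4.4% and +10.4% entries match the throughput drop. When comparing configurations, it is safer to pick one metric and apply it to every row.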
Overhead Sources
- Signal Handler Check (~2%): Atomic operations for emergency flush flag
- Auto-Flush Thread (~2-8%): Depends on flush frequency and buffer size
- Flush Timeout Logic (~1%): Try-lock and deadline checking
Optimization Tips
Minimize Overhead:
auto base_logger = std::make_shared<logger>(true, 65536);
auto safe_logger = crash_safe_logger::create(base_logger);
safe_logger->set_auto_flush_interval(std::chrono::seconds(10));
Maximize Durability:
auto base_logger = std::make_shared<logger>(true, 8192);
auto safe_logger = crash_safe_logger::create(base_logger);
safe_logger->set_auto_flush_interval(std::chrono::seconds(1));
When to Use
Use Crash-Safe Logger When:
Critical Financial Applications
- Trading systems
- Payment gateways
- Accounting software
Reason: Zero tolerance for data loss; logs required for auditing and compliance.
Medical/Healthcare Systems
- Patient monitoring
- Medical device control
- Electronic health records
Reason: Regulatory requirements (FDA, HIPAA) mandate log completeness.
Debugging Intermittent Crashes
- Hard-to-reproduce bugs
- Production crash analysis
- Post-mortem debugging
Reason: Logs may reveal crash cause that would otherwise be lost.
Compliance Requirements
- SOC 2 auditing
- PCI-DSS logging
- GDPR data processing records
Reason: Audit logs must survive system failures.
Do NOT Use When:
Low-Latency Requirements
- High-frequency trading (sub-microsecond latency)
- Real-time control systems
Reason: 2-10% overhead may be unacceptable; use specialized logging instead.
Development/Testing Environments
- Local development
- Unit tests
Reason: No crash safety needed; standard logger is simpler and faster.
Immutable Infrastructure
- Containers that restart on crash
- Serverless functions
Reason: Logs are already shipped to centralized logging before a crash, so local crash safety adds little.
Combined Usage
Local Crash Safety + Network Forwarding
Combine crash-safe logging locally with network forwarding to a central server.
auto local_file_writer = std::make_unique<rotating_file_writer>(
"/var/log/app/local.log",
10 * 1024 * 1024,
5
);
auto local_logger = std::make_shared<logger>(true, 16384);
local_logger->add_writer(std::move(local_file_writer));
auto safe_logger = crash_safe_logger::create(local_logger);
safe_logger->install_crash_handlers();
safe_logger->set_auto_flush_interval(std::chrono::seconds(5));
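// The variable must not share the type's name: inside its own initializer,
// `network_writer` would refer to the variable and fail to compile.
auto net_writer = std::make_unique<network_writer>(
    "log-server.internal",
    9999
);
local_logger->add_writer(std::move(net_writer));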
safe_logger->start();
safe_logger->log(log_level::info, "Application started with dual logging");
Benefits:
- Local logs survive crashes (emergency flush to disk)
- Centralized logs for aggregation and analysis
- Redundancy: If network fails, local logs still captured
Deployment Topology:
┌─────────────────────────────────────────────────────────┐
│ Application Node │
│ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ crash_safe_logger │ │
│ │ │ │
│ │ ┌────────────────────────────────────────────┐ │ │
│ │ │ underlying logger │ │ │
│ │ │ │ │ │
│ │ │ ┌─────────────────┐ ┌─────────────────┐ │ │ │
│ │ │ │rotating_file │ │network_writer │ │ │ │
│ │ │ │_writer │ │ │ │ │ │
│ │ │ └────────┬────────┘ └────────┬────────┘ │ │ │
│ │ └───────────┼────────────────────┼──────────┘ │ │
│ └──────────────┼────────────────────┼────────────┘ │
│ │ │ │
│ ▼ │ │
│ /var/log/app/local.log │ │
│ (CRASH-SAFE: emergency_flush) │ │
└──────────────────────────────────────┼─────────────────┘
│
│ TCP (network)
│
▼
┌──────────────────┐
│ log_server │
│ (Centralized) │
└────────┬─────────┘
│
▼
/var/log/aggregated/app.log
Server-Side Crash Safety
Protect the log server itself from crashes.
auto server_logger = crash_safe_logger::create();
server_logger->install_crash_handlers();
server_logger->set_auto_flush_interval(std::chrono::seconds(5));
server_logger->start();
server_config config;
config.host = "0.0.0.0";
config.port = 9999;
config.max_connections = 500;
auto server = log_server_factory::create_basic(config);
if (!server->start()) {
server_logger->log(log_level::error, "Failed to start log server");
return 1;
}
server_logger->log(log_level::info,
"Log server started on port " + std::to_string(config.port));
while (server->is_running()) {
server_logger->log(log_level::debug, "Server health check: OK");
std::this_thread::sleep_for(std::chrono::minutes(1));
}
Benefits:
- Server crash logs preserved for debugging
- Server health monitoring with guaranteed persistence
- Operational visibility even during server failures
Complete Production Topology
Full production deployment with all features enabled.
// Application node: local ERROR+ file plus network forwarding
auto base_logger = std::make_shared<logger>(true, 32768);
auto local_writer = std::make_unique<rotating_file_writer>(
"/var/log/app/critical.log",
50 * 1024 * 1024,
10
);
auto filtered_local = std::make_unique<filtered_writer>(
std::move(local_writer),
log_level::error
);
base_logger->add_writer(std::move(filtered_local));
auto network = std::make_unique<network_writer>(
"log-server.prod.internal",
9999
);
base_logger->add_writer(std::move(network));
auto app_logger = crash_safe_logger::create(base_logger);
app_logger->install_crash_handlers();
app_logger->set_auto_flush_interval(std::chrono::seconds(3));
app_logger->set_min_level(log_level::info);
app_logger->start();
// Log server node: crash-safe logger for the server's own operational logs
auto server_logger = crash_safe_logger::create();
server_logger->install_crash_handlers();
server_logger->set_auto_flush_interval(std::chrono::seconds(5));
server_logger->start();
server_config server_conf;
server_conf.host = "0.0.0.0";
server_conf.port = 9999;
server_conf.max_connections = 1000;
server_conf.buffer_size = 65536;
server_conf.enable_compression = true;
auto log_srv = log_server_factory::create_basic(server_conf);
if (!log_srv->start()) {
server_logger->log(log_level::fatal, "Failed to start log server");
return 1;
}
server_logger->log(log_level::info,
"Production log server started with crash protection");
while (log_srv->is_running()) {
server_logger->log(log_level::info, "Server operational");
std::this_thread::sleep_for(std::chrono::minutes(5));
}
Topology Diagram:
┌─────────────────────────────────────────────────────────┐
│ Application Node 1 │
│ │
│ crash_safe_logger │
│ ├─ rotating_file_writer (ERROR+) → /var/log/app/ │
│ └─ network_writer (ALL) → log-server:9999 │
└──────────────────────────────────────┼──────────────────┘
│
┌──────────────────────────────────────┼──────────────────┐
│ Application Node 2 │ │
│ │ │
│ crash_safe_logger │ │
│ ├─ rotating_file_writer (ERROR+) │ │
│ └─ network_writer (ALL) ──────────┘ │
└──────────────────────────────────────────────────────────┘
│
▼
┌──────────────────┐
│ log_server │
│ (crash_safe_logger│
│ protection) │
└────────┬─────────┘
│
┌──────────────┼──────────────┐
│ │ │
▼ ▼ ▼
rotating_file json_writer syslog
(hourly) (for ELK) (legacy)
│ │ │
▼ ▼ ▼
/var/log/ /var/log/ syslog
aggregated/ json/ daemon
Features:
- Client-side crash safety (local ERROR+ logs)
- Server-side crash safety (server operational logs)
- Centralized aggregation (all client logs)
- Multi-tier filtering (local ERROR+, remote ALL)
- High availability (local logs if network fails)
- Compression (reduced network bandwidth)
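The multi-tier filtering in this topology (local ERROR+, remote ALL) follows the decorator pattern: a filtering writer wraps another writer and forwards only messages at or above a threshold. The sketch below illustrates the idea with hypothetical types; `writer`, `memory_writer`, `level_filtered_writer`, and `fan_out` are stand-ins invented for the example, not the library's classes, and the `log_level` ordering is assumed.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

namespace sketch {

enum class log_level : int { debug, info, warning, error, fatal };  // assumed order

struct writer {
    virtual ~writer() = default;
    virtual void write(log_level level, const std::string& message) = 0;
};

// Test double that records messages in a caller-owned vector.
class memory_writer : public writer {
public:
    explicit memory_writer(std::vector<std::string>& sink) : sink_(sink) {}
    void write(log_level, const std::string& message) override {
        sink_.push_back(message);
    }
private:
    std::vector<std::string>& sink_;
};

// Decorator: forwards to the wrapped writer only at or above min_level.
class level_filtered_writer : public writer {
public:
    level_filtered_writer(std::unique_ptr<writer> inner, log_level min_level)
        : inner_(std::move(inner)), min_level_(min_level) {
        assert(inner_ != nullptr);
    }
    void write(log_level level, const std::string& message) override {
        if (level >= min_level_) inner_->write(level, message);
    }
private:
    std::unique_ptr<writer> inner_;
    log_level min_level_;
};

// Minimal logger that sends every message to all registered writers.
class fan_out {
public:
    void add_writer(std::unique_ptr<writer> w) { writers_.push_back(std::move(w)); }
    void log(log_level level, const std::string& message) {
        for (auto& w : writers_) w->write(level, message);
    }
private:
    std::vector<std::unique_ptr<writer>> writers_;
};

}  // namespace sketch
```

Because the filter lives in the writer rather than in the logger, each destination can apply its own threshold to the same stream of messages.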
Best Practices
1. Choose Appropriate Auto-Flush Interval
| Scenario | Recommended Interval | Rationale |
| Financial transactions | 1-2 seconds | Minimal data loss acceptable |
| Web applications | 5-10 seconds | Balance durability and performance |
| Batch processing | 30-60 seconds | Performance priority |
| Development | Disabled (0) | Maximize performance |
2. Test Signal Handlers
Crash Test Program:
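#include <chrono>
#include <string>
#include <thread>
// Assumes the project's include/ directory is on the include path.
#include <kcenon/logger/safety/crash_safe_logger.h>

void crash_test() {
    auto safe_logger = crash_safe_logger::create();
    safe_logger->install_crash_handlers();
    safe_logger->set_auto_flush_interval(std::chrono::seconds(5));
    safe_logger->start();
    // Emit messages so the buffer holds unflushed data when the crash occurs.
    for (int i = 0; i < 100; ++i) {
        safe_logger->log(log_level::info, "Message before crash: " + std::to_string(i));
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
    safe_logger->log(log_level::fatal, "About to crash!");
    // Deliberately write through a null pointer to raise SIGSEGV.
    // volatile keeps the compiler from optimizing the store away.
    volatile int* ptr = nullptr;
    *ptr = 42;
}

int main() {
    crash_test();
    return 0;
}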
Verify Emergency Flush:
$ ./crash_test
# Application crashes
$ tail /var/log/app.log
...
2025-02-09 14:35:22 [FATAL] About to crash!
# Last log message was flushed before crash
3. Avoid Signal Handler Conflicts
Check Existing Handlers:
#include <csignal>
#include <iostream>

void check_existing_handlers() {
    struct sigaction sa;
    // Passing nullptr as the new action only queries the current one.
    sigaction(SIGSEGV, nullptr, &sa);
    if (sa.sa_handler != SIG_DFL) {
        std::cerr << "Warning: SIGSEGV handler already installed\n";
    }
}
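Multiple Libraries: If other libraries (e.g., sanitizers, profilers) installed signal handlers before install_crash_handlers() is called, crash_safe_logger chains to them after its emergency flush. Handlers installed afterwards replace the crash-safe ones.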
4. Graceful Shutdown
Always Flush Before Exit:
void graceful_shutdown(std::shared_ptr<crash_safe_logger> logger) {
    logger->log(log_level::info, "Shutting down...");
    // Bound the wait so a stuck writer cannot hang shutdown.
    if (!logger->flush_with_timeout(std::chrono::seconds(5))) {
        std::cerr << "Warning: Flush timeout during shutdown\n";
    }
    logger->uninstall_crash_handlers();
}

int main() {
    auto logger = crash_safe_logger::create();
    logger->install_crash_handlers();
    // ... application code ...
    graceful_shutdown(logger);
    return 0;
}
Troubleshooting
Problem: Emergency flush not working
Symptoms: Logs missing after crash.
Debugging:
safe_logger->log(log_level::info, "Testing crash protection");
int* ptr = nullptr;
*ptr = 42;
$ tail /var/log/app.log
# Missing "Testing crash protection" message
Possible Causes:
- Signal handlers not installed
auto safe_logger = crash_safe_logger::create();
// install_crash_handlers() is never called, so no emergency flush runs
Solution: Call install_crash_handlers() after creation.
- Buffer not flushed
auto base_logger = std::make_shared<logger>(true, 1048576);  // 1MB buffer
Solution: Use smaller buffers (16KB-64KB) for critical logs.
- Lock held during crash
Solution: This is an inherent limitation of the try_lock() approach; a shorter auto-flush interval reduces the amount of data at risk.
Problem: High CPU usage from auto-flush
Symptoms: CPU usage spikes every N seconds.
Debugging:
$ top -p $(pgrep -f my_app)
PID USER CPU% COMMAND
1234 user 35.0 my_app
# Periodic spikes align with auto-flush interval
Possible Causes:
- Auto-flush interval too short
safe_logger->set_auto_flush_interval(std::chrono::milliseconds(100));
Solution: Increase interval to 5-10 seconds.
- Large buffer size
auto base_logger = std::make_shared<logger>(true, 1048576);
Solution: Reduce buffer size or increase interval.
Problem: Signal handler conflicts
Symptoms: Application crashes without emergency flush.
Debugging:
#include <csignal>
void verify_handlers() {
struct sigaction sa;
sigaction(SIGSEGV, nullptr, &sa);
if (sa.sa_handler != &crash_safe_logger::signal_handler) {
std::cerr << "Warning: Signal handler overwritten\n";
}
}
Possible Causes:
- Another library installed handlers after crash_safe_logger
safe_logger->install_crash_handlers();
some_library_init();
Solution: Install crash-safe handlers LAST.
- Sanitizers (ASan, TSan) override handlers
$ ASAN_OPTIONS=handle_segv=1 ./my_app
# AddressSanitizer overrides SIGSEGV handler
Solution: Use handle_segv=0 to let crash-safe logger handle signals.
Related Documentation
Header Files
External Resources