2025-02-14 · Jonas Meyer
WAL growth spikes: a field notebook
What we log first when archives balloon, and how to separate checkpoint noise from genuine write amplification.
When archives climb faster than forecasts, teams often jump to checkpoint tuning. That can help, but it can also hide a burstier problem: long transactions pinning old segments or an accidental full-table rewrite during business hours.
Start with a simple timeline: archive bytes per hour, checkpoint distance, and concurrent write sessions. Layer application deploy markers so you can correlate code releases with slope changes. In regulated environments we still favor plain language incident notes — finance readers care about customer impact, not jargon.
Finally, document what you chose not to do. Skipping a hasty max_wal_size bump can be the right call if it trades away predictability. Your future self will appreciate the rationale when the next spike arrives.
Tags: PostgreSQL, Operations, WAL