
Taming large datasets in PostgreSQL: archiving and retention without the pain
PostgreSQL databases tend to grow silently. One day, you notice queries slowing down, backups taking forever, and no one remembering what is in that 500 GB log table. If this sounds familiar, you’re not alone.
Partitioning makes things manageable
If your database has one massive table that keeps growing, partitioning can be a lifesaver. PostgreSQL has supported native declarative partitioning since version 10, and recent versions (13 and up) have made it much easier to work with.
You don’t need any extensions — you can define a parent table and split it by range, list, or hash. For example, for time-based data:
CREATE TABLE events (
    id serial,
    created_at timestamptz,
    payload jsonb
) PARTITION BY RANGE (created_at);
What makes partitioning useful is how it improves query planning and makes deleting old data trivial (dropping a partition is faster and cleaner than a DELETE).
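For the events table above, a sketch might look like this (partition names and date ranges are just examples):
-- A partition holding one month of data.
CREATE TABLE events_2025_01 PARTITION OF events
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');

-- Retiring that month later is a single, fast DDL statement:
DROP TABLE events_2025_01;
-- (or ALTER TABLE events DETACH PARTITION events_2025_01; if you want to archive it first)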
To migrate an existing table, you can:
- Create a partitioned version of the table
- Move data in batches using scripts (see the sketch after this list)
- Add triggers to route new inserts into the partitioned version
- Swap the old and new tables once the migration is complete
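A minimal sketch of the batch-move step, assuming the original table has been renamed to events_old and the new partitioned table is called events (both names are hypothetical, and both tables share the same columns):
-- Move rows in batches of 10,000 so the migration never holds long locks
-- or piles everything into one huge transaction.
DO $$
DECLARE
    moved bigint;
BEGIN
    LOOP
        WITH batch AS (
            DELETE FROM events_old
            WHERE id IN (SELECT id FROM events_old LIMIT 10000)
            RETURNING *
        )
        INSERT INTO events SELECT * FROM batch;

        GET DIAGNOSTICS moved = ROW_COUNT;
        EXIT WHEN moved = 0;
        COMMIT;  -- transaction control in DO blocks works from PostgreSQL 11 on, when run outside an explicit transaction
    END LOOP;
END $$;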
Note: Define indexes on the partitioned table itself — PostgreSQL will automatically create corresponding indexes on all partitions, including ones added in the future. You can’t use CREATE INDEX CONCURRENTLY on the partitioned parent, though.
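For instance, one statement on the parent indexes every partition (the index name below is just an example):
-- Cascades to all current partitions and is applied automatically to future ones.
CREATE INDEX events_created_at_idx ON events (created_at);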
Compression: saving space on cold data
PostgreSQL uses a mechanism called TOAST (The Oversized-Attribute Storage Technique) to handle large field values such as text, jsonb, and bytea. By default, it compresses these using the pglz algorithm.
Since PostgreSQL 14, you can explicitly choose a compression algorithm per column — if your build includes LZ4 support (--with-lz4), you can use the faster, more CPU-efficient lz4 algorithm:
CREATE TABLE logs_lz4 (
    id serial PRIMARY KEY,
    message text COMPRESSION lz4
);
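You can also switch the method on an existing column, or change the default for new columns. Note that values already stored keep their old compression until they are rewritten (a dump and restore, for example, re-applies the new setting). A minimal sketch, assuming a pre-existing table called logs:
-- Only newly stored values use lz4; existing ones stay as they were until rewritten.
ALTER TABLE logs ALTER COLUMN message SET COMPRESSION lz4;

-- Session-level default for columns that don't specify a method explicitly
-- (use ALTER SYSTEM or postgresql.conf to make it cluster-wide).
SET default_toast_compression = 'lz4';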
Bonus: compressing backups
While not related to table storage, keep in mind that pg_dump also supports compression. The newest method is zstd; gzip and lz4 are also available:
pg_dump --format=custom --compress=zstd:3 > dump.zstd
This is a great option for archiving cold data outside the database while saving disk space.
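Restoring such an archive later works as usual with pg_restore (the target database name below is just an example):
pg_restore --dbname=archive_db dump.zstd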
Archiving old data outside PostgreSQL
You don’t have to store everything in your main PostgreSQL instance forever. Cold data can be moved to cheaper, slower systems — and PostgreSQL can still query it.
The most common approach is to use foreign data wrappers (FDWs) and related extensions, such as:
- postgres_fdw to offload old partitions to another PostgreSQL instance
- file_fdw to access data stored in CSV files
- pg_parquet to work with Parquet files on local or cloud storage
These let you retain the structure of your data while keeping your main instance lean. You can also export data to S3 or cloud storage and use external tools to access it later if needed.
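As a minimal sketch of the postgres_fdw approach (the server name, connection details, credentials, and table names below are assumptions):
-- On the main instance: point a foreign server at the archive instance.
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER archive_srv
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'archive.internal', dbname 'archive', port '5432');

CREATE USER MAPPING FOR CURRENT_USER
    SERVER archive_srv
    OPTIONS (user 'archive_reader', password 'secret');

-- Expose an archived table; queries against it are sent to the remote server.
CREATE FOREIGN TABLE events_2020 (
    id bigint,
    created_at timestamptz,
    payload jsonb
) SERVER archive_srv OPTIONS (schema_name 'public', table_name 'events_2020');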
Automating retention policies
A solid retention strategy:
- Defines how long data is kept (e.g. 3 years)
- Uses tools like pg_cron, pg_timetable, or external schedulers to automate cleanup
- Aligns with legal and business requirements — especially in regulated industries
Combine partitions + retention logic to drop old data efficiently without scanning huge tables.
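A hedged sketch of what that can look like with pg_cron: the procedure below assumes the monthly naming convention events_YYYY_MM used earlier, and the job name, schedule, and retention window are examples.
-- Drops monthly partitions of "events" that fall entirely outside the retention window.
CREATE OR REPLACE PROCEDURE drop_expired_event_partitions(retention interval DEFAULT '3 years')
LANGUAGE plpgsql AS $$
DECLARE
    part regclass;
BEGIN
    FOR part IN
        SELECT c.oid::regclass
        FROM pg_inherits i
        JOIN pg_class c ON c.oid = i.inhrelid
        WHERE i.inhparent = 'events'::regclass
          AND to_date(substring(c.relname FROM 'events_(\d{4}_\d{2})'), 'YYYY_MM')
              < date_trunc('month', now() - retention)
    LOOP
        EXECUTE format('DROP TABLE %s', part);
    END LOOP;
END $$;

-- Run it at 03:00 on the first day of every month.
SELECT cron.schedule('drop-expired-events', '0 3 1 * *',
                     'CALL drop_expired_event_partitions()');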
Note: If you’re using DELETE instead of dropping old partitions, remember to run VACUUM — not to shrink the table on disk, but to mark the deleted rows’ space as reusable for future inserts and updates.
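For example (the non-partitioned table legacy_logs and the retention window are hypothetical):
-- Remove expired rows, then let VACUUM mark the freed space as reusable.
DELETE FROM legacy_logs WHERE created_at < now() - interval '3 years';
VACUUM legacy_logs;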
If you’re in the cloud
Cloud deployments bring some extra considerations:
- Managed services like RDS or Cloud SQL may restrict tuning parameters or disallow certain extensions
- Use storage tiers (e.g., AWS Glacier, GCP Archive) for low-cost long-term archiving
- Leverage built-in automation like Cloud Scheduler or AWS Lambda where available
In some setups, you can use logical replication to stream old data to a separate instance dedicated to cold storage or analytics.
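A minimal sketch of that setup; the publication, subscription, and connection details below are placeholders:
-- On the primary: publish the table(s) you want mirrored to the cold-storage/analytics instance.
CREATE PUBLICATION cold_events FOR TABLE events;

-- On the cold-storage instance (which needs a matching table definition):
CREATE SUBSCRIPTION cold_events_sub
    CONNECTION 'host=primary.internal dbname=app user=replicator'
    PUBLICATION cold_events;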
Wrapping up
Archiving and retention aren’t just about saving disk space. They’re about keeping your PostgreSQL instance fast, maintainable, and legally compliant. Partitioning, compression, and automation are powerful tools when used together — and none of them require fancy enterprise solutions or huge migrations.
If you’re working with a bloated schema or legacy system, you’re not alone. There’s a clear and practical path forward. I’ll explore some of these techniques with practical examples during my talk at PostgreSQL Conference Germany 2025 next week. Hope to see you there!