In Anchore, PostgreSQL is used to store the catalog of image contents (SBOMs), policy information, and data from vulnerability feeds. Over time, Anchore databases continuously increase in size. This growth, if allowed to continue unchecked, may eventually cause PostgreSQL to exhaust available storage. Therefore, it is important to perform regular maintenance to ensure PostgreSQL consumes no more storage than necessary.
Update, delete, and insert operations may leave behind dead rows (tuples) which, although no longer required, continue to occupy storage space. PostgreSQL provides a process known as vacuuming which, among other functions, recovers space occupied by dead rows.
PostgreSQL provides two vacuum methods: VACUUM and VACUUM FULL. VACUUM FULL can reclaim more disk space but runs much slower than VACUUM and requires an exclusive lock on the table it is working on. In addition, VACUUM FULL requires extra disk space, as it creates a new copy of the table and does not release the old copy until the operation is complete.
In contrast, VACUUM can run in parallel with most other database operations. Nonetheless, VACUUM is fairly I/O intensive and may have a negative effect on overall performance.
The ANALYZE command is often used together with VACUUM. ANALYZE collects database statistics and is useful for identifying tables in need of vacuuming.
PostgreSQL has an optional but highly recommended feature called autovacuum, whose purpose is to automate the execution of VACUUM and ANALYZE commands. When enabled, autovacuum checks for tables that have had a large number of inserted, updated or deleted tuples and vacuums tables as needed.
Keeping autovacuum enabled so that dead rows are reclaimed on a regular basis is highly recommended. Even so, running VACUUM manually may occasionally be required.
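Autovacuum behavior is controlled in postgresql.conf (or via ALTER SYSTEM). The fragment below shows the relevant settings with PostgreSQL's shipped default values; treat it as a starting point and tune the thresholds for your workload:

```
autovacuum = on                        # enable the autovacuum launcher
autovacuum_naptime = 1min              # minimum delay between autovacuum runs on a database
autovacuum_vacuum_threshold = 50       # minimum number of dead tuples before VACUUM
autovacuum_vacuum_scale_factor = 0.2   # fraction of table size added to the threshold
autovacuum_analyze_threshold = 50      # minimum number of changed tuples before ANALYZE
autovacuum_analyze_scale_factor = 0.1  # fraction of table size added to the threshold
```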
Identify tables in need of vacuuming
The following database queries will help you determine which tables should be vacuumed to reclaim storage space.
-- check the last vacuum/autovacuum
SELECT relname, last_vacuum, last_autovacuum FROM pg_stat_user_tables;
-- check autovacuum settings to determine whether the threshold should be adjusted
SELECT * FROM pg_settings WHERE name LIKE '%autovacuum%';
-- find dead rows in the tables
SELECT relname, n_dead_tup FROM pg_stat_user_tables;
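Per the PostgreSQL documentation, autovacuum considers a table due for vacuuming once its dead tuples exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples. A minimal sketch of that calculation, useful for judging whether a table reported by the queries above is overdue:

```python
def autovacuum_due(n_dead_tup, reltuples,
                   vacuum_threshold=50, vacuum_scale_factor=0.2):
    """Return True if autovacuum would consider this table due for VACUUM.

    Mirrors PostgreSQL's trigger condition:
        n_dead_tup > vacuum_threshold + vacuum_scale_factor * reltuples
    Default parameter values match PostgreSQL's shipped settings.
    """
    return n_dead_tup > vacuum_threshold + vacuum_scale_factor * reltuples

# With the defaults, a 1,000,000-row table is vacuumed only after
# more than 200,050 dead rows have accumulated.
print(autovacuum_due(150_000, 1_000_000))  # False: below threshold
print(autovacuum_due(250_000, 1_000_000))  # True: above threshold
```

If large tables accumulate far more dead rows than this before autovacuum fires, lowering autovacuum_vacuum_scale_factor for those tables is a common adjustment.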
You can also list out the 30 biggest tables by size; this may give you a better idea of where space is being used.
-- list top 30 tables by amount of space used
SELECT nspname || '.' || relname AS "relation",
pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema')
AND C.relkind <> 'i'
AND nspname !~ '^pg_toast'
ORDER BY pg_total_relation_size(C.oid) DESC
LIMIT 30;
Finally, this query will give you all the tables and their sizes:
SELECT *, pg_size_pretty(total_bytes) AS total
, pg_size_pretty(index_bytes) AS INDEX
, pg_size_pretty(toast_bytes) AS toast
, pg_size_pretty(table_bytes) AS TABLE
FROM (
SELECT *, total_bytes-index_bytes-COALESCE(toast_bytes,0) AS table_bytes FROM (
SELECT c.oid, nspname AS table_schema, relname AS TABLE_NAME
, c.reltuples AS row_estimate
, pg_total_relation_size(c.oid) AS total_bytes
, pg_indexes_size(c.oid) AS index_bytes
, pg_total_relation_size(reltoastrelid) AS toast_bytes
FROM pg_class c
LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE relkind = 'r'
) a
) a;
After identifying a table with a large number of dead rows, run the following command:
VACUUM (VERBOSE, ANALYZE) tablename;
We recommend stopping all Anchore services before running VACUUM to minimize database activity during the vacuuming process.
Run VACUUM FULL
If running VACUUM was not sufficient to reclaim storage space, you will need to run VACUUM FULL. Stop all Anchore services before running the command.
VACUUM (FULL, ANALYZE, VERBOSE) tablename;
Because VACUUM FULL is time-consuming and requires an exclusive lock on each table it processes, you should schedule downtime while it runs. You must also ensure that sufficient disk space is available for the new copies of the tables.
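If you prefer to run maintenance from the shell rather than from a psql session, PostgreSQL's bundled vacuumdb utility wraps the same commands. The listing below assumes a database named anchore and a table named tablename; substitute your actual names:

```
# equivalent of VACUUM (VERBOSE, ANALYZE) tablename;
vacuumdb --verbose --analyze --table=tablename anchore

# equivalent of VACUUM (FULL, ANALYZE, VERBOSE) tablename;
vacuumdb --full --verbose --analyze --table=tablename anchore
```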
Finally, prevent the storage of unnecessary data to keep database storage requirements to a minimum:
- Delete any vulnerability feeds you do not need.
- Delete image data that is no longer required. Image deletion can be done manually or scheduled so that data is deleted automatically once it has reached a specified age. Note that archiving images, rather than deleting them, removes them from the working set but not from the database. Archiving images is likely to benefit Anchore performance but has no effect on database size.