We have some cron jobs that remove old data, but recently, as the amount of data has grown, they have started causing I/O spikes. The internet suggested the problem was deleting everything in one transaction:
DELETE FROM foo WHERE some condition;
We found an example of chunked deletes in T-SQL, but porting the loop to PL/pgSQL proved… problematic.
It would be nice to simply write:
DELETE FROM foo WHERE <condition> LIMIT 100;
But that syntax doesn’t exist. The easiest workaround seems to be a subquery that finds the ids of the rows to delete:
DELETE FROM foo WHERE id = any(array(SELECT id FROM foo WHERE <condition> LIMIT 100));
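As a concrete sketch, suppose the old rows are identified by a created_at timestamp (a hypothetical column standing in for <condition> here); an index on that column keeps the inner SELECT cheap:

```sql
-- Hypothetical schema: assumes foo has a created_at column. An index lets
-- the inner SELECT find candidate ids without scanning the whole table.
CREATE INDEX IF NOT EXISTS foo_created_at_idx ON foo (created_at);

-- Delete at most 100 matching rows per statement.
DELETE FROM foo
WHERE id = any(array(
    SELECT id FROM foo
    WHERE created_at < now() - interval '90 days'
    LIMIT 100
));
```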
And of course it’s useful to know how many rows were actually deleted (rather than the hopeful batch size):
WITH deleted AS (
    DELETE FROM foo
    WHERE id = any(array(SELECT id FROM foo WHERE <condition> LIMIT 100))
    RETURNING id
)
SELECT count(*) FROM deleted;
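For what it's worth, the server-side loop can be sketched in PL/pgSQL too; one reason the port is fiddly is that without intermediate commits (allowed in a DO block only on PostgreSQL 11+, and only outside an explicit transaction) the whole loop still runs as a single transaction, defeating the purpose. A sketch, reusing the hypothetical created_at condition from above:

```sql
DO $$
DECLARE
    n integer;
BEGIN
    LOOP
        WITH deleted AS (
            DELETE FROM foo
            WHERE id = any(array(
                SELECT id FROM foo
                WHERE created_at < now() - interval '90 days'  -- hypothetical condition
                LIMIT 100
            ))
            RETURNING id
        )
        SELECT count(*) INTO n FROM deleted;
        EXIT WHEN n = 0;
        COMMIT;              -- PostgreSQL 11+; fails inside an explicit transaction
        PERFORM pg_sleep(1); -- pause between batches
    END LOOP;
END
$$;
```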
It’s easier to do the loop in bash, if you can persuade psql to return that count:
while :
do
    VALUE=$(psql -qtA -d "$DBNAME" -c "WITH deleted AS ...")
    echo "deleted $VALUE"
    if [ "$VALUE" -eq 0 ]
    then
        break
    fi
    sleep 1
done
There are two parameters to play with: the batch size and the delay between loops. It’s pretty straightforward to find how fast you can delete without incurring a noticeable I/O penalty.
Just remember that you won’t necessarily get that disk space back: it’s just made available for re-use, unless you perform some compaction.
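To illustrate what "compaction" means here: a plain VACUUM only makes the dead rows' space reusable within the table, while VACUUM FULL rewrites the table and returns space to the operating system, at the cost of an exclusive lock for the duration:

```sql
-- Plain VACUUM marks the deleted rows' space as reusable by future
-- inserts/updates in the same table, but does not shrink the file:
VACUUM foo;

-- VACUUM FULL rewrites the table and returns space to the OS,
-- but takes an ACCESS EXCLUSIVE lock while it runs:
VACUUM FULL foo;

-- Compare the on-disk size before and after:
SELECT pg_size_pretty(pg_total_relation_size('foo'));
```

Tools like pg_repack can perform a similar rewrite without holding the long lock, if downtime is a concern.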