You may be familiar with the use of a database upsert of MERGE operation to insert a record into a table or update an existing record, if that record already exists. This evaluates the condition for finding the record only once, and is therefore more efficient than other alternatives. How can you efficiently handle a reverse operation of updating a record and deleting it if some condition holds?
The concrete problem involves reducing a counter in a database record, and deleting the record when the counter reaches zero. In most cases the counter is one; think of goods in a shopping basket. The naive way involves two SQL statements and two searches in the table: a
SELECT to fetch the counter value, followed by a
DELETE or an
One alternative involves an
UPDATE … RETURNING, followed by a
DELETE only if the returned value of the counter is zero. However, this performs an unneeded record change in the case of a counter having a value of one, optimizing the uncommon case (counter has a value higher than one) at the expense of the expected common case.
Another alternative, which indeed optimizes the expected common case, involves a
DELETE … WHERE … AND counter = 1, followed by an
UPDATE when no deletion takes place.
Both alternatives may require a wasteful second search for the record in the table. I thought there would be a method for this operation with an efficiency equivalent to that of an upsert operation, but a search, followed by a question on StackExchange didn't turn out the answer I was looking for. One comment suggested streamlining the two operations with an SQL trigger, while another cleverly recommended running an
IF after a conditional
DELETE for the common case. Both again involved duplicate searching, which was what I thought would be wasteful.
In the end, I dove into SQL cursors and came up with the following solution (for PostgreSQL).
DO $$ -- Create a cursor pointing to the record DECLARE curs CURSOR FOR SELECT counter FROM elements WHERE … FOR UPDATE; value integer; BEGIN -- Find the record OPEN curs; FETCH FROM curs into value; -- Decide whether to decrement or delete IF value > 1 THEN UPDATE elements SET counter = counter - 1 WHERE CURRENT OF curs; ELSE DELETE FROM elements WHERE CURRENT OF curs; END IF; CLOSE curs; END $$;
In response to this idea, Erwin Brandstetter, a top-rated database expert on StackExchange and author of the clever answer optimizing the common case, suggested that the cost of creating cursors would drown the cost of twice-searching for the record to update/delete. I therefore set out to measure the performance of the two alternatives.
First I generated a synthetic table of one million records where 80% of them had a counter with a value of one, to match Erwin's optimized case.
DROP TABLE IF EXISTS elements; CREATE TABLE elements( id integer PRIMARY KEY NOT NULL, description CHAR(32), counter integer); INSERT INTO elements SELECT generate_series(1, 1e6) AS id, md5(random()::text) AS description, 3 - least((random()*3 + 1)::integer, 2) as counter;
Then, I created two loops (one for each case) for performing the operation on a set of randomly selected records (about 630k).
DO $$ DECLARE r record; BEGIN FOR r in SELECT DISTINCT id FROM (SELECT generate_series(1, 1e6) AS x, (random() * 1e6)::integer AS id) as ids LOOP DELETE FROM elements WHERE id = r.id AND counter = 1; -- common case first! IF NOT FOUND THEN UPDATE elements SET counter = counter - 1 WHERE id = r.id; END IF; END LOOP; END $$;
DO $$ DECLARE r record; value integer; curs CURSOR FOR SELECT counter FROM elements WHERE id = r.id FOR UPDATE; BEGIN FOR r in SELECT DISTINCT id FROM (SELECT generate_series(1, 1e6) AS x, (random() * 1e6)::integer AS id) as ids LOOP OPEN curs; FETCH FROM curs into value; IF value > 1 THEN UPDATE elements SET counter = counter - 1 WHERE CURRENT OF curs; ELSE DELETE FROM elements WHERE CURRENT OF curs; END IF; CLOSE curs; END LOOP; END $$;
The results were not what I expected them to be. The delete/update method took 11.5s to run, whereas the cursor method took 15.0s. I also experimented with other compositions of the counter values that were less favourable to the delete/update method, but even then it beat the cursor method. Specifically, with 1/2/3 values composed in a 62.5%/25%/12.5% mix the delete/update method took 8.4s, whereas the cursor method took 13.4s. Even with 1/2/3 values composed in 25%/50%/25% mix the delete/update method took 10.9s, whereas the cursor method took 15.0s. These numbers come from a Xeon E5-2690 v4 CPU running at 2.60GHz with PostgreSQL 12.2 using a ZFS SSD disk.
To me the results demonstrate the blindingly fast record access speed that PostgreSQL can achieve. One might think that on storage media with worse performance the numbers could turn out different. Indeed, I saw the cursor method outperform the delete/update one on a Raspberry Pi with an SSD disk, and we can assume that rotational media would also perform badly. However, in realistic scenarios the performance of indices stored on magnetic disks would be boosted through ample RAM caching, which would diminish their disadvantage.Read and post comments, or share through
Last modified: Friday, December 11, 2020 0:35 am
Unless otherwise expressly stated, all original material on this page created by Diomidis Spinellis is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.