blog dds: 2020-12-10 — Fast database UPDATE/DELETE operations

You may be familiar with the use of a database upsert of MERGE operation to insert a record into a table or update an existing record, if that record already exists. This evaluates the condition for finding the record only once, and is therefore more efficient than other alternatives. How can you efficiently handle a reverse operation of updating a record and deleting it if some condition holds?

The concrete problem involves reducing a counter in a database record, and deleting the record when the counter reaches zero. In most cases the counter is one; think of goods in a shopping basket. The naive way involves two SQL statements and two searches in the table: a SELECT to fetch the counter value, followed by a DELETE or an UPDATE.

One alternative involves an UPDATE … RETURNING, followed by a DELETE only if the returned value of the counter is zero. However, this performs an unneeded record change in the case of a counter having a value of one, optimizing the uncommon case (counter has a value higher than one) at the expense of the expected common case.

Another alternative, which indeed optimizes the expected common case, involves a DELETE … WHERE … AND counter = 1, followed by an UPDATE when no deletion takes place.

Both alternatives may require a wasteful second search for the record in the table. I thought there would be a method for this operation with an efficiency equivalent to that of an upsert operation, but a search, followed by a question on StackExchange didn’t turn out the answer I was looking for. One comment suggested streamlining the two operations with an SQL trigger, while another cleverly recommended running an IF after a conditional DELETE for the common case. Both again involved duplicate searching, which was what I thought would be wasteful.

A faster solution?

In the end, I dove into SQL cursors and came up with the following solution (for PostgreSQL).

DO
$$
-- Create a cursor pointing to the record
DECLARE
  curs CURSOR FOR SELECT counter FROM elements WHERE … FOR UPDATE;
  value integer;
BEGIN
  -- Find the record
  OPEN curs;
  FETCH FROM curs into value;
  -- Decide whether to decrement or delete
  IF value > 1 THEN
    UPDATE elements
      SET counter = counter - 1
      WHERE CURRENT OF curs;
  ELSE
    DELETE FROM elements
      WHERE CURRENT OF curs;
  END IF;
  CLOSE curs;
END
$$;

Measuring performance

In response to this idea, Erwin Brandstetter, a top-rated database expert on StackExchange and author of the clever answer optimizing the common case, suggested that the cost of creating cursors would drown the cost of twice-searching for the record to update/delete. I therefore set out to measure the performance of the two alternatives.

First I generated a synthetic table of one million records where 80% of them had a counter with a value of one, to match Erwin’s optimized case.

DROP TABLE IF EXISTS elements;
CREATE TABLE elements(
  id integer PRIMARY KEY NOT NULL,
  description CHAR(32),
  counter integer);

INSERT INTO elements
  SELECT generate_series(1, 1e6) AS id,
    md5(random()::text) AS description,
    3 - least((random()*3 + 1)::integer, 2) as counter;

Then, I created two loops (one for each case) for performing the operation on a set of randomly selected records (about 630k).

DELETE/UPDATE method

DO
$$
DECLARE r record;
BEGIN
  FOR r in SELECT DISTINCT id FROM
    (SELECT generate_series(1, 1e6) AS x, (random() * 1e6)::integer AS id) as ids
  LOOP
    DELETE FROM elements WHERE id = r.id AND counter = 1;  -- common case first!

    IF NOT FOUND THEN
      UPDATE elements SET counter = counter - 1 WHERE id = r.id;
    END IF;
  END LOOP;
END
$$;

Cursor method

DO
$$
DECLARE
  r record;
  value integer;
  curs CURSOR FOR SELECT counter FROM elements WHERE id = r.id FOR UPDATE;
BEGIN
  FOR r in SELECT DISTINCT id FROM
    (SELECT generate_series(1, 1e6) AS x, (random() * 1e6)::integer AS id) as ids
  LOOP
    OPEN curs;
    FETCH FROM curs into value;
    IF value > 1 THEN
      UPDATE elements
        SET counter = counter - 1
        WHERE CURRENT OF curs;
    ELSE
      DELETE FROM elements
        WHERE CURRENT OF curs;
    END IF;
    CLOSE curs;
  END LOOP;
END
$$;

Results

The results were not what I expected them to be. The delete/update method took 11.5s to run, whereas the cursor method took 15.0s. I also experimented with other compositions of the counter values that were less favourable to the delete/update method, but even then it beat the cursor method. Specifically, with 1/2/3 values composed in a 62.5%/25%/12.5% mix the delete/update method took 8.4s, whereas the cursor method took 13.4s. Even with 1/2/3 values composed in 25%/50%/25% mix the delete/update method took 10.9s, whereas the cursor method took 15.0s. These numbers come from a Xeon E5-2690 v4 CPU running at 2.60GHz with PostgreSQL 12.2 using a ZFS SSD disk.

To me the results demonstrate the blindingly fast record access speed that PostgreSQL can achieve. One might think that on storage media with worse performance the numbers could turn out different. Indeed, I saw the cursor method outperform the delete/update one on a Raspberry Pi with an SSD disk, and we can assume that rotational media would also perform badly. However, in realistic scenarios the performance of indices stored on magnetic disks would be boosted through ample RAM caching, which would diminish their disadvantage.

Comments Post Toot! Tweet Share