Are all your UPDATE useful ?
I’m working close to modern development teams for more than 10 years, these teams are used to develop with ORM to accelerate and simplify the hard work of software development. From a DBA point of view as powerful as theses tools are they introduced some inconvenient like taking the database far away from developers, only if they are not enough curious to dig in it. One of the problems that I have very often come across is what I called useless UPDATEs. Without realising it the software can generate UPDATE on the database taht do not change anything.
To be very explicit, you UPDATE one or more rows with the values they contains.
id | name | length
----+---------------+--------
3 | Pen Duick II | 13.6
2 | Pen Duick III | 17.45
1 | Pen Duick IV | 20.8
UPDATE boats SET length = 13.6 where id = 3;
UPDATE 1
SELECT * FROM boats order by id desc;
id | name | length
----+---------------+--------
3 | Pen Duick II | 13.6
2 | Pen Duick III | 17.45
1 | Pen Duick IV | 20.8
(3 rows)
In the above exemple I do an UPDATE that change nothing, even if this not a problem from a feature or product point of view, if it’s not a problem on low traffic and volume application you will encounter hard effect when the charge increase a lot. For PostgreSQL a new row is created and an old one will become invisible, for details have a look on how MVCC works and keep in mind that UPDATE table SET i = 1 WHERE i=1 has effect.
So to help teams work on this kind of issue I wrote a very basic extension pg_upless that helps doing audit these issues. It was released these week, even if it’s a fresh work it is issued from years of work on this kind of performance issue, and I’m a big fan of Eric S. Raymond who said “Release early, release often”.
The extension is a set of functions that set up triggers on table you want to audit, compute and store statistics in a table. It’s available on pgxn for on premise installation and easy to install on AWS RDS with pg_tle.
Once the extension is installed, the different methods are documented in README.
You install the extension as usual :
CREATE EXTENSION pg_upless ;
You can audit all the tables in a schema or just a single table. For a single table, call the function :
SELECT pg_upless_start('public','boats');
pg_upless_start
-----------------------------------
Trigger installed on public.boats
For all the tables in the schema public, the command will be
SELECT pg_upless_start('public');
pg_upless_start
- - - - - - - - - - - - - - - - - - - - - - - - -
Trigger installed on all tables in schema public
Once you have called the pg_upless_start function you can have a look at the table pg_upless_stats. In our example we did previously an UPDATE that not change the length of Pen Duick II, so we have in our stats 1 useless UPDATE.
SELECT * FROM pg_upless_stats ;
relnamespace | relname | useful | useless
--------------+---------+--------+---------
public | boats | 0 | 1
(1 row)
Its layout is simple and easy to understand: the useful column contains the number of UPDATEs that have modified the data. The useless column counts the number of UPDATEs that you shouldn’t have done, as they have no effect.
Now we will change all rows by putting name in upper case :
UPDATE boats SET name=lower(name);
UPDATE 3
SELECT * from pg_upless_stats;
relnamespace | relname | useful | useless
--------------+---------+--------+---------
public | boats | 3 | 1
(1 row)
UPDATE boats SET name=lower(name);
UPDATE 3
SELECT * from pg_upless_stats;
relnamespace | relname | useful | useless
--------------+---------+--------+---------
public | boats | 3 | 4
(1 row)
The first time we change all the rows so now useful is equal to 3, if we do it a second time, so no rows will be changed the useless increase by 3.
A number of 0 in useless column is your goal.
If you want to build precise statistics you can know when you start to compute them in the table pg_upless_start_time
SELECT * FROM pg_upless_start_time;
relnamespace | relname | start_time
--------------+--------------------------+-------------------------------
public | boats | 2024-12-16 16:13:49.805461+01
public | ddl_history | 2024-12-16 16:59:00.023369+01
public | rodo | 2024-12-16 16:59:00.023369+01
If you can start a process you need to be able to stop it, this is done with the pg_upless_stop(name,name) and pg_upless_stop(name) functions.
The functions are callable on a regular query, in case you want to audit a subset of your tables you can do :
SELECT pg_upless_start('public', relname) FROM pg_class
WHERE relnamespace='public'::regnamespace
AND relname like 'bo%' AND relkind='r';
pg_upless_start
-------------------------------------------
Trigger installed on public.boats
Trigger installed on public.boats_skipper
You may need to exclude some technical columns that have changed but do not represent a real change, an example is the updated_at column we often find in application schemas. The application code does not change any information, but when the ORM calls an Object.Save() like method, it changes the updated_at value even though nothing has actually been updated. You can exclude this column for a single table or for all tables.
-- Exclude the column update_at in table public.boats
SELECT pg_upless_exclude_column('public', 'boats', 'updated_at');
-- Exclude the column update_at in all tables
SELECT pg_upless_exclude_column('updated_at');
I hope you’ll find this tool useful, please take time to share your ideas on how to improve it on it’s public repository or directy make a pull request, and indeed report any bug you’ll encounter.