Well this escalated quickly into a bigger issue than I thought. Recording here in case anyone else discovers the same issue.
So, I tried to run the following sql to find what records were not meeting this constraint.
SELECT old_state_id FROM states WHERE old_state_id NOT IN (SELECT state_id FROM states)
After a few seconds phpmyadmin gave me an error that it has lost connection to the server. HA logs were also complaining the same. Seems this query crashed my MariaDB database.
Now, I am not DBA but even I know this simple query causing a DB crash is a bad sign!
So, looking in the MariaDB addon logs, this showed the crash and some pointers. Whilst, unfortunately I didn’t keep a copy of the error, it was basically pointing me to some corruption in a number of indexes.
So, that got me looking at my indexes on my HomeAssistant DB and also noticed that there were many duplicate indexes (some of them were the ones seemingly causing the crash)
I deleted the older version of each duplicate index (ie the one without a suffix number or the lowest numbers in the case of more than 1 duplicate).
Also restarted the whole HAOS server after this to allow the server to restart cleanly and do integraity checks when it comes up. These all said OK.
I was now able to run queries but noticed some odd behaviour in that when I browsed the states table, it would show a different count than if I just did a:
SELECT count(*) from states
So, after a bit of googling, it seemed the best thing to do was to rebuild the indexes. I did this by running an:
OPTIMIZE TABLE table_name
on each table in turn. As a note, this caused HA to log issues that it was not able to write to the DB due to a deadlock (which is fine as the optimize routine locks the table until it has rebuilt it and rebuilt all the indexes). I may have lost some state changes for the stats tables but think that it a minor issue compared to the one I am fixing.
I have now run a manual purge and it, so far seems to be running fine and has deleted over 2M records from the states table.
So, in the end, not the actual problem I thought, or intimated by the error, but a much more serious db corruption issue.