The suggested approach is hard to sustain: one quickly gets overwhelmed by the sheer number of entities to classify.
Instead of making a decision for each and every entity_id, why not look at the ‘worst offenders’ and exclude only those? By identifying the most frequently updated entities, you can significantly reduce I/O without having to micro-manage the list of exclusions.
To do so, you need SQL access to the database; you can then run queries similar to the following.
For the states table:
SELECT
    entity_id,
    COUNT(*) AS cnt
FROM states
GROUP BY entity_id
ORDER BY cnt DESC;
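Since only the top of the list matters for picking exclusions, you can cap the output. A minimal variant, assuming MariaDB/MySQL LIMIT syntax:

-- Show only the 25 most frequently updated entities
SELECT
    entity_id,
    COUNT(*) AS cnt
FROM states
GROUP BY entity_id
ORDER BY cnt DESC
LIMIT 25;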
The equivalent query against the events table takes quite a bit longer, since JSON_VALUE has to parse the event_data payload of every row:
SELECT
    JSON_VALUE(event_data, '$.entity_id') AS entity_id,
    COUNT(*) AS cnt
FROM events
WHERE JSON_VALUE(event_data, '$.entity_id') IS NOT NULL
GROUP BY JSON_VALUE(event_data, '$.entity_id')
ORDER BY cnt DESC;
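If your recorder still runs on the default SQLite database rather than MariaDB/MySQL, JSON_VALUE is not available there; json_extract from SQLite's json1 extension is the equivalent. A sketch, assuming the same events schema:

-- SQLite variant: json_extract instead of JSON_VALUE
SELECT
    json_extract(event_data, '$.entity_id') AS entity_id,
    COUNT(*) AS cnt
FROM events
WHERE json_extract(event_data, '$.entity_id') IS NOT NULL
GROUP BY json_extract(event_data, '$.entity_id')
ORDER BY cnt DESC;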