Backup filesize jumped to 10x normal

I noticed the HA core backup file size suddenly jumped from around 180 MB to 1800 MB, starting around April.
Searched for info on the forum.
Advice: download the file. It’s a .tar file, so it can be opened to see what’s causing the large size.
I did this, and this is the directory:

[screenshot of the backup contents]

How do I open the home-assistant_v2.db file?

I use these two add-ons that let me take a live look at the database content in HA:

[screenshots of the two add-ons]

For the latter I use this code (I’m using MariaDB!) to look at the entities that have the most state records:

WITH total_rows AS (
    SELECT COUNT(*) AS total FROM states
)
SELECT sm.entity_id,
       s.metadata_id,
       COUNT(*) AS count,
       COUNT(*) * 100.0 / total_rows.total AS percentage
FROM states s
JOIN states_meta sm ON s.metadata_id = sm.metadata_id
JOIN total_rows ON 1=1
GROUP BY s.metadata_id, sm.entity_id
ORDER BY count DESC
LIMIT 100;
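Since the OP is on the default SQLite database rather than MariaDB, here is a minimal sketch of the same “top entities by state rows” query run against a copy of home-assistant_v2.db. It assumes the current schema (states joined to states_meta via metadata_id); the path is a placeholder:

```python
import os
import sqlite3

# Sketch: top entities by number of state rows, against the default SQLite
# database. Path is an assumption - point it at a COPY of the file from your
# config folder, never the live database.
DB_PATH = "home-assistant_v2.db"

QUERY = """
SELECT sm.entity_id,
       COUNT(*) AS cnt,
       COUNT(*) * 100.0 / (SELECT COUNT(*) FROM states) AS percentage
FROM states s
JOIN states_meta sm ON s.metadata_id = sm.metadata_id
GROUP BY sm.entity_id
ORDER BY cnt DESC
LIMIT 20;
"""

def top_entities(db_path: str = DB_PATH):
    """Return (entity_id, row_count, percentage) tuples, biggest first."""
    with sqlite3.connect(db_path) as conn:
        return conn.execute(QUERY).fetchall()

if __name__ == "__main__":
    if os.path.exists(DB_PATH):
        for entity_id, cnt, pct in top_entities():
            print(f"{entity_id:60s} {cnt:10d} {pct:6.2f}%")
```

The scalar subquery replaces the MariaDB CTE, which keeps the statement a single plain SELECT that older SQLite versions also accept.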

What’s in your /media folder?

A full backup includes /media. Are you saving security camera snapshots, perhaps? (That was my problem.)


There’s nothing in the media folder and there are no cameras attached.
I’ve installed a logviewer addon. Every couple of seconds I can see
2024-12-11 23:16:14.560 WARNING (MainThread) [homeassistant.components.http.ban] Login attempt or request with invalid authentication from 192.168.0.93 (192.168.0.93). Requested URL: ‘/api/config’. (None)

I know that has been logged more or less forever, but I can’t stop it.
It’s constantly in the Notifications panel.
But that wouldn’t cause that huge jump in file size.
I’m trying to figure out ‘DB Browser for SQLite’, since SQLite is, I believe, the default DB used. I’ll have to read up on it a bit, ’cos I’ve no idea about databases.

Thanks for the ideas.

A little more info…
(Thanks to Chairstacker for steering me in the right direction).
I’d forgotten I had SQLiteStudio installed, which I’ve now used.
The culprit seems to be the PID in my ESP32 wine temperature controller: the PID Heat Demand sensor is swamping the logging. So I’ve changed its logging level from the default to ERROR.

Now. Is there a function to delete or purge the log easily?

I saw this code snippet in another thread:

- service: recorder.purge_entities
  data:
    keep_days: 5
  target:
    entity_id: sensor.your_entity

Haven’t tried it myself yet, but I’d appreciate your feedback in case you do - I assume it keeps the long-term stats for the entity, but it would be nice to get confirmation :+1:
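A sketch of how that snippet might look as a complete nightly automation, with a general recorder.purge (repack: true) added so SQLite actually hands the freed space back to the filesystem. The alias, trigger time, and entity name are placeholders:

```yaml
automation:
  - alias: "Nightly recorder cleanup"   # placeholder name
    trigger:
      - platform: time
        at: "03:00:00"
    action:
      # Drop short-term history for one chatty entity (placeholder id)
      - service: recorder.purge_entities
        data:
          keep_days: 5
        target:
          entity_id: sensor.pid_heat_demand
      # General purge; repack shrinks the .db file after deleting rows
      - service: recorder.purge
        data:
          repack: true
```

Without repack, SQLite keeps the freed pages inside the file for reuse, so the backup size would not visibly drop even after a successful purge.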

I’m confused. Are we talking about the Recorder database (home-assistant_v2.db) being the problem, or the log files? The image in the OP shows what I’d consider a large database, but that depends on what hardware you have available and how comfortable you are with a .db file that big.

@chairstacker, purging the short-term data with the purge action (what used to be called a “service” is now an “action”) will not delete the long-term statistics. In fact there’s no defined way to do that, even if you want to. For that you need to take a hammer to the database yourself in SQL while HA is offline.

A lot has been written here about keeping the recorder database size manageable. This would be a good place to start.


Thanks - appreciate the confirmation!

I’m confused with the terminology.

The Recorder database (home-assistant_v2.db) is what I was referring to.
But I also assumed the Log (file) entries were in that file?

Also:
In ESPHome, selecting LOGS on any of the ESP32 devices (connected wirelessly) shows data that I also assume is what’s being saved in the Recorder database? Hence the ESP32 running the PID function is (or was) the problem, as it was logging PID Heat Demand.

Metadata id 409 (PID Heat Demand), attribute id 3096994.

There are thousands of these in that DB.

I did mention the ‘Login attempt failed’ occurrences, which I also assume are in that DB somewhere? Or is that a different log / DB file?

The hardware is a Pi4 with a 64GB SSD.
There are six ESP32s, mainly logging temperatures.
Drayton Wiser integration, GEO Home for energy monitoring and around 20 Zigbee SONOFF devices with a few automations.
Brother printer.
MQTT, which has now been stopped as part of looking for what’s causing the large backup file.

About sums it up.

I’ll have a look at the link you noted for the database management.
Thanks.

Don’t feel bad. I’m still not sure on all the jargon. The “Recorder” is the process which saves data about changes to entities. For example, a light switch “state” changing from off to on, or an “event” indicating that an automation ran. These are saved for as long as you indicate in your configuration.yaml file:

recorder:
  purge_keep_days: #
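If one entity is the known offender, you can also stop it from ever reaching the database with a recorder filter. A sketch, with placeholder entity names:

```yaml
recorder:
  purge_keep_days: 7
  exclude:
    entities:
      - sensor.pid_heat_demand      # placeholder: the chatty ESP32 sensor
    entity_globs:
      - sensor.esp32_*_debug        # placeholder pattern
```

Excluded entities still work live in the UI and in automations; they just stop accumulating history rows.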

There are other tables in the same database which save long-term statistics. These are kept forever, but they are summarized each day (I think) or the database size would be unworkable. These tables don’t have any “purge” function or “keep days” option.

The logs, on the other hand, are just what they sound like. Text files to which sequential lines are written as things happen on the system. Errors, warnings, or other levels you can specify. These are used for debugging when there’s a problem. They can grow if you have a particularly “chatty” integration, but that problem doesn’t seem to come up here as much as a bloated database.

The core HA log is home-assistant.log, in the config folder. It gets created when HA starts, at which time the old log is re-named to home-assistant.log.1. It’s often that old log which is most helpful in debugging serious issues like HA crashing. But it doesn’t contain much about routine state changes and events going on when things are running normally.

Likewise, ESPHome devices will send you an ongoing log of what they’re doing, if you connect to them. I don’t think that’s saved anywhere. It will show you what kind of information it’s seeing, including whatever it’s sending to HA via the API. These may be stored in the HA database by the Recorder, providing they represent an event or state change, and the entity isn’t excluded in configuration.yaml.

If the entity changes a lot it can certainly bloat the database. Adding to this is the way Recorder not only saves the value which changed, but can also save a bunch of other “attributes” about the entity in each record, sometimes even those which haven’t changed. I think the thread about keeping the database size down has some SQL which would allow you to quantify which entities are the worst offenders in bloating the database. If not there are other threads here where it has been discussed.
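To quantify the attribute side of the bloat specifically, here is a sketch that ranks entities by the size of the attribute JSON their state rows reference. It assumes the current schema (states.attributes_id pointing at state_attributes.shared_attrs); because attribute rows are deduplicated and shared, the sum is a rough weight rather than exact on-disk bytes:

```python
import sqlite3

# Sketch: rank entities by the total size of the attribute JSON their state
# rows point at. Assumes the modern schema (states.attributes_id joins
# state_attributes). Shared attribute rows are counted once per referencing
# state, so treat the result as a relative weight. Run on a COPY of the DB.
ATTR_QUERY = """
SELECT sm.entity_id,
       SUM(LENGTH(sa.shared_attrs)) AS attr_bytes
FROM states s
JOIN states_meta sm ON s.metadata_id = sm.metadata_id
JOIN state_attributes sa ON s.attributes_id = sa.attributes_id
GROUP BY sm.entity_id
ORDER BY attr_bytes DESC
LIMIT 20;
"""

def attribute_hogs(db_path: str):
    """Return (entity_id, attr_bytes) tuples, heaviest first."""
    with sqlite3.connect(db_path) as conn:
        return conn.execute(ATTR_QUERY).fetchall()
```

An entity near the top of both this list and the row-count list is the one worth excluding from the Recorder first.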

I suspect any login attempt failures will show up in one of the logs, not in the database.