Issues as an inexperienced new user

I wasn’t sure where to post this since it covers a few different things. Hopefully it isn’t so long that it triggers a tl;dr response (or comes off as a rant).

I started out with a couple of TP-Link power monitoring plugs and a guide that showed me how to use a Raspberry Pi to gather the data from these plugs, store it in a MySQL database and then view it in Grafana. It became too hard to manage as I added more plugs, and eventually I found Home Assistant.

After the first install, it found a lot of sensors: my plugs, lights, phones and Bluetooth devices. Through add-ons I also had data from my energy company as well as readings from my water meter. I increased the purge period to 365 days as I wanted long-term data for a bunch of the sensors.

Fast forward a couple of months and backups no longer worked. After a few days of investigating (as a novice, I had to jump through the hoops of learning all the basics to make any progress), I eventually found that my database was over 10GB. I needed to clear it out so I could get a backup; otherwise I was risking losing everything I had done so far. This is the first major issue: why is there no information for each sensor on how much space its history data is using? I ended up learning some SQL and found which sensors had the most entries in the database. I don’t know if this maps perfectly to space usage, but it’s a start at least.
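For anyone wanting to repeat the "which sensors have the most rows" check on the default SQLite database (home-assistant_v2.db), here is a minimal sketch. It assumes the newer recorder schema, where states.metadata_id points at states_meta, which holds the entity_id; older schemas stored entity_id directly in the states table, so the join would need adjusting. The function name is hypothetical, and row counts are only a proxy for disk usage, since attribute size varies per sensor.

```python
import sqlite3

# Assumed schema (recent recorder versions): states.metadata_id -> states_meta.metadata_id,
# with the entity ID stored in states_meta.entity_id. Adjust for older schemas.
TOP_ENTITIES_SQL = """
SELECT sm.entity_id, COUNT(*) AS row_count
FROM states AS s
JOIN states_meta AS sm ON s.metadata_id = sm.metadata_id
GROUP BY sm.entity_id
ORDER BY row_count DESC, sm.entity_id
LIMIT ?
"""


def top_entities_by_rows(conn: sqlite3.Connection, limit: int = 20) -> list[tuple[str, int]]:
    """Return (entity_id, row_count) pairs, most-recorded entities first."""
    return conn.execute(TOP_ENTITIES_SQL, (limit,)).fetchall()
```

To use it, open a copy of the database file (for example sqlite3.connect("file:home-assistant_v2.db?mode=ro", uri=True) on the copy) rather than the live file, so the running recorder isn't disturbed.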

I learned that excluding sensors stops them from being recorded, but the daily purge didn’t prune their existing data. I had to add the exclude and then manually run the purge for each sensor, one at a time. I had to do this for about 50 sensors, so it was pretty tedious.
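If your Home Assistant version has the recorder.purge_entities service, it accepts a list of entity IDs, so the one-at-a-time step could in principle become a single call. Below is a minimal sketch that builds that call against the REST API (POST /api/services/<domain>/<service>) using only the Python standard library. The base URL, token and entity IDs are placeholders, and the function names are hypothetical.

```python
import json
import urllib.request


def build_purge_entities_request(base_url: str, token: str, entity_ids: list[str]) -> urllib.request.Request:
    """Build one REST call to recorder.purge_entities covering many entities at once."""
    body = json.dumps({"entity_id": entity_ids}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/services/recorder/purge_entities",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",  # a long-lived access token
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status
```

Building the request separately from sending it makes it easy to check the payload before anything is actually deleted.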

I have now come to the conclusion that if I want to control the data properly, I need to exclude everything and manually write a filter for every single sensor. It feels a bit pointless to have sensors added automatically if they don’t show how much space they use or offer enough per-sensor control.

I did try asking on the Discord for help at various stages, but other than with setting up the water meter sensor, I have been ignored. I ended up using ChatGPT to create some filters, and although it gave a lot of wrong information, if I kept feeding the errors from its own code back into it, it eventually created something that worked. The major issue here is that its training data is from a few years ago and, like a lot of the Home Assistant guides found through Google, it mostly uses deprecated configuration structures, which is extremely confusing for a novice :frowning:

So the tl;dr part:
Why don’t sensors show how much space they are using or how many data points are attached to them?
Why isn’t the exclude/include option part of the UI when sensors are first added or when modifying existing ones?
Why isn’t filtering part of the UI sensor settings?

From my point of view, it would be nice to view a table of sensor sizes and then select a bunch to exclude and filter (applying the same type of filter to all of them), with a tick box to apply the filter to all previously collected data. An example is the “previous month” data for electricity/gas/water: the value stays the same for the whole month, yet it gets logged in the database every couple of minutes, producing thousands of entries with the same value and only the timestamp slowly increasing. A filter here would stop a new entry being added if its value is the same as the previous one.
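The "skip unchanged values" filter described above is essentially run-length deduplication. Here is a minimal sketch that works on an in-memory list of readings; Reading and drop_repeats are hypothetical names and not anything Home Assistant itself provides.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Reading:
    timestamp: datetime
    value: str  # Home Assistant states are strings


def drop_repeats(readings: Iterable[Reading]) -> Iterator[Reading]:
    """Yield only readings whose value differs from the previously kept one."""
    previous: object = object()  # sentinel that never equals a real value
    for reading in readings:
        if reading.value != previous:
            yield reading
            previous = reading.value
```

Keeping the first reading of each run preserves the time the value changed. Only the timestamps of the repeated readings are lost, which is acceptable for a value that stays constant for a whole month.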

Is it worth me adding these kinds of requests to the feature request section? Nearly all the other feature requests seem to be much higher-level stuff.

It would also be nice if the documentation had a lot more examples of what each setting does.

When you exclude sensors, the database doesn’t shrink immediately: earlier data is still retained until it ages past the purge period. If your system has not yet been running for a year, you will have to wait until it has…

365 days is a very long time. The default is 10 days. Most people set it to less (I have five). Perhaps you should be looking at long term statistics?
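To put the retention period in perspective, here is a back-of-the-envelope estimate. It assumes a sensor that reports every 2 minutes (like the “previous month” sensors mentioned above) and one database row per report. The function name is hypothetical.

```python
MINUTES_PER_DAY = 24 * 60


def rows_per_sensor(update_interval_minutes: float, keep_days: int) -> int:
    """Rough count of state rows one sensor leaves in the database, assuming one row per update."""
    return int(MINUTES_PER_DAY / update_interval_minutes * keep_days)


for days in (365, 10, 5):
    print(f"{days:>3} days: {rows_per_sensor(2, days):>7} rows per sensor")
```

That works out to about 262,800 rows per sensor at 365 days, compared with 7,200 at the 10-day default: roughly 36 times as many. That difference is then multiplied across every chatty sensor.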

I did look at the long term statistics, but it did not seem to state anywhere that it would ignore the purge. I got the impression it was for a running total that does not get reset when history is removed. It might be useful for some of my sensors, but it still seems to require excluding the sensor it is taking the value from and manually setting up the long term statistic sensor in the configuration.yaml file.

None of all that.

Long-term statistics are not purged at all; they’re retained indefinitely. All they require is a state_class of either measurement (for independent samples like temperature, humidity, power, etc.) or total_increasing (for accumulated data like energy, water usage, rain, etc.).
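To illustrate why long-term statistics stay small, here is a simplified sketch of the kind of reduction they perform. It is not Home Assistant's implementation, which as far as I know uses time-weighted means and also keeps 5-minute short-term statistics. The function names are hypothetical. A measurement sensor is reduced to one mean/min/max entry per hour. A total_increasing sensor is reduced to a running sum in which a drop in value is treated as a meter reset.

```python
from collections import defaultdict
from datetime import datetime
from statistics import fmean


def hourly_measurement_stats(samples: list[tuple[datetime, float]]) -> dict[datetime, dict[str, float]]:
    """Group samples by hour and keep only mean/min/max per hour (simple mean, not time-weighted)."""
    buckets: dict[datetime, list[float]] = defaultdict(list)
    for ts, value in samples:
        buckets[ts.replace(minute=0, second=0, microsecond=0)].append(value)
    return {
        hour: {"mean": fmean(vals), "min": min(vals), "max": max(vals)}
        for hour, vals in sorted(buckets.items())
    }


def running_sum_total_increasing(readings: list[float]) -> float:
    """Accumulate a meter's growth, treating any drop as a reset that restarts from zero."""
    total = 0.0
    for previous, current in zip(readings, readings[1:]):
        if current >= previous:
            total += current - previous
        else:
            total += current  # meter reset: count what it has accumulated since restarting
    return total
```

Either way, a sensor sampled every couple of minutes ends up as one row per hour, which is why keeping these rows indefinitely is cheap.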

Long term statistics is exactly what you need.

Or just call the recorder.purge service manually with the number of days you’d like to keep. Optionally repack the DB while you’re at it.
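Besides calling it from the UI, the service can be called over Home Assistant's REST API with keep_days and repack in the service data. Here is a minimal standard-library sketch that only builds the request. The base URL and token are placeholders for your own instance and a long-lived access token, and the function name is hypothetical.

```python
import json
import urllib.request


def build_purge_request(base_url: str, token: str, keep_days: int, repack: bool = True) -> urllib.request.Request:
    """Build a REST call to recorder.purge that keeps `keep_days` days and optionally repacks."""
    if keep_days < 0:
        raise ValueError("keep_days must be non-negative")
    payload = {"keep_days": keep_days, "repack": repack}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/services/recorder/purge",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is then urllib.request.urlopen(build_purge_request("http://homeassistant.local:8123", "YOUR_TOKEN", keep_days=5)). Bear in mind that a repack rewrites the whole database file and can take a long time on slow storage.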

It looks like I have been doing things a bit backwards then, thanks for the information. I will greatly reduce the number of days the purge keeps and add what I want as long-term statistics.

Once the long-term statistics are set up, does all the historical data get processed, or does it only start from the moment they were added?

I ran the purge and repack enough times to get a working backup (a 75% reduction in database size), though it took a long time on a Pi 3 using an SD card (~1 hr for each purge, with the repack taking a day). I have since moved to a Pi 4 with a USB NVMe drive.