In-memory MariaDB (MySQL) add-on for recorder/history integration

Thanks for this add-on!
I am running latest HassOS directly on x86-64 laptop installed on the hard drive.

Add-on version:
You are running the latest version of this add-on.
System: Home Assistant OS 11.1 (amd64 / generic-x86-64)
Home Assistant Core: 2023.11.3
Home Assistant Supervisor: 2023.11.3
After 1 night of implementing it, I can hear the hard drive is still spinning and being accessed.

Could I expect the hard drive is spinned down?

I think HA writes the drive quite regularly on it’s own:

  • HA core log file (the home-assistant.log file)
  • each add-on has it’s own log, that goes into the host’s log system, and some of the add-ons are quite chatty, or at least the regular healthcheck causes some logging
  • HA saves it’s state in a lot of file under the .storage folder

So I think it is impossible to have a HA installation that never writes the hard drive or SD or eMMC.

If you configured this in-memory DB add-on to never export it’s in-memory DB to storage (SD card, hard drive), it shouldn’t write the log and the storage at all, so I think it is something else, that writes the drive regularly.

Here I tried to do a little comparison using In-memory MariaDB:

Wow! Nice work! As I understand:

  • SQLite: 1GB
  • MariaDB: 420MB
  • In-memory MariaDB: 130MB
    … SD card writes during the 3 day long test periods.

I would like to add:

  • By default In-memory MariaDB do not write anything at 02:00 to the SD card.
  • Though even if it is turned on, with your approx. 100MB database that would be around 30MB (5+10+15MB) during the 3 days, but I don’t see any uptick in the “eMMC written” chart around 02:00.
  • So my asumption is that the default settings means that HA on it’s own has written 130MB to SD card, and MariaDB added 290MB, and SQLite added 870MB wear to the SD card during these 3 days with your setup.

Yeah, HA saves it’s internal state as text files in the /homeassistant/.storage folder as state changes, plus a lot of log is stored by the OS, but I thought it is less than 40MB/day. Let’s say, if we have a 32GB SD card/eMMC, it is written all over approx once a year. Hmmm, it should work, even if I thought it takes more time. Nice new info, thank you!

I did redo the stats with the included commit to archive every day at 2:00 (still the same top link under item 3 two spoilers for comparison)
There are almost no differences at all.

I would also like to say a few words of wishes:

  1. I use the “In-memory MariaDB Memory usage” sensor as a command:
  - sensor:
      name: In-memory MariaDB Memory usage
      unique_id: inmemory_mariadb_memory_usage
      command: printf `ssh deoptim@localhost docker exec -i addon_45207088_mariadb /bin/bash -c "df\ -h\ /tmp" | awk 'NR == 2{print $5}' | tr -d '%'` %d
      value_template: "{{ value | round(1) }}"
      scan_interval: 600
      unit_of_measurement: '%'
      device_class: data_size

Since I don’t know if the allocated space of tmpfs (200MB by default) for the database is running out or not. It would be great to display this kind of statistics separately as entities.
Say here:

  1. As I understand, if we are going to make a daily commit to the archive, there are limitations in the size of the database (the larger the database, the more we need to increase the period of the commit to the archive - to make it make any sense).
    In this case it would be good to split the commit file into periods (say weeks) - what do you think?

Yeah! :slight_smile: I see the small eMMC writes at 02:00, approx. 1, 5, 10MB, that is the wear of the in-memory add-on on the SD card! :slight_smile: My assumption was correct, all the other 100++ MB is HA alone. :frowning:

  1. tmpfs usage sensor:

    • It was on my wish list also.
    • The current sensors (CPU percent, etc.) are provided by the supervisor and hard-wired, can’t be extended.
    • Hacky solution is to use the REST API of HA to periodically inject sensor states. This is what I implemented, see: Report tmpfs usage diagnostic values as sensors · lmagyar/homeassistant-addon-mariadb-inmemory@37dbe3f · GitHub. I test it for a few days in my beta repo then I will release it. I tried to follow the strange naming of the sensors created by the Supervisor.
    • A better solution would be to create an integration (with devices, entities), giving unique_id for these sensors, making it possible that users can edit their names, icon, etc. But I will not do this. The only argument for this would be, that the REST API doesn’t support unique_id-s, see: HA.core REST API POST unique_id for entities?, but I hope there will be some standard solution in the future.
  2. incremental exports:

    • It is not completely clear, what did you mean by: ‘As I understand, if we are going to make a daily commit to the archive, there are limitations in the size of the database (the larger the database, the more we need to increase the period of the commit to the archive - to make it make any sense). In this case it would be good to split the commit file into periods (say weeks) - what do you think?’
    • The add-on uses sqldump+gzip to save all the databases regularly, and it overwrites the previous dump.
    • So, incremental dumps can decrease the wear on the SD card, but it doesn’t worth the effort. The HA database has 2 main parts:
      • States and Events tables: these are purged regularly, these are usually the bigger part of the DB. But you have only a few days of data in these tables, so the content will be really different when the next dump runs.
      • Statistics tables: HA never deletes any data in them, the add-on deletes (because we have limited storage space), let say after 6 monthes, so we can decrease the SD card wear dumping only - let say - the last day of them, but recovery from 180 dumps in a reliable way would be a nightmare, and we would save let say 1-10MB wear of the SD card per day, meanwhile HA writes 100++MB daily whatever we do. So I won’t implement this.
    • UPDATE: Currently the add-on have to keep 2 consecutive copies of the dump, because it is possible that a dump is not written to the SD card’s memory completely before a power failure, even if the OS reports that it gave the data to the SD card completely, it happened with me due to a faulty capacitor on my eMMC->SD adapter board, so there is an elaborate logic how to handle even these 1+1 sqldump files, I’m sure an incremental version of this would cause more problem than solve.

And the new version (with the sensors) is out. :sunglasses:

This is an awesome addin - I wanted to thank you for creating and maintaining it to start.

I’m assuming this thread is for general inquiry so please direct me where I can understand a couple of general questions:

Is there a set of documentation that discusses the general technical improvements made with this fork? I’m starting with the assumption that this addin makes some general config improvements to store the db on a tmpfs block device…is that fundamentally it or are there features enabled that go beyond tmpfs redirection? I get that this can be found on the repository, but a summary is so much nicer to start :grin:.

How does this work in concept? You start with a db installed into tmpfs…how do you persist between reboots when configuring the periodic update interval? Is the gziped file just a raw data export into a delimited file format or is it a binary representation? What’s the command used to do the export? What about then the import? Does it block HA load during startup or is it parallel? What happens to the data if in parallel if the addon has not started? What about the data load…if I had 32gb in memory does it suck it in through multiple files to create the new db? Does this use internal capabilities of MariaDB to accomplish this task?

I’m digging deeper into this as I am more of your fringe case. I log everything. Always. I am a heavy IoT-centric user with over 500 wifi enabled sensors and switches. I need to fundamentally understand what happens when I run out of space on tmpfs as I will need to balance the substantial performant increases in reads when visualizing sensor data over a long period of time using the stock analytics of HA vs. the cost of additional RAM added to multiple physical hosts that run the HA guest. I virtualize HaOS with libvirt and also use the live migration features to move the guest to other hosts in my home.

I have around 95GB of sensor data over a year in the stock sqllite db. My HA runs as a VM on the amd64 architecture - I care more about blazing performance vs write cycles. Has there been consideration to creating a “Store 8gb in-memory and the rest, 80gb, on disk”?

Thank you again for your time on this project - it’s exciting to see something you can throw some hardware at to make a fun experience in a UI with a crazy amount of data that you get to collect on your property.

The add-on has a documentation tab, everything is there (OK, should be, I’ve updated it now a bit). And the differences compared to the official MariaDB add-on is highlighted at the beginning.

All the other differences compared to the official version is not so important, eg. it uses a different base image, some DB config tuning, that’s all. There were some s6-overlay optimizations, but it got merged in the official add-on already.

To answer your specific questions:

  • There are some tweaks in mariadb/rootfs/etc/my.cnf.d/mariadb-server.cnf, basically to configure the smallest values for anything that can take up space on the file-system, or not really needed when all the data are in the RAM. But due to my very limited MariaDB experience/knowledge, these are not optimal, I’m sure.
  • The gzip-ed file is an sqldump sql file, can be executed anywhere. I’ve added this to the docs. But it is slow to export, slow to import. It has the advantage, that the deleted statistics values can be imported anywhere if somebody needs them.
  • Yes it blocks HA’s DB access during import (not HA completely). Otherwise HA starts to work on statistics calculation immediately, during data import, and that leads to a catastrophic mess. That is the reason that some DB related warnings should be missed during startup (see logger config in the docs).
  • Do not try it with 32GB database, even if you have enough memory. :slight_smile: Importing SQL and dumping it is quite slow, but it’s not a problem on a <100MB DB (approx. 1 minute on a rPI 3), so I’ve never thought about using other technics.
  • Running out of space. Huhhh, I can’t remember, but nothing good happens. I can’t remember that whether InnoDB was able to answer queries when it was out of disk space. As I remember it required a restart to function properly, but an add-on restart is required after changing config values, like the tmpfs size. But better not to try this. :slight_smile:
  • No, it can’t split DB between mem and disk, if you need this level of performance tuning, you need a real MariaDB (or any DB) installation and fine tuning the settings. I have zero expertise to fine tune MariaDB on this high level. :frowning:

The main goal of this add-on is to run on rPI level devices with 1-4GB RAM and 1-200MB DB, to save their SD card from regular damage by write wear. What you need (I think) is a performance tuning for a real DB installation.

1 Like