Storing MQTT messages in DB after receiving

Tags: #<Tag:0x00007f326dccf5f0> #<Tag:0x00007f326dccf528>

Hello all,

I’m new to this community but have been playing around with home assistant for a bit now.

I currently have a setup where I receive an MQTT message every 20 minutes from a sensorboard in my beehive. This message updates an MQTT sensor, and then the sensor data gets stored into InfluxDB.

Although the solution above works, I’m not very fond of it. InfluxDB seems to poll the sensor at a certain interval, whereas I’d prefer a solution that just stores the MQTT message whenever a new one is received instead of continuous polling (especially with the low frequency updates). I’d also prefer if it were stored in an SQL type database like MariaDB instead of InfluxDB.

Is there a way to create a solution like this in home-assistant? Could anyone point me into the right direction?

Thanks, any help would be appreciated! :smiley:

Please check if force_update: true in the mqtt device configuration. If it is so then the device will poll at smaller frequencies even without a new payload recieved.

InfluxDB doesn’t poll the states, HA pushes the states to influxDB.

For long-term data storage influxDB is a better fit than MariaDB/MySQL, because it is a time-series based database.

I wrote a guide on how I setup PostgreSQL for short-term data (1 week) and influxDB for long-term data here.

Welcome!
If you are comfortable with databases and SQL, as a first iteration you can setup a MariaDB database server and use it as Home Assistant’s data store for it’s ‘recorder’ function, see info at link below. Depending on how beefy your HA server is, you can run the database server on the same server using standard linux install methods, docker or even a vm. Or run the database server on a separate machine. With this setup it is now easy to access the data coming into Home Assistant with standard database query tools and data science tools like juypterlab.

I do this using Postgresql in docker on a Intel i7 that also runs HA in docker. I have found it very successful. I have 3 years of history in Postgresql.

I ran InfluxDB and Grafana as well for a while, but did not find enough value to keep educated on another set of tools when I was already using python, matplotlib and juypter for ML and visualization. Others find the value of Influx and Grafana worth the effort.

I have a quarter trillion records in the Postgresql database and have found that performance of database is fine. Home Assistant is inserting between 30k and 40k records per hour into the Postgresql server. And when I ran Influx in parallel with Postgresql, the storage compression was not significant, as HA’s data is relatively horizontally small.

This nice thing about first moving to an external relational database is that you size up the performance and then decide if you needs require a time series database for analytics. Or since you already have a Influx instance running, you can run MariaDB in parallel and compare your experiences.

Home Assistant has some juypter notebooks at the second link below will give you a start on the data model directly in HA. I started here but do most work directly to Postgresql database with my own notebooks. There are other ML packages as well, Orange, is a great starting place. Link below.

https://www.home-assistant.io/integrations/recorder
https://data.home-assistant.io/
https://orangedatamining.com/

Thanks for the suggestion! I’ve tried both force_update: true and false. It seems that false will only push a the value if the value has actually changed from the previous one, whereas true will store it whenever it gets polled regardless.

It doesn’t seem to actually change the frequency of polling itself though. Maybe I’m wrong though.

Thanks for your response. I’ll take a closer look at your guide this week and see if that would work for me.

Is there any way to change the interval at which home-assistant pushes the data into influx (or other DB)? With force_update: true, and fill set to none in influx I can see that a new value is stored every 10 seconds. This is way too frequent for my use regardless of where the data is stored.

Thanks for your response! Coming from a data-science background I’m much more comfortable with a relational DB and doing my visualizations in R / Python, so that Jupiter lab add-on is a great suggestion.

I’ll have to take some time to read through your links, but it seems like a promising starting point!