Is Home Assistant the right tool for this?

jftaylorMn · November 9, 2022, 5:55pm

I have a high level question and limited understanding of Home Assistant. There has been no response on the HA Discord site, so trying here, instead.

My goal is to get data from an esp32/8266 to some sort of time series database for preserving history and subsequent analysis. Conceptually, ESPHome → HA → InfluxDb(or?) → Grafana will do that. At this time, there is no plan to have HA drive the ESP, but that could change. The type of data in this case is most interesting during periods of change, but is typically static for long periods. In my previous life, I used to work with process historians which had algorithms to only save new data when it fell outside a “deadband” (or the trajectory changed). This approach greatly reduces disk demand on historical data without losing fidelity when it matters. I realize that InfluxDb is not a historian, but features do overlap. Once Influx gets data with a new timestamp, it’s a different record. So, filtering needs to happen earlier. HA has too many facets that I have yet to internalize. From what I’ve seen so far, it appears that HA can optionally save to an external db, but at the same sampling rate. I would have no problem with HA saving relatively high frequency (seconds per measurement), redundant data over a short period (hours or days), but don’t want it in my long term (years) storage. Should HA be part of this solution or should I instead put the “save only when significant” logic at the ESP32 level and make API calls to InfluxDb directly. Any expert guidance?

nickrout · November 10, 2022, 2:01am

HA only saves a new entry to the database when a value changes. So it may do what you seek.

jftaylorMn · November 10, 2022, 2:15am

So, If for example a measurement is temperature and I only care about changes of a couple degrees or more (the deadband), the task in HA becomes a calculation to convert a floating point value to some even integer?

nickrout · November 10, 2022, 5:50am

Yeah, that is easy. Template another entity

Mahko_Mahko · November 10, 2022, 9:08am

Does the more granular data have any utility to you?

If not I would not even send it to HA, just what you find useful.

The or: filter is a great way to do this. For example, you can have a background infrequent time based data transfer, and supplement this with other conditions of interest for which you want instant updates.

or filter

Personally I wouldn’t know how to use Influx + Grafana with an ESP without HA (and don’t need to). I use the add-ons. I find Influx + grafana great for long terms stats. I don’t know much about the new long term stats in HA (for energy?) .

I run Influx alongside MariaDB (I run both add-ons). MariaDB replaces the native HA one, the Influx is extra.

I recall at one point I wanted to analyse my influxdb from python in my desktop, but I didn’t get too far.

Rudd-O · November 10, 2022, 4:58pm

I use Prometheus. I enabled the Prometheus integration on my Home Assistant instance, and have Prometheus scrape HA every 15 seconds. This gets me all the numeric data (many months of it) on my Prometheus, which I can then query easily.

jftaylorMn · November 10, 2022, 10:40pm

@Mahko_Mahko , yes the granular data may be of interest - IF the value is rapidly changing.
I expect to be running locally on minimal hardware with limited/no WAN access, so the primary goal is to eliminate disk growth due to “nearly constant” time series values. It looks like the “or” filter you reference - possibly in conjunction with the delta filter could be used. That might be enough information to convince me to install HA with ESPHome and test it.

I have decades of SQL experience, so could likely write an “after insert” trigger with MariaDB that would delete redundant data (and periodically shrink the database to recover fragmented storage). That assumes that the DB schema (table structure) is documented or obvious. From what I can tell, InfluxDB is tuned for time series data, so I couldn’t leverage my SQL skills, but can learn their query syntax(es) if the multi-dimensional nature of the time series database has clear advantages.

@Rudd-O - this is the first I’ve heard of Prometheus. It looks to be similar to InfluxDB, with focus on time series data. Thanks for the alternative suggestion.

mxander · November 10, 2022, 11:21pm

You can create a new template entity the gets the new sampled value only if it falls outside a predefined hysteresis (maybe using the statistics platform), maybe you can even do it in esphome. Another way would be using micropython and mqtt in the ESP.
But creating a sliding window style compression (like the ones in industrial SCADA / historians) seems to me a bit of a stretch. As long as I remember, to implement it, you need to be able to right in the past, which, as long as I know, is not possible in HA… but it might be possible if you connect the ESP directly to influx.
Good luck and keep posting. It sounds interesting.

MFM

Rudd-O · November 11, 2022, 12:39pm

Prometheus is highly efficient at storing time series data. For values that change over a range instead of totally randomly, you can expect a single series to consume less than a bit per sample.

Prometheus is awesome!