First time posting. I’ve been a little active on the subreddit, but I need more specifics before diving into something that may not be the wisest choice for what I’m attempting: making use of the interior and exterior environmental data I’ve collected. I’m using lots of Xiaomi/Aqara WSDCGQ11LMs, which have the SHT30 temperature sensor. However, since the firmware only sends out the temperature once there has been a large enough change, I have to deal with quantization noise. Taking the collective mean of the sensors mitigates this somewhat, but the derivative of the resulting curve is messy, which shows that I need to shape the data before I can make use of it.
This is where my problem lives. I’ve tried the built-in low pass filter (30 samples, time constant of 6), but I got a temporal phase shift like what I’d expect from a moving average. Speaking of those, I have to “double tap” an MA filter to get reasonable performance out of it, but the noise I’m dealing with spans longer time frames, 1-2 hrs, and appears to result from the “random” timing of updates from the Zigbee sensors. Some would say, “okay, fine, just use a larger window for the moving average,” but if I do that I have to deal with a temporal phase shift of half the window’s duration, which negates any “live” nature of the data.
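To put numbers on the lag described above, here is a minimal pure-Python sketch (the function names are illustrative, not anything from HA) that runs a trailing moving average, and a “double tapped” cascade of two, over a steadily rising ramp. For a ramp, a trailing N-sample mean lags by (N-1)/2 samples, and cascading two of them doubles that to N-1 samples.

```python
def trailing_moving_average(samples, window):
    """Causal moving average: output k is the mean of input samples
    k .. k + window - 1, so it is 'reported' at input index k + window - 1."""
    out = []
    running = 0.0
    for i, x in enumerate(samples):
        running += x
        if i >= window:
            running -= samples[i - window]  # drop the sample leaving the window
        if i >= window - 1:
            out.append(running / window)
    return out


def lag_on_ramp(filtered, first_input_index):
    """For the input x[n] = n, the lag in samples is n - y[n]; checked at the last output."""
    n = first_input_index + len(filtered) - 1
    return n - filtered[-1]


if __name__ == "__main__":
    window = 30
    ramp = [float(n) for n in range(100)]  # idealized steadily rising temperature
    single = trailing_moving_average(ramp, window)
    double = trailing_moving_average(single, window)  # the "double tap"
    print(lag_on_ramp(single, window - 1))            # 14.5 samples
    print(lag_on_ramp(double, 2 * (window - 1)))      # 29.0 samples
```

The delay is structural rather than a tuning problem: a centered window cancels it only by using samples from the future, so for a given window length it can never be fully “live”.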
In the past, when dealing with old noisy sensors like this, I could log the data and shape it with stronger algorithms like Savitzky-Golay filtering. And then there is polynomial fitting. To implement these, “get” “clean data”, and then work with its rate of change or do more comprehensive analysis, it seems like I’ll have to build something myself. I’m new to HA’s ecosystem, so I don’t know where to start per se. I’m a hardware and math guy, not so much software development. I’ve done my share in BASH, can read C and have some light experience with it, and haven’t messed with Python much (which is where I’ve been pointed). I’m not quick, but I’ll make do. Where do I get started with this in HA? How would you go about it?
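Since Savitzky-Golay comes up here: the filter is just a fixed set of weights obtained from a least-squares polynomial fit over a sliding window, so it is small enough to write out in plain Python. A minimal sketch, assuming equally spaced samples (the names are illustrative; in practice `scipy.signal.savgol_filter` does the same job, including edge handling and a `deriv=` option for the rate of change):

```python
import math


def _solve(matrix, rhs):
    """Solve a small dense linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    a = [list(map(float, row)) + [float(rhs[i])] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                for c in range(col, n + 1):
                    a[r][c] -= factor * a[col][c]
    return [a[i][n] / a[i][i] for i in range(n)]


def savgol_coefficients(half_window, polyorder, deriv=0, delta=1.0):
    """Weights that evaluate the least-squares polynomial (or its deriv-th derivative)
    at the window center. delta is the sample spacing; needs polyorder < 2*half_window + 1."""
    offsets = range(-half_window, half_window + 1)
    size = polyorder + 1
    # Normal matrix A^T A for the Vandermonde matrix A[k][j] = k**j
    normal = [[sum(k ** (i + j) for k in offsets) for j in range(size)] for i in range(size)]
    # Row `deriv` of (A^T A)^-1; the matrix is symmetric, so solve against a unit vector
    unit = [1.0 if j == deriv else 0.0 for j in range(size)]
    row = _solve(normal, unit)
    scale = math.factorial(deriv) / delta ** deriv
    return [scale * sum(row[j] * k ** j for j in range(size)) for k in offsets]


def savgol_filter(samples, half_window, polyorder, deriv=0, delta=1.0):
    """Apply the weights to uniformly spaced samples; edge points without a full window are dropped."""
    h = savgol_coefficients(half_window, polyorder, deriv, delta)
    return [
        sum(w * samples[i - half_window + m] for m, w in enumerate(h))
        for i in range(half_window, len(samples) - half_window)
    ]


if __name__ == "__main__":
    print(savgol_coefficients(2, 2))  # classic 5-point quadratic weights, (-3, 12, 17, 12, -3)/35
```

Note that the centered output for time t needs samples up to `half_window` steps after t, so used live it carries the same half-window lag as a centered moving average.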
Edit: here’s a graphed sample of what I’m dealing with
I finally got around to setting up InfluxDB & Grafana. The visualization is fairly pragmatic, and in my case things can be sorted a bit better. I’m still looking at shaping the data, but now I’m going to try to get the sensors to update more often in time, and I hope the Aqara sensors cooperate without deleterious effects.
Ideally I’ll be implementing Savitzky-Golay filtering, but for it to work properly, my data needs to be equally spaced in time. Thanks to Grafana, I can see that it isn’t. I’m now looking at rate-limiting my sensors’ mean output, but more importantly, I always need an output at a given time, whereas the mean only updates when a sensor sends an update. I know the sensors work via local push, not local polling. I’m hoping I can ping a [random] device and force it to report, which would in turn update the rate-limited mean.
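For the equal-spacing requirement, one option that doesn’t depend on the sensors reporting at all is to sample the current (pushed) value on a fixed clock, which amounts to a zero-order hold; the other is linear interpolation between reports. (In HA, an automation or a trigger-based template sensor with a `time_pattern` trigger might do the fixed-clock sampling; that part is untested.) A minimal sketch of both, assuming the logged reports are available as sorted lists of timestamps in seconds and values; the function names are illustrative:

```python
from bisect import bisect_right


def uniform_grid(start, stop, step):
    """Equally spaced timestamps start, start + step, ... up to and including stop."""
    count = int((stop - start) // step) + 1
    return [start + n * step for n in range(count)]


def resample_hold(times, values, grid):
    """Zero-order hold: each grid point takes the most recent report at or before it,
    mirroring how a pushed state stays unchanged until the next report."""
    out = []
    for t in grid:
        i = bisect_right(times, t) - 1
        out.append(values[i] if i >= 0 else None)  # None before the first report
    return out


def resample_linear(times, values, grid):
    """Linear interpolation between the reports bracketing each grid point.
    It needs the *next* report, so a point only exists once that report arrives."""
    if not times:
        return [None] * len(grid)
    out = []
    for t in grid:
        i = bisect_right(times, t)
        if i == len(times):
            out.append(values[-1] if t == times[-1] else None)  # no extrapolation
        elif i == 0:
            out.append(None)  # before the first report
        else:
            t0, t1 = times[i - 1], times[i]
            frac = (t - t0) / (t1 - t0)
            out.append(values[i - 1] + frac * (values[i] - values[i - 1]))
    return out
```

The trade-off: the hold keeps the quantization steps but is available immediately, while interpolation smooths the steps at the cost of up to one reporting interval of delay.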
The other option is to do part of what I thought I might have to do: fit a polynomial regression to the existing data and then take the derivative from it, which could be good or bad. It seems redundant to fit a polynomial to the data, generate interpolated points from it to filter, and then fit another polynomial to the result, but yeah…
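On the redundancy worry: a least-squares polynomial fit doesn’t need equally spaced samples, so the raw (irregular) reports can be fitted directly and the derivative taken analytically from the coefficients, skipping the interpolate-then-refit step. A minimal pure-Python sketch, assuming timestamps in seconds; the names are illustrative, and `numpy.polynomial.Polynomial.fit` would be the library route (it also rescales the time axis internally):

```python
def _solve(matrix, rhs):
    """Solve a small dense linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    a = [list(map(float, row)) + [float(rhs[i])] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                for c in range(col, n + 1):
                    a[r][c] -= factor * a[col][c]
    return [a[i][n] / a[i][i] for i in range(n)]


def polyfit(times, values, degree):
    """Least-squares polynomial in x = (t - t_ref) / t_scale. Centering and scaling keep
    the normal equations well conditioned even for epoch-sized timestamps."""
    t_ref = sum(times) / len(times)
    t_scale = max(abs(t - t_ref) for t in times) or 1.0
    xs = [(t - t_ref) / t_scale for t in times]
    size = degree + 1
    normal = [[sum(x ** (i + j) for x in xs) for j in range(size)] for i in range(size)]
    rhs = [sum(y * x ** i for x, y in zip(xs, values)) for i in range(size)]
    return _solve(normal, rhs), t_ref, t_scale


def poly_value(fit, t):
    coeffs, t_ref, t_scale = fit
    x = (t - t_ref) / t_scale
    return sum(c * x ** j for j, c in enumerate(coeffs))


def poly_slope(fit, t):
    """Analytic derivative at time t; the chain rule divides by t_scale."""
    coeffs, t_ref, t_scale = fit
    x = (t - t_ref) / t_scale
    return sum(j * c * x ** (j - 1) for j, c in enumerate(coeffs) if j > 0) / t_scale
```

A single global polynomial over a long log will eventually bend away from the data, so in practice this would be applied over a sliding window of recent reports (which, for equally spaced points, is exactly what Savitzky-Golay computes).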
Anyhow, here is the Grafana output, which shows what the noise source is actually related to: the temporal updates from temperature changes coalescing.
One more step done: I used Grafana to do the temporal interpolation and then smoothed that with a centered windowed mean. I passed that output to a polynomial regression. Now I’m looking at how to get the output of the polynomials back into HA, which doesn’t seem to be a normal thing to do, but is possibly feasible.
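Getting a value back into HA from an outside script (for example one that queries InfluxDB) can be done through Home Assistant’s REST API, which accepts a POST to `/api/states/<entity_id>` authenticated with a long-lived access token. A minimal standard-library sketch; the host, token and `sensor.mean_temperature_slope` entity are hypothetical placeholders:

```python
import json
import urllib.request


def build_state_request(base_url, token, entity_id, state, attributes=None):
    """Build (but don't send) a POST to HA's REST endpoint /api/states/<entity_id>."""
    url = f"{base_url.rstrip('/')}/api/states/{entity_id}"
    body = json.dumps({"state": str(state), "attributes": attributes or {}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",  # long-lived access token from your HA profile
            "Content-Type": "application/json",
        },
        method="POST",
    )


def push_state(base_url, token, entity_id, state, attributes=None):
    """Send the request and return the HTTP status (200 = updated, 201 = created)."""
    req = build_state_request(base_url, token, entity_id, state, attributes)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


# Example call (hypothetical values; replace with your own):
# push_state("http://homeassistant.local:8123", "YOUR_LONG_LIVED_TOKEN",
#            "sensor.mean_temperature_slope", 0.42,
#            {"unit_of_measurement": "°C/h", "friendly_name": "Mean temperature slope"})
```

As far as I know, a state set this way isn’t backed by an integration, so it only changes when the script posts again and won’t come back on its own after an HA restart; pyscript or a custom integration would be the more “native” route.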
Edit: Separate dashboards for the sake of debugging
I can’t quite help you with getting the data back to HA as I don’t have the containers, but I did find some things regarding the update frequency, if you haven’t tried them yet.
You should be able to increase it by increasing the precision of the sensor, which can be done with Zigbee2MQTT.
Maybe increasing the update frequency will help. That said, after looking at the Wikipedia entry for the Savitzky-Golay filter (well, mainly from my signal processing classes), I think having an actual fixed update interval would be better. I’d personally advise using ESPHome and some I2C temperature sensors for that. If you have some basic knowledge of wiring electronics, it’s really easy; the coding part is just some YAML.
That aside (I don’t know if you want to change your hardware), for implementing the actual data processing I do think Python is indeed your best option. Since Home Assistant runs on it too, it integrates very well into the ecosystem. If you would really prefer a different language, you will probably need to connect to Home Assistant’s API server.
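If the processing does live outside HA (a plain Python script, or another language), the same REST API can also serve the input side: `GET /api/history/period/<start>` returns the recorded states for an entity. A minimal standard-library sketch; the `filter_entity_id`, `minimal_response` and `no_attributes` parameters are worth double-checking against HA’s REST API docs, and the entity name is hypothetical:

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime


def history_url(base_url, entity_id, start):
    """URL for HA's REST history endpoint, asking for a compact response."""
    stamp = urllib.parse.quote(start.isoformat())
    query = urllib.parse.urlencode({"filter_entity_id": entity_id})
    return f"{base_url.rstrip('/')}/api/history/period/{stamp}?{query}&minimal_response&no_attributes"


def parse_history(payload):
    """Turn the decoded JSON (one list per entity) into (unix_seconds, float) pairs,
    skipping non-numeric states such as 'unavailable' or 'unknown'."""
    samples = []
    for item in (payload[0] if payload else []):
        try:
            value = float(item["state"])
        except ValueError:
            continue
        samples.append((datetime.fromisoformat(item["last_changed"]).timestamp(), value))
    return samples


def fetch_history(base_url, token, entity_id, start):
    """Download and parse the history of one entity since `start` (an aware datetime)."""
    req = urllib.request.Request(
        history_url(base_url, entity_id, start),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return parse_history(json.load(resp))
```

The resulting (time, value) pairs are irregular, so they would still need the resampling or an irregular-friendly fit before a Savitzky-Golay style filter.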
For Python, though, the best starting points would be either pyscript or AppDaemon.
pyscript runs within your Home Assistant instance, so it is a bit more limited, or at least has more chance of breaking your stuff. Still, if you are just going to analyse data, it should be fine, and it is probably easier to work with when it comes to getting the sensor data into your script. https://hacs-pyscript.readthedocs.io/en/latest/
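To stay inside HA, one way pyscript could be used is to keep a short buffer of recent reports and publish a rate of change as its own sensor. A minimal sketch: the buffering class is plain Python and swaps in an ordinary least-squares line over a trailing time window (my choice, not something from the thread), which works directly on irregular timestamps. The pyscript part is only an untested comment using pyscript’s `@state_trigger` and `state.set`, and the entity names are hypothetical:

```python
from collections import deque


class TrailingSlope:
    """Keeps (t, value) samples from the last `horizon` seconds and returns the
    least-squares slope in value units per second. Timestamps may be irregular."""

    def __init__(self, horizon):
        self.horizon = horizon
        self.samples = deque()

    def add(self, t, value):
        self.samples.append((t, value))
        while self.samples[0][0] < t - self.horizon:  # drop samples older than the horizon
            self.samples.popleft()

    def slope(self):
        n = len(self.samples)
        if n < 2:
            return None
        t_mean = sum(t for t, _ in self.samples) / n
        v_mean = sum(v for _, v in self.samples) / n
        sxx = sum((t - t_mean) ** 2 for t, _ in self.samples)
        if sxx == 0:
            return None
        sxy = sum((t - t_mean) * (v - v_mean) for t, v in self.samples)
        return sxy / sxx


# Untested pyscript wiring sketch (entity names are hypothetical):
# import time
# tracker = TrailingSlope(horizon=3600)
#
# @state_trigger("sensor.mean_temperature")
# def update_mean_temperature_slope(value=None):
#     if value in ("unknown", "unavailable"):
#         return
#     tracker.add(time.time(), float(value))
#     s = tracker.slope()
#     if s is not None:
#         state.set("sensor.mean_temperature_slope", round(s * 3600, 3),
#                   unit_of_measurement="°C/h")
```

For a signal that is close to linear over the window, the fitted slope carries no systematic delay; over curved stretches it effectively reports the slope near the middle of the window, so the horizon trades noise against lag much like the moving-average length did.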
You could create your own custom integration that reads historical sensor values from the DB and then outputs a new, filtered sensor value. Or, a faster but similar route might be to use the pyscript custom integration and only write a small amount of Python (GitHub: custom-components/pyscript, “Pyscript adds rich Python scripting to HASS”).
The low pass filter is an EMA (exponential moving average), in case you didn’t notice that.
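On the EMA point: an EMA with weight α on each new sample lags a steady ramp by (1 - α)/α samples. If HA’s lowpass weights each new report by 1/time_constant (my understanding, but treat it as an assumption), a time constant of 6 lags by about 5 reports. Because the filter counts state updates rather than seconds, that lag in wall-clock time also stretches and shrinks with the Zigbee reporting rate. A minimal sketch to check the number:

```python
def lowpass(samples, time_constant):
    """Exponential moving average, assuming each new report gets weight
    1 / time_constant (an assumed reading of HA's filter lowpass, not verified here)."""
    alpha = 1.0 / time_constant
    out = []
    y = samples[0]  # start from the first sample to avoid a startup transient
    for x in samples:
        y = (1.0 - alpha) * y + alpha * x
        out.append(y)
    return out


if __name__ == "__main__":
    ramp = [float(n) for n in range(300)]
    smoothed = lowpass(ramp, 6)
    print(round(299 - smoothed[-1], 6))  # steady-state lag on a ramp: 5.0 samples
```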
I still have a bit to learn about how HA/HAOS works, though I do plan on migrating to HA Supervised or a containerized install via Proxmox or similar. I’m much more accustomed to the CLI or a fuller GUI. I cut my teeth on Gentoo Linux, but due to Life, I didn’t keep up with software, so I’ve fallen behind on YAML, containerization, calling external APIs and other more “recent” concepts.
I have things up and running in ZHA. I’ve read lots of back and forth on Z2MQTT vs ZHA. I’m likely going to opt for moving off of ZHA rather than buying new hardware, though I’m curious whether I can maintain/transfer/link per-entity history after the migration.