Send metrics to InfluxDB at regular intervals

ambientsound · January 6, 2017, 10:17pm

Home Assistant sends sensor data to InfluxDB only when a sensor changes. This causes holes in the graph drawn by Grafana when rendering an InfluxDB dataset, due to the fact that Grafana only fetches data for the visible time range.

Can this behavior be changed, such that all sensor data are submitted at a regular interval?

gpbenton · January 7, 2017, 9:14am

I think it is InfluxDB that is doing this; see

Point 1 says

For the time series use case, we assume that if the same data is sent multiple times, it is the exact same data that a client just sent several times.

Oliviakrk · November 4, 2017, 8:42pm

But this causes gaps in Grafana graphs…

EDIT: This can be fixed by using the fill option in Grafana: fill(previous).

magma1447 · May 31, 2018, 9:58am

I find this to be an issue as well. fill(previous) is definitely not a solid solution or workaround. I am graphing temperatures in different rooms of my house. The temperature is so stable (or, to be honest, the battery-driven Z-Wave device reports so rarely) that the value won’t change for days or weeks. fill(previous) only works if there is at least one value within the time period being graphed.

I suggest a similar implementation as OpenHAB has.

They push data from OpenHAB to InfluxDB on every change (just as the HASS component), and on a regular basis if configured, for example daily, or hourly.

My suggestion is therefore to add a new configuration option that allows regular updates, even though the value hasn’t changed.

magma1447 · May 31, 2018, 9:59am

@gpbenton If I understand that correctly, the conflict resolution you are referring to is only if they have the same timestamp. The same data could be sent with another timestamp and then stored in InfluxDB.

gpbenton · May 31, 2018, 2:18pm

AFAIK, influxdb treats items with a different timestamp as different data. It couldn’t really do otherwise.

rspears · June 18, 2018, 6:28am

I’d like this implemented too. I have a bit of free time, so I’m looking at trying to do this myself. I’ll update here and submit a pull request if I figure it out. I know Python well enough to make it happen, the harder part is understanding the architecture of the application and how it works. Right now it’s definitely listening for events and then queueing the individual changed states to be sent to InfluxDB. What essentially would have to be done is instead of triggering based on the listener only, if an option to trigger at an interval is present, the application should send all entity states at that interval AS WELL AS send on changed state.
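
The behaviour described above (send on every change, and additionally re-send every known state at a fixed interval) can be sketched as follows. This is a minimal, self-contained illustration and not code from the Home Assistant influxdb component: the class name `IntervalReporter`, its methods, and the `write` callback are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A point is (entity_id, value, timestamp_in_seconds); purely illustrative.
Point = Tuple[str, float, float]


class IntervalReporter:
    """Hypothetical helper: forward changes immediately, and re-send
    every known state once `interval` seconds have passed."""

    def __init__(self, write: Callable[[List[Point]], None], interval: float):
        self._write = write
        self._interval = interval
        self._latest: Dict[str, float] = {}
        self._last_flush: Optional[float] = None  # time of the last re-send

    def on_state_change(self, entity_id: str, value: float, now: float) -> None:
        # Send on change, as the listener-based behaviour already does.
        if self._latest.get(entity_id) != value:
            self._latest[entity_id] = value
            self._write([(entity_id, value, now)])

    def tick(self, now: float) -> None:
        # Called from a timer. The first call only records a baseline.
        if self._last_flush is None:
            self._last_flush = now
            return
        if now - self._last_flush >= self._interval and self._latest:
            self._last_flush = now
            self._write([(eid, val, now)
                         for eid, val in sorted(self._latest.items())])
```

In a real integration `write` would queue the points for InfluxDB; here it can be any callable, for example `some_list.extend`.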

I’m not super confident in my ability to figure this out but I’m going to look into it. Don’t get your hopes up.

rspears · June 18, 2018, 7:22pm

So I’ve been messing with this today and I’ve figured out that, really, this is an InfluxDB issue. InfluxDB is essentially a time-series database, and when it’s queried (either manually or with something like Grafana), it returns only the measurements within the requested time query. This is the way Grafana works: When you want to view the last 6 hours, Grafana queries the necessary values for that time period. So, this could possibly be fixed in Grafana by maybe getting the last result before the requested time period. It could also possibly be fixed in InfluxDB by returning the last value as the “start” value for the requested time period.
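
The "get the last result before the requested time period" idea can be illustrated on plain in-memory data. This is only a sketch of the logic, not Grafana or InfluxDB code; the function name `window_with_seed` and the `(timestamp, value)` tuple layout are assumptions made for the example.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (timestamp, value), sorted by timestamp


def window_with_seed(points: List[Point], start: float, end: float) -> List[Point]:
    """Return the points in [start, end], preceded by the last point
    before `start`, re-stamped at `start` (if there is one)."""
    inside = [(t, v) for t, v in points if start <= t <= end]
    before = [(t, v) for t, v in points if t < start]
    # Only seed when the window does not already begin with a real point.
    if before and not (inside and inside[0][0] == start):
        inside.insert(0, (start, before[-1][1]))
    return inside


# A sensor that last changed long before the visible range:
points = [(0.0, 20.0), (1000.0, 21.0)]
plain = [(t, v) for t, v in points if 2000.0 <= t <= 3000.0]  # empty
seeded = window_with_seed(points, 2000.0, 3000.0)
```

With the plain time filter the window is empty, so there is nothing for fill(previous) to work from; with the seed there is one point at the start of the window.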

ANYWAYS. I wrote a Python script modeled after the influxdb.py component and simplified (it may be missing some shit that I don’t need right now, but it works great for my application so far).

Because I’m using a virtualenv, I had to make a shell script that activates the venv, runs the script, and then deactivates.

I made a shell_command service with this script, and then I made an automation that runs that shell_command service every 5 minutes.

Here’s what I got
DISCLAIMER: Run this at your own risk, I am in no way responsible for database corruption or data loss or anything like that. It has worked for me so far, for about 30 minutes.

import homeassistant.remote as ha
import datetime, re, math
from influxdb import InfluxDBClient, exceptions
from homeassistant.helpers import state as state_helper

PASSWORD = 'your password' # if necessary
STATES = ['sensor.hallway_thermostat_temperature','sensor.pws_temp_f','climate.hallway','device_tracker.phone','sensor.hallway_thermostat_hvac_state'] # state names you want to update
TIME = datetime.datetime.utcnow()
RE_DIGIT_TAIL = re.compile(r'^[^\.]*\d+\.?\d+[^\.]*$')
RE_DECIMAL = re.compile(r'[^\d.]+')

api = ha.API('127.0.0.1', PASSWORD) # host ip may need to be changed

entities = ha.get_states(api)

entities_to_push = [ent for ent in entities if ent.entity_id in STATES]

def event_to_json(state):
    try:
        include_state = include_value = False
        state_as_value = float(state.state)
        include_value = True
    except ValueError:
        try:
            state_as_value = float(state_helper.state_as_number(state))
            include_state = include_value = True
        except ValueError:
            include_state = True

    include_uom = True
    measurement = state.attributes.get('unit_of_measurement')
    if measurement in (None, ''):
        measurement = state.entity_id
    else:
        include_uom = False

    json = {
            'measurement': measurement,
            'tags': {
                'domain': state.domain,
                'entity_id': state.object_id,
                },
            'time': TIME,
            'fields': {}
            }
    if include_state:
        json['fields']['state'] = state.state
    if include_value:
        json['fields']['value'] = state_as_value

    for key, value in state.attributes.items():
        if key != 'unit_of_measurement' or include_uom:
            if key in json['fields']:
                key = key + "_"
            try:
                json['fields'][key] = float(value)
            except (ValueError, TypeError):
                new_key = "{}_str".format(key)
                new_value = str(value)
                json['fields'][new_key] = new_value
            
                if RE_DIGIT_TAIL.match(new_value):
                    json['fields'][key] = float(RE_DECIMAL.sub('', new_value))

            try:
                if not math.isfinite(json['fields'][key]):
                    del json['fields'][key]
            except (KeyError, TypeError):
                pass
    return json

ifdb = InfluxDBClient('127.0.0.1',8086,'root','root','home_assistant') # these values may need to be changed
ifdb.write_points([event_to_json(e) for e in entities_to_push])

You can see I specified only a few states to update, as updating all states every 5 minutes would probably make the database balloon relatively quickly. I’m going to watch my database size and possibly set a retention policy. I also got the idea of having a short term database that retains 5 min data for a week or so, and a long term database that retains daily and state-change data for a longer time period, maybe a year or even forever. All that would require is essentially creating a new database in InfluxDB and then changing the database name in the script and in Grafana.
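
For the short-term/long-term idea, the retention policies could be created with InfluxQL statements like the ones built below. This is a sketch assuming InfluxDB 1.x and InfluxQL; the policy names and the `home_assistant_short` database name are hypothetical, and the helper only builds the statement text, which would still have to be run against InfluxDB (for example from the influx shell).

```python
def retention_policy_statement(name: str, database: str, duration: str,
                               replication: int = 1,
                               default: bool = False) -> str:
    """Build an InfluxQL CREATE RETENTION POLICY statement as a string."""
    stmt = 'CREATE RETENTION POLICY "{}" ON "{}" DURATION {} REPLICATION {}'.format(
        name, database, duration, replication)
    if default:
        stmt += " DEFAULT"
    return stmt


# Hypothetical names: 5-minute data kept for one week in a separate database.
short_term = retention_policy_statement(
    "one_week", "home_assistant_short", "7d", default=True)
# Long-term data kept indefinitely (INF) in the database used by the script.
long_term = retention_policy_statement("forever", "home_assistant", "INF")
```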

Supakdee_Sodsriwiboo · August 22, 2018, 11:25am

Hi rspears,

I would like to use your code but unfortunately I could not find the way to run it.

I changed:
PASSWORD = 'your password' --> replaced the value in '' with my Home Assistant password
STATES = ['sensor.hallway_thermostat_temperature','sensor.pws_temp_f'] --> the list of sensors that I would like to update

Put the file @ /home/pi
Run it with “python /home/pi/regular_influxdb.py” (My filename is regular_influxdb.py)

I got the following error. Can you advise what I did wrong & what should I do to make it run?

PS: I tried to set it up as shell command and run it from home assistant - It could not run as well.

When run with command line:
Traceback (most recent call last):
  File "regular_influxdb.py", line 1, in <module>
    import homeassistant.remote as ha
ImportError: No module named homeassistant.remote

When run with shell command:
Error running command: python /home/pi/regular_influxdb.py, return code: 1
6:26 PM /srv/homeassistant/lib/python3.5/site-packages/homeassistant/components/shell_command.py (ERROR)

Tried with python3 also the same
Error running command: python3 /home/pi/regular_influxdb.py, return code: 1
6:34 PM /srv/homeassistant/lib/python3.5/site-packages/homeassistant/components/shell_command.py (ERROR)

Tried in virtual environment - Got the long error:
error message.yaml (15.6 KB)

rspears · August 22, 2018, 12:57pm

Hi,

My home assistant instance is set up as a virtual environment. Not running from a virtualenv will cause the first error you posted. What I did is created a .sh file that activates the virtualenv, runs the script, then deactivates. Then I put this in a shell_command thing in home assistant and have it run every 5 min.

I also use python3, but looking at the code I’m not sure if there’s any reason it wouldn’t run on python2. The long error you posted looks like it may be having trouble connecting to the server. Something else that may need to be changed is the stuff in InfluxDBClient(’…’) at the bottom. They are: the IP of the influx instance, the port, username, password, and db name. My influx instance is running on the same machine as home assistant, thus 127.0.0.1 is my ip.

Supakdee_Sodsriwiboo · August 23, 2018, 1:30am

Hi rspears,

Thanks for your reply - you’re right on.

I studied your code in more detail around the beginning and the end. Here is what I needed to do to get it to work with my Home Assistant (it runs in a venv and is served over SSL):

1- Enable the API component in Home Assistant.
2- Change the API line in the code to fit the SSL requirement:
api = ha.API('MY_IP', PASSWORD, port = '8123', use_ssl = True) # host ip may need to be changed
3- Change the InfluxDBClient line to reflect my username and password:
ifdb = InfluxDBClient('MY_IP',8086,'USERNAME','PASSWORD','home_assistant') # these values may need to be changed

Then I could run it from the command line in the venv, but I ran into a lot of problems trying to use a shell command to call it directly with “python /home/pi/regular_influxdb.py” (my filename is regular_influxdb.py).

Finally I saw your reply today so I created .sh file to run in shell command --> Work flawlessly.

Thanks a lot for your code & help.
Supakdee

ntalekt · September 6, 2018, 11:25pm

Thanks for this script, it really helped.

FYI: I’m on 0.77.3 and I get the warning “This class is deprecated and will be removed in 0.77” when running the Python script. I assume it’s because the API is changing to no longer support the HTTP password and is moving to token-based authentication. Not sure, I haven’t looked into it much.
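
One way around the deprecated homeassistant.remote module would be to read the states over the Home Assistant REST API instead. The sketch below assumes the API is enabled and that a long-lived access token has been created for the user; the function names and the `TOKEN` placeholder are hypothetical, and it has not been tested against the script above.

```python
import json
import urllib.request


def build_states_request(base_url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for /api/states."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/states",
        headers={
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json",
        },
    )


def fetch_states(base_url: str, token: str) -> list:
    """Send the request and decode the JSON list of state objects."""
    with urllib.request.urlopen(build_states_request(base_url, token)) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Building the request needs no network access; fetch_states() does.
req = build_states_request("http://127.0.0.1:8123/", "TOKEN")
```

Each returned state is a JSON object rather than a State instance, so event_to_json would need to read keys such as 'entity_id', 'state' and 'attributes' instead of object attributes.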

Anyway, thanks again.

cotlone · April 18, 2019, 1:20am

This quote from the InfluxDB documentation seems to be of relevance, if anyone is still wondering about this issue:

Common issues with fill()

Queries with fill() when no data fall within the query’s time range

Currently, queries ignore fill() if no data fall within the query’s time range. This is the expected behavior. An open feature request on GitHub proposes that fill() should force a return of values even if the query’s time range covers no data.

cdmonk · November 10, 2019, 7:44am

Hi All,

I know that this has been marked as solved, but has anyone found a way to do this? If you are just trying to fill the graphs, yes, Grafana has the fill option. However, there are times when, to perform calculations in Grafana, you need to know how often the data is being written. For power calculations I currently write to InfluxDB every 10 seconds using OpenHAB (which I am trying to migrate away from, but this has become a sticking point).

Thanks
Cam

ocedric · March 11, 2020, 4:44pm

Hi everyone… I’m also really missing the “send every xx minutes” feature from openhab…

Did someone get any news about this ?

adner · April 25, 2020, 7:25pm

Hi, for me also interesting feature to have.

chris-kuhr · May 3, 2020, 5:45pm

Hi,

+1!

Best Ck

khouse75 · June 20, 2020, 5:10pm

I struggled with this same issue for a while. Even when updating at specific intervals and then doing the correct math, the end result was never close to what it should have been. I found that the best way to get within 1 kWh over a 24-hour period was to use integral(). For instance, if I want a bar graph of the energy used each day, the following gives me the best results:

SELECT integral("value") /3600/1000 FROM "W" WHERE ("entity_id" = 'home_power_measurement') AND $timeFilter GROUP BY time(24h) tz('America/New_York')

integral() basically computes the area under the curve. The more data points you have, the more accurate the final result will be. For instance, my Sense energy monitor updates about every second, and the result is within 0.1 to 0.2 kWh of what Sense reports for each day.
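
To make the /3600/1000 in the query concrete: integrating watts over seconds gives watt-seconds, dividing by 3600 gives watt-hours, and dividing by 1000 gives kWh. Below is a minimal sketch of that arithmetic using a trapezoidal approximation; it is an illustration of the idea, not InfluxDB's own implementation, and the function name `energy_kwh` is made up for the example.

```python
from typing import List, Tuple


def energy_kwh(samples: List[Tuple[float, float]]) -> float:
    """Trapezoidal integral of (seconds, watts) samples, returned in kWh."""
    watt_seconds = 0.0
    # Pair each sample with the next one and add the area between them.
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        watt_seconds += (p0 + p1) / 2.0 * (t1 - t0)
    return watt_seconds / 3600 / 1000


# A constant 1000 W load for one hour is 1 kWh.
one_hour = energy_kwh([(0.0, 1000.0), (3600.0, 1000.0)])
```

Sparse samples make each trapezoid span a long period, which is why more data points give a more accurate total.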

sarfata · March 21, 2021, 10:23pm

I also have this issue.

None of the workarounds mentioned here address the main problem for me which is that it becomes impossible to detect when a sensor stops working (runs out of battery, crashes, unplugged, etc).

Right now the InfluxDB database cannot tell the difference between “there is no new value of the data” and “there is no data”. I think the only ways to fix this are:
1- send the data more regularly.
2- send a special value when HomeAssistant detects that a sensor has stopped sending data.
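
If data were sent at a regular interval (option 1), a stopped sensor could be detected by checking how old its newest point is. A minimal sketch of that check, assuming the timestamp of the last point per entity is already known; the function name `stale_entities` and the dictionary layout are hypothetical.

```python
from typing import Dict, List


def stale_entities(last_seen: Dict[str, float], now: float,
                   max_age: float) -> List[str]:
    """Return entity ids whose newest point is older than `max_age` seconds."""
    return sorted(eid for eid, ts in last_seen.items() if now - ts > max_age)


# With a 5-minute reporting interval, allow two missed reports (600 s).
stale = stale_entities(
    {"sensor.hallway": 1000.0, "sensor.attic": 100.0}, now=1200.0, max_age=600.0)
```

Without regular reporting this check cannot work, because a stable sensor and a dead one both produce no new points.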

How do you deal with this?

thomas

kozmaz87 · April 20, 2021, 5:52pm

I found that the easiest way to deal with this is just to add fake sensitivity to your sensors… make accuracy_decimals: 3

That is noise, yes, but it will result in a new value more often than not, and that will eliminate the gaps in the graphs.

InfluxDB suggests using the uniq tag, but I did not manage to figure out how to do that, so I went with increasing the sensitivity and then rounding the values back to something usable in Grafana.
