Extracting data from HA database for further data analysis

I wrote this in Google Docs. I’ll copy and paste it here, but the formatting is better in the Google Doc.

Home Assistant (HA)

Accessing HA database

Background

The project is to calibrate the temperature reading of a DHT sensor. These sensors are notoriously inaccurate. Even the +/- 2°C spec is questionable. Empirical data suggests that there is an offset from the actual temperature, which, in itself is +/- 2°C. Then there is an additional +/- 2°C at any moment in time.

To test the sensors, there needs to be an accurately known temperature around the sensor. This was approached using a temperature controller and an old, non-electronic toaster oven. The oven is configured to be always on, and its power is gated by the controller: if the temperature is too low, the controller slowly raises it by pulsing power to the toaster oven until the oven is in range; if it is too warm, no pulses are sent. This is essentially a bang-bang controller that holds the temperature within +/-0.5°C. It is not adaptive, in the sense that it does not learn how much power is needed to maintain the temperature.
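
For illustration only, the control loop amounts to something like this minimal Python sketch. The setpoint, pulse length, and the read_temperature()/pulse_oven() helpers are hypothetical stand-ins, not the actual controller:

import random
import time

SETPOINT_C = 50.0    # hypothetical target temperature
HYSTERESIS_C = 0.5   # the controller holds roughly +/-0.5°C

def read_temperature():
    # hypothetical stand-in for the reference temperature probe
    return 50.0 + random.uniform(-1.0, 1.0)

def pulse_oven(seconds):
    # hypothetical stand-in for briefly powering the toaster oven
    print(f"oven on for {seconds} s")

while True:
    if read_temperature() < SETPOINT_C - HYSTERESIS_C:
        pulse_oven(2.0)  # too cold: nudge the temperature up with a pulse
    # at or above the setpoint: send no pulses and let the oven cool
    time.sleep(5.0)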

The temperature sensors are connected to micro computers (μC) that have a radio which is used to communicate with a single board computer (SBC) that also has a radio. The SBC acts as a gateway which sends the data over a network to HA. HA, in turn, stores the data in its database.

Up to this point, the software to gather this data is barely adequately documented. The program running on the μC can be found here.

The version HA used for this project is: (Settings→About)

Home Assistant 2023.5.2

Supervisor 2023.04.1

Operating System 10.1

Frontend 20230503.3

Special note:

After many attempts on the HA Forum, Discord, and general searches, I used ChatGPT, which helped me find this solution. Relative to the former methods, it was a pleasant experience. It did not provide any arrogant and snarky comments typical of those few who get off on that type of behavior. It is important to note that ChatGPT relies heavily on original content. If what one is trying to do has not yet been done, it is unlikely that ChatGPT will provide a solution. This stresses the importance of sharing our wisdom and is my motivation for creating this document.

Accessing the data in the database

This is not well documented, and what is documented fails. The purpose of this document is to fill in some of the gaps in that documentation.

Description of the failure

The first problem is that, while documentation exists, it may not be applicable to all versions of HA.

There is the implication that the add-on SQLite Web (Settings→Add-ons→[ADD-ON STORE]), a graphical user interface (GUI) to the database, will be the way to go. But after installing it, it would not start.

Using the HA command line to extract the data

It is possible to run a query from a command line in the HA terminal. (Settings→Add-ons→[ADD-ON STORE]→Search “terminal”→Terminal & SSH… be sure to turn on “show in sidebar”)

Here are the steps:

  1. sqlite3 must be installed into HA from the HA terminal… Use the command:

apk add sqlite

  2. Move to the directory of the database. If the terminal was just opened, the command is

cd config

  3. Create the file q.sql (the name is arbitrary, but it must match the name used in the command below). The file holds the SQL statement that will extract the data, followed by a directive that exits the SQL interpreter.

  4. The first line will be the SQL statement

SELECT states.last_updated_ts, states.state, states_meta.entity_id FROM states
JOIN states_meta ON states.metadata_id = states_meta.metadata_id
WHERE states_meta.entity_id = 'sensor.arduinodevice_130_4'
ORDER BY states.state_id ASC;

This is complicated, so follow closely.

Four of the columns in the states table are used:

last_updated_ts This column contains the timestamp of the datum. It is in Unix epoch format, i.e., a floating-point number of seconds since 1970-01-01 UTC

state the data, i.e., the sensor reading (stored as text)

metadata_id a number that is uniquely associated with the entity_id

state_id every time the state of any device changes, the new state of that device is entered into the states table. The state_id will be the current highest state_id plus 1. Other fields of that record are filled in too, like last_updated_ts and metadata_id.

Two of the columns in the states_meta table are used:

metadata_id a number that is uniquely associated with the entity_id

entity_id the name of the device.

sensor. is the way HA identifies the device as a sensor

arduinodevice is the way the sensor identifies itself

_130 is the device’s MySensors node number (NODE_ID)

_4 is the sensor number within the device (CHILD_ID)

Here’s what’s happening:

The entity_id (sensor.arduinodevice_130_4) is found in the states_meta table. The metadata_id (of the states_meta table) in that record is used to find all the records in the states table that have the same metadata_id (of the states table). All of the found records are sorted by the state_id (of the states table). Only the fields last_updated_ts, state, and entity_id, will be printed to stdout. (stdout is usually the screen, but can be redirected to a file)

Whew!
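
If the JOIN is hard to picture, here is the same lookup done in two explicit steps with Python’s built-in sqlite3 module. This is just a sketch; run it against a copy of the database (the connection is opened read-only):

import sqlite3

# Open the database read-only so nothing can be changed.
conn = sqlite3.connect("file:home-assistant_v2.db?mode=ro", uri=True)
cur = conn.cursor()

# Step 1: look up the metadata_id for the entity in states_meta.
cur.execute("SELECT metadata_id FROM states_meta WHERE entity_id = ?",
            ("sensor.arduinodevice_130_4",))
row = cur.fetchone()
if row is None:
    raise SystemExit("entity not found in states_meta")
metadata_id = row[0]

# Step 2: fetch every state recorded for that metadata_id, oldest first.
cur.execute("SELECT last_updated_ts, state FROM states "
            "WHERE metadata_id = ? ORDER BY state_id ASC",
            (metadata_id,))
for ts, state in cur.fetchall():
    print(ts, state)

conn.close()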

  5. The second line will be the directive to exit the SQL interpreter

.exit

  6. The command to run the SQL statement is:

sqlite3 -readonly -csv home-assistant_v2.db <q.sql >device.csv

sqlite3 is the SQL interpreter program

-readonly is to ensure that the database isn’t changed

-csv means the data will be formatted as comma-separated values (a format spreadsheets can read)

home-assistant_v2.db is the name of the database

<q.sql takes the commands from this file

>device.csv puts the result into a file named device.csv

Moving the data into a location usable by a spreadsheet

Now the trick is to get the output file (device.csv) to a directory that can be read by MS Excel or LibreOffice Calc. Prepare for hoops big and small, high and low.

There are two parts: make HA a network attached storage (NAS) device, and connect your PC to the HA NAS.

Install and configure HA NAS

In HA: Settings→Add-ons→[ADD-ON STORE]→Search “samba”→Samba share→[INSTALL]

Then, at the top, click on the [Configuration] tab

Enter (and remember) a Username and a Password. A Workgroup of WORKGROUP seems to work.

This is the important part! In the [Allowed Hosts] dialog box, enter the IP address of the PC that will be connecting to the HA NAS. You can enter multiple IP addresses.

Also important: scroll to the bottom and click on [SAVE].

Connect to the HA NAS from a PC

Linux

Linux is arcane, terse, and annoying, but, with a little searching, help and good examples can be found.

Ensure that cifs-utils is installed

sudo apt install cifs-utils -y

Create a mount point for the NAS

mkdir /home/homedirectoryoflinuxuser/NASHA

Append the following line to /etc/fstab

//the_IP_address_of_HA_instance/config /home/homedirectoryoflinuxuser/NASHA cifs user=remembereduser,password=rememberedpassword,vers=1.0,uid=1000,iocharset=utf8,auto,rw,exec,x-systemd.automount 0 0

Mount the NAS

sudo mount -a

Windows

Open up the file explorer

In the directory address box (not too sure of the name; it’s towards the top, to the left) enter

\\the_IP_address_of_HA_instance\config

A dialog box should pop up requesting a username and password. Enter the remembered credentials.

It is recommended to check the remember credentials box.

One error is resolved by ensuring that the IP address of the PC has been entered into [Allowed Hosts]. Other errors are beyond the scope of this document.

Move the results to the PC

It is recommended to “move”, as opposed to “copy”, the results to the PC. If left on HA, they may cause an out-of-space error in the container/virtual machine in which HA is running.

Once moved, the data is ready to be analyzed.
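
As a sketch of that first analysis step, assuming Python with pandas is available on the PC, the CSV can be loaded and the epoch timestamps converted like this (sqlite3 -csv emits no header row, so the column names are supplied here):

import pandas as pd

# sqlite3 -csv writes no header row, so name the columns explicitly.
df = pd.read_csv("device.csv",
                 names=["last_updated_ts", "state", "entity_id"])

# last_updated_ts is seconds since the Unix epoch; make it human readable.
df["time"] = pd.to_datetime(df["last_updated_ts"], unit="s")

# States are stored as text; coerce to numbers for analysis
# (non-numeric states such as 'unavailable' become NaN).
df["state"] = pd.to_numeric(df["state"], errors="coerce")

print(df.head())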


Or install this, add the entity(ies) you want to extract, select the time range, and click the download button.


Save the resulting csv from your browser wherever you want to. Done.


Highly recommend the History Explorer card. Get it while you can!

Thank you Alex for all the hard work. Good luck on whatever is next for you.

Thanks OldSurferDude - this write-up is very helpful.

Going to chime in for those who need to select data with a bit of date filtering. This is how I select values of a Temp, Fan, and Climate entity from the database backup using a bit of date filtering. This is working for me as of 2024-01-17.

Bit of background first:
I need to test out some math I’m using in a trigger/automation. I’d like to be able to gather some initial data from real sensors, see how well the triggering algorithm is working and then tweak the trigger and test again - USING THE SAME DATA.

So after I get my initial data, I invoke a full backup. I download that full backup tarball from Home Assistant to my local machine, unpack it, and open the Home Assistant database file (home-assistant_v2.db) with DB Browser for SQLite.

Once the db file is all loaded up, I’ll execute my query (below) to grab that initial data, tweak the query until I’m sure I have the right data, and save the results as a CSV file using said feature of DB Browser for SQLite.

Now the fun part: I have a Python script parse the CSV file and send the data back to Home Assistant via the REST API. I have duplicated my automation/trigger in HA to look at the replayed data that the Python script is sending in either in real time or faster/slower depending on options in the Python script. The script also calculates the variable send rate - so the data is replayed pretty close to how it came from the initial sensor run.
NOTE: REST API entities are temporary - they do not survive an HA restart and are created upon the first POST from the Python script.
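
As a rough sketch of the replay idea (not my actual script), a loop like the following POSTs each CSV row to the HA REST API, paced by the recorded time deltas. The URL, token, CSV filename, and replay entity name are placeholders, the requests package is assumed to be installed, and the CSV is assumed to have no header row:

import csv
import time
import requests

HA_URL = "http://homeassistant.local:8123"  # placeholder: your HA instance
TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"      # Profile -> Long-lived access tokens
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

with open("replay.csv", newline="") as f:
    rows = list(csv.reader(f))  # columns: Time, Value, Entity

prev_ts = None
for time_str, value, entity in rows:
    # Parse 'YYYY-MM-DD HH:MM:SS' (fractional seconds trimmed off).
    ts = time.mktime(time.strptime(time_str[:19], "%Y-%m-%d %H:%M:%S"))
    if prev_ts is not None:
        time.sleep(max(0.0, ts - prev_ts))  # real-time pacing; scale to speed up
    prev_ts = ts
    # POST to a separate replay entity so the original history is untouched.
    requests.post(f"{HA_URL}/api/states/sensor.replay_test",
                  headers=HEADERS,
                  json={"state": value, "attributes": {"source": entity}})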

I use the Samba integration to drop my CSV file on the HA server’s config directory, and a button in the UI to invoke the script to replay the data - sit back and see the automation trigger then tweak and test different algorithms.

I also used a GPT that was constrained to only follow the HA documentation - link. Many thanks to u/Jendosh for creating and sharing!

Here’s the SQL query I use in DB Browser for SQLite.

SELECT strftime('%Y-%m-%d %H:%M:%f', states.last_updated_ts, 'unixepoch', 'localtime') AS 'Time', states.state AS 'Value',  states_meta.entity_id AS 'Entity' 
FROM states 
JOIN states_meta ON states.metadata_id = states_meta.metadata_id 
WHERE states_meta.entity_id = 'sensor.YOUR_TEMPSENSOR_ENTITY' 
AND
	strftime('%Y-%m-%d %H:%M:%f', last_updated_ts, 'unixepoch', 'localtime') BETWEEN '2024-01-10 00:00:00' AND '2024-01-10 07:10:00'
UNION
	SELECT strftime('%Y-%m-%d %H:%M:%f', states.last_updated_ts, 'unixepoch', 'localtime') AS 'Time', 
	states.state AS 'Value',
	states_meta.entity_id AS 'Entity' 
FROM states 
JOIN 
	states_meta ON states.metadata_id = states_meta.metadata_id 
WHERE states_meta.entity_id = 'climate.YOUR_CLIMATE_ENTITY' 
AND
	strftime('%Y-%m-%d %H:%M:%f', last_updated_ts, 'unixepoch', 'localtime') BETWEEN '2024-01-10 00:00:00' AND '2024-01-10 07:10:00'
UNION
	SELECT strftime('%Y-%m-%d %H:%M:%f', states.last_updated_ts, 'unixepoch', 'localtime') AS 'Time', 
	states.state AS 'Value', 
	states_meta.entity_id AS 'Entity' 
FROM states 
JOIN 
	states_meta ON states.metadata_id = states_meta.metadata_id 
WHERE states_meta.entity_id = 'fan.YOUR_FAN_ENTITY' 
AND
	strftime('%Y-%m-%d %H:%M:%f', last_updated_ts, 'unixepoch', 'localtime') BETWEEN '2024-01-10 00:00:00' AND '2024-01-10 07:10:00'
ORDER BY strftime('%Y-%m-%d %H:%M:%f', states.last_updated_ts, 'unixepoch', 'localtime')