And as a kind of example, what I run is a series of SQL Inserts depending on the type of entity I want to track. Each SQL summarizes/aggregates the values I consider important. For instance, this is the SQL I use for “binary” entities:
sql1=("INSERT INTO ha_datamart_states (cday,chour,entity_id,friendly_name,sensor_type,state,num_changes,state_seconds,kW_consumed,avg_kWh_consumption,max_value,min_value,avg_value) "
" SELECT concat (curdate() - INTERVAL 1 DAY, ' 00:00:00') cday,tmplt.hour chour,tmplt.entity_id,%s friendly_name,%s sensor_type,tmplt.state,SUM(tmplt.state_changed) num_changes,SUM(tmplt.state_seconds) state_seconds,null kW_consumed,null avg_kWh_consumption,null max_value,null min_value,null avg_value "
" FROM ( SELECT ttime.hour,tevents.entity_id,tevents.previous_state state, CASE WHEN date_format(tevents.created, '%%H') = ttime.hour AND tevents.state <> 'last' AND tevents.state <> tevents.previous_state THEN 1 ELSE 0 END state_changed, TIME_TO_SEC( ADDTIME(TIMEDIFF( CASE WHEN date_format(tevents.created, '%%H') <> ttime.hour THEN date_add(concat (date_format(tevents.previous_created, '%%Y-%%m-%%d '), hour, ':00:00'), INTERVAL 60 MINUTE) ELSE tevents.created END, CASE WHEN date_format(tevents.previous_created, '%%H') <> ttime.hour THEN concat (date_format(tevents.previous_created, '%%Y-%%m-%%d '), hour, ':00:00') ELSE tevents.previous_created END), CASE WHEN tevents.state='last' and ttime.hour='23' THEN 1 ELSE 0 END )) state_seconds "
" FROM( SELECT st.entity_id, st.state_id, st.state, st.created, st.previous_state_id, st.previous_state, CASE WHEN date_format(st.previous_created, '%%D') = date_format((NOW() - INTERVAL 2 DAY), '%%D') THEN concat (curdate() - INTERVAL 1 DAY, ' 00:00:00') ELSE st.previous_created END previous_created "
" FROM (SELECT e.entity_id,e.state_id,e.state,e.created, ( SELECT e2.state_id FROM states e2 WHERE e2.state_id < e.state_id AND e2.entity_id = e.entity_id AND date_format(created, '%%D') = date_format((NOW() - INTERVAL 1 DAY), '%%D') ORDER BY state_id DESC LIMIT 1) previous_state_id, "
" ( SELECT e2.state FROM states e2 WHERE e2.state_id < e.state_id AND e2.entity_id = e.entity_id ORDER BY state_id DESC LIMIT 1) previous_state, ( SELECT e2.created FROM states e2 WHERE e2.state_id < e.state_id AND e2.entity_id = e.entity_id ORDER BY state_id DESC LIMIT 1) previous_created "
" FROM states e WHERE entity_id = %s AND date_format(created, '%%D') = date_format((NOW() - INTERVAL 1 DAY), '%%D') ORDER BY e.entity_id ) st UNION ALL "
" SELECT * FROM (SELECT e.entity_id,e.state_id,'last' state,concat (curdate() - INTERVAL 1 DAY , ' 23:59:59') created,e.state_id previous_state_id,e.state previous_state,e.created previous_created FROM states e WHERE entity_id = %s AND date_format(created, '%%D') = date_format((NOW() - INTERVAL 1 DAY), '%%D') ORDER BY state_id DESC LIMIT 1) st2 ) tevents, "
" ( SELECT '00' HOUR FROM dual UNION ALL SELECT '01' FROM dual UNION ALL SELECT '02' FROM dual UNION ALL SELECT '03' FROM dual UNION ALL SELECT '04' FROM dual UNION ALL SELECT '05' FROM dual UNION ALL SELECT '06' FROM dual UNION ALL SELECT '07' FROM dual UNION ALL SELECT '08' FROM dual UNION ALL SELECT '09' FROM dual UNION ALL SELECT '10' FROM dual UNION ALL SELECT '11' FROM dual UNION ALL SELECT '12' FROM dual UNION ALL SELECT '13' FROM dual UNION ALL SELECT '14' FROM dual UNION ALL SELECT '15' FROM dual UNION ALL SELECT '16' FROM dual UNION ALL SELECT '17' FROM dual UNION ALL SELECT '18' FROM dual UNION ALL SELECT '19' FROM dual UNION ALL SELECT '20' FROM dual UNION ALL SELECT '21' FROM dual UNION ALL SELECT '22' FROM dual UNION ALL SELECT '23' FROM dual ) ttime "
" WHERE date_format(tevents.previous_created, '%%H') = ttime.hour OR date_format(tevents.created, '%%H') = ttime.hour OR (date_format(tevents.previous_created, '%%H') < ttime.hour AND date_format(tevents.created, '%%H') > ttime.hour) "
" ) tmplt GROUP BY tmplt.hour, tmplt.entity_id, tmplt.state ")
This query counts the number of times (and amount of time) that the entity was on each state. For example, the kitchen switch will tell me how many times it was ON or OFF and for how long, grouped per hour. Advantage of this approach is that I have only 48 records on a day for one switch, instead potential dozens or hundreds of them. So I can keep a quite big historical information.
For other entities light bulbs or power relays I also keep total consumption and average, and same for temperature sensors…
It was a long time since I played with SQL so I’m sure there is a lot of improvement for the query performance and the process, but so far it worked well for me