Because I don’t have the time to invest hours into installing an unknown database, testing it in a scientifically accurate way, and creating a valid summary that other people can actually make sense of. That’s the exact reason I used a search engine to find this thread and asked you guys about it
That other test you linked has the exact same problem as the one you posted before. I didn’t know about that one though, so thx
And, in addition to that, yes, I feel like InfluxDB is not only the “ok” choice for Home Assistant, but actually the better one. But that’s just a feeling, so I might be totally wrong. My conclusion for now is that I will happily continue to use InfluxDB until it breaks and I can finally test my backup strategy in a real scenario
I think the performance differences only become clear if you have an extremely large set of sensors logging a lot of data. So from the performance-perspective, I think there is no reason to switch.
In this light, Influx makes perfect sense for Home Assistant.
The reasons I switched over are:
1- I have vast SQL knowledge. So using SQL gives me far quicker results than learning Flux and spending hours and hours to find out how a certain query is written in Flux.
2- Joins! I was baffled reading this: https://github.com/influxdata/influxdb/issues/13498
Joins were not supported until about 6 months ago, and are still a technical-preview kind of thing.
But… only within Influx itself. Say I wanted to join against a table containing a lookup set of holidays? Good luck!
3- Updates. So in my case my data was slightly off. I wanted to rename an entity because I made a typo when setting it up, and only noticed after a month. But this leaves the old data still stored under the old entity-id, and the new data under the new entity-id.
Sorry bro! Update does not exist! You have to remove your old data, edit it somehow somewhere outside of Influx, and re-insert it again.
4- Indexes. Influx stores just a blob of data and indexes only its tags. So say you want to add an index to a field in your measurement data? Sorry! Cannot do. Remove your previous data and reinsert it all again using the correct tags.
With Timescale (or any PostgreSQL), you can create an index on any JSON field you like, or simply create a computed column and index that one. One line of SQL. Done! (See the sketch after this list.)
5- Gap-fill data. Say I missed 1 day of measurements because I was updating Home Assistant. I would like to ‘copy’ the previous day, so my monthly totals will be somewhat OK. Easy peasy with a SQL query (also shown in the sketch below).
6- Client tools. This last one was the final deal-breaker.
There are simply no client tools available, except for a few command-line tools: https://docs.influxdata.com/influxdb/v1.8/tools/
Where is InfluxDb studio??
So, there you have a nice timeseries DB, and no good way to explore your data before you paste your query into a Grafana chart. It’s like having a great engine but no car around it with seats and a steering wheel. And no, the explore window in Influx is not equivalent. Your query won’t be saved, and you cannot have multiple queries open. It feels like programming through a keyhole.
This list goes on and on. Influx is nice! SQL is better, because it’s more mature.
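To make points 4 and 5 concrete, here is a rough, untested sketch. It assumes an LTSS-style table called ltss with an attributes JSONB column; the field names, entity ids and dates are made up for illustration:

-- Point 4: index a field inside the JSON payload directly...
CREATE INDEX idx_ltss_battery ON ltss (((attributes->>'battery_level')::numeric));
-- ...or add a computed column (PostgreSQL 12+) and index that one:
ALTER TABLE ltss ADD COLUMN battery_level numeric
  GENERATED ALWAYS AS ((attributes->>'battery_level')::numeric) STORED;
CREATE INDEX idx_ltss_battery_col ON ltss (battery_level);

-- Point 5: gap-fill a missed day by copying the previous day's rows, shifted by one day:
INSERT INTO ltss ("time", entity_id, state, attributes)
SELECT "time" + interval '1 day', entity_id, state, attributes
FROM ltss
WHERE entity_id = 'sensor.energy_total'
  AND "time" >= '2021-03-14' AND "time" < '2021-03-15';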
So, let me turn this question around now. What does Influx have to offer that cannot be done by TimescaleDB? Certainly not performance or memory, because I run both side by side on an RPi 4.
Of course my opinion is biased, and yes, I am the creator of the timescale-db addon.
But the reason for creating this addon is that Influx let me down when I needed it.
Not on performance, it’s fast enough. But on functionality.
I want InfluxDB to be better, because it sounds cooler and PostgreSQL sounds so 1974…
But in the end, I just wanted my dashboards, and I wanted them quick.
(Btw, I had never worked with PostgreSQL before in my entire life; I’m a Microsoft SQL Server expert.)
So the moral of the story: it took me less time to learn how to create a hassio-addon and implement TimescaleDB and learn PostgreSQL and learn TimescaleDB and build my dashboard on top, than to learn Flux and try to make it sing without proper tooling.
I am more than happy to give Influx another spin someday when the above concerns have been properly addressed. And maybe this time it will blow my socks off and I will never look back at the 1970s again
Thx for the detailed insight. These are very valid points, if applicable. I guess I just don’t need most of the things you listed. I only use InfluxDB through Grafana, nothing else, and although renaming old entity data would be nice, it’s not worth the effort to make the switch. It’s not just the data that I would have to migrate (which would probably be the easiest task), it’s also all of my Grafana dashboards, and sadly that’s not as easy as running a migration script. Also, there is nothing I could name off the top of my head that I dislike in my Grafana setup that is caused by InfluxDB’s limitations.
Indeed, you should have a very compelling reason to rewrite all your dashboards.
If influx works for you now and in the near future, don’t switch.
But in your original question you asked if there were any people who had made the switch.
Well, I have… only not for performance reasons.
And later you stated you feel that Influx is actually the better choice for HA.
That claim depends on what you are trying to achieve with it I suppose.
In the dashboard above, I calculate the costs of the last charge.
But the electricity price depends on the time of day (and on weekends, and high/low tariffs).
So, if I start charging on Friday at 21:30 until Saturday at 2 AM, what are my real charging costs?
Or what if that Friday is a national day off (so it falls in the low tariff)?
It’s exactly that complexity where SQL really shines.
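As a rough illustration (all table and column names here are hypothetical, just to show the kind of join that is easy in SQL but painful without one):

SELECT sum(e.kwh * tr.price_per_kwh) AS charge_cost
FROM energy_per_hour e
JOIN tariffs tr
  ON tr.is_low_tariff = (
       extract(isodow FROM e.hour) IN (6, 7)               -- weekend
       OR e.hour::date IN (SELECT day FROM holidays)       -- national day off
       OR extract(hour FROM e.hour) NOT BETWEEN 7 AND 22   -- night hours
     )
WHERE e.hour >= '2021-03-12 21:30' AND e.hour < '2021-03-13 02:00';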
But I really hope that Influx one day gets this super awesome upgrade that makes this kind of stuff trivial. That day is the day I will switch back to Influx.
The only reason to really care about SQL is if you want to connect other tools to your data; for example, many dashboarding apps that support SQL might not support InfluxDB. If you just want to make pretty Grafana graphs with your data, then Influx is perfectly fine.
I’m sorry. I don’t share that opinion.
None of my mentioned reasons above are addressed with that statement.
Influx is super nice for simple stuff, don’t let me talk you out of that.
But transforming raw measurements into useful information is a whole different area of expertise.
Mainly because your data does not stand alone. You need to combine it with a lot of other data sources for it to be really useful.
So please take my concerns away with links to solutions.
One other reason for dropping InfluxDB is its very limited DB size on 32-bit systems such as the Raspberry Pi. I’ve been collecting Home Assistant data for about six months now and have hit the limit where InfluxDB cannot compact its data files any more (~2 GB database size) and will eventually freeze the whole system every other week. Totally unacceptable, especially when you consider that databases are meant to serve data that is far larger than the available memory (otherwise I don’t need a database, I can simply store everything in memory). Hence I will also investigate switching to TimescaleDB.
Is there any way to transfer collected data from InfluxDB to TimescaleDB or do I have to start from scratch?
Perfect, that worked like a charm. I was able to successfully migrate from InfluxDB to TimescaleDB while keeping all old data. Connecting Grafana to TimescaleDB was also successful and provides even better visualizations (e.g. handling time intervals in which there is no measurement). I’ll try to summarize my project in the next few days as a reference for others.
What if you don’t specify CONCURRENTLY? (That would lock out writers, so HA can’t be running.)
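For reference, a minimal sketch of the two variants (the index name is made up):

-- A plain CREATE INDEX locks the table against writes while the index is built:
CREATE INDEX idx_ltss_entity_time ON ltss (entity_id, "time" DESC);
-- With CONCURRENTLY, inserts keep working during the build; it is slower and
-- cannot run inside a transaction block:
CREATE INDEX CONCURRENTLY idx_ltss_entity_time ON ltss (entity_id, "time" DESC);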
I am trying to see whether it is possible to set up the tables in a TimescaleDB-enabled PostgreSQL server, and then restore my data from a normal PostgreSQL DB.
I got it installed and it didn’t trigger any errors. I can see that the extension has been added to the DB, but being new to Postgres, can you maybe shed a bit of light on how the data is actually stored there, as I can’t see any new schemas added?
That’s getting me confused about how I should query the data from Grafana.
Since no time periods are recorded, only individual points in time (start), you probably have to work with a subquery and, if necessary, determine the next point in time (end) via a LEAD window function. From this subquery you can then determine, per genre, the sum of the time differences between the two points in time (start and end).
Thanks for your reply @CM000n! Glad to hear it could be possible. I tried it first with InfluxDB, but as far as I know I can’t get different string states displayed in Grafana…
I discovered TimescaleDB and PostgreSQL just yesterday and want to learn a bit more about them. Can you maybe point me in the right direction on how I can add a subquery for this here:
SELECT
$__timeGroupAlias("time",$__interval),
state AS metric,
max(id) AS "id"
FROM ltss
WHERE
$__timeFilter("time") AND
entity_id = 'sensor.get_the_genre'
GROUP BY 1,2
ORDER BY 1,2
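For what it’s worth, a rough, untested sketch of the LEAD-based subquery described above, reusing the same ltss columns and Grafana macros as the query here; it sums the time spent per genre in hours:

SELECT
  metric,
  sum(extract(epoch FROM (next_time - "time"))) / 3600 AS hours
FROM (
  SELECT
    "time",
    state AS metric,
    lead("time") OVER (ORDER BY "time") AS next_time
  FROM ltss
  WHERE
    $__timeFilter("time") AND
    entity_id = 'sensor.get_the_genre'
) sub
WHERE next_time IS NOT NULL
GROUP BY metric
ORDER BY metric;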