Turn HA into ULTIMATE Data Analysis platform

Hello!
I really like Home Assistant, but viewing the history is painful. I can’t store long periods of time, like years, without hacks, and I can’t chart the data in the browser because the interface is too simplified.
So, I made a project that transforms HA into a fast and beautiful framework to store and analyze data (a lot of it!)
[screenshot: Redash]

The problem: the current database is:

  • slow
  • big

Time-series data can be stored much more efficiently in column-oriented databases like ClickHouse (What Is ClickHouse? | ClickHouse Docs).

Thanks to columnar storage and specialized compression codecs, the data can take up 10–1000 times less space, and queries see a comparable performance increase.
In special cases (slowly changing values, a small set of categorical string values), the gain can be even greater: well-compressed data takes up less space, and queries execute faster because of the increased effective read speed.
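To make this concrete, here is a minimal sketch of what an HA-style state history table could look like in ClickHouse; the table name, columns, and codec choices are my own illustrative assumptions, not the addon’s actual schema:

```sql
-- Hypothetical ClickHouse table for HA-style state history.
-- LowCardinality dictionary-encodes the small set of entity ids,
-- Gorilla suits slowly changing numeric values, and Delta suits
-- near-monotonic timestamps.
CREATE TABLE states_history
(
    entity_id    LowCardinality(String),
    state        Float64 CODEC(Gorilla, ZSTD),
    last_updated DateTime64(3) CODEC(Delta, ZSTD)
)
ENGINE = MergeTree
ORDER BY (entity_id, last_updated);
```

Sorting by (entity_id, last_updated) keeps each sensor’s values adjacent on disk, which is exactly what lets these codecs compress so well.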
So, I made a set of addons that allow you to store data in ClickHouse and conveniently explore it using SQL.

It consists of three parts:

  1. PostgreSQL with the pg_ivm extension: creates an incrementally maintained materialized view with type casting
  2. ClickHouse with the MaterializedPostgreSQL database engine, which replicates the PostgreSQL matview (a sketch of steps 1 and 2 follows below)
  3. Redash, the best (in my opinion) BI system

You can use all of them or only the parts you need. For example, you can set up only Postgres and Redash and not create a matview; in this case, you get all the advantages of Postgres and Redash without complicated settings (which I will hide under the hood anyway).
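To make steps 1 and 2 concrete, here is a rough sketch. It assumes pg_ivm’s create_immv() API and a simplified states table with entity_id, state, and last_updated columns; the real HA recorder schema differs, and the view name is made up:

```sql
-- Step 1 (PostgreSQL): an incrementally maintained materialized view
-- that casts numeric-looking states to double precision. In newer
-- pg_ivm releases the function lives in the pgivm schema.
SELECT create_immv(
    'states_typed',
    $$SELECT state_id,
             entity_id,
             state::double precision AS state_num,
             last_updated
      FROM states
      WHERE state ~ '^-?[0-9]+(\.[0-9]+)?$'$$  -- guard the cast
);

-- Step 2 (ClickHouse): replicate the matview's backing table with the
-- experimental MaterializedPostgreSQL database engine. The replicated
-- table needs a primary key or REPLICA IDENTITY FULL on the Postgres
-- side; host and credentials below are placeholders.
SET allow_experimental_database_materialized_postgresql = 1;
CREATE DATABASE ha_replica
ENGINE = MaterializedPostgreSQL('postgres-host:5432', 'homeassistant', 'ha_user', 'ha_password')
SETTINGS materialized_postgresql_tables_list = 'states_typed';
```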

NB!

  • Tested only on an AMD64 Hyper-V virtual appliance; I will test on ARM later

  • The project is underdeveloped; use it at your own risk.

  • This is a first step toward integrating Home Assistant with ClickHouse; later, I will develop a custom integration, so PostgreSQL will no longer be needed.

  • Redash installs through docker-compose and uses the host Docker daemon. Redash uses its own PostgreSQL DB, which, for now, has a hardcoded volume path on the host system.


First off, I would like to thank you for seeding this project and giving ClickHouse a fighting chance for HA.

What are you looking for, if any, in terms of support?

First impressions: These addons are a little raw.

All the data goes to a custom PostgreSQL addon which doesn’t have the ability to mount a persistent volume. You are currently storing all of the data inside the container.

Is that by design?
Would you be open to mounting the addon data directory to store PostgreSQL data?
Would you consider using the Percona version as a base for the image?

For some reason, this isn’t working for me. I have 55 GB of history in MySQL (~2 years’ worth of data), and I believe HA is doing some sort of data migration, and PostgreSQL is not available.

Would you be open to sharing a roadmap or what your plans are?

I’m trying to figure out if I should throw some time into this addon or into VictoriaMetrics.

Hi! I will answer starting from the end, and later I will write out detailed answers to the rest of the questions.

In your situation, you do not need to install PostgreSQL at all; all you need is to install the ClickHouse addon and use the MaterializedMySQL database engine instead of MaterializedPostgreSQL. Specifically with Postgres, you can also, for example, not replicate DELETEs and configure data rotation so that historical data is stored only in ClickHouse. If you are ready, I will try to finish the addon so that you can use your existing MySQL.
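For reference, a minimal sketch of that MySQL path (host, database name, and credentials are placeholders; MaterializedMySQL is experimental and must be enabled first):

```sql
SET allow_experimental_database_materialized_mysql = 1;

-- Replicate the existing HA MySQL database straight into ClickHouse.
CREATE DATABASE ha_mysql
ENGINE = MaterializedMySQL('mysql-host:3306', 'homeassistant', 'ha_user', 'ha_password');
```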

P.S. I don’t know anything about VictoriaMetrics, but at first glance, ClickHouse is definitely better.

First off, I would like to thank you for seeding this project and giving ClickHouse a fighting chance for HA.

I want to emphasize that there is not only the ClickHouse Addon but also Redash. At the moment, it’s more of a proof of concept (PoC) than a stable product.

What are you looking for, if any, in terms of support?

Anything: beta-testers, pull requests (PRs), any feedback, and thoughts.

All the data goes to a custom PostgreSQL addon which doesn’t have the ability to mount a persistent volume. You are currently storing all of the data inside the container.

The volume is persistent; in the Dockerfile, there is VOLUME /var/lib/postgresql/data, but it’s not considered best practice and will be changed in the future.

Would you be open to mounting the addon data directory to store PostgreSQL data? Would you consider using the Percona version as a base for the image?

The PostgreSQL part should be totally rewritten. For example, yes to the addon data directory, and to one DB instance for both HA and Redash (a tiny sketch follows).
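A hedged sketch of the shared-instance idea, with hypothetical role and database names:

```sql
-- One PostgreSQL instance, separate databases per consumer.
CREATE ROLE redash LOGIN PASSWORD 'changeme';  -- placeholder password
CREATE DATABASE redash OWNER redash;           -- Redash metadata DB
-- Home Assistant keeps its own database (e.g. homeassistant) alongside.
```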

Would you be open to sharing a roadmap or what your plans are?

The entire roadmap should look something like this:

  1. Fix the things you already mentioned, like mounted data directories, and test on all architectures.
  2. Implement data retention in PostgreSQL and set up long-term data storage in ClickHouse (see the sketch after this list).
  3. Introduce static type casting in HA and ESPHome for sensors (without it, there will always be hacks).
  4. Develop a custom integration for writing data directly to ClickHouse.
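For item 2, a hedged sketch of what retention could look like, reusing the illustrative states_history table from the first sketch and made-up retention windows:

```sql
-- ClickHouse: long-term storage, expiring raw rows after 5 years.
ALTER TABLE states_history
    MODIFY TTL toDateTime(last_updated) + INTERVAL 5 YEAR;

-- PostgreSQL: rotate short-term history after 30 days. Assuming DELETEs
-- are not replicated to ClickHouse, the old rows survive there.
DELETE FROM states WHERE last_updated < now() - INTERVAL '30 days';
```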

For #4, are you thinking a native ClickHouse protocol connector (i.e., like InfluxDB) or validating a SQLAlchemy ClickHouse dialect (i.e., GitHub - cloudflare/sqlalchemy-clickhouse)?

Would either require a pull request from HA core in order to integrate into the mainline distribution?

I’m ready and eager to start testing.

Side note: I believe docker image builders should, by default, provide the facility to mount service configurations outside of the container image. Advanced users often have tweaks to the service configuration that are specific to their environment. Allowing this file/folder to be mounted externally as well would be a great advantage for the serviceability of the container.

For #4, are you thinking a native ClickHouse protocol connector (i.e., like InfluxDB) or validating a SQLAlchemy ClickHouse dialect (i.e., GitHub - cloudflare/sqlalchemy-clickhouse)?

I believe HA has the potential to be a serious high-load project, so a native connector would be necessary.

Would either require a pull request from HA core in order to integrate into the mainline distribution?

Static type casting will, of course, require a PR to HA core.

Side note: I believe docker image builders should, by default, provide the facility to mount service configurations outside of the container image. Advanced users often have tweaks to the service configuration that are specific to their environment. Allowing this file/folder to be mounted externally as well would be a great advantage for the serviceability of the container.

Unfortunately, I don’t have a lot of experience with Docker. If you have any examples of good projects or links to best practices, it would be very helpful.

Thanks

It seems the manual approach of using a MaterializedMySQL database works, but as the ClickHouse CTO said, they don’t support MaterializedMySQL.

So it seems MVs + the MaterializedPostgreSQL database engine is the way to go, since it is on the path to becoming GA.

Again, the best approach here will be to use SQLAlchemy properly, but that is a long path.

@apborov please feel free to ping me if you need support on your endeavour!

For sure! ClickHouse was built for storing analytics data, so yes, it is better to use ClickHouse for analytics. The difference between VictoriaMetrics (VM) and ClickHouse (CH) is that VM is a TSDB for storing metrics, but VictoriaMetrics’ storage is built from scratch using ideas from ClickHouse’s MergeTree table engine; see VictoriaMetrics — creating the best remote storage for Prometheus | by Aliaksandr Valialkin | FAUN — Developer Community 🐾

And VictoriaMetrics has some fundamental flaws in its assumptions about time series that cannot be fixed because they are too entrenched.

Anyone who wants to build a TSDB should read and really understand RRD first. It got a bunch of things right, and I’ve not seen another TSDB since that doesn’t get at least one of those things wrong, paying accuracy and performance penalties as a result.