✍️ Scribe: Long-Term History for Home Assistant (TimescaleDB)

Hi everyone! :wave:

I’m sharing Scribe (GitHub - jonathan-gatard/scribe), a custom component to record Home Assistant history to a TimescaleDB database.

:handshake: Feedback Wanted!

I’m just starting out with this project and I’m very open to advice!

  • Is the architecture sound?
  • Are there features you’d expect from a tool like this?
  • Any tips on optimization?

I’d love to hear your thoughts! :chart_with_upwards_trend:


:baby: For Everyone (The “Why”)

If you want to keep years of sensor data without slowing down Home Assistant, Scribe is for you. It runs alongside your default recorder.

:white_check_mark: Pros:

  • Infinite History: Keep data for years without bloating your main HA database.
  • Performance: Your Home Assistant stays fast because it doesn’t have to manage massive history files.
  • Visualization: Perfect for creating beautiful dashboards in Grafana.

:x: Cons:

  • Setup Required: You need a running TimescaleDB instance (Docker or Add-on).
  • External Viewing: This data is optimized for external tools (like Grafana), not the built-in “History” tab. (Though you can use custom cards like apexcharts-card or history-explorer-card to view it in dashboards).

:nerd_face: For Advanced Users (The “How”)

Scribe is built to be a “set and forget” high-performance writer.

  • Async-First: Built on asyncpg. It writes data asynchronously and never blocks the main event loop.
  • TimescaleDB Native: Automatically manages Hypertables and Compression Policies.
    • Compression: Typical savings of 90-95%. (e.g., 50GB of raw data becomes ~2.5GB on disk).
  • Rich Context: It doesn’t just dump states. It syncs Users, Areas, Devices, and Integrations to dedicated metadata tables.
    • Example: SELECT * FROM states JOIN areas ON ...
  • Resilient: Features an in-memory buffer. If the DB is down, Scribe queues events and retries automatically.
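To illustrate the queue-and-retry idea from the last bullet, here is a minimal sketch. It is not Scribe’s actual code: the BufferedWriter class, its max_size cap, and the fake_write function (a stand-in for a real database insert) are all hypothetical names made up for the example.

```python
import asyncio


class BufferedWriter:
    """Hypothetical queue-and-retry writer; an illustration, not Scribe's real code."""

    def __init__(self, write_batch, max_size=10_000):
        self._write_batch = write_batch  # async callable that stores a list of rows
        self._max_size = max_size        # assumed cap so memory stays bounded
        self._buffer = []

    def enqueue(self, row):
        # Plain list append: cheap and never awaits, so the event loop is not blocked.
        self._buffer.append(row)
        if len(self._buffer) > self._max_size:
            del self._buffer[0]  # drop the oldest row once the cap is exceeded

    async def flush(self):
        """Write all buffered rows; put them back in order if the database is down."""
        batch, self._buffer = self._buffer, []
        if not batch:
            return True
        try:
            await self._write_batch(batch)
        except ConnectionError:
            # Rows queued while we were waiting go after the failed batch.
            self._buffer = batch + self._buffer
            return False
        return True


async def demo():
    state = {"db_up": False}
    written = []

    async def fake_write(batch):  # stand-in for a real database insert
        if not state["db_up"]:
            raise ConnectionError("database is down")
        written.extend(batch)

    writer = BufferedWriter(fake_write)
    writer.enqueue(("sensor.temp", 20.5))
    first = await writer.flush()   # fails, the row stays queued
    writer.enqueue(("sensor.temp", 20.7))
    state["db_up"] = True
    second = await writer.flush()  # succeeds, both rows are written in order
    return first, second, written


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In a real setup, flush would be called on a timer, and the cap decides how much history you are willing to lose during a long outage.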

:floppy_disk: Data Examples

Scribe stores data in a structured way that is easy to query.

States Table (states):

time entity_id state value attributes
2023-10-27 10:00:00 sensor.temp 20.5 {"unit": "C"}
2023-10-27 11:00:00 binary_sensor.switch on

Data Tables:

  • states: Records all state changes.
  • events: Records all events (if enabled).

Metadata Tables:

  • users: Syncs HA users (name, is_admin, etc.)
  • areas: Syncs areas (Living Room, Kitchen…)
  • devices: Syncs device info (Model, Manufacturer…)
  • entities: Syncs entity registry info (Platform, Domain, Name…)
  • integrations: Syncs config entries (Domain, Title…)

Keywords: Home Assistant, TimescaleDB, PostgreSQL, PSQL, History, Recorder, Long-term Storage, Grafana, Analytics, Database, Custom Component, Data Science, Big Data, SQL, Hypertables, Compression, Dashboard, Visualization, InfluxDB alternative, LTS.


Hi,

I’ve been running Scribe in production for 15 days. The result: a clean database and numbers that speak for themselves.

My “states” table holds the equivalent of 314 MB of historical data, but thanks to TimescaleDB’s native compression, it effectively takes up only 15 MB on disk.
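For anyone who wants to check the ratio, the arithmetic can be sketched like this. The function name compression_savings is just an illustrative helper, not something Scribe or TimescaleDB provides.

```python
def compression_savings(raw_size: float, on_disk_size: float) -> float:
    """Return the space saved as a percentage of the uncompressed size."""
    return (1 - on_disk_size / raw_size) * 100


# Figures quoted in this thread (both sizes must use the same unit).
print(f"{compression_savings(314, 15):.1f}% saved")  # 314 MB -> 15 MB
print(f"{314 / 15:.1f}x smaller")
print(f"{compression_savings(50, 2.5):.1f}% saved")  # 50 GB -> 2.5 GB
```

So 15 MB out of 314 MB is a saving of about 95%, at the upper end of the 90-95% range mentioned in the first post.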

The integration runs silently, the monitoring sensors are accurate (size, ratio, events…), and my Home Assistant recorder can finally breathe. :rocket:

Hi Jonathan,

Wow, I am obviously the first one answering on your release.
First of all:
Thanks so much for all your efforts, the documentation is nice, and the (general) purpose is clear :slight_smile:

How I came to your integration (took me a while as well)
Requirement:
Saving data long term in a database and reading it with Home Assistant Standard Cards and Grafana.

Problem
Recorder with a pure transactional DB is not made for it.
Grafana might be able to do it, but again it is not made for it.

Potential Solutions:
1.) InfluxDB and Grafana - Possible, but Home Assistant cards can only access InfluxDB data with quite some work (including coding)

2.) Using a transactional DB? → Not made for time series data

3.) I came to know about LTSS, but unfortunately there is no further development. Luckily, I came across this beautiful custom integration :slight_smile:

Where I am?
1.) I installed PostgreSQL in Docker (ngosang/timescaledb-postgis - Docker Image)
2.) I installed Scribe with HACS

From what I understand so far, the recorder DB also stores short-term and long-term statistics.

Entities whose state_class is measurement, total, or total_increasing are saved as short-term statistics, from which the long-term statistics are then built as averages.

You already get where I am struggling, right?
What data should I save exclusively with Scribe?
What data makes sense to save in both places (recorder and Scribe)?
There is obviously a use case, otherwise you would not have built it, right? :slight_smile:

Recorder vs Scribe - Using the same DB?
Is there any reason why I should not use PostgreSQL as the target for the recorder as well?
A.) I was wondering whether to use the same database with different credentials,
B.) or to create another database inside the same PostgreSQL instance with different credentials.

In case A.)
I need to make sure I am not saving the same data twice; at least that would be a wise idea, right?

In case B.)
It’s the same host for the DB, but completely isolated from the Scribe database.

Advantage in both cases: I could get rid of the MariaDB I am currently running.

I’d love to hear about experience and recommendations. :slight_smile:

PS: I understood, the HomeAssistant Cards can easily access the scribe database and visualize without Grafana, correct?

I am not Jonathan, but I am looking for a way to save the history of my device_tracker.
I would like to have my own Google Maps timeline. HA already provides this feature, but the data is purged after 10 days. :frowning:

Maybe scribe is a solution for this?

Probably it is.
I mean, you can save the data as time series and easily pull it back with the SQL integration.

But… it’s a new custom integration… will it survive for some years?
@Kalypox might know but is quite silent at the moment

This is exactly what I’ve been looking for, thank you! Had been trying LTSS but this is definitely more user friendly for my level of db knowledge.


Hi,

To clear up the confusion, here is the distinction:

1. The “Why” of Scribe vs Recorder

  • Recorder (Standard HA): Designed for short-term operational history (usually keeps 10 days of raw data). For long-term, it downsamples data into hourly statistics (min/max/mean). You lose the granular details (the peaks, the exact timing of events) after a while.
  • Scribe: Pure Data Archiving. It keeps all raw states forever without downsampling, but uses TimescaleDB’s compression to keep the disk usage extremely low (90-95% compression).
    • So keep both: Recorder for your daily Logbook/History in the HA App. Scribe for keeping a 1-year history of your temperature sensor with second-precision for beautiful Grafana dashboards.
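To make the “you lose the peaks” point concrete, here is a small sketch of hourly downsampling. It is a simplification under stated assumptions: hourly_stats is a hypothetical helper, the sample readings are invented, and it uses a plain arithmetic mean, which may differ from how Home Assistant actually computes its statistics.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def hourly_stats(samples):
    """Collapse (timestamp, value) samples into one min/max/mean row per hour."""
    buckets = defaultdict(list)
    for ts, value in samples:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return {
        hour: {"min": min(values), "max": max(values), "mean": mean(values)}
        for hour, values in sorted(buckets.items())
    }


# Invented raw readings: a short spike at 10:20:30.
samples = [
    (datetime(2023, 10, 27, 10, 0, 0), 20.0),
    (datetime(2023, 10, 27, 10, 20, 0), 20.0),
    (datetime(2023, 10, 27, 10, 20, 30), 26.0),
    (datetime(2023, 10, 27, 10, 40, 0), 20.0),
]

stats = hourly_stats(samples)
print(stats[datetime(2023, 10, 27, 10)])
```

The hourly row still shows that a maximum of 26.0 happened, but not when it happened or how long it lasted. The raw rows keep that detail.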

2. Database Setup
I strongly recommend Option B (Separate Databases). You should definitely use your PostgreSQL instance for both, but keep them isolated:

  • Create a database homeassistant for the standard Recorder.
  • Create a database scribe for Scribe.
  • Use the same Postgres user or different ones, that doesn’t matter much.

Scribe and Recorder have different schemas and lifecycles. Trying to merge them into a single database is a recipe for conflicts and headaches during migrations. Using the same host/instance is perfect and very efficient.
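As a rough sketch of Option B, the two connection URLs differ only in the database name. Everything here is a placeholder: the postgres_url helper, the host db.local, and the user and password values are assumptions for illustration, and the exact settings Scribe expects may be laid out differently.

```python
from urllib.parse import quote


def postgres_url(user, password, host, database, port=5432):
    """Build a PostgreSQL connection URL, escaping special characters in credentials."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


# Same instance (host and port), two isolated databases. All values are placeholders.
recorder_url = postgres_url("ha", "p@ss", "db.local", "homeassistant")
scribe_url = postgres_url("ha", "p@ss", "db.local", "scribe")

print(recorder_url)
print(scribe_url)
```

Escaping the password matters if it contains characters such as @ or /, which would otherwise break the URL.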

3. Visualization
Scribe is primarily designed to be consumed by Grafana (or other SQL-compatible tools). Standard Home Assistant cards read from the Recorder/LTS database.

While you could technically create SQL sensors in HA to read back from Scribe, the standard “History” card won’t use Scribe data. Scribe is your “Data Warehouse” for deep analysis and long-term dashboards alongside HA.

Hope this clears things up!


Yes! This is exactly what Scribe is perfect for.

Home Assistant’s standard Recorder purges raw history, device_tracker included, after 10 days by default, because keeping every state change (such as GPS coordinates every few seconds) fills up the database very quickly.

Scribe solves this: It stores all state changes, including latitude/longitude attributes from your trackers.
It uses TimescaleDB compression, which is incredibly efficient for repetitive data like GPS coordinates. You can keep years of location history for negligible disk space.
Visualization: You can then use Grafana with the Geomap panel to visualize your own private “Google Maps Timeline”, entirely self-hosted and persisted forever.
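As an illustration of what you can do with the stored rows, here is a minimal sketch that turns tracker rows into a list of map points. The track_points helper, the device_tracker.phone entity, and the sample rows are made up; it only assumes that each row carries the attributes as JSON with latitude and longitude keys.

```python
import json


def track_points(rows):
    """Turn (time, entity_id, state, attributes) rows into (time, lat, lon) points."""
    points = []
    for time, entity_id, state, attributes in rows:
        # Attributes may arrive as a JSON string or as an already decoded dict.
        attrs = json.loads(attributes) if isinstance(attributes, str) else (attributes or {})
        lat, lon = attrs.get("latitude"), attrs.get("longitude")
        if lat is None or lon is None:
            continue  # skip rows without a position, e.g. "unavailable"
        points.append((time, lat, lon))
    return points


# Invented sample rows for a hypothetical device_tracker.phone entity.
rows = [
    ("2023-10-27 10:00:00", "device_tracker.phone", "home",
     '{"latitude": 48.8566, "longitude": 2.3522}'),
    ("2023-10-27 10:05:00", "device_tracker.phone", "not_home",
     '{"latitude": 48.86, "longitude": 2.34}'),
    ("2023-10-27 10:10:00", "device_tracker.phone", "unavailable", "{}"),
]

points = track_points(rows)
print(points)
```

A list like this is the kind of input a map panel needs: one timestamped coordinate pair per point.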

Check what I do:

Give it a try! Just make sure to include your device_tracker entities in the Scribe configuration.
