Implementing Image Caching for brands.home-assistant.io

I would like to introduce a topic that has the potential to significantly improve the efficiency and privacy of HA instances: caching images from `brands.home-assistant.io` locally.

User privacy is of paramount importance, and brand images are requested frequently throughout the UI. Typically, these images are fetched from a central server each time they are needed, which can inadvertently expose sensitive user data and compromise privacy. And since privacy is Home Assistant’s first pillar (of Privacy, Choice, and Sustainability), implementing image caching is a natural way to address these concerns.

Caching images locally can alleviate these issues by storing copies of frequently accessed images closer to the end-users. This allows subsequent requests for the same images to be fulfilled without placing additional load on the central server. Instead, the cached images can be served directly, leading to faster load times and reduced bandwidth usage.

To make this discussion productive, I encourage HA community members to share their experiences, best practices, and recommendations for implementing image caching. We can discuss different caching strategies, tools, and technologies that have proven effective in various contexts.

I look forward to hearing your thoughts, experiences, and insights on this topic. Your contributions will undoubtedly benefit the entire community and help us make informed decisions about image caching strategies.

There are only two hard things in Computer Science: cache invalidation and naming things.
– Phil Karlton

and off-by-1 errors.
– Leon Bambrick

You are correct that HA is currently not doing a great job of caching the brand images locally. There are basically two ways to do it: service workers or the standard browser HTTP cache. And coincidentally, there are issues and pull requests currently open to deal with both.
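To illustrate the service-worker route, here is a minimal sketch of a cache-first fetch handler scoped to brand images. This is not HA's actual service worker; the cache name and origin constant are just illustrative:

```ts
/// <reference lib="webworker" />
// Minimal sketch (not HA's real service worker): cache-first handling for
// brand images only. BRANDS_CACHE and BRANDS_ORIGIN are illustrative names.
declare const self: ServiceWorkerGlobalScope;

const BRANDS_CACHE = "brands-v1";
const BRANDS_ORIGIN = "https://brands.home-assistant.io";

self.addEventListener("fetch", (event: FetchEvent) => {
  if (new URL(event.request.url).origin !== BRANDS_ORIGIN) {
    return; // leave every other request untouched
  }

  event.respondWith(
    caches.open(BRANDS_CACHE).then(async (cache) => {
      // Serve the cached copy if we already have it ...
      const cached = await cache.match(event.request);
      if (cached) {
        return cached;
      }
      // ... otherwise fetch it once, store it, and return it.
      const response = await fetch(event.request);
      if (response.ok) {
        await cache.put(event.request, response.clone());
      }
      return response;
    })
  );
});
```

A cache-first strategy like this only hits the network for images it has never seen, which is also what makes things keep working during an internet outage once the images have been fetched at least once.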

As for the privacy concerns, I guess I don’t understand how you think sensitive user data might be leaked? The frontend is open source so anyone can see exactly how the requests are made (just through standard img HTML elements or CSS background-image).
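For example, a brand image reference boils down to something like the following, where the only variable part is the integration's domain name. The URL layout here is from memory and only approximate, so treat it as a rough illustration rather than the exact scheme:

```ts
// Rough illustration only: the request is a plain image fetch keyed by the
// integration's domain. The URL layout is approximate; check
// brands.home-assistant.io for the real scheme.
function brandIcon(domain: string): HTMLImageElement {
  const img = document.createElement("img");
  img.src = `https://brands.home-assistant.io/${domain}/icon.png`;
  img.alt = `${domain} icon`;
  return img;
}

// Nothing user- or instance-specific goes into the request.
document.body.appendChild(brandIcon("hue"));
```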

Is there an issue that prevents distributing these images with Home Assistant?
Without Internet access you get no brand images at all.

The old onboarding guide did not work fully without an Internet connection; I haven’t tried the new one yet.

I believe these two issues could be solved easily by the developers.
Am I wrong?

AFAIK, the main reason for not distributing them with HA is size. If they were, the install would contain hundreds of MB of images, most of which a single user would never use.

The issues and pull requests I linked would solve the problem with internet outages (as long as you’ve accessed the UI with internet on that device recently so they can be cached).

As for onboarding, I think the new design might have solved that, but I’d have to check.

FYI, there is a historical discussion on this topic from the Month of "What the heck?!" (2020).

I think one possible option would be to download the images that are actually in use once, and then, if you install more extensions in the future, download their images again to keep them up to date.

When configured properly, that’s essentially what service workers and browser caches do, but in a much more seamless way. They download something, keep it locally for a while, and periodically check if it’s still valid or needs an update. There’s no need for a user task to complicate this.
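To make that concrete, here is a rough sketch of the "keep it locally for a while, refresh in the background" pattern, commonly called stale-while-revalidate. The cache name and helper function are illustrative, not HA's real code:

```ts
/// <reference lib="webworker" />
// Illustrative only: answer from the cache immediately when possible, and
// refresh the cached copy in the background so it stays reasonably fresh.
async function staleWhileRevalidate(request: Request): Promise<Response> {
  const cache = await caches.open("brands-v1"); // hypothetical cache name
  const cached = await cache.match(request);

  // Always kick off a refresh so the cached copy gets updated over time.
  const refresh = fetch(request).then((response) => {
    if (response.ok) {
      cache.put(request, response.clone());
    }
    return response;
  });

  if (cached) {
    // Swallow background-refresh failures (e.g. offline); we already have
    // something to show.
    refresh.catch(() => undefined);
    return cached;
  }
  // No cached copy yet, so we have to wait for the network.
  return refresh;
}
```

The standard browser HTTP cache gets the same effect purely from response headers (`Cache-Control`, `ETag`), with no client-side code at all.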

It would be nice if it were stored on the HA server instead of in the browser, because some of us set our browsers to auto-delete the cache on close, and in that scenario it wouldn’t work. Also, if the brands domain were blocked at the network level, the images would eventually disappear from the browser cache too.

It would be nice if it were stored on the HA server instead of in the browser, because some of us set our browsers to auto-delete the cache on close, and in that scenario it wouldn’t work.

First, that is not generally correct as such a setting only clears the browser’s HTTP cache. If it’s a modern browser with HTTPS access, the service worker is doing the caching and you’d have to clear the site data instead to get the same effect.

Second, if you do either of those, you lose all caching benefits and force re-downloading the entire frontend on each session. I don’t really understand how that position comports with a topic titled “Implementing image caching” that cites reduced bandwidth and download time. :confused:

Also, if the brands domain were blocked at the network level, the images would eventually disappear from the browser cache too.

You seem to be arguing less for caching and more for simply not using the brands server. That’s not necessarily how I read your OP, but okay.

In that case, I’d go back to my earlier question: how, and what, sensitive user data do you feel is being leaked? Frankly, you are giving away the same, if not more, anonymous data by reading HA’s documentation or even just downloading the software. And you certainly give away much more with a profile and posts on this forum.

Also, your claim of reduced download time and bandwidth then becomes highly dependent on your local hardware/traffic, your ISP plan/hardware/traffic, and where your browser is relative to your local instance (which is not necessarily next to it). Your local instance certainly will not always win, especially if you aren’t home.