I started developing a custom integration for monitoring and managing crypto mining rigs. I noticed that when async_setup was called in my code, HA provided a complete dictionary of all configuration data. For example, the dictionary includes the username and password I entered for InfluxDB, and the eWeLink username and password I entered for the Sonoff integration. Isn’t this a security flaw in the HA design? It means that as users, we have to be really careful when we install integrations, especially from the community store. Otherwise, a malicious component could simply collect and share the sensitive information.
Any information needed by an integration should be requested and shared only with explicit user approval. Simply handing all available configuration data to every integration may sound convenient for development, but it is not right!
I am not a Python expert. I only started coding in Python because I wanted to monitor and manage my crypto mining rigs using HA. I managed to create a decent integration using AppDaemon (see screenshots below). But AppDaemon is limited in some ways. For example, I could not find a way to set a unique ID for the entities my app creates, and there is no support for defining devices. That is why I am dabbling in creating a custom component.
Yeah, from what I am seeing in the dict, I really don’t think storing sensitive information in secrets.yaml does anything other than provide a convenient way to share data. I did some searching and found keyring · PyPI, which provides a way to store and retrieve secrets using the operating system’s keyring. Something like this would be ideal.
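For reference, the basic API of the `keyring` package is just a set/get pair that delegates to the OS credential store (macOS Keychain, Windows Credential Locker, Secret Service/KWallet on Linux). The service and user names in the usage comments are made-up examples.

```python
# Thin wrappers around the third-party `keyring` package
# (pip install keyring), which stores secrets in the operating
# system's credential store instead of a plaintext YAML file.
import keyring

def store_secret(service: str, user: str, secret: str) -> None:
    keyring.set_password(service, user, secret)

def load_secret(service: str, user: str):
    # Returns the stored secret, or None if no entry matches.
    return keyring.get_password(service, user)

# Hypothetical usage:
#   store_secret("influxdb", "ha_user", "s3cret")
#   load_secret("influxdb", "ha_user")
```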
I don’t think storing passwords in secrets.yaml has anything to do with the OP’s point. It only lets you separate sensitive config data into another file (in order to exclude it from publishing).
Configuration is loaded from all files, then parsed as a whole and handled as a single configuration object (correct me if I’m wrong).
If such an object is provided to all components (which is likely the case), then it’s not only bad design but also a security risk.
Wow, that is not good…
A huge security risk. The configuration object is available to all components???
And with all those series of updates, one component could be changed to harvest sensitive data in one of the many updates, and then reverted back to the original source… Most people who regularly update would never find out…
Most modular / plugin based architectures have a certain relationship of trust between the components. Modules that you install are inherently granted access, because you trust them by installing them. This worked well in the past and still kinda works in closed source environments where every module is vetted and only available through a single source. It works less and less well in the open source world, as the number of attacks on open source packages increases.
It’s a rather common approach in open source in general. Every time you install something from pip, PyPI, npm, Flatpak or any other package manager, you trust that these components are not malicious. If installed as root, you trust them with basically your entire system. There has been a rising number of abuses of this implicit trust, and that shows the limits of this approach (top 8 malicious packages found in npm). Keep in mind that integrations, even trusted ones, can install their own dependencies, which can be subverted / tampered with. Even HA core components will pull in tons of third-party dependencies they don’t control.
Fixing this is not trivial. While you can limit the config data you explicitly make available to custom integrations, a flat-out malicious integration can attack your system with a myriad of potential exploits once it sits on your system and runs with the same permissions as other HA components. If you want to avoid this, you would have to entirely isolate every integration in its own sandbox with strict access control. That’s very hard to achieve. Even Apple and Google barely manage to keep malicious apps out of their ecosystems, and they pour millions into mitigating this.
Staying away from custom integrations in general can help to reduce the risk. Most of the time you don’t need them anyway.
Bottom line 1: “we had support for keychains and vaults; these have been removed, as they proved to be impractical and actually… nobody used them (or cared about them, as no one complained or even mentioned anything about the removal).”
Bottom line 2 is more or less what I said above: whatever could be implemented, actually malicious code could easily work around it.
That’s an interesting thread, thanks for linking it. I think the main bottom line here is that it’s impossible to sandbox the code of custom integrations because of the way Python itself works. Python will always give any code that runs in a process full access to all other data in the process, and even allow it to modify that data on the fly. So anything you do in terms of securing it is futile, as custom code can easily circumvent it.
As a C++ developer myself with little Python experience, I wasn’t aware that it was so bad. It seems any code can enumerate all currently living objects through the GC and even modify them. With ‘features’ like that, you can forget about any kind of data and execution isolation. A custom integration could pretty much do anything it wants to your HA, and there would be no technical way to stop it. So in the end it comes down to this: you have to fully and entirely trust any custom integration you install.
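A minimal, self-contained demonstration of that introspection problem. The config dict here is fake, standing in for data owned by some other component in the same process:

```python
import gc

# Pretend this dict belongs to some other integration in the process;
# our code below is never handed a reference to it.
_other_components_config = {
    "influxdb": {"username": "admin", "password": "hunter2",
                 "hosts": ["localhost"]},
}

def find_password_dicts():
    # Walk every container object the garbage collector tracks and pick
    # out dicts that have a "password" key, regardless of which module
    # or object owns them.
    return [obj for obj in gc.get_objects()
            if isinstance(obj, dict) and "password" in obj]

leaked = find_password_dicts()
# `leaked` now includes the inner credentials dict; the caller could
# read it, or even modify it in place.
```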
Hopefully. But I wouldn’t bet on it, at least not before damage has been done.
It being open source alone is no guarantee. Yes, the source could be peer reviewed and audited. In theory. But in practice, who actually does that? Auditing third-party source code for security flaws and intentional backdoors is time consuming and requires specific expert skills. And it’s a continuous process, not a one-off thing. I doubt that people will do this for custom integrations, especially the less used ones.
So nefarious behaviour would most likely be discovered through suspicious outgoing connections showing up in the firewall logs or through the fallout of the attack itself, after damage was already done. It all depends on how well the malicious code is crafted.
That said, I don’t see any realistic way to mitigate this. People just have to understand that by installing third party integrations they implicitly give them full trust. Maybe HACS makes it a bit too easy to install random obscure stuff.
Official integrations are obviously vetted. And unofficial integrations / components require the end user to explicitly change the file system in order to add custom components. It’s not as simple as turning on a toggle and suddenly unvetted code can interact with the system.
You have to either run code to install HACS, or manually create the custom_components folder and copy what you want into it. And then when you restart Home Assistant, the log warns you about every component that is not official.
The point here is that Home Assistant is secure out of the box; it requires the end user to specifically make changes in order to make it less secure. This is really no different from flashing custom firmware to an ESP-based device, or to some other IoT device or camera: unless you specifically inspect the code, you have no idea what that device is doing on the network. It could be downloading trojans to machines on the network, it could be trying to brute-force the router password; you don’t know unless you check.
Yes! This is exactly my point. HA reads everything into one big object and passes it to every component. There is really no good justification for doing this. Most components are simply going to read only the data that is relevant to their own domain. Even if a component needs data from another component/domain, it should not be passed this way, because it results in tight coupling between the components involved. If the data structure of the other component changes, it will break the component depending on it.
Yes, we do place some trust. But the problem here is that HA does something the user is not made aware of and cannot easily catch. Because of the positive reviews of the Tesla HACS integration, I chose to install it and trust it with my Tesla user account access token. But when I want to try another integration or custom component, thinking that it is a simple weather integration or a custom component that will let me control my Sonoff switches, I will not expect HA to share my sensitive Tesla access token, which can not only be used to control my car remotely, but can also be used to gather all sorts of sensitive information, such as where my car currently is.
Yes, it is possible for someone clever to find ways to breach security measures and do something malicious. But just because locks can be picked, do we leave our homes unlocked? We put in additional security devices and hope to at least get notified if someone attempts to break in or manages to break in. If we really think about it, what HA does is far worse. We don’t need a clever programmer to find ways to breach the system. The programmer does not even have to bother writing a few lines of code to read the YAML files. Every piece of data is handed to the programmer on a big silver platter as soon as the setup function is invoked.
I keep reading in this thread that Python is insecure and will allow anyone to access anything in the system, so there is no way for us to have any sort of control. I disagree! I do not think any language or package is inherently unsafe. It is possible for a well-designed system to at the very least track and notify the user about such code. iOS and Android are very good examples of this: if an app requires permission to access some configuration or another subsystem, the user is prompted immediately.
A lot of trust is being placed in HA because of its wonderful design of running subsystems in isolated Docker containers. It should be possible to place all sensitive data in a keychain within a separate Docker container and provide a secure API through which the rest of the subsystems gain access to the required pieces of data via explicit approvals from the user. If that is too much code, then at the very least, please don’t share everything with every integration! If the Sonoff component really needs my Tesla access key, then I will gladly enter it again while configuring the component. At least that way, I will know what is being shared with whom.
Interpreted languages with reflection and garbage collection are inherently harder to lock down than compiled languages with native support for things like memory isolation and code execution permissions. I can trivially create a locked down virtual memory region for a plugin in C++. I cannot do anything like that in Python.
iOS and Android are heavily locked-down, sandboxed, permission-based systems that have been specifically built around isolating applications from the ground up. This is built into the OS. Exploits are actively mitigated and removed by companies spending billions on those systems. And even that doesn’t keep them from being rooted, jailbroken and exploited in various ways.
HA has nothing like that. And it would be way out of scope for a home automation platform to even try to implement anything like this. If you are seriously worried about a third party integration, then nothing prevents you from auditing its source before installation.
Python is not the only language that runs inside a Docker container. But even if we want to implement this secure secrets store using only Python, the implementation can be placed in a separate, isolated Docker container. The things living inside the secrets container would be a keystore, an admin web interface and a REST API server secured with OAuth. When another component needs some sensitive information, the user would be required to log in to the web interface and select the pieces of data that should be shared, and the other component would be provided access/refresh tokens, which it can use in the future to read the data whenever needed.
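A rough, transport-agnostic sketch of that grant flow (all names here are hypothetical; a real version would sit behind an authenticated REST API in its own container):

```python
import secrets as _secrets  # stdlib, for unguessable tokens

class SecretsStore:
    """Releases only the secrets the user explicitly granted to a token."""

    def __init__(self):
        self._vault = {}    # secret name -> value
        self._grants = {}   # access token -> set of granted secret names

    def put(self, name, value):
        # Called by the admin interface when the user enters a secret.
        self._vault[name] = value

    def grant(self, names):
        # Called after the user approves sharing `names` with a component;
        # returns the opaque token handed to that component.
        token = _secrets.token_urlsafe(32)
        self._grants[token] = set(names)
        return token

    def read(self, token, name):
        # A component may read only what its token was granted.
        if name not in self._grants.get(token, set()):
            raise PermissionError(f"token not granted access to {name!r}")
        return self._vault[name]
```

With this shape, the Sonoff component’s token could read its own credentials but a request for the Tesla token would be refused until the user grants it.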
Yes, as a programmer, I can definitely audit code. But not everyone using HA has the expertise or the time to audit code. Instead, I will see if I can put some effort into creating this secure container and submit it for review. If it proves to be worthy, maybe it will make it into HA someday.
So you would tie an entire new custom integration API (which would have to be created, maintained and audited for security issues) to Docker? What about the HA Core installation method that doesn’t use Docker?