First I’ll wait and see how open they are, to understand their intent. But sorry, the topic of this discussion is exactly the same, so I’d end up writing the same topic again. Please don’t take this personally, but if you or anyone else wants to control or stop this safety discussion, you are risking exactly this: that with the help of HA someone burns their house down, that a regional power outage cannot be recovered from, or that someone hacks into your system and coordinates a regional power-plant overload. You may ask why? Simply because the safe “Off” state of these mains plugs can, supported by HA, be programmed however anyone likes. If you follow well-proven safety engineering as suggested above, then big power consumers, e.g. an electric heater, start up with a guaranteed safe “Off” state, which allows power plants to recover; the relevant attribute, start_up_on_off, is well specified in the Zigbee spec. But these attributes urgently need to be protected by safety measures against being overridden. So I suggest we not hide behind some secret government program, and not accept catastrophic design behaviour for the sake of protecting someone’s thread. Everyone who wants to collaborate with Dr. He can contact him by email, can’t they?
On the other side, this community has a much bigger interest in getting these problems addressed and solved as soon as possible, and honestly not only for the US but worldwide. Who would not agree with that?
If I’m wrong, please kick me out of this thread and stop me from wasting my time here!
I’m not saying anything is secret. I’m asking for permission to share parts of their proposal before I do. It’s polite.
As soon as I get the link to the government program, it goes here. (Placeholder)
No offense intended, but this thread was meant for a discussion… Now I see we’re talking about implementing a specific thing, which naturally narrows the conversation. That’s why I asked for two threads, that’s all. Nothing else. It helps search and categorization later, and it helps keep the subject material here from going down one single rabbit hole, which we all know we have a tendency to do here. What if this goes nowhere and your ideas are amazing?
Dear Nathan, thank you very much for this hint! I completely agree not to go into deeper detail, but safety mitigations can only be achieved satisfactorily if they are solved at hardware level wherever possible. My examples above are only meant to give a very pragmatic understanding of where I estimate us to be, and of how these very low-level safety functions can create, support, or help solve a country-wide power outage, or what happens if thousands of HA-controlled power consumers are manipulated, as they can be today. My experience from many years of safety-system engineering is that no individual safety expert can understand more than a single-digit percentage of the problems that are finally clarified, and that surface bit by bit once the safety engineering actually starts.
I highly appreciate you always bringing this discussion back up to the highest flight level, but sometimes it is necessary to explain details in order to onboard other people. Can you agree with this?
Just for the sheer irony… Now you’re cooking with gas…
Hi Tim, thank you very much for your suggestion! I think we should take your idea into account. But purely from a system-safety perspective, I’d be scared of putting any critical safety functions in the hands of higher-level software. The main reason for not supporting this as the standard approach (when other solutions are possible) is that you may blow up the harmonization process for requirements engineering, design, implementation, and testing.
On the other side, this example demonstrates extraordinarily well how important it is to first create a very top-level system safety architecture: to clarify how HACS can violate system security or privacy, or perhaps to take these functions out of the HACS implementation and, via safety reviews and an approval process, bring them natively into the lighthouse project to ensure that none of the other goals are violated. Do you agree?
On the other side, there is another point from your suggestion that I don’t want to lose: it is highly important that the timed repetition of check commands is used as a so-called heartbeat retrigger for, e.g., the thermostat function, after the power range of the power consumer has been positively checked. If, however, this heartbeat or a safety check fails, you don’t even have to send a switch-Off command, because the countdown armed in advance will stop everything at the lowest embedded level, in the plug itself.
On the other side, I fully agree that this is not ideal, and that this power monitoring would be better done on the basis of Zigbee-specified attributes, inside the power switches or plugs themselves.
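To make the heartbeat-and-countdown pattern concrete, here is a minimal Python sketch. Everything in it is purely illustrative: `PlugSim` and `heartbeat_cycle` are hypothetical names that only simulate a plug honouring an `on_with_timed_off`-style embedded countdown; this is not real ZHA or Zigbee code.

```python
from dataclasses import dataclass

@dataclass
class PlugSim:
    """Toy model of a plug honouring an on_with_timed_off countdown.

    Illustrative only -- simulates the fail-safe behaviour discussed
    above, where the plug itself falls back to "Off" on timeout.
    """
    on: bool = False
    on_time_left: float = 0.0  # seconds until the embedded auto-Off

    def on_with_timed_off(self, on_time: float) -> None:
        # One command switches on AND (re)arms the countdown.
        self.on = True
        self.on_time_left = on_time

    def tick(self, dt: float) -> None:
        # Advance the plug's internal clock; fall back to the safe
        # "Off" state when the countdown expires.
        if self.on:
            self.on_time_left -= dt
            if self.on_time_left <= 0.0:
                self.on = False
                self.on_time_left = 0.0

def heartbeat_cycle(plug: PlugSim, power_check_ok: bool,
                    on_time: float = 30.0) -> None:
    # Retrigger the countdown only while the power-range check passes.
    # On failure we deliberately send NOTHING: the countdown armed in
    # advance stops everything at the lowest embedded level.
    if power_check_ok:
        plug.on_with_timed_off(on_time)
```

The key design point is visible in `heartbeat_cycle`: the controller never needs to send an Off command for safety, it only has to stop retriggering.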
Thus my experience says that Safety First means:
A) Establish and harmonize the safety process, then (or in parallel) harmonize and create the safety concept…
B) Harmonize and document all safety-related FTAs, critical paths, and risk analyses.
C) Finalize and harmonize a safety architecture by involving all other disciplines: hardware, software, security, privacy, and the HA community.
D) Implement safety functions at the lowest possible system (HW) level, with the lowest impact on other disciplines, resulting in the least testing effort and cost.
Just as a note to Nathan: we are talking about the highest level of a safety concept and trying not to get lost in details. This may some day end up in our system safety architecture.
Tim, we’re maybe talking about only $15–$20. Using my newly created safety blueprint, it took me about 5 minutes to approve two good-looking plugs: the first from Ledvance and the second from Innr, out of the 5 plugs plus 4 kinds of simple wall switches that I tested.
With these two plugs from different manufacturers I was able to: first, safely replace the normal On command with the on_with_timed_off function; second, verify the running countdown by reading back the on_time attribute; third, prove the countdown timer is actually running; and fourth, program the start_up_on_off attribute so that after a power outage the plug always finds itself in a safe “Off” state, and read this value back to ensure it arrived. This prevents, or rather supports, the electric power plants in recovering, closes the ring, and repeats the picture of trying to control at least part of the herd of cats.
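For illustration only, the four approval steps can be sketched as a sequence against a simulated attribute store. `approve_plug` and the dictionary keys are hypothetical stand-ins, not real ZHA services or cluster APIs; the only factual assumption carried over from the Zigbee OnOff cluster is that `on_time` counts in tenths of a second and `start_up_on_off = 0` means “Off”.

```python
def approve_plug(cluster: dict) -> list:
    """Run the four checks described above against a simulated
    attribute store (a plain dict). Names are illustrative only."""
    results = []
    # 1. Replace plain On with on_with_timed_off: arm a 60 s countdown
    #    (Zigbee on_time is specified in units of 1/10 second).
    cluster["on_off"] = 1
    cluster["on_time"] = 600
    results.append("on_with_timed_off sent")
    # 2. Read back on_time to confirm the countdown was accepted.
    assert cluster["on_time"] > 0
    results.append("on_time readback ok")
    # 3. Prove the countdown is running: read twice, expect a decrease.
    #    (A real plug decrements this itself; here we simulate it.)
    first = cluster["on_time"]
    cluster["on_time"] -= 50
    assert cluster["on_time"] < first
    results.append("countdown running")
    # 4. Program start_up_on_off = Off (0) so a power outage always
    #    leaves the plug in the safe Off state, then read it back.
    cluster["start_up_on_off"] = 0
    assert cluster["start_up_on_off"] == 0
    results.append("start_up_on_off = Off confirmed")
    return results
```

Each step reads back what it wrote, which is the whole point of the approval: a write without a confirmed readback proves nothing about the plug.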
In my understanding this is highly important to mention for the energy suppliers, because I estimate these high-power plugs to control the biggest crowd of power consumers in private households. That makes the examples and efforts mentioned above so important for a future where lots of households will use HA at home.
On the other side, safety-concept-wise this means that the non-functioning devices do not satisfy the Zigbee spec, but from a safety perspective they should. So if the first person ever burns down their house using a power plug and HA, we may have the best arguments against manufacturers or resellers that ignore IEC 62368 by offering switch functionality for power consumers above 15 W or above 100 W.
Below 15 W we have no safety issue at all according to IEC 62368, because the energy is not sufficient to start a fire, but we could still have a power-outage recovery issue that needs to be clarified, perhaps on Dr. He’s side?
Please, no discussions about the following blueprint here. It is just a proof of concept and by far not at the flight level this discussion is meant for! It is only a small piece of the safety puzzle, an invitation to other HA members to add more safety for controlling high-power consumers, and a partial proof of concept. You can find my first blueprint here: Safety Plug On_With_Timed_Off and Safe Off State fall back at start up.
This could also be a modest starting point for testing more suitable high-power plugs, and for finding manufacturers that do not respect the Zigbee specs and may exceed critical safety limits. In an ideal world I would expect all plugs to include this functionality internally and to guarantee protection against exceeding the IEC 62368-1 safety limits by default. But sadly they don’t.
Wow! thank you, Edmondena, for taking the time to write such a detailed and technically grounded response. Your background in automotive functional safety and standards really comes through, and I appreciate you bringing that perspective into the discussion. It honestly took me a while to go through everything and make sure I understood your ideas and suggestions. Apologies for the delayed reply as well, I was traveling and didn’t get a chance to follow the thread closely.
I want to be very clear up front: many of the risks you describe, especially around heaters, fail-safe behavior, and single points of failure, are real. Your ALARP-based reasoning is the kind of thinking that consumer home automation may lack today. At the same time, most Home Assistant users don’t have 15+ years of safety-system development experience like you, which is precisely why we’re interested in hearing from a broad range of stakeholders: experienced engineers, maintainers, and also everyday users, to better understand different perspectives, expectations, and concerns.
Let me start by answering one of your key questions directly:
First I want to raise a question to the author of this article about the target of this request for cooperation: do you want to (1) get involved in ZHA development with respect to improving safety, security, and privacy, or do you (2) want to regulate and control the development with a focus on these topics? What is the background, and what is your honest intent in getting involved here?
Our intent is clearly (1), not (2).
We do not aim to regulate, control, or externally dictate the development of Home Assistant.
The background for this effort is that Home Assistant has increasingly become real household infrastructure. When security, privacy, or system-level assumptions fail, the consequences can extend beyond “software bugs” into physical or safety-relevant outcomes. Our goal is therefore to support the ecosystem, not to turn Home Assistant into a certified industrial safety system.
Concretely, we are hoping to:
- collaborate with maintainers, developers, and experienced users to better understand where high-severity risks actually arise in real deployments;
- make safety, security, and privacy assumptions and boundaries more explicit; and
- develop optional, adoptable improvements that fit naturally within existing Home Assistant workflows and governance.
Any outcomes from this work are intended to be voluntary, incremental, and community-driven, not mandates. The focus is on risk awareness and practical support, not regulation.
For additional context, this discussion is connected to the U.S. National Science Foundation’s Safety, Security, and Privacy of Open-Source Ecosystems (Safe-OSE) program. Safe-OSE recognizes that vulnerabilities in an open-source ecosystem, including its development, integration, deployment, and supply-chain processes, can affect all users, with technical, socio-technical, and even cyber-physical consequences. Importantly, the program is not limited to traditional software; it explicitly includes hardware, system designs, processes, and cyber-physical platforms like home automation. The goal is to enable meaningful, ecosystem-specific improvements that mature open-source projects often lack the resources to pursue on their own.
More details about the program can be found here:
I agree with you that safety, security, privacy, robustness, and functional safety are distinct concerns, each with different engineering methods and evaluation criteria; that also applies in academic research projects.
One challenge in the Home Assistant ecosystem, and frankly in most cyber-physical consumer platforms, is that these concepts are often discussed together because failures in one domain can cascade into another (e.g., a security failure causing unsafe actuation). That said, I agree that clarity of terminology is essential if we want actionable outcomes rather than endless debate.
This is a very valid concern, and I agree with the underlying tension you’re highlighting.
From an open-source platform perspective, HACS is not an appropriate foundation for safety-critical or guardrail-based guarantees. The current HACS model deliberately optimizes for openness, low friction, and rapid innovation, which is great for ecosystem growth, but those same properties make it difficult to rely on HACS components for anything that needs strong assurances around review, provenance, or long-term maintenance. So yes, using HACS to implement safety mechanisms creates real security and trust tradeoffs.
That said, I think it’s important to separate two things:
- Using HACS as a safety mechanism, and
- Understanding and managing the risks introduced by third-party integrations (which is unavoidable in an ecosystem like Home Assistant).
In fact, this issue is one of the vulnerability classes we plan to study in our research proposal: supply-chain and third-party integration vulnerabilities. Home Assistant’s strength, its extensibility via community integrations and add-ons, also expands the attack surface through transitive dependencies, insecure defaults, update paths, and limited vetting. HACS significantly lowers the barrier to discovery and installation, which improves usability and innovation, but at the same time raises the need for scalable provenance, vetting, and safety mechanisms at the ecosystem level.
Our intent is not to label HACS as “unsafe” or to remove it from the ecosystem. Rather, it is to:
- better characterize which risks third-party components introduce and where they matter most;
- help distinguish best-effort automation from safety- or security-relevant functionality; and
- explore ecosystem-compatible ways to improve transparency, trust signals, and risk awareness around integrations and add-ons for general users.
Thank you for laying out your personal project. This is a very thoughtful and technically solid example of a real safety-engineering effort, and your breakdown of what works, what breaks security/privacy assumptions, and where device non-compliance limits risk reduction is very helpful.
I want to acknowledge first: what you’re describing is a strong and concrete project idea in its own right. A lighthouse effort focused on functional safety requirements, Zigbee compliance, fail-safe behavior, and alignment with standards like IEC 62368 is well-motivated and addresses an important class of risks in Home Assistant deployments.
At the same time, I want to be transparent about scope. The direction you outline, including requirements engineering, certification pathways (TÜV/CE/UL), Zigbee homologation, and a hardened reference platform, is very specific and targeted, and goes deeper into functional safety system design than what our current project proposal is scoped to cover.
Our project effort is intentionally framed at the ecosystem level: understanding how safety, security, and privacy risks arise across diverse deployments among different stakeholders, identifying recurring vulnerability patterns (including third-party integrations), and developing optional, adoptable improvements that fit existing Home Assistant workflows. It is not intended to define certification requirements or mandate hardware-level safety guarantees.
That said, your points resonate with our goals, especially the need to clearly distinguish safety from security and privacy, the trade-offs around HACS, and the importance of simple, reusable solutions if anything is to see real adoption.
I appreciate you sharing this in such detail. Even if it sits somewhat outside the core scope of our project proposal (which is still at an initial stage of development, and still under finalization), it’s the kind of grounded input that helps keep the broader discussion realistic and constructive. Thanks again.
Thanks, Nathan, I really appreciate you stepping in and helping keep this thread focused and well scoped.
You’re right that my original intent with this post was to open a high-level discussion around the Safe-OSE program and to gather broad, early-stage community perspectives while we’re finalizing the project proposal. It wasn’t meant to turn into a deep dive on implementing a specific safety mechanism or architecture within this same thread.
I agree that separating those discussions makes a lot of sense, both to keep this thread aligned with its original purpose and to avoid narrowing the conversation too early into a single technical direction. A separate thread for detailed implementation ideas would also make things much easier for others to follow, search, and reference later on.
“What if this goes nowhere and your ideas are amazing?”
Haha
I couldn’t agree more. NSF projects are highly competitive, and there’s always uncertainty. We might get the funding and be able to work concretely on some of these ideas, or we might not and have to keep things more academic and exploratory. That’s just the reality, who knows how it will play out.
Thanks as well for being thoughtful about how and when proposal-related material is shared. I really appreciate the care you’re taking there. I’ve included some high-level project context in my reply to Edmondena, which I hope helps clarify a few of the questions that came up.
Really grateful for your help in guiding the discussion and keeping things constructive and organized.
Hello Tianzhi,
thank you very much for all these detailed clarifications! Please just let me clarify some uncertainties or misunderstandings that I found in your answers.
Topic: HACS vs. integration of safety functions

1. Requirement for HACS for safety functionality:
The integration of HACS is currently necessary because ZHA’s core functionality lacks, e.g., the possibility to implement highly important functions like an attribute read, which is concretely required to prove that the safety countdown timer is properly set up. Funny fact: zha_issue_cluster_command and zha.set_cluster_attribute are well known in basic ZHA, but a zha_read_cluster_attribute is not.

2. HACS resource impact and robustness violations:
In my experience HACS somehow overloads and bloats the limited system resources of a router, which are the main limitation if you want to get OpenWrt involved for security and privacy. HACS is heavily dynamic and narrows the internet bandwidth, somehow, sometimes, and maybe I’m wrong. Safety and complexity are absolutely contradictory and will never fit together! I saw big, highly prominent companies fail here, where a simple analysis produced provable evidence that forced them to completely discard their safety architecture just a few months before SOP in order to succeed. E.g., according to my router logs from today, the HACS installation forcefully tries to update its database and destabilizes a normally stable system, where for privacy reasons it is necessary to bootstrap encrypted DNS routing, and puts the complete ZHA system out of order.

2.b HACS storage wear-out as a root cause of permanent and random faults:
Please take into account that HACS uses the same NAND storage on the required SD cards where the lighthouse’s ZHA-included safety is installed. Every write to this device brings highly unreliable wear-leveling algorithms into play that, beyond anyone’s control, generate an immense risk of data failure. I spent years understanding these mechanisms, and I guarantee you will fail if the write cycles to ZHA’s storage are out of control.

3. Safe simplicity vs. complexity; decreasing systematic faults:
Only the simplest safety systems are of low enough complexity that a team of highly skilled and experienced engineering experts can even understand them. The big difference here is that such a team is fully focused on solving exactly these problems and has no other job. Finally, I suggest: using HACS for safety development → yes; HACS in live safety production → no. Good luck if you think someone can succeed in keeping complexity and flexibility without any tribute or tradeoff to safety.

4. Finding the easiest and shortest way:
In my concrete experience with HA, this would require us to include some HACS functionality in the lighthouse, but to run live safety systems without HACS, meaning that HACS installations could otherwise taint all the safety, robustness, and reliability that high functional safety would require. Based on a few weeks of integration for me as a newbie here, very little effort should be necessary to include this safety functionality, e.g., as entities for everyone’s use, in the standard ZHA device quirks. E.g., if a device (power plug / switch) gets positive safety approval using HACS, then a simple safety CR could be raised to implement this functionality for that device and finally activate it by default.

Please don’t mind me getting very concrete here:
Safety strategy discussion on the basis of concept studies and known regulatory requirements:

a) The plug’s device quirks / entities (please correct me) can accept an infinite switch-on time only if the device is used for, e.g., low-power LED lamps and does not exceed the CE/FCC/IEC 62368 S1 limit (15 W) from 3 s after power-up onward. But on a power outage it should always be forced into the offline “safe state” to allow energy plants to recover, which could be the tribute the safety user system has to pay. (Just an idea, to be discussed!?)

b) If the power consumption does not exceed the 100 W limit but exceeds the 15 W limit after the 3 s fault-tolerance time, the safety timer and start_up_on_off are required to be active, and the power is required to be monitored every 5 s. The system would have to be classified “unsafe” the moment it deviates in any way from the known lighthouse environment.

c) If the switched power consumer exceeds 100 W within 5 s, it is required to switch off if (safety is activated and password-protected and) the system is not exactly as it was tested for safety approval. This is the most critical safety stage, and I suggest not accepting any deviations, or otherwise prominently classifying the system as “potentially unsafe”. Users can still use it and disable safety at their own risk, similar to installing HACS. All deviations taint the safety system as “potentially unsafe”, which is to be prominently shown in the ZHA dashboard, maybe as part of a safety package installed in one go.

d) As my proof-of-concept example shows, HACS is today required for reading back my safety countdown timers. I personally experience this as an ugly workaround, “potentially unsafe” and violating other lighthouse goals. In contrast, AFAIK this on_with_timed_off functionality may have been state of the art in the Zigbee specifications for a long time, and the gap should be closed in ZHA rather than worked around using HACS. On the other side, some no-name plugs (I tested two) simply do not provide this functionality at all, while the two functional ones are “only” missing the quirks/entity/safety implementation in ZHA.
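To make the staged rules above testable as a decision rule, here is a small Python sketch. The function name is hypothetical, and the 15 W / 100 W boundaries mirror my reading of the IEC 62368-1 energy-source classes as discussed above; treat the exact mapping as a discussion proposal, not a normative rule.

```python
def classify_safety_stage(steady_watts: float) -> str:
    """Map a plug's steady-state load (measured more than 3 s after
    power-up) onto the three stages sketched above.

    Illustrative proposal only -- the wattage thresholds follow the
    IEC 62368-1 energy-class discussion in this thread, not a
    normative standard text.
    """
    if steady_watts <= 15.0:
        # S1: energy too low to start a fire; infinite on-time is
        # acceptable, but start_up_on_off = Off is still desirable
        # for power-outage recovery.
        return "S1"
    if steady_watts <= 100.0:
        # S2: safety countdown timer and start_up_on_off required,
        # power monitored every 5 s.
        return "S2"
    # S3: strictest stage; any deviation from the approved lighthouse
    # environment forces Off or a prominent "potentially unsafe" flag.
    return "S3"
```

A heartbeat automation could call such a classifier on every power reading and refuse to retrigger the countdown whenever the stage no longer matches the approved configuration.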
Suggested conclusion on using HACS (to be discussed):
Using HACS together with safety functionality should seek a compromise with safety still in action: users prominently see, e.g., “Safety Tainted” in the dashboard, but approval only holds below the 100 W IEC 62368 S3 limit.
If users develop or run safety together with HACS for high-power consumers, “Safety Violated!” should be displayed, and automatically switching off these consumers should be the default, or it should require a password confirmation after reading and acknowledging a critical warning message.
Functional safety testing, a must-have:
The suggestions above have another important background: every user will sooner or later see the safety functionality in action and become a participant in a very wide range of different applications and use cases. If this is not taken into account, our safety goals will never reach a good approval stage, and reuse cannot be guaranteed if no one sees it.
In everyone’s experience this is similar to an ABS warning light. You can still stop the car, but ABS won’t help you. This prominent warning light is the tip of the iceberg of the highest-criticality safety beneath it. In the ideal case, no one ever notices the safety in action, provided you follow all traffic signs and what you learned at school, but the warning light exists and is tested together with the complete system every day when you switch on your car.
Please think about this: should high-power switching capability, which international standards classify as a highest-criticality fire hazard, be monitored by some guy’s nice-to-have add-on? Does anyone want to risk catastrophic personal injury and capital loss just as a tradeoff for marginal comfort?
I think this is absolutely elementary and should be treated exactly as such, without tradeoffs, or at least made visible to everyone who uses such things, with unclassified devices switched into the safe “Off” state by default. As ZHA gets more and more attractive for everyone using home automation, the focus above will sooner or later turn into personal tragedies if it is not addressed as required, and very soon! Thus we cannot expect to have enough time to set up a professional safety project inside a community-driven open-source development. We should undoubtedly do that, but in parallel we will need to support quicker integration paths, with the risk of implementing safety functionality imperfectly. I fully agree with your suggestion of maybe picking some low-hanging fruit, but let’s not forget to establish a really good safety process, risk board, CR management, etc. especially for this. Can we maybe find an agreement on this?
Thank you all so much for this discussion, and please don’t hesitate to challenge my very limited ZHA experience. I really hope you prove my half-knowledge wrong, or that someone can point me to how these technically simple safety requirements can be better solved. BR
While there is a lot of discussion here regarding safety (and I agree with Nathan, too much detail in certain areas for now), security hasn’t been mentioned. One reason why I refuse to professionally install HA into someone else’s home is the lack of RBAC controls. The fact that any user in a HA home can walk up to a tablet, web browser, etc and access any object is deeply troubling, especially if those objects are tied to systems like alarm, heating, location tracking, and others.
Is this research team actually working with Nabu Casa or the leaders at HA? If not, then this is all for naught (at least in terms of HA improvements).
Hi John, sorry, this is not correct; based on my experience I expect the security topics to be solvable (please see above) if you use a hardened OpenWrt router as the host platform and run HA with low user privileges. Security can be assumed to stay up to date long-term thanks to the OpenWrt community, which focuses mostly on security.
What I have in mind does not come without tradeoffs, because no one should expect a system to be safe, secure, and private if you accept installing WiFi sensors and actuators under HA’s control instead of Zigbee devices. Please don’t get me wrong: this idea will be very hard for some people to accept, and I don’t expect everyone to. But accepting every company’s cloud services and closed-source components with IP internet access can never be expected to become secure somehow.
Based on years of safety development, “safety first” is to be taken absolutely seriously. If you fail to agree on a robust safety concept between all involved disciplines before developing the “rest” of the system architecture, then you have a very high certainty of failing the complete project. This is simply because safety requirements may overrule many others and, not rarely, require you to redesign your complete system architecture, with a lot of your development work going to waste.
If missing safety burns your customer’s house down, then the installed security is obsolete and you’re bankrupt.
Unless I’m missing something, I’m not understanding how a hardened underlying platform is going to enforce RBAC (role based access control) on entities within Home Assistant itself, I’m talking about the software abstractions of heating controls, light switches, location data, etc. For example, in other home automation platforms I can set it up so that only a certain user can operate the heat switch presented by the software, but other users in the household can not.
You can’t have safety without security controls.
Hi John, sorry, please read above. I already mentioned “no safety without security” around my first post and 100% agreed it must be granted. If I understand correctly, role-based access control may either be implemented as part of HA’s core functionality or be restricted by the underlying OpenWrt, e.g., simply by blocking the HTTP, HTTPS, and SSH ports. If this is not in place in HA at all, I fully agree, and I even see it as a problem for my personal use case. But technically this need not be a big concern, as user-restricted access management may also be implemented on the host OS and can be solved independently. Please don’t get me wrong, but I think that to find a safe and feasible lighthouse project base, security may want to focus first on external security risks and second on internal risks, which maybe could be implemented as an add-on in a first step. I think that without the safety issues above being in place and approved, this system is too far from professional customer installation projects. Maybe I’m wrong; let’s see how others would prioritize this.
In my experience “Safety First” overrules “no safety without security”, but this is clearly a chicken-and-egg problem, and everyone is invited to help here. As I see external security as easily solvable under some tradeoffs, protection against internal misuse may have less priority for one’s own household installation, and it could thus be harder to find someone to focus their development on it. I really want to apologize for my answer; I should criticize myself rather than your suggestion! What you say is absolutely important, and it is not for me to prioritize anything, as this is not my thread.
Maybe it’s better to instead ask Nathan what options are known, or exist for reuse, to establish a risk board where people can publish especially “safety”, “security”, “functional”, and “privacy” risks, bundled for this request? Thanks and BR
Dear all, please note: request for the creation of a Safety, Security, and Privacy Risk Board, as a suggestion for coordinating your risks, effort, and ideas.
Hello John, I have been in contact with HA leadership about involving them as advisory board members for this project. Thank you for pointing it out. We indeed want to make some practical improvements for the ecosystem.
Hello all, without losing the high-level focus, please let me first point out and document a newly found #safety #risk that is very curious and similar to John’s “RBAC” input above; second, show how this leads to contradicting #safety and security requirements; and third, show how this opposition may be solved.
I’m still talking about a #safety-critical high-power switching plug that turned on and off due to unacceptably low EMC immunity when I used my older microwave to warm up my coffee. If we look at this unacceptable but provable effect at a slightly higher abstraction level, it turns out to be exactly the same as what John noted (as RBAC): instead of an unauthorized person, an unauthorized microwave switches the high-power consumer on without having any access grant to do so, comparable to pressing the switch button on the plug itself. Additionally, I find this 1:1 comparable to the so-called “child protection” that I saw included in other plugs I have tested so far.
With this finding, and looking at #safety risk mitigation, we will have to accept that local access to the plug’s switch button may never be fully protectable, for whatever reason. On the other side, safety would require always being able to turn the plug to the safe state “Off”, or to pull the mains plug out of its wall socket (best controllability).
Please also note that I would prefer to have documented this example on a dedicated risk board, but since such a board and its underlying processes do not yet exist at all, it makes a very good example for our discussion to demonstrate how #safety and security partly follow the same goals but diverge in the details. In the end you may want to access-control this plug locally, e.g., as child protection or against unauthorized switching on, but pulling the consumer out of the mains wall plug to bring it into its required safe state “Off” can and should never be access-controlled.
BTW: if you look at the rendered text above, you can see that HA is even missing tags for #safety or #risks, while security and privacy can be tagged.