Something is seriously wrong with the Config Check process

Once again one of my scripts, to quote from the log:

could not be validated and has been disabled

There was no other indication or warning or sign that the script had been disabled and I did not find out until I noticed several hours later that it hadn’t been running.

‘Check Configuration’ tells me with great excitement (note the !) that HA will start:

…with no sense of irony that HA will start but just not the bit that I am specifically reloading for.

Surely I can’t be the only one who thinks that this is unacceptable?

In this case it wasn’t a big deal, but for something to be ‘disabled’ in an almost silent manner like this just has to be wrong.

Yes, the mistake in the script was entirely my fault, and in this case it was a trivial function that was disabled but it could quite easily have been something important.

And before anyone jumps in, the process to check would have been:

  1. [Sidebar] Developer Tools
  2. [scroll down] Scripts, wait for tick
  3. [Sidebar] Settings
  4. System
  5. Logs

Is this really what is expected after every iteration of testing a script?

What would be so wrong with providing a notification?


Over the last year or two I have been finding with HA that for every giant leap it makes forward in functionality (and there have been many) it seems to take several small but incredibly annoying steps backwards (usually in usability).

Unnecessarily inflammatory and provocative title. There’s nothing “seriously wrong with the config reload process”: it does what it says it’s going to.

The previous system would have disabled all your scripts, rather than just the one. That for me is a massive usability improvement.

The notification issue seems relatively trivial in comparison - I’ve added a shortcut to the logs in my sidebar, so it reduces the clicking a little, but just accept that checking the logs is now part of the testing routine.

That doesn’t mean there can’t be something wrong with it.

Yes, and told you it had done so.

Maybe in your opinion, but that doesn’t mean it couldn’t or shouldn’t be done. It would presumably be trivial to provide?

It doesn’t mean that it can’t be improved, but describing a config reload function that reloads the config as “seriously wrong” in a topic title isn’t going to get you much sympathy.

GitHub - home-assistant/core: Open source home automation that puts local control and privacy first.

Alternatively, you could ask for your money back.

[yawn] Please don’t start on that one…

Not being a developer, I’ve no idea how trivial it would be. But why not post an issue on GitHub pointing this out as a bug (or, if you already have, post the link here)? It would possibly stand a better chance of getting a response than posting on the forum.

/me thinks there should be a “venting” / “frustration” category for this kind of unuseful post.
Definitely not a “configuration” question.

I’ve noticed the same thing and have mentioned it a few times on the forums.

I believe it started 3 or 4 releases ago. Or at least that’s when I noticed it the first time.

And yes, I agree it’s frustrating that the config checker says nothing is wrong in your config (I assume that’s what it is supposed to be checking, given the button label) and then silently fails to tell you that one of your scripts (for me it has only been automations, since I rarely use just scripts) not only isn’t correct but has been stopped from loading at all.

I literally just had the same thing happen last night after I created an automation and messed up the “repeat-until” config. The automation should have run twice by this morning and it hadn’t, so I kept checking the triggers in the code (and missed the real issue) until I went to look at the trace and realized the system didn’t even know the automation existed.

At least before, if it didn’t load all of your automations due to an error, it would pop up a notification of such in the sidebar. I think I would rather have that (no automations will load and a notification) than just have it fail silently for one automation.

So to be fair it isn’t the config reload that’s the problem. It’s the config checker that’s the problem.

Maybe if you changed your title to that it might better pinpoint the real issue.

So a post is unuseful if someone points out a problem in the forums? I personally don’t see anything inflammatory at all in the post. Maybe a bit of exasperation but I can see where it is coming from - the thing that used to work now all of a sudden doesn’t work the way you expect and no one told you it doesn’t work the way you expected it to anymore.

But that’s just me.

Fair point.
Done

It is not silent - it tells you in the logs. You just need to adjust your expectations and look there.

Before, you wouldn’t have even known which script or automation was failing, as the notification was not precise enough. My recall may be faulty, but I’m pretty sure that meant looking through all of your scripts/automations to try and track down the error reported in the notification. The improvement is that it now tells you precisely which one is faulty, and only disables that one, not all of them.

So overall, in my view this is a side effect of a much needed underlying and massive improvement to the config checker - it may need further slight tweaking, but is far from the catastrophe that the OP implied.

Hang on… this may just be a mis-interpretation (by you, or me) but I didn’t call ‘the problem’ a catastrophe. I just suggested that whilst the resulting outcome for me this time had been merely very annoying, if the disabled script had been important then it might have been.

Just clarifying…

So, how often do you read your logs without some indication that the logs need to be read? I know I almost never do. Why would I?

no frontend notification = no issue to look for in the logs. :man_shrugging:

especially if the result of the config check is a big green (misleading…) “everything is good!!!” (yes that was an exaggeration but I hope you get the point).

And add to that there are some failures that actually tell you what the problem is right there in the config check result box.

And again that it used to work as expected and give at least a persistent notification that something was wrong.

True, but you at least had an indication that SOMETHING was wrong. And then it told you to look in your logs for the more precise indication of what the problem is. Which is when I would actually look in the logs.

Now the same message is placed in the logs but the system doesn’t even tell you that there’s a problem that needs to be addressed so you don’t know to even look.

That did happen on occasion when the line number etc couldn’t specifically be determined. But at least there was an indication in the frontend that something was wrong in the first place without you needing to stumble across the issue when something doesn’t work later on.

I’m OK with that improvement but it’s not an improvement if the system doesn’t tell you something is wrong in the first place until you stumble upon it later when something doesn’t work.

I think you should be able to notice a trend in what I’m trying to point out…that there is no obvious notification that there is even a problem to look for.

right.

and it’s a side effect that needs to be addressed.

and it can’t be addressed unless someone points it out as an issue in the first place.

Hence the thread.

No reason to shoot the messenger.

At the risk of repeating myself, things get fixed by posting issues on GitHub, not by posting in the forum about how one specific issue is part of a wider narrative about the decline and fall of HA.

fair enough.

Doing a full validation of a configuration file in situ, from within the system being configured and while it is running, is anything but trivial. Some settings can only be validated by actually running the initialization code of certain components (and you can’t do that while they are running) or by adding complicated and sometimes unreliable checking code. This is a well-known issue in software engineering and isn’t specifically related to HA. Most software will not even provide an interactive config check, for that reason. And some will only allow it while the application is not running.

Technically it doesn’t say that. It says that nothing is bad enough to keep HA from starting. It doesn’t say that everything is OK. Arguably that makes the entire function much less useful though.
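To make that distinction concrete, here is a rough sketch of a check result that separates “will start” from “everything loaded”. It is purely illustrative: `CheckResult`, its fields, and the message wording are all hypothetical and are not how HA implements its config check.

```python
from dataclasses import dataclass, field


@dataclass
class CheckResult:
    """Hypothetical outcome of a config check (not an HA class)."""

    # Problems that would stop the whole system from starting.
    fatal_errors: list = field(default_factory=list)
    # Scripts/automations that failed validation and were skipped.
    disabled_items: list = field(default_factory=list)

    def summary(self) -> str:
        """Return a message that does not hide partially failed loads."""
        if self.fatal_errors:
            return "Configuration invalid: HA will not start."
        if self.disabled_items:
            names = ", ".join(self.disabled_items)
            return f"HA will start, but these items were disabled: {names}. See the logs."
        return "No problems found."


# Example: one script failed validation, nothing fatal.
result = CheckResult(disabled_items=["demo_script"])
print(result.summary())
```

The point is only that “nothing fatal” and “nothing wrong” are different states, and a UI message can say which one applies.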

Why even have logs or error messages? You have to adjust your expectations and just run the application in a debugger, like everybody else does! Oh wait.

Logs are fine. But then the ‘config check’ function should either be made fully functional (which can be very hard / impossible), be removed entirely because it’s not reliable, or have its limitations clearly noted in the UI. Anything else is bad UX.

Yeah, I know.

That messaging has evolved recently as well.

The config checker used to be fairly reliable about telling you that there was a problem even if it couldn’t tell you exactly where the problem was (invalid config see configuration.yaml line ? etc). And if there was no error it would say “Everything is OK!” or something to that effect.

Then it went from that to a persistent notification - “the following components could not be loaded. please see the logs for details” - whereas those details used to be in the config checker box itself instead of buried in the logs.

Now if there is an error it just fails silently, unless you already know to look in the logs after every change just on the off-chance that something might have failed.

Evolution is supposed to be a good thing…right?

:wink:

I guess my basic point is that it used to be way more useful and with recent changes not so much anymore.

Yeah well. That feature is much more complex under the hood than it may seem. I mean I don’t know the details of how it’s implemented in HA, but if something like this ever came up in a strategic dev meeting for our (commercial) application, I would veto it immediately and point out how much of a time and resource intensive nightmare this would be to maintain long term (read - how much of a pain in the ass it would be).

The thing is, if you want something like that to work really reliably (i.e. close to 100%), you need to build your entire component loading / init / run system around that. That comes with a high testing and maintenance workload and it’s a long term commitment that will limit you in all kinds of ways. It may not even always be possible.

Imagine a component that accesses a hardware resource (dunno, like a Z-something stick or some other hardware device with exclusive access) that gets configured over the YAML. Like maybe just a path to the ttyUSB device. In order to validate this path in the config, you will need to directly talk to the device, see if it responds, if it’s the right device and FW version, yadda yadda. The problem is, that device is currently in use. You can’t just query it like that from a second process, because it has exclusive access. So you would have to shut it down on the main process, do the validation query and then restart it on the main process. During that time you may incur data loss. And that’s just one example of many.

And then you will have to weigh this complexity against the benefits this feature actually provides. I wouldn’t be surprised if HA gradually moved away from this feature in the future. I would.

Edit: that said, the current message it shows is just plain weird.

As mentioned above the issue is that it used to work better than it does now so functionality has been removed for some reason. I’m not asking for it to work better than it did before - only the same.

I just want there to be a notification that there is a problem without needing to check the logs “just in case”.

I can’t see how that would be difficult seeing as the error checking is already there and the message already gets sent to the logs.

The system already knows there is a problem. It just doesn’t tell you unless you check the logs.

Why not as a minimum just also create a notification that there is a problem and even better just print the same log error message into the config check window so you don’t need to go digging for it in the system files?
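As a rough illustration of how little is needed once the error is already being logged, here is a minimal sketch using Python’s standard `logging` module. It assumes the log line contains the phrase quoted at the top of the thread; the handler class, the `notify` callback, and the logger name are hypothetical and are not part of HA.

```python
import logging


class NotifyOnDisabledHandler(logging.Handler):
    """Forward 'has been disabled' log records to a notification callback."""

    # Phrase taken from the log line quoted at the top of this thread.
    MARKER = "could not be validated and has been disabled"

    def __init__(self, notify):
        super().__init__(level=logging.WARNING)
        # Hypothetical callback with the shape notify(title, message).
        self.notify = notify

    def emit(self, record):
        message = record.getMessage()
        if self.MARKER in message:
            self.notify("Configuration problem", message)


# Demo wiring: collect notifications in a list instead of showing them in a UI.
notifications = []
logger = logging.getLogger("config_check_demo")  # hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.propagate = False
logger.addHandler(
    NotifyOnDisabledHandler(lambda title, msg: notifications.append((title, msg)))
)

logger.error("Script 'demo' could not be validated and has been disabled")
logger.error("Some unrelated error")
```

Only the first log call produces a notification; the unrelated error still goes to the log but is not surfaced. A real implementation would hook in where the validation fails rather than matching log text, so treat this as a sketch of the idea only.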
