The future of YAML

danielo515 · April 17, 2020, 5:39pm

Please tag me if you do. My username is exactly the same on GitHub.

runningman84 · April 17, 2020, 6:35pm

I would also like to include the .storage folder in git. For that, the files should be split: real config settings should be separated from auth sessions and other dynamic data. This would allow us to have a meaningful gitignore file and auto sync to git. From my point of view there is no real difference between JSON and YAML; you can convert between them anyway. The only benefit of YAML files is comments, and those would not work here because you cannot preserve them during any write or update operation in Home Assistant.

123 · April 17, 2020, 7:24pm

FWIW, there was a proposal, in the Architecture repo, for the use of JSON schemas. However, it didn’t attract any attention:

github.com/home-assistant/architecture

JSON schemas for explicit semantics

opened 03:20PM - 18 May 19 UTC

closed 07:42PM - 31 Jan 21 UTC

## Context

At the moment Home Assistant uses JSON files, both in the code (manifests) and in its configuration (`~/.homeassistant/.storage`). The validation of these files is performed using Voluptuous. This allows for validation, but does not have a native construct for versioning.

## Decision

Start using (versioned) JSON schemas for JSON

## Consequences

* Validation of JSON files is easier. URL of schema refers to valid elements.
* Versioning of the file format of JSON files becomes explicit.
* Migrations are easier
* Previous versions of file formats are still available
* It is easy to see that a file was valid before migration, and is after migration.
* Improved tool support

The latter is an intriguing point. While JSON is not used as the human-readable file format, the tool support that comes with JSON schemas provides a good experience. The [Vega-Lite editor](https://vega.github.io/editor/#/examples/vega-lite/bar_grouped) is a good example for this, with autocompletion of both keys and valid values for enums. If needed it should even be possible to generate a JSON schema from calls that instantiate the Voluptuous validator objects.
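To make the "explicit versioning" and "migrations are easier" points concrete, here is a minimal sketch in plain Python (standard library only). The file layout (`version` and `data` keys), the `MIGRATIONS` table and the `host`/`address` field names are all hypothetical; this is not Home Assistant's actual storage format or API, just one way a versioned file could be upgraded step by step.

```python
import json


def _v1_to_v2(data):
    # Assumed change for illustration: version 2 renames "host" to "address".
    data = dict(data)
    data["address"] = data.pop("host")
    return data


# Hypothetical migration table: key N upgrades a payload from version N to N+1.
MIGRATIONS = {1: _v1_to_v2}
CURRENT_VERSION = 2


def load_versioned(text):
    """Parse a JSON document and migrate its data up to CURRENT_VERSION."""
    doc = json.loads(text)
    version, data = doc["version"], doc["data"]
    if version > CURRENT_VERSION:
        raise ValueError(f"unsupported future version {version}")
    while version < CURRENT_VERSION:
        data = MIGRATIONS[version](data)
        version += 1
    return {"version": version, "data": data}


old_file = '{"version": 1, "data": {"host": "192.168.1.10"}}'
migrated = load_versioned(old_file)
print(json.dumps(migrated))
```

Because the version is stored in the file itself, the old document stays recognisable as a valid version 1 file and the result can be checked as a version 2 file.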

danielperna84 · April 17, 2020, 7:31pm

No. But in order to present a configuration dialog there needs to be some mechanism that decides what to display based on the selected variant. With a simple schema that’s no problem. But with more complex demands from an integration, new problems will arise. For example if there are nested parameters. At which nesting-level should the logic decide “ok, this is the parameter we need to configure, not the one below”? It’s unknown how complex configurations can be, and imposing limits like “we only support one level of nesting because that’s how we decide what and how to render” could lead to some integrations not being possible to configure via UI at all, whereas it wouldn’t be a problem at all with a manually constructed configuration flow.
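To illustrate the nesting question, here is a small Python sketch of the most naive rule a generic renderer could apply: treat every leaf as a form field, whatever its depth. The `SCHEMA` dict and the `leaf_fields` helper are made up for this example and are not how Home Assistant represents schemas. It shows what gets lost: the walk cannot tell whether `auth.token` should be one widget or two fields, which is the decision described above.

```python
def leaf_fields(schema, prefix=""):
    """Yield (dotted_path, type) for every leaf in a nested dict schema."""
    for key, value in schema.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Naive rule: always descend, never render a nested group as a unit.
            yield from leaf_fields(value, path)
        else:
            yield path, value


# Hypothetical nested schema, for illustration only.
SCHEMA = {
    "host": str,
    "auth": {"username": str, "token": {"value": str, "ttl": int}},
}

fields = list(leaf_fields(SCHEMA))
for path, expected_type in fields:
    print(path, expected_type.__name__)
```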

Apart from that: how should a schema like the ones above provide separate steps? I mean like a wizard where you first enter credentials, click next, then do something else. Which of course needs some logic of its own to handle. But a schema couldn’t possibly provide this while maintaining the current yaml-style so many want to keep. Or at least I assume this. I’m not involved in any of this. I’m just trying to provide possible reasons, from a programmer’s perspective, for why the devs headed in the direction they chose.

One bit to note: we already have that schema stuff, validation, yadda yadda yadda. For a long time actually, and it’s working perfectly fine. I don’t think anything about that should change, as that’s actually what’s making the yaml-configuration we already have possible. To me there’s only one problem: generating the UI from it. And that’s where the current solution falls short. Hence it has to be done manually (in the sense of building the UI). And with a carefully crafted configuration flow in place, the text-based configuration just is obsolete and causes the developer more work. Imagine an update of an integration (UI), and nobody noticed the yaml-mode-config broke or requires adjustments by the user. Bam! Breaking change! Everybody hates these.

petro · April 17, 2020, 7:36pm

Versioning is a must with databases.

speedfire · April 17, 2020, 8:01pm

There are obviously a lot of concerns and opinions around that decision. I still have not had the time to read all the replies as I try to measure the impact of such a change.
I kinda get why this decision had to be made. In my case, I spent a lot of time building CI/CD around this project. My main goal was to avoid disrupting home automation services while developing/configuring upgrades (household onboarding).
Using GitLab CI, Ansible, Docker Compose and virtual machines, I have reached a point where I can deploy an update to my production Raspberry Pi in just one click, but only once it has been developed/configured locally and tested in a staging environment.
In that context, all configuration needs to be done as code. My git repo tracks YAML files as my entry point of configuration. I can revert back in time and track all changes brought to my production environment. I specifically chose not to use the UI except when the UI makes changes to YAML files which I can commit to my source control.
I still don’t know if it will play well using the .storage files, but I feel like storage files are more the result of HA processing than an entry configuration. And I’m not sure how I will be able to configure those files to fit the different environments (dev, staging, prod).
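For the per-environment part, one common pattern when the configuration lives in git is a shared base plus a small overlay per environment. A minimal Python sketch of that idea follows; the keys, values and the `config_for` helper are invented for illustration, and it says nothing about how .storage files could be handled.

```python
import copy


def deep_merge(base, override):
    """Return a new dict where override wins and nested dicts merge recursively."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


# Hypothetical shared settings and per-environment differences.
BASE = {"http": {"port": 8123}, "logger": {"default": "warning"}}
OVERLAYS = {
    "dev": {"logger": {"default": "debug"}},
    "staging": {"http": {"port": 8124}},
    "prod": {},
}


def config_for(env):
    """Build the effective configuration for one environment."""
    return deep_merge(BASE, OVERLAYS[env])


print(config_for("dev"))
```

Only the overlays differ between dev, staging and prod, so a diff in git shows what changes per environment.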

IMO, Configuration as Code will always be needed more than UI to be viable in the long run.

finity · April 17, 2020, 9:08pm

Replying to myself here to keep this (currently unanswered) question from getting lost.

petro · April 17, 2020, 9:11pm

Just deleting all the hacs.xxxx files didn’t work?

finity · April 17, 2020, 9:15pm

Not to sound too uppity here but TBH if the user can’t read the instructions in the official documentation (as long as those instructions are well written and up to date) then I’m not sure I would use that user as a basis for changing the target audience of HA.

Most integration configuration instructions are generally pretty straight-forward. If they can’t follow those specific instructions then how will they ever hope to be able to figure out the more complex stuff?

finity · April 17, 2020, 9:19pm

Yes, it did for me in that situation.

But my point was we are always told (as has been said even in this thread) that those files are never supposed to be edited.

And I’m an experienced user who (mostly) has an understanding of how things are supposed to work and where to look.

How is the new target “non-tech-savvy” user supposed to deal with figuring that out and then be able to know that they can (and then figure out how to) edit those un-editable files?

There is no “official” troubleshooting/recovery method for that situation that I’ve ever seen.

dshokouhi · April 17, 2020, 9:21pm

You have to account for the users who don’t read documentation; they make up a significant percentage, much more than you think. You can guide them to the docs, but keep in mind most new users don’t even know how to read the docs, let alone find them. To us who have used HA and other software it’s fine, but imagine your friend who isn’t so tech savvy and doesn’t even read the manual for a normal tech product.

petro · April 17, 2020, 9:22pm

The only reason I’d say that your argument would be brushed under the rug is because it’s a custom solution. But I agree that it holds merit. Something like that is always possible.

finity · April 17, 2020, 9:29pm

If that “new user” still can’t figure out how to read the instructions even after you point them out, then I don’t think I’d ever want to try to explain how to create automations, templates, etc., so I would rather they not use HA in the first place. I’ve dealt with those types many times already in the forums and really honestly tried to help them and it’s a totally lost cause. Intentionally drawing more of those types as the target user base is going to be an absolute nightmare to support here.

Again, not to sound too uppity because we’ve all needed some help in the past, but if you can’t follow the simple instructions even with a nudge in the right direction then good riddance.

bhaonvashon · April 17, 2020, 9:29pm

So, maybe there is a subset of configurations where a simple schema would suffice, no? That is a start. Any ideas as to the percentage of current integrations that might fall into this category?

A couple of things come to mind here:

For the purposes of exploring a new approach, maybe we can agree to not constrain ourselves to thinking just in terms of the current monolithic YAML-style configuration per integration. As you point out, this appears to already have its limitations. What we are trying to achieve is an externalized textual representation (in whatever form that might take) of the configuration that would address the use cases outlined in this thread.
Consider breaking the monolithic YAML configuration into multiple simplified chunks, each with its own schema; that is, reduce the complex configuration (and associated schema) to a sequence of simplified chunks (and associated schemas).
As to how to hook these chunks together, that sounds like a state machine, no? So, the configuration declaration provided by integrations would then consist of a number of schemas for the chunks and a state machine declaration for how these chunks relate to each other.
The goal is to keep the WHAT (schemas, state machine definitions) separate from the HOW (GUI, textual external representation).

The schema definitions together with a state machine definition could then be used to drive a GUI manager. I think it might also be able to drive a textual manager that would then expect various configuration chunks to be available (e.g. in different files?).

I wonder, couldn’t you argue that the current “config flow” is a form of state machine, just in code rather than a declaration? Ideally, a new approach would consist mainly of schema and state machine declarations, not code (or at least not code in the integration, but associated with it).
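A rough Python sketch of that idea, to make it less abstract: the chunks and the transitions are plain declarations (the WHAT), and a small generic runner walks them (the HOW). Everything here is hypothetical, including the chunk names, the type-only "schemas", `TRANSITIONS` and `run_flow`; none of it is the existing config flow API.

```python
# Each chunk declares only WHAT it needs: field names and expected types.
CHUNKS = {
    "credentials": {"username": str, "password": str},
    "device": {"host": str, "port": int},
}

# Declarative transitions: current step -> next step (None means finished).
TRANSITIONS = {"credentials": "device", "device": None}
START = "credentials"


def validate(chunk_name, values):
    """Check that values has exactly the chunk's fields with the right types."""
    schema = CHUNKS[chunk_name]
    if set(values) != set(schema):
        raise ValueError(f"{chunk_name}: expected fields {sorted(schema)}")
    for field, expected in schema.items():
        if not isinstance(values[field], expected):
            raise ValueError(f"{chunk_name}.{field}: expected {expected.__name__}")
    return values


def run_flow(provide):
    """Walk the state machine, asking provide(step, schema) for each chunk."""
    step, result = START, {}
    while step is not None:
        result[step] = validate(step, provide(step, CHUNKS[step]))
        step = TRANSITIONS[step]
    return result


# A "textual manager": answers come from a dict that could be loaded from files.
text_config = {
    "credentials": {"username": "me", "password": "secret"},
    "device": {"host": "10.0.0.5", "port": 8080},
}
config = run_flow(lambda step, schema: text_config[step])
print(config)
```

A GUI manager would pass a different `provide` callback that renders a form from the schema it is given. This linear example does not cover branching on earlier answers, which is where a real state machine declaration would need more than a next-step table.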

Can you point us to some representative integrations (and the code) as a concrete example of such a complex configuration scenario? It would be interesting to see an actual example for enlightenment! Both one that illustrates the difficulty with the current monolithic YAML-style approach and one that is illustrative of the new “config flow” approach.

I have no idea if this is just a pipe dream, I’m being naive, or both, or whether there are other projects/technologies out there that might apply to this. I need to read more on the “configuration as code” meme. Being an old Lisper, I have an idea what this might mean, but need to do my research.

Thanks Daniel. This is interesting…and productive I hope.

Edit: Well, it didn’t take much to see what “configuration as code” is. I’ve seen the term, just never drilled into the definition. Not what I was thinking. It is what we are trying to achieve with a textual representation…that is, treat configuration as we do code, using the same tools for version control, etc. that we do for code. It’s the goal here. Sorry about that. I was thinking along the lines of Lisp, and that data and code are the same form (s-expressions)…I think the technical term is homoiconicity.

dshokouhi · April 17, 2020, 9:39pm

Oh I agree with you, normally the last thing you will hear from me is the link to the docs but if I have to send you more than 2-3 links to explain how to do 1 thing I will tell you to read all of the docs.

I am just saying that if you want to appeal to the vast majority of the user base who don’t read instructions, you need to develop the software in a way that is so easy to understand that you don’t need the docs. In the long run this is the end goal for all software. The main issue I think most of those new users had was formatting errors. Too many or not enough spaces, improper use of the dash, too many sensors. All that goes away with config flows. I get that it sucks you can’t tell what entities were linked to what integration, but maybe the devs can compromise and give us a file that HA updates on its own to show the entities linked to the integration. Aside from that you don’t need to modify these so much. If you want to adjust the JSON then nothing is stopping you; just restart right after the change so it sticks. You can do this for changing a password.

When you have a product that takes hours to set up and understand, you steal that fun from a new user; they want to flash their device, hook up to wifi and run with it. If you don’t know YAML you’re stuck before you can even run, and then it is no longer fun, it becomes a burden.

finity · April 17, 2020, 9:51pm

I believe I made it clear that is the problem…

Why should this be the goal for software that has the claim that it will always be free and open source?

What is the benefit to anyone (dev or moderately advanced user/support person) to intentionally draw in that type of user?

unless there are other plans on that front as well…

If that were true then HA wouldn’t be as popular as it is right now. The vast majority of current users didn’t know anything about yaml or even Linux when they started and that obviously wasn’t a barrier for them (me…).

The draw is the number of integrations and its flexibility.

So I really don’t think that argument holds any water at all.

nickrout · April 17, 2020, 9:56pm

Devs criticised for making software too hard for beginners.

Devs make software easier for beginners.

Devs criticised for changing software.

finity · April 17, 2020, 10:07pm

I think that we can all agree that there are two extremes we need to stay away from - needing to write all code in machine language and dumbing down the software for the most uneducated hick who can’t program a clock on their VCR.

I vote we stay away from the latter.

SeanM · April 17, 2020, 10:08pm

Decisions should be made based on what’s best for most people. And this approach is undeniably the best for most people.

Billions of people are familiar with this approach, because it’s how things work pretty much everywhere else. When you install an app on your phone you head to the Android or iOS app store, search for the app, press the install button and will be using it within seconds. Same exact flow in Home Assistant now. Try linking integrations in the Alexa, Google Home, IFTTT, SmartThings etc apps. They all work this exact same way. It’s the standard that users expect.

These UI config flows have a large number of benefits versus YAML. There are on-screen instructions so that you don’t have to pull up a documentation page. They validate your input so that you’ll know what went wrong immediately (invalid password, closed port, etc) without having to hunt through log files after a restart. Information is often pre-filled so that your latitude and longitude would already be there when adding a weather integration for example. Integrations like Plex can show an authorization page that you log in to with one click from your password manager. Your network can be scanned for devices and added with a single click. And it’s only going to get better over time.

This approach reduces burnout for contributors, makes integration setup significantly easier and faster for users, cuts down on troubleshooting and support questions, requires far fewer documentation updates, and makes breaking changes a thing of the past in most cases. All objectively good things that benefit everyone, including us power users. It’s not dumbed down, it’s significantly smarter and more convenient.

But I also acknowledge it’s not 100% perfect. People would like a choice between UI and YAML for this, totally understandable. The blog post already went into detail on why that’s not viable currently. It places a large burden on contributors and when you have limited resources you have to choose how to spend that time wisely. If somebody smart can figure out the technical hurdles and come up with a good solution, I’m sure it would be looked into.

Ultimately though these minor drawbacks only affect a very tiny percentage of power users, and it’s these folks that are the most capable of adapting to new situations and figuring out workarounds. Making the platform more accessible to way more people at the expense of some minor inconveniences for power users is a good trade-off IMO.

finity · April 17, 2020, 10:20pm

Not if the goal is to draw in users who can’t read simple instructions…

I can see how that might help them for setting up some integrations (until you run into the situation I had above that no one has provided the answer to) but then what?

Literally everything else in HA and home automation in general of any real usefulness is significantly harder than entering in a few lines of yaml to set up an integration.

Now we have a new user who can’t be expected to be smart enough to figure out how to add a few lines of yaml to configure an integration but then expect them to try to figure out automations, scripts and templates?

Really?

That sounds like more work for us (if we even want to continue providing that help at all). And I know from personal experience on this forum how tiresome that can be to try to help someone who just can’t be helped.

And since it has been made abundantly clear that the devs are never, ever, ever supposed to be asked directly for help, it is up to us to help the new users figure things out.

So it’s OK to overburden us with more “customer support” but not the devs to make it more convenient for us?