The future of YAML

This is your opinion. Others might differ. No one is forcing you to use it or be part of the community.

1 Like

@frenck - I think the part @danielo515 is referring to with git (or at least that's how I understood it, since I am in the same boat) is that when I modify something in the UI, which then updates .storage, it is not easy to track in git:

  • either I put .storage in git (I can live with a private repo, so I'm OK with passwords in .storage), but then I get too much noise in git due to the state info (such as last login) in .storage/auth
  • or I do not track .storage and lose track of when I changed something

I am OK with .storage not being intended to be edited by humans (at least not unless they are debugging…), but my memory of what I changed is limited, so I need git to help me :wink:
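One way to cut down that noise (a sketch only - the .storage file names below may differ between installs and versions) would be to track .storage but ignore its state-heavy files:

```
# .gitignore sketch: keep .storage under version control, but skip the
# files that change on every login or restart (names may vary per install)
.storage/auth
.storage/auth_provider.homeassistant
.storage/core.restore_state
```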

3 Likes

Excuse me?

Exactly! See blog post “Breaking changes” section.

2 Likes

I’m in a similar boat to you: up until now my entire HAC config has been in a private Git repo, with the .storage folder in my .gitignore file so it doesn’t cause things to get out of sync.

I have now set up my Pi so it can git push, which lets me back up the files that keep changing in the .storage folder, and I am currently using NodeRED to do an automated git push every night at 1 AM.

While this sort of works, it isn’t ideal: I still get file conflicts which have to be manually resolved, so I’m still searching for a better way to do things.
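For what it’s worth, the nightly push doesn’t strictly need NodeRED; a plain cron entry could do the same, and rebasing onto the remote before pushing may avoid some of those conflicts (the path and schedule below are just an illustration):

```
# crontab sketch: every night at 01:00, commit, rebase onto the remote, push
0 1 * * * cd /home/pi/homeassistant && git add -A && git commit -m "nightly auto-backup" && git pull --rebase && git push
```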

I’m not running the full HA with the snapshots feature, but could probably do something similar by zipping up my config folder; then, however, I lose a lot of the nice things that git version control gives me.

1 Like

Please take a breath and read a lot more about what is going on before you make such a statement. You may not like the changes, and you are free to make your opinion known, but I find the developers honest and trying to make the best system they can with the resources they have. I don’t always agree with the direction and/or the answers. I state my opinion but realize that their answer is final. Sometimes a final answer changes due to resource changes. This does not mean anyone is lying. Obviously you have never worked in a dynamically changing environment. If you had, you would understand that a truth today may not be the same tomorrow!

5 Likes

If you are doing pushes only from HA, I’m not sure where the conflicts come from.

I had automated git pushes for some time but stopped doing so because I was not adding messages to the git commits. Now I combine an automated backup of the whole folder (I use rsync) with manual git commits whenever I change anything by hand. It is working fine, but as said above, I get too much noise in my git commits from state changes.

Yeah that’s the worry. I guess we’ll get out the flaming brands and pitchforks if that ever happens. For now I can live with this change.

2 Likes

That can be very tricky though, if not impossible in some cases. I can’t come up with a good example off the top of my head, but integrations may look like they require similar data, while in the details they actually need some extra “seasoning” to clearly communicate which configuration options are possible. Let’s call it conditional configuration or relational configuration.

An example from a totally different application (Remmina, a tool for remote access using different protocols):

  • For every host you configure a host and a port
  • If it’s RDP you can provide the username, password, domain, bit-depth etc.
  • If it’s SSH you can provide the username and password, or instead maybe which key to use; domain and bit-depth don’t apply at all
  • Then there’s VNC, NX, whatever, all with different requirements

To make one thing clear: such a configuration can easily be done in YAML, provided the user doesn’t use options for a configuration variant where they don’t apply. But the user could add those options nonetheless, possibly causing things to not work as they should. If the user reads the documentation, this won’t be a problem; if not, it’s a potential source of frustration.
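As a stdlib-only sketch of what such per-variant validation might look like (protocol names taken from the Remmina example above, everything else hypothetical), options that do not apply to the chosen protocol could be flagged instead of silently accepted:

```python
# Hypothetical sketch: each protocol declares which options apply to it,
# so a config can be checked for options that "don't apply altogether".

ALLOWED = {
    "rdp": {"host", "port", "username", "password", "domain", "bit_depth"},
    "ssh": {"host", "port", "username", "password", "key_file"},
    "vnc": {"host", "port", "password"},
}

def invalid_options(protocol: str, options: dict) -> list:
    """Return the option names that are not valid for this protocol."""
    allowed = ALLOWED.get(protocol)
    if allowed is None:
        raise ValueError(f"unknown protocol: {protocol}")
    return sorted(set(options) - allowed)

# bit_depth is fine for RDP...
assert invalid_options("rdp", {"host": "pc1", "bit_depth": 16}) == []
# ...but for SSH it does not apply and gets reported instead of ignored.
assert invalid_options("ssh", {"host": "pc1", "bit_depth": 16}) == ["bit_depth"]
```

This is essentially what a UI can do implicitly by hiding the fields; in text-based configuration it has to be an explicit check.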

And here the benefit of UI-driven configuration comes into play. Ideally it only allows you to configure parameters that are valid for what you are trying to do. Other parameters that are globally valid, but invalid within the feature scope, would simply be hidden.

So now to the point I originally wanted to get to: it’s hard, if not impossible, to create an abstracted representation of the configuration possibilities (referred to as a schema) that can be used to validate input and, at the same time, drive the reverse operation of generating a UI. So the WHAT from your statement lines up with the schema; the HOW is the logic to generate a UI from that schema. And because the HOW is a problem, it just has to be done by the integration maintainer.

To further visualize the problem imagine a non-existent integration for which you always need an ip, a port for variant a and b, username for b and c, and username + password just for c.

So now let’s pretend we could traverse this tree to generate a UI for it:

# BTW, this is what we have right now
integration:
  ip: value
  port: value
  username: value
  password: value

This wouldn’t work because all parameters would always be visible. Let’s try this:

integration:
  ip: value
  a:
    port: value
  b:
    port: value
    username: value
  c:
    username: value
    password: value

This would work, but port and username are redundant. If there were changes and not every occurrence of port and username were adjusted, this could break the integration. So my final attempt:

integration:
  ip: value
  a:
    port_a: value
  b:
    port_b: value
    username_b: value
  c:
    username_c: value
    password_c: value

Now this is nice, without any redundancy. But by doing this we lose the ability to derive the validation method directly from the parameter name. So some sort of mapping mechanism would be needed that tells us that port_a and port_b both have to be integers. And if another new variant gets introduced, it could cause a problem if the validator mapping for the new parameter isn’t correct.
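A minimal sketch of such a mapping mechanism (hypothetical, stdlib only; parameter names taken from the example above) could look like this:

```python
# Hypothetical sketch: parameter names map to a validation rule, so
# port_a and port_b share the "must be an integer" rule despite their
# different names.

VALIDATORS = {
    "ip": str,
    "port_a": int,
    "port_b": int,
    "username_b": str,
    "username_c": str,
    "password_c": str,
}

def validate(config: dict) -> None:
    for key, value in config.items():
        expected = VALIDATORS.get(key)
        if expected is None:
            # The failure mode mentioned above: a new variant introduces
            # a parameter, but nobody registered a validator for it.
            raise KeyError(f"no validator registered for {key!r}")
        if not isinstance(value, expected):
            raise TypeError(f"{key} must be of type {expected.__name__}")

validate({"ip": "192.168.0.2", "port_a": 8123})  # passes silently
```

The weak spot is exactly the one described above: the mapping lives apart from the schema, so a forgotten entry only surfaces at validation time.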

So to sum this up: attempts to solve such problems always come with drawbacks that have to be worked around in some way. The most stable solution is the first example, because there it’s hard to break anything, regardless of changes, added variants, whatever. But it also happens to be the solution that can’t be visualized automatically at all. Or at least not in a way that prevents the user from entering data they’re not supposed to supply.

4 Likes

@danielperna84

Thank you for the very thoughtful reply! I appreciate the detail on the difficulties. “Schema” is exactly what I’m thinking of, although I don’t have the nitty gritty details or experience with how schema definitions might apply or what their deficiencies might be. Your insight is helpful.

Considering your example, would you really need to configure more than one of the variants at a time? Even if you did, is it too much to expect that some redundant information may be necessary to achieve the result? If one accepts some redundancy, then it might reduce to a validation problem for each variant. Consider:

integration:
  variant_a:
    ip: value
    port: value
  variant_b:
    ip: value
    port: value
    username: value
    password: value
  variant_b_invalid:
    ip: value
    port: value

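If one accepts the redundancy, the per-variant check could reduce to something like this stdlib sketch (variant names taken from the example above, everything else hypothetical):

```python
# Sketch: with fully keyed variants, validation reduces to checking each
# configured variant against its own required-key set.

REQUIRED = {
    "variant_a": {"ip", "port"},
    "variant_b": {"ip", "port", "username", "password"},
}

def missing_keys(config: dict) -> dict:
    """Map each known variant to the keys it is still missing."""
    return {
        variant: REQUIRED[variant] - set(options)
        for variant, options in config.items()
        if variant in REQUIRED
    }

result = missing_keys({
    "variant_a": {"ip": "10.0.0.1", "port": 8123},
    "variant_b": {"ip": "10.0.0.1", "port": 8123},  # credentials forgotten
})
assert result["variant_a"] == set()
assert result["variant_b"] == {"username", "password"}
```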
I can also see that additional markup might need to be introduced to facilitate things, and that an eventual solution to the textual-configuration problem might not be YAML. I know some other technologies exist that might apply, but I have no idea whether they would address these issues (e.g. Dhall, or of course Lisp (please no flames)).

So there is:

  • A representative set of scenarios that can be used to validate an approach.
  • How to define the “schema” (technologies that can be used).
  • How to validate user input (seems like this applies regardless of HOW the data is acquired).
  • Representative examples in different textual formats (YAML, etc.).
  • How to generate the UI (e.g. something like what jsonforms does)
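On that last point, tools like jsonforms consume plain JSON Schema, so the “schema” could be a document along these lines (purely illustrative):

```json
{
  "type": "object",
  "properties": {
    "ip":   { "type": "string", "title": "Host IP" },
    "port": { "type": "integer", "minimum": 1, "maximum": 65535 }
  },
  "required": ["ip"]
}
```

The appeal is that one document could drive both sides: a validator checks input against it, and a UI generator renders a text field for ip and a bounded number field for port.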

I suppose this is not the place to be discussing the details of how this would work. But identifying the issues is a great start. Thanks again for the thoughtful reply.

6 Likes

I’m doing automated pushes from my HA machine to keep track of changes in my .storage folder, but I also sometimes do a git pull when needed:

When I update the config files, usually tweaks to YAML files, or adding custom components, I use a different machine running Sourcetree to do a git push, then shortly after a git pull and a HAC restart on the Pi running HAC to test my changes.

With some more tinkering of my .gitignore and NodeRED flows I may be able to work around some of the version conflicts that appear when updating my config folder, as I rarely need to make changes to the .storage folder from the machine running Sourcetree.

Jep - above I was proposing to split the files in .storage between config and state (ideally into subfolders) so it would not require tinkering with the .gitignore, but I did not get any feedback. I guess I would need to ask in the architecture repo - so far I have only done a PR to core, not triggered an architecture discussion.

4 Likes

Please tag me if you do. My username is exactly the same on GitHub.

I would also like to include the .storage folder in git. Therefore the files should be split: real config settings should be separated from auth sessions and other dynamic data. This would allow us to have a meaningful .gitignore file and auto-sync to git. From my point of view there is no real difference between JSON and YAML; you can convert between them anyway. The only benefit of YAML files is comments, and those would not survive, because you cannot keep them through any write or update operation in Home Assistant.

1 Like

FWIW, there was a proposal, in the Architecture repo, for the use of JSON schemas. However, it didn’t attract any attention:

No. But in order to present a configuration dialog there needs to be some mechanism that decides what to display based on the selected variant. With a simple schema that’s no problem. But with more complex demands from an integration, new problems will arise. For example, if there are nested parameters: at which nesting level should the logic decide “OK, this is the parameter we need to configure, not the one below”? It’s unknown how complex configurations can be, and imposing limits like “we only support one level of nesting, because that’s how we decide what and how to render” could lead to some integrations not being configurable via the UI at all, whereas it wouldn’t be a problem at all with a manually constructed configuration flow.

Apart from that: how would a schema like the ones above provide separate steps? I mean like a wizard where you first enter credentials, click next, then do something else - which of course needs some logic all in itself to handle. A schema couldn’t possibly provide this while maintaining the current YAML style so many want to keep. Or at least I assume so; I’m not involved in any of this. I’m just trying to provide possible reasons, from a programmer’s perspective, for why the devs headed in the direction they chose.

One bit to note: we already have that schema stuff, validation, yadda yadda yadda. For a long time actually, and it’s working perfectly fine. I don’t think anything about that should change, as it’s actually what makes the YAML configuration we already have possible. To me there’s only one problem: generating the UI from it. And that’s where the current solution falls short. Hence it has to be done manually (in the sense of building the UI). And with a carefully crafted configuration flow in place, the text-based configuration becomes obsolete and causes the developer more work. Imagine an update of an integration (UI), and nobody noticed that the YAML-mode config broke or requires adjustments by the user. Bam! Breaking change! Everybody hates these. :man_shrugging:

1 Like

Versioning is a must with databases.

There are obviously a lot of concerns and opinions around this decision. I still have not had the time to read all the replies as I try to measure the impact of such a change.
I kinda get why this decision had to be made. In my case, I spent a lot of time building CI/CD around this project. My main goal was to avoid disrupting home automation services while developing/configuring upgrades (household onboarding).
Using GitLab CI, Ansible, docker-compose and virtual machines, I have reached a point where I can deploy an update to my production Raspberry Pi in just one click, once it has been developed/configured locally and tested in a staging environment.
In that context, all configuration needs to be done as code. My git repo tracks YAML files as my entry point of configuration. I can revert back in time and track all changes brought to my production environment. I specifically chose not to use the UI except when the UI makes changes to YAML files which are committable to my source control.
I still don’t know if it will play well with the .storage files, but I feel like .storage is more the result of HA processing than an entry configuration. And I’m not sure how I will be able to configure those files to fit different environments (dev, staging, prod).

IMO, Configuration as Code will always be needed more than UI to be viable in the long run.

4 Likes

Replying to myself here to keep this (currently unanswered) question from getting lost.

Just deleting all the hacs.xxxx files didn’t work?

Not to sound too uppity here but TBH if the user can’t read the instructions in the official documentation (as long as those instructions are well written and up to date) then I’m not sure I would use that user as a basis for changing the target audience of HA.

Most integration configuration instructions are generally pretty straightforward. If they can’t follow those specific instructions, then how will they ever hope to figure out the more complex stuff?