Parsing json (I think...)

123 · July 2, 2019, 1:20pm

Oops! My mistake; I missed that important detail.

pnbruckner · July 2, 2019, 1:35pm

Actually, I need to be more careful with my terminology. JSON is a text representation of data, so it is effectively a string. The main point here is that the attribute contains the “raw JSON” instead of a Python data type created by parsing the JSON.

finity · July 2, 2019, 1:35pm

I’m using the noaa weather alerts custom component.

Tbh, I’m really not sure what the use case is for the component author to present the attribute of the sensor in that manner. But I figured if it was there I would try to find a use for it.

I think I eventually figured out that I wasn’t dealing with properly formatted JSON. As mentioned, it’s darn hard to tell the difference until it just doesn’t work, and I figured there had to be something weird going on.

But I also couldn’t figure out how to slice it up to make it useful as a regular string either.

pnbruckner · July 2, 2019, 1:41pm

I looked at the code. Why aren’t you using the alerts attribute instead of the alerts_string attribute? The latter is a JSON formatted representation of the former.

{{ state_attr('sensor.X', 'alerts')[0].properties.event }}

finity · July 2, 2019, 9:50pm

I’m not sure what you’re asking here.

Are you saying that the author of the component should use “alerts” in the component code instead of “alerts_string” to get a properly formatted json string?

Or are you saying that the component already provides an “alerts” attribute that is properly formatted json?

If it’s the latter then I’m not seeing that attribute anywhere.

If it’s the former then I’m not sure why the author wrote it the way they did. I haven’t really dug into the code to see what could be changed there to make it work. I was just trying to figure out if I was doing something wrong on my end.

Another related question, though: why is there a difference in the results in the template editor between copying/pasting the entire contents of the attribute into the template editor and assigning it to a variable, as opposed to using the attribute directly?

As you said, JSON is just a string. So where does the distinction between the JSON string and the raw string come from? I thought it had to do with the formatting of the string itself, which allowed it to be parsed as JSON (just like a string of numbers & symbols can be interpreted as a datetime object as long as it’s correctly formatted). But if that were the case, then either both methods should work or neither should.

I’m not grasping the difference.

pnbruckner · July 2, 2019, 10:02pm

The way I read the code, there should be both an alerts attribute and an alerts_string attribute. The former is a Python list/dict, and you should be able to get at the pieces you want as I showed. The latter is the former passed through json.dumps, which takes the list/dict and outputs a single string containing a JSON representation of it. (That’s why when you did [0], you just got the first character of the string, namely '['.)
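To make the difference concrete, here is a minimal Python sketch. The alert data is made up for illustration and only mimics the shape described above (a list of dicts with a properties key); alerts and alerts_string stand in for the two attributes.

```python
import json

# Hypothetical alert data, shaped like the list described above.
alerts = [{"properties": {"event": "Flood Warning", "severity": "Severe"}}]

# The alerts_string attribute is that same list passed through json.dumps.
alerts_string = json.dumps(alerts)

# Indexing the list gives the first alert (a dict), so lookups keep working...
print(alerts[0]["properties"]["event"])  # Flood Warning

# ...but indexing the string only gives its first character.
print(alerts_string[0])  # [

print(type(alerts).__name__, type(alerts_string).__name__)  # list str
```

This is the Python equivalent of {{ state_attr('sensor.X', 'alerts')[0].properties.event }} versus doing [0] on alerts_string.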

In Jinja you use the . and [] operators to get at pieces of dicts and lists, respectively.

What does the entity look like in the States page? Or if you do {{ states.sensor.WHATEVER }} in the Template editor?

pnbruckner · July 2, 2019, 10:27pm

Because you copied it as if it was a list/dict, not as a string. As I said above, you’d have to do {% set x = '...' %} to be equivalent.

There is no difference, and sorry if I confused you earlier. The alerts_string attribute contains a string, which happens to have a list/dict formatted as JSON.

finity · July 2, 2019, 11:04pm

Here is the entity with its attributes on the States page:

The entity doesn’t expose the “alerts” attribute.

However, this has helped me figure out a couple of things…

First, and most importantly, I didn’t realize until your last post that “properly formatted json” includes whitespace (newlines, spaces, tabs).

Second, I had no idea what json.dumps() did.

Putting those two pieces of info together, I was able to change the component code to get a truly properly formatted JSON string in the “alerts” attribute. And now I can access the desired values using the methods above, like I expected to.

Here is the bit of code I changed (the last line):

try:
    nws = noaa.NOAA().alerts(active=1, **params)
    nwsalerts = nws['features']
    self._attributes = {}
    self._state = len(nwsalerts)
    if self._state > 0:
        nwsalerts = sorted(nwsalerts, key=sortedbyurgencyandseverity)
        self._attributes['urgency'] = nwsalerts[0]['properties']['urgency']
        self._attributes['event_type'] = nwsalerts[0]['properties']['event']
        self._attributes['event_severity'] = nwsalerts[0]['properties']['severity']
        self._attributes['description'] = nwsalerts[0]['properties']['description']
        self._attributes['headline'] = nwsalerts[0]['properties']['headline']
        self._attributes['instruction'] = nwsalerts[0]['properties']['instruction']
        # The changed line: store the parsed response (a dict), not a JSON string
        self._attributes['alerts'] = nws

and the result:

Time to submit a PR to the code.

Thanks for the help!

pnbruckner · July 3, 2019, 2:22am

What version are you using? The latest has this:

github.com/dcshoecomp/noaa_alerts/blob/965b1e3fb0387df1b26cfdc8f92c7cb99e8fd5cb/custom_components/noaa_alerts/sensor.py#L86-L117


def _update(self):
    from noaa_sdk import noaa
    if self._zoneid != 'LAT,LONG':
        params={'zone': self._zoneid}
    else:
        params={'point': '{0},{1}'.format(self.latitude,self.longitude)}
    try:
        nws = noaa.NOAA().alerts(active=1, **params)
        nwsalerts = []
        for alert in nws['features'] :
            nwsalerts.append(alert['properties'])
        self._state = len(nwsalerts)
        self._attributes = {}
        self._attributes['alerts'] = sorted(nwsalerts, key=sortedbyurgencyandseverity)
        self._attributes['urgency'] = self._attributes['alerts'][0]['urgency'] if self._state > 0 else None
        self._attributes['event_type'] = self._attributes['alerts'][0]['event'] if self._state > 0 else None
        self._attributes['event_severity'] = self._attributes['alerts'][0]['severity'] if self._state > 0 else None
        self._attributes['description'] = self._attributes['alerts'][0]['description'] if self._state > 0 else None
        self._attributes['headline'] = self._attributes['alerts'][0]['headline'] if self._state > 0 else None
        self._attributes['instruction'] = self._attributes['alerts'][0]['instruction'] if self._state > 0 else None

(This file has been truncated.)

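Notice line 99 (self._attributes['alerts'] = sorted(...)), which sets the alerts attribute to the sorted list itself rather than to a JSON string.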
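The pattern in that excerpt (pull each feature’s properties dict out, sort, then copy fields from the first alert, falling back to None when there are no alerts) can be sketched in plain Python. The real sortedbyurgencyandseverity function isn’t shown in the truncated excerpt, so the rank tables and sort_key below are hypothetical stand-ins, build_attributes is a hypothetical helper, and the sample response is invented.

```python
# Hypothetical rank tables: the real sortedbyurgencyandseverity is not in the excerpt.
URGENCY_RANK = {"Immediate": 0, "Expected": 1, "Future": 2}
SEVERITY_RANK = {"Extreme": 0, "Severe": 1, "Moderate": 2, "Minor": 3}

# Attribute name -> key in each alert's properties dict (same fields as the excerpt).
FIELDS = {
    "urgency": "urgency",
    "event_type": "event",
    "event_severity": "severity",
    "description": "description",
    "headline": "headline",
    "instruction": "instruction",
}


def sort_key(alert):
    """Hypothetical key: most urgent first, then most severe; unknown values last."""
    return (URGENCY_RANK.get(alert.get("urgency"), 99),
            SEVERITY_RANK.get(alert.get("severity"), 99))


def build_attributes(nws):
    """Return (state, attributes) from a NOAA-style response dict."""
    # Keep only each feature's properties dict, as the excerpt does.
    nwsalerts = [feature["properties"] for feature in nws["features"]]
    state = len(nwsalerts)
    # The alerts attribute is the sorted list itself, not a JSON string.
    attributes = {"alerts": sorted(nwsalerts, key=sort_key)}
    first = attributes["alerts"][0] if state > 0 else None
    for attr, key in FIELDS.items():
        attributes[attr] = first.get(key) if first is not None else None
    return state, attributes


sample = {"features": [
    {"properties": {"event": "Heat Advisory", "urgency": "Expected", "severity": "Moderate"}},
    {"properties": {"event": "Tornado Warning", "urgency": "Immediate", "severity": "Extreme"}},
]}
state, attrs = build_attributes(sample)
print(state, attrs["event_type"])  # 2 Tornado Warning
```

Because alerts holds real lists and dicts, a template can index into it directly instead of fighting a string.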

finity · July 3, 2019, 3:22am

Well, whaddayaknow…

I must have missed that update. I was using the one right before the last one.

I’ll try that one out to see what I get and if it works I’ll have to remove the PR I just submitted.

That’s a very good reason why these things need to be in a system like HACS, so everyone knows what the latest version is and whether they have it.

finity · July 3, 2019, 5:16pm

You know, after I wrote that, I started thinking (dangerous, I know) about the validity of that statement.

Is that actually entirely correct or is there some other “magic” that HA uses to determine if the object in question is json or just a string?

I know that if you wrap the alerts_string attribute contents in quotes, it becomes a string by definition. And in the template editor, the attribute contents not wrapped in quotes are interpreted as JSON and can be parsed as such.

I’m just trying to clarify my thinking on this…

pnbruckner · July 3, 2019, 6:17pm

A string is a string is a string. It just so happens, however, that some strings may contain JSON formatted data. But when you’re talking about entity states and attributes, it doesn’t matter. A string is a string is a string.

I think you’re getting confused between attributes that are strings and attributes that are lists and/or dictionaries. You can do this in Jinja: states.x.y.attributes.z.key1.key2[0] if the z attribute is a dictionary containing a key named key1, the value for that key is also a dictionary which has a key named key2, and the value of that key is a list containing at least one element. I.e., you can only use Jinja property operators if the attribute is a list/dict, not if it’s a string. (Well, you can do z[0] if it is a string, but then you’ll just get the first character of the string.)
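In Python terms, that Jinja chain is just successive dict and list lookups. A minimal sketch with a hypothetical attribute value z (the keys match the example above; the values are invented):

```python
# Hypothetical attribute value: key1 holds a dict whose key2 holds a list.
z = {"key1": {"key2": ["first", "second"]}}

# Jinja's z.key1.key2[0] corresponds to these lookups in Python.
print(z["key1"]["key2"][0])  # first

# If the attribute were a string instead, [0] just returns its first character.
z_as_string = '{"key1": {"key2": ["first", "second"]}}'
print(z_as_string[0])  # {
```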

So, the important point is string vs list/dict, which has nothing to do with JSON.

Maybe where the confusion starts is how that string or list/dict attribute got its value, especially if you know the information ultimately comes from a JSON encoded string. If the JSON encoded string is parsed (e.g., via json.loads), the result, which is written to the attribute, is a list/dict. If, however, it is not parsed, or if some list/dict data structure is passed through json.dumps to create a JSON formatted string, then the attribute will be a string which contains JSON formatted data.
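A minimal Python sketch of the two directions described above, using only the standard json module. The sample text is invented.

```python
import json

raw = '[{"event": "Flood Watch", "severity": "Moderate"}]'  # JSON text: still a str

parsed = json.loads(raw)          # parsing produces a list containing a dict
print(type(parsed).__name__)      # list
print(parsed[0]["event"])         # Flood Watch

dumped = json.dumps(parsed)       # dumping goes back to a JSON formatted str
print(type(dumped).__name__)      # str
print(json.loads(dumped) == parsed)  # True
```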

As far as I know, there’s no way in Jinja to parse a JSON formatted string to get at the individual data elements. (There is, however, the reverse, which can take a complex data structure and “dump” it out as a JSON encoded string.)

Another point of confusion may be that in some cases where you have a variable available named value, you might also have available a variable named value_json, which is value parsed to extract the elements of the JSON formatted data, the result being a list/dict.
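Roughly, a value_json variable can be derived from value by trying to parse the string and exposing the parsed result only if parsing succeeds. This is a simplified sketch under that assumption; template_variables is a hypothetical helper, not Home Assistant’s actual code.

```python
import json


def template_variables(value):
    """Hypothetical helper: the variables a template would see for `value`."""
    variables = {"value": value}  # always the original string
    try:
        # Added only when the string is valid JSON; the result is a list/dict/etc.
        variables["value_json"] = json.loads(value)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        pass  # not JSON: templates only get the plain string
    return variables


print(template_variables('{"temp": 21.5}'))
# {'value': '{"temp": 21.5}', 'value_json': {'temp': 21.5}}
print(template_variables("not json"))
# {'value': 'not json'}
```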

finity · July 3, 2019, 8:26pm

thanks for the explanation.

I (kind of…mostly…) understood the part about the differences between a “string” (non-formatted text) and a list/dict. But TBH I didn’t think about the latter specifically in that way. I just knew that the text needed to be formatted in nested (or not) key:value pairs. I also missed the distinction between the {} & [] notations and what they represented.

However,

I think that was the part I was misunderstanding. You were definitely spot on (and I think there is a general confusion about it) that for some reason I was equating some JSON formatted text with the actual JSON itself as returned by some web inquiry. And since most of the time we do get our attributes, rest sensor data, etc from a JSON encoded source then I think that just reinforces the confusion.

that said, I’m still not completely wrapping my head around this:

Does the operation of parsing the data just simply add the required whitespace to properly construct the list/dict correctly from the JSON data? And the un-parsed JSON formatted string lacks the required whitespace and then it just becomes a pseudo-JSON encoded string (IOW, it just looks like a list/dict but doesn’t contain any embedded whitespace elements required for it to be a proper list/dict)?

Even if we ignore the idea of “JSON encoding” and simply look at the data as a string (one that just happens to be conveniently formatted to meet the JSON standard), there has to be something done in the operation of parsing the data that creates a proper list/dict. If not, then I’m still failing to see the difference between two things: the alerts_string contents copied into the template editor as a variable called ‘value’ (without wrapping it in quotation marks to force it to be a string), which Jinja operators can act on successfully, and the actual attribute of the entity, on which the same Jinja operators return an error.

So, bottom line: is it the whitespace that makes the difference?

Again, thanks for your help on this. It may not seem like it but it’s definitely helping to clear the fog.

pnbruckner · July 3, 2019, 8:57pm

No, no, no.

When I say “list/dict”, I mean a Python object whose type is list or dict, which is not a string (aka str.) I’m not just talking about something that is still a string but has its contents massaged a bit.
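A quick way to see that whitespace is not the difference: the same data dumped compactly and with indentation parses to the same Python object, and both dumps are still plain strings. The sample data is invented.

```python
import json

data = {"x": [{"y": "z"}]}

compact = json.dumps(data, separators=(",", ":"))  # '{"x":[{"y":"z"}]}'
pretty = json.dumps(data, indent=2)                # same data, lots of whitespace

# Both are str objects; adding whitespace did not turn either into a dict.
print(type(compact).__name__, type(pretty).__name__)  # str str

# Parsing (json.loads) is what produces a dict, whatever the whitespace.
print(json.loads(compact) == json.loads(pretty) == data)  # True
print(type(json.loads(compact)).__name__)                 # dict
```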

Maybe this will help…

results in this:

and this:

In the above the attribute is a string, and it represents the “original JSON encoded/formatted data.” And since it’s just a string you can’t use Jinja’s . and [] operators to directly extract the items from the “original JSON encoded/formatted data.”

Whereas…

results in this:

and this:

In this one the attribute is not a string, but rather is a Python list object that contains a Python dict object, which represents the “original JSON encoded/formatted data” parsed into its individual components, from which you can use Jinja’s . and [] operators to directly extract the items.

And if this doesn’t help, then I guess I’m at a loss of how to explain it any better.

EDIT: BTW, don’t get hung up on how I created the states (using JSON. That was just the easiest way to create relevant states for the rest of the discussion, and I only included the pictures of how I created them in case you wanted to try it yourself.) It’s more important to look at what the resulting entity looks like on States page and how its attribute reacts in the Template editor.

finity · July 4, 2019, 3:09pm

OK, I was getting close above but now I think I’ve got it…

When I said that:

I was wrong about calling it “whitespace”.

It should have been called metadata because the parser interprets the string as JSON and then returns a JSON Object by inserting something into the code (metadata) that defines the result as a standalone monolithic piece of data called an object. And then you can use other methods to interpret that object and extract the desired bits and pieces as necessary.

I think the other confusion I had was between the colloquial use of the term parse, as we regular humans use it to describe the act of dissecting a “string” of information on the page in front of us into human-readable information, and its use in this sense: an operation performed by a machine interpreter that results in a defined piece of data in machine-readable format. I saw that you italicized the word, but I thought that was for emphasis and not for specificity.

And, relatedly, I was also failing to pick up on the distinction between the concept of a general list or dict and how those are defined as specific objects internally.

The examples you posted helped me see the difference between the string and the object, and then, working through several examples using the “alerts” and “alerts_string” attributes above, I was able to see some very subtle differences. Namely, in the JSON string the keys and values are wrapped in double quotation marks, while in the output returned using the . and [] operators they are wrapped in single quotation marks. I never noticed that before.
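The quotation-mark difference comes from two different text representations of the same data: JSON always uses double quotes, while Python’s own text form of a dict uses single quotes (which, as far as I can tell, is roughly what the Template editor shows when it renders a list/dict). A small sketch:

```python
import json

data = {"y": "z"}

print(json.dumps(data))  # {"y": "z"}   (JSON text: double quotes)
print(str(data))         # {'y': 'z'}   (Python's text form: single quotes)

# Single-quoted text is not valid JSON, so json.loads rejects it.
try:
    json.loads(str(data))
except json.JSONDecodeError as err:
    print("not JSON:", err.msg)
```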

Another thing I never realized: I’ve seen the “alerts: [object Object]” notation in the attributes before (for example, in my weather alerts project), but I had no idea what it was really trying to tell me. Until now I honestly thought it just meant the attribute had exceeded the 255 character limit, but that I could still access all the data inside by using the . and [] operators. Now I know it is telling me explicitly that the data is an object, not a string.

And, last (I think…), is that I didn’t realize the subtle differences between the two ways of entering information in the states/services section and template editor to specify info as a string or an object or that by default those things try to interpret everything as a JSON encoded string unless otherwise specified.

I hope I haven’t said anything incorrect this time. We don’t want you to be any more .

I definitely know a lot more than I did at the beginning of this thread. I’m sure I’ll have to dig into it some more to get a better grip on this, but at least now I know what I don’t know, and I have a base understanding to know how to even search for more info.

Thanks again.

pnbruckner · July 4, 2019, 4:12pm

I think the biggest disconnect in this discussion is the use of Python terminology. I’m guessing you don’t know Python. Which is fine, but it does make it harder to describe what’s really going on, because it definitely involves Python. E.g., there is no such thing (in this scenario) as a “JSON Object”. They really are Python objects, specifically list, or dict or str, or a combination of those. And, of course, there’s also what Jinja does “on top of” Python. (FWIW, much of what can be done in Jinja, at least in this system, is really Python.)

So although we may be using different terminology, I think you’re understanding it, at least from a practical perspective, which is all that really matters.

BTW, when creating/updating a state in the state machine via the States page, you do actually enter a string which contains JSON encoded data. So when you enter {"x": [{"y": "z"}]}, that’s a “JSON string” that represents an object that in Python would look like this: {'x': [{'y': 'z'}]}, which is a dict containing a list containing a dict. That “JSON string” is actually parsed (& converted) into the corresponding Python object to create the state’s attributes.

Now, when you enter {"x": "[{\"y\": \"z\"}]"}, you’re still entering a “JSON string”, but this time it represents an object in Python that looks like this: {'x': '[{"y": "z"}]'}, which is a dict containing a str. And, BTW, in this case what you entered is a “JSON string” which represents an object that ultimately contains another “JSON string”. So, in effect, you encoded a “JSON string” into another “JSON string.”
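Both States-page inputs can be checked with json.loads: the first yields a dict containing a list containing a dict, while the second yields a dict whose value is a str that itself holds JSON and needs a second parse. A minimal sketch using the exact inputs from the post:

```python
import json

single = json.loads('{"x": [{"y": "z"}]}')
double = json.loads(r'{"x": "[{\"y\": \"z\"}]"}')  # raw string keeps the \" escapes

print(single)  # {'x': [{'y': 'z'}]}
print(double)  # {'x': '[{"y": "z"}]'}

print(type(single["x"]).__name__)  # list
print(type(double["x"]).__name__)  # str

# The inner string is JSON too, so parsing it a second time recovers the list.
print(json.loads(double["x"]) == single["x"])  # True
```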

finity · July 4, 2019, 8:37pm

Good guess!

Yeah, I got that part, too.

OK, so I guess that means that all of the different entities and associated attributes are all “python objects” even if they are a string and there is code embedded into all those objects that tell the interpreter which data type the object is - list/dict/string/int/float/datetime/etc?

YAY! And, yes, that’s the important thing.

petro · July 5, 2019, 11:48am

You could say that. There are ways to get that information out if you need to check it.
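In Python, that information is simply the object’s type, which you can inspect with type() or isinstance(). A minimal sketch over a hypothetical attributes dict (names and values invented to mirror this thread):

```python
import json

# Hypothetical attributes, mixing the kinds of values discussed in this thread.
attributes = {
    "alerts": [{"event": "Frost Advisory"}],
    "alerts_string": json.dumps([{"event": "Frost Advisory"}]),
    "event_type": "Frost Advisory",
    "count": 1,
}

for name, value in attributes.items():
    print(f"{name}: {type(value).__name__}")
# alerts: list
# alerts_string: str
# event_type: str
# count: int

# isinstance is the usual way to branch on the type.
print(isinstance(attributes["alerts"], (list, dict)))         # True
print(isinstance(attributes["alerts_string"], (list, dict)))  # False
```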

davidkirk · March 17, 2024, 2:26pm

Totally random question, but do you live in Anchorage? Looked from your data feed that maybe you did. I am looking for some help with a project if you do.

finity · March 17, 2024, 2:58pm

nope. Indiana. Always have.