Scrape sensor improved - scraping multiple values

Hi Fellows!

I want to make a sensor from NOAA data. I’ve found the CSS element I need. It contains a letter and a number. The letter appears alone when there is no solar or geomagnetic event, and the number (from 1 to 5) appears when something has happened in the last 24 hours.
But I can only scrape the letter; the number is unreachable for me. Here is what I’ve done:

  - name: SWPC Radio
    resource: https://www.swpc.noaa.gov
    scan_interval: 300
    headers:
      User-Agent: Firefox/10
    sensor:
      - unique_id: swpc-radio
        name: SWPC Radio
        select: "div.noaa_scale_bg_1:nth-child(2) > div:nth-child(1)"

I believe I need a value_template, but after several hours of reading and testing what I found in this topic and in Home Assistant’s template helper, I can’t figure it out.

I’m not a programmer, but I think the CSS changes when the data changes.

I’m asking for your help.

Please post a screenshot marking up, or a description of, the value you’re trying to read.

I think that the data you want is being dynamically pulled in and rendered from this URL:

https://services.swpc.noaa.gov/products/noaa-scales.json

and that this Javascript is updating the page and (as you suspect) modifying CSS classes:

https://www.swpc.noaa.gov/sites/all/modules/custom/swx_noaa_scales/swx_noaa_scales.js

If I’m right, this is a job for a REST sensor processing that JSON response.

Thank you for your fast answer.
You are right, this is exactly what I’d like to scrape. I have tried REST sensors too, but they did not work: the attribute I need is ‘-1’, and I can’t get YAML to work with this negative number. I have no idea why…

You probably need to use ['-1'] notation.

So you want those three values (letter+number, but they hide any 0s) as individual sensors?

The value can be R, R1, R2, R3, R4 or R5. R means no event. But in the JSON, R=0, R1=1, etc…

I’ve tried ['-1'] but it’s not working:

Invalid config for [sensor.rest]: invalid template (TemplateSyntaxError: expected name or number) for dictionary value @ data['value_template']. Got '{{ (value_json.[-1].R.Scale }}'. (See ?, line ?).

The JSON looks like this:

"-1":{"DateStamp":"2023-02-16","TimeStamp":"18:34:00","R":{"Scale":"0","Text":"none","MinorProb":null,"MajorProb":null},"S":{"Scale":"0","Text":"none","Prob":null},"G":{"Scale":"0","Text":"none"}}}

value_json['-1']['R']['Scale']

No dot before the brackets, quotes around the -1. Always safer to use bracket notation rather than dot notation.

Also, you had a ( without matching ) in the template in the error message.
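
For anyone landing here later, a minimal sketch of the whole REST sensor, assuming the "-1" entry of that JSON is the one you want (as in the excerpt above). The unique_id and scan_interval are my own choices rather than anything NOAA requires, and the template simply hides the number when Scale is "0":

```yaml
rest:
  - resource: https://services.swpc.noaa.gov/products/noaa-scales.json
    scan_interval: 300          # assumed polling interval, in seconds
    sensor:
      - name: SWPC Radio
        unique_id: swpc_radio   # assumed ID, rename to taste
        # Scale is a string in the JSON ("0" to "5").
        # Show a bare "R" when it is "0", otherwise "R1" to "R5".
        value_template: >-
          {% set scale = value_json['-1']['R']['Scale'] %}
          R{{ scale if scale != '0' else '' }}
```

The same pattern should carry over to the S and G scales by swapping the key, since the excerpt shows all three with a Scale field.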

Troon!

THANK YOU!
I’ve spent at least 6 hours over the last few days trying to figure this out.

Now it is working as I imagined.
And I have learned some important things about configuring REST sensors.

Thank you for your support!

Can someone help me?

The page with the data is https://aquarea-service.panasonic.com/installer/functionStatus
while the login page is https://aquarea-service.panasonic.com/

This is my code:

multiscrape:
  - resource: 'https://aquarea-service.panasonic.com/installer/functionStatus'
    scan_interval: 30
    form_submit:
      submit_once: True
      resource: 'https://aquarea-service.panasonic.com/'
      select: "#login-form"
      input:
        Email: dimare.gabriele
        Password: xxxxxx
    sensor:
      - select: "#function-status-text-033"
        name: Stato Aquarea PDC   

What are you getting back from that code? Can you find that value if you do right-click, View Page Source rather than using F12? If you can’t, it’s dynamically-created and you’ll need to look at the network requests, work out where the data is coming from, and access it that way probably with a rest sensor.

@Troon I get nothing, data not available. I tried to do as you say, but I only see the source of the home page and I can’t find that data, so it must be created dynamically. What should I do then?

F12, Network tab. Reload the page and look at all the different sources. Ignore the JPGs, PNGs, CSS files etc, but buried in there somewhere (and probably appearing at regular intervals) will be a file that contains the data in its response.

Here’s an example from my Pihole page:

That api.php file is returning a JSON dictionary that the page’s Javascript is using to update the page dynamically.

You need to find the equivalent, and then use that URL in a rest sensor:

If you need help extracting the value once you’ve found the data, post the response here as properly formatted code, rather than a screenshot. Like this:

{"domains_being_blocked":936293,"dns_queries_today":33510,"ads_blocked_today":6708,"ads_percentage_today":20.017904,"unique_domains":3924,"queries_forwarded":19569}
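
As a rough sketch of where this ends up, here is what a rest sensor reading two fields from that Pi-hole response might look like. The resource is a placeholder (copy the real URL from your Network tab), and the sensor names and scan_interval are just my assumptions:

```yaml
rest:
  - resource: FULL_URL_OF_THAT_API_PHP_FILE   # placeholder: copy it from the Network tab
    scan_interval: 60                          # assumed polling interval, in seconds
    sensor:
      - name: Pihole Ads Blocked Today
        # Bracket notation, key name taken from the JSON response above
        value_template: "{{ value_json['ads_blocked_today'] }}"
      - name: Pihole Ads Percentage Today
        # Round the long float from the response to one decimal place
        value_template: "{{ value_json['ads_percentage_today'] | round(1) }}"
        unit_of_measurement: "%"
```

One resource with several sensors under it means the URL is fetched once per interval and every sensor reads from the same response.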

I found this file, which seems to have a lot of data inside. Is this the right one? @Troon

https://drive.google.com/file/d/1mVB-u-sEL-EkxrQRiS3zPXY0hHsid1du/view?usp=sharing

I’m not clicking on a random Google Drive link. It’s unlikely to be that, I’d have thought. If you think it is, please post its content as formatted code text.

Sorry, I linked a file on my drive because it’s too big and the forum doesn’t allow me to paste it all. I’ve pasted it on these two sites; you don’t have to download anything, just view them. Let me know if everything is OK! Many thanks.

This:
https://controlc.com/3e2fd592
or this:
https://pastebin.com/wp69PDYs

They’re the same.

I’m pretty sure that’s not the file you want. It refers to one called api/function/status, which should be a POST request and returns JSON.

Have a look in your Network tab to see if that file is getting requested, and have a look at the request headers and the response.


These are the files I find.
The one I pasted is called function-status.min

It’s the second-last one in that screenshot, status, which is an xhr (“Ajax”) call. That’s the one that I’m expecting to contain the data. Click on that and paste the response here as code-formatted text.

Thanks!!
Here it is:

{
	"connectionStatus": 0,
	"cnCntErrorStatus": null,
	"errorCode": 0,
	"statusDataInfo": {
		"function-status-text-005": {
			"textValue": "2006-0310",
			"type": "basic-text"
		},
		"function-status-text-027": {
			"type": "simple-value",
			"value": "48"
		},
		"function-status-text-049": {
			"type": "simple-value",
			"value": ""
		},
		"function-status-text-025": {
			"type": "simple-value",
			"value": "45"
		},
		"function-status-text-047": {
			"type": "simple-value",
			"value": ""
		},
		"function-status-text-068": {
			"type": "simple-value",
			"value": "19"
		},
		"function-status-text-009": {
			"type": "simple-value",
			"value": "32"
		},
		"function-status-text-007": {
			"textValue": "2006-0339",
			"type": "basic-text"
		},
		"function-status-text-029": {
			"type": "simple-value",
			"value": ""
		},
		"function-status-text-041": {
			"textValue": "2006-0960",
			"type": "basic-text"
		},
		"function-status-text-063": {
			"type": "simple-value",
			"value": "3"
		},
		"function-status-text-060": {
			"type": "simple-value",
			"value": "2092"
		},
		"function-status-text-023": {
			"type": "simple-value",
			"value": ""
		},
		"function-status-text-045": {
			"textValue": "2006-0300",
			"type": "basic-text"
		},
		"function-status-text-021": {
			"type": "simple-value",
			"value": ""
		},
		"function-status-text-043": {
			"textValue": "2006-0300",
			"type": "basic-text"
		},
		"function-status-text-065": {
			"type": "simple-value",
			"value": "0"
		},
		"function-status-text-015": {
			"type": "simple-value",
			"value": "34"
		},
		"function-status-text-037": {
			"type": "simple-value",
			"value": "1350"
		},
		"function-status-text-058": {
			"type": "simple-value",
			"value": "1005"
		},
		"function-status-text-013": {
			"type": "simple-value",
			"value": "31"
		},
		"function-status-text-035": {
			"type": "simple-value",
			"value": "16.23"
		},
		"function-status-text-019": {
			"type": "simple-value",
			"value": ""
		},
		"function-status-text-017": {
			"type": "simple-value",
			"value": "31"
		},
		"function-status-text-039": {
			"textValue": "2006-0940",
			"type": "basic-text"
		},
		"function-status-text-051": {
			"type": "simple-value",
			"value": ""
		},
		"function-status-text-056": {
			"type": "simple-value",
			"value": "0"
		},
		"function-status-text-011": {
			"type": "simple-value",
			"value": "31"
		},
		"function-status-text-031": {
			"type": "simple-value",
			"value": "12"
		},
		"function-status-text-053": {
			"type": "simple-value",
			"value": "-"
		}
	},
	"statusBackgroundDataInfo": {
		"0xA0": {
			"value": "0"
		},
		"0x20": {
			"value": "0"
		},
		"0xE1": {
			"value": "1"
		},
		"0xE0": {
			"value": "1"
		},
		"0xFA": {
			"value": "0"
		},
		"0xF0": {
			"value": "1"
		},
		"0x80": {
			"value": "1"
		},
		"0xF9": {
			"value": "1"
		},
		"0xC4": {
			"value": "0"
		}
	},
	"deviceStatusList": []
}

What if I wanted to retrieve the “function-status-text-060” field?

Excellent, but whoever wrote that system needs a slap. That status URL returns a load of coded values for each field, and the data field you highlighted is function-status-text-033.

The response you pasted doesn’t actually contain that value (paste the response in here to be able to read it better): my guess is that is because it hasn’t changed and doesn’t need updating.

The page-updating Javascript (the one you linked to, beautified here), contains this:

    var thermoBar = {
        "function-status-text-033": {
            type: "simple-value",
            value: "-"
        }
    };
    var thermoOff = {
        "function-status-text-033": {
            type: "basic-text",
            textValue: "2006-0900"
        }
    };
    var thermoOn = {
        "function-status-text-033": {
            type: "basic-text",
            textValue: "2006-0910"
        }
    };

Looking at the response, I think you’re waiting for a message in the response with key function-status-text-033 and one of

  • value: '-' (unknown?)
  • textValue: "2006-0900" (off)
  • textValue: "2006-0910" (on)

Another bit of Javascript then turns those into human-readable text in the user’s language.

To actually read this, you need a rest sensor (docs). Here’s my attempt, but it will need some additional work to get the login working:

rest:
  - resource: FULL_URL_OF_THAT_STATUS_FILE
    authentication: PROBABLY_NEED_SOMETHING_HERE
    binary_sensor:
      - name: Panasonic Termostato
        value_template: >
          {% if 'function-status-text-033' in value_json['statusDataInfo'] %}
            {% if 'textValue' in value_json['statusDataInfo']['function-status-text-033'] %}
              {{ (value_json['statusDataInfo']['function-status-text-033']['textValue'] == '2006-0910') }}
            {% else %}
              unknown
            {% endif %}
          {% else %}
            {{ states('binary_sensor.panasonic_termostato') }}
          {% endif %}

So that returns:

  • on if the 033 field has a textValue field containing 2006-0910
  • off if it has that field but it contains anything else
  • unknown if the 033 field is present but without a textValue
  • retains prior state if the 033 field is not present

Good luck setting all that up!

Just seen this edit.

rest:
  - resource: FULL_URL_OF_THAT_STATUS_FILE
    authentication: PROBABLY_NEED_SOMETHING_HERE
    sensor:
      - name: Panasonic 060
        value_template: "{{ value_json['statusDataInfo']['function-status-text-060']['value'] }}"

That will work if that field is in the data every time, otherwise use a checking strategy like in my prior post.
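
In case it helps, a sketch of that checking strategy applied to the 060 field. The placeholders are the same as before, and sensor.panasonic_060 is only my guess at the entity ID Home Assistant would derive from the name, so check yours:

```yaml
rest:
  - resource: FULL_URL_OF_THAT_STATUS_FILE
    authentication: PROBABLY_NEED_SOMETHING_HERE
    sensor:
      - name: Panasonic 060
        value_template: >
          {% set info = value_json['statusDataInfo'] %}
          {% if 'function-status-text-060' in info and 'value' in info['function-status-text-060'] %}
            {# Field is present in this response: use the fresh value #}
            {{ info['function-status-text-060']['value'] }}
          {% else %}
            {# Field missing this time: keep whatever the sensor already had #}
            {{ states('sensor.panasonic_060') }}
          {% endif %}
```

The else branch just re-reads the sensor’s own current state, so a response that leaves the field out won’t overwrite the last known value.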