Scrape Website - Time Series Data

Using Multiscrape, I’m trying to scrape the Height Gauge reading off of this site: https://waterservices.usgs.gov/nwis/iv/?sites=01664000&parameterCd=00065&siteStatus=all&format=rdb

I also found that if you leave off the format parameter and use this URL: https://waterservices.usgs.gov/nwis/iv/?sites=01664000&parameterCd=00065&siteStatus=all

it displays the XML version instead. The value I’m looking for is in the ns1:value element.

I’m looking to scrape the height gauge value and import it into an HA entity so I can know the height of the river. Ideally, I would also be able to use that same entity in a mini-graph card to see the last few days’ worth of readings in a graph.

I’m just having a little trouble identifying which part of the data to ingest. I’m not a developer and have been googling this for months. Any help would be greatly appreciated.

You mean gage height, I guess? I am not sure what you want to achieve, as there are a lot of values. What is the end result you have in mind?
Also, if you change the rdb at the end of the URL to json, you can consume this data more easily, maybe with the RESTful integration or a REST sensor. But again, what do you need?

Sorry… I realize now that I never explained what I am trying to do. Ideally, I would like a card that shows the latest height measurement of the river (the last entry). I’m assuming I could also use the same entity to plot a mini-graph card of the historical values over the last several days. I also figured out that the URL I originally posted only showed a fixed block of time, so I tweaked it to return only the most recent reading. Here is that URL: https://waterservices.usgs.gov/nwis/iv/?sites=01664000&parameterCd=00065&siteStatus=all&format=rdb

I also updated my original post.
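If you want to check what the rdb response contains before wiring it into Home Assistant, here is a minimal Python sketch (not a Multiscrape configuration) that pulls the last reading out of that tab-delimited text. It assumes the RDB layout as I understand it: comment lines starting with "#", a header row, a column-format row, then data rows. The sample text is made up, the column name 12345_00065 is a hypothetical placeholder, and the function name latest_gage_height is illustrative only.

```python
from typing import Tuple

# Made-up sample that only mimics the RDB layout: "#" comment lines,
# a header row, a column-format row, then tab-separated data rows.
# The column name "12345_00065" is a hypothetical placeholder.
SAMPLE_RDB = (
    "# comment line\n"
    "agency_cd\tsite_no\tdatetime\ttz_cd\t12345_00065\t12345_00065_cd\n"
    "5s\t15s\t20d\t6s\t14n\t10s\n"
    "USGS\t01664000\t2023-05-01 11:45\tEDT\t3.20\tP\n"
    "USGS\t01664000\t2023-05-01 12:00\tEDT\t3.21\tP\n"
)


def latest_gage_height(rdb_text: str) -> Tuple[str, float]:
    """Return (datetime, value) from the last data row of an RDB document."""
    # Keep non-empty, non-comment lines and split them on tabs.
    rows = [line.split("\t") for line in rdb_text.splitlines()
            if line and not line.startswith("#")]
    header, data = rows[0], rows[2:]  # rows[1] is the column-format row
    # Assumption: the reading lives in the column whose name ends in "_00065"
    # (parameter code 00065); the "..._00065_cd" column holds qualifier codes.
    value_col = next(i for i, name in enumerate(header) if name.endswith("_00065"))
    time_col = header.index("datetime")
    last = data[-1]
    return last[time_col], float(last[value_col])


if __name__ == "__main__":
    print(latest_gage_height(SAMPLE_RDB))
```

Matching on the "_00065" suffix rather than a fixed column position is a deliberate choice, since the numeric prefix in front of the parameter code may differ between sites.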

If you can find a way to use an XPath expression, it’s not too hard. I’ve never looked into using CSS selectors, though.
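For the XML route, Python’s standard-library ElementTree supports a limited XPath subset, which is enough to reach the ns1:value element. This is a minimal sketch, not Multiscrape’s own selector syntax. It assumes the ns1 prefix is bound to the WaterML 1.1 namespace URI shown below (check the xmlns:ns1 declaration in the real response); the sample XML is a made-up, heavily trimmed stand-in and the function name first_value is hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up, trimmed stand-in for the USGS XML response.
# Assumption: "ns1" maps to this WaterML 1.1 namespace URI.
SAMPLE_XML = """<ns1:timeSeriesResponse xmlns:ns1="http://www.cuahsi.org/waterML/1.1/">
  <ns1:timeSeries>
    <ns1:values>
      <ns1:value qualifiers="P" dateTime="2023-05-01T12:00:00.000-04:00">3.21</ns1:value>
    </ns1:values>
  </ns1:timeSeries>
</ns1:timeSeriesResponse>"""

# ElementTree needs the prefix-to-URI mapping passed in explicitly.
NS = {"ns1": "http://www.cuahsi.org/waterML/1.1/"}


def first_value(xml_text):
    """Return (dateTime attribute, numeric reading) of the first ns1:value."""
    root = ET.fromstring(xml_text)
    # XPath-style path: any ns1:value directly inside an ns1:values element.
    elem = root.find(".//ns1:values/ns1:value", NS)
    if elem is None:
        raise ValueError("no ns1:value element found")
    return elem.get("dateTime"), float(elem.text)


if __name__ == "__main__":
    print(first_value(SAMPLE_XML))
```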

In JSON:

https://waterservices.usgs.gov/nwis/iv/?sites=01664000&parameterCd=00065&siteStatus=all&format=json

The value would be:

{{ value_json['value']['timeSeries'][0]['values'][0]['value'][0]['value'] }}

and the timestamp is:

{{ value_json['value']['timeSeries'][0]['values'][0]['value'][0]['dateTime'] }}

sensor:
  - platform: rest
    name: test
    scan_interval: 3600
    resource: "https://waterservices.usgs.gov/nwis/iv/?sites=01664000&parameterCd=00065&siteStatus=all&format=json" 
    value_template: >
        {{ value_json['value']['timeSeries'][0]['values'][0]['value'][0]['value'] }}
    json_attributes_path: $.value.timeSeries[0].values[0].value[0]
    json_attributes:
      - value
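To see what the templates and json_attributes_path above select, here is the same lookup in plain Python. This is a sketch under assumptions: the payload is a made-up, trimmed sample that keeps only the keys the templates use, and the function name latest_reading is hypothetical. Bracket notation is used at every step, just like the templates.

```python
import json
from datetime import datetime

# Made-up, trimmed payload that keeps only the keys the templates above use.
SAMPLE_JSON = """
{"value": {"timeSeries": [{"values": [{"value": [
  {"value": "3.21", "qualifiers": ["P"], "dateTime": "2023-05-01T12:00:00.000-04:00"}
]}]}]}}
"""


def latest_reading(payload):
    """Return (timestamp, height) from a parsed USGS-style JSON payload."""
    # Same object that json_attributes_path $.value.timeSeries[0].values[0].value[0]
    # selects, reached with bracket notation at every step.
    point = payload["value"]["timeSeries"][0]["values"][0]["value"][0]
    # The sample stores the reading as a string; float() accepts strings and numbers.
    return datetime.fromisoformat(point["dateTime"]), float(point["value"])


if __name__ == "__main__":
    when, height = latest_reading(json.loads(SAMPLE_JSON))
    print(when.isoformat(), height)
```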

@mobile.andrew.jones, any explanation why .value is not working in the value_template?
That is, this works for the attributes path:

$.value.timeSeries[0].values[0].value[0]

but this does NOT work in the value_template:

{{ value_json.value.timeSeries[0].values[0].value[0].value }}

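It is. It’s values that isn’t working as you want: values is the name of a built-in dict method, so Jinja’s dot notation returns that method instead of the list stored under the values key.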

Always use bracket notation and you won’t have this problem.
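Jinja’s documentation describes the rule behind this: foo.bar first checks for an attribute called bar on foo and only then for the item 'bar', while foo['bar'] checks the item first. Python dicts have a values() method but no attribute called value, which is why .value works and .values does not. The sketch below is a simplified Python model of that lookup order, not Jinja’s actual implementation; the helper names dot_lookup and bracket_lookup are hypothetical.

```python
# Simplified model of Jinja's lookup rules (not Jinja's actual source):
# "obj.name" tries the Python attribute first and only then the item obj["name"];
# "obj['name']" tries the item first.

_MISSING = object()


def dot_lookup(obj, name):
    """Mimic Jinja's obj.name: attribute first, then item."""
    attr = getattr(obj, name, _MISSING)
    if attr is not _MISSING:
        return attr
    return obj[name]


def bracket_lookup(obj, key):
    """Mimic Jinja's obj['key'] for a dict: the item lookup wins."""
    return obj[key]


point = {"value": "3.21"}
series = {"values": [point]}

if __name__ == "__main__":
    # dicts have no attribute called "value", so the key is used: prints 3.21
    print(dot_lookup(point, "value"))
    # dicts DO have a method called "values", so dot notation returns the method
    print(dot_lookup(series, "values"))
    # bracket notation reaches the list stored under the "values" key
    print(bracket_lookup(series, "values")[0])
```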


I would also recommend:

headers:
  Content-Type: application/json

That worked!! Thanks so much! You guys are amazing!!!
