Yes, @danieldotnl is trying to keep up, although severely limited in time. I really appreciate all the support @parautenbach is providing to the scrape community! I simply cannot reply to each (private) message myself, and I try to focus on adding more value to multiscrape instead.
Anyway, I looked into this tonight and realized that I fixed this some time ago but never merged it into the master branch.
I believe this release fixes your issue:
- name: SOS scraper2
  resource: https://www.dailyfaceoff.com/nhl-weekly-schedule
  scan_interval: 360000
  sensor:
    - unique_id: hockey_strength_of_schedule_test
      name: Hockey Strength of Schedule Test
      select: '#__NEXT_DATA__'
      value_template: '{{ now() }}'
      attributes:
        - name: props
          select: '#__NEXT_DATA__'
          value_template: >
            {{ value | from_json }}
Note @parautenbach … I also tried with value_json, but that did not work. I assume this is because it is not really a JSON file; it is a string of JSON.
As you can see in the following messages, the fix was only in some unpublished code. I downloaded that version, tested it, and got it working in one step. Thanks for trying to help, and I am glad it was not just some stupid mistake I was making.
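For anyone landing here later: once the from_json approach works, a single key can be pulled out of the parsed object instead of storing the whole blob. A minimal sketch, assuming the #__NEXT_DATA__ JSON has a top-level buildId key, as Next.js pages usually do. The attribute name build_id and the choice of key are my assumptions and are not confirmed for this site:

```yaml
attributes:
  # Hypothetical attribute: both the name and the JSON key are assumptions
  - name: build_id
    select: '#__NEXT_DATA__'
    value_template: >
      {{ (value | from_json).buildId }}
```

This fragment goes under a sensor entry, in the same place as the attributes block in the config above.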
From debug I get this response: “Unable to scrape data: Could not find a tag for given selector”
I tried select: “pre”, but that didn’t solve the issue.
Maybe someone can help me with the right tags?
==== Solution ====
I found the solution. I fixed it with:
This may be useful to some of you. I’ve figured out how to scrape dynamic (JavaScript-generated) websites using Browserless and multiscrape and have written this up here:
So I got this as the selector: #repo-content-pjax-container > react-app > div > div > div.Box-sc-g0xbh4-0.fSWWem > div > div > div.Box-sc-g0xbh4-0.emFMJu > div.Box-sc-g0xbh4-0.hlUAHL > div > div:nth-child(3) > div.Box-sc-g0xbh4-0.brFBoI > div > div.Box-sc-g0xbh4-0.jGfYmh > div.Box-sc-g0xbh4-0.lhFvfi > span.Text-sc-17v1xeu-0.kKFNhh.react-last-commit-oid-timestamp > relative-time
But it is not working… here is what it looks like in HA:
The first bit of data I’m trying to grab is the 24-hour snowfall, so the console gave me this: #snow_report_1 > div.snow_report__content.row > ul > li:nth-child(2) > div > h5
It seems to make sense, but doesn’t work.
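One thing that may be worth trying with selectors copied from the browser console is trimming them down to the parts that actually identify the element. A sketch only, assuming the h5 is present in the HTML as originally downloaded. The unique_id and name below are made up for illustration:

```yaml
sensor:
  - unique_id: snowfall_24h        # hypothetical
    name: Snowfall 24h             # hypothetical
    # Shortened form of the console selector: same id, list item and h5,
    # without the intermediate div/ul steps
    select: '#snow_report_1 li:nth-child(2) h5'
```

A shorter selector is less strict than the original, so it can match more than one element if the page reuses that structure. If the value is filled in by JavaScript after the page loads, no selector will find it in the downloaded HTML.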
I’m trying to scrape a temperature measurement from a website. Measurements are added to a string every hour. So far I can retrieve the entire string of measurements after ‘var query_temp’, but I’m not experienced enough with this to obtain the last measurement (it is always in positions -5 to -1 from the end of the string, as indicated in the figure below). Could anyone point me in the right direction?
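Two template approaches might help here. This is a sketch, assuming value already holds the string you retrieve today and that the last number in it is the latest measurement. I have not seen the actual string, so the regular expression is a guess at its number format:

```yaml
# Option 1: take the last number found anywhere in the string
value_template: >
  {{ value | regex_findall('-?\d+(?:\.\d+)?') | last }}

# Option 2 (alternative, use instead of option 1): slice by position.
# value[-5:] keeps the final five characters; use value[-5:-1] instead
# if the very last character should be dropped.
# value_template: "{{ value[-5:] }}"
```

The regex version does not depend on the measurement always having the same width, which is why I would try it first.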
But I’m getting a new error if I include this line:
```
Error loading /config/configuration.yaml: while parsing a block mapping
  in "/config/configuration.yaml", line 795, column 9
expected <block end>, but found '<scalar>'
  in "/config/configuration.yaml", line 798, column 119
```
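One common cause of "expected <block end>, but found '<scalar>'" far along a line is a quoting problem: the same quote character is used both around the value and inside it, so YAML thinks the string ends early. I can't see the offending line, so this is only a guess, and the template below is a hypothetical illustration rather than the actual config:

```yaml
# Breaks: the value ends at the second double quote, and the rest is stray text
# value_template: "{{ value.split("=")[1] }}"

# Works: single quotes inside, double quotes outside
value_template: "{{ value.split('=')[1] }}"
```

Using a block scalar (value_template: > followed by the template on the next line, indented) avoids the outer quotes altogether.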
Hi. I’d like to scrape a status indicator. The problem is that the element has no data in it; the only thing that changes is the colour defined in the style attribute.
We’d need the URL or the full HTML (pastebin?), and confirmation that the data you’re after is in the HTML as originally downloaded (View Source rather than F12 DevTools).
Could be as simple as select: div.buy-value.
If that colour definition is in the original HTML as fetched (i.e. not dynamically loaded afterwards), use a RESTful binary sensor. If that colour isn’t used anywhere else in the document, and the page length isn’t too great:
binary_sensor:
  - platform: rest
    resource: URL
    value_template: "{{ 'rgb(93, 199, 22);' in value }}"
Lots of "if"s there, but without a URL or the HTML to go off, I have to make assumptions.
Since posting I’ve realised that multiscrape has an attribute key, which should be able to return the tag attributes, but somehow it does not work for this particular element. I’m experimenting with something like:
- name: O-Life Home Charger status
  unique_id: o_life_home_charger_status
  select: ".spot-list-item div:nth-child(1) div div .charger-status-dot"
  attribute: "class"
  value_template: "{{value}}"
which I believe should return “charger-status-dot”, but it fails. It seems to work fine with other selectors that I am already getting from this page. Is it because the div is actually empty?
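Since the colour lives in the style attribute, it may be worth pointing attribute at style instead of class and testing for the colour in the template. This is an untested sketch: it assumes the element is in the HTML as fetched, that the selector matches it, and that the "on" colour is the rgb(93, 199, 22) value mentioned earlier in the thread:

```yaml
- name: O-Life Home Charger status
  unique_id: o_life_home_charger_status
  select: ".spot-list-item div:nth-child(1) div div .charger-status-dot"
  # Read the inline style rather than the class list
  attribute: "style"
  # True when the assumed "on" colour appears in the style string
  value_template: "{{ 'rgb(93, 199, 22)' in value }}"
```

If this fails in the same way, that would suggest the selector is not matching the element in the downloaded HTML, rather than the problem being the empty div.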