Scrape sensor improved - scraping multiple values

I have one last question: inside the div there is a `<br>` tag that is being filtered out. Is it replaced by ‘\r\n’ inside the string?

  - name: cinema_bdarcy
    resource: "https://www.boisdarcy.fr/cinema-de-la-grange-de-la-tremblaye.aspx"
    scan_interval: 3600
    sensor:
      - unique_id: cinema_prog_bdarcy
        name: "Cinema de Bois d'Arcy"
        value_template: ""
        attributes:
          - name: jour
            select_list: "#field_blagenda_evenements_nid div .node .field_date_contenu .jour"
            value_template: |
              {%-set value = value.split(",")-%}
              {%for x in value%}
              |{{x}}
              {%-endfor-%}
          - name: mois
            select_list: "#field_blagenda_evenements_nid div .node .field_date_contenu .mois"
            value_template: |
              {%-set value = value.split(",")-%}
              {%for x in value%}
              |{{x}}
              {%-endfor-%}
          - name: annee
            select_list: "#field_blagenda_evenements_nid div .node .field_date_contenu .annee"
            value_template: |
              {%-set value = value.split(",")-%}
              {%for x in value%}
              |{{x}}
              {%-endfor-%}
          - name: titre
            select_list: "#field_blagenda_evenements_nid div .node .field_contenu .titre a"
            value_template: |
              {%-set value = value.split(",")-%}
              {%for x in value%}
              |{{x}}
              {%-endfor-%}
          - name: heure
            select_list: "#field_blagenda_evenements_nid div .node .field_contenu .field-name-field-adresse-unique"
            value_template: |
              {%-set value = value.split(",")-%}
              {%for x in value%}
              |{{x | trim}}
              {%-endfor-%}
          - name: lien
            select_list: "#field_blagenda_evenements_nid div .node .field_contenu .titre a"
            attribute: href
            value_template: |
              {%-set value = value.split(",")-%}
              {%for x in value%}
              |{{x}}
              {%-endfor-%}

The problem is with the ‘heure’ attribute, where all the ‘,’ inside the text are treated as list separators.

Thank you very much, Daniel. I am going to try the latest release and change the separator.
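
For readers following along, here is a minimal sketch of what changing the separator could look like. It assumes the `list_separator` option (which appears in a configuration further down this thread) is available in the installed release. Only the `heure` attribute is shown, and the loop variable name `items` is my own choice.

```yaml
multiscrape:
  - name: cinema_bdarcy
    resource: "https://www.boisdarcy.fr/cinema-de-la-grange-de-la-tremblaye.aspx"
    scan_interval: 3600
    # Assumption: join select_list results with '|' instead of the default ','
    list_separator: "|"
    sensor:
      - unique_id: cinema_prog_bdarcy
        name: "Cinema de Bois d'Arcy"
        value_template: ""
        attributes:
          - name: heure
            select_list: "#field_blagenda_evenements_nid div .node .field_contenu .field-name-field-adresse-unique"
            value_template: |
              {#- Split on the new separator so commas inside the text survive -#}
              {%- set items = value.split("|") -%}
              {%- for x in items %}
              |{{ x | trim }}
              {%- endfor -%}
```

With this change, a value such as "Samedi, 20h30" should stay in one piece, as long as the scraped text never contains the `|` character itself.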

Hello Daniel,

I am working on a gas module to find the cheapest gas station around. I use viamichelin.fr, which tracks 95,000 stations all over Europe, so it may be useful to others here.

They refresh the data every hour or so, and it looks very accurate, since it's mandatory in France to declare gas prices to the government.

We could also track the money spent on gas each month, track the gas prices, …

I love multiscrape; it's amazingly fast, since there is only one page to scrape. Multiscrape works great with a table, but here some stations have E10 while others won't carry GPL, for example. So some stations will have up to 6 fuel types and others just 3.

Here is how I extract the data (not the full code, I have one sensor per gas type)

multiscrape:
  - name: gas_provider_michelin
    resource: "https://www.viamichelin.fr/web/Stations-service?tid=city-1338248&fuelsLastUpdate=72&fuelsType=5"
    scan_interval: 3600
    list_separator: '|'
    sensor:
      - unique_id: gas_provider_michelin
        name: "Gas prices via Michelin"
        value_template: 0.0
        unit_of_measurement: '€/L'
        icon: mdi:gas-station
        attributes:
          - name: max_station
            value_template: 20
          - name: selected
            value_template: 5
          - name: city_code
            value_template: "city-1338248"
          - name: gas_name
            value_template: "1,B7,Gazole,b7_gazole_price|2,E5,SP 95,e5_sp95_price|3,E85,Ethanol,e85_price|4,GPL,LPG,lpg_gpl_price|5,E10,SP 95-E10,e10_sp95_price|6,E5,SP 98,e5_sp98_price"
          - name: city
            select_list: ".shared_address_search .searchbox .truncate"
            attribute: value
            value_template: |
              {{value|title}}
          - name: station
            select_list: ".poi-item-gasStation .poi-item-name"
            value_template: |
              {%-set value = value.split("|")-%}
              {%-for station in value-%}
              {%- if loop.index <= 20 -%}
              {{station|title}}|
              {%- endif -%}
              {%-endfor-%}
          - name: address
            select_list: ".poi-item-gasStation .poi-item-details-address"
            value_template: |
              {%-set value = value.split("|")-%}
              {%-for add in value-%}
                {%- if loop.index <= 20 -%}
              {{add | title}}|
                {%- endif -%}
              {%-endfor-%}
          - name: zip_code
            select_list: ".poi-item-gasStation .poi-item-details-address"
            value_template: |
              {%-set value = value.split("|")-%}
              {%-for zip in value-%}
              {%- if loop.index <= 20 -%}
                {%- set list1 = zip.split(',') -%}
                {%- if list1[1] -%}
                {{list1[1] | title}}|
                {%- else-%}
              -|
                {%- endif -%}
              {%- endif -%}
              {%-endfor-%}
          - name: e10_sp95_price
            select_list: ".poi-item-fuel-price-5 .poi-item-fuel-value"
            value_template: |
              {%- set value = value.split("|") -%}
              {%- for p in value -%}
              {%- if loop.index <= 20 -%}
              {%- set price = p | replace(',','.') | float(0) -%}
              {{'%0.3f'|format(price|float)}}|
              {%- endif -%}
              {%- endfor -%}
          - name: lpg_gpl_price
            select_list: ".poi-item-fuel-price-4 .poi-item-fuel-value"
            value_template: |
              {%- set value = value.split("|") -%}
              {%- for p in value -%}
              {%- if loop.index <= 20 -%}
              {%- set price = p | replace(',','.') | float(0) -%}
              {{'%0.3f'|format(price|float)}}|
              {%- endif -%}
              {%- endfor -%}
          - name: dte_update
            select_list: ".poi-item-fuel-price-update em"
            value_template: |
              {%- set value = value.split("|") -%}
              {%- for d in value -%}
              {%- if loop.index <= 20 -%}
              {{d | replace(' · ', '')}}|
              {%- endif -%}
              {%- endfor -%}

      - unique_id: gas_e10_sp95_min
        name: "Essence E10 prix minimum"
        select_list: ".poi-item-fuel-price-5 .poi-item-fuel-value"
        value_template: |
          {%- set ns = namespace(min_p = 999.0 | float ) -%}
          {%- set value = value.split("|") -%}
          {%- for x in value -%}
            {% if loop.index <= 20 %}
              {%- set price = x | replace(',','.') | float(0) -%}
              {%- if price < ns.min_p  -%}
                {%- set ns.min_p = price -%}
              {%- endif -%}  
            {%- endif -%}
          {%- endfor -%}
          {{ns.min_p}}
        unit_of_measurement: '€/L'
        icon: mdi:gas-station
        attributes:
          - name: station_index
            select_list: ".poi-item-fuel-price-5 .poi-item-fuel-value"
            value_template: |
              {%- set ns = namespace(min_p = 999.0 | float, index=0 ) -%}
              {%- set value = value.split("|") -%}
              {%- for x in value -%}
                {% if loop.index <= 20 %}
                  {%- set price = x | replace(',','.') | float(0) -%}
                  {%- if price < ns.min_p  -%}
                    {%- set ns.min_p = price -%}
                    {%- set ns.index = loop.index -%}
                  {%- endif -%}  
                {%- endif -%}
              {%- endfor -%}
              {{ns.index}}
          - name: station_name
            value_template: |
              {#- Read the E10 prices stored as an attribute of the first sensor -#}
              {%- set arr_str = state_attr('sensor.gas_provider_michelin', 'e10_sp95_price') or '' -%}
              {%- set prices = arr_str.split("|") -%}
              {#- Start above any real price; starting at 0.0 would never match -#}
              {%- set ns = namespace(min_p = 999.0, index = -1 ) -%}
              {%- for price in prices -%}
                  {#- Skip the empty item left by the trailing separator -#}
                  {%- if price|float(0) > 0 and price|float(0) < ns.min_p -%}
                    {%- set ns.min_p = price|float(0) -%}
                    {#- loop.index0 is zero-based, matching list indexing below -#}
                    {%- set ns.index = loop.index0 -%}
                {%- endif -%}
              {%- endfor -%}
              {%- if ns.index > -1 -%}
                {#- The station names are stored in the 'station' attribute above -#}
                {%- set arr_str = state_attr('sensor.gas_provider_michelin', 'station') or '' -%}
                {%- set stations = arr_str.split("|") -%}
                {{ stations[ns.index]}}
              {%- endif -%}
          - name: station_adress
            value_template: |
              {#- Same lookup as station_name, but returning the address -#}
              {%- set arr_str = state_attr('sensor.gas_provider_michelin', 'e10_sp95_price') or '' -%}
              {%- set prices = arr_str.split("|") -%}
              {%- set ns = namespace(min_p = 999.0, index = -1 ) -%}
              {%- for price in prices -%}
                  {%- if price|float(0) > 0 and price|float(0) < ns.min_p -%}
                    {%- set ns.min_p = price|float(0) -%}
                    {%- set ns.index = loop.index0 -%}
                {%- endif -%}
              {%- endfor -%}
              {%- if ns.index > -1 -%}
                {%- set arr_str = state_attr('sensor.gas_provider_michelin', 'address') or '' -%}
                {%- set stations = arr_str.split("|") -%}
                {{ stations[ns.index]}}
              {%- endif -%}

Is it possible to use an array in select_list?

 select_list: ".poi-item-fuel-price-4 .poi-item-fuel-value", ".poi-item-fuel-price-update em"

I could then match the ID of the station with each gas price, and then show only the stations that have GPL, for example. So with one select_list and x queries, I would get x arrays in return, which would keep the context and make it safe.

Or maybe there is another way?

Is there any global variable I could use between queries, for example to limit the number of stations? It's hard-coded to 20, and that sucks.

Thanks

I think this is quite an extreme case :slight_smile:
I’m afraid it’s not possible to use arrays in select_list or set global variables…

Maybe time to write your own Michelin component?
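
Short of a custom component, one possible workaround is a Home Assistant template sensor that reads the attributes multiscrape already stores, so the station limit is at least defined in one place per template. This is only a sketch: the sensor name is hypothetical, the entity ID and attribute names are taken from the configuration posted above, and matching by position is only valid when every station on the page lists the fuel in question, which the original post says is not always true.

```yaml
template:
  - sensor:
      - name: "Cheapest E10 station"   # hypothetical name
        state: >
          {# Single place to change the station limit for this template #}
          {% set limit = 20 %}
          {% set src = 'sensor.gas_provider_michelin' %}
          {% set prices = (state_attr(src, 'e10_sp95_price') or '').split('|')[:limit] %}
          {% set names = (state_attr(src, 'station') or '').split('|') %}
          {% set ns = namespace(min_p=none, index=-1) %}
          {% for p in prices %}
            {% set price = p | float(0) %}
            {# Ignore empty or unparsable items, which convert to 0 #}
            {% if price > 0 and (ns.min_p is none or price < ns.min_p) %}
              {% set ns.min_p = price %}
              {% set ns.index = loop.index0 %}
            {% endif %}
          {% endfor %}
          {{ (names[ns.index] | trim) if ns.index >= 0 and ns.index < (names | length) else 'unknown' }}
```

This does not give a true global variable, and it does not solve the missing-fuel alignment problem; it only moves the post-processing out of the scrape configuration.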

Thank you for answering me Daniel.

Yes I am thinking about it :slight_smile:

Does this addon have an option to set random interval scraping attempts? To bypass scrape protection?

gr

You can create an automation which is triggered at random times and call the multiscrape service.
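
A minimal sketch of such an automation follows. It assumes that multiscrape registers a trigger service named after the `name` of the scraper (here the `gas_provider_michelin` name used earlier in the thread); the exact service name should be checked in Developer Tools, and the alias is hypothetical.

```yaml
automation:
  - alias: "Multiscrape at a randomized time"   # hypothetical alias
    trigger:
      # Fires once per hour, on the hour
      - platform: time_pattern
        hours: "/1"
    action:
      # Wait a random 0 to 44 minutes so requests do not follow a fixed schedule
      - delay:
          minutes: "{{ range(0, 45) | random }}"
      # Assumption: service name derived from the scraper's `name`
      - service: multiscrape.trigger_gas_provider_michelin
```

The delay is kept under an hour so a run always finishes before the next trigger fires.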

Thanks, that’s an idea

I have been using HA Multiscrape for quite some time now. But now I am getting

Read from Hobolink # Updating failed with exception: Client error '403 Forbidden' for url 'https://www.hobolink.com/p/64a1f43abba604a54534b301a2663722' For more information check: https://httpstatuses.com/403

Is it the website that blocks my attempt to scrape? If I just open the site it’s ok.

Try adding a user agent in the request header like this:

headers:
  User-Agent: "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/109.0"

Works like a charm :slight_smile: thanks!

This seems like a relatively common issue; it might be worth mentioning in the wiki.
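
For context, here is a sketch of where the `headers` block sits in a full configuration. The resource URL is the one from the error message above; the scraper name, interval, sensor, and selector are hypothetical placeholders, since the original configuration was not posted.

```yaml
multiscrape:
  - name: hobolink                       # hypothetical name
    resource: "https://www.hobolink.com/p/64a1f43abba604a54534b301a2663722"
    scan_interval: 600                   # hypothetical interval
    # Sent with every request; some sites reject clients without a browser-like agent
    headers:
      User-Agent: "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/109.0"
    sensor:
      - unique_id: hobolink_temperature  # hypothetical sensor
        name: "Hobolink temperature"
        select: ".latest-conditions .temperature"   # hypothetical selector
```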

Hi Matteo,

if You need…

I just came across this custom component and wondered if it could help me scrape the data from the EnPhase server. I’m not very code-savvy, so I’m not sure if this component can make it easier to achieve. I reached out a few weeks ago for assistance with cURL and how to pull the data into HA, but never got a reply from anyone.

Here is my request for help, with the examples of where I got to. Do you think Multiscrape could get the data into sensors?

This is my first attempt at the config but I’m really just guessing and using snips of code from other cURL setups that I have:

multiscrape:
  - resource: https://api.enphaseenergy.com/api/v4/systems/3793806/summary?key=51[redacted]
    scan_interval: 3600 #EnPhase only get data every 60 mins.
    headers:
      Authorization: Bearer 1NiJ9[redacted]
      

sensor:
  - name : "Ustou EnPhase system_id"
    value_template: >
        {%- for dict_item in value_json.dataList -%}
            {%- if dict_item.key == "system_id"-%} 
              {{dict_item.value}} 
            {%- endif -%}
        {%- endfor -%}
      
  - name : "Ustou EnPhase Current Consumption Wh"
    value_template: >
        {%- for dict_item in value_json.dataList -%}
          {%- if dict_item.key == "current_power"-%} 
            {{dict_item.value}} 
          {%- endif -%}
        {%- endfor -%}
    unit_of_measurement: 'Wh'
    device_class: energy
      
  - name : "Ustou EnPhase Energy Produced lifetime Wh"
    value_template: >
        {%- for dict_item in value_json.dataList -%}
          {%- if dict_item.key == "energy_lifetime"-%} 
            {{dict_item.value}} 
          {%- endif -%}
        {%- endfor -%}
    unit_of_measurement: 'Wh'
    device_class: energy
    
  - name : "Ustou EnPhase Solar Production Today Wh"
    value_template: >
        {%- for dict_item in value_json.dataList -%}
          {%- if dict_item.key == "energy_today"-%} 
            {{dict_item.value}} 
          {%- endif -%}
        {%- endfor -%}
    unit_of_measurement: 'kWh'
    device_class: energy
      
  - name : "Ustou EnPhase Number of Inverters"
    value_template: >
        {%- for dict_item in value_json.dataList -%}
          {%- if dict_item.key == "modules"-%} 
            {{dict_item.value}} 
          {%- endif -%}
        {%- endfor -%}

    
  - name : "Ustou EnPhase Total Array Size (W)"
    value_template: >
        {%- for dict_item in value_json.dataList -%}
          {%- if dict_item.key == "size_w"-%} 
            {{dict_item.value}} 
          {%- endif -%}
        {%- endfor -%}


  - name : "Ustou EnPhase System Status"
    value_template: >
        {%- for dict_item in value_json.dataList -%}
          {%- if dict_item.key == "status"-%} 
            {{dict_item.value}} 
          {%- endif -%}
        {%- endfor -%}

I’m not in a position to check your question deeper right now, but you need to redact your key and token ASAP.

Post edited to remove key and bearer.

Thanks for your concern. I had already removed sections of both the token and the key, so there is no risk. But thank you anyway.

Maybe I need to be more specific in my question here about Multiscrape. Can it handle this data that I get back from my cURL request, and if so, how would I construct the sensors in the multiscrape configuration? (The system ID has been changed for protection.)

{
    "system_id": 3793806,
    "current_power": 0,
    "energy_lifetime": 943,
    "energy_today": 8,
    "last_interval_end_at": 1672206300,
    "last_report_at": 1672207041,
    "modules": 0,
    "operational_at": 1671720300,
    "size_w": 0,
    "source": "meter",
    "status": "normal",
    "summary_date": "2022-12-28"
}

I don’t think you need this integration at all. If you’re getting a JSON response, just use a REST sensor. It can do what you want. Perhaps start a new post. You’re welcome to tag me for help, if needed, but I think you can just follow the examples in the docs.
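
As a rough illustration of that suggestion, the sketch below uses Home Assistant's `rest` integration against the JSON shown above. The URL and header are copied from the earlier post with their redactions left in place; the sensor names and the units (W for `current_power`, Wh for `energy_today`) are assumptions that should be checked against the Enphase API documentation.

```yaml
rest:
  - resource: "https://api.enphaseenergy.com/api/v4/systems/3793806/summary?key=51[redacted]"
    scan_interval: 3600
    headers:
      Authorization: "Bearer 1NiJ9[redacted]"
    sensor:
      # Each sensor reads one top-level key of the JSON response
      - name: "Ustou EnPhase Current Power"
        value_template: "{{ value_json.current_power }}"
        unit_of_measurement: "W"     # assumed unit
        device_class: power
      - name: "Ustou EnPhase Energy Today"
        value_template: "{{ value_json.energy_today }}"
        unit_of_measurement: "Wh"    # assumed unit
        device_class: energy
      - name: "Ustou EnPhase System Status"
        value_template: "{{ value_json.status }}"
```

All sensors share a single request per interval, which matters for an API with rate limits.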

Looks like your value_templates are overly complicated. Doesn’t this work?

value_template: "{{ value_json['system_id'] }}"

Also note that the Bearer token will probably expire so you need to update your configuration each time that happens with a new token…
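
If staying with multiscrape, the simplified templates could slot in roughly as sketched here. It assumes `value_json` is available in multiscrape templates, as the reply above implies, and that the `sensor:` list belongs under the resource entry rather than at the top level (as in the other configurations in this thread). The `unique_id` values are hypothetical.

```yaml
multiscrape:
  - resource: "https://api.enphaseenergy.com/api/v4/systems/3793806/summary?key=51[redacted]"
    scan_interval: 3600
    headers:
      Authorization: "Bearer 1NiJ9[redacted]"
    # Sensors are nested under the resource they read from
    sensor:
      - unique_id: ustou_enphase_system_id          # hypothetical unique_id
        name: "Ustou EnPhase system_id"
        value_template: "{{ value_json['system_id'] }}"
      - unique_id: ustou_enphase_energy_lifetime    # hypothetical unique_id
        name: "Ustou EnPhase Energy Produced lifetime Wh"
        value_template: "{{ value_json['energy_lifetime'] }}"
        unit_of_measurement: "Wh"
        device_class: energy
```

The response is a flat JSON object, so each key can be read directly; there is no `dataList` to loop over.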

Hi all,

Is there someone who can help me get the right selector?

My config looks like this:

- resource: https://mijn.essent.nl/mijn-contract/tarieven?contract=000250578938&address=0700596204&contract_id=4002108673
  authentication: basic
  username: !secret username
  password: !secret password
  scan_interval: 86400
  log_response: true
  sensor:
    - unique_id: essent_kwh_prijs
      name: "kWh prijs"
#      select: "div > div:nth-child(2)" 
      select: "body > app-root > wl-jss-route > sc-placeholder > wl-row-container > div > section > wl-colset-mijn-363 > wl-experience-editor > article > div.col-12.col-lg-6.order-lg-1 > div > wl-show-tariffs-container > wl-experience-editor > wl-show-tariffs > wl-experience-editor > wl-show-tariffs-totals > wl-experience-editor > div > div > div.d-flex.justify-content-between.mb-1 > div.text-right.text-dark.text-nowrap"
      unit_of_measurement: €
      value_template: '{{ value.split("€&nbsp;")[1].split("/")[0]}}'

This is what I get when I use Copy → Copy selector:
body > app-root > wl-jss-route > sc-placeholder > wl-row-container > div > section > wl-colset-mijn-363 > wl-experience-editor > article > div.col-12.col-lg-6.order-lg-1 > div > wl-show-tariffs-container > wl-experience-editor > wl-show-tariffs > wl-experience-editor > wl-show-tariffs-totals > wl-experience-editor > div > div > div.d-flex.justify-content-between.mb-1 > div.text-right.text-dark.text-nowrap

Hope that someone can help me.
Thanks in advance!

Hi,

I am trying to scrape this page:

https://www.in-pocasi.cz/predpoved-pocasi/cz/praha/praha-324/

There are two tables/charts at the bottom of the page, which work similarly with tabs on top. In the first one (“Předpověď na další dny”), I can get different data by clicking the day of the week tabs.

Scraping the data from the first tab is fine – e.g. I can use the CSS selector: #h3_0_t to get the first temperature. But if I click on the second tab, the selector for the corresponding data is the same.

I’ve looked in the multiscrape log, and all the data is present there; it’s not downloaded dynamically, but it is located in different parts of a table. Is there some way of getting at it with multiscrape?

Is there any way to use index: in multiscrape (like in regular scrape)?

Best regards,

Michał
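
One approach that might stand in for `index:` is to combine `select_list` with a split in the template, as other configurations in this thread do. The sketch below is untested against this page: it assumes the `#h3_0_t` selector matches one element per tab, that `list_separator` is supported by the installed release, and the sensor name and `unique_id` are hypothetical.

```yaml
multiscrape:
  - resource: "https://www.in-pocasi.cz/predpoved-pocasi/cz/praha/praha-324/"
    scan_interval: 3600
    # '|' avoids clashing with decimal commas in the scraped values
    list_separator: "|"
    sensor:
      - unique_id: praha_temp_second_tab   # hypothetical unique_id
        name: "Praha temperature, second tab"
        # Assumption: every tab's cell carries the same id, so this matches several elements
        select_list: "#h3_0_t"
        value_template: >
          {% set items = value.split('|') %}
          {# Index 1 is the second match; guard against pages with fewer matches #}
          {{ (items[1] | trim) if (items | length) > 1 else none }}
```

If the selector turns out to match only one element, a positional CSS selector such as `:nth-child()` on the table rows may be another option to try.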