Value template regular expression

I have this scrape sensor set up for getting bus arrivals and I’m getting the output I want. However, it shows the bus line number twice for every arrival for some reason. Is there any way I can remove the duplicates, maybe using a value_template?

64 64 to New Forest 5 mins 433 433 to Westcoombe 17 mins 64 64 to Westcoombe 18 mins

Thanks,

Perhaps it would be more efficient to fix the Scrape Sensor so that it doesn’t report duplicate values.

You mean by getting the right CSS selector? Yes, that would be nice, but I’ve tried many values already…

For example, this is what I’m using. It’s strange, because the web inspector doesn’t list these bus lines twice.

  - platform: scrape
    resource: "https://tfl.gov.uk/bus/stop/490005793E/old-palace-of-john-whitgift-school/"
    name: "test"
    select: "div:nth-child(4) > div.main > div.station-details > div:nth-child(5) > div > div > div > div > div > ol"
    #scan_interval: 86400
    headers:
      User-Agent: Mozilla/5.0

Screen scraping isn’t my strong suit so I’ll offer you a template to eliminate the duplicates.

Copy-paste this into the Template Editor and experiment with it to understand how it works:

{% set x = '64 64 to New Forest 5 mins 433 433 to Westcoombe 17 mins 64 64 to Westcoombe 18 mins' %}

{% set s = x.split() %}
{% set qty = s | count %}

{% set ns = namespace(s=[]) %}
{% for i in range(0, qty) %}
  {# Keep a word if it is the last one or differs from the word after it #}
  {% if loop.last or s[i] != s[i+1] %}
    {% set ns.s = ns.s + [s[i]] %}
  {% endif %}
{% endfor %}
{{ ns.s | join(' ') }}

Let me know if you need me to explain its operation.

The duplicate-free result:

64 to New Forest 5 mins 433 to Westcoombe 17 mins 64 to Westcoombe 18 mins

To use it in a Template Sensor, it would look something like this (assuming your scrape sensor is named sensor.test). It contains some extra logic to ensure the value of sensor.test is valid before it attempts to de-duplicate it.

template:
  - sensor:
      - name: "Bus Schedule"
        state: >
          {% set s = states('sensor.test') %}
          {% if s not in ['unknown', 'unavailable', 'none'] and s | length > 1 %}
            {% set s = s.split() %}
            {% set qty = s | count %}
            {% set ns = namespace(s=[]) %}
            {% for i in range(0, qty) %}
              {# Keep a word if it is the last one or differs from the word after it #}
              {% if loop.last or s[i] != s[i+1] %}
                {% set ns.s = ns.s + [s[i]] %}
              {% endif %}
            {% endfor %}
            {{ ns.s | join(' ') }}
          {% else %}
            {{ s }}
          {% endif %}

Thank you very much, I’ll take this all in and see if I can understand it and make it work.

Yeah, it seems a bit much but basically it’s just comparing each word to the next one and rejecting it if its neighbour is identical.

  • It uses split to convert the string to a list. It splits the string at each space character. Therefore each item in the list is simply one of the words in the original string.

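  • It creates an empty “global” list that will be used to receive words from the original list. The list is held in a namespace because a variable set inside a Jinja2 for-loop is otherwise not visible outside the loop.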

  • It loops through the list, comparing each word to the next word in the list. If the word is not identical to its neighbour, it gets appended to the global list.

  • It continues looping through the list until it reaches the end (where the end is simply the total count of words in the list).

  • Finally, it displays the global list but first it converts it back to a string (by using the join filter and inserting a space character between each word).
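
For anyone who finds the steps easier to follow outside the template syntax, here is a rough plain-Python restatement of the same idea. It is only a sketch for illustration: the function name drop_adjacent_duplicates is made up for this example and is not part of Home Assistant, and the template itself is still what goes into the sensor.

```python
def drop_adjacent_duplicates(text: str) -> str:
    """Drop every word that is immediately followed by an identical word."""
    words = text.split()  # same role as x.split() in the template
    kept = []             # same role as the namespace list ns.s
    for i, word in enumerate(words):
        is_last = i == len(words) - 1
        # Keep the word if it is the final one or differs from its neighbour.
        if is_last or word != words[i + 1]:
            kept.append(word)
    return " ".join(kept)  # same role as the join(' ') filter


raw = "64 64 to New Forest 5 mins 433 433 to Westcoombe 17 mins 64 64 to Westcoombe 18 mins"
print(drop_adjacent_duplicates(raw))
```

One thing to keep in mind with this approach: it collapses any two identical neighbouring words, so a genuinely repeated word in the scraped text (not just the doubled bus number) would also be reduced to one.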

Love it, much obliged.

You’re welcome!

Please consider marking my post above with the Solution tag. It will automatically place a check-mark next to the topic’s title which signals to other users that this topic has been resolved. This helps users find answers to similar questions. For more information, refer to guideline 21 in the FAQ.