Regex Template nth occurrence for HTML injection

dasforsyth · April 10, 2024, 9:19pm

Hi All,

I’ve got a rest sensor that scrapes the forecast from a website’s endpoints.

The only problem is that this comes out as a large blob of HTML:

<p><b>Nearby coastal warnings:</b> Gale warning for Raglan, Nil for Colville</p><p><b>Today: </b>Northeast 15 knots, rising to 20 knots gusting 30 knots this evening. Sea becoming moderate this evening, choppy when wind opposes tide. Fine.</p><p><b>Thursday: </b>Northeast 20 knots gusting 30 knots, rising to 25 knots gusting 35 knots in the morning and to 30 knots gusting 40 knots in the afternoon. Sea becoming rough in the morning. Poor visibility in morning showers, turning into rain later.</p><p><b>Friday: </b>Northerly 30 knots, easing to 25 knots early and to northwest 15 knots in the afternoon. Rain, turning to showers.</p><p><b>Saturday: </b>Northwest 15 knots, turning westerly 15 knots early. Partly cloudy, a few morning and afternoon showers.</p><p><b>Sunday: </b>Westerly 15 knots, turning southwest 15 knots early. Easing to southerly 10 knots later. Cloud clearing.</p>

This currently displays in a markdown card like so:

I’m looking to format the “Today” line (either make it bold, change colours, etc) but to do so I need to either split the blob of text into parts, or inject some HTML into it.

I’ve been trying the latter by making a template with regex_replace.

So far I’ve tried the following:

{{ "--HTML-GOES-HERE--" |regex_replace(find='<p><b>Today\: <\/b>', replace='<span style="color:blue"><b>Today:</b></span>', ignorecase=False) }}

and

{{ "--HTML-GOES-HERE--" |regex_replace(find='.*?<p><b>.*?\K<p><b>', replace='XXXXXXXXXXXXXX', ignorecase=False) }}

But am having a bit of trouble. The top one allows me to wrap the “Today:” in a tag so that I can style it from there, but I really want to capture the entire “Today: Northeast 15 knots, rising to 20 knots gusting 30 knots this evening. Sea becoming moderate this evening, choppy when wind opposes tide. Fine.”.

To do this I know I might need two regex_replace expressions (one to add the <span> before Today: and another to add the </span> before the Thursday:).

The problem is that “Thursday:” isn’t always the next day, so I’m trying to find the second occurrence of <p><b> in the string and put it before that. I understand that .*?<p><b>.*?\K<p><b> should work if HA used Perl’s implementation of \K (to ignore everything preceding it), but it doesn’t.
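For reference, Home Assistant's regex filters are, as far as I know, built on Python's `re` module, which has no `\K` (the third-party `regex` package does). A capture group that is re-emitted in the replacement gives the same effect. The sketch below is plain Python rather than a template, `replace_nth` is a hypothetical helper name, and the sample blob is shortened from the one in the post.

```python
import re

# Shortened sample of the forecast blob from the post.
html = (
    "<p><b>Nearby coastal warnings:</b> Gale warning for Raglan</p>"
    "<p><b>Today: </b>Northeast 15 knots. Fine.</p>"
    "<p><b>Thursday: </b>Northeast 20 knots.</p>"
)


def replace_nth(text: str, marker: str, n: int, replacement: str) -> str:
    """Replace only the nth occurrence (1-based) of `marker` in `text`."""
    if n < 1:
        raise ValueError("n must be 1 or greater")
    m = re.escape(marker)
    # Group 1 swallows everything before the nth marker, including the
    # first n-1 markers. Re-emitting it stands in for Perl's \K.
    pattern = re.compile(r"\A((?:.*?%s){%d}.*?)%s" % (m, n - 1, m), re.DOTALL)
    return pattern.sub(lambda mo: mo.group(1) + replacement, text, count=1)


# In this sample the third "<p><b>" is the one that opens "Thursday:".
print(replace_nth(html, "<p><b>", 3, "</span><p><b>"))
```

If the text has fewer than n markers, the pattern does not match and the input comes back unchanged. The same idea should carry over to `regex_replace` by putting a backreference to group 1 in `replace`, though I have not tested that in a template.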

My question to everyone is:

Is there a regex pattern and way to use regex_replace that can do this in one go so that I don’t need to have two regex statements? I’m thinking of something like "find the instance of Today:, then a wildcard, then find the next instance of <p><b>, wrap it with span tags and keep the content in the middle the same".
If no to the above, then is there a regex pattern that can find the second occurrence of those <p><b> tags so that I can replace it with </span> at the end?
Finally, if there’s a better (hopefully simpler) way of doing this in HA - I’m all ears!

Cheers in advance.
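On the first question, one pass looks possible without knowing the name of the next day, because the “Today” paragraph ends at its own closing </p>. A minimal sketch in plain Python, assuming the markup always has the exact form `<p><b>Today: </b>...</p>`; `highlight_today` and its `style` parameter are hypothetical names used only for illustration.

```python
import re

# Lazy match from the "Today" label up to the first closing </p>.
TODAY = re.compile(r"<p><b>Today: </b>(.*?)</p>", re.DOTALL)


def highlight_today(html: str, style: str = "color:blue") -> str:
    """Wrap the whole 'Today' paragraph (label and forecast) in a styled span."""
    # \1 re-inserts the captured forecast text unchanged.
    replacement = r'<p><span style="%s"><b>Today: </b>\1</span></p>' % style
    return TODAY.sub(replacement, html, count=1)


sample = (
    "<p><b>Today: </b>Northeast 15 knots. Fine.</p>"
    "<p><b>Thursday: </b>Northeast 20 knots.</p>"
)
print(highlight_today(sample))
```

The span is placed inside the paragraph so the resulting HTML stays well-formed. Whether the markdown card honours the inline style is a separate question that this sketch does not address.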

dasforsyth · April 16, 2024, 5:01am

Ok, so I’ve solved my issue, by approaching it in a different way.

I’ve got the large HTML-tagged text sitting in a sensor.

I’ve extracted out the component parts into other sensors like so:

- platform: template
  sensors:
    metservice_forecast_text_today:
      friendly_name: "Metservice Forecast - Waitemata - Today"
      entity_id: sensor.metservice_forecast_data
      # Index 1 of the split is the "Today" paragraph; index 0 is the coastal warnings.
      value_template: >-
        {% set string = state_attr('sensor.metservice_forecast_data', 'text') %}
        {{ string.split('</p><p><b>')[1] | replace('</b>', '') | replace('<p><b>', '') | replace('</p>', '') }}
      attribute_templates:
        # Element 0 of the ': ' split is the day label.
        day: >-
          {% set string = state_attr('sensor.metservice_forecast_data', 'text') %}
          {% set string2 = string.split('</p><p><b>')[1] | replace('</b>', '') | replace('<p><b>', '') | replace('</p>', '') %}
          {{ string2.split(': ')[0] }}
        # Element 1 of the ': ' split is the forecast text.
        forecast: >-
          {% set string = state_attr('sensor.metservice_forecast_data', 'text') %}
          {% set string2 = string.split('</p><p><b>')[1] | replace('</b>', '') | replace('<p><b>', '') | replace('</p>', '') %}
          {{ string2.split(': ')[1] }}

I just repeat this and I get a nice sensor list that I can start using in the FE:
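To make the indexing in the split approach easier to follow, here is a rough equivalent in plain Python. It is only an illustration of the string handling, not Home Assistant code; `extract_day` is a hypothetical name and the blob is shortened from the one in the first post.

```python
from typing import Tuple

# Shortened sample of the forecast blob from the first post.
blob = (
    "<p><b>Nearby coastal warnings:</b> Gale warning for Raglan, Nil for Colville</p>"
    "<p><b>Today: </b>Northeast 15 knots. Fine.</p>"
    "<p><b>Thursday: </b>Northeast 20 knots.</p>"
)


def extract_day(html: str, index: int) -> Tuple[str, str]:
    """Return (day, forecast) for the paragraph at `index` after splitting."""
    # Splitting on the paragraph boundary leaves stray tags on each chunk.
    chunk = html.split("</p><p><b>")[index]
    for tag in ("</b>", "<p><b>", "</p>"):
        chunk = chunk.replace(tag, "")
    # maxsplit=1 keeps any later ': ' inside the forecast text.
    day, forecast = chunk.split(": ", 1)
    return day, forecast


print(extract_day(blob, 1))
print(extract_day(blob, 2))
```

One difference from the templates above: taking element `[1]` of an unlimited `split(': ')` would drop anything after a second ': ' in the forecast, which is why the sketch limits the split to one.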

petro · April 16, 2024, 1:21pm

I know you solved this, but you could have done this:

template:
- trigger:
  - platform: state
    entity_id: sensor.metservice_forecast_data
  variables:
    items: >
      {% set value = state_attr('sensor.metservice_forecast_data', 'text') or '' %}
      {% set fmat = "<p><b>{0}: <\/b>([a-zA-Z .,0-9]+)<\/p>" %}
      {% set days = ['Today', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'] %}
      {% set ns = namespace(items=[]) %}
      {% for day in days %}
        {% set text = value | regex_findall(fmat.format(day)) | first | default %}
        {% if text %}
          {% set ns.items = ns.items + [ {'day': day, 'text': text} ] %}
        {% endif %}
      {% endfor %}
      {{ ns.items }}
  sensor:
  - name: Weather Today
    unique_id: weather_today
    state: "{{ items[0].day }}: {{ items[0].text }}"
    attributes:
      day: "{{ items[0].day }}"
      text: "{{ items[0].text }}"
  - name: Weather {{ items[1].day }}
    unique_id: weather_day_1
    state: "{{ items[1].day }}: {{ items[1].text }}"
    attributes:
      day: "{{ items[1].day }}"
      text: "{{ items[1].text }}"
  - name: Weather {{ items[2].day }}
    unique_id: weather_day_2
    state: "{{ items[2].day }}: {{ items[2].text }}"
    attributes:
      day: "{{ items[2].day }}"
      text: "{{ items[2].text }}"
  - name: Weather {{ items[3].day }}
    unique_id: weather_day_3
    state: "{{ items[3].day }}: {{ items[3].text }}"
    attributes:
      day: "{{ items[3].day }}"
      text: "{{ items[3].text }}"
  - name: Weather {{ items[4].day }}
    unique_id: weather_day_4
    state: "{{ items[4].day }}: {{ items[4].text }}"
    attributes:
      day: "{{ items[4].day }}"
      text: "{{ items[4].text }}"

You’ll need to rename your entity_ids. But the takeaway here is that your name will update and you’ll only do the calc once. It’s also easier to manage if things change.
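The extraction logic in the `items` variable can be tried outside Home Assistant with a plain Python sketch like the one below. `parse_by_day_list` mirrors the per-day loop above; `parse_in_document_order` is a hypothetical variant, not part of the original suggestion, that uses a single `findall` so the results follow the order of the paragraphs rather than the order of the day list. The sample blob is made up to show the difference.

```python
import re

DAYS = ["Today", "Monday", "Tuesday", "Wednesday",
        "Thursday", "Friday", "Saturday", "Sunday"]
# Same character class as the template: letters, digits, space, '.' and ','.
FMT = r"<p><b>{0}: </b>([a-zA-Z .,0-9]+)</p>"


def parse_by_day_list(html):
    """One search per day name; results come out in DAYS order."""
    items = []
    for day in DAYS:
        found = re.findall(FMT.format(day), html)
        if found:
            items.append({"day": day, "text": found[0]})
    return items


def parse_in_document_order(html):
    """Single pass; results come out in the order the paragraphs appear."""
    pairs = re.findall(r"<p><b>(\w+): </b>([^<]+)</p>", html)
    return [{"day": day, "text": text} for day, text in pairs]


blob = (
    "<p><b>Nearby coastal warnings:</b> Gale warning for Raglan</p>"
    "<p><b>Today: </b>Northeast 15 knots. Fine.</p>"
    "<p><b>Sunday: </b>Westerly 15 knots.</p>"
    "<p><b>Monday: </b>Southerly 10 knots.</p>"
)
print([item["day"] for item in parse_by_day_list(blob)])
print([item["day"] for item in parse_in_document_order(blob)])
```

With this blob the first function lists Monday before Sunday, so positional lookups such as `items[1]` may not be the next calendar day when the forecast wraps past the end of the week. The character class in `FMT` would also skip any forecast containing characters such as hyphens or apostrophes.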

Troon · April 16, 2024, 1:38pm

Might be worth having a look at the API service to get more machine-readable answers: there’s a free plan.

You could then build a template weather entity like mine:

dasforsyth · April 17, 2024, 2:44am

Thanks - this is exactly what I was after (and a lot more). I’ve reverted to using this as my solution as I think it’s a bit cleaner in my config than my fallback!