Help wanted with templating

Hello

I am trying to get dates from a PDF with ha-pdf from GitHub. I have the regex working and am now trying to make a template that extracts the dates and shows me the next one.

{% set today = now().date() %}
{% set month_map = {
  'jan': 'Jan', 'feb': 'Feb', 'mar': 'Mar', 'apr': 'Apr',
  'maj': 'May', 'jun': 'Jun', 'jul': 'Jul', 'aug': 'Aug',
  'sep': 'Sep', 'okt': 'Oct', 'nov': 'Nov', 'dec': 'Dec'
} %}
{% set dates = ['21 jan 2025', '04 feb 2025', '18 feb 2025', '04 mar 2025', '18 mar 2025', '01 apr 2025', '15 apr 2025', '29 apr 2025', '13 maj 2025', '27 maj 2025', '10 jun 2025', '24 jun 2025', '08 jul 2025', '22 jul 2025', '05 aug 2025'] %}

{% set next_date = none %}
{% for date_str in dates %}
  {% set parts = date_str.split(' ') %}
  {% set day = parts[0] %}
  {% set month = month_map[parts[1] | lower] %}
  {% set year = parts[2] %}
  {% set match_date = strptime(day + ' ' + month + ' ' + year, '%d %b %Y').date() %}
  {% if match_date > today and next_date is none %}
    {% set next_date = match_date %}
  {% endif %}
{% endfor %}

{% if next_date %}
  Next match date: {{ next_date }}
{% else %}
  No future dates found
{% endif %}

Here is the code I am trying, and it gives me “No future dates found”.
Does anyone have an idea?

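It’s a scoping issue. In Jinja, a variable assigned with set inside a for loop is local to the loop, so the value you assign to next_date inside the loop is gone once the loop ends, and the if after the loop still sees none.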

You need to use a namespace object (covered at the end of the Assignments section of the Jinja template documentation):

{% set today = now().date() %}
{% set month_map = {
  'jan': 'Jan', 'feb': 'Feb', 'mar': 'Mar', 'apr': 'Apr',
  'maj': 'May', 'jun': 'Jun', 'jul': 'Jul', 'aug': 'Aug',
  'sep': 'Sep', 'okt': 'Oct', 'nov': 'Nov', 'dec': 'Dec'
} %}
{% set dates = ['21 jan 2025', '04 feb 2025', '18 feb 2025', '04 mar 2025', '18 mar 2025', '01 apr 2025', '15 apr 2025', '29 apr 2025', '13 maj 2025', '27 maj 2025', '10 jun 2025', '24 jun 2025', '08 jul 2025', '22 jul 2025', '05 aug 2025'] %}

{% set ns = namespace(next_date = none) %}
{% for date_str in dates %}
  {% set parts = date_str.split(' ') %}
  {% set day = parts[0] %}
  {% set month = month_map[parts[1] | lower] %}
  {% set year = parts[2] %}
  {% set match_date = strptime(day + ' ' + month + ' ' + year, '%d %b %Y').date() %}
  {% if match_date > today and ns.next_date is none %}
    {% set ns.next_date = match_date %}
  {% endif %}
{% endfor %}

{% if ns.next_date %}
  Next match date: {{ ns.next_date }}
{% else %}
  No future dates found
{% endif %}
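For comparison, here is the same logic in plain Python, where a for loop does not open a new scope, so a variable assigned inside the loop is still set after it. This is only an illustration of the template's steps, not something Home Assistant runs: the function name next_date and passing today as an argument instead of calling now() are choices made for the sketch, and it assumes strptime's %b accepts English month abbreviations, as it does in Python's default C locale.

```python
from __future__ import annotations

from datetime import date, datetime

# Swedish month abbreviations -> English ones that strptime's %b understands
# (assumes the default C/English locale).
MONTH_MAP = {
    "jan": "Jan", "feb": "Feb", "mar": "Mar", "apr": "Apr",
    "maj": "May", "jun": "Jun", "jul": "Jul", "aug": "Aug",
    "sep": "Sep", "okt": "Oct", "nov": "Nov", "dec": "Dec",
}


def next_date(dates: list[str], today: date) -> date | None:
    """Return the first date in `dates` (in list order) strictly after `today`."""
    next_match = None
    for date_str in dates:
        day, month, year = date_str.split(" ")
        match_date = datetime.strptime(
            f"{day} {MONTH_MAP[month.lower()]} {year}", "%d %b %Y"
        ).date()
        if match_date > today and next_match is None:
            # In Python this assignment is still visible after the loop;
            # in Jinja it is not, which is why the template needs ns.next_date.
            next_match = match_date
    return next_match
```

For example, next_date(['21 jan 2025', '04 feb 2025'], date(2025, 1, 22)) returns date(2025, 2, 4).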

{% set value = ['21 jan 2025', '04 feb 2025', '18 feb 2025', '04 mar 2025', '18 mar 2025', '01 apr 2025', '15 apr 2025', '29 apr 2025', '13 maj 2025', '27 maj 2025', '10 jun 2025', '24 jun 2025', '08 jul 2025', '22 jul 2025', '05 aug 2025'] %}

      {% set today = now().date() %}
      {% set month_map = {
        'jan': 'Jan', 'feb': 'Feb', 'mar': 'Mar', 'apr': 'Apr',
        'maj': 'May', 'jun': 'Jun', 'jul': 'Jul', 'aug': 'Aug',
        'sep': 'Sep', 'okt': 'Oct', 'nov': 'Nov', 'dec': 'Dec'
      } %}
      {% set dates = value | regex_findall('(\\d{2} \\w{3} \\d{4})') %}

      {% set ns = namespace(next_date = none) %}
      {% for date_str in dates %}
        {% set parts = date_str.split(' ') %}
        {% set day = parts[0] %}
        {% set month = month_map[parts[1] | lower] %}
        {% set year = parts[2] %}
        {% set match_date = strptime(day + ' ' + month + ' ' + year, '%d %b %Y').date() %}
        {% if match_date > today and ns.next_date is none %}
          {% set ns.next_date = match_date %}
        {% endif %}
      {% endfor %}

      {% if ns.next_date %}
        {{ as_timestamp(ns.next_date) | timestamp_custom('%a %d %b %Y') }}
      {% else %}
        Uppdatera PDF för nästa år
      {% endif %}

This works in the template editor

  - platform: pdf
    name: Tunna 1
    file_path: sopor/sop_2025.pdf
    regex_search: 'Fyrfackskärl 1[\s\S]*?Fyrfackskärl 2'
    regex_match_index: 0
    value_template: >
      {% set today = now().date() %}
      {% set month_map = {
        'jan': 'Jan', 'feb': 'Feb', 'mar': 'Mar', 'apr': 'Apr',
        'maj': 'May', 'jun': 'Jun', 'jul': 'Jul', 'aug': 'Aug',
        'sep': 'Sep', 'okt': 'Oct', 'nov': 'Nov', 'dec': 'Dec'
      } %}
      {% set dates = value | regex_findall('(\\d{2} \\w{3} \\d{4})') %}

      {% set ns = namespace(next_date = none) %}
      {% for date_str in dates %}
        {% set parts = date_str.split(' ') %}
        {% set day = parts[0] %}
        {% set month = month_map[parts[1] | lower] %}
        {% set year = parts[2] %}
        {% set match_date = strptime(day + ' ' + month + ' ' + year, '%d %b %Y').date() %}
        {% if match_date > today and ns.next_date is none %}
          {% set ns.next_date = match_date %}
        {% endif %}
      {% endfor %}

      {% if ns.next_date %}
        {{ as_timestamp(ns.next_date) | timestamp_custom('%a %d %b %Y') }}
      {% else %}
        Uppdatera PDF för nästa år
      {% endif %}

But this does not work; I don't even get a sensor out of it.

Where have you put that code? Can you provide a link to “ha-pdf from github” as it’s not obvious to me what repository that is.

https://github.com/emcniece/ha_pdf

Show me how that code is integrated into configuration.yaml: is it directly in that file or !included into another file?

Have you done a full restart?

Any errors relating to ha-pdf in the logs?

No, it is directly in configuration.yaml, not !included.

Full restart done.

Logger: homeassistant.components.sensor
Source: helpers/entity_platform.py:737
Integration: Sensor (documentation, issues)
First occurred: 13:38:13 (1 occurrences)
Last logged: 13:38:13

pdf: Error on device update!
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/helpers/entity_platform.py", line 737, in _async_add_entity
    await entity.async_device_update(warning=False)
  File "/usr/src/homeassistant/homeassistant/helpers/entity.py", line 1320, in async_device_update
    await hass.async_add_executor_job(self.update)
  File "/usr/local/lib/python3.13/concurrent/futures/thread.py", line 59, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/config/custom_components/ha_pdf/sensor.py", line 130, in update
    matched_index = matches[match_index]
                    ~~~~~~~^^^^^^^^^^^^^
IndexError: no such group

I found this in the log after a restart.

Assuming this is not due to a bug in the integration you are using, either your regular expression is not correct, or your match index is not correct.

Your configuration.yaml sets a regex_match_index: 1 parameter. Is that supposed to be the first/only match or the second match?

I cannot find the documentation for this integration, but my guess is that you want the first (or only) match and that this parameter is zero-based.

Edit: Okay, I missed that you linked to the integration's documentation above; before looking at it, I assumed the link was your source data. The situation is similar to, but not exactly, what I guessed. As with most regex implementations, the documentation states:

Index 0 returns the whole matched string. Indexes >= 1 return valid capture groups.

Your expression 'Fyrfackskärl 1[\s\S]*?Fyrfackskärl 2' does not contain any capture groups, so any index other than 0 will always fail, which is the IndexError: no such group in your log.
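A quick way to see this outside Home Assistant is to run the same pattern with Python's re module. This is a sketch, assuming the integration applies the regex with re and indexes the resulting match object, which is what the IndexError: no such group message suggests; the sample text is made up, since the real PDF contents are not shown here.

```python
import re

# Made-up stand-in for the PDF text; the real layout of sop_2025.pdf is unknown.
SAMPLE = "Fyrfackskärl 1\n21 jan 2025\n04 feb 2025\nFyrfackskärl 2\n22 jan 2025"

# The pattern from the configuration. It has no parentheses, so no capture groups;
# [\s\S]*? lazily matches any characters (including newlines) up to the first
# "Fyrfackskärl 2".
PATTERN = r"Fyrfackskärl 1[\s\S]*?Fyrfackskärl 2"

match = re.search(PATTERN, SAMPLE)
whole = match[0]               # index 0: the whole matched string
group_count = match.re.groups  # number of capture groups in the pattern: 0

try:
    match[1]                   # index 1 would be the first capture group
    error_message = None
except IndexError as err:
    error_message = str(err)   # "no such group", as in the log
```

Index 0 already returns the whole section between the two headings, so there is nothing to gain from a higher index with this pattern.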


(Image of the PDF that I want information from.)


  - platform: pdf
    name: Tunna 1
    file_path: sopor/sop_2025.pdf
    regex_search: 'Fyrfackskärl 1[\s\S]*?Fyrfackskärl 2'
    regex_match_index: 0
    value_template: >
      {% set today = now().date() %}
      {% set month_map = {
        'jan': 'Jan', 'feb': 'Feb', 'mar': 'Mar', 'apr': 'Apr',
        'maj': 'May', 'jun': 'Jun', 'jul': 'Jul', 'aug': 'Aug',
        'sep': 'Sep', 'okt': 'Oct', 'nov': 'Nov', 'dec': 'Dec'
      } %}
      {% set dates = value | regex_findall(find='(\d{2})\s([a-zA-Z]{3})\s(\d{4})') %}

      {% set ns = namespace(next_date = none) %}
      {% for date_str in dates %}
        {% set parts = date_str.split(' ') %}
        {% set day = parts[0] %}
        {% set month = month_map[parts[1] | lower] %}
        {% set year = parts[2] %}
        {% set match_date = strptime(day + ' ' + month + ' ' + year, '%d %b %Y').date() %}
        {% if match_date > today and ns.next_date is none %}
          {% set ns.next_date = match_date %}
        {% endif %}
      {% endfor %}

      {% if ns.next_date %}
        {{ as_timestamp(ns.next_date) | timestamp_custom('%a %d %b %Y') }}
      {% else %}
        Uppdatera PDF för nästa år
      {% endif %}

So does it work now or…? See my edited answer above for why the integration throws the error. I would remove the regex_match_index parameter entirely, as it defaults to 0.
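One more thing about the configuration posted above: its pattern (\d{2})\s([a-zA-Z]{3})\s(\d{4}) has three capture groups. Assuming regex_findall behaves like Python's re.findall (Home Assistant templates run on Python), each match then comes back as a tuple of the three groups instead of one string, so date_str.split(' ') in the loop would be called on a tuple. A sketch with made-up sample text:

```python
import re

# Made-up stand-in for the section matched from the PDF.
TEXT = "Fyrfackskärl 1 21 jan 2025 04 feb 2025 Fyrfackskärl 2"

# Three capture groups: findall returns one tuple per match.
three_groups = re.findall(r"(\d{2})\s([a-zA-Z]{3})\s(\d{4})", TEXT)
# -> [('21', 'jan', '2025'), ('04', 'feb', '2025')]

# One capture group around the whole date: findall returns plain strings.
one_group = re.findall(r"(\d{2} \w{3} \d{4})", TEXT)
# -> ['21 jan 2025', '04 feb 2025']
```

Either use a single group around the whole date, or join each tuple back into a string (for example with the join(' ') filter in the template) before splitting it.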

  - platform: pdf
    name: Tunna 1
    file_path: sopor/sop_2025.pdf
    regex_search: 'Fyrfackskärl 1[\s\S]*?Fyrfackskärl 2'
    regex_match_index: 0
    value_template: >
      {% set today = now().date() %}
      {% set month_map = {
        'jan': 'Jan', 'feb': 'Feb', 'mar': 'Mar', 'apr': 'Apr',
        'maj': 'May', 'jun': 'Jun', 'jul': 'Jul', 'aug': 'Aug',
        'sep': 'Sep', 'okt': 'Oct', 'nov': 'Nov', 'dec': 'Dec'
      } %}
      {% set dates = value | regex_findall(find='(\d{2} \w{3} \d{4})') %}

      {% set ns = namespace(next_date = none) %}
      {% for date_str in dates %}
        {% set parts = date_str.split(' ') %}
        {% set day = parts[0] %}
        {% set month = month_map[parts[1] | lower] %}
        {% set year = parts[2] %}
        {% set match_date = strptime(day + ' ' + month + ' ' + year, '%d %b %Y').date() %}
        {% if match_date > today and ns.next_date is none %}
          {% set ns.next_date = match_date %}
        {% endif %}
      {% endfor %}

      {% if ns.next_date %}
        {{ as_timestamp(ns.next_date) | timestamp_custom('%a %d %b %Y') | replace('Mon', 'Mån') | replace('Tue', 'Tis') | replace('Wed', 'Ons') | replace('Thu', 'Tors') | replace('Fri', 'Fre') | replace('Sat', 'Lör') | replace('Sun', 'Sön') | replace('May', 'Maj') | replace('Oct', 'Okt') }}
      {% else %}
        Uppdatera PDF för nästa år
      {% endif %}

Here is the working code. I changed regex_match_index to 0 and also changed the regex.
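An alternative to chaining replace() calls is to look up the Swedish names by weekday and month number, which covers all seven days and only ever touches the day and month fields. A Python sketch of the idea; SV_DAYS, SV_MONTHS and format_swedish are illustrative names, and 'Lör'/'Sön' for Saturday/Sunday are assumed spellings in the same style as the abbreviations used above:

```python
from datetime import date

# Indexed by date.weekday() (Monday == 0) and by date.month - 1.
SV_DAYS = ["Mån", "Tis", "Ons", "Tors", "Fre", "Lör", "Sön"]
SV_MONTHS = ["Jan", "Feb", "Mar", "Apr", "Maj", "Jun",
             "Jul", "Aug", "Sep", "Okt", "Nov", "Dec"]


def format_swedish(d: date) -> str:
    """Format like '%a %d %b %Y', but with Swedish day and month names."""
    return f"{SV_DAYS[d.weekday()]} {d.day:02d} {SV_MONTHS[d.month - 1]} {d.year}"
```

For example, format_swedish(date(2025, 5, 13)) gives "Tis 13 Maj 2025". The same lookup should carry over to the template as a Jinja list indexed with ns.next_date.weekday() and ns.next_date.month - 1.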