Extract info from email with HTML tables

Hi All,

Can someone help me out with this? Every week I receive an email with my shifts for the next week. It is sent as an HTML table.

I want to extract the following fields to add them to an HA calendar:

Datum (date)
Dienst (will be the subject) (numbers are shift numbers, can also be GVL or R)
Begin: (start of the agenda item)
Einde: (end of the agenda item)

The rest is not needed.

Then I want to add those 7 days to an HA calendar.

In the past the email was plain text, and I used this trigger-based sensor:

- trigger:
    - trigger: event
      event_type: "imap_content"
      id: "custom_event"
      event_data:
        sender: !secret imap_sender_2
        initial: true
  sensor:
    - name: "Imap import DW NS" 
      state: >-
        {% if 'Persoonlijke' in trigger.event.data["subject"] %}
          Nieuwe DW
        {% else %}
          Geen nieuwe DW
        {% endif %}
      attributes:
        Message: "{{ trigger.event.data['text'] }}"
        Date: "{{ trigger.event.data['date'] }}"
        Server: "{{ trigger.event.data['server'] }}"
        Username: "{{ trigger.event.data['username'] }}"
        Search: "{{ trigger.event.data['search'] }}"
        Folder: "{{ trigger.event.data['folder'] }}"
        Sender: "{{ trigger.event.data['sender'] }}"
        Subject: "{{ trigger.event.data['subject'] }}"
        To: "{{ trigger.event.data['headers'].get('Delivered-To', ['n/a'])[0] }}"
        Return-Path: "{{ trigger.event.data['headers'].get('Return-Path',['n/a'])[0] }}"
        Received-first: "{{ trigger.event.data['headers'].get('Received',['n/a'])[0] }}"
        Received-last: "{{ trigger.event.data['headers'].get('Received',['n/a'])[-1] }}"

And this automation to add the shifts to the calendar:

- id: "1898c692-de40-4050-8a0b-a769681f8702"
  alias: "System - New DW NS into calendar"
  triggers:
    - trigger: state
      entity_id: sensor.imap_import_dw_ns
      attribute: Message

  conditions:
    - "{{ trigger.to_state.state not in ['unknown', 'unavailable', '', 'None']}}"

  actions:
    - variables:
        # message: "{{ trigger.to_state.attributes.Message | base64_decode }}"
        # message: "{{ state_attr('sensor.imap_import_dw_ns','Message') | base64_decode }}"
        message: "{{ state_attr('sensor.imap_import_dw_ns','Message') }}"
        event_list: >-
          {%- set m_list = message.rsplit('\n')|map('trim')|reject('eq', '')
          | map('replace', '   ', ' ')
          | map('replace', '  ', ' ') 
          | map('replace', 'E 1009','E1009')
          | map('replace', '= GVL','GVL 00:01 23:59 NRP NVT')
          | select('match', '^[a-z]{2}\s\d{2}-\d{2}') | list %}
          {%- set rest_list = m_list | select('search', ' R')|list %}
          {%- set work_list = m_list | reject('search', ' R')|list %}
          {%- set ns = namespace(events=[] ) %}

          {%- for item in work_list if item.split(' ')[3] != '--' %}
            {%- set date_list = item.split(' ')[1].split('-') %}
            {%- set (month, day) = (date_list[1]|int, date_list[0]|int)%}
            {%- set date = month~'-'~day %}
            {%- set start = now().year~'-'~date ~' '~ item.split()[3]~':00'~now().strftime('%z') %}
            {%- set end = now().year~'-'~date ~' '~ item.split()[4]~':00'~now().strftime('%z') %}
            {%- set summary = item.split()[2] ~' - VerbeterMee' %}
            {%- set ns.events = ns.events + [{"summary": summary, "start": start, "end": end}] %}
          {% endfor %}
          {{ ns.events }}

    - repeat:
        for_each: "{{ event_list }}"
        sequence:
          - action: calendar.create_event
            data:
              summary: "{{ repeat.item.summary }}"
              start_date_time: "{{ repeat.item.start }}"
              end_date_time: "{{ repeat.item.end }}"
              description: "Imported from DW NS"
            target:
              entity_id: calendar.work

Below is the raw source of the email as it is sent now:

Subject: Persoonlijke donderdagse week voor Week 48-2025 voor standplaats WAL

 - Amsterdam Zuid

Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3826.700.81\))

Content-Type: multipart/alternative;

boundary="Apple-Mail=_7B96B5B0-E1ED-481F-8B0E-53296D4FDE1A"

X-Apple-Auto-Saved: 1

X-Universally-Unique-Identifier: 7C68FBB7-52D4-4143-88EF-7FB33140EC56

X-Apple-Mail-Remote-Attachments: YES

From: [email protected]

X-Priority: 3 (Normal)

Resent-Message-Id: <91ABE7845D5651664323CAA9126EF1CDDB046762@QSERVER0DEPLOYM>

Resent-From: example <[email protected]>

X-Apple-Windows-Friendly: 1

Date: Thu, 20 Nov 2025 11:14:48 +0000

X-Apple-Base-Url: x-msg://2/

X-Apple-Mail-Signature: SKIP_SIGNATURE

Resent-Date: Mon, 24 Nov 2025 17:52:06 +0100

Resent-To: Peter Oudenes <[email protected]>

Message-Id: <[email protected]

X-Uniform-Type-Identifier: com.apple.mail-draft

To: <[email protected]>

 <[email protected]>

  

  

--Apple-Mail=_7B96B5B0-E1ED-481F-8B0E-53296D4FDE1A

Content-Transfer-Encoding: 7bit

Content-Type: text/plain;

charset=us-ascii

  

  

--Apple-Mail=_7B96B5B0-E1ED-481F-8B0E-53296D4FDE1A

Content-Transfer-Encoding: quoted-printable

Content-Type: text/html;

charset=us-ascii

  

<html><head></head><body dir=3D"auto" style=3D"overflow-wrap: =

break-word; -webkit-nbsp-mode: space; line-break: =

after-white-space;"><meta http-equiv=3D"content-type" =

content=3D"text/html; charset=3Dus-ascii"><div style=3D"overflow-wrap: =

break-word; -webkit-nbsp-mode: space; line-break: =

after-white-space;"><meta charset=3D"UTF-8"><table role=3D"presentation" =

border=3D"0" cellpadding=3D"0" cellspacing=3D"0" class=3D"body" =

style=3D"border-collapse: separate; width: 611px; background-color: =

rgb(244, 245, 246); caret-color: rgb(0, 0, 0); font-family: Helvetica, =

sans-serif; font-size: 16px; font-style: normal; font-variant-caps: =

normal; font-weight: 400; letter-spacing: normal; text-align: start; =

text-transform: none; white-space: normal; word-spacing: 0px; =

-webkit-text-stroke-width: 0px; text-decoration: none;"><tbody><tr><td =

style=3D"font-family: Helvetica, sans-serif; font-size: 16px; =

vertical-align: top;">&nbsp;</td><td class=3D"container" =

style=3D"font-family: Helvetica, sans-serif; font-size: 16px; =

vertical-align: top; max-width: 800px; padding: 8px 0px 0px !important; =

width: 602.09375px; margin: 0px auto !important;"><div class=3D"content" =

style=3D"box-sizing: border-box; display: block; margin: 0px auto; =

max-width: 800px; padding: 0px !important;"><table role=3D"presentation" =

border=3D"0" cellpadding=3D"0" cellspacing=3D"0" class=3D"main" =

style=3D"border-collapse: separate; width: 602.09375px; background: =

rgb(255, 255, 255); border-top-width: 1px; border-right-width: 0px =

!important; border-bottom-width: 1px; border-left-width: 0px !important; =

border-style: solid; border-color: rgb(234, 235, 237); border-image: =

none; border-radius: 0px !important;"><tbody><tr><td class=3D"wrapper" =

style=3D"font-family: Helvetica, sans-serif; font-size: 16px !important; =

vertical-align: top; box-sizing: border-box; padding: 8px =

!important;"><h3>Donderdagse Week =

48-2025</h3><br><br>Personeelsnummer: 123456<br><br><table border=3D"1" =

style=3D"border-collapse: separate; width: 586.09375px;"><thead><tr><th =

scope=3D"col">Datum</th><th scope=3D"col">Dienst</th><th =

scope=3D"col">Begin</th><th scope=3D"col">Einde</th><th =

scope=3D"col">Fun</th><th =

scope=3D"col">Stdp</th></tr></thead><tbody><tr><th scope=3D"row">ma =

24-11-2025</th><td style=3D"font-family: Helvetica, sans-serif; =

font-size: 16px !important; vertical-align: top;">6004</td><td =

style=3D"font-family: Helvetica, sans-serif; font-size: 16px !important; =

vertical-align: top;">09:00</td><td style=3D"font-family: Helvetica, =

sans-serif; font-size: 16px !important; vertical-align: =

top;">17:00</td><td style=3D"font-family: Helvetica, sans-serif; =

font-size: 16px !important; vertical-align: top;">Servicemdw cv</td><td =

style=3D"font-family: Helvetica, sans-serif; font-size: 16px !important; =

vertical-align: top;">WAL - Amsterdam Zuid</td></tr><tr><th =

scope=3D"row">di 25-11-2025</th><td style=3D"font-family: Helvetica, =

sans-serif; font-size: 16px !important; vertical-align: =

top;">6032</td><td style=3D"font-family: Helvetica, sans-serif; =

font-size: 16px !important; vertical-align: top;">13:00</td><td =

style=3D"font-family: Helvetica, sans-serif; font-size: 16px !important; =

vertical-align: top;">21:00</td><td style=3D"font-family: Helvetica, =

sans-serif; font-size: 16px !important; vertical-align: top;">Servicemdw =

cv</td><td style=3D"font-family: Helvetica, sans-serif; font-size: 16px =

!important; vertical-align: top;">WAL - Amsterdam</td></tr><tr><th =

scope=3D"row">wo 26-11-2025</th><td style=3D"font-family: Helvetica, =

sans-serif; font-size: 16px !important; vertical-align: =

top;">6033</td><td style=3D"font-family: Helvetica, sans-serif; =

font-size: 16px !important; vertical-align: top;">13:00</td><td =

style=3D"font-family: Helvetica, sans-serif; font-size: 16px !important; =

vertical-align: top;">21:00</td><td style=3D"font-family: Helvetica, =

sans-serif; font-size: 16px !important; vertical-align: top;">Servicemdw =

cv</td><td style=3D"font-family: Helvetica, sans-serif; font-size: 16px =

!important; vertical-align: top;">WAL - Amsterdam</td></tr><tr><th =

scope=3D"row">do 27-11-2025</th><td style=3D"font-family: Helvetica, =

sans-serif; font-size: 16px !important; vertical-align: top;">R</td><td =

style=3D"font-family: Helvetica, sans-serif; font-size: 16px !important; =

vertical-align: top;"></td><td style=3D"font-family: Helvetica, =

sans-serif; font-size: 16px !important; vertical-align: top;"></td><td =

style=3D"font-family: Helvetica, sans-serif; font-size: 16px !important; =

vertical-align: top;"></td><td style=3D"font-family: Helvetica, =

sans-serif; font-size: 16px !important; vertical-align: =

top;"></td></tr><tr><th scope=3D"row">vr 28-11-2025</th><td =

style=3D"font-family: Helvetica, sans-serif; font-size: 16px !important; =

vertical-align: top;">R</td><td style=3D"font-family: Helvetica, =

sans-serif; font-size: 16px !important; vertical-align: top;"></td><td =

style=3D"font-family: Helvetica, sans-serif; font-size: 16px !important; =

vertical-align: top;"></td><td style=3D"font-family: Helvetica, =

sans-serif; font-size: 16px !important; vertical-align: top;"></td><td =

style=3D"font-family: Helvetica, sans-serif; font-size: 16px !important; =

vertical-align: top;"></td></tr><tr><th scope=3D"row">za =

29-11-2025</th><td style=3D"font-family: Helvetica, sans-serif; =

font-size: 16px !important; vertical-align: top;">6003</td><td =

style=3D"font-family: Helvetica, sans-serif; font-size: 16px !important; =

vertical-align: top;">10:00</td><td style=3D"font-family: Helvetica, =

sans-serif; font-size: 16px !important; vertical-align: =

top;">18:00</td><td style=3D"font-family: Helvetica, sans-serif; =

font-size: 16px !important; vertical-align: top;">Servicemdw cv</td><td =

style=3D"font-family: Helvetica, sans-serif; font-size: 16px !important; =

vertical-align: top;">WAL - Amsterdam</td></tr><tr><th =

scope=3D"row">zo 30-11-2025</th><td style=3D"font-family: Helvetica, =

sans-serif; font-size: 16px !important; vertical-align: =

top;">6001</td><td style=3D"font-family: Helvetica, sans-serif; =

font-size: 16px !important; vertical-align: top;">09:45</td><td =

style=3D"font-family: Helvetica, sans-serif; font-size: 16px !important; =

vertical-align: top;">17:45</td><td style=3D"font-family: Helvetica, =

sans-serif; font-size: 16px !important; vertical-align: top;">Servicemdw =

cv</td><td style=3D"font-family: Helvetica, sans-serif; font-size: 16px =

!important; vertical-align: top;">WAL - Amsterdam =
</td></tr></tbody><caption>Donderdagseweek =

planning</caption></table><p style=3D"font-family: Helvetica, =

sans-serif; font-size: 16px !important; font-weight: normal; margin: 0px =

0px 16px;"><br></p><p style=3D"font-family: Helvetica, sans-serif; =

font-size: 16px !important; font-weight: normal; margin: 0px 0px =

16px;"><span =

class=3D"Apple-converted-space">&nbsp;</span></p></td></tr></tbody></table=

><div class=3D"footer" style=3D"clear: both; padding-top: 24px; =

text-align: center; width: 602.09375px;"><table role=3D"presentation" =

border=3D"0" cellpadding=3D"0" cellspacing=3D"0" style=3D"border-collapse:=

 separate; width: 602.09375px;"><tbody><tr><td class=3D"content-block" =

style=3D"font-family: Helvetica, sans-serif; font-size: 16px; =

vertical-align: top; color: rgb(154, 158, 166); text-align: =

center;"><span class=3D"apple-link" style=3D"color: rgb(154, 158, 166); =

font-size: 16px; text-align: center;">BV n.v.</td></tr></tbody></table></div></div></td><td =

style=3D"font-family: Helvetica, sans-serif; font-size: 16px; =

vertical-align: =

top;">&nbsp;</td></tr></tbody></table></div></body></html>=

  

--Apple-Mail=_7B96B5B0-E1ED-481F-8B0E-53296D4FDE1A--

Not a pretty solution but splitting the table and regexing the contents does seem to work.
I assume the R means day off or something?
The regex currently skips them but that could obviously be changed.

{% set tbl = '<html><head></head><body dir=3D"auto" style=3D"overflow-wrap: =

break-word; -webkit-nbsp-mode: space; line-break: =

after-white-space;"><meta http-equiv=3D"content-type" =

content=3D"text/html; charset=3Dus-ascii"><div style=3D"overflow-wrap: =

break-word; -webkit-nbsp-mode: space; line-break: =

after-white-space;"><meta charset=3D"UTF-8"><table role=3D"presentation" =

border=3D"0" cellpadding=3D"0" cellspacing=3D"0" class=3D"body" =

style=3D"border-collapse: separate; width: 611px; background-color: =

rgb(244, 245, 246); caret-color: rgb(0, 0, 0); font-family: Helvetica, =

sans-serif; font-size: 16px; font-style: normal; font-variant-caps: =

normal; font-weight: 400; letter-spacing: normal; text-align: start; =

text-transform: none; white-space: normal; word-spacing: 0px; =

-webkit-text-stroke-width: 0px; text-decoration: none;"><tbody><tr><td =

style=3D"font-family: Helvetica, sans-serif; font-size: 16px; =

vertical-align: top;">&nbsp;</td><td class=3D"container" =

style=3D"font-family: Helvetica, sans-serif; font-size: 16px; =

vertical-align: top; max-width: 800px; padding: 8px 0px 0px !important; =

width: 602.09375px; margin: 0px auto !important;"><div class=3D"content" =

style=3D"box-sizing: border-box; display: block; margin: 0px auto; =

max-width: 800px; padding: 0px !important;"><table role=3D"presentation" =

border=3D"0" cellpadding=3D"0" cellspacing=3D"0" class=3D"main" =

style=3D"border-collapse: separate; width: 602.09375px; background: =

rgb(255, 255, 255); border-top-width: 1px; border-right-width: 0px =

!important; border-bottom-width: 1px; border-left-width: 0px !important; =

border-style: solid; border-color: rgb(234, 235, 237); border-image: =

none; border-radius: 0px !important;"><tbody><tr><td class=3D"wrapper" =

style=3D"font-family: Helvetica, sans-serif; font-size: 16px !important; =

vertical-align: top; box-sizing: border-box; padding: 8px =

!important;"><h3>Donderdagse Week =

48-2025</h3><br><br>Personeelsnummer: 123456<br><br><table border=3D"1" =

style=3D"border-collapse: separate; width: 586.09375px;"><thead><tr><th =

scope=3D"col">Datum</th><th scope=3D"col">Dienst</th><th =

scope=3D"col">Begin</th><th scope=3D"col">Einde</th><th =

scope=3D"col">Fun</th><th =

scope=3D"col">Stdp</th></tr></thead><tbody><tr><th scope=3D"row">ma =

24-11-2025</th><td style=3D"font-family: Helvetica, sans-serif; =

font-size: 16px !important; vertical-align: top;">6004</td><td =

style=3D"font-family: Helvetica, sans-serif; font-size: 16px !important; =

vertical-align: top;">09:00</td><td style=3D"font-family: Helvetica, =

sans-serif; font-size: 16px !important; vertical-align: =

top;">17:00</td><td style=3D"font-family: Helvetica, sans-serif; =

font-size: 16px !important; vertical-align: top;">Servicemdw cv</td><td =

style=3D"font-family: Helvetica, sans-serif; font-size: 16px !important; =

vertical-align: top;">WAL - Amsterdam Zuid</td></tr><tr><th =

scope=3D"row">di 25-11-2025</th><td style=3D"font-family: Helvetica, =

sans-serif; font-size: 16px !important; vertical-align: =

top;">6032</td><td style=3D"font-family: Helvetica, sans-serif; =

font-size: 16px !important; vertical-align: top;">13:00</td><td =

style=3D"font-family: Helvetica, sans-serif; font-size: 16px !important; =

vertical-align: top;">21:00</td><td style=3D"font-family: Helvetica, =

sans-serif; font-size: 16px !important; vertical-align: top;">Servicemdw =

cv</td><td style=3D"font-family: Helvetica, sans-serif; font-size: 16px =

!important; vertical-align: top;">WAL - Amsterdam</td></tr><tr><th =

scope=3D"row">wo 26-11-2025</th><td style=3D"font-family: Helvetica, =

sans-serif; font-size: 16px !important; vertical-align: =

top;">6033</td><td style=3D"font-family: Helvetica, sans-serif; =

font-size: 16px !important; vertical-align: top;">13:00</td><td =

style=3D"font-family: Helvetica, sans-serif; font-size: 16px !important; =

vertical-align: top;">21:00</td><td style=3D"font-family: Helvetica, =

sans-serif; font-size: 16px !important; vertical-align: top;">Servicemdw =

cv</td><td style=3D"font-family: Helvetica, sans-serif; font-size: 16px =

!important; vertical-align: top;">WAL - Amsterdam</td></tr><tr><th =

scope=3D"row">do 27-11-2025</th><td style=3D"font-family: Helvetica, =

sans-serif; font-size: 16px !important; vertical-align: top;">R</td><td =

style=3D"font-family: Helvetica, sans-serif; font-size: 16px !important; =

vertical-align: top;"></td><td style=3D"font-family: Helvetica, =

sans-serif; font-size: 16px !important; vertical-align: top;"></td><td =

style=3D"font-family: Helvetica, sans-serif; font-size: 16px !important; =

vertical-align: top;"></td><td style=3D"font-family: Helvetica, =

sans-serif; font-size: 16px !important; vertical-align: =

top;"></td></tr><tr><th scope=3D"row">vr 28-11-2025</th><td =

style=3D"font-family: Helvetica, sans-serif; font-size: 16px !important; =

vertical-align: top;">R</td><td style=3D"font-family: Helvetica, =

sans-serif; font-size: 16px !important; vertical-align: top;"></td><td =

style=3D"font-family: Helvetica, sans-serif; font-size: 16px !important; =

vertical-align: top;"></td><td style=3D"font-family: Helvetica, =

sans-serif; font-size: 16px !important; vertical-align: top;"></td><td =

style=3D"font-family: Helvetica, sans-serif; font-size: 16px !important; =

vertical-align: top;"></td></tr><tr><th scope=3D"row">za =

29-11-2025</th><td style=3D"font-family: Helvetica, sans-serif; =

font-size: 16px !important; vertical-align: top;">6003</td><td =

style=3D"font-family: Helvetica, sans-serif; font-size: 16px !important; =

vertical-align: top;">10:00</td><td style=3D"font-family: Helvetica, =

sans-serif; font-size: 16px !important; vertical-align: =

top;">18:00</td><td style=3D"font-family: Helvetica, sans-serif; =

font-size: 16px !important; vertical-align: top;">Servicemdw cv</td><td =

style=3D"font-family: Helvetica, sans-serif; font-size: 16px !important; =

vertical-align: top;">WAL - Amsterdam</td></tr><tr><th =

scope=3D"row">zo 30-11-2025</th><td style=3D"font-family: Helvetica, =

sans-serif; font-size: 16px !important; vertical-align: =

top;">6001</td><td style=3D"font-family: Helvetica, sans-serif; =

font-size: 16px !important; vertical-align: top;">09:45</td><td =

style=3D"font-family: Helvetica, sans-serif; font-size: 16px !important; =

vertical-align: top;">17:45</td><td style=3D"font-family: Helvetica, =

sans-serif; font-size: 16px !important; vertical-align: top;">Servicemdw =

cv</td><td style=3D"font-family: Helvetica, sans-serif; font-size: 16px =

!important; vertical-align: top;">WAL - Amsterdam =
</td></tr></tbody><caption>Donderdagseweek =

planning</caption></table><p style=3D"font-family: Helvetica, =

sans-serif; font-size: 16px !important; font-weight: normal; margin: 0px =

0px 16px;"><br></p><p style=3D"font-family: Helvetica, sans-serif; =

font-size: 16px !important; font-weight: normal; margin: 0px 0px =

16px;"><span =

class=3D"Apple-converted-space">&nbsp;</span></p></td></tr></tbody></table=

><div class=3D"footer" style=3D"clear: both; padding-top: 24px; =

text-align: center; width: 602.09375px;"><table role=3D"presentation" =

border=3D"0" cellpadding=3D"0" cellspacing=3D"0" style=3D"border-collapse:=

 separate; width: 602.09375px;"><tbody><tr><td class=3D"content-block" =

style=3D"font-family: Helvetica, sans-serif; font-size: 16px; =

vertical-align: top; color: rgb(154, 158, 166); text-align: =

center;"><span class=3D"apple-link" style=3D"color: rgb(154, 158, 166); =

font-size: 16px; text-align: center;">BV n.v.</td></tr></tbody></table></div></div></td><td =

style=3D"font-family: Helvetica, sans-serif; font-size: 16px; =

vertical-align: =

top;">&nbsp;</td></tr></tbody></table></div></body></html>' %}


{{ tbl.split('<tr>')[4] | replace('\n','') | regex_findall('(\d{2}-\d{2}-\d{4}).*?>(\w{1,4})<.*?>(\d{2}:\d{2})<.*?>(\d{2}:\d{2})') }}

Index [4] is the Monday row, so replacing the [4] with 5, 6 and so on (up to 10 for Sunday) gives you the other days.

If you want to get the R-days then use:

{{ tbl.split('<tr>')[8] | replace('\n','') | regex_findall('(\d{2}-\d{2}-\d{4}).*?>(\w{1,4})<.*?>(\d{2}:\d{2})?<.*?>(\d{2}:\d{2})?') }}

In real life you need to split the HTML and then iterate over the rows or something.
This is just a proof of concept.
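
For what it's worth, the split-and-iterate step could look roughly like the sketch below. It is plain Python rather than a Home Assistant template, so it would have to run somewhere Python is available. `SAMPLE` and `parse_rows` are made-up names, and `SAMPLE` is a shortened, hypothetical stand-in for the email body above (two rows, most styling removed). The times are collected with a separate `findall` per row, so one pass handles both working days and R days.

```python
import re

# Hypothetical, shortened stand-in for the HTML part of the email.
# Like the real message it is quoted-printable: "=3D" stands for "=",
# and a "=" at the end of a line is a soft line break.
SAMPLE = (
    '<table border=3D"1"><thead><tr><th scope=3D"col">Datum</th></tr></thead>'
    '<tbody><tr><th scope=3D"row">ma =\n'
    '24-11-2025</th><td style=3D"vertical-align: =\n'
    'top;">6004</td><td style=3D"vertical-align: top;">09:00</td><td =\n'
    'style=3D"vertical-align: top;">17:00</td><td>Servicemdw cv</td></tr>'
    '<tr><th scope=3D"row">do 27-11-2025</th><td style=3D"vertical-align: =\n'
    'top;">R</td><td></td><td></td><td></td></tr></tbody></table>'
)


def parse_rows(html):
    """Return (date, shift, begin, end) for every table row that has a date."""
    # Drop the soft line breaks first, then any newlines that are left.
    flat = re.sub(r"=\r?\n", "", html).replace("\n", "")
    rows = []
    # Same trick as the template: every chunk after a "<tr>" is one row.
    for chunk in flat.split("<tr>")[1:]:
        head = re.search(r"(\d{2}-\d{2}-\d{4}).*?>(\w{1,4})<", chunk)
        if head is None:
            continue  # header or layout row without a date
        # Cells that contain nothing but HH:MM; empty on rest days.
        times = re.findall(r">(\d{2}:\d{2})<", chunk)
        begin, end = (times + [None, None])[:2]
        rows.append((head.group(1), head.group(2), begin, end))
    return rows


for row in parse_rows(SAMPLE):
    print(row)
```

For the sample this prints `('24-11-2025', '6004', '09:00', '17:00')` and `('27-11-2025', 'R', None, None)`. Splitting per row before matching matters here: a pattern with optional time groups run over the whole document could otherwise pick up the times of the next row for an R day.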

Would one of the parsing utilities like Beautiful Soup be able to process this?
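
Probably yes, a real HTML parser should cope with this table, but as far as I know you cannot import one inside a Home Assistant template, so it would have to run in a separate Python environment. Below is a minimal sketch of the idea that uses only the standard library (`quopri` and `html.parser`) instead of Beautiful Soup, to avoid an extra dependency. The names `RowCollector`, `shifts_to_events` and `SAMPLE_QP` are made up for illustration, the sample is a shortened stand-in for the real email, and it assumes the HTML part arrives still quoted-printable encoded. Rows without a begin and end time (R days) are skipped, and time zones are left out.

```python
import quopri
from datetime import datetime
from html.parser import HTMLParser

# Hypothetical, shortened stand-in for the quoted-printable HTML part.
SAMPLE_QP = (
    '<table role=3D"presentation"><tbody><tr><td>&nbsp;</td><td =\n'
    'class=3D"wrapper"><h3>Donderdagse Week =\n'
    '48-2025</h3><table border=3D"1"><thead><tr><th =\n'
    'scope=3D"col">Datum</th><th scope=3D"col">Dienst</th><th =\n'
    'scope=3D"col">Begin</th><th scope=3D"col">Einde</th></tr></thead>'
    '<tbody><tr><th scope=3D"row">ma =\n'
    '24-11-2025</th><td>6004</td><td>09:00</td><td>17:00</td></tr>'
    '<tr><th scope=3D"row">do 27-11-2025</th><td>R</td><td></td><td></td></tr>'
    '</tbody></table></td></tr></tbody></table>'
)


class RowCollector(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped per innermost <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            # A nested <tr> simply replaces the outer layout row.
            self._row, self._cell = [], None
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and normalise whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row, self._cell = None, None


def shifts_to_events(html_part_qp):
    """Turn the schedule rows into dicts with summary, start and end."""
    raw = quopri.decodestring(html_part_qp.encode("ascii"))
    collector = RowCollector()
    collector.feed(raw.decode("utf-8", errors="replace"))
    collector.close()
    events = []
    for cells in collector.rows:
        if len(cells) < 4 or not cells[0]:
            continue
        day_text = cells[0].split()[-1]  # "ma 24-11-2025" -> "24-11-2025"
        shift, begin, end = cells[1:4]
        if not (begin and end):
            continue  # rest day or other row without times
        try:
            start = datetime.strptime(f"{day_text} {begin}", "%d-%m-%Y %H:%M")
            stop = datetime.strptime(f"{day_text} {end}", "%d-%m-%Y %H:%M")
        except ValueError:
            continue  # header row such as Datum/Dienst/Begin/Einde
        events.append({"summary": shift,
                       "start": start.isoformat(),
                       "end": stop.isoformat()})
    return events


print(shifts_to_events(SAMPLE_QP))
```

The result is a list of dicts with `summary`, `start` and `end` keys, the same shape as the `event_list` variable in the automation above, so in principle it could feed the same `repeat` loop. How to get that list back into Home Assistant is a separate question that this sketch does not answer.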