Regex / imap help please?

Hi

I am trying to create a garbage collection sensor by scraping information from a reminder email that my local authority sends me.

The email looks like the attached screenshot. I have no idea how to go about creating the correct regex to extract the data.

I’d like to extract the date and the bin types.

The possible “bin” combinations are “General, Food, Recycling” or “Food, Recycling”.

Can anyone help please?

It’s probably better if you show us the HTML that builds that part of the email.

Are you using the IMAP Email Content integration?

If so, it is probably looking at the email’s raw data. Take a look at the source of the email and post it here.

Also, probably good to key off the sender and the subject as well as some element of the body of the email.

Thanks for the replies - yes, I am using the IMAP sensor with the sender specified.

Here is the HTML from the email:

<html>
<head>
    <style type=3D"text/css">
        .center {
            display: block;
            margin-left: auto;
            margin-right: auto;
            width: 50%;
        }

        /* /\/\/\/\/\/\/\/\/ CLIENT-SPECIFIC STYLES /\/\/\/\/\/\/\/\/ */
        #outlook a {
            padding: 0;
        }
        /* Force Outlook to provide a "view in browser" message */
        .ReadMsgBody {
            width: 100%;
        }

        .ExternalClass {
            width: 100%;
        }
            /* Force Hotmail to display emails at full width */
            .ExternalClass, .ExternalClass p, .ExternalClass span, .Externa=
lClass font, .ExternalClass td, .ExternalClass div {
                line-height: 100%;
            }
        /* Force Hotmail to display normal line spacing */
        body, table, td, p, a, li, blockquote {
            -webkit-text-size-adjust: 100%;
            -ms-text-size-adjust: 100%;
        }
        /* Prevent WebKit and Windows mobile changing default text sizes */
        table, td {
            mso-table-lspace: 0pt;
            mso-table-rspace: 0pt;
        }
        /* Remove spacing between tables in Outlook 2007 and up */
        img {
            -ms-interpolation-mode: bicubic;
        }
        /* Allow smoother rendering of resized image in Internet Explorer *=
/

        /* /\/\/\/\/\/\/\/\/ RESET STYLES /\/\/\/\/\/\/\/\/ */
        body {
            margin: 0;
            padding: 0;
            color: black;
        }

        img {
            border: 0;
            height: auto;
            line-height: 100%;
            outline: none;
            text-decoration: none;
        }

        table {
            border-collapse: collapse !important;
        }

        body, #bodyTable, #bodyCell {
            height: 100% !important;
            margin: 0;
            padding: 0;
            width: 100% !important;
        }

        /* /\/\/\/\/\/\/\/\/ TEMPLATE STYLES /\/\/\/\/\/\/\/\/ */

        /* =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D Page Styles =3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D */

        /**
            * @tab Page
            * @section background style
            * @tip Set the background color and top border for your email. =
You may want to choose colors that match your company's branding.
            * @theme page
            */
        body, #bodyTable {
            /*@editable*/
            /*Green background-color: #01383f; */
            background-color: transparent;
        }

        /* =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D Body Styles =3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D */
    </style>
</head>
<body style=3D"margin: 0px; font-family: 'Arial', 'Helvetica';">

    <table cellpadding=3D"0" cellspacing=3D"0" style=3D"background: #084b5c=
; width: 100%; padding: 0px;">
        <tr>
            <td><img src=3D"https://www.cardiff.gov.uk/ENG/PublishingImages=
/Email_banner_English.jpg" style=3D"width: 100%; max-width: 500px; margin: =
0px;" /></td>
        </tr>
    </table>

    <br />
    <br />

    <p style=3D"font-size: 14px; margin: 20px;">Hello</p>
    <p style=3D"font-size: 14px; margin: 20px;">This is your recycling and =
waste collection reminder for <b>28Paliment Square, , Cardiff, CF55 3nh</b>.</p>
    <p style=3D"font-size: 14px; margin: 20px;">Your next collection is on =
<b>Monday 31 August.</b></p>

    <table width=3D"100%" border=3D"0" cellpadding=3D"15" style=3D"margin: =
20px; width: 90%; max-width: 700px;">
        <tbody>
            <tr align=3D"center" style=3D"background-color: #e0e0e0;">
                <td colspan=3D"3">What to put out for collection</td>
            </tr>
            <tr align=3D"center" style=3D"background-color: #ececec;">
                <td><img src=3D'https://www.cardiff.gov.uk/ENG/resident/Rub=
bish-and-recycling/When-are-my-bins-collected/PublishingImages/10.png' styl=
e=3D'width: 100%;max-width: 130px;' /></td><td><img src=3D'https://www.card=
iff.gov.uk/ENG/resident/Rubbish-and-recycling/When-are-my-bins-collected/Pu=
blishingImages/22.png' style=3D'width: 100%;max-width: 130px;' /></td><td><=
img src=3D'https://www.cardiff.gov.uk/ENG/resident/Rubbish-and-recycling/Wh=
en-are-my-bins-collected/PublishingImages/19.png' style=3D'width: 100%;max-=
width: 130px;' /></td>
            </tr>
            <tr align=3D"center" style=3D"font-weight: bold; background-col=
or: #dedede;">
                <td>General</td><td>Food</td><td>Recycling</td>
            </tr>
        </tbody>
    </table>

    <br />

Just add this value template to your IMAP content sensor:

    value_template: >-
      {% if 'General' in body %}
        General, Food, Recycling
      {% else %}
        Food, Recycling
      {% endif %}

Ah. Missed the bit about date.


This is awesome thank you - just to be cheeky - do you happen to know how I would also get the collection date to come through?

That’s not so easy. I’m still thinking about it.


I believe a regex pattern could work. Sadly I can’t get the whole string into the template tool because it contains both ’ and " characters.
But with this limited part of it we get a result.


{{ str | regex_findall_index("\w+day \d{1,2} \w+") }}

It searches for a word ending in “day”, a space, one or two digits, another space, then another word.
Perhaps it will capture something else in the real string, but this is the best I can do for now.

EDIT: It can actually be expanded to {{ str | regex_findall_index("<b>(\w+day \d{1,2} \w+)\.</b></p>") }}
That way the date has to have the correct HTML tags around it too, which makes a wrong match less likely.

Regex101 says the pattern should hold. https://regex101.com/r/lRodTl/1

Thank you for the regex string - would you happen to know how I fit that together with the rest of the template above?

No not really.
I don’t use IMAP in HA.
Maybe Tom knows.

Try this:

    value_template: >-
      {% if 'General' in body %}
        General, Food, Recycling on {{ body | regex_findall_index("<b>(\w+day \d{1,2} \w+)\.</b></p>") }} 
      {% else %}
        Food, Recycling on {{ body | regex_findall_index("<b>(\w+day \d{1,2} \w+)\.</b></p>") }} 
      {% endif %}

Has anyone got any personal experience with the IMAP content sensor? Mine seems to work fine for a day and then the sensor state changes to unknown.

I get an email reminder once a week. Once the email is received, the sensor state changes to the correct values. I need this value to remain until I get the next email a week later.

Any ideas?


So you forgot to take out the trash… :slight_smile:

I just recently started looking at it in Node red so not very experienced with it.

But perhaps you can use an automation to “copy” the value to an input_text?

I always have the Wife to “gently” remind me when home assistant fails :slight_smile:

Thank you for your idea, sounds perfect - just need to work out how to implement that now - oh well, off down another rabbit hole I go :rofl:

Have an automation trigger on a state change of the sensor.
Condition: a template that checks the new state is not “unknown”, e.g. {{ trigger.to_state.state != "unknown" }}
Action: set the input_text.
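
Those three steps could be sketched roughly as below. This is only an illustration: the entity IDs `sensor.bin_collection` and `input_text.bin_collection` are made-up placeholders, and the `input_text` helper is assumed to exist already. `trigger.to_state.state` is the new state supplied by the state trigger.

```yaml
# Sketch only: both entity IDs are hypothetical placeholders.
automation:
  - alias: "Keep last bin collection reminder"
    trigger:
      # Fires whenever the IMAP sensor changes state
      - platform: state
        entity_id: sensor.bin_collection
    condition:
      # Skip the change back to unknown/unavailable so the old value is kept
      - condition: template
        value_template: "{{ trigger.to_state.state not in ['unknown', 'unavailable'] }}"
    action:
      # Copy the sensor's new value into the helper
      - service: input_text.set_value
        data:
          entity_id: input_text.bin_collection
          value: "{{ trigger.to_state.state }}"
```

Dashboards and other automations would then read the `input_text` instead of the sensor. An `input_text` holds 100 characters unless `max` is raised, which should be plenty for a value like “General, Food, Recycling on Monday 31 August”.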


I ended up creating an MQTT sensor and using Node-RED to populate the collection type and keep it persistent. This is working fine, but I can’t get that regex string to work for the date. I have tried to create a different sensor. The example above doesn’t get accepted in YAML, apparently due to the backslashes, so I have tried this:

    value_template: >-
      {{ str | regex_findall_index("\\w+day \\d{1,2} \\w+") }}

But all I get back is a value of “Unknown”.

I doubt you should use double backslash since that means “escape”.
The backslash has a “special meaning” and by using two of them you say “this backslash is not special” which is what you do not want.
You want the “special backslash”

But if you did it in node red? Did you use a function node?
You can use JavaScript in function nodes and have it return pretty much anything you want.
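
Two things may be worth checking on the “Unknown” result, offered as guesses rather than a confirmed diagnosis. First, inside a YAML block scalar (`>-`) backslashes are passed through untouched, so the single-backslash pattern from the combined template earlier in the thread should be accepted as it stands. Second, `str` was the test variable used in the template tool, whereas the IMAP sensor templates earlier in the thread read the email from `body`. If the new sensor is still an IMAP content sensor, a date-only template along those lines might look like this:

```yaml
# Sketch only: reuses the pattern and the `body` variable from the
# combined template earlier in the thread; not tested against a live sensor.
value_template: >-
  {{ body | regex_findall_index("<b>(\w+day \d{1,2} \w+)\.</b></p>") }}
```

Because the pattern contains one capture group, the result would be just the text inside the parentheses, e.g. “Monday 31 August” for the email posted above.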