I am trying to extract data from usage alerts that my electric provider sends out daily. I’d like to, at some point, add them up and be kind of a running monthly total that I can reference. Anyway, I am having trouble using the templates to accurately extract the data I need from the emails. The emails look like this:
Dear Customer,
The usage for account 123456789 is 63() kWh which is approximately 6.96 dollars.
Seems simple enough, but I have tried splitting on spaces and extracting the nth value, and for whatever reason, the splits aren’t adding up or pulling two words. The below template returns “6.96 dollars.” and not “6.96” as I would expect:
This should work with the sample data you provided and any other data that contains a number followed by the word “dollars”.
value_template: "{{ value | regex_findall_index(find='(\d+\.?\d+) dollars', index=0, ignorecase=False) }}"
This regex pattern
(\d+\.?\d+) dollars
finds one or more digits followed by an optional decimal point followed by one or more digits. Therefore it will match numbers such as:
6.96
1234.50
75
The number must be followed by a space character and the word dollars. If all requirements are met, the matching number is stored in a list. That’s because \d+\.?\d+ is wrapped in parentheses (\d+\.?\d+) which means it’s a regex capture group. The option index=0 means the function should report the zeroth item in the list (meaning the first item because lists are zero-based).
NOTE:
To provide further insight into how regex_findall_index works, if you were to remove the space character and the word dollars from the regex pattern it would find all three numbers in your sample data and store them in a list:
0 -> 123456789
1 -> 63
2 -> 6.96
So index=0 means the function would return 123456789.