Removing numbers from string

So I have a string that consists of words and numbers, like:

The fox jumped over the hedge 3,19 and went for a walk in the 3,11 woods. He then met a raven and 12,6 had a chat.

I would like to replace these numbers with commas, but I'm quickly running into problems just using replace. If I chain replace statements, things somehow get messed up. I'm not sure if there is a better way.

If I use replace('3,19',',') | replace('3,11','') and so on, after a while I get too many commas. I also tried it with all available digits (e.g. replace(3,19,11), etc.), but that didn't give the expected results either.

Is there maybe a way to do something like replace(isdigit()), or something similar?

With your input of:

The fox jumped over the hedge 3,19 and went for a walk in the 3,11 woods. He then met a raven and 12,6 had a chat.

and your description of “replace these numbers with commas”, what do you want the output to look like?

You could simply remove all digits:

{{ my_string|reject('in','0123456789')|list|join }}

which would give you:

The fox jumped over the hedge , and went for a walk in the , woods. He then met a raven and , had a chat.
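In Jinja, a string is iterated one character at a time, so `reject('in', '0123456789')` drops each digit character and `join` stitches the remaining characters back together. The same idea in plain Python, as a rough illustration (the function name `strip_digits` is made up):

```python
def strip_digits(text: str) -> str:
    """Mimic Jinja's text|reject('in', '0123456789')|list|join."""
    # Iterating a str yields single characters; keep only the non-digits.
    return "".join(ch for ch in text if ch not in "0123456789")


sentence = ("The fox jumped over the hedge 3,19 and went for a walk in the "
            "3,11 woods. He then met a raven and 12,6 had a chat.")
print(strip_digits(sentence))
```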

I suggest you use regex_replace with pattern matching as opposed to replace with hard-coded values.

{% set x = 'The fox jumped over the hedge 3,19 and went for a walk in the 3,11 woods. He then met a raven and 12,6 had a chat.' %}
{{ x | regex_replace('(\d+,\d+ )', '') }}
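Home Assistant's `regex_replace` filter is, as far as I know, backed by Python's `re` module, so a pattern can be tried out in plain Python first. This sketch uses the same pattern; note that it only matches exactly two numbers joined by one comma and followed by a space, so a group like `5,3,11` would not be removed cleanly:

```python
import re

sentence = ("The fox jumped over the hedge 3,19 and went for a walk in the "
            "3,11 woods. He then met a raven and 12,6 had a chat.")

# Same pattern as the template: digits, a comma, digits, then one trailing
# space. Each whole match is replaced with an empty string.
cleaned = re.sub(r"(\d+,\d+ )", "", sentence)
print(cleaned)
```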


The reject approach actually works pretty well for my use case, as it keeps the commas in there. Thank you very much.

Yeah, I knew in the back of my mind that it wouldn’t really scale. I’m going to keep this regular expression in my tool chest, very handy indeed.

Much obliged.

That's also possible with regex_replace, but if you leave the commas in, they can end up in places where one wouldn't normally put a comma (like between the words the and woods).

Or is the example sentence merely for testing purposes and not representative of the actual data?

Yes, it's just an example sentence. The actual data is a lunch menu with items like:
Pasta 9
Soup with broccoli 1,3
Petit pois
Carrots
Potatoes 5,3,11

The digits refer to allergens listed further down the menu.

As I'm reading the lunch menu out via TTS, I'd ideally like to have commas in there, as long as they're in the right spot, because a comma inserts a pause. As the allergens are always listed after the menu items, I thought it would be favourable to at least turn those into commas.

And I can't just replace all whitespace with commas either, as some menu items consist of multiple words and that would cause an undesired pause.

It's not perfect, because if no allergens are listed I don't get a pause, but I'll take it unless there's a better option. I also thought about replacing each menu item with the same text plus a comma, but that is not very dynamic and would fail if the menu items ever change.
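Turning each allergen group into a single comma can be sketched with one regular expression. This assumes the menu arrives as one line of text and that an allergen group is a space followed by numbers separated by commas; the pattern and the `allergens_to_pauses` name are illustrative, not from the thread:

```python
import re

# Illustrative allergen pattern: an optional space, a number, then any number
# of ",<number>" parts, e.g. " 9", " 1,3" or " 5,3,11".
ALLERGENS = re.compile(r" ?\d+(?:,\d+)*")


def allergens_to_pauses(text: str) -> str:
    """Replace every allergen group with a single comma (a pause for TTS)."""
    return ALLERGENS.sub(",", text)


menu = "Pasta 9 Soup with broccoli 1,3 Petit pois Carrots Potatoes 5,3,11"
print(allergens_to_pauses(menu))
```

Items without allergens (Petit pois, Carrots) still get no pause between them, which is the limitation described above.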

How is that data represented in Home Assistant? Is it a true list or a string delimited by newlines?

Hmm, it comes in the form of a response_variable from a shell command. I can address it as response_variable.content or response_variable.content[0], so I guess it's a list or a dict?

Would the value of content[1] be Soup with broccoli 1,3?

If it is then content is a list.

Correct. I also just tried piping it to | replace('\n',','), but that didn't seem to make a difference. I guess there might not be any newlines in there. I think that's because this menu is passed through easyOCR, and I suspect it reads separate words without adding proper line breaks. I might have to adjust its parameters somewhat.

{% set food = ["Pasta 9","Soup with broccoli 1,3","Petit pois","Carrots","Potatoes 5,3,11"] %}
{{ food|map('reject','in','0123456789,')|map('join')|map('trim')|list|join(', ') }}

# returns
Pasta, Soup with broccoli, Petit pois, Carrots, Potatoes
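For comparison, here is the same filter chain spelled out in Python, assuming `food` is a real list with one menu item per element (which is what the template relies on):

```python
food = ["Pasta 9", "Soup with broccoli 1,3", "Petit pois", "Carrots", "Potatoes 5,3,11"]


def clean_item(item: str) -> str:
    # map('reject', 'in', '0123456789,') + map('join'): drop digits and commas.
    kept = "".join(ch for ch in item if ch not in "0123456789,")
    # map('trim'): remove the space left in front of the allergen numbers.
    return kept.strip()


spoken = ", ".join(clean_item(item) for item in food)
print(spoken)
```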

Hmm, that sure looks promising. With your changes, I have this for now:

    Hi person, good {{ states('sensor.time_of_day') }}. You are having {{
    response_variable.content[now().weekday()+1]|map('reject','in','0123456789,')|map('join')|map('trim')|list|join(', ') |
    replace('Monday','')|replace('Tuesday','')|replace('Wednesday','')|replace('Thursday','')|replace('Friday','')
    }} for lunch today

I got the output below instead. I think it's because your list consists of separate entries, whereas I only use the [1] entry (e.g. Tuesday), which is one long string of menu items. I guess if I could put all the words from response_variable.content[1] into another list, this could work?

  service_data:
    message: >-
      Hi person, good Afternoon. You are having T, u, e, s, d, a, y, , C, h, i,
      c, k, e, n, , c, u, r, r, y, , , , , , , , V, e, g, e, t, a, r, i, a, n, ,
      c, o, c, o, n, u, t, , c, u, r, r, y, , , , ., , , , S, l, i, c, e, d, ,
      c, a, r, r, o, t, s, , p, e, t, i, t, , p, o, I, S, , R, i, c, e, , S, o,
      u, p, , o, f, , t, h, e, , d, a, y, , , , , , ,  for lunch today

I may have made a typo, but this is with the actual lunch menu.
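The `T, u, e, s, d, a, y` output is what a list-oriented filter chain produces when it is handed a single string: `map` walks the string one character at a time. A small Python illustration, using a shortened piece of the Tuesday text as an assumed input:

```python
day = "Tuesday Chicken curry"

# Jinja's map() iterates whatever it is given; for a string that means one
# character at a time, just like this list comprehension.
per_char = [ch for ch in day]
print(", ".join(per_char))

# Splitting first yields words, closer to what the template expected,
# although a multi-word dish is still split into separate words.
print(day.split())
```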

If it’s a list, here’s how each one of its items can be stripped of any numbers and commas then presented as a comma-delimited string.

{% set content = 
['Pasta 9',
'Soup with broccoli 1,3',
'Petit pois',
'Carrots',
'Potatoes 5,3,11'] %}

{{ content | map('regex_replace', ' ?[0-9]+|,', '')
  | list | join(', ') }}
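A Python rendering of the same regex, applied to each list element. Because the optional space in ` ?[0-9]+` is consumed together with the first number, no separate trim step is needed (assuming, again, that `content` is a list with one item per element):

```python
import re

content = ["Pasta 9", "Soup with broccoli 1,3", "Petit pois", "Carrots", "Potatoes 5,3,11"]

# Same pattern as the template: an optional space followed by digits, or a
# lone comma. Every match is replaced with an empty string.
PATTERN = r" ?[0-9]+|,"
cleaned = [re.sub(PATTERN, "", item) for item in content]
print(", ".join(cleaned))
```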

This example uses a variable named content but obviously your template will need to use whatever is your response_variable (which, based on your latest example, appears to be response_variable.content).


EDIT

The following version generates a proper sentence that includes the word “and” before the final menu item.

Reference: Apples, oranges and bananas
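One way to build that kind of phrase, with commas between items and "and" before the last one, sketched in Python (the `spoken_list` helper is hypothetical):

```python
def spoken_list(items: list[str]) -> str:
    """Join items as 'a, b and c' so TTS reads a natural sentence."""
    if not items:
        return ""
    if len(items) == 1:
        return items[0]
    # Commas between all but the last item, then "and" before the last one.
    return ", ".join(items[:-1]) + " and " + items[-1]


print(spoken_list(["Apples", "oranges", "bananas"]))
```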

Try this version:

{%- set menu = (response_variable.content[now().isoweekday()] 
  | map('regex_replace', ' ?[0-9]+|,', '') | list)[1:]
  | join(', ') | capitalize -%}
Hi person, good {{ states('sensor.time_of_day') }}. You are having {{ menu }} for lunch today.
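What the template above does, step by step, in Python. The input `day_items` is a made-up list with the weekday as its first element, which is the shape the template assumes:

```python
import re

# Made-up input in the shape the template assumes: a list per day with the
# weekday as the first element.
day_items = ["Tuesday", "Chicken curry 13,14", "Rice", "Soup of the day 1,6,13"]

# map('regex_replace', ...) on every element, then [1:] to skip the weekday.
items = [re.sub(r" ?[0-9]+|,", "", item) for item in day_items][1:]

# Jinja's capitalize filter behaves like str.capitalize: the first character
# is upper-cased and everything else is lower-cased.
menu = ", ".join(items).capitalize()
print(menu)
```

Note that `capitalize` lower-cases everything after the first character, so "Rice" becomes "rice". For TTS that is harmless, but it matters if later filters try to match capitalized words such as "Monday".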

When I paste this one into an automation, I get:

Message malformed: template value should be a string for dictionary value @ data['action'][3]['data']

Post the full error and the full service call with the template please.

Odd, because what I suggested clearly does produce a string (as seen in the screenshots).

Like petro said, we will need to see how you’re using the template in the service call. Please post the entire automation.

BTW, this topic's title understates what your application is actually trying to achieve. 🙂 Removing numbers is just one aspect of how you're trying to format the raw data into a proper sentence for TTS.


EDIT

Demo of template that strips out the first list item (which I assume contains the weekday).

@123 Sure, there are lots of things that come into play with a seemingly small issue!

Ok, that was an indentation problem. Works now.

So response_variable gives me this:

context:
  id: 01HSBKT7E1Q926KWQVJB1GRE7D
  parent_id: 01HSBKT7DA8P89BJX3JX7MQZCZ
  user_id: null
response_variable:
  content:
    - 'Lunch Menu Week commencing: Monday 4th March 2024'
    - >-
      Monday Meatfree monday Pasta Homemade tomalolcheese sauce 1,13 Sweelcorn
      Soup of the day 1,6,13
    - >-
      Tuesday Chicken curry 13,14 Vegetarian coconut curry 13.14 Sliced carrots
      petit poIS Rice Soup of the day 1,6,13
    - >-
      Wednesday Savoury mince 13,14 Savoury quorn 6,11,13,14 New potatoes
      Broccoli Soup of the day 1,6,.13.
    - >-
      Thursday Egg Noodles 11,13,14, Peppers, Broccoli and leeks Petit pois Soup
      of the day 1,6,13
    - Friday Pizza bakes French fries Petit pois Sweetcorn Baked beans
    - >-
      Everyday Wholewheat fusilli 3,15 JacketSweet potatoes Fresh fruitl 1/
      Jelly (V) Fresh salad bar
    - >-
      Allergen Key: Milk , 2. Fish; 3 Gluten; 4. Peanuts; 5. Tree nuts Soya
      Sesame 8 , Lupin, Shellfish , 10. Molluscs, 11, Egg; 12. Sulphite , 13.
      Celery; Mustard "MENU IS SUBJECT TO CHANGE SHOULD INGREDIENTS BE
      UNAVAILABLE" 15 Wheat:

I can index it as response_variable.content[now().weekday()+1], which gives today's entry. If I do, I get one long string with all of today's menu items.
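`now()` in Home Assistant templates returns a Python `datetime`, so `weekday()` counts Monday as 0 while `isoweekday()` counts Monday as 1; with the header line at index 0, both `content[weekday()+1]` and `content[isoweekday()]` select the same day. A quick check in Python, using stand-in strings in place of the real entries:

```python
from datetime import date, timedelta

# Stand-in for response_variable.content: index 0 is the header line and only
# the first word of each day's entry is kept here.
content = [
    "Lunch Menu Week commencing: Monday 4th March 2024",
    "Monday", "Tuesday", "Wednesday", "Thursday", "Friday",
]

week_start = date(2024, 3, 4)  # the Monday named in the header
for offset in range(5):
    day = week_start + timedelta(days=offset)
    # weekday(): Monday == 0 ... Friday == 4
    # isoweekday(): Monday == 1 ... Friday == 5
    assert content[day.weekday() + 1] == content[day.isoweekday()]
    print(day.isoformat(), content[day.isoweekday()])
```

With the posted data, index 6 is the "Everyday" entry and index 7 the allergen key, so on Saturday and Sunday either expression returns something that isn't a day's menu.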

I can see that both solutions work great on a list of separate words/entries, but because mine is one long string (the indexed string below, without the commas), I get this instead:

 Hi person, good Afternoon. You are having T, u, e, s, d, a, y, , C, h, i,
      c, k, e, n, , c, u, r, r, y, , , , , , , , V, e, g, e, t, a, r, i, a, n, ,
      c, o, c, o, n, u, t, , c, u, r, r, y, , , , ., , , , S, l, i, c, e, d, ,
      c, a, r, r, o, t, s, , p, e, t, i, t, , p, o, I, S, , R, i, c, e, , S, o,
      u, p, , o, f, , t, h, e, , d, a, y, , , , , , ,  for lunch today

The only way I can think of to apply these solutions is to put every word in the entry into a list and then do all of the maps and joins.

Something like this, I guess:
{% set words = response_variable.content[now().weekday() + 1].split() %}
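Splitting a day's entry into words is easy (shown here with Python's `str.split()`), but joining the words with commas puts a pause between every word, which is exactly the multi-word-item problem raised earlier. Using the Friday entry from the posted data:

```python
friday = "Friday Pizza bakes French fries Petit pois Sweetcorn Baked beans"

words = friday.split()          # split on any run of whitespace
spoken = ", ".join(words[1:])   # [1:] drops the weekday
print(spoken)
```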

I was hoping I could apply your :

{%- set menu = (response_variable.content[now().isoweekday()]
  | map('regex_replace', ' ?[0-9]+|,', '') | list)[1:]
  | join(', ') | capitalize
  | replace('Monday','') | replace('Tuesday','') | replace('Wednesday','')
  | replace('Thursday','') | replace('Friday','') -%}
Hi person, good {{ states('sensor.time_of_day') }}. You are having {{ menu }} for lunch today.

on response_variable.content instead, and that it would work. But it doesn't seem to. If I do that, I get:

  service_data:
    message: >-
      Hi person, good Afternoon. You are having Lunch Menu Week commencing:  th
      March,  Meatfree monday Pasta Homemade tomalolcheese sauce  Sweelcorn Soup
      of the day,  Chicken curry  Vegetarian coconut curry . Sliced carrots
      petit poIS Rice Soup of the day,  Savoury mince  Savoury quorn  New
      potatoes Broccoli Soup of the day ..,  Egg Noodles  Peppers Broccoli and
      leeks Petit pois Soup of the day,  Pizza bakes French fries Petit pois
      Sweetcorn Baked beans, Everyday Wholewheat fusilli  JacketSweet potatoes
      Fresh fruitl / Jelly (V) Fresh salad bar, Allergen Key: Milk  . Fish; 
      Gluten; . Peanuts; . Tree nuts Soya Sesame   Lupin Shellfish  . Molluscs 
      Egg; . Sulphite  . Celery; Mustard "MENU IS SUBJECT TO CHANGE SHOULD
      INGREDIENTS BE UNAVAILABLE"  Wheat: for lunch today

That is, I get the content of the whole list instead. Anyway, I'm happy with the solution I have for now, and I don't want to take up more of anyone's time. Thanks all so far!

Unfortunately, that’s because I was making assumptions about what your data looks like. Now that I see an actual example of it (in your latest post) it’s considerably different from what I had assumed.

When I asked this question about 10 posts ago:

Would the value of content[1] be Soup with broccoli 1,3?

If it is then content is a list

Your reply was “Correct”, so everything I suggested after that was based on that reply. It proved to be incorrect because, based on your latest example of the actual data, response_variable.content[1] doesn't contain a list; it contains a string.

Monday Meatfree monday Pasta Homemade tomalolcheese sauce 1,13 Sweelcorn Soup of the day 1,6,13

In other words, response_variable.content contains a list and each item in the list contains a string (not a list). Therefore everything I suggested is unusable because each day’s individual menu items are in a single string, not in a list.

An additional detail is that the string may contain no commas (or numbers). For example:

Friday Pizza bakes French fries Petit pois Sweetcorn Baked beans

If you want to insert commas so that TTS pauses between food items, then the algorithm would be to insert a comma before each capitalized word. If you simply use reject('in', '0123456789'), it produces this less-than-ideal phrase for TTS purposes:

Monday Meatfree monday Pasta Homemade tomalolcheese sauce , Sweelcorn Soup of the day ,,
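A sketch of the capital-letter approach in Python, assuming each day's entry is a single line of text (YAML's `>-` folds the line breaks into spaces) and that its first word is always the weekday. `menu_for_tts`, `ALLERGENS` and `ITEM_BREAK` are made-up names, and the allergen pattern is a guess based on the OCR output posted above:

```python
import re

# Assumption: an allergen group is an optional space, a digit, then any run of
# digits, commas or dots (the OCR text contains "13,14", "13.14", "1,6,.13.").
ALLERGENS = re.compile(r" ?\d[\d,.]*")
# A space right before a capital letter (optionally after an existing comma)
# is treated as the start of a new menu item.
ITEM_BREAK = re.compile(r",? (?=[A-Z])")


def menu_for_tts(day_entry: str) -> str:
    """Turn one day's OCR string into a comma-separated phrase for TTS."""
    text = ALLERGENS.sub("", day_entry)
    # Drop the leading weekday word ("Monday", "Tuesday", ...).
    _, _, rest = text.partition(" ")
    # Normalise every item boundary to exactly ", ".
    return ITEM_BREAK.sub(", ", rest).strip()


days = [
    "Monday Meatfree monday Pasta Homemade tomalolcheese sauce 1,13 Sweelcorn "
    "Soup of the day 1,6,13",
    "Thursday Egg Noodles 11,13,14, Peppers, Broccoli and leeks Petit pois Soup "
    "of the day 1,6,13",
    "Friday Pizza bakes French fries Petit pois Sweetcorn Baked beans",
]
for entry in days:
    print(menu_for_tts(entry))
```

Item boundaries are only guessed from capital letters, so "Egg Noodles" comes out as "Egg, Noodles" and "Pasta Homemade ..." is split as well; OCR casing errors will shift the pauses too. The same two patterns could presumably be passed to Home Assistant's `regex_replace` filter, but that is untested here.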

Anyway, lesson learned; instead of asking you if it’s a list or a string, I should have asked you to post the actual data.