Logspout add-on for sending HA logs to log management systems

I think it works! Thanks Bert


Hi

I'm trying to collect multiline log statements.

My pattern would be:

^(\x1B?\[([0-9]{1,2}(;[0-9]{1,2})?)?[m|K])(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})

The above pattern detects the shell color prefix that the default containers add. I've verified the regex on regex101.com with some log lines taken from the System/Log section - there the pattern is highlighted as expected.
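The pattern can also be sanity-checked locally. A minimal sketch in Python (the sample log line is made up; note this uses Python's `re` flavor purely for illustration, while Logspout itself is written in Go):

```python
import re

# Local check of the multiline pattern: an ANSI color prefix followed
# by a timestamp, as emitted by the default containers.
pattern = re.compile(
    r'^(\x1B?\[([0-9]{1,2}(;[0-9]{1,2})?)?[m|K])'  # color-code prefix
    r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})'       # timestamp
)
# Note: [m|K] is a character class, so it also matches a literal '|';
# [mK] is probably what is intended.

line = '\x1b[32m2023-11-20 10:15:30 ERROR something failed'
m = pattern.match(line)
print(bool(m))       # True: the prefixed line matches
print(m.group(4))    # 2023-11-20 10:15:30
```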

However, HA breaks the regex pattern into 2 lines after saving the add-on configuration. My suspicion is that this is what causes the pattern to break and no longer recognize the multilines. My interpretation is that when Logspout interprets the pattern during log parsing, it "re-assembles" the pattern into one line (removing the newline shown below) but then omits the whitespace between the day and the hour ("\d{2} \d{2}") - hence the pattern no longer works.

As a consequence, in Graylog all the lines that should be part of an error stacktrace are logged as single events.

Has anyone experienced something similar? Any suggestions how to fix this / prevent the line break? (I've tried replacing "value: >-" with just "value:" but then apparently HA is not happy with the config.)

The problem may indeed be that ">-" will replace a line break with a space. You can see this for example by using an online YAML-to-JSON converter.
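The folding is visible directly in a small YAML fragment (pattern shortened here for readability):

```yaml
env:
  - name: MULTILINE_PATTERN
    # '>-' is a folded block scalar: a single line break inside the
    # value is replaced by one space when the YAML is parsed, so the
    # two lines below parse to "^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"
    value: >-
      ^(\d{4}-\d{2}-\d{2}
      \d{2}:\d{2}:\d{2})
```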

Did you try 'Edit as YAML'? Maybe HA will preserve the format you type there, although it might break whenever you use the GUI editor again. I think the GUI editor is a bit broken and only works well for simple configuration options.

I must add, however, that multiline logging in Logspout is quite limited even when you do get the configuration right in HA. It might work well in a more homogeneous environment with containers logging in a similar format, but HA add-ons typically log in very different formats. Creating a multiline pattern for some of them without breaking the output of others may be challenging, to say the least.

I did intend to add a feature to configure multiline logging per container. But that would require a custom configuration file (to avoid problems like you have with the configurator) and some changes to the internals of Logspout. And when I thought about this, I immediately thought of some other features that would be nice. But then I would end up with a lot of customization, which in turn would require documentation and maintenance. So at least for now I decided not to go there; it's probably better for most users to have a simple, well-maintained add-on than a complex one which could in the end become unmaintained simply because it grew too big.

Thanks for the response.

Yes, I tried to edit it in the YAML view. However, the editor then won't let me save. I think it probably does some validation in the background.

I understand that you would like to keep the add-on simple.

Would you have a suggestion/workaround for how one could handle multiline logs (e.g., exception stacktraces)?
In a "regular" Docker environment it would be possible to send the container logs directly to Graylog with some syslog driver (and probably deal with multiline issues directly - although I've never done that). However, in the somewhat "protected" environment of Home Assistant Operating System this is probably not possible(? :thinking:)

For me it does save; I wonder if there is something wrong with your YAML then. This config saves, at least for me (HA version 2023.11.3):

routes:
  - gelf://graylog.home:12201
env:
  - name: MULTILINE_PATTERN
    value: ^(\x1B?\[([0-9]{1,2}(;[0-9]{1,2})?)?[m|K])(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})
hostname: ha

As far as I know, HA only validates that the configuration is valid YAML and matches this schema.

On the Home Assistant Operating System you are limited, I'm afraid. I know the Promtail add-on can be used, but Promtail unfortunately only targets Loki. Graylog also cannot combine the logs on its end, so I currently don't know of any other solutions or workarounds.


That's odd - when I tried to save it again today, it worked. However, the editor applied some auto-formatting afterwards and it is broken again (showing a newline where there should be none).

This input:

routes:
  - gelf://graylog.local.domain.com:12201
env:
  - name: SYSLOG_HOSTNAME
    value: homeassistant
  - name: INACTIVITY_TIMEOUT
    value: 1m
  - name: EXCLUDE_LABELS
    value: io.hass.name:Logspout addon,
  - name: MULTILINE_MATCH
    value: first
  - name: MULTILINE_PATTERN
    value: ^(\x1B?\[([0-9]{1,2}(;[0-9]{1,2})?)?[m|K])(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})
  - name: DEBUG
    value: "true"

Resulted again in this:

I happened to notice that there is a space in your pattern between the date and the time, so this line break should actually leave the value intact (it will be converted back to a space). You can verify this in the log output of the Logspout add-on.
So I guess it's the pattern itself that doesn't match. It may have to do with the color code; I'm not sure whether Logspout can match it like this. I would need to debug to be sure.

You are right - there was an error in the pattern. I was testing it with regex101.com but I seem to have missed something. Now the regex tester shows all the matches as desired.

However, the multiline logs are still not arriving properly in Graylog. My suspicion is now that Logspout and I are not using the same regex flavor, which would render my tests incorrect. Would you happen to know which regex flavor Logspout uses? I was not able to find out (I understand it is written in Go, I think? - but I'm no expert in that).
On regex101 I was using the Python flavor, as I've read that many things in HA are written in Python.

I had some time to debug today. I think it's going wrong already in the translation from the configuration YAML to environment variables, which is done by shell scripting and has problems with escapes. This is also visible in the log output of the add-on, where the logged pattern string differs from the actual pattern string. I didn't manage to find a way around this, so I'm afraid configuring patterns with escape characters is not possible this way, due to all the conversions happening before the value even arrives at Logspout.
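The kind of mangling described here can be illustrated with a small sketch (hypothetical, not the add-on's actual script): POSIX `printf '%s'` passes a value through untouched, whereas routing it through `echo` or an extra round of shell evaluation can expand backslash escapes such as `\x1B` before Logspout ever sees them.

```python
import os
import subprocess

# A regex fragment with escape sequences, as it would appear in the config.
pattern = r'^\x1B\[0m \d{4}'

# Passed through the environment and printed with printf '%s',
# every backslash survives intact:
out = subprocess.run(
    ['sh', '-c', 'printf %s "$PATTERN"'],
    env={**os.environ, 'PATTERN': pattern},
    capture_output=True, text=True,
).stdout
print(out == pattern)  # True

# Run the value through 'echo' or an extra round of shell evaluation
# instead, and escapes like \x1B may already be expanded into a literal
# ESC byte before the string reaches Logspout.
```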

A custom multiline adapter reading the config from yaml directly (and configurable per container) would be nice, I will reconsider adding this feature.

Thanks a lot for the investigation! Much appreciated!

A custom multiline adapter reading the config from yaml directly (and configurable per container) would be nice, I will reconsider adding this feature.

I'll patiently await this feature in that case :blush:

Anyone know how to force-INCLUDE an add-on? I am trying to exclude all add-ons by using "io.hass.type" in EXCLUDE_LABELS, but there doesn't seem to be a way to supersede this to include a specific add-on.
E.g., exclude all add-ons except ESPHome.

If you didn't manage to get this working with a combination of EXCLUDE_LABELS and a filter parameter (?filter.name=*_db), then it is probably not supported. Not sure how important this is for you; on most log servers you can also filter on the server side, but that uses unnecessary bandwidth, of course.

Note that you can specify the same route URL multiple times with different filter options. But this may still not be enough to achieve what you want.
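For reference, a hypothetical fragment with the same GELF endpoint listed twice under different container-name filters (`?filter.name=` is Logspout's routing syntax; the name patterns below are guesses, check your actual container names):

```yaml
routes:
  # Hypothetical: one route for the ESPHome add-on container only,
  # one for the Home Assistant core container.
  - gelf://graylog.home:12201?filter.name=*esphome*
  - gelf://graylog.home:12201?filter.name=homeassistant
```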

Hi!

Is anybody currently using this add-on actively with Grafana Cloud? I am trying to set it up, but something is going wrong silently.

I did enable debug logging through the env var, and it does show 401 or 404 errors if I misconfigure my Loki URL on purpose. So it looks like the actual URL does work and is accepted by Loki.

Nevertheless, Grafana still shows me 0 bytes of logs…

Anybody using Grafana Cloud with success right now? I might not be seeing the obvious…

The logs of the addon show this periodically, but I am not sure this is indicating an error condition.

2024/04/27 09:12:52 pump.Run() event: d06b88c180ac exec_create: /bin/sh -c curl --fail http://127.0.0.1:1337/healthz || exit 1
2024/04/27 09:12:53 pump.Run() event: d06b88c180ac exec_die
2024/04/27 09:12:54 pump.Run() event: 5d3dd10da205 exec_create: /bin/sh -c curl --fail http://127.0.0.1:1337/api/health || exit 1
2024/04/27 09:12:54 pump.Run() event: 5d3dd10da205 exec_die
2024/04/27 09:13:18 pump.pumpLogs(): cf9a97f46524 stopped with error: inactivity time exceeded timeout
2024/04/27 09:13:18 pump.pumpLogs(): cf9a97f46524 started, tail: all
2024/04/27 09:13:18 pump.pumpLogs(): 8cc79fcdf321 stopped with error: inactivity time exceeded timeout

Communication for this issue happened in this GitHub issue. The conclusion was that the Loki adapter does work correctly with Grafana Cloud. The timestamps were incorrect, though; this has been fixed in version 1.6.3.

I was able to get the Logspout add-on running from Bert's repo, and I have Graylog running via Docker on my Mac at 192.168.0.6.
I'm seeing this in the HA add-on logs for Logspout, and I don't see any logs in Graylog.
Help please??

2024/06/30 18:53:47 pump.Run(): using inactivity timeout:  1m0s
#   ADAPTER	ADDRESS			CONTAINERS	SOURCES	OPTIONS
#   gelf	192.168.0.6:12201				map[]
2024/06/30 18:53:47 pump.pumpLogs(): ef22779a99e4 started, tail: all
2024/06/30 18:53:47 pump.pumpLogs(): ef3325ffc574 started, tail: all
2024/06/30 18:53:47 pump.pumpLogs(): 6e6b6e181bd0 started, tail: all
2024/06/30 18:53:47 pump.pumpLogs(): d6190b6085ec started, tail: all
2024/06/30 18:53:47 pump.pumpLogs(): 13cea94e01fa started, tail: all
2024/06/30 18:53:47 pump.pumpLogs(): 2bc8681581ef started, tail: all
2024/06/30 18:53:47 pump.pumpLogs(): adb67ce5acaa started, tail: all
2024/06/30 18:53:47 pump.pumpLogs(): 0679f7bda39a started, tail: all
2024/06/30 18:53:47 pump.pumpLogs(): a928caa33a17 started, tail: all
2024/06/30 18:53:47 pump.pumpLogs(): ea808e02ff56 started, tail: all
2024/06/30 18:53:47 pump.pumpLogs(): 0706cdb307b3 started, tail: all
2024/06/30 18:53:47 pump.pumpLogs(): b9179a705cba started, tail: all
2024/06/30 18:53:47 pump.pumpLogs(): 9d304064bd7a started, tail: all
2024/06/30 18:53:47 pump.pumpLogs(): f033c564d217 started, tail: all
2024/06/30 18:53:47 pump.pumpLogs(): 0bb3342a373f started, tail: all
2024/06/30 18:53:47 pump.pumpLogs(): 9056a39f9a95 started, tail: all
2024/06/30 18:53:47 pump.pumpLogs(): 6f9d38ebb544 started, tail: all
2024/06/30 18:53:47 pump.pumpLogs(): f3ec650d362a started, tail: all
2024/06/30 18:53:47 pump.pumpLogs(): d708ab7f38c3 started, tail: all
2024/06/30 18:53:47 pump.pumpLogs(): a6a4c67716f5 started, tail: all
2024/06/30 18:53:47 pump.pumpLogs(): 18fb5e872cec started, tail: all
2024/06/30 18:53:52 pump.Run() event: 9d304064bd7a exec_create: /bin/sh -c curl --fail http://127.0.0.1:1337/healthz || exit 1
2024/06/30 18:53:52 pump.Run() event: f033c564d217 exec_create: /bin/sh -c if ! curl --fail http://127.0.0.1:9541  && ! curl --fail --insecure https://127.0.0.1:9541; then exit 1; fi
2024/06/30 18:53:52 pump.Run() event: 9d304064bd7a exec_die
2024/06/30 18:53:52 pump.Run() event: f033c564d217 exec_die
2024/06/30 18:54:04 pump.Run() event: b9179a705cba exec_create: /bin/sh -c curl --fail http://127.0.0.1:1337/api/health || exit 1
2024/06/30 18:54:04 pump.Run() event: b9179a705cba exec_die
2024/06/30 18:54:15 pump.Run() event: 13cea94e01fa exec_create: /bin/sh -c echo '{ "type": "describe" }'         | nc -w 1 localhost 10200         | grep -q "piper"         || exit 1
2024/06/30 18:54:17 pump.Run() event: 13cea94e01fa exec_die
2024/06/30 18:54:22 pump.Run() event: 9d304064bd7a exec_create: /bin/sh -c curl --fail http://127.0.0.1:1337/healthz || exit 1
2024/06/30 18:54:22 pump.Run() event: f033c564d217 exec_create: /bin/sh -c if ! curl --fail http://127.0.0.1:9541  && ! curl --fail --insecure https://127.0.0.1:9541; then exit 1; fi
2024/06/30 18:54:22 pump.Run() event: 9d304064bd7a exec_die
2024/06/30 18:54:22 pump.Run() event: f033c564d217 exec_die
2024/06/30 18:54:34 pump.Run() event: b9179a705cba exec_create: /bin/sh -c curl --fail http://127.0.0.1:1337/api/health || exit 1
2024/06/30 18:54:34 pump.Run() event: b9179a705cba exec_start: /bin/sh -c curl --fail http://127.0.0.1:1337/api/health || exit 1
2024/06/30 18:54:35 pump.Run() event: b9179a705cba exec_die
2024/06/30 18:54:47 pump.Run() event: 13cea94e01fa exec_create: /bin/sh -c echo '{ "type": "describe" }'         | nc -w 1 localhost 10200         | grep -q "piper"         || exit 1
2024/06/30 18:54:49 pump.Run() event: 13cea94e01fa exec_die
2024/06/30 18:54:52 pump.Run() event: 9d304064bd7a exec_create: /bin/sh -c curl --fail http://127.0.0.1:1337/healthz || exit 1
2024/06/30 18:54:52 pump.Run() event: f033c564d217 exec_create: /bin/sh -c if ! curl --fail http://127.0.0.1:9541  && ! curl --fail --insecure https://127.0.0.1:9541; then exit 1; fi
2024/06/30 18:54:52 pump.Run() event: f033c564d217 exec_start: /bin/sh -c if ! curl --fail http://127.0.0.1:9541  && ! curl --fail --insecure https://127.0.0.1:9541; then exit 1; fi
2024/06/30 18:54:52 pump.Run() event: 9d304064bd7a exec_die
2024/06/30 18:54:52 pump.Run() event: f033c564d217 exec_die
2024/06/30 18:55:05 pump.Run() event: b9179a705cba exec_create: /bin/sh -c curl --fail http://127.0.0.1:1337/api/health || exit 1
2024/06/30 18:55:05 pump.Run() event: b9179a705cba exec_die
2024/06/30 18:55:19 pump.Run() event: 13cea94e01fa exec_create: /bin/sh -c echo '{ "type": "describe" }'         | nc -w 1 localhost 10200         | grep -q "piper"         || exit 1
2024/06/30 18:55:21 pump.Run() event: 13cea94e01fa exec_die
2024/06/30 18:55:22 pump.Run() event: 9d304064bd7a exec_create: /bin/sh -c curl --fail http://127.0.0.1:1337/healthz || exit 1
2024/06/30 18:55:22 pump.Run() event: f033c564d217 exec_create: /bin/sh -c if ! curl --fail http://127.0.0.1:9541  && ! curl --fail --insecure https://127.0.0.1:9541; then exit 1; fi
2024/06/30 18:55:23 pump.Run() event: 9d304064bd7a exec_die
2024/06/30 18:55:23 pump.Run() event: f033c564d217 exec_die
2024/06/30 18:55:35 pump.Run() event: b9179a705cba exec_create: /bin/sh -c curl --fail http://127.0.0.1:1337/api/health || exit 1
2024/06/30 18:55:35 pump.Run() event: b9179a705cba exec_die
2024/06/30 18:55:47 pump.pumpLogs(): 6e6b6e181bd0 stopped with error: inactivity time exceeded timeout
2024/06/30 18:55:47 pump.pumpLogs(): ef3325ffc574 stopped with error: inactivity time exceeded timeout
2024/06/30 18:55:47 pump.pumpLogs(): ef3325ffc574 started, tail: all
2024/06/30 18:55:47 pump.pumpLogs(): d6190b6085ec stopped with error: inactivity time exceeded timeout
2024/06/30 18:55:47 pump.pumpLogs(): 6e6b6e181bd0 started, tail: all
2024/06/30 18:55:47 pump.pumpLogs(): d6190b6085ec started, tail: all
2024/06/30 18:55:47 pump.pumpLogs(): 13cea94e01fa stopped with error: inactivity time exceeded timeout
2024/06/30 18:55:47 pump.pumpLogs(): 13cea94e01fa started, tail: all
2024/06/30 18:55:47 pump.pumpLogs(): 2bc8681581ef stopped with error: inactivity time exceeded timeout
2024/06/30 18:55:47 pump.pumpLogs(): 2bc8681581ef started, tail: all
2024/06/30 18:55:47 pump.pumpLogs(): 0679f7bda39a stopped with error: inactivity time exceeded timeout
2024/06/30 18:55:47 pump.pumpLogs(): 0679f7bda39a started, tail: all
2024/06/30 18:55:47 pump.pumpLogs(): a928caa33a17 stopped with error: inactivity time exceeded timeout
2024/06/30 18:55:47 pump.pumpLogs(): a928caa33a17 started, tail: all
2024/06/30 18:55:47 pump.pumpLogs(): ea808e02ff56 stopped with error: inactivity time exceeded timeout
2024/06/30 18:55:47 pump.pumpLogs(): ea808e02ff56 started, tail: all
2024/06/30 18:55:47 pump.pumpLogs(): 0706cdb307b3 stopped with error: inactivity time exceeded timeout
2024/06/30 18:55:47 pump.pumpLogs(): 0706cdb307b3 started, tail: all
2024/06/30 18:55:47 pump.pumpLogs(): b9179a705cba stopped with error: inactivity time exceeded timeout
2024/06/30 18:55:47 pump.pumpLogs(): b9179a705cba started, tail: all
2024/06/30 18:55:47 pump.pumpLogs(): 9d304064bd7a stopped with error: inactivity time exceeded timeout
2024/06/30 18:55:47 pump.pumpLogs(): 9d304064bd7a started, tail: all
2024/06/30 18:55:47 pump.pumpLogs(): f033c564d217 stopped with error: inactivity time exceeded timeout
2024/06/30 18:55:47 pump.pumpLogs(): f033c564d217 started, tail: all
2024/06/30 18:55:47 pump.pumpLogs(): 9056a39f9a95 stopped with error: inactivity time exceeded timeout
2024/06/30 18:55:47 pump.pumpLogs(): 6f9d38ebb544 stopped with error: inactivity time exceeded timeout
2024/06/30 18:55:47 pump.pumpLogs(): 9056a39f9a95 started, tail: all
2024/06/30 18:55:47 pump.pumpLogs(): 6f9d38ebb544 started, tail: all
2024/06/30 18:55:47 pump.pumpLogs(): f3ec650d362a stopped with error: inactivity time exceeded timeout
2024/06/30 18:55:47 pump.pumpLogs(): f3ec650d362a started, tail: all
2024/06/30 18:55:47 pump.pumpLogs(): d708ab7f38c3 stopped with error: inactivity time exceeded timeout
2024/06/30 18:55:47 pump.pumpLogs(): 18fb5e872cec stopped with error: inactivity time exceeded timeout
2024/06/30 18:55:47 pump.pumpLogs(): a6a4c67716f5 stopped with error: inactivity time exceeded timeout
2024/06/30 18:55:47 pump.pumpLogs(): d708ab7f38c3 started, tail: all
2024/06/30 18:55:47 pump.pumpLogs(): a6a4c67716f5 started, tail: all
2024/06/30 18:55:47 pump.pumpLogs(): 18fb5e872cec started, tail: all
2024/06/30 18:55:51 pump.Run() event: 13cea94e01fa exec_create: /bin/sh -c echo '{ "type": "describe" }'         | nc -w 1 localhost 10200         | grep -q "piper"         || exit 1
2024/06/30 18:55:53 pump.Run() event: 9d304064bd7a exec_create: /bin/sh -c curl --fail http://127.0.0.1:1337/healthz || exit 1
2024/06/30 18:55:53 pump.Run() event: f033c564d217 exec_create: /bin/sh -c if ! curl --fail http://127.0.0.1:9541  && ! curl --fail --insecure https://127.0.0.1:9541; then exit 1; fi
2024/06/30 18:55:53 pump.Run() event: f033c564d217 exec_start: /bin/sh -c if ! curl --fail http://127.0.0.1:9541  && ! curl --fail --insecure https://127.0.0.1:9541; then exit 1; fi
2024/06/30 18:55:53 pump.Run() event: 9d304064bd7a exec_die
2024/06/30 18:55:53 pump.Run() event: f033c564d217 exec_die
2024/06/30 18:55:53 pump.Run() event: 13cea94e01fa exec_die
2024/06/30 18:56:05 pump.Run() event: b9179a705cba exec_create: /bin/sh -c curl --fail http://127.0.0.1:1337/api/health || exit 1
2024/06/30 18:56:05 pump.Run() event: b9179a705cba exec_die

Everything in the log seems fine, so I guess it's a configuration or usage error on the Graylog side. When you configure an incorrect port, or when the Graylog server is down, for example, you should see an error, even with debug logging disabled.
I once wrote about my setup. It's a bit outdated, but maybe it will help you identify the problem.

Thanks Bert, I'm familiar with that post of yours and have reviewed it.
This is my Logspout config in HA

I created a System Input in Graylog; not sure if that was needed. The system overview looks like this.

Anything I should check, or other steps I might have missed?
Is there an easy way to verify that 192.168.0.6:12201 can receive messages from the HA containers?

Thanks!
-Logan
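One quick way to check whether a Graylog GELF UDP input is reachable is to send it a single message by hand. A sketch in Python (GELF over UDP accepts zlib-compressed JSON; the host below is a placeholder - substitute the 192.168.0.6:12201 from this thread):

```python
import json
import socket
import zlib

def send_gelf_udp(host, port, short_message, source="ha-test"):
    """Send one zlib-compressed GELF 1.1 message over UDP."""
    msg = {"version": "1.1", "host": source, "short_message": short_message}
    payload = zlib.compress(json.dumps(msg).encode("utf-8"))
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))

# Replace 127.0.0.1 with your Graylog host (e.g. 192.168.0.6); if the
# message shows up in the Graylog search, the GELF UDP input is working.
send_gelf_udp("127.0.0.1", 12201, "GELF connectivity test")
```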