InfluxDB stopping - can't work out why

I’ve not had a chance to look at this properly until now, but I don’t think InfluxDB has been stable for me for a while. It will start OK, but stops again for no obvious reason soon afterwards. Initially it would run for a few hours, but now it pretty much stops again as soon as it starts. I suspect this may have been the case since upgrading the add-on to the current version (3.6.2). This is what I see in the log view:

	/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.gen.go:1191 +0xec
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*tsmBatchKeyIterator).combineFloat(0x133eee60, 0x1ce3d501, 0x1e, 0x1ce3d500, 0x2)
	/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.gen.go:1108 +0x874
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*tsmBatchKeyIterator).mergeFloat(0x133eee60)
	/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.gen.go:1034 +0x260
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*tsmBatchKeyIterator).merge(0x133eee60)
	/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:1827 +0x34
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*tsmBatchKeyIterator).Next(0x133eee60, 0x1ddbf8a)
	/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:1687 +0xf44
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Compactor).write(0x4582fc0, 0xa1639f0, 0x42, 0x1abb930, 0x133eee60, 0xe95d01, 0x0, 0x0)
	/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:1140 +0x148
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Compactor).writeNewFiles(0x4582fc0, 0x13, 0x2, 0xae18720, 0x3, 0x4, 0x1abb930, 0x133eee60, 0x1, 0x1abb930, ...)
	/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:1044 +0x144
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Compactor).compact(0x4582fc0, 0x18e5000, 0xae18720, 0x3, 0x4, 0x0, 0x0, 0x0, 0x0, 0x0)
	/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:952 +0x4f4
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Compactor).CompactFull(0x4582fc0, 0xae18720, 0x3, 0x4, 0x0, 0x0, 0x0, 0x0, 0x0)
	/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:970 +0x108
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*compactionStrategy).compactGroup(0x139ef040)
	/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:2158 +0xdf4
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*compactionStrategy).Apply(0x139ef040)
	/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:2135 +0x2c
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).compactFull.func1(0xb70c140, 0x4c9ce40, 0x139ef040)
	/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:2104 +0xac
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).compactFull
	/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:2100 +0xd0

goroutine 611 [select]:
net/http.setRequestCancel.func3(0x0, 0x471e200, 0xd1b75c0, 0xb70c30c, 0x109c2c40)
	/usr/local/go/src/net/http/client.go:321 +0x78
created by net/http.setRequestCancel
	/usr/local/go/src/net/http/client.go:320 +0x21c

goroutine 858 [runnable]:
github.com/influxdata/influxdb/vendor/github.com/jwilder/encoding/simple8b.canPack(0x4715e00, 0x30, 0x30, 0x14, 0x3, 0x6e000)
	/go/src/github.com/influxdata/influxdb/vendor/github.com/jwilder/encoding/simple8b/encoding.go:444 +0x184
github.com/influxdata/influxdb/vendor/github.com/jwilder/encoding/simple8b.Encode(0x4715e00, 0x30, 0x30, 0x321c0ea1, 0xf0000002, 0x1, 0x0, 0x0)
	/go/src/github.com/influxdata/influxdb/vendor/github.com/jwilder/encoding/simple8b/encoding.go:315 +0x1bc
github.com/influxdata/influxdb/vendor/github.com/jwilder/encoding/simple8b.(*Encoder).flush(0x44835f0, 0x0, 0x0)
	/go/src/github.com/influxdata/influxdb/vendor/github.com/jwilder/encoding/simple8b/encoding.go:101 +0x68
github.com/influxdata/influxdb/vendor/github.com/jwilder/encoding/simple8b.(*Encoder).Bytes(0x44835f0, 0x43f28c60, 0x2, 0x0, 0x0, 0x0)
	/go/src/github.com/influxdata/influxdb/vendor/github.com/jwilder/encoding/simple8b/encoding.go:128 +0x28
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*encoder).encodePacked(0x47f0d80, 0x1, 0x0, 0x474e000, 0x3e8, 0x3e8, 0x474e000, 0x3e8, 0x3e8, 0x40, ...)
	/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/timestamp.go:158 +0x138
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*encoder).Bytes(0x47f0d80, 0x64d4e00, 0x40, 0x4951220, 0x11f4c, 0x9fb70)
	/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/timestamp.go:138 +0xec
github.com/influxdata/influxdb/tsdb/engine/tsm1.encodeStringBlockUsing(0x0, 0x0, 0x0, 0x17d7c000, 0x3e8, 0xc00, 0x1ab3610, 0x47f0d80, 0xbe2a6000, 0xfde8, ...)
	/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/encoding.go:870 +0xd8
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*cacheKeyIterator).encode.func1(0x1428ba20, 0x1, 0x462, 0x4edd880)
	/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:1987 +0x654
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*cacheKeyIterator).encode
	/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:1941 +0x80

goroutine 646 [runnable]:
github.com/influxdata/influxdb/tsdb/engine/tsm1.FloatArrayDecodeAll(0x54867739, 0x896, 0x502d1eb, 0x0, 0x0, 0x0, 0xbe3f6000, 0x3e8, 0x3e8, 0x0, ...)
	/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/batch_float.go:507 +0x438
github.com/influxdata/influxdb/tsdb/engine/tsm1.DecodeFloatArrayBlock(0x54864ef4, 0x30db, 0x502f12f, 0x1a01174c, 0x4, 0xbd81e1ac)
	/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/array_encoding.go:47 +0x1f0
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*tsmBatchKeyIterator).combineFloat(0x14440000, 0x1ce3d601, 0x25, 0x1ce3d620, 0x2)
	/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.gen.go:1074 +0x4f0
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*tsmBatchKeyIterator).mergeFloat(0x14440000)
	/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.gen.go:1034 +0x260
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*tsmBatchKeyIterator).merge(0x14440000)
	/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:1827 +0x34
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*tsmBatchKeyIterator).Next(0x14440000, 0x5eb58f9)
	/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:1695 +0xe8c
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Compactor).write(0x4494ae0, 0x8db6050, 0x42, 0x1abb930, 0x14440000, 0xe95d01, 0x0, 0x0)
	/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:1140 +0x148
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Compactor).writeNewFiles(0x4494ae0, 0x15, 0x2, 0x47a9e00, 0x3, 0x4, 0x1abb930, 0x14440000, 0x1, 0x1abb930, ...)
	/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:1044 +0x144
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Compactor).compact(0x4494ae0, 0x18e5000, 0x47a9e00, 0x3, 0x4, 0x0, 0x0, 0x0, 0x0, 0x0)
	/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:952 +0x4f4
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Compactor).CompactFull(0x4494ae0, 0x47a9e00, 0x3, 0x4, 0x0, 0x0, 0x0, 0x0, 0x0)
	/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:970 +0x108
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*compactionStrategy).compactGroup(0x51d9e00)
	/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:2158 +0xdf4
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*compactionStrategy).Apply(0x51d9e00)
	/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:2135 +0x2c
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).compactFull.func1(0xb70c120, 0x4e0c480, 0x51d9e00)
	/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:2104 +0xac
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).compactFull
	/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:2100 +0xd0

goroutine 854 [chan receive]:
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*cacheKeyIterator).Next(0x4edd880, 0xa21a12)
	/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:2026 +0xe8
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Compactor).write(0x45e69c0, 0x9d30690, 0x42, 0x1abb8e0, 0x4edd880, 0xe95d01, 0x0, 0x0)
	/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:1140 +0x148
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Compactor).writeNewFiles(0x45e69c0, 0x8, 0x1, 0x0, 0x0, 0x0, 0x1abb8e0, 0x4edd880, 0x1, 0x0, ...)
	/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:1044 +0x144
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Compactor).WriteSnapshot.func1(0x10d03400, 0x45e69c0, 0x1030caec, 0x4edd480, 0x55be380)
	/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:859 +0x94
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Compactor).WriteSnapshot
	/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:857 +0x16c
[cont-finish.d] executing container finish scripts...
[cont-finish.d] 99-message.sh: executing... 
[cont-finish.d] 99-message.sh: exited 0.
[cont-finish.d] done.
[s6-finish] waiting for services.
[s6-finish] sending all processes the TERM signal.
[s6-finish] sending all processes the KILL signal and exiting.

Here are the version details:

 Add-on version: 3.6.2
 You are running the latest version of this add-on.
 System: Raspbian GNU/Linux 10 (buster)  (armv7 / raspberrypi3)
 Home Assistant Core: 0.109.6
 Home Assistant Supervisor: 224

(I’m actually on a Pi4 with 4GB RAM)

It’s not obvious to me what the actual error in there is!

Can anyone help me debug this?

I also have issues with InfluxDB – running completely separately from my main Home Assistant instance, in Docker on an old i3 machine with Ubuntu.

I noticed a while back that the Docker container kept restarting. I couldn’t quite figure out what the issue was, so I deleted the DB and started again.

Today, after a couple of days, I noticed it is doing the same thing again :frowning:

My logs do not say much – it looks like it was busy compacting a shard and then it restarted.

2020-07-04T19:24:08.316690691Z ts=2020-07-04T19:24:08.316312Z lvl=info msg="Sending usage statistics to usage.influxdata.com" log_id=0NnpPyFW000
2020-07-04T19:24:09.252656606Z ts=2020-07-04T19:24:09.252136Z lvl=info msg="TSM compaction (start)" log_id=0NnpPyFW000 engine=tsm1 tsm1_level=2 tsm1_strategy=level trace_id=0NnpQOe0000 op_name=tsm1_compact_group op_event=start
2020-07-04T19:24:09.252746226Z ts=2020-07-04T19:24:09.252212Z lvl=info msg="Beginning compaction" log_id=0NnpPyFW000 engine=tsm1 tsm1_level=2 tsm1_strategy=level trace_id=0NnpQOe0000 op_name=tsm1_compact_group tsm1_files_n=4
2020-07-04T19:24:09.252790693Z ts=2020-07-04T19:24:09.252241Z lvl=info msg="Compacting file" log_id=0NnpPyFW000 engine=tsm1 tsm1_level=2 tsm1_strategy=level trace_id=0NnpQOe0000 op_name=tsm1_compact_group tsm1_index=0 tsm1_file=/var/lib/influxdb/data/home_assistant/autogen/4/000000008-000000002.tsm
2020-07-04T19:24:09.252830683Z ts=2020-07-04T19:24:09.252266Z lvl=info msg="Compacting file" log_id=0NnpPyFW000 engine=tsm1 tsm1_level=2 tsm1_strategy=level trace_id=0NnpQOe0000 op_name=tsm1_compact_group tsm1_index=1 tsm1_file=/var/lib/influxdb/data/home_assistant/autogen/4/000000016-000000002.tsm
2020-07-04T19:24:09.252863728Z ts=2020-07-04T19:24:09.252305Z lvl=info msg="Compacting file" log_id=0NnpPyFW000 engine=tsm1 tsm1_level=2 tsm1_strategy=level trace_id=0NnpQOe0000 op_name=tsm1_compact_group tsm1_index=2 tsm1_file=/var/lib/influxdb/data/home_assistant/autogen/4/000000024-000000002.tsm
2020-07-04T19:24:09.252899500Z ts=2020-07-04T19:24:09.252337Z lvl=info msg="Compacting file" log_id=0NnpPyFW000 engine=tsm1 tsm1_level=2 tsm1_strategy=level trace_id=0NnpQOe0000 op_name=tsm1_compact_group tsm1_index=3 tsm1_file=/var/lib/influxdb/data/home_assistant/autogen/4/000000032-000000002.tsm
2020-07-04T19:25:21.657775048Z ts=2020-07-04T19:25:21.657612Z lvl=info msg="InfluxDB starting" log_id=0NnpUoUG000 version=1.8.0 branch=1.8 commit=781490de48220d7695a05c29e5a36f550a4568f5
2020-07-04T19:25:21.672153320Z ts=2020-07-04T19:25:21.657644Z lvl=info msg="Go runtime" log_id=0NnpUoUG000 version=go1.13.8 maxprocs=4
2020-07-04T19:25:21.833566138Z ts=2020-07-04T19:25:21.833457Z lvl=info msg="Using data dir" log_id=0NnpUoUG000 service=store path=/var/lib/influxdb/data
2020-07-04T19:25:21.839714885Z ts=2020-07-04T19:25:21.833502Z lvl=info msg="Compaction settings" log_id=0NnpUoUG000 service=store max_concurrent_compactions=2 throughput_bytes_per_second=50331648 throughput_bytes_per_second_burst=50331648
2020-07-04T19:25:21.839752819Z ts=2020-07-04T19:25:21.839624Z lvl=info msg="Open store (start)" log_id=0NnpUoUG000 service=store trace_id=0NnpUpBl000 op_name=tsdb_open op_event=start
2020-07-04T19:25:22.266085790Z ts=2020-07-04T19:25:22.265842Z lvl=info msg="Opened file" log_id=0NnpUoUG000 engine=tsm1 service=filestore path=/var/lib/influxdb/data/_internal/monitor/2/000000001-000000001.tsm id=0 duration=35.983ms

Interesting – I’m having issues with the InfluxDB add-on going to 180% CPU. I wonder if it is HA rather than the add-on or InfluxDB itself.

How do you change the InfluxDB logging level?

Could you tell me how you set it up as a separate container? That is my next move. Can you migrate the database?

I eventually figured out that mine was quitting because the Docker container was running out of memory.
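
If you want to keep an eye on this yourself, docker stats shows live CPU and memory use per container – a rough sketch, where the container name "influxdb" is just a placeholder for whatever yours is called:

# Snapshot of current CPU / memory usage for the container
# ("influxdb" is a placeholder name – substitute your own)
docker stats --no-stream influxdb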

I tried activating TSI (the Time Series Index), which writes the indexes out to disk instead of keeping them in memory – https://www.influxdata.com/blog/how-to-overcome-memory-usage-challenges-with-the-time-series-index/
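
Roughly, enabling it on a Dockerised InfluxDB 1.x looks like the sketch below (my summary, not an exact recipe): the env var maps to the index-version setting in the [data] config section, and influx_inspect buildtsi converts the existing shards. The container and volume names are placeholders for whatever your setup uses.

# Start InfluxDB with the disk-based TSI index enabled for new shards
# (container/volume names are placeholders)
docker run -d --name influxdb \
  -e INFLUXDB_DATA_INDEX_VERSION=tsi1 \
  -p 8086:8086 \
  -v influxdb_data:/var/lib/influxdb \
  influxdb:1.8

# Convert existing shards to TSI while influxd is stopped
docker run --rm -v influxdb_data:/var/lib/influxdb influxdb:1.8 \
  influx_inspect buildtsi -datadir /var/lib/influxdb/data -waldir /var/lib/influxdb/wal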

Unfortunately this didn’t have an effect. Next I decided to limit the sensor values I was writing to InfluxDB to only the essentials, in an effort to keep memory usage low. I couldn’t save my existing dataset, though, so I eventually deleted the data and started afresh.

Will see how long it lasts this time.

I followed this guide to set up Grafana and InfluxDB: Complete guide on setting up Grafana/InfluxDB with Home assistant using official Docker images
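
For anyone who doesn’t want to read the whole guide, the core of it boils down to something like the sketch below (my rough summary, not necessarily identical to the guide; container, volume and database names are placeholders). The backup/restore part is one way to answer the migration question above.

# Run InfluxDB 1.8 in its own container with a persistent named volume
docker run -d --name influxdb --restart unless-stopped \
  -p 8086:8086 \
  -v influxdb_data:/var/lib/influxdb \
  influxdb:1.8

# One way to migrate an existing database: portable backup from the old
# instance, portable restore into the new one (both must be running)
influxd backup -portable -database home_assistant /tmp/influx_backup
influxd restore -portable -db home_assistant /tmp/influx_backup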

How were you able to see that? I thought the point was that containers expand as required.

Apparently not. I saw it with the docker ps command, which showed Status = “Exited (137) x minutes ago”, and also with docker inspect; then I Googled Docker exit code 137.

https://bobcares.com/blog/error-137-docker/
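
Something along these lines, in case anyone wants to run the same check (the container name is a placeholder):

# Last exit status of the container – 137 usually means SIGKILL / OOM
docker ps -a --filter name=influxdb
docker inspect --format 'ExitCode={{.State.ExitCode}} OOMKilled={{.State.OOMKilled}}' influxdb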

My InfluxDB CPU use went crazy again yesterday. I had a couple of databases with high retention periods; reducing those settled it all back down again.

I followed the link above and had a look at the disk size of the container. Shame I didn’t look before changing the retention period!
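
For anyone wanting to do the same, both steps are quick one-liners from the host. This is a sketch that assumes the stock container layout, a database called home_assistant and the default autogen retention policy – adjust the names and the duration to suit your setup:

# Reduce the retention period (old shards get dropped as they expire,
# which is what frees the disk space)
docker exec influxdb influx -execute \
  'ALTER RETENTION POLICY "autogen" ON "home_assistant" DURATION 30d SHARD DURATION 7d'

# Check how much disk each database is using inside the container
docker exec influxdb du -sh /var/lib/influxdb/data/*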

I’m hitting this as well. Really frustrating. Pi 4B 4GB. I’ve noticed the spikes to ~180% CPU while it’s running as well. My Pi has plenty of available RAM and disk space (of which Influx is only using 1.1GB), so it seems really weird for it to be an OOM kill…

Edit: Graph of CPU and Memory Usage. Green is CPU%, Red is Memory%. The spike in the middle is where it died, the spike on the right is when I noticed and started it again.

So, I’ve been poking around the Influx files a bit and discovered that the _internal.monitor data was by far the largest – about 1GB, while all of my others were no more than 25MB each. Additionally, I saw its on-disk size (du -sh) constantly fluctuating between ~800MB and ~1GB – I’m sure that was doing absolute wonders for the longevity of my uSD card…

Anyhow, some research indicated that the Influx monitor feature can be turned off (InfluxData actually recommend having it off in production). This can be done by adding an env var to the InfluxDB container. If you’re using the Supervisor add-on, that can be accomplished in the config like so:

envvars:
  - name: INFLUXDB_MONITOR_STORE_ENABLED
    value: 'false'

After that, I restarted the Influx container and used its web UI to run

DROP DATABASE _internal;

So far I’ve saved a GB on my uSD card, Influx no longer appears to be hammering the write cycles, the constant CPU spikes are gone, and CPU usage sits at ~10% as opposed to ~35%! The next couple of days will indicate whether this has fixed the crashing issue.
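
If you want to double-check the same thing on your own setup, the HTTP API makes it easy to confirm the _internal database is really gone (host, port and any auth are assumptions – adjust for your install):

# _internal should no longer appear in the list
curl -G 'http://localhost:8086/query' --data-urlencode 'q=SHOW DATABASES'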

Update 9/29 - That has indeed worked! I haven’t had Influx crash since.

I just wanted to say thank you for this solution. I have been having trouble with InfluxDB each and every time I install it, and after almost three years of Home Assistant and two years of tinkering with InfluxDB off and on, this was the first thing that worked.