How to keep container state in Docker Swarm

This question is a bit conceptual…

I start a service,

Docker runs container(s) on node(s) for this service,

I make changes inside the container(s) as I work,

At some point a container hits an exception and enters an unrecoverable state…

At this point I am not able to manage that one container (or containers) manually, for instance stopping and starting it to recover it, since Swarm is the manager of the containers.

What is the best practice for keeping the state of containers? There is “docker container commit”, for instance, but am I supposed to find which node each container runs on, look up the container IDs, and commit them manually? Should I define cron jobs for this purpose? Or should I simply not rely on Docker for such applications?
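To illustrate, this is roughly the manual workflow I am asking about (just a sketch; `my-service` and the image tag are placeholders):

```bash
# On a manager node: see which node each running task of the service is on
docker service ps my-service --filter desired-state=running

# Then, on that node: find the actual container ID for the task
docker ps --filter name=my-service

# And commit its state to an image (note: this captures the container's
# filesystem layers only, not the contents of mounted volumes)
docker container commit <container-id> my-service-backup:latest
```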

Thank you.

You’ll have more luck on the Docker forums.

I would like a bit more information to provide a more complete explanation. However…

Running anything, Home Assistant or any other application, within a Docker Swarm or even a Kubernetes setup requires one of two things:

  1. A stateless application
    or
  2. A resilient shared filesystem (generally a cluster/parallel file system) for stateful data.

Home Assistant is not a “stateless” application by nature, and thus to run effectively within any cluster environment, be it Docker Swarm, Kubernetes, or anything else, a shared file system is required. It is then the “job” of the file system to ensure its own consistency, and the application must also ensure the consistency of its own data commits to the file system. From what I know currently, HA is good about checking/validating its own consistency, but it’s only as good as what is stored on the disk.
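As a minimal sketch of what that looks like in Swarm terms, a service can mount an NFS-backed named volume so every node that runs the task sees the same state (the NFS server address, export path, and image tag here are placeholder assumptions; any shared file system with a suitable volume driver would do):

```bash
# Every node that schedules this task mounts the same NFS export at /config,
# so container state survives rescheduling (10.0.0.10 and the export path
# are placeholders)
docker service create \
  --name homeassistant \
  --mount 'type=volume,source=ha-config,target=/config,volume-driver=local,volume-opt=type=nfs,volume-opt=device=:/export/ha-config,volume-opt=o=addr=10.0.0.10' \
  homeassistant/home-assistant:stable
```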

The idea of running HA in a cluster makes sense for a High Availability, active/passive scenario, but Home Assistant to my knowledge was not designed for an active/active multi-writer setup, and that could pose data-consistency problems if multiple instances are trying to write to the data files at the same time. If proper locking is in place this may not be a problem, but it’s taking something and trying to cram it into a scenario I don’t believe it was intended for. (I could be wrong.)
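If you do try it, the safer shape is active/passive: keep the service above pinned to a single replica so there is only ever one writer, and let Swarm handle the failover (again just a sketch, using standard Swarm flags against the placeholder service from the previous example):

```bash
# One replica = one writer at a time; Swarm restarts or reschedules the
# task on failure rather than running two instances against the same files
docker service update \
  --replicas 1 \
  --restart-condition any \
  --restart-delay 5s \
  homeassistant
```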