To Proxmox or not to Proxmox

Whenever you see a widget-toolkit update, it breaks the no-nag patch.
You have to re-run the post-install script and re-install the dark theme.
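For reference, what the update overwrites is the small patch the post-install script makes to proxmoxlib.js. A rough sketch of re-applying it by hand; the exact string to replace changes between widget-toolkit versions, so treat the sed pattern below as an assumption and check the current script:

# Back up and patch the subscription-nag check in the widget toolkit (pattern is version-dependent)
sed -i.bak "s/data.status !== 'Active'/false/g" /usr/share/javascript/proxmox-widget-toolkit/proxmoxlib.js
# Restart the web UI proxy so the patched file is served
systemctl restart pveproxy.service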


Yes, under Options > Start/Shutdown Order.
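If you prefer the CLI, the same thing can be set per guest; a quick sketch (guest IDs and delays below are placeholders):

# Start VM 101 first and wait 30 seconds before starting the next guest; it shuts down last
qm set 101 -startup order=1,up=30
# Start container 105 second
pct set 105 -startup order=2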


great. I hope they’ll also implement true dependencies…

Another thing missing is the ability to shrink the disk size… I had to redo some containers/VMs multiple times because I was wasting a lot of space…


start small, increase as needed

Most of my scripts start small

yes…now I know. :slight_smile:

But with ZFS, LVM, etc. I thought shrinking was possible…
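Since shrinking isn't supported, "start small, increase as needed" in practice means growing the disk later; a minimal sketch (container/VM IDs and disk names are placeholders):

# Grow the root disk of container 100 by 4 GiB (online; shrinking is not possible)
pct resize 100 rootfs +4G
# Same idea for a VM disk
qm resize 101 scsi0 +8G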

When you leave the HAOS VM, you'll see big resource savings: 2 GB in RAM alone.

With my HAOS VM script, I set everything to the minimum recommended assignments except RAM: 4GB RAM - 32GB Storage - 2vCPU
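Applied to an existing HAOS VM, those assignments would look roughly like this (the VM ID and disk name are assumptions; adjust to your setup):

# 2 vCPUs and 4 GB of RAM for the HAOS VM
qm set 100 -cores 2 -memory 4096
# Grow the OS disk to 32G if it was created smaller (grow only, no shrink)
qm resize 100 scsi0 32G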

I have 2 VMs left: HA and Compreface. HA because of some add-ons that would require redesigning some things… so for now I'll leave it as-is. Compreface unfortunately has severe problems when installed in a container… in a VM it's a snap…

I’ll move HA to LXC soon…

You’ve made unbelievable progress in a very short time! Kudos


Well, I have 25+ years of experience in enterprise system/network administration… even though I haven't done that for the last 10 years, since I moved to sales. So I'm an old dinosaur… but the fundamentals are still pretty good, I just need to knock some rust off. Without the ready-made scripts it would've been way harder… :slight_smile:


How do you use the HA development environment along with your prod one? Do you shut prod down and run dev to test things and then go back?

I want to do the same, but I'm not sure I can run two instances at the same time because they're gonna have two different IPs and devices and entities will talk to just one of the two. Am I wrong?

Nice setup, congrats. :slight_smile:
I do see the benefits of splitting into small containers; however, I think there is also one big practical downside. In case of a hardware failure you'll probably face much longer system downtime than if you have HA OS Supervised restored from a single backup to any hardware that can run HA OS: almost no downtime. Please let me know if I'm missing something here.

The downtime, though, would be isolated to a single service. When HA goes crazy (it happens sometimes) I found nothing was working anymore. Plus, it's easier to debug issues because you don't have other "layers" involved. Even the Zigbee network seems to respond faster.

Not to mention the startup times… HA starts very, very fast now… and its backup is 10 times smaller. The HA GUI is snappier now, very quick. I couldn't be happier with this setup.

The only problem I just discovered is that in LXC containers on ZFS, if you use Docker inside of them, the default storage driver is VFS instead of overlay2, which is very inefficient in terms of space allocation. :frowning:
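You can confirm which driver Docker picked inside the container with a quick check:

# Prints 'vfs' when Docker has fallen back from overlay2 (e.g. on a ZFS-backed LXC rootfs)
docker info --format '{{.Driver}}'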

I should have analyzed the Docker integration with LXC better before I chose ZFS; btrfs probably would've been a better choice. What do you think @tteck? I hope I don't have to redo everything to change the setup. :frowning:

I just completed the first version of the services dashboard:

I’ve actually been meaning to ask this for a while. I used your post-install script to remove the no-nag notice, however I still get it. I see the below in my setup; should the last two items be enabled?

If the pve-no-subscription repo is enabled, why would I still be getting the warning?

When you first run the script, the nag has already been cached; give it time to clear.
The warning you’re showing above is normal and will always be shown while a non-enterprise repo is enabled.
Never enable the Enterprise repo unless you’re paying. As for enabling the pvetest repo, that’s totally up to you.
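In other words, the warning only reflects which repo files are active. On a non-subscription node they typically look like this (stock Proxmox paths; bullseye shown as an example release):

# /etc/apt/sources.list.d/pve-enterprise.list - keep this commented out without a subscription
# deb https://enterprise.proxmox.com/debian/pve bullseye pve-enterprise

# /etc/apt/sources.list - the free repo the post-install script enables
deb http://download.proxmox.com/debian/pve bullseye pve-no-subscription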

How much time will it need to clear? I’ve been running your script for about 2 months already.

Ok, I’ll try re-running the post-install script again. I still have the dark theme running though; I assume you mean it would change back to the light theme if that widget-toolkit update occurred. Thanks for the info.


Yeah, I shut down the prod environment and then spin up Dev. The different IPs have not given me any grief yet.

But, depending on your network setup, you could always set a static IP on the Dev to match the Prod. That should work even if your router has assigned the same static IP to Prod based on MAC address. Just a thought.
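If the Dev instance were an LXC container, that could be as simple as pinning its address from the Proxmox host; a sketch (container ID, address and gateway are placeholders, and with an HAOS VM you'd set the static IP from inside HA instead):

# Give the Dev container the IP Prod normally uses (only safe while Prod is shut down)
pct set 105 -net0 name=eth0,bridge=vmbr0,ip=192.168.1.50/24,gw=192.168.1.1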

@tteck I finally decided to migrate HA to LXC. I found a solution for the add-ons I still needed, so I proceeded to set up the LXC using your script, but after it had been running for 4 minutes I got this at the end:

[INFO] Using 'pve-data' for storage location.
[INFO] Container ID is 100.
Updating LXC template list...
Downloading LXC template...
Creating LXC container...
[WARNING] Some containers may not work properly due to ZFS not supporting 'fallocate'.
Starting LXC container...
Setting up container OS...
Updating container OS...
Installing prerequisites...
Customizing Docker...
Installing Docker...
Installing Portainer...
Installing Home Assistant...
[ERROR:LXC] 125@72 Unknown failure occured.
[ERROR] 125@176 Unknown failure occured.

I’ll try one more time and let you know…


UPDATE: I tried one more time and it failed again after 4 minutes… I looked in journalctl and found this while the script was still running, right before the error:

Jan 15 14:13:01 pve kernel: overlayfs: upper fs does not support RENAME_WHITEOUT.
Jan 15 14:13:01 pve kernel: overlayfs: upper fs missing required features.
Jan 15 14:13:01 pve kernel: overlayfs: upper fs does not support RENAME_WHITEOUT.
Jan 15 14:13:01 pve kernel: overlayfs: upper fs missing required features.

I installed other Docker instances in 2-3 LXC containers and they’re running fine, but yesterday I found out there are some issues with ZFS for Docker + overlay2. Could that be the problem? When I first installed Proxmox I chose to create a ZFS datastore; if only I had known there were incompatibilities…
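One quick way to see whether that's the cause is to check what backs /var/lib/docker: overlay2 needs an ext4/xfs upper layer, so a ZFS-backed rootfs is exactly what triggers those kernel messages. A sketch (the container ID is a placeholder):

# Inside the LXC container: show the filesystem type under Docker's data root
df -T /var/lib/docker
# On the host: confirm the container's rootfs lives on the ZFS pool
pct config 100 | grep rootfs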

Any suggestion is more than welcome…hope I don’t have to go back again…


UPDATE2: I tried with the podman script… same problem… I guess I’ll first have to check how to make overlay2 work with ZFS, then either install it manually or modify your script once I know what config is needed.


Solved it, with fuse-overlayfs. Credit to this guide.

I modified your script to automatically download the static version of fuse-overlayfs and inject it into the container. Then I added the required parameters to the LXC config: fuse=1,keyctl=1,mknod=1,nesting=1
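For an existing container the same change can be applied from the host; a sketch (the container ID is a placeholder, and the daemon.json pin is optional since Docker 20.10 will pick fuse-overlayfs on its own when the binary is present and overlay2 isn't usable):

# Enable the features the fuse-overlayfs setup needs, then restart the container
pct set 100 -features fuse=1,keyctl=1,mknod=1,nesting=1
pct stop 100 && pct start 100

# Optional: pin the driver explicitly inside the container in /etc/docker/daemon.json
# { "storage-driver": "fuse-overlayfs" }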

I tested the script and it worked beautifully. Here you can see that Docker automatically uses the fuse-overlayfs storage driver, which works fine with ZFS and doesn’t have all the drawbacks of VFS:

Docker info
root@hass:/# docker info
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Docker Buildx (Docker Inc., v0.7.1-docker)
  scan: Docker Scan (Docker Inc., v0.12.0)

Server:
 Containers: 2
  Running: 2
  Paused: 0
  Stopped: 0
 Images: 2
 Server Version: 20.10.12
 Storage Driver: fuse-overlayfs
 Logging Driver: journald
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7b11cfaabd73bb80907dd23182b9347b4245eb5d
 runc version: v1.0.2-0-g52b36a2
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
  cgroupns
 Kernel Version: 5.13.19-3-pve
 Operating System: Debian GNU/Linux 11 (bullseye)
 OSType: linux
 Architecture: x86_64
 CPUs: 2
 Total Memory: 2GiB
 Name: hass
 ID: GZOM:2ZY6:QZPR:T3TI:ZJLQ:Q5WJ:7VIU:XAI5:S5IR:BEN2:ISRY:6X3K
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Here’s the updated code of ha_setup.sh:

ha_setup.sh
#!/usr/bin/env bash

while true; do
    read -p "This will create a New Home Assistant Container LXC. Proceed(y/n)?" yn
    case $yn in
        [Yy]* ) break;;
        [Nn]* ) exit;;
        * ) echo "Please answer yes or no.";;
    esac
done

# Setup script environment
set -o errexit  #Exit immediately if a pipeline returns a non-zero status
set -o errtrace #Trap ERR from shell functions, command substitutions, and commands from subshell
set -o nounset  #Treat unset variables as an error
set -o pipefail #Pipe will exit with last non-zero status if applicable
shopt -s expand_aliases
alias die='EXIT=$? LINE=$LINENO error_exit'
trap die ERR
trap cleanup EXIT

function error_exit() {
  trap - ERR
  local DEFAULT='Unknown failure occured.'
  local REASON="\e[97m${1:-$DEFAULT}\e[39m"
  local FLAG="\e[91m[ERROR] \e[93m$EXIT@$LINE"
  msg "$FLAG $REASON"
  [ ! -z ${CTID-} ] && cleanup_ctid
  exit $EXIT
}
function warn() {
  local REASON="\e[97m$1\e[39m"
  local FLAG="\e[93m[WARNING]\e[39m"
  msg "$FLAG $REASON"
}
function info() {
  local REASON="$1"
  local FLAG="\e[36m[INFO]\e[39m"
  msg "$FLAG $REASON"
}
function msg() {
  local TEXT="$1"
  echo -e "$TEXT"
}
function cleanup_ctid() {
  if [ ! -z ${MOUNT+x} ]; then
    pct unmount $CTID
  fi
  if $(pct status $CTID &>/dev/null); then
    if [ "$(pct status $CTID | awk '{print $2}')" == "running" ]; then
      pct stop $CTID
    fi
    #pct destroy $CTID
  elif [ "$(pvesm list $STORAGE --vmid $CTID)" != "" ]; then
    pvesm free $ROOTFS
  fi
}
function cleanup() {
  popd >/dev/null
  rm -rf $TEMP_DIR
}
function load_module() {
  if ! $(lsmod | grep -Fq $1); then
    modprobe $1 &>/dev/null || \
      die "Failed to load '$1' module."
  fi
  MODULES_PATH=/etc/modules
  if ! $(grep -Fxq "$1" $MODULES_PATH); then
    echo "$1" >> $MODULES_PATH || \
      die "Failed to add '$1' module to load at boot."
  fi
}
TEMP_DIR=$(mktemp -d)
pushd $TEMP_DIR >/dev/null

# Download setup script
wget -qL https://raw.githubusercontent.com/tteck/Proxmox/main/ha_setup.sh

# Download the static version of fuse-overlayfs
wget -qL -O fuse-overlayfs https://github.com/containers/fuse-overlayfs/releases/download/v1.8/fuse-overlayfs-x86_64

# Detect modules and automatically load at boot
load_module overlay

# Select storage location
while read -r line; do
  TAG=$(echo $line | awk '{print $1}')
  TYPE=$(echo $line | awk '{printf "%-10s", $2}')
  FREE=$(echo $line | numfmt --field 4-6 --from-unit=K --to=iec --format %.2f | awk '{printf( "%9sB", $6)}')
  ITEM="  Type: $TYPE Free: $FREE "
  OFFSET=2
  if [[ $((${#ITEM} + $OFFSET)) -gt ${MSG_MAX_LENGTH:-} ]]; then
    MSG_MAX_LENGTH=$((${#ITEM} + $OFFSET))
  fi
  STORAGE_MENU+=( "$TAG" "$ITEM" "OFF" )
done < <(pvesm status -content rootdir | awk 'NR>1')
if [ $((${#STORAGE_MENU[@]}/3)) -eq 0 ]; then
  warn "'Container' needs to be selected for at least one storage location."
  die "Unable to detect valid storage location."
elif [ $((${#STORAGE_MENU[@]}/3)) -eq 1 ]; then
  STORAGE=${STORAGE_MENU[0]}
else
  while [ -z "${STORAGE:+x}" ]; do
    STORAGE=$(whiptail --title "Storage Pools" --radiolist \
    "Which storage pool you would like to use for the container?\n\n" \
    16 $(($MSG_MAX_LENGTH + 23)) 6 \
    "${STORAGE_MENU[@]}" 3>&1 1>&2 2>&3) || exit
  done
fi
info "Using '$STORAGE' for storage location."

# Get the next guest VM/LXC ID
CTID=$(pvesh get /cluster/nextid)
info "Container ID is $CTID."

# Download latest Debian 11 LXC template
msg "Updating LXC template list..."
pveam update >/dev/null
msg "Downloading LXC template..."
OSTYPE=debian
OSVERSION=${OSTYPE}-11
mapfile -t TEMPLATES < <(pveam available -section system | sed -n "s/.*\($OSVERSION.*\)/\1/p" | sort -t - -k 2 -V)
TEMPLATE="${TEMPLATES[-1]}"
pveam download local $TEMPLATE >/dev/null ||
  die "A problem occured while downloading the LXC template."

# Create variables for container disk
STORAGE_TYPE=$(pvesm status -storage $STORAGE | awk 'NR>1 {print $2}')
case $STORAGE_TYPE in
  dir|nfs)
    DISK_EXT=".raw"
    DISK_REF="$CTID/"
    ;;
  zfspool)
    DISK_PREFIX="subvol"
    DISK_FORMAT="subvol"
    ;;
esac
DISK=${DISK_PREFIX:-vm}-${CTID}-disk-0${DISK_EXT-}
ROOTFS=${STORAGE}:${DISK_REF-}${DISK}

# Create LXC
msg "Creating LXC container..."
DISK_SIZE=8G
pvesm alloc $STORAGE $CTID $DISK $DISK_SIZE --format ${DISK_FORMAT:-raw} >/dev/null
if [ "$STORAGE_TYPE" == "zfspool" ]; then
  warn "Some containers may not work properly due to ZFS not supporting 'fallocate'."
else
  mkfs.ext4 $(pvesm path $ROOTFS) &>/dev/null
fi
ARCH=$(dpkg --print-architecture)
HOSTNAME=homeassistant
TEMPLATE_STRING="local:vztmpl/${TEMPLATE}"
pct create $CTID $TEMPLATE_STRING -arch $ARCH -features fuse=1,keyctl=1,mknod=1,nesting=1 \
  -hostname $HOSTNAME -net0 name=eth0,bridge=vmbr0,ip=dhcp -onboot 1 -cores 2 -memory 2048 \
  -ostype $OSTYPE -rootfs $ROOTFS,size=$DISK_SIZE -storage $STORAGE >/dev/null

# Modify LXC permissions to support Docker
LXC_CONFIG=/etc/pve/lxc/${CTID}.conf
cat <<EOF >> $LXC_CONFIG
lxc.cgroup2.devices.allow: a
lxc.cap.drop:
EOF

# Set container description
pct set $CTID -description "Access Portainer interface using the following URL.

http://<IP_ADDRESS>:9000"

# Set container timezone to match host
MOUNT=$(pct mount $CTID | cut -d"'" -f 2)
ln -fs $(readlink /etc/localtime) ${MOUNT}/etc/localtime
pct unmount $CTID && unset MOUNT

# Setup container
msg "Starting LXC container..."
pct start $CTID
pct push $CTID fuse-overlayfs /usr/local/bin/fuse-overlayfs -perms 755
pct push $CTID ha_setup.sh /ha_setup.sh -perms 755
pct exec $CTID /ha_setup.sh

# Get network details and show completion message
IP=$(pct exec $CTID ip a s dev eth0 | sed -n '/inet / s/\// /p' | awk '{print $2}')
info "Successfully created Home Assistant Container LXC to $CTID."
msg "

Home Assistant is reachable by going to the following URLs.

      http://${IP}:8123
"

I just restored the HA backup manually; I didn’t think about the fact that without the Supervisor, you lose some features. :slight_smile:

HA is working beautifully, and using amazingly low resources:
