Well, poop, my SSD tanked. Need advice for new setup

After upgrading to 2026.4.4 on Saturday morning, it did not come back up. Uh-oh. I stewed over it during the weekend, plugged it into a monitor this morning, and the screen would not come up. Luckily, I had a spare 4B 16GB and moved the MicroSD card to it. As it booted, it started screaming about certain sectors on the MicroSD card…so I got a 2 for 1 boo-boo. Not going to blame the update. It was just luck of the draw. Parts have Mean Time Between Failures.

So I am at a crossroads: go Docker or go Raspberry Pi 5.

I have a QNAP TVS-h1688X that is currently running a Windows Server VM. In the past I ran Pi-Hole in Docker on it, but I removed that after I got my bigger and more functional business class firewall that does what Pi-Hole did. I have tagged VLANs, so my IoT VLAN (VLAN 253) goes to the NAS, but I am not sure how to tag the Docker network so HA uses VLAN 253. I am going to ask QNAP about that. Then I would have to either convert my Z-Wave and Zigbee adapters to USB over IP or replace them with IP ones. Not opposed to doing that. It just seems like a lot of work.

The hardware is clearly a single point of failure, but I do not have the resources, yet, to build two or three Proxmox or other virtual hosts to allow containers to trade back and forth between them. Plus, it really bothers me that the device has a single power supply. I am used to enterprise servers with dual everything…just hate it when I have to pay for it.

This NAS is running 96GB of memory, dual 4TB M.2 drives in RAID for the OS, dual 4TB SSDs for the VMs and containers, dual 4TB SSDs for caching, and 12 18TB platters for storage in RAID6 with a hot spare…all drives Western Digital Reds, so it is no slouch. The VMs are running on the LACP-teamed 2.5GbE adapters, which run at 1Gb each because my switch does not have four 2.5GbE ports available, but the dual 10GbE ports are teamed up in LACP for Plex and file storage. It screams.

Or…

I could buy a Raspberry Pi 5, but I would want to outfit it with DUAL M.2 drives in a RAID1 (mirror) setup AND run it on PoE, as my Cisco switch can supply up to 30 watts on each port. I already have one with a single M.2 drive and a PoE HAT, but that HAT cannot seem to negotiate with the switch that it needs more than 15.4 watts, so I have to hard-set the port in the switch to give the device the full 30 watts. That one is running my Ubiquiti server for my access points and a headless PlexAmp for holiday music. I was thinking about building another one of those for RetroPie video games, but the PoE+ negotiation spooks me.

Looking for advice on HATs if I go that option. I have seen a few dual M.2s but no PoE so that might be a second HAT? Theoretically, I could replace the Raspberry PI5 if it ever died by just moving the HATs and drives like you could do with a MicroSD card so the single point of failure on that is not too bad. Can HA support RAID1? Trying not to have it run on an external power supply if at all possible because the PoE on the switch is supplied by the switch’s dual power supplies PLUS UPSes on each power supply.

Willing to pay for help setting up the Docker option, if that turns out to be my best option and anyone knows someone who does this. It seems like the option that would allow me to move it to new hardware in the future if that ever became a possibility.

Some important things I do have in my environment are: Z-Wave, Zigbee, Govee, Tempest Weather Station, Davis Technologies weather station, KASA power strips over WiFi, Sense, Ring, NVIDIA Shield, Plex.

Since I basically have to start from brand new: if this were your environment, those of you with a ton of experience, what would you do? I am looking for a consensus.

MiniPC all day, most recently I went from RPI5 w/SSD hat to miniPC. Loving it! Get a used one for about $150 USD.

Why NOT to buy a PI to run Home Assistant

Could I have RAID1 on the M.2 SSD? Does HA support that? Plus, I do not think I could get one that runs over PoE+.

That was one of the options but I want a PI to run RAID1 and be PoE.

Why do you want Raid?

If you go mini PC with commodity parts you can have one of each overnight, and I bet money they are within three hours if you venture into a store.

Reason: the gear to do RAID will likely double your cost, while just keeping a spare SSD around is peanuts.

I ran my HA on a NUC bare metal for a year, and the worst-case restore path was: drop the latest HA image to SSD, install, boot, log in, pull the restore, wait 25 minutes, log in. Approx. 1 hr from bare hardware.

I run it in Proxmox on a NUC now and it’s… Pull current HA VM from script. Everything is identical. Approx 45 min.

If I need brand new gear because it’s dead. Worst case 24 hrs. Most likely 4.

My cost - current lowest cost on best case commodity gear (big SSD and a 4 core NUC) couple hundred bucks. Now if I want to raid 1?
Probably add another hundred or so bucks to the setup and a second SSD.

You already said MTBF is a real thing. MTBF on an SSD is waaaay different than on an SD card. So odds are they outlast the iron this time.

Cheap but quality prosumer gear. Consistent tested backups in easy reach and know where your fastest path to replacement is - is my vote.

RAID because I do not want a failed storage device to do this again. I do not necessarily care about cost if it is reasonable.

That’s fine, but be sure of what you’re really buying there…

First. Backups are the ultimate protection… End of story. All hardware fails.

RAID. OK, one of those drives fails and you’re still fixing something. (You’re not leaving the array with issues, so you’re still replacing the drive.)

You just bought uptime… Not resiliency. You also doubled your hardware error exposure at the same time. And complexity and requirements.

Two drives fail you’re still on the backup.

You bought the ability for one drive to fail and stay up while you source the replacement.
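To put rough numbers on the "bought uptime, doubled exposure" point, here is a minimal sketch. The 2% annual failure rate is purely an illustrative assumption, not a measured figure for any drive, and the math assumes the two drives fail independently (which same-batch drives in one enclosure may not).

```python
def mirror_odds(p: float) -> dict:
    """Yearly odds for a two-drive mirror.

    p is an ASSUMED annual failure probability for one drive;
    the two drives are ASSUMED to fail independently.
    """
    return {
        # One drive on its own: a failure means restoring from backup.
        "single_drive_fails": p,
        # Mirror: chance you have to replace *something* this year.
        "mirror_any_drive_fails": 1 - (1 - p) ** 2,
        # Mirror: chance both drives die and you are on the backup anyway.
        "mirror_both_fail": p ** 2,
    }


odds = mirror_odds(0.02)  # 0.02 is an illustrative value only
for name, value in odds.items():
    print(f"{name}: {value:.2%}")
```

The chance of having to fix something roughly doubles, while the chance of actually needing the backup drops sharply but never reaches zero, which is why the backup still matters.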

(Mind you, I’m telling you this while I listen to 12TB of rust spin on my Synology and its array. I don’t put expensive disks on my HA box. It’s disposable… There’s nothing in my home that isn’t designed for HA to be offline for up to a day before stuff gets weird…)

I saw that I can import an OVA file and make it a virtual machine…I might go with this option rather than Docker, and change my Z-Wave and Zigbee to IP-enabled.

My next set of radios will be IP connected. Punching the dongle’s USB port through to the VM is the step that takes me the longest and the one I haven’t automated. I’d love to not care about THAT part.

I am looking for suggestions on IP-enabled Z-Wave or Zigbee coordinators, or I could do a USB over IP option, especially with the PoE I have. The adapters I have are pretty old.

General comment:
Your SDCard had a few errors, and now you want to build a five nines data center?
You had a backup/contingency strategy failure. It will probably happen again.
Fix that.

Thank you but I want to do it the way I requested.

I don’t have a ton of experience, but my HA installation is going on its 6th year, and in that time I was able to recover from a couple of total failures (not hardware related in my environment…).

I got a couple of identical thin clients that I upgraded with an internal SATA SSD. 128GB drives are cheap and enough for me. The thin clients I bought have an internal SATA port, so I only had to stuff a longer cable and the SSD into the tiny case.

I have a primary one that runs Docker with HA + MariaDB, then a second one where a daily backup script pushes all the data (Docker setup script, data folder and DB).
Being a little lazy and a little cautious, I did not use BTRFS, so for my 20GB DB I stop it during the night, do a local clone of the data folder, then restart the container. When it is back online I do an rsync copy to the secondary system.
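For anyone wanting to copy the idea, the nightly routine could be sketched roughly like this. This is not the actual script; the container names, paths, and the `backup@secondary` host are all hypothetical placeholders, and it defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess


def nightly_backup(containers, data_dir, clone_dir, remote, dry_run=True):
    """Stop containers, clone the data folder locally, restart, then push.

    All names and paths passed in are placeholders for illustration.
    Returns the list of commands in the order they were issued.
    """
    executed = []

    def step(cmd):
        executed.append(cmd)
        if dry_run:
            print(shlex.join(cmd))  # show what would run
        else:
            subprocess.run(cmd, check=True)  # raise if a command fails

    # Stop HA and the DB so the files on disk are consistent.
    step(["docker", "stop", *containers])
    try:
        # Local clone is quick, so downtime stays short.
        step(["rsync", "-a", "--delete", f"{data_dir}/", f"{clone_dir}/"])
    finally:
        # Always bring services back, even if the clone failed.
        # Reverse order so the DB comes up before HA.
        step(["docker", "start", *reversed(containers)])
    # Push the clone to the secondary box while services are back online.
    step(["rsync", "-a", "--delete", f"{clone_dir}/", f"{remote}/"])
    return executed


plan = nightly_backup(
    ["homeassistant", "mariadb"],        # hypothetical container names
    "/srv/ha/data",                      # hypothetical data folder
    "/srv/ha/clone",                     # hypothetical local clone
    "backup@secondary:/srv/ha/clone",    # hypothetical secondary host
)
```

Cloning locally first and pushing afterwards means the slow network copy happens while HA is already running again.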

With this setup in case of an HW or SW failure it is just a matter of minutes to swap the thin clients, give the secondary one the identity of the primary and then restart all the goods.

Tested and working. This also gives me an environment to do a limited upgrade test before committing the primary system.

A couple of used mini PCs would probably be better, but since they are on 24/7, power matters: with the thin clients I run two for less power than a single NUC needs.

Mine have only 4GB RAM but it is more than enough for my load (no video feed)

I have multiple Raspberry Pis that I use for other projects, but I would not use an SD card for a running transactional DB, even SQLite. One of them is used as a remote serial gateway via USB gadget mode to reach the serial port of the primary thin client, so I have serial console access in case I lose network connectivity (e.g. during IP config tinkering).

I chose not to use virtualization because for my use case (limited RAM) the cons beat the pros.

I already virtualized it as a VM and not as a Docker container. Sometimes you need to throw out all the old and start from scratch. I was able to get the bulk of the things that run over the network added. I also built a Raspberry Pi 4B I had lying around, with a PoE HAT and a brand new MicroSD card, as a Z-Wave-JS UI server and restored my original Z-Wave keys from the old setup. I am still cleaning that up.

Tomorrow, I hope to work on my Zigbee network with a brand new coordinator on that same Pi or, if worst comes to worst, get an IP Zigbee coordinator that runs off of PoE.