HA in Proxmox - Connectivity issues

I have been running HA in a Proxmox VM for a couple of years and it has been great. However, I often saw errors that looked like networking issues (disconnections from the internet) and chalked them up to Unifi beta firmware or to actions I had taken. Recently, I ran dmesg -w in the shell of the Proxmox machine running HA and saw lots of these errors:

[90602.829933] e1000e 0000:00:1f.6 eno2: Detected Hardware Unit Hang:
                 TDH                  <e2>
                 TDT                  <d>
                 next_to_use          <d>
                 next_to_clean        <e1>
               buffer_info[next_to_clean]:
                 time_stamp           <101587504>
                 next_to_watch        <e2>
                 jiffies              <101587c20>
                 next_to_watch.status <0>
               MAC Status             <80083>
               PHY Status             <796d>
               PHY 1000BASE-T Status  <3800>
               PHY Extended Status    <3000>
               PCI Status             <10>
[90602.957665] e1000e 0000:00:1f.6 eno2: Reset adapter unexpectedly
[90603.054299] e1000e 0000:00:1f.6: Some CPU C-states have been disabled in order to enable jumbo frames
[90603.054757] vmbr0: port 1(eno2) entered disabled state
[90606.820053] e1000e 0000:00:1f.6 eno2: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[90606.820125] vmbr0: port 1(eno2) entered blocking state
[90606.820129] vmbr0: port 1(eno2) entered forwarding state
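
If you want to check how often this happens on your own box, a rough Python sketch like the one below counts the hang events per interface in saved dmesg output. The function name count_unit_hangs and the SAMPLE text are made up for illustration (the sample lines are copied from the log above), and the pattern only assumes the e1000e line format shown there.

```python
import re
from collections import Counter

# Matches lines such as:
# [90602.829933] e1000e 0000:00:1f.6 eno2: Detected Hardware Unit Hang:
HANG_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+e1000e\s+\S+\s+(?P<iface>\S+):\s+"
    r"Detected Hardware Unit Hang"
)


def count_unit_hangs(dmesg_text):
    """Return a Counter mapping interface name -> number of hang events."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = HANG_RE.match(line.strip())
        if match:
            counts[match.group("iface")] += 1
    return counts


# Hypothetical sample input, taken from the log lines in this post.
SAMPLE = """\
[90602.829933] e1000e 0000:00:1f.6 eno2: Detected Hardware Unit Hang:
[90602.957665] e1000e 0000:00:1f.6 eno2: Reset adapter unexpectedly
[90606.820053] e1000e 0000:00:1f.6 eno2: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
"""

print(count_unit_hangs(SAMPLE))
```

To use it on real data, save the output of dmesg to a file on the Proxmox host and pass the file contents to count_unit_hangs.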

From what I’ve read, these errors appear to be related to a bug in the driver (e1000e) for a common Intel Ethernet chipset. I believe my Lenovo Tiny P360 has an “Intel Jacksonville I219LM (w/ AMT 16.x)” (from Lenovo’s site).

Anyhow, I am going to add a Mellanox CX322A (2 x SFP+ 10Gbit) to the computer later today and move HA to use that card as I think that will solve the issue for HA… but I’d still like to fix the issue if at all possible.

I am mentioning it here because lots of HA users run HA in Proxmox, and given how popular this Intel NIC chipset is, this may be a really common issue among Proxmox + HA users.

This will be a fix for you: https://first2host.co.uk/blog/how-to-fix-proxmox-detected-hardware-unit-hang/
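
In case the link is unavailable: the workaround I have most often seen suggested for this e1000e message is to turn off TCP segmentation offload (TSO) and generic segmentation offload (GSO) on the affected port with ethtool. I am assuming, not confirming, that this matches what the linked article describes. Below is a rough Python sketch of it; the helper names build_offload_command and disable_offloads are hypothetical, and eno2 is simply the interface from the log above.

```python
import subprocess


def build_offload_command(iface):
    """Build the ethtool argument list that turns TSO and GSO off for iface."""
    return ["ethtool", "-K", iface, "tso", "off", "gso", "off"]


def disable_offloads(iface):
    """Run the command; needs root and ethtool installed. Raises on failure."""
    subprocess.run(build_offload_command(iface), check=True)


# Example (run as root on the Proxmox host, not inside the HA VM):
# disable_offloads("eno2")
```

A setting applied this way does not survive a reboot, so it would need to be reapplied at boot if it turns out to help.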

Thank you for the solution. I was hoping this would also fix the issue where the network connection goes down, and stays down (only for Proxmox nodes) when the switch power cycles, but the two issues don’t seem connected.