Observations from the Mitsubishi Kumo Cloud service event on Friday, April 12th, around 5:45 pm EDT
Sometime after 5:45 pm EDT, I noticed that my Kumo devices in HA started to oscillate between Unknown and Unavailable. While not uncommon given the general WiFi flakiness of the devices, the simultaneous occurrence across all of them was odd, and it didn't seem to be recovering on its own. I went through a full HVAC system power cycle, and when that didn't fix the problem, I also rebooted all of my access points, even though I wasn't seeing any other WiFi issues.
The devices were not staying on the network. They were mostly not responding to pings, seeming to come up for 20-30 seconds every few minutes but otherwise staying offline. At this point, I checked Downdetector and saw the problem reports for Kumo Cloud. I also checked with a couple of people I know who have Kumo, and they confirmed their systems were offline in the app (unfortunately, I don't know anyone else doing local access via HA or Homebridge).
Now I was fearing the worst: had Mitsubishi just pushed out a software update that bricked all the devices? I started looking at active flows from the devices to hosts on the internet from my firewall. It looked like they were resolving the hostname geo-rev2-b.kumocloud.com. On a lark, I put in a firewall rule blocking all outbound connectivity from the Kumo devices to TCP/80 and 443. Within five minutes, all of the devices were back online, pingable, and communicating with Home Assistant reliably.
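For anyone who wants to reproduce the block on a Linux-based router, here's a minimal sketch of the rule (I applied mine through my firewall's UI; the device IPs below are hypothetical placeholders for whatever static leases your units have):

```python
# Minimal sketch: drop outbound HTTP/HTTPS from the Kumo WiFi adapters.
# Assumes a Linux router using iptables. KUMO_IPS is hypothetical --
# substitute the addresses assigned to your own units.
import subprocess

KUMO_IPS = ["192.168.20.41", "192.168.20.42", "192.168.20.43"]

for ip in KUMO_IPS:
    subprocess.run(
        ["iptables", "-A", "FORWARD", "-s", ip, "-p", "tcp",
         "-m", "multiport", "--dports", "80,443", "-j", "DROP"],
        check=True,  # raise if the rule fails to apply
    )
```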
My suspicion is that whatever event was happening at Mitsu was causing the devices to get caught in an infinite reboot loop. Blocking them from communicating with Mitsu’s cloud service seems to avoid whatever the trigger was.
Some other observations from my investigation:
- The Kumo Cloud devices appear to make DNS queries to hard-coded Google DNS (8.8.8.8), bypassing local DNS configuration. DNSBL lists will therefore not be an effective technique for blocking their outbound communication; you'll need to block it at the firewall.
- The endpoint the devices are trying to communicate with, geo-rev2-b.kumocloud.com, resolves to AWS EC2 IP addresses in us-west-2. The DNS responses were rotating through multiple sets of 4 IPs, so I suspect this might be a CLB/ALB where the IPs are dynamic and change as the ELB scales in and out or replaces nodes (see the resolver sketch after this list). Firewall rules therefore can't block on IP address. For this event, I just blocked all outbound TCP 80/443 traffic from the Kumos.
- The endpoint is serving up a self-signed certificate (not uncommon for IoT applications), but with an expiration date of May 7, 2045. Assuming Mitsubishi has no mechanism to update the trust stores on these devices, we now know the built-in drop-dead date for them (see the certificate check after this list). I'm not too worried about my HVAC system, let alone the controllers, lasting that long…
- During the event, the units that were on appeared to be cycling every couple of minutes. Two of the units were heating the room even though the MHK2 thermostat read several degrees above the setpoint. I've seen this behavior before when connectivity is temporarily lost with the MHK2 and the unit switches to its internal temperature sensor. Given that the MHK2 receiver is connected via the WiFi module, I think this supports the theory that the device was caught in a reboot loop during the event.
I'm strongly considering leaving this firewall rule in place and seeing if it improves my overall system reliability. I have no need for Kumo app-based control. My only concern is allowing the units to continue to sync their clocks (not needed for schedules, as that is all done through HA, but to keep the clock on the MHK2 updated). Hopefully that uses standard NTP, as I'm only blocking HTTP/HTTPS traffic currently; a rough way to verify this is sketched below.
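Here's how I might check the NTP question, assuming a capture point that can see the devices' traffic (e.g., the router itself or a mirrored switch port) and the scapy package; the subnet is a hypothetical placeholder for your IoT VLAN:

```python
# Sketch: watch for NTP (UDP/123) traffic from the Kumo units to confirm
# how they sync time. Assumes scapy (pip install scapy) and a vantage
# point that sees the traffic. KUMO_SUBNET is hypothetical.
from scapy.all import sniff

KUMO_SUBNET = "192.168.20.0/24"

sniff(
    filter=f"udp port 123 and src net {KUMO_SUBNET}",
    prn=lambda pkt: print(pkt.summary()),
    store=False,
)
```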
Hopefully this is helpful to anyone else who observed the outage.
I am curious to hear from folks who are preventing their Kumo devices from reaching the internet: how is that working out for you? What specifically are you blocking?