TP-Link Switches going "Unavailable"

guice · February 16, 2020, 3:15am

I just want to say that in recent days my TP-Link discoveries have been far more reliable. A device will still occasionally go offline, but most of the time, upon reboots, all my devices are now getting properly discovered.

I don’t know if there was a related bug fix or improvement. I do know that as of 1.105.2, things responded. I’m now on 1.105.4 and it’s still as good, even with 4 newly added devices.

CaptTom · February 16, 2020, 1:03pm

I had one TPLink smart plug throw warnings 8 times in the past hour. But unless I look in the log, it doesn’t seem to matter. The data (on/off status and energy usage) appear to be mostly all there.

I assume it retries and comes back quickly. For my application, “mostly” is good enough. But I think for some users this would be unacceptable.

It’s easy to blame the user or a weak WiFi signal. But other WiFi devices work fine in the area of my two smart plugs, and even they seem to work fine most of the time. I keep coming back to the idea that somehow these things simply don’t always respond to whatever HA is doing to contact them, and a retry should be considered normal, not an error condition.
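To illustrate the "a retry should be considered normal" idea, here is a minimal sketch in Python. It is not the integration's actual code; `poll_with_retries` and its parameters are hypothetical names, and `query` stands in for whatever call actually talks to the plug.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def poll_with_retries(
    query: Callable[[], T],
    attempts: int = 3,
    delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call query() up to `attempts` times before giving up.

    Hypothetical helper: network errors (OSError covers timeouts and
    refused connections) are treated as routine and retried quietly.
    Only the final failure is raised to the caller.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return query()
        except OSError as err:
            last_error = err
            # Pause before the next try, but not after the last one.
            if attempt < attempts - 1:
                sleep(delay)
    raise last_error
```

With something like this, a plug that misses one poll but answers the second would never show up in the log as an error at all.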

guice · February 16, 2020, 4:41pm

It’s not really the wifi signal, though, but more about MIMO and how traffic passes through your router. The Hook Up has an explanation of how wifi communication works (link goes directly to the point of interest). The more IoT devices you add, the larger your “queues” become. You will need a good router, not some single-antenna ISP-supplied one.

After learning this, I added a new router with multiple points, separate from my primary router, and moved almost all my IoT devices to that.

This wasn’t a factor for my switches (when I did it vs my last complaint post), but I do know it definitely seems to have helped my Google devices continue their stream - we almost never get drops now. Before, it used to be once or twice a day.

CaptTom · February 16, 2020, 8:27pm

How many is too many? I have exactly two smart plugs. I also have three smart thermostats which don’t communicate with HA, but I guess those count as IoT devices. So that’s five.

Right. The problem seems to be with the way HA handles the switches.

JOHN3 · March 2, 2020, 5:49pm

jono Bit off topic, but how did you connect an old router to your system?

jono · March 2, 2020, 6:53pm

My main router is upstairs. I plugged the other router (AirPort Extreme) in downstairs via Ethernet, then went through the software for the AirPort Extreme to extend the network (I think that’s how I did it, it was quite a while ago).

riverrunner · March 12, 2020, 2:26am

My TP-Link experience has been varied, as have most of yours. I have 6 HS-200 switches and 2 HS-105 plugs. Often one or more would go ‘not available’ while Kasa worked fine, and a re-start of HA did not help. Re-starting the switch/plug did - just a re-start, no need to re-set and reconfigure. This was with all having static DHCP addresses and HA TPLink discovery set to ‘false’.
When I tried to ‘ping’ the ‘not available’ component, the static address did not respond.
My Router and router-turned-access point are both running DD-WRT. I had been running the access point as a ‘DHCP Forwarder’. Apparently this is wrong. See DD-WRT forums here.
By going back to ‘DHCP Server’ - and then disabling it - on the access point, my TP-Link devices have all been error-free for 4 days now. And counting.
DD-WRT build r34610 on both.

CaptTom · March 12, 2020, 1:43pm

I think this is a good first troubleshooting step. It’s good to know if HA can ping the device before moving on to other steps. Obviously if it’s off line for any length of time, the focus needs to be on fixing that.
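Ping only shows that the device answers ICMP. A slightly stricter check, run from the HA host, is to see whether a TCP connection to the device can be opened at all. The sketch below assumes the plug listens on TCP port 9999, which is my understanding of the local Kasa protocol on these older models but is not confirmed anywhere in this thread; `can_connect` is a hypothetical helper name.

```python
import socket


def can_connect(host: str, port: int = 9999, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout`.

    Port 9999 is an assumption about the local TP-Link protocol;
    change it if your device uses something else.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable hosts.
        return False
```

For example, `can_connect("192.168.1.50")` (a made-up address) returning False while ping succeeds would suggest the device is on the network but not accepting local control connections.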

What I suspect you’ll find is that these devices do drop out briefly, even with a good network connection. It seems Kasa is OK with that, just waiting a bit and continuing on, while HA throws its hands up and declares the device unavailable.

Obviously having a rock-solid network connection is an improvement. But the real solution is to update the HA component to work more like the Kasa app.
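One way to picture "work more like the Kasa app" is to tolerate a few missed polls before declaring a device unavailable, and to keep the last known state in the meantime. This is only a sketch of that idea, not how the HA component is written; `AvailabilityTracker` and `max_missed` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AvailabilityTracker:
    """Mark a device unavailable only after several consecutive misses."""

    max_missed: int = 3                # hypothetical tolerance threshold
    missed: int = 0                    # consecutive failed polls so far
    last_state: Optional[bool] = None  # last on/off state actually seen

    def record_success(self, is_on: bool) -> None:
        # Any good reply resets the miss counter and refreshes the state.
        self.missed = 0
        self.last_state = is_on

    def record_failure(self) -> None:
        # A missed poll is counted, but the last known state is kept.
        self.missed += 1

    @property
    def available(self) -> bool:
        # Never-seen devices are unavailable; seen devices stay available
        # until `max_missed` polls in a row have failed.
        return self.last_state is not None and self.missed < self.max_missed
```

With `max_missed=3` and a 30-second poll, a device would have to be silent for about a minute and a half before it flipped to unavailable.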

Apparently the development on this component has gone cold and there’s no interest in making this update.

riverrunner · March 13, 2020, 12:32am

From what I know, the Kasa app is cloud-based (as are Google Home and Alexa), while HA makes every effort not to depend on the cloud, for various reasons. I think the TPLink integration in HA is a local-only implementation; please correct me if I am wrong.
Still error-free after day 5. Any idea how I might be able to stress-test the connection(s)?
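On the stress-test question, one rough approach is to probe each device far more often than HA polls it and count the misses. The sketch below is an assumption about what such a test could look like, not an established tool; `stress_test` is a hypothetical name, and `probe` is any function that returns True when the device answered.

```python
import time
from typing import Callable, Dict


def stress_test(
    probe: Callable[[], bool],
    rounds: int = 100,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, int]:
    """Run `probe` repeatedly and summarise how often it failed."""
    failures = 0
    streak = 0          # current run of consecutive failures
    longest_streak = 0  # worst run seen during the test
    for i in range(rounds):
        if probe():
            streak = 0
        else:
            failures += 1
            streak += 1
            longest_streak = max(longest_streak, streak)
        # Wait between probes, but not after the final one.
        if i < rounds - 1:
            sleep(interval)
    return {"rounds": rounds, "failures": failures, "longest_streak": longest_streak}
```

The longest streak is probably the more useful number: scattered single misses are what a retry would hide, while a long streak points at a real outage.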
I wonder if the current COVID-19 problems might go so far as to make the Internet (and various other services we take for granted) a little less stable than we would like. That would make the local-only features of HA that much more valuable.
I personally value the local-only features, where they can be made to work well. Many people think of it only as a security issue, while I just like to “paddle my own canoe” and depend less on others for daily services where possible.

CaptTom · March 13, 2020, 1:32am

You connect the Kasa app to the local devices over the LAN. No cloud needed. Like so many of these types of app, they encourage you to use their cloud solution. But you can skip that step.

The HA integration uses the local address of the TP-Link devices, not the cloud.

Totally agree!!

riverrunner · March 16, 2020, 12:33am

Just an update on the ‘error’ state - 8 days now. I did find a few ‘unavailable’ points in these switches, but only for 30 secs or so at a time (1 polling interval, I believe) once per 24hr period or so. Not on all, but some. Might cause an issue for some automations. Not sure how HA handles the ‘status’ of an ‘unavailable’ switch, it might hold the previous state as current. I have not yet seen an automation problem with the current configuration (sure did before!), and I have HA use the status of the switch in several automations.

CaptTom · March 16, 2020, 1:42am

I have an update too. After a couple of relatively trouble-free weeks, today I noticed one had been “unavailable” for over 12 hours. This is unusual. Since I have auto-detect on, they usually come back after a while. While it was still unavailable, I SSH’d in to HA and did a ping from there. Absolutely rock solid. Then I tried the Kasa app. No indication of any problem. I was able to control the switch normally. HA was still saying unavailable.

I ended up deleting the integration and re-starting HA. The switch (and the whole integration) was auto-detected and came back normally, all with the same friendly names and custom icons.

So, yes, the HA integration is buggy. I understand that none of the developers care. I don’t mean to be a complainer. I’m just documenting it here so others don’t have to find all this out the hard way.

DougAmes · March 16, 2020, 3:04pm

All my TP-Link devices individually go Unavailable once in a while, I presume due to WiFi issues. Not often enough to be a big problem. Usually Google Home and Alexa will report them offline as well if I do a quick voice command test. Either they recover on their own, or flicking the switch on and off will do it. In Home Assistant the bulbs and plug switches usually recover their normal status on the next poll, but my HS200 wall switches remain Unavailable permanently for some reason, probably a bug affecting those devices only. The only fix is to restart Home Assistant. If any TP-Link device is not detected normally during startup, it will remain permanently Unavailable, and another immediate restart is required.

None of this is a huge issue, but it is an example of the general fragility of the whole infrastructure, including Home Assistant, TP-Link Kasa, and even WiFi networking. By the time you have dozens of different smart home devices, and you start layering services on top of each other, it’s just too unreliable to depend on it. We’re a long way from dependable maturity in this field.

guice · March 16, 2020, 3:50pm

This would be a wifi thing. It is true that, depending on the router, there are limitations you can hit when you have more than a dozen or so devices (not just IoT: cellphones, laptops, computers, anything with wifi).

The main issue is on HA restart - if a TP Device isn’t discovered, it won’t be “re-discovered” without another reboot because discovery only works on HA start.

Devices going offline/online while HA is running means HA is timing out while polling the known IP address: that’s wifi connectivity. IP addresses are discovered on HA startup.

CaptTom · March 16, 2020, 5:38pm

Actually, it’s been reported here on a number of other devices, including my HS110’s.

And yes, WiFi issues can initiate the problem. I’d especially avoid random DHCP IP address assignments. Always reserve an address for each device.

Which is pretty drastic. Other integrations don’t require this, nor do other apps, including TP-Link’s own Kasa.

Agreed that this is a minor annoyance in the big picture of life.

But it is a totally unnecessary one. The fragility is not with the network, or even with Home Assistant. It’s this particular integration which simply isn’t designed to work the way the devices expect. Clearly, Kasa can tolerate occasional network issues, as can just about every other IP-based system in the world. Reliability and flexibility is sort of the whole point of IP. But for some reason this particular integration was rigidly coded to fail miserably, and not recover, on minor interruptions that most any other program would handle.

DougAmes · March 16, 2020, 6:20pm

WiFi issues can initiate the problem. I’d especially avoid random DHCP IP address assignments. Always reserve an address for each device.

Yes, the developers of TCP/IP networking shouldn’t have bothered with that whole unreliable DHCP thing, when it’s really just an unnecessary convenience saving us the work of assigning and managing fixed IP addresses. We’ve got loads of free time to do that stuff for ourselves.

But it is a totally unnecessary one. The fragility is not with the network, or even with Home Assistant. It’s this particular integration which simply isn’t designed to work the way the devices expect.

I see similar fragility problems related to IP address assignment all over the place with Home Assistant, including with Google Home speakers and with Sonos.

guice · March 16, 2020, 6:50pm

It has nothing to do with IP address assignment. Your IP issues aren’t DHCP related, unless your modem is set to intentionally reassign IPs after X period of time. DHCP is very reliable and doesn’t really have anything to do with wifi networking. TCP networking is very reliable: packets are acknowledged and resent as needed to ensure all data is received. UDP, however, is an intentionally “not as reliable” method of packet delivery.

Wifi instability has to do with your router and surrounding networks (wifi noise). The wifi protocol is still very much synchronous communication, meaning other devices have to “wait in line” to send their data through your wifi router. Initially, things move so fast that you won’t notice anything with a dozen or so devices. But as you add more devices, that “line” will start experiencing delays.

The only solution there is to solidify / upgrade your wifi: add access points, add more wifi entry points, switch up listening channels.

CaptTom · March 16, 2020, 7:58pm

I’m sorry if it sounded like I was blaming TCP/IP. But for this particular implementation, between HA and the TP-Link devices, it seems that changing IP addresses isn’t handled well. Reserving dedicated addresses avoids that, and so seems to help. We’re not likely to change the way TP-Link stuff is designed, but it should be possible to change HA to work with it.
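As a rough illustration of what "handling changing IP addresses" could mean, the sketch below keys each device by its MAC address and simply updates the stored IP whenever the device is seen again. This is purely hypothetical and not a description of the HA integration; `DeviceRegistry`, `seen`, and `ip_for` are made-up names, and the addresses in the usage note are examples.

```python
from typing import Dict, Optional


class DeviceRegistry:
    """Track devices by MAC address so a new DHCP lease is not fatal."""

    def __init__(self) -> None:
        self._ip_by_mac: Dict[str, str] = {}

    @staticmethod
    def _normalise(mac: str) -> str:
        # Accept both AA-BB-... and aa:bb:... spellings of the same MAC.
        return mac.replace("-", ":").lower()

    def seen(self, mac: str, ip: str) -> bool:
        """Record a sighting; return True if a known device changed IP."""
        key = self._normalise(mac)
        changed = self._ip_by_mac.get(key) not in (None, ip)
        self._ip_by_mac[key] = ip
        return changed

    def ip_for(self, mac: str) -> Optional[str]:
        """Return the most recently seen IP for this MAC, if any."""
        return self._ip_by_mac.get(self._normalise(mac))
```

If discovery ran periodically rather than only at startup, each result could be fed to `seen()`, and polling would then use `ip_for()` instead of an address remembered from boot.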

That’s interesting. I’ve only seen it with the TP-Link devices so far. But either way it would seem to make sense to make addressing this a priority. There’s a lot of exciting development going on at the higher levels in this project, but sometimes it seems the foundation of core functionality could use a little attention. Another example is the utter failure of HA to reliably detect state changes on GPIO pins. If basic hardware functions simply don’t work, all the fancy themes, automations, and sensors become pointless.

jeckels · March 25, 2020, 12:53pm

I would like to add my TP-Link experience to this discussion. The HA TP-Link integration has some issues. I have six HS200 switches and two KP400 outdoor dual outlet plugs. I have set this up both via the integration and by manually configuring it in the yaml configuration. Using the integration, rarely are all of my TP-Link devices discovered. If I manually configure them, never are all of them available on a reboot. It appears the KP400s (which are configured as a strip) are the source of the problem. When I remove them from the config all the HS200 switches appear available. When I add them back in, the availability of all my TP-Link devices is random at best. Sometimes the KP400s show up and a couple of switches are unavailable. Sometimes the KP400s are available and a few of the switches are unavailable. Appears to be completely random. So for now I have stopped using the KP400s. I love the TP-Link devices and they are always available in the Kasa app. I would just like to see HA work better with TP-Link. I have been using HA for about two years and am a HUGE fan, so a few minor issues along the way are completely acceptable to me.

thoughton · April 6, 2020, 3:33pm

I was struggling with this recently, an HS110 kept going ‘Unavailable’ after being pretty solid for a year or so.

I also have a HS100. Both of them occasionally become ‘Unavailable’ but it’s for a few seconds here and there, a few times a day.

I was recently doing some housekeeping and changed both of them to static IPs. The HS100’s performance was the same as before, but the HS110 suddenly started becoming ‘Unavailable’ for hours at a time. It worked normally at the other times. And it worked normally in the Kasa app throughout the ‘Unavailable’ periods.

After messing around with restarts and the integration and reading forum pages I still couldn’t fix it. I knew it wasn’t a wifi problem as the wifi in that location is very good. Eventually I realised I had changed the entity name of the HS100 during housekeeping, but not the HS110. Went into the states page, clicked on the HS110, changed the entity name, updated a few config files with the new name, restarted yet again, and its behaviour has gone back to normal.

Just my 2c in case it helps anyone.