Moving Home Assistant to AWS

Very cool! I’m thinking of doing the same but with Azure, as I’m a solution architect on the MS side ;).

I have the same question as MSmithHA: what do you do for your Zigbee/Z-Wave sensors?
I would also like to know about the VPN: does your VPN server run on your Netgate or on AWS? Which one is the client?

Thanks!

Thank you, @MSmithHA!

I was planning a second post to discuss the IoT stack I used, but the lack of interest in this post made me scrap the idea.

Let me give you the reader’s digest version and go straight to the final implementation:

Yes, I’m using a Raspberry Pi to send data to AWS. One of the containers I have runs AWS IoT Greengrass, a platform that extends some of the AWS capabilities to the edge. Once I moved all my other containers to the cloud, I decided to turn the Pi into a “server” running only Greengrass and working as an MQTT client, sending messages to AWS IoT Core. To integrate things in a way I was happy with, I wrote a custom Python script that imports the Greengrass SDK and is able to read data from HA, from my Zigbee hub and from other endpoints that I have here at home.

Once the data is sent, it arrives on AWS IoT Core and, from there, I can run the analytics I mentioned before.
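To give a rough idea of what such a forwarding script can look like, here is a minimal sketch in Python. It is not my actual script: the topic prefix, field names and function names are all made up for illustration, and the publisher is passed in as a plain function so the example runs without any AWS dependency. On a Greengrass v1 core, I would expect the publisher to wrap something like `greengrasssdk.client("iot-data").publish(topic=..., payload=...)`, but treat that as an assumption to check against the SDK docs.

```python
import json
import time
from typing import Callable, Optional, Tuple

# Hypothetical topic layout; the thread does not say which topics are used.
TOPIC_PREFIX = "home/sensors"


def build_message(sensor_id: str, value: float, unit: str,
                  timestamp: Optional[int] = None) -> Tuple[str, str]:
    """Return (topic, JSON payload) for a single sensor reading."""
    topic = f"{TOPIC_PREFIX}/{sensor_id}"
    payload = json.dumps(
        {
            "sensor_id": sensor_id,
            "value": value,
            "unit": unit,
            # Use the caller's timestamp if given, otherwise the current time.
            "ts": int(time.time()) if timestamp is None else timestamp,
        },
        sort_keys=True,
    )
    return topic, payload


def forward_reading(publish: Callable[[str, str], None], sensor_id: str,
                    value: float, unit: str,
                    timestamp: Optional[int] = None) -> None:
    """Build one reading and hand it to the supplied publish function."""
    topic, payload = build_message(sensor_id, value, unit, timestamp)
    publish(topic, payload)


if __name__ == "__main__":
    # Stand-in publisher that only prints; swap in a real MQTT/IoT client here.
    forward_reading(lambda t, p: print(t, p), "living_room_temp", 21.5, "C")
```

Keeping the publisher as a parameter also makes it easy to test the payload format locally before anything is sent to IoT Core.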

@maxperron, here we are only HA users! :wink:

The Netgate device runs pfSense, which works as both a firewall and a VPN client. The VPN endpoint is on AWS, and all I have to do is point pfSense to the VPC endpoint. It’s essentially a site-to-site VPN.

Cheers!

Do you route all your traffic through the VPN or only IoT traffic?

Only IoT traffic. The reality is that data transfer costs can get out of control in a home environment where you have people doing all sorts of things, like streaming, gaming and downloads. Also, I see no technical reason to do so, as you don’t benefit much from it.

If, for some reason, I need privacy, I have a VPN account with a major provider. pfSense is also configured as a client to this VPN, and I can easily switch profiles depending on where I want my endpoint to be.

I also run an OpenVPN box so I can easily connect to my network from my mobile. This gives me peace of mind when working on sensitive documents over public Wi-Fi.

Hello, @rafaborges

Did you manage to get everything working?

I was thinking about moving my Home Assistant instance to Amazon EC2. My plan was to install HA on EC2, have an MQTT broker at home (on a Raspberry Pi) and a few MQTT IoT devices at home (lamps and stuff).

So, the IoT devices communicate on my LAN with the MQTT broker on the Raspberry Pi, and the Raspberry Pi communicates with HA on EC2.

I know: since I am still using a Raspberry Pi, why don’t I run the HA instance on it? Well, I want to use something in the cloud for my own particular reasons :slight_smile:

I have two questions:

1 - Will I be able to access my HA instance via a web browser on EC2?

2 - You said you were using VPC on a VPN and stuff… is that necessary? Can’t I just run everything using only EC2?

Thanks
And best regards from another HUEBR fella
:+1:

Why do you say it will not be cost-effective? Surely it doesn’t require many resources. Can you give more info on this?

It all depends on your goals.

In my tests I was using a VPN so I could have an EC2 instance on my local network, so HA could see all devices easily. How expensive is it to use the VPN? Well, consider US East (Ohio), with the connection active for 30 days, 24 hours a day, and let’s say we transfer about 500 GB out through that connection each month. The site-to-site VPN is charged on an hourly basis, for each hour the connection is active. For this AWS Region, the rate is $0.05 per hour (720 hours, for a total of $36.00). Now you add the data transfer out ($0.09 per GB, first GB is free), which results in a charge of $44.91. So, for the VPN alone, you will pay $80.91 per month. And I’m not including the dedicated firewall you need for this.
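For anyone who wants to plug in their own numbers, here is a minimal sketch of that arithmetic in Python. The rates are the ones quoted in this post, not necessarily current AWS pricing, and the function and constant names are made up for illustration. With 720 hours at $0.05, the connection charge comes to $36.00, which together with $44.91 of data transfer gives the $80.91 total.

```python
from decimal import Decimal

HOURS_PER_MONTH = 30 * 24                 # connection up 24 hours a day for 30 days
VPN_RATE_PER_HOUR = Decimal("0.05")       # site-to-site VPN rate quoted in the post
TRANSFER_RATE_PER_GB = Decimal("0.09")    # data transfer out rate quoted in the post
FREE_GB = 1                               # first GB out is free


def monthly_vpn_cost(gb_out: int) -> dict:
    """Return the connection, transfer and total charges for one month."""
    connection = VPN_RATE_PER_HOUR * HOURS_PER_MONTH
    billable_gb = max(gb_out - FREE_GB, 0)
    transfer = TRANSFER_RATE_PER_GB * billable_gb
    return {
        "connection": connection,
        "transfer": transfer,
        "total": connection + transfer,
    }


if __name__ == "__main__":
    for name, amount in monthly_vpn_cost(500).items():
        print(f"{name}: ${amount:.2f}")
```

`Decimal` is used instead of floats so the cents come out exact.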

For a home user, I can’t call $80 cost-effective. And I’m not even counting the t2.micro ($10 per month) for your HA instance, or maybe RDS as a managed service for your database and S3 for backups.

Of course you can bypass the VPN by opening some ports on your network to allow remote connections, but you still need good hardware (like a router with a built-in firewall) and a static IP address (or DynDNS). But hey, do you really want to open your home network to the world? Are the security compromises worth all of this?

Chances are your home automation system will also include some workloads that are better handled by local processing, like video streaming and local media management. So while the cloud can be interesting in some scenarios, if you have more ambitious goals, you will probably get a better bang for your buck with a local server.

Nice terms being used at AWS :rofl: guess you meant shift :upside_down_face:

Do you have any kind of fallback solution in case internet connectivity goes down? Not sure how well keepalived / VRRP works through a VPN. :thinking:

I’m embarrassed now. That was a very bad typo… :smiley:

I don’t know enough about VRRP to tell you how it works with VPNs, but when it comes to Home Assistant, as I have already mentioned, I see a VPN to AWS as overkill.

That said, a high-availability solution can be achieved easily if your router supports multiple WANs. On mine, WAN1 is an Ethernet port that connects directly to my ISP and WAN2 is a 5G USB stick with an unlimited data plan (I use a TP-Link WR902AC to connect the USB dongle to the Ethernet port).

@rafaborges Thanks for the post! You’ve inspired me to use Home Assistant for my office, and I thought this would be a good route. I currently have the “supervised Home Assistant” running on a t2.micro instance. I would like to stay within the free tier if possible. I am having an issue with the status check failing every 24 hours. This is my first endeavor into using AWS, but my suspicion is that the t2.micro does not have enough resources to run Home Assistant, Grafana, InfluxDB and Node-RED. I noticed you moved your database to RDS in the OP, which I am beginning to explore. Currently everything is running on the EC2 instance. Do you have any other recommendations for improvements?

Hi Charlie, when you say “status check failing every 24 hours”, what do you mean by that? Are you talking about the status check tab on the EC2 console, or about a specific log message in the hassio log? You are right that a t2.micro is not very powerful for running all of these, but if you don’t mind some lag and waiting, it should not be a problem. My first deployment was on a Raspberry Pi 3 B+, which has a slightly worse configuration than a t2.micro.

Regarding the free tier, keep in mind that it’s only free for a year. After the trial period is over, you pay the running costs for the resources you are using.

Here are some recommendations:

  1. Get away from the Supervised Hassio because it was deprecated a couple of months ago.

  2. Be mindful about security: rest assured that people will try to hack their way into your server. Two-factor authentication is a must-have, and so is certificate-based SSH. Use Ingress-only add-ons.

  3. AWS charges for data getting out of AWS, so make sure you only stream out data that is relevant.

  4. I used the recorder component with RDS for Postgres running on a db.t3.small, but a db.t3.micro will suffice for most use-cases.
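On point 4, the recorder component takes a `db_url` option, and for Postgres it follows the usual SQLAlchemy form `postgresql://user:password@host:port/database`. Below is a minimal sketch in Python that builds such a URL; the function name and all the values are hypothetical placeholders, not my real settings. The one detail worth showing is that special characters in the password need to be URL-encoded, otherwise the URL will not parse correctly.

```python
from urllib.parse import quote_plus


def recorder_db_url(user: str, password: str, host: str,
                    database: str, port: int = 5432) -> str:
    """Build a Postgres connection URL suitable for the recorder's db_url."""
    # quote_plus escapes characters such as "@" and "/" that would
    # otherwise be read as URL delimiters.
    return (
        f"postgresql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


if __name__ == "__main__":
    # Placeholder values only; use your own RDS endpoint and credentials.
    print(recorder_db_url("hass", "p@ss/word",
                          "example.rds.amazonaws.com", "homeassistant"))
```

The resulting string goes under `recorder:` as `db_url:` in `configuration.yaml`, ideally via `secrets.yaml` rather than in plain text.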

The status check tab on the EC2 console fails and restarting is the only way to resolve the problem. Thanks for the recommendations. I will continue to experiment with it to improve stability!

@rafaborges nice to stumble across this thread, as I have been experimenting with a hybrid RPi/cloud setup. I created integrations for Rekognition and S3 which might be of interest. I previously experimented with cloud RDB on Google and found that not to be cost-effective, but I am interested in finding a low-cost cloud solution for storage, if you can suggest one. In my approach the RPi is the gateway to the cloud, and the cloud is leveraged where compute/storage-intensive services are required. The RPi 4 is quite capable hardware, but things like graphing with Grafana become slow when the history gets large, and on the Pi you cannot do object detection as fast as Rekognition! Cheers

Here are some questions to help you understand what’s going on:

  1. What metric are you using to monitor the health of your instance?
  2. If the status check alerts you, can you still access your EC2 instance?
  3. Does the time of the alert match the time of an event on hassio?

I don’t believe the instance size is the problem here. I used to run this very same configuration on a Raspberry Pi 3 B+ with no issues at all, so I guess the problem is related to something other than the instance itself.

I don’t have much information about RDB on Google, but I can say that an average hassio database, using Postgres on RDS (a single db.t3.micro instance), will set you back around 15 USD/month. Does this price tag qualify as low to you? It doesn’t sound like much, but over time it can be hard to justify for the average residential user when, after a year, you will have spent enough money to buy a small computer.

Specifically for me, I ended up buying a powerful NUC to run all those services. I paid around $500 and, converting to cloud native services, I would have a monthly bill of around $40.
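A quick way to compare the two options is a break-even calculation. This is only a sketch using the figures from this thread; the function name is made up, and it deliberately ignores electricity, maintenance and hardware resale value.

```python
import math


def break_even_months(hardware_cost: float, monthly_cloud_bill: float) -> int:
    """Months of cloud billing needed to exceed a one-off hardware purchase."""
    return math.ceil(hardware_cost / monthly_cloud_bill)


if __name__ == "__main__":
    # $500 NUC versus roughly $40/month of cloud services.
    print(break_even_months(500, 40))
    # One year of a 15 USD/month database instance.
    print(15 * 12)
```

With those numbers the NUC pays for itself in a little over a year.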

In the future, when we make Amazon Timestream publicly available, it will be much cheaper to store time-series data. Currently this would be a challenge because hassio uses SQLAlchemy, so the recorder component does not support time-series databases. But with enough interest from the community, we can think about building Recorder support for time-series databases like InfluxDB, Timescale, and even Timestream.

I suppose using AWS for HA is OK if your internet is never down, but I moved to HA precisely to get rid of the dependency on being connected to the internet. I still want all my automations, etc. to work when Spectrum is down 2-3 times a month.

Cool work nonetheless!

I am not able to access the EC2 instance via SSH, and the Home Assistant frontend is not reachable. I have run Home Assistant on a Pi without issue before as well. After doing some research I have found some similar issues. I had installed docker.io rather than Docker CE, which I believe is causing the problems. I will follow this guide and see if that fixes things.

@rafaborges the only ‘free’ solution I am aware of is Google BigQuery, which gives 10 GB per month on the free tier. I think you are charged for queries, however.

RE micro instance, 15 USD/month probably is not justified for a residential user. I also am running a local db on a Synology.

RE Timestream, my experience of the HA community is that there are people interested in almost every niche imaginable, so I am sure there would be some level of interest. This would probably be of most interest to commercial users, perhaps those who want to combine data feeds from multiple instances. Like you say, this could be implemented via Recorder, and I imagine it would be very similar to InfluxDB.