Hi folks,
A couple of weeks ago I lost my RPi, so I decided to do an experiment: move my HA stack to AWS. I will report what I did and open the thread for discussion, in case you are considering doing the same. Feel free to ask me questions about it!
Caveat emptor: I’m a Solutions Architect at AWS and this post has absolutely nothing to do with my job or the company I work for. I did it to understand how a home user could leverage the cloud. It’s important to mention that I used an internal AWS account, so the costs associated with this exercise were not paid by me.
The first step of this experiment was what we call lift-and-shift, i.e., moving the infrastructure to the cloud as-is.
I started with a single EC2 instance (t2.small) running everything (I was using Hass.io, so it was a bunch of containers). This instance lived inside a VPC connected to my network through a dedicated VPN channel that landed on my home network through a Netgate SG-3100 running pfSense. For security reasons, the instance was deployed in a private subnet, but the web frontend was accessible through a public ALB. The ALB pointed to an Auto Scaling Group spanning two AZs; if the main instance became unavailable, it would spin up a replacement in a different AZ. The configuration files were stored on an EBS volume attached to the instance.
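For reference, here is roughly what that self-healing piece looks like with the AWS CLI. Treat it as a sketch rather than my exact setup: every ID and name below is a placeholder, and the commands are wrapped in a function so nothing runs by accident.

```shell
#!/bin/sh
# Sketch: public ALB forwarding to an Auto Scaling Group of size 1 in private
# subnets. All names, subnet IDs, and ARNs are placeholders.
provision() {
  # Target group the ALB forwards to (8123 is Home Assistant's default port)
  aws elbv2 create-target-group --name ha-tg --protocol HTTP --port 8123 \
      --vpc-id vpc-PLACEHOLDER

  # Internet-facing ALB spanning two public subnets (two AZs)
  aws elbv2 create-load-balancer --name ha-alb --type application \
      --scheme internet-facing --subnets subnet-PUBLIC-A subnet-PUBLIC-B

  # ASG pinned to exactly 1 instance; if it dies, a replacement comes up
  # in either private subnet and re-registers with the target group
  aws autoscaling create-auto-scaling-group \
      --auto-scaling-group-name ha-asg \
      --launch-template LaunchTemplateName=ha-launch-template \
      --min-size 1 --max-size 1 --desired-capacity 1 \
      --vpc-zone-identifier "subnet-PRIVATE-A,subnet-PRIVATE-B" \
      --target-group-arns arn:aws:elasticloadbalancing:PLACEHOLDER
}
```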
Because my previous platform was a Raspberry Pi, I thought that any performance improvement I got would be cancelled out by the network latency. But gosh, I was so wrong… the improvement was massive and the app became much more responsive. From my laptop, the ping to the instance was faster than the ping to the RPi standing a couple of feet from me! That makes sense, as the RPi was also being used for a lot of other services and had a serious I/O problem.
Moving on with my experiment, I decided to test whether performance would improve if I took the database off the EC2 instance. To achieve this, I moved from the standard SQLite database to Amazon RDS for PostgreSQL and configured the recorder component to use it. The result was what I expected: data-intensive pages ran as fast as any other page (this became extremely important later, when I deployed a Grafana server). There was also an unexpected side effect: I couldn't see an overall performance difference, but the instance was running at a much lower CPU usage.
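If you want to do the same, the Home Assistant side of that change is a single `recorder` entry in `configuration.yaml`; the endpoint, database name, and credentials below are placeholders for your own RDS values:

```yaml
# configuration.yaml: point the recorder at RDS instead of the default SQLite
recorder:
  db_url: postgresql://hass:YOUR_PASSWORD@mydb.PLACEHOLDER.us-east-1.rds.amazonaws.com/homeassistant
```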
Another test I did was to send all the logs to Amazon CloudWatch and use SNS to notify me whenever I had system problems. To make things extra speedy, I moved the logs to tmpfs and, from there, used a custom script to send the data to CloudWatch.
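A minimal sketch of that idea, assuming the aws CLI is configured and the log group and stream already exist (all names and paths below are placeholders):

```shell
#!/bin/sh
# Sketch: ship Home Assistant logs from a tmpfs mount to CloudWatch Logs.
# The log file is assumed to live on tmpfs, e.g. mounted with:
#   mount -t tmpfs -o size=64m tmpfs /mnt/ha-logs
LOG_FILE=/mnt/ha-logs/home-assistant.log
GROUP=hass-logs
STREAM=hass-main

# CloudWatch expects a JSON array of {timestamp, message} objects,
# with timestamps in epoch milliseconds.
make_event() {
  ts=$(( $(date +%s) * 1000 ))
  printf '[{"timestamp": %s, "message": "%s"}]' "$ts" "$1"
}

# Tail the log and push each new line as one event. Guarded behind --run so
# sourcing this file for its helpers does not start the (blocking) loop.
if [ "${1:-}" = "--run" ]; then
  tail -F "$LOG_FILE" | while IFS= read -r line; do
    aws logs put-log-events --log-group-name "$GROUP" \
        --log-stream-name "$STREAM" --log-events "$(make_event "$line")"
  done
fi
```

A real script would batch events instead of one call per line, but this shows the shape of it.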
Because the data was now available in the cloud, I was able to use other AWS services, like the creation, training, and deployment of prediction models. I also used Amazon API Gateway to create a custom REST API that allowed me to control parts of the house through some bash scripts. Eventually I got rid of API Gateway and replaced it with some Lambda functions.
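As an illustration of what those bash scripts can look like, here is a hedged sketch. The endpoint and the `/devices` resource are hypothetical (substitute your own API Gateway URL and auth); the DRY_RUN switch just prints the request instead of sending it.

```shell
#!/bin/sh
# Sketch: calling a hypothetical REST API to control parts of the house.
API_URL="${API_URL:-https://abc123.execute-api.us-east-1.amazonaws.com/prod}"

control() {  # usage: control <device> <action>, e.g. control porch_light off
  body="{\"action\": \"$2\"}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Dry-run mode: show the request instead of sending it
    echo "POST $API_URL/devices/$1 $body"
  else
    curl -s -X POST "$API_URL/devices/$1" \
         -H "Content-Type: application/json" -d "$body"
  fi
}
```

With `DRY_RUN=1`, `control porch_light off` prints the request it would have sent, which is handy while wiring things up.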
Finally, I decided to move all the containers to ECS and stop using EC2 directly. This proved to be the best decision, as it became much simpler to manage all the containers and I could shut them down when not in use.
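On ECS, the "shut them down when not in use" part is just a desired-count change on the service. A sketch with hypothetical cluster and service names, wrapped in functions so nothing runs on its own:

```shell
#!/bin/sh
# Sketch: pause/resume an ECS service by scaling it to zero and back.
# Cluster and service names are placeholders.
CLUSTER=home
SERVICE=hass

ha_stop()  { aws ecs update-service --cluster "$CLUSTER" --service "$SERVICE" --desired-count 0; }
ha_start() { aws ecs update-service --cluster "$CLUSTER" --service "$SERVICE" --desired-count 1; }
```

A cron job or an EventBridge schedule can call these if you want it automatic.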
I understand that this is not a cost-effective approach, and I would strongly suggest you not do it if your use case is only to have basic controls and data storage for your home. I'm aware it's overkill, but I did it just for fun.
I’m not going to discuss costs because your mileage may vary depending on several factors: how much data you send to the cloud, what kind of reports and dashboards you create, which instances you use, which region you deploy your workload in, and much more. You can use this calculator to estimate your monthly fee.
I hope this post is insightful for anyone considering this approach. Feel free to ask questions below!
Cheers,
Rafa