We’re Alex and Alessandro (@ale_sciarrillo), two freshly graduated CS & AI engineers with a passion for computer vision and home automation. Over the past several months we’ve been building SUPERHOME — a new, complementary way to interact with your smart home that we think you’ll find pretty natural.
No voice commands. No app. Just a hand gesture.
Why we built this
We wanted to remove friction from everyday home control — something silent, effortless, and fast. After we’d used it in our own homes for a while, friends who saw it started asking for one too. Their feedback pushed us to make it significantly more robust, and now we’re ready to share it with the broader HA community.
What it does
A single hand gesture lets you toggle devices, dim lights, trigger automations, and activate scenes — even in complete darkness, since the system is fully vision-based with a night vision camera.
Two main gestures (open palm and “L” thumb-index) handle target selection and toggle actions, keeping the interaction simple and intentional enough to avoid false triggers.
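To make the interaction model concrete, here’s a minimal sketch of how a two-gesture flow like this could work. The gesture labels, the cycling selection, and the controller class are our illustration, not the actual implementation:

```python
# Illustrative two-gesture controller: "L" cycles the selected target,
# open palm toggles it. All names and flow here are hypothetical.
class TwoGestureController:
    def __init__(self, targets: list[str]):
        self.targets = targets
        self.selected = 0  # index of the currently selected device

    def on_gesture(self, gesture: str) -> str | None:
        """Consume one debounced gesture label; return an action or None."""
        if gesture == "l_shape":
            # Target selection: step through the configured devices.
            self.selected = (self.selected + 1) % len(self.targets)
            return None
        if gesture == "open_palm":
            # Toggle action on whatever is currently selected.
            return f"toggle:{self.targets[self.selected]}"
        return None  # ignore everything else to avoid false triggers
```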
Features
2-gesture system — palm for toggle, L-shape for target selection
Night vision — IR LEDs + NoIR camera, works in total darkness
Wide 130° FOV — covers large rooms
6+ meter range
Multi-person support
On-device screen for visual feedback and target selection
Fully local — runs on a Raspberry Pi 5 (≥4GB RAM), no cloud, privacy-first
Native HA integration — custom component + dedicated dashboard view (see the sketch below)
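For HA users wondering what the integration could look like under the hood, here’s a hedged sketch of a custom component surfacing gestures as events on the HA bus. The event name, payload fields, and helper are hypothetical, not our actual API:

```python
# Hypothetical sketch of the HA side: surface detected gestures as bus
# events so any automation can trigger on them. Names are illustrative only.
from homeassistant.core import HomeAssistant

GESTURE_EVENT = "superhome_gesture"  # hypothetical event type

def publish_gesture(hass: HomeAssistant, gesture: str, target: str | None) -> None:
    """Fire a bus event carrying the gesture and the selected target.

    Must be called from HA's event loop (async_fire is loop-only).
    """
    hass.bus.async_fire(
        GESTURE_EVENT,
        {"gesture": gesture, "target": target},  # e.g. gesture="open_palm"
    )
```

An automation could then trigger on that event and map each gesture to any HA action, which is how the gesture mapping dashboard mentioned later in the thread fits in.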
Where we are
The system works well and we use it daily, but we’re still actively developing it. We’re sharing it now because we want your input to shape its future — feature priorities, use cases, edge cases we might have missed.
If there’s enough interest, we’re exploring making this available beyond our circle of friends.
Stay Updated
If this sounds interesting to you, let us know in the comments and sign up for our mailing list to follow the development.
What would you control with this? Any use case or feature you’d want to see?
Hand up / thumb out, then close the palm (like a fist with the thumb out) for up arrow
Hand up / thumb in, then close the palm over the thumb (like a fist with the thumb tucked in) for down arrow
Hand up / fingers spread for enter
If it could detect that a gesture is being held and repeat the action, it would make menu navigation easy. For example, say I’m using Netflix and want to scroll down three slots in the menu and select a show:
Hand up / thumb in >> close palm over thumb >> action: down_arrow >> keep holding the closed palm for 1 second >> action: down_arrow >> keep holding for 1 second >> action: down_arrow >> hand up / fingers spread >> action: enter
In the end, it would be best if there were just a set of hand gestures that appear as sensors or switches in HA so users can set the actions based on their needs. The sign language alphabet might be a good starting point, since it has real users with a genuine need, and it’s a universal language with plenty of learning guides.
Thanks for this! Media player control is a great use case, and your hold-to-repeat mechanic for scrolling is a really smart UX insight.
Good news: we already have a gesture mapping dashboard where you can assign any HA action to each gesture. We’ve also implemented hold detection with a configurable time threshold, so triggering repeated actions by holding a gesture is already supported; we use it ourselves to dim lights by holding a gesture, for example. Everything you described is totally doable. The sign language alphabet as a universal reference set is something we’ll keep in mind as we expand the gesture library.
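To make the hold-to-repeat mechanic concrete, here’s a rough sketch assuming the vision model emits a per-frame “gesture active” flag. The class name and the 1-second default are illustrative, not our exact implementation:

```python
# Minimal hold-to-repeat sketch: fire once on gesture start, then repeat
# while the gesture stays held. Threshold values are illustrative only.
import time

class HoldRepeater:
    def __init__(self, action, repeat_interval: float = 1.0):
        self.action = action
        self.repeat_interval = repeat_interval  # configurable threshold
        self._held_since = None
        self._last_fired = None

    def update(self, gesture_active: bool) -> None:
        """Call once per frame with whether the gesture is currently held."""
        now = time.monotonic()
        if not gesture_active:
            self._held_since = None
            return
        if self._held_since is None:  # gesture just started: fire immediately
            self._held_since = now
            self._last_fired = now
            self.action()
        elif now - self._last_fired >= self.repeat_interval:
            self._last_fired = now
            self.action()  # repeat while held, e.g. down_arrow or dim step

# Usage per frame (hypothetical labels):
#   repeater = HoldRepeater(lambda: send("down_arrow"))
#   repeater.update(label == "fist_thumb_in")
```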
What’s your main use context — couch, bedroom? Would help us prioritize!
Media room/theatre.
A loud room isn’t great for voice control.
Family room, 700–900 sq ft.
A single voice device like HAVPE struggles to hear from across the room, and placing one in the center of the room isn’t possible. Vision-based or motion control may be better.
Totally agree 🤝. Loud rooms are one of our primary use cases, and honestly so are quiet ones: being able to control your home without making a sound (late at night, during a movie, when others are sleeping) is just as valuable. Voice assistants are great, but they have real limitations at both extremes, which is exactly why we see SuperHome 🦸‍♂️ as complementary rather than a replacement.
A 700–900 sq ft family room sounds like a perfect fit — our 130° FOV and 6m+ range were designed with exactly this kind of space in mind. Would you be interested in trying it out?
Really interested in this. The use cases for loud communal living spaces seem pretty evident, but I can think of others. For example, I share a household with someone living with MS, and they find buttons on a small interface screen a bit irritating to use, constantly looking up and down.
They immediately responded positively to the idea of controlling lights and media with gestures, as they would find it both more instinctive and more forgiving.
Thank you for sharing this — accessibility is a use case we find really interesting. You’re actually not the first to bring it up, which suggests gesture control could make a real difference for a wider range of people.
We’d love to understand more: what would be the primary actions needed — lights, media, something else? And are there specific gestures that would feel natural or comfortable for them, or any movements that might be difficult?
Would you or the person you mentioned be open to trying it out in that scenario?
I think that would depend on how easy it is to pick up, but potentially things like controlling covers, and using the property intercom (assuming similar gestures could be tied to different tasks depending on context).
Holding arms up for too long would be an issue, or anything that required maintaining tension in the hands or fingers. Depends on how precise or accurate the gestures need to be.
These things are best tried out in practice rather than in theory, so yes, absolutely.
I’m sure there are research papers out there already discussing the effectiveness of different gestures, the pros and cons of each, and their use cases. Those papers could save you a lot of trial and error.
I have also seen some computer vision models and vision hardware that can detect the thumb touching the index and middle fingertips for different commands.
Some cameras with mics can also tell when the thumb rubs the index finger up or down (like the “money money” gesture), to turn the volume up or dim the room. Not sure if that is still possible or distinguishable from 6 meters away.
That’s really useful to know — gesture duration and required precision are exactly the kind of details that are hard to anticipate without real-world input like this, especially since the context of use can influence things quite a bit. Keeping the interaction as low-effort and forgiving as possible is one of our core focuses.
When we’re ready to move to broader testing, we’d love to have you involved.
Great point — we did go through the literature on gesture ergonomics before settling on our set, aiming for gestures that are natural and unambiguous in a real home context. That said, real-world feedback beats papers every time, which is a big reason we’re sharing this now.
On the model side, we explored pretty much every available solution. Since everything runs fully locally on low-power edge devices, and off-the-shelf models aren’t suited to the home domain (darkness, long range, complex poses), they weren’t an option, so we built a custom, heavily optimized model from scratch that honestly punches well above its weight class.
Thumb-rubbing at 6m is a cool idea, but reliably detecting subtle finger contact at that range is genuinely hard. Something to keep an eye on as hardware evolves!
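For the curious, here’s roughly how fingertip-contact detection is typically done at close range with an off-the-shelf landmark model like MediaPipe Hands (not our custom model, just an illustration of the general technique):

```python
# Sketch of thumb-to-fingertip contact detection with MediaPipe Hands.
# Threshold and camera index are illustrative; tune for your setup.
import math
import cv2
import mediapipe as mp

PINCH_THRESHOLD = 0.05  # normalized image distance; tune per camera/range

def fingertip_distance(hand_landmarks, a: int, b: int) -> float:
    """Euclidean distance between two normalized landmark points."""
    pa, pb = hand_landmarks.landmark[a], hand_landmarks.landmark[b]
    return math.dist((pa.x, pa.y), (pb.x, pb.y))

hands = mp.solutions.hands.Hands(max_num_hands=1)
cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        lm = results.multi_hand_landmarks[0]
        # Landmark 4 = thumb tip, 8 = index tip, 12 = middle tip.
        if fingertip_distance(lm, 4, 8) < PINCH_THRESHOLD:
            print("thumb-index pinch")
        elif fingertip_distance(lm, 4, 12) < PINCH_THRESHOLD:
            print("thumb-middle pinch")
cap.release()
```

The catch: this normalized-distance check assumes the hand fills a decent chunk of the frame. At 6 m the fingertip landmarks span only a few pixels, which is exactly where subtle contact becomes ambiguous.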