I’ve spent a couple of hours with Horizon Beta. I did not ask it general questions that require knowledge (sorry, no penguins). What I did do was use the 100% standard HA prompt, supply it with over 200 selected entities and ask all sorts of things about them. And that is where I believe it shines.
Things that stood out: I could ask if our house was safe and I got a good answer. It mentioned the dishwasher door was open but said that this was just for info, because it was not a problem. It knew only doors in the home were relevant, unless I talked about the car. Then it knew about the car doors too. It grasped the difference between indoor and outdoor: it noted that having lights on outside helped for safety too.
When I asked about air quality, I got the right info too. It knows what CO2 levels are safe. When I asked about allergies, it used data from pollen sensors (as well as providing some general info).
I could ask for the number of doors and for average temperatures. It did not matter that not all my door and window sensors are the typical binary sensors; some are regular sensors (they distinguish small openings from wide open). I could ask for the hottest room. It knew a room had multiple temperature sensors and provided a range. When I asked it about open blinds, and told it to consider anything below 15 percent open as closed, it did that. It knew 15% open is 85% closed.
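For anyone who wants to check the blinds logic by hand, it comes down to simple arithmetic. Here is a minimal Python sketch, assuming positions are reported as percent open (0 is fully closed, 100 is fully open). The function names and the example blinds are made up for illustration; this is not what the model or HA runs internally.

```python
# Hypothetical helpers illustrating the "below 15% open counts as closed" rule.
# Assumption: each blind reports its position as percent open (0 = closed).

def percent_closed(percent_open: float) -> float:
    """Convert a percent-open position into percent closed."""
    return 100 - percent_open


def is_effectively_closed(percent_open: float, threshold: float = 15) -> bool:
    """Treat anything strictly below the threshold as closed."""
    return percent_open < threshold


# Made-up example data: blind name -> percent open
blinds = {"living_room": 10, "bedroom": 15, "office": 60}

# Blinds that still count as open under the 15% rule
open_blinds = [
    name for name, position in blinds.items()
    if not is_effectively_closed(position)
]
```

Note that with "below 15 percent" as the rule, a blind at exactly 15% still counts as open in this sketch.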
I have sensors with day counts to a birthday. It not only understood them well enough to say who has a birthday next month, it also knew to add the days to today's date to reconstruct the actual date. It knew it was unable to tell age without the year of birth, so it asked for that.
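The date reconstruction described above is just "today plus the countdown". A small Python sketch of that reasoning, assuming the sensor value is a whole number of days; the function names and the example dates are hypothetical and only serve as illustration.

```python
from datetime import date, timedelta


def birthday_from_countdown(days_until: int, today: date) -> date:
    """Reconstruct the birthday date from a days-until countdown."""
    return today + timedelta(days=days_until)


def age_on_birthday(birthday: date, birth_year: int) -> int:
    """The age reached on that birthday; needs the year of birth."""
    return birthday.year - birth_year


# Made-up example: the sensor says 20 days, and today is 1 August 2025
upcoming = birthday_from_countdown(20, date(2025, 8, 1))
turning = age_on_birthday(upcoming, 1990)
```

This also shows why it had to ask for the year of birth: the countdown alone gives the date, but not the age.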
Mind you: I did no crafty prompting, did not supply any additional tools. This is what a regular user would get without particular AI knowledge.
So my guess is it is geared toward tool use, math and reasoning more than toward ready knowledge. Which is fine for HA. If the penguin question were asked through HA, I would not be surprised if it took more time, because HA does not have penguin sensors and the model prefers to use the tools.
In those couple of hours, it never hallucinated or gave wrong answers. I did hit a rate limit. Then I stopped.
As for speed: with only very few entities, answers were almost instant. With lots of entities I got answers in 1 or 2 seconds, so the wait was not in any way a hindrance. But I have no clue what hardware or model size was behind it all.