Ollama+Qwen2.5:7B tool calls to control Home Assistant

NIUB · September 22, 2024, 2:03pm

In the Chinese language environment, Qwen2.5 surpasses Llama3.1 in both accuracy and speed. Even if my pronunciation is incorrect or the device name is incomplete, LLM will still accurately control it.
The GPU I am using is RTX3060, and if it were RTx4090, the speed would be even faster.
Tool calls will add icing on the cake to the big model.

github.com

tannisroot/home-assistant-datasets/blob/qwen2.5/reports/README.md

# Home LLM Leaderboard
| Model | assist (n=80) | assist-mini (n=49) | intents (n=165) |
| --- | --- | --- | --- |
| gemini-1.5-flash | 91.2% (CI:&nbsp;6.2%, 2024.6.3) | 98.0% (CI:&nbsp;4.0%, 2024.8.0dev) | 63.0% (CI:&nbsp;7.4%, 2024.8.0b) |
| gpt-4o-mini | 90.0% (CI:&nbsp;6.6%, 2024.8.0b) | 98.0% (CI:&nbsp;4.0%, 2024.8.0dev) | 63.6% (CI:&nbsp;7.3%, 2024.8.0b) |
| claude-3-haiku | 88.2% (CI:&nbsp;10.8%, 2024.9.0b2) | 98.0% (CI:&nbsp;4.0%, 2024.9.0b2) |  |
| gpt-4o | 87.5% (CI:&nbsp;7.2%, 2024.6.3) |  | 81.2% (CI:&nbsp;6.0%, 2024.6.3) |
| qwen2.5 | 81.2% (CI:&nbsp;8.6%, 2024.9.2) | 85.7% (CI:&nbsp;9.8%, 2024.9.2) |  |
| gpt-3.5 | 75.0% (CI:&nbsp;9.5%, 2024.6.3) |  | 67.9% (CI:&nbsp;7.1%, 2024.6.3) |
| assist-llm | 67.5% (CI:&nbsp;10.3%, 2024.9.0dev) | 81.6% (CI:&nbsp;10.8%, 2024.9.0dev) |  |
| llama3.1 | 66.2% (CI:&nbsp;10.4%, 2024.9.0dev) | 83.7% (CI:&nbsp;10.3%, 2024.8.0b0) | 43.6% (CI:&nbsp;7.6%, 2024.9.0dev) |
| functionary-small-v2.5 | 56.2% (CI:&nbsp;10.9%, 2024.7.0) | 63.3% (CI:&nbsp;13.5%, 2024.8.0dev) | 37.6% (CI:&nbsp;7.4%, 2024.6.3) |
| xlam-7b | 51.2% (CI:&nbsp;11.0%, 2024.9.0dev) | 85.7% (CI:&nbsp;9.8%, 2024.8.0b0) |  |
| home-llm | 45.0% (CI:&nbsp;10.9%, 2024.6.3) | 34.7% (CI:&nbsp;13.3%, 2024.8.0dev) | 25.5% (CI:&nbsp;6.6%, 2024.6.3) |
| assistant | 37.5% (CI:&nbsp;10.6%, 2024.6.3) | 63.3% (CI:&nbsp;13.5%, 2024.8.0dev) | 98.8% (CI:&nbsp;1.7%, 2024.6.3) |
| llama3-groq-tool-use | 20.0% (CI:&nbsp;8.8%, 2024.8.0b) | 51.0% (CI:&nbsp;14.0%, 2024.8.0b0) | 11.5% (CI:&nbsp;4.9%, 2024.8.0b) |
| mistral-v3 | 3.8% (CI:&nbsp;4.2%, 2024.8.0b) | 2.0% (CI:&nbsp;4.0%, 2024.8.0dev) | 10.3% (CI:&nbsp;4.6%, 2024.8.0b) |
| mistral-nemo |  | 81.6% (CI:&nbsp;10.8%, 2024.9.2) |  |
| xlam-1b |  | 27.1% (CI:&nbsp;12.6%, 2024.8.0b0) |  |
| claude-3-5-sonnet |  | 95.9% (CI:&nbsp;5.5%, 2024.9.0b2) |  |

This file has been truncated. show original

Here is my test video: