# Home LLM Leaderboard
| Model | assist (n=80) | assist-mini (n=49) | intents (n=165) |
| --- | --- | --- | --- |
| gemini-1.5-flash | 91.2% (CI: 6.2%, 2024.6.3) | 98.0% (CI: 4.0%, 2024.8.0dev) | 63.0% (CI: 7.4%, 2024.8.0b) |
| gpt-4o-mini | 90.0% (CI: 6.6%, 2024.8.0b) | 98.0% (CI: 4.0%, 2024.8.0dev) | 63.6% (CI: 7.3%, 2024.8.0b) |
| claude-3-haiku | 88.2% (CI: 10.8%, 2024.9.0b2) | 98.0% (CI: 4.0%, 2024.9.0b2) | |
| gpt-4o | 87.5% (CI: 7.2%, 2024.6.3) | | 81.2% (CI: 6.0%, 2024.6.3) |
| qwen2.5 | 81.2% (CI: 8.6%, 2024.9.2) | 85.7% (CI: 9.8%, 2024.9.2) | |
| gpt-3.5 | 75.0% (CI: 9.5%, 2024.6.3) | | 67.9% (CI: 7.1%, 2024.6.3) |
| assist-llm | 67.5% (CI: 10.3%, 2024.9.0dev) | 81.6% (CI: 10.8%, 2024.9.0dev) | |
| llama3.1 | 66.2% (CI: 10.4%, 2024.9.0dev) | 83.7% (CI: 10.3%, 2024.8.0b0) | 43.6% (CI: 7.6%, 2024.9.0dev) |
| functionary-small-v2.5 | 56.2% (CI: 10.9%, 2024.7.0) | 63.3% (CI: 13.5%, 2024.8.0dev) | 37.6% (CI: 7.4%, 2024.6.3) |
| xlam-7b | 51.2% (CI: 11.0%, 2024.9.0dev) | 85.7% (CI: 9.8%, 2024.8.0b0) | |
| home-llm | 45.0% (CI: 10.9%, 2024.6.3) | 34.7% (CI: 13.3%, 2024.8.0dev) | 25.5% (CI: 6.6%, 2024.6.3) |
| assistant | 37.5% (CI: 10.6%, 2024.6.3) | 63.3% (CI: 13.5%, 2024.8.0dev) | 98.8% (CI: 1.7%, 2024.6.3) |
| llama3-groq-tool-use | 20.0% (CI: 8.8%, 2024.8.0b) | 51.0% (CI: 14.0%, 2024.8.0b0) | 11.5% (CI: 4.9%, 2024.8.0b) |
| mistral-v3 | 3.8% (CI: 4.2%, 2024.8.0b) | 2.0% (CI: 4.0%, 2024.8.0dev) | 10.3% (CI: 4.6%, 2024.8.0b) |
| mistral-nemo | | 81.6% (CI: 10.8%, 2024.9.2) | |
| xlam-1b | | 27.1% (CI: 12.6%, 2024.8.0b0) | |
| claude-3-5-sonnet | | 95.9% (CI: 5.5%, 2024.9.0b2) | |
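
The table does not say how the CI figures are computed. A minimal sketch, assuming they are the half-width of a 95% normal-approximation (Wald) interval for a proportion, which reproduces several rows when `n` is taken from the column header. The function name `wald_margin` and the 73-of-80 count are illustrative assumptions, not taken from the source.

```python
import math


def wald_margin(accuracy: float, n: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation interval for a proportion.

    Assumption: z = 1.96 (95% confidence); the source does not state the level.
    """
    return z * math.sqrt(accuracy * (1.0 - accuracy) / n)


# Assumed example: 73 of 80 assist tasks correct (about 91.2% accuracy).
correct, total = 73, 80
accuracy = correct / total
margin = wald_margin(accuracy, total)
print(f"accuracy={accuracy:.4f} margin={margin:.4f}")
```

Under this assumption the margin depends only on the accuracy and the sample size, so two rows in the same column with the same accuracy should show the same CI. A row whose CI is wider than this formula gives for the column's `n` may have been evaluated on fewer samples.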