Gemini Pro API

I turned it into an “anything-sensor” yesterday, as proof of concept.

Looks like the update will be available in 2024.1! I found it in the beta release notes.

I also stumbled across this inspiration for the future of LLMs controlling Home Assistant!

So far no luck with gemini-pro in 2024.1.0. I tried removing and re-configuring the integration. Do I need a new API key? Unsure what I’m doing wrong.

The PRs haven’t been merged yet. Hopefully they will make it in the next monthly release.

@tronikos Can we pull your repo from git via HACS to get your fixes early?

Also I’m guessing @User87 wants function calling like the OpenAI conversation agent, to control devices and send commands. It would be amazing not to have to pay for OpenAI credits.

You can but it’s much easier to wait. It will be included in the 2024.2 release.

You mean 2024.2 of course 🙂

One question though - Gemini Pro is not directly available in my country yet. Would it be possible to use the Vertex AI API with Gemini Pro as the model instead, as described here: https://ai.google.dev/examples?hl=en&keywords=vertexai

Yes I meant 2024.2. Fixed.

I don’t think it will work with the Vertex AI API without significant changes that I don’t plan on making.

It’s a pity, because the direct version doesn’t work in most of Europe, while the Vertex AI version would.

Is it working in Europe now? According to Google Bard update: Image generation and Gemini Pro adds more languages, it’s supposed to have been globally available since February 1.

I have the service working perfectly in Developer Tools / Services, but I can’t figure out how to store the response in something I can use. Any chance you could provide an example?

Here is a script that takes a text prompt, camera to take a snapshot, and media player to play back the response:

alias: Gemini Pro Vision
fields:
  prompt:
    selector:
      text:
        multiline: true
    name: prompt
  media_player:
    selector:
      entity:
        filter:
          - domain: media_player
    name: media player
  camera:
    selector:
      entity:
        filter:
          - domain: camera
    name: camera
sequence:
  # Save a snapshot from the selected camera to a local file
  - service: camera.snapshot
    data:
      entity_id: "{{ camera }}"
      filename: /media/snapshot.jpg
  # Send the prompt and snapshot to Gemini; the response is stored in `content`
  - service: google_generative_ai_conversation.generate_content
    data:
      prompt: "{{ prompt }}"
      image_filename: /media/snapshot.jpg
    response_variable: content
  # Speak the response text on the selected media player
  - service: tts.speak
    target:
      entity_id: tts.piper
    data:
      media_player_entity_id: "{{ media_player }}"
      message: "{{ content.text }}"
      cache: false
  # Return the response so callers of this script can read it via response_variable
  - stop: end
    response_variable: content
mode: single
icon: mdi:message-image
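
To use the response elsewhere (which is what response_variable on the stop step enables), call the script with a response_variable of your own and read its text field. A minimal sketch, assuming the script above is saved as script.gemini_pro_vision; the camera, media player, and notify service names are placeholders:

- service: script.gemini_pro_vision
  data:
    prompt: What do you see?
    camera: camera.front_door       # placeholder entity id
    media_player: media_player.den  # placeholder entity id
  response_variable: result
- service: notify.mobile_app_iphone  # placeholder notify service
  data:
    message: "{{ result.text }}"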

No, it does not work, at least not the API version. I’m still getting an error about the user’s location.

Yeah, and I can’t use the Nest API path as an image_filename: either. I don’t totally understand response_variable: or blueprints either. I think what you provided is a blueprint? This is what I came up with, as I believe I’ll also have to store the thumbnail in a response_variable:

- id: 'c12'
  alias: Doorbell Camera Snapshot Notification
  trigger:
    platform: device
    device_id: feb17d26775a5xxxxxxxxxxxxxxxxx
    domain: nest
    type: camera_person
  action:
    - service: >-
        {%- if is_state('input_boolean.home', 'off') or
               not is_state('device_tracker.iphone', 'home') and
               is_state('sensor.ipad_ssid', 'wirelessfun') -%}
              notify.mobile_app_iphone
        {% else %}
              notify.ios
        {% endif %}
      data:
        message: Person Detected at the Front Door.
        data:
          image: >-
            /api/nest/event_media/{{ trigger.event.data.device_id }}/{{ trigger.event.data.nest_event_id }}/thumbnail
      response_variable: thumbnail
    - service: google_generative_ai_conversation.generate_content
      data:
        prompt: "Very briefly describe what you see in this image from my doorbell camera. Your message needs to be short enough to fit in a phone notification. Do not describe stationary objects or buildings."
        image_filename: "{{ thumbnail }}"
      response_variable: content
    - service: tts.speak
      target:
        entity_id: tts.piper
      data:
        media_player_entity_id: media_player.google_mini
        message: "{{ content.text }}"
        cache: false
    - variables:
        content: "{{ content }}"
    - stop: end
      response_variable: content
  mode: queued
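
(One likely issue above: generate_content reads a local image file, not an /api/… URL, and notify services don’t return a response to store. A rough sketch of alternative first steps, assuming the doorbell is also exposed as a camera entity; camera.front_doorbell is a placeholder name:)

  action:
    # Sketch only (assumption): snapshot the doorbell's camera entity to a
    # local file, since generate_content expects a file path rather than a URL
    - service: camera.snapshot
      data:
        entity_id: camera.front_doorbell  # placeholder entity id
        filename: /media/doorbell_snapshot.jpg
    - service: google_generative_ai_conversation.generate_content
      data:
        prompt: "Very briefly describe what you see in this image from my doorbell camera."
        image_filename: /media/doorbell_snapshot.jpg
      response_variable: content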

@tronikos I have no idea how you coded the integration, but I wonder if it would be possible to specify the endpoint for the integration manually? It seems the European endpoint blocks the integration at the moment, but the US one should work: python - Google Generative AI API error: "User location is not supported for the API use." - Stack Overflow

I am having the same problem…

There’s both a language AND a country list there. Even though one’s language may be on the list of supported languages, not all regions are supported. I’m Dutch and thus live in the Netherlands. Dutch is supported, but the region Netherlands is not 🙁

I’ve looked around, but I’m having trouble finding tips on configuring the prompt template:

This smart home is controlled by Home Assistant.

An overview of the areas and the devices in this smart home:
{%- for area in areas() %}
  {%- set area_info = namespace(printed=false) %}
  {%- for device in area_devices(area) -%}
    {%- if not device_attr(device, "disabled_by") and not device_attr(device, "entry_type") and device_attr(device, "name") %}
      {%- if not area_info.printed %}

{{ area_name(area) }}:
        {%- set area_info.printed = true %}
      {%- endif %}
- {{ device_attr(device, "name") }}{% if device_attr(device, "model") and (device_attr(device, "model") | string) not in (device_attr(device, "name") | string) %} ({{ device_attr(device, "model") }}){% endif %}
    {%- endif %}
  {%- endfor %}
{%- endfor %}

Answer the user's questions about the world truthfully.

If the user wants to control a device, reject the request and suggest using the Home Assistant app.

What is the syntax for supplying the integration with more than one area, entity, device, or device_attr? If someone would be willing to share an example of their template, that would help a lot!

TIA
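
In the meantime, one pattern that might help: the prompt template is ordinary Home Assistant Jinja, so specific entities can be listed with the standard states() and state_attr() template functions. A small sketch that could be appended to the template above; the entity IDs are placeholders:

Current states of some important entities:
{%- for entity_id in ['light.living_room', 'sensor.outdoor_temperature'] %}
- {{ state_attr(entity_id, 'friendly_name') }}: {{ states(entity_id) }}
{%- endfor %}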

I’ve used this very successfully to send informative, time-saving notifications. The next big thing would be being able to use the responses as conditions.
One example would be to disarm the alarm system when no cars are in the garage, and arm it when at least one car arrives.
Has anyone figured out this part?

You may need to create a helper to store the garage car count. I wouldn’t be terribly surprised if you could count the number of cars with Frigate alone, without Gemini. If you did want to use Gemini, you could run an automation on a time-pattern trigger, store the result in the helper, and use that helper as a condition in another automation. A rough sketch of that approach is below; camera.garage and the input_number.garage_car_count helper are placeholder names:
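
alias: Update garage car count
trigger:
  - platform: time_pattern
    minutes: "/15"  # poll every 15 minutes
action:
  - service: camera.snapshot
    data:
      entity_id: camera.garage  # placeholder entity id
      filename: /media/garage.jpg
  - service: google_generative_ai_conversation.generate_content
    data:
      prompt: "How many cars are in this garage? Answer with a single number only."
      image_filename: /media/garage.jpg
    response_variable: content
  # Store the count in a helper so other automations can use it as a condition
  - service: input_number.set_value
    target:
      entity_id: input_number.garage_car_count  # placeholder helper
    data:
      value: "{{ content.text | trim | int(0) }}"
mode: single

Another automation could then use a numeric_state condition on input_number.garage_car_count to decide whether to arm or disarm the alarm.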
