Gemini Pro API

Has anyone had success in calling the new Google AI Studio (previously MakerSuite) Gemini API?

I would like to use the gemini-pro LLM. At present I can only seem to get the integration to use the PaLM Bison model.

When configuring the Google Generative AI Conversation integration, I am unable to use the Gemini model

models/gemini-pro

but can still use the bison model

models/chat-bison-001

The following is the error message:

Sorry, I had a problem talking to Google Generative AI: 404
models/gemini-pro is not found for API version v1beta2, or is not supported for generateMessage. Call ListModels to see the list of available models and their supported methods.
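
The error suggests calling ListModels. Assuming the google-generativeai Python package (which I believe the integration uses under the hood), the check looks roughly like this:

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Print each model the key can access and the methods it supports,
# e.g. generateContent (Gemini) vs. generateMessage (PaLM).
for model in genai.list_models():
    print(model.name, model.supported_generation_methods)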

Do I need to generate a new API key?

2 Likes

I’m the author of the Google Generative AI Conversation integration. It requires a few small changes to switch to Gemini Pro. I also plan on adding support for Gemini Pro Vision for the next core release.
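
For the curious, the gist of the change at the client-library level is moving from PaLM's generateMessage to Gemini's generateContent. A rough sketch with the google-generativeai Python package (the actual integration code differs):

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Old PaLM path (generateMessage), kept here for contrast:
# response = genai.chat(model="models/chat-bison-001", messages=["Hello"])

# New Gemini path (generateContent):
model = genai.GenerativeModel("gemini-pro")
response = model.generate_content("Hello")
print(response.text)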

5 Likes

Wonderful! Thank you for the fantastic integration. I look forward to the upgraded support.

Is it possible to make a wish for function calling, like it is used in Extended OpenAI Conversation? Then it would be possible to control entities.

1 Like

Google Generative AI: Add support for Gemini Pro by tronikos · Pull Request #105787 · home-assistant/core · GitHub fixes this.

Google Generative AI: Add a service for prompts consisting of text and images using Gemini Pro Vision by tronikos · Pull Request #105789 · home-assistant/core · GitHub adds support for Gemini Pro Vision.
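
Under the hood that service sends a mixed text-and-image prompt. The equivalent direct call looks roughly like this (a sketch assuming google-generativeai and Pillow, not the integration's actual code):

import PIL.Image
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Multimodal prompts are passed as a list of parts: images and text.
model = genai.GenerativeModel("gemini-pro-vision")
response = model.generate_content(
    [PIL.Image.open("snapshot.jpg"), "Briefly describe this image."]
)
print(response.text)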

@bloody2k I don’t have such plans

1 Like

I turned it into an “anything-sensor” yesterday, as a proof of concept.

Looks like the update will be available in 2024.1! I found it in the beta release notes.

I also stumbled across this inspiration for the future of LLMs controlling Home Assistant!

So far no luck with gemini-pro in 2024.1.0. I tried removing and re-configuring the integration. Do I need a new API key? Unsure what I’m doing wrong.

The PRs haven’t been merged yet. Hopefully they will make it into the next monthly release.

1 Like

@tronikos Can we pull your repo from git via HACS to get your fixes early?

Also, I’m guessing @User87 wants function calling like the OpenAI conversation agent has, so it can control entities and send commands. It would be amazing not to have to pay for OpenAI credits.

1 Like

You can, but it’s much easier to wait. It will be included in the 2024.2 release.

1 Like

You mean 2024.2 of course 🙂

One question though: Gemini Pro is not directly available in my country yet. Would it be possible to use the Vertex AI API with Gemini Pro as the model instead, as described here: https://ai.google.dev/examples?hl=en&keywords=vertexai

Yes I meant 2024.2. Fixed.

I don’t think it will work with the Vertex AI API without significant changes, which I don’t plan on making.
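
To give an idea of why: Vertex AI uses a different SDK and a different auth model, a Google Cloud project plus region with application-default credentials instead of a plain API key. A rough sketch (hypothetical project id; not something the integration supports):

import vertexai
from vertexai.generative_models import GenerativeModel

# Vertex AI authenticates against a Google Cloud project and region
# (application-default credentials), not a simple API key.
vertexai.init(project="my-gcp-project", location="us-central1")  # hypothetical

model = GenerativeModel("gemini-pro")
response = model.generate_content("Hello")
print(response.text)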

It’s a pity, because the direct version doesn’t work in most of Europe, while the Vertex AI version would.

Is it working in Europe now? According to Google Bard update: Image generation and Gemini Pro adds more languages, it’s supposed to have been globally available since February 1.

I have the service working perfectly in dev tools/services, but I can’t figure out how to store the response in something I can use. Any chance you could provide an example?

Here is a script that takes a text prompt, a camera to take a snapshot with, and a media player to play back the response:

alias: Gemini Pro Vision
fields:
  prompt:
    selector:
      text:
        multiline: true
    name: prompt
  media_player:
    selector:
      entity:
        filter:
          - domain: media_player
    name: media player
  camera:
    selector:
      entity:
        filter:
          - domain: camera
    name: camera
sequence:
  # Save a snapshot from the selected camera to the media folder.
  - service: camera.snapshot
    data:
      entity_id: "{{ camera }}"
      filename: /media/snapshot.jpg
  # Send the prompt and snapshot to the model; response_variable stores
  # the service response in the `content` variable.
  - service: google_generative_ai_conversation.generate_content
    data:
      prompt: "{{ prompt }}"
      image_filename: /media/snapshot.jpg
    response_variable: content
  # Speak the generated text on the selected media player.
  - service: tts.speak
    target:
      entity_id: tts.piper
    data:
      media_player_entity_id: "{{ media_player }}"
      message: "{{ content.text }}"
      cache: false
  - variables:
      content: "{{ content }}"
  # Return the response so callers of this script can use it too.
  - stop: end
    response_variable: content
mode: single
icon: mdi:message-image
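
The important pieces are response_variable: content on the generate_content call, which captures the service response, and {{ content.text }}, which reads the generated text back out of it. The final stop step returns the same variable, so anything that calls this script gets the response as well.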

No, it does not work, at least not the API version; I’m still getting an error about the user’s location.

Yeah, and I can’t use the Nest API path as the image_filename: either. I don’t totally understand response_variable: or blueprints either; I think what you provided is a blueprint? This is what I came up with, since I believe I’ll also have to store the thumbnail in a response_variable:

- id: 'c12'
  alias: Doorbell Camera Snapshot Notification
  trigger:
    platform: device
    device_id: feb17d26775a5xxxxxxxxxxxxxxxxx
    domain: nest
    type: camera_person
  action:
    - service_template: >-
        {%- if is_state('input_boolean.home', 'off') or
               not is_state('device_tracker.iphone', 'home') and
               is_state('sensor.ipad_ssid', 'wirelessfun') -%}
              notify.mobile_app_iphone
        {% else %}
              notify.ios
        {% endif %}
      data:
        message: Person Detected at the Front Door.
        data:
          image: >-
            /api/nest/event_media/{{ trigger.event.data.device_id }}/{{ trigger.event.data.nest_event_id }}/thumbnail
      response_variable: thumbnail
    - service: google_generative_ai_conversation.generate_content
      data:
        prompt: "Very briefly describe what you see in this image from my doorbell camera. Your message needs to be short enough to fit in a phone notification. Do not describe stationary objects or buildings."
        image_filename: "{{ thumbnail }}"
      response_variable: content
    - service: tts.speak
      target:
        entity_id: tts.piper
      data:
        media_player_entity_id: media_player.google_mini
        message: "{{ content.text }}"
        cache: false
    - variables:
        content: "{{ content}}"
    - stop: end
      response_variable: content    
  mode: queued

@tronikos I have no idea how you coded the integration, but I wonder if it would be possible to specify the endpoint for the integration manually? It seems the European endpoint blocks the integration at the moment, but the US one should work: python - Google Generative AI API error: "User location is not supported for the API use." - Stack Overflow
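
For example (untested, and just my assumption about the client), the google-generativeai package accepts a client_options override where an API endpoint can be set; whether a different host would actually get past the location check is another question:

import google.generativeai as genai

# Untested: client_options allows overriding the API endpoint. The value
# below is the default host, shown only as a placeholder.
genai.configure(
    api_key="YOUR_API_KEY",
    client_options={"api_endpoint": "generativelanguage.googleapis.com"},
)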