Control of Nvidia graphics card parameters

I built a set of sensors to track the parameters of my NVIDIA graphics card, which I use with Jellyfin on the openmediavault operating system. I'm sharing them here in case they are useful to someone.

  - platform: command_line
    name: 'OMV_HA graphics card Temp'
    command: "ssh -i /config/id_rsa -o StrictHostKeyChecking=no [email protected] -t 'nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader,nounits'"
    unit_of_measurement: '°C'
    scan_interval: 30
    
  - platform: command_line
    name: 'OMV_HA graphics card Load'
    command: "ssh -i /config/id_rsa -o StrictHostKeyChecking=no [email protected] -t 'nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits'"
    unit_of_measurement: '%'
    scan_interval: 29
    
  - platform: command_line
    name: 'OMV_HA graphics card used RAM'
    command: "ssh -i /config/id_rsa -o StrictHostKeyChecking=no [email protected] -t 'nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits'"
    unit_of_measurement: 'MiB'
    scan_interval: 28
    
  - platform: command_line
    name: 'OMV_HA graphics card free RAM'
    command: "ssh -i /config/id_rsa -o StrictHostKeyChecking=no [email protected] -t 'nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits'"
    unit_of_measurement: 'MiB'
    scan_interval: 27
    
  - platform: command_line
    name: 'OMV_HA graphics driver version'
    command: "ssh -i /config/id_rsa -o StrictHostKeyChecking=no [email protected] -t 'nvidia-smi --query-gpu=driver_version --format=csv,noheader'"
    scan_interval: 26
    
  - platform: command_line
    name: 'OMV_HA graphics card GPU FAN'
    command: "ssh -i /config/id_rsa -o StrictHostKeyChecking=no [email protected] -t 'nvidia-smi --query-gpu=fan.speed --format=csv,noheader,nounits'"
    unit_of_measurement: '%'
    scan_interval: 25
    
  - platform: command_line
    name: 'OMV_HA graphics card GPU clock'
    command: "ssh -i /config/id_rsa -o StrictHostKeyChecking=no [email protected] -t 'nvidia-smi --query-gpu=clocks.current.graphics --format=csv,noheader,nounits'"
    unit_of_measurement: 'MHz'
    scan_interval: 24
    
  - platform: command_line
    name: 'OMV_HA graphics card GPU clock MAX'
    command: "ssh -i /config/id_rsa -o StrictHostKeyChecking=no [email protected] -t 'nvidia-smi --query-gpu=clocks.max.graphics --format=csv,noheader,nounits'"
    unit_of_measurement: 'MHz'
    scan_interval: 23
    
  - platform: command_line
    name: 'OMV_HA graphics card used process'
    command: "ssh -i /config/id_rsa -o StrictHostKeyChecking=no [email protected] -t 'nvidia-smi --query-compute-apps=name,used_memory --format=csv'"
    scan_interval: 22
    
  - platform: command_line
    name: 'OMV_HA graphics card perf'
    command: "ssh -i /config/id_rsa -o StrictHostKeyChecking=no [email protected] -t 'nvidia-smi --query-gpu=pstate --format=csv,noheader'"
    scan_interval: 300
    
  - platform: command_line
    name: 'OMV_HA graphics card name'
    command: "ssh -i /config/id_rsa -o StrictHostKeyChecking=no [email protected] -t 'nvidia-smi --list-gpus'"
    scan_interval: 300
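
On recent Home Assistant releases, command-line sensors are declared under a top-level `command_line:` key instead of as `platform: command_line` entries. A minimal sketch of the temperature sensor in that layout follows. The login `user@omv-host` is a placeholder (not from the original post), and `command_timeout: 15` is an assumed value to allow for SSH connection time.

```yaml
# configuration.yaml (sketch; adapt names, host, and key path to your setup)
command_line:
  - sensor:
      name: "OMV_HA graphics card Temp"
      # user@omv-host is a placeholder; substitute your own login and host
      command: "ssh -i /config/id_rsa -o StrictHostKeyChecking=no user@omv-host -t 'nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader,nounits'"
      unit_of_measurement: "°C"
      scan_interval: 30
      # Assumed value: give the SSH round trip some headroom
      command_timeout: 15
```

The other sensors should translate the same way: one `- sensor:` entry per query, with the same `name`, `command`, `unit_of_measurement`, and `scan_interval` keys.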

For how to create and copy SSH keys, see this topic: How to monitor Proxmox CPU temp

Thank you for sharing! I have been searching for it for months!

However, I took a different approach. Instead of running multiple queries at varying intervals, I changed the 'nvidia-smi' query to return several columns at once, without headers. Then I used a single command-line sensor to store the data in JSON format.

  - sensor:
      name: 'GPU Data'
      command: "ssh -o UserKnownHostsFile=/config/.ssh/known_hosts username@IPAddress -i /config/.ssh/id_rsa 'nvidia-smi --query-gpu=power.draw,temperature.gpu,utilization.gpu,utilization.memory --format=csv,noheader,nounits'"
      scan_interval: 30
      command_timeout: 10
      value_template: >-
        {% set lines = value.split('\n') %}
        {% set values = lines[0].split(',') %}
        {{
          {
            "gpu_power_draw": values[0]         | trim,
            "gpu_temperature": values[1]        | trim,
            "gpu_utilization": values[2]        | trim,
            "gpu_memory_utilization": values[3] | trim,
          } | to_json
        }}

Then I created a template sensor for each JSON value:

    - name: "GPU Power"
      unique_id: gpu_power_draw
      unit_of_measurement: W
      state_class: measurement
      state: >-
        {{ ((states('sensor.gpu_data') | from_json).gpu_power_draw) | round(2) }}

    - name: "GPU Temperature"
      unique_id: gpu_temperature
      unit_of_measurement: °C
      state_class: measurement
      state: >-
        {{ ((states('sensor.gpu_data') | from_json).gpu_temperature) | round(2) }}

    - name: "GPU Utilization"
      unique_id: gpu_utilization
      unit_of_measurement: "%"
      state_class: measurement
      state: >-
        {{ ((states('sensor.gpu_data') | from_json).gpu_utilization) | round(2) }}

    - name: "GPU Memory Utilization"
      unique_id: gpu_memory_utilization
      unit_of_measurement: "%"
      state_class: measurement
      state: >-
        {{ ((states('sensor.gpu_data') | from_json).gpu_memory_utilization) | round(2) }}
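
A possible refinement, which is not part of the setup above: if the SSH command fails or has not run yet, `sensor.gpu_data` will not contain JSON and the `from_json` filter will raise a template error. Below is a sketch of the temperature sensor with an `availability` template guarding against that. The enclosing `template:` / `- sensor:` keys and the `device_class` line are assumptions about the surrounding configuration.

```yaml
template:
  - sensor:
      - name: "GPU Temperature"
        unique_id: gpu_temperature
        unit_of_measurement: "°C"
        device_class: temperature   # assumption: optional, not in the original
        state_class: measurement
        # Mark the sensor unavailable while the source sensor holds no data,
        # so the state template is not evaluated against a non-JSON string
        availability: >-
          {{ states('sensor.gpu_data') not in ['unknown', 'unavailable', ''] }}
        state: >-
          {{ (states('sensor.gpu_data') | from_json).gpu_temperature | round(2) }}
```

This entry is meant to replace the existing "GPU Temperature" definition rather than sit next to it, since both use the same `unique_id`. The same `availability` line can be copied to the other three sensors.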

Allow me to join the discussion, because I found an alternative route.
First, I should say that I run HA in Docker, so I am able to run other containers alongside it.
One of these is sensors2mqtt, which, with the following compose configuration, can read some data from NVIDIA cards.

sensors2mqtt:
    container_name: sensors2mqtt
    privileged: true
    restart: unless-stopped
    image: kevinpdavid/sensors2mqtt:main
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities:
                - gpu
    environment:
      - MQTT_URL=mqtt://mqttbroker_ip:1883
      - MQTT_TOPIC=sensors2mqtt #this is the default
      - INTERVAL=180000 # in milliseconds
      - MQTT_USERNAME=your_user
      - MQTT_PASSWORD=your_password
    ports:
      - "9229:9229" # node inspector