Statistics: value_max of a sensor's state that's text?

Hey there, I’m looking to get the daily value_max of a weather sensor’s observation.

For example, if it’s “Cloudy” 5 times, “Mostly Cloudy” 3 times and “Rain” 2 times in a day, I want a sensor whose state is “Cloudy”, since that’s the most observed condition of the day.

Does this make sense? Unsure how to do that with the statistics sensor.

I ended up asking ChatGPT about this one because I felt it was complex. Its solution was to create a Python script that calls the API at /api/history/period/<timestamp> and specifies the time range to get the states. It then loops through the states and counts how often each unique value occurs, which gives you the most frequent state of the day. So far so good.
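For anyone following along, the counting step could look roughly like the sketch below. It is not the exact script from ChatGPT; it only assumes the history response has already been fetched and parsed into a list of lists of dicts with a "state" key. The function name `most_frequent_state` and the sample data are made up for illustration.

```python
from collections import Counter


def most_frequent_state(history):
    """Return the state that occurs most often in a parsed history response.

    `history` is assumed to be a list with one inner list per entity,
    each inner list holding dicts that contain a "state" key.
    """
    states = [entry["state"] for entity in history for entry in entity]
    # Skip placeholder states that are not real observations.
    states = [s for s in states if s not in ("unknown", "unavailable")]
    if not states:
        return None
    # most_common(1) returns [(state, count)]; ties go to the state seen first.
    return Counter(states).most_common(1)[0][0]


# Illustrative data only, not real sensor output.
sample = [[
    {"state": "Cloudy"},
    {"state": "Rain"},
    {"state": "Cloudy"},
    {"state": "Mostly Cloudy"},
]]
print(most_frequent_state(sample))  # Cloudy
```

Note that this counts state changes, not time spent in each state.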

I’m curious about the circumstances where it matters which description is “most observed”. What if the description is “Cloudy” 25 times, with each instance only 2 minutes long, and “Mostly Cloudy” 5 times, but for an hour each? Is “Cloudy” truly the “most observed”?

My use for this is very edge-case.

My forecast data only updates once every 15 minutes, and historically the conditions themselves change maybe hourly across those API calls.

I’m only curious about the conditions during daylight hours, and the result is going to an external MySQL database alongside my solar panel readings for the day. Theoretically I can look back and see whether there’s a reason for low solar production, e.g. snow/cloudy/rain/etc. Certainly this isn’t scientific and is just for data logging purposes.

You’re right. Using the data from the Home Assistant API, I could calculate the time delta between each state update (seen below), add up the deltas per state, and select the state with the largest total, but I’m not certain that’s needed - though it’s an option for the future.

[
  [
    {
      "state": "Cloudy",
      "last_changed": "2023-02-28T02:01:34.025693+00:00"
    },
    {
      "state": "Cloudy with Light Snow",
      "last_changed": "2023-02-28T02:33:02.979123+00:00"
    },
    {
      "state": "Partly Cloudy with Isolated Snow Showers",
      "last_changed": "2023-02-28T03:48:03.090232+00:00"
    },
    {
      "state": "Cloudy with Light Snow",
      "last_changed": "2023-02-28T05:03:03.723454+00:00"
    },
    {
      "state": "Partly Cloudy with Isolated Snow Showers",
      "last_changed": "2023-02-28T06:18:02.973284+00:00"
    },
    {
      "state": "Cloudy with Light Snow",
      "last_changed": "2023-02-28T09:18:03.045218+00:00"
    },
    {
      "state": "Cloudy",
      "last_changed": "2023-02-28T14:03:05.311217+00:00"
    },
    {
      "state": "Cloudy with Light Snow",
      "last_changed": "2023-02-28T14:33:06.193750+00:00"
    }
  ]
]
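The time-delta idea could be sketched like this, assuming one entity's entries shaped like the response above (a "state" and an ISO 8601 "last_changed" per entry, in time order). The function name `longest_observed_state` and the `period_end` argument are my own assumptions: the last state has no following change, so something has to close its interval. The sample data is invented to match the “2 minutes vs. an hour” question.

```python
from collections import defaultdict
from datetime import datetime


def longest_observed_state(entries, period_end):
    """Return (state, seconds) for the state with the largest total duration.

    `entries` is one entity's list from the history response, in time order.
    `period_end` is a timezone-aware datetime that closes the last interval.
    """
    if not entries:
        return None
    times = [datetime.fromisoformat(e["last_changed"]) for e in entries]
    # Each state lasts until the next change; the last one until period_end.
    ends = times[1:] + [period_end]
    totals = defaultdict(float)
    for entry, start, end in zip(entries, times, ends):
        totals[entry["state"]] += (end - start).total_seconds()
    return max(totals.items(), key=lambda kv: kv[1])


# Illustrative data only: "Cloudy" appears twice but only for 2 minutes each.
entries = [
    {"state": "Cloudy", "last_changed": "2023-02-28T08:00:00+00:00"},
    {"state": "Mostly Cloudy", "last_changed": "2023-02-28T08:02:00+00:00"},
    {"state": "Cloudy", "last_changed": "2023-02-28T09:02:00+00:00"},
]
period_end = datetime.fromisoformat("2023-02-28T09:04:00+00:00")
print(longest_observed_state(entries, period_end))  # ('Mostly Cloudy', 3600.0)
```

Here a plain count would pick “Cloudy” (2 occurrences), while the duration-weighted version picks “Mostly Cloudy”.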
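sensor:
  # history_stats needs two of start / end / duration.
  # The "today so far" window below is an assumption; adjust as needed.
  - platform: history_stats
    name: Cloudy percentage
    entity_id: sensor.weather_sensor
    state: "Cloudy"
    type: ratio
    start: "{{ today_at('00:00') }}"
    end: "{{ now() }}"

  - platform: history_stats
    name: Partly Cloudy with Isolated Snow Showers percentage
    entity_id: sensor.weather_sensor
    state: "Partly Cloudy with Isolated Snow Showers"
    type: ratio
    start: "{{ today_at('00:00') }}"
    end: "{{ now() }}"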

…and so on, although you’d either need to know all possible values or set up an “other” sensor that is 100% minus the sum of the others.

This is a good idea too, though I think there are something like 40 possible observation values, so having this be dynamic was originally what I was looking for.