Entity metadata (like gender or number) for localization of user-generated names

Many languages have grammatical gender for nouns. This creates problems for the people doing the localization: since they don’t know the gender of an entity’s name, they lack the context needed to use the correct form of adjectives in states. Here’s an example to make it easier to understand:

In English, it’s “the light is on” and “the LEDs are on”: neither “light” nor “on” has a gender, and “on” has no plural form.

In French, “la lampe” (fem.) is “allumée”, but “le LED” (masc.) is “allumé”; notice the missing “e” at the end, which signals that the adjective is in its masculine form.

For a group of lights, the number of forms doubles: “les lampes” are “allumées” and “les LEDs” are “allumés”.

Of course, you can have an entity called “[la] lampe de la salle de bain” (the bathroom light) with a state of “allumé” (i.e. the wrong gender). Although that’s grammatically incorrect, people will still understand it when looking at the entity in the interface. In Assist, however, where the state is spoken back as part of a full sentence, it badly degrades the user experience.

Plural names cause the same issue even in English. Imagine an entity called “Living room LEDs” (i.e. plural) and the following dialog with Assist:

  • Are the living room LEDs on?
  • No, the living room LEDs is off

The trouble is that entity names are user-generated, so their gender or number can’t be hard-coded anywhere in the HA code. As a result, people translating Assist sentences and responses into languages with more complex grammar than English have no choice but to use incorrect grammar. In some edge cases, such as plural names, the same problem appears in English as well.

Here’s my proposal for a feature request which might solve this.

  1. Each language defines a set of taxonomies for entity names. For example, English might only define number, with singular/plural as possible values. French could additionally provide gender, with masculine/feminine values. German could provide both number and gender, with gender also accepting a neuter value. Languages might also define default values for each taxonomy, i.e. the most common gender or number (details below).

  2. For each entity, in the entity’s Advanced settings section, give the user the option to state the gender of the entity’s name (masculine/feminine/neuter/whatever else exists in other languages), its number (singular/plural), and any other taxonomies defined for the language the user has selected for their interface. If the user doesn’t attach a custom gender or number to an entity, the language’s default values are assigned automatically. I would argue that the defaults could be customized even further per domain (e.g. lights might tend to be described using feminine nouns, fans using masculine ones).

  3. In the localization tool, for any key that relates to an entity state or that directly points to a single entity, allow translators to specify translations for all combinations of taxonomies (e.g. masculine singular, masculine plural, feminine singular, feminine plural etc.).

  4. Whenever a state or other entity-related property must be localized and displayed or passed on to the next system, provide the entity’s grammatical properties (set by the user, or the defaults) so that the matching translation of the state can be picked.
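As a rough illustration of steps 1 and 3, the Python sketch below lists every combination a translator would have to fill in for a single state key. The `TAXONOMIES` table and the `translation_slots` helper are hypothetical names, not existing Home Assistant structures; the values simply follow the examples in step 1.

```python
from itertools import product

# Hypothetical per-language taxonomies (step 1); not an existing HA structure.
TAXONOMIES = {
    "en": {"number": ["singular", "plural"]},
    "fr": {"gender": ["feminine", "masculine"], "number": ["singular", "plural"]},
    "de": {
        "gender": ["feminine", "masculine", "neuter"],
        "number": ["singular", "plural"],
    },
}


def translation_slots(language: str) -> list[dict[str, str]]:
    """Return every taxonomy combination a translator fills in for one state (step 3)."""
    taxonomy = TAXONOMIES[language]
    names = sorted(taxonomy)  # stable order, e.g. ["gender", "number"]
    return [dict(zip(names, combo)) for combo in product(*(taxonomy[n] for n in names))]


for combo in translation_slots("fr"):
    print(combo)
```

Under these assumptions English needs 2 translations per state, French 4 and German 6, which shows how quickly the work grows with each extra taxonomy.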

An example:

Let’s keep French as our language of choice for this. French would define (throughout HA) the following entity-level grammatical properties:

gender:
  name: Genre
  values:
    - name: Féminin
      value: feminine
    - name: Masculin
      value: masculine
  default:
    value: masculine
    domains:
      light: feminine
number:
  name: Nombre
  values:
    - name: Singulier
      value: singular
    - name: Pluriel
      value: plural
  default:
    value: singular
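To make the fallback order from step 2 concrete, here is a minimal Python sketch that resolves an entity’s gender and number against the French definition above: the user’s value wins, then the per-domain default, then the language default. The dictionary mirrors the YAML, but `FRENCH_PROPERTIES` and `resolve_metadata` are hypothetical names, not Home Assistant APIs.

```python
from typing import Optional

# Hypothetical in-memory mirror of the French definition above.
FRENCH_PROPERTIES = {
    "gender": {
        "values": ["feminine", "masculine"],
        "default": {"value": "masculine", "domains": {"light": "feminine"}},
    },
    "number": {
        "values": ["singular", "plural"],
        "default": {"value": "singular"},
    },
}


def resolve_metadata(
    domain: str,
    user_metadata: Optional[dict[str, str]],
    properties: dict = FRENCH_PROPERTIES,
) -> dict[str, str]:
    """Return one value per property: user choice, else domain default, else language default."""
    user_metadata = user_metadata or {}
    resolved = {}
    for prop, spec in properties.items():
        value = user_metadata.get(prop)
        if value not in spec["values"]:
            # Missing or unknown user value: fall back to the defaults.
            default = spec["default"]
            value = default.get("domains", {}).get(domain, default["value"])
        resolved[prop] = value
    return resolved


print(resolve_metadata("light", None))
print(resolve_metadata("fan", None))
print(resolve_metadata("light", {"number": "plural"}))
```

With no user input, a light resolves to feminine/singular and a fan to masculine/singular; setting only the number keeps the domain’s default gender.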

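Here’s a proposed French translation of the on state of the light domain, written in ICU MessageFormat syntax. Since entity_number holds the values singular/plural rather than a count, it is matched with select instead of plural. ICU also requires every select to include an other branch, which here holds the language default (singular, masculine):

{entity_number, select,
    plural {
        {entity_gender, select,
            feminine {allumées}
            other {allumés}
        }
    }
    other {
        {entity_gender, select,
            feminine {allumée}
            other {allumé}
        }
    }
}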

Alternatively, there could simply be 4 keys in the translation tool for the on state for lights in French.

Now, let’s say we have 2 lights and we attach grammatical metadata using the UI:

- entity_id: light.living_room_table
  name: Lampe de table du salon
  grammatical_metadata:
    gender: feminine
    number: singular
- entity_id: light.living_room_leds
  name: LEDs du salon
  grammatical_metadata:
    gender: masculine
    number: plural

Later edit: Given good enough grammar engines, this information could be added automatically to each entity name/alias.

Now, in the interface or in Assist, we can correctly show that the state of Lampe de table du salon is allumée and LEDs du salon are allumés.
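The four-key alternative can be sketched in a few lines of Python: the entity’s domain and grammatical metadata are combined into a lookup key, so the two entities above get “allumée” and “allumés” respectively. The key format and the `localized_state` helper are assumptions made for illustration; they are not Home Assistant’s actual translation keys.

```python
# Hypothetical flat translation keys: one per (gender, number) combination
# of the French "on" state for the light domain.
TRANSLATIONS_FR = {
    "light.state.on.feminine.singular": "allumée",
    "light.state.on.masculine.singular": "allumé",
    "light.state.on.feminine.plural": "allumées",
    "light.state.on.masculine.plural": "allumés",
}

# The two example entities, with the metadata attached in the UI.
ENTITIES = [
    {
        "entity_id": "light.living_room_table",
        "name": "Lampe de table du salon",
        "grammatical_metadata": {"gender": "feminine", "number": "singular"},
    },
    {
        "entity_id": "light.living_room_leds",
        "name": "LEDs du salon",
        "grammatical_metadata": {"gender": "masculine", "number": "plural"},
    },
]


def localized_state(entity: dict, state: str) -> str:
    """Pick the translation of `state` that agrees with the entity's name."""
    domain = entity["entity_id"].split(".", 1)[0]
    meta = entity["grammatical_metadata"]
    key = f"{domain}.state.{state}.{meta['gender']}.{meta['number']}"
    return TRANSLATIONS_FR[key]


for entity in ENTITIES:
    print(f"{entity['name']}: {localized_state(entity, 'on')}")
```

This assumes the metadata is always present (e.g. filled in with defaults beforehand); a real implementation would also need a fallback when a key is missing.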

Disclaimer: I apologize if I’ve messed up the French words above. I speak far better English than French, but I’m hoping French is common enough for me to be understood. Had I provided the examples in my native language, the issue I’m trying to address would have been harder to convey, simply due to the language barrier. If any French speaker would like to correct me, please send a PM.

First of all, according to the new style guide, the name should not be returned in the response, so the response should be something like “No, it’s off”.
However, that doesn’t solve the issue, of course, especially in languages where gender matters so much. Users of those languages might expect the translated equivalent of “No, he’s off” or “No, she’s off”, and for your example they might expect “No, they’re off”.

The easiest way to solve the gender issue might be to use “No, that {{ state.domain }} is off”, so the reply will be “No, that light is off”. That doesn’t solve the singular/plural problem, though.

I think the solution you propose could work, but I don’t think users will add that metadata to all their entities. That’s just my point of view as the Dutch language leader, though; in Dutch, gender isn’t a big issue at all.

I’m not expecting all users to add this metadata to all of their entities; that would be very unlikely.

However, the underlying issue is that it would be very difficult to analyze user-generated content (entity names) for grammatical properties, especially if it’s not written properly (e.g. missing special characters), so HA can only do its best with the information it has. If you, the user, want a better experience, you can help HA provide better copy.

Later edit: What I have proposed is just a backbone, if you will. In the future, I would like to see HA hook into entity renaming events, take the new name, and try to detect its gender, number, or other properties. Users could override these auto-detected values, but with a good detector, we would have about 90% of the issue solved.


For now, we could at least have a gender setting in the person section. It would be much nicer to see messages such as:

Jana opustila zonu XX (Jane left zone XX)
Michal vstupil do zony YY (Michael entered zone YY)

than:

Jana opustil(a) zonu XX
Michal vstupil(a) do zony YY
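A minimal sketch of that idea, assuming a hypothetical gender field on the person entity: the verb form is picked from the person’s gender and falls back to today’s “(a)” notation when no gender is set. The verb forms come from the example above; `ZONE_VERBS`, `TEMPLATES` and `zone_message` are illustrative names only.

```python
from typing import Optional

# Past-tense verb forms from the example; the feminine "vstupila" follows the
# "(a)" pattern shown above. The None key holds today's gender-neutral form.
ZONE_VERBS = {
    "leave": {"feminine": "opustila", "masculine": "opustil", None: "opustil(a)"},
    "enter": {"feminine": "vstupila", "masculine": "vstupil", None: "vstupil(a)"},
}

# Hypothetical message templates for the two zone events.
TEMPLATES = {
    "leave": "{name} {verb} zonu {zone}",
    "enter": "{name} {verb} do zony {zone}",
}


def zone_message(name: str, gender: Optional[str], event: str, zone: str) -> str:
    """Build a zone message whose verb agrees with the person's gender, if known."""
    forms = ZONE_VERBS[event]
    verb = forms.get(gender, forms[None])  # unknown or unset gender -> "(a)" form
    return TEMPLATES[event].format(name=name, verb=verb, zone=zone)


print(zone_message("Jana", "feminine", "leave", "XX"))
print(zone_message("Michal", "masculine", "enter", "YY"))
print(zone_message("Jana", None, "leave", "XX"))
```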