Umlaut (Unicode?) in included common.h header does not render on display

Hi,

I just stumbled on a problem with umlaut characters in included header files.

Scenario:
I am using a common.h header file to define a function to translate the month names into German.
The relevant portion of the common.h:

std::string generateDateMonth(esphome::time::ESPTime time) {
  std::string months[12] = {"Januar", "Februar", "März", "April", "Mai", "Juni", "Juli", "August", "September", "Oktober", "November", "Dezember"};
  std::string month = months[atoi(time.strftime("%m").c_str()) - 1];
  return month;
}

As the eagle-eyed may have noticed, the month “März” contains the umlaut character ä.

Then, in the main YAML for the ESPHome display, I call the function as follows:

display:
  - platform: lilygo_t5_47_display
    lambda: |-
      auto time = id(ntp).now();
      std::string dateFormat = generateDateMonth(time);
      it.strftime(x, y+110, id(font_small), TextAlign::TOP_CENTER, dateFormat.c_str(), time);

I obviously included the umlaut ä in the glyphs statement of the font (and the ä will indeed render correctly on other instances of the same font).

font:
  - file:
      type: gfonts
      family: 'Open Sans'
      weight: bold
    id: font_small
    size: 40
    glyphs:
      '&@!,.?"%()+-_:°0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyzåÄäÖöÜü|/ß'

But on the display it looks something like this:

M|rz

and the logger complains that:

Encountered character without representation in font: '\xe4'

So how can I carry over the umlaut from the common.h into the lambda function?

Thanks,

Thomas

It seems you found a bug in the implementation of the DisplayBuffer code. The glyphs are wrongly translated to UTF-8, while the function that looks up each character of the string receives a plain “char” as input.

It’s a bit hard to understand if you’re not familiar with character encodings like ASCII, Extended ASCII (aka ISO 8859-1), Unicode and UTF-8. The simplified version is: the characters åÄäÖöÜüß are all defined in Extended ASCII, and each can be represented as a single char (one byte). In your case, the ä has code 0xE4.

In the generated code, those glyphs are converted to their UTF-8 representation (written in octal). For example, ä is converted to \303\244, which from a char perspective is 2 bytes, namely 0xC3 and 0xA4. That is why the code cannot find your 0xE4 character in the list of glyphs.

I could not find a workaround for this, so there is not much you can do about it. I can only suggest that you file a bug report in the ESPHome repository.

Excerpt of the generated code:

lilygo_t5_47_display->set_writer([=](display::DisplayBuffer & it) -> void {
      std::string months[12] = {"Januar", "Februar", "März", "April", "Mai", "Juni", "Juli", "August", "September", "Oktober", "November", "Dezember"};   // <=== here the string is still ISO 8859-1 encoded

...

static const display::GlyphData display_glyphdata[] = {display::GlyphData{
    .a_char = " ",
    .data = uint8_t_2 + 0,
    .offset_x = 0,
    .offset_y = 43,
    .width = 10,
    .height = 0,
  }, display::GlyphData{
    .a_char = "!",              // <=== these are taken as-is
    .data = uint8_t_2 + 0,
    .offset_x = 0,
    .offset_y = 14,
    .width = 11,
    .height = 29,
...
  }, display::GlyphData{
    .a_char = "\303\244",       // <=== here's your ä, encoded in UTF-8 (octal). This should be: a_char = "ä";
    .data = uint8_t_2 + 7895,
    .offset_x = 0,
    .offset_y = 12,
    .width = 24,
    .height = 31,

For now I would say: remove the “special” characters from the glyphs and change “März” to “Marz”.


Thanks
Unfortunately, that went a little over my head.

The solution I found:

I opened the common.h file in my favorite Windows text editor and changed the encoding from “ANSI” to “UTF-8”. Now the umlaut works.

Posting this in case anyone else runs into the problem without a deep understanding of C.

Thomas


Hey Thomas,

I have the same issue. I created a common.h file and can include it, but what exactly is the line to switch from ANSI to UTF-8?

THX,
Volker

You change the encoding in the editor.
Example in Editor2 in Windows: