What charset for external database?

My searching doesn’t reveal any informative hints.

I want to set up my QNAP NAS with a database for all long-term storage of data from HA, like here: https://www.paolotagliaferri.com/home-assistant-data-persistence-and-visualization-with-grafana-and-influxdb/
What character set would be the best to select for broadest and most straightforward use? utf8mb4_unicode_ci? utf8mb4_general_ci? Some other? The context is North American English.

Thanks.

Q

To be precise, the values you mentioned are not character sets. They are collations (each of which implies a related charset, though).

Collations define how characters are ordered relative to one another. For example, a character with a diacritic might sort just after the corresponding letter without it, sort after z, or be treated as identical to the base character.

Because sorting is nothing more than comparing, the collation is also used when comparing strings. For example, abč might or might not be equal to abc, depending on the chosen collation.
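To make the abč vs. abc point concrete, here is a rough Python sketch. It is only an illustration of the idea, not how MySQL/MariaDB implement their collations: the helper names (strip_accents, equal_accent_insensitive, accent_aware_key) are made up for this example, and "accent-insensitive" is approximated by dropping Unicode combining marks.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Return text with diacritics removed (decompose, then drop combining marks)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def equal_accent_insensitive(a: str, b: str) -> bool:
    """Compare the way an accent-insensitive collation might: base letters only."""
    return strip_accents(a) == strip_accents(b)


def accent_aware_key(text: str):
    """Sort key: order by base letters first, then by the original string.

    This puts an accented letter right after its base letter.
    """
    return (strip_accents(text), text)


# "\u010d" is č (c with caron), written as an escape so the code point is unambiguous.
words = ["z", "\u010d", "d", "c"]

# Plain code point order: č lands after z.
print(sorted(words))

# Accent-aware order: č lands right after c.
print(sorted(words, key=accent_aware_key))

# Binary comparison vs. accent-insensitive comparison of abč and abc.
print("ab\u010d" == "abc")
print(equal_accent_insensitive("ab\u010d", "abc"))
```

The same two strings compare as different or as equal depending only on which comparison rule is applied, which is what choosing a collation decides for the database.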

For general purposes I would choose a Unicode collation, which places national characters just after their base letter (č after c).

BTW, the collation can be changed at runtime by forcing it with special SQL syntax. However, performance might suffer in that case, because indexes are built for one chosen (default) collation. When you force a collation different from the one used for the indexes, they cannot be used.
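The index effect can be tried out with a small sketch. Note the assumptions: it uses Python's built-in sqlite3 module as a stand-in (not MySQL/MariaDB), SQLite's NOCASE collation in place of a utf8mb4 collation, and a made-up table called states. The COLLATE keyword itself also exists in MySQL, but the planner output there will look different.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return the human-readable detail column of EXPLAIN QUERY PLAN for sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")

# Hypothetical table; the index is built with the column's default (BINARY) collation.
conn.execute("CREATE TABLE states (id INTEGER PRIMARY KEY, entity TEXT, value TEXT)")
conn.execute("CREATE INDEX idx_entity ON states (entity)")
conn.executemany(
    "INSERT INTO states (entity, value) VALUES (?, ?)",
    [("sensor.a", "1"), ("Sensor.A", "2"), ("sensor.b", "3")],
)

# Same lookup twice: once with the default collation, once forcing another one.
default_sql = "SELECT value FROM states WHERE entity = 'sensor.a'"
forced_sql = "SELECT value FROM states WHERE entity = 'sensor.a' COLLATE NOCASE"

default_plan = query_plan(conn, default_sql)
forced_plan = query_plan(conn, forced_sql)

default_rows = conn.execute(default_sql).fetchall()
forced_rows = conn.execute(forced_sql).fetchall()

print(default_plan)  # an index SEARCH using idx_entity
print(forced_plan)   # a full SCAN, because the index was built for another collation
print(default_rows)  # only the exact-case match
print(forced_rows)   # both 'sensor.a' and 'Sensor.A'
```

On a three-row table the difference is invisible, but on a large table a scan in place of an index search is where the performance hit comes from.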

Thanks maxym.

One step farther…