Non-ASCII characters in names cause JSON parser to fail

You would think that code page problems were a thing of the past, but one appeared for me some time between 2022.6.7 and 2022.8.1.
A fresh installation of HA 2022.8.1 runs once, creates the .homeassistant directory and processes integrations, and the web UI is operational. At the next startup, however, it finds non-ASCII characters in two JSON files, refuses to handle them, and exits.

*UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 5585: invalid start byte*
*2022-08-06 11:02:49.365 ERROR (SyncWorker_3) [homeassistant.util.json] Could not parse JSON content: /home/homeassistant/.homeassistant/.storage/core.entity_registry*
*Traceback (most recent call last):*
*  File "/srv/homeassistant/lib/python3.9/site-packages/homeassistant/util/json.py", line 39, in load_json*
*    return orjson.loads(fdesc.read())  # type: ignore[no-any-return]*
*  File "/usr/lib/python3.9/codecs.py", line 322, in decode*
*    (result, consumed) = self._buffer_decode(data, self.errors, final)*
*UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 11166: invalid start byte*
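The traceback is consistent with a file that was written in a single-byte encoding and then read back as UTF-8. In Latin-1, byte 0xf6 is "ö", and 0xf6 can never start a UTF-8 sequence. A minimal sketch that reproduces the same kind of error, assuming (this is a guess, not confirmed by the logs) that the writer used a Latin-1 encoding; the file name and the content are made up for illustration:

```python
import json
import os
import tempfile

# Hypothetical registry content with a non-ASCII name and unit.
data = {"name": "Kök", "unit_of_measurement": "°C"}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "registry.json")

    # Simulate a writer that ends up using a Latin-1 encoding.
    with open(path, "w", encoding="latin-1") as fdesc:
        json.dump(data, fdesc, ensure_ascii=False)

    # Reading the same file back as UTF-8 fails on the first non-ASCII byte.
    error = None
    try:
        with open(path, encoding="utf-8") as fdesc:
            fdesc.read()
    except UnicodeDecodeError as exc:
        error = exc

print(error)
```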

This did not happen in 2022.6.7, which I had before. I saw letters like å, ä, and ö in the web interface, and it still worked after a reboot.
The affected files are core.device_registry and core.entity_registry.
Sure, some of the names containing non-English characters were created outside of HA, e.g. in Google Home, but the degree sign in "unit_of_measurement": "°C" comes from HA itself. This should definitely have been UTF-8.
I'm running on a Re:Terminal using Raspbian GNU/Linux 11 (bullseye).
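To see exactly which bytes break decoding, the files can be scanned in binary mode. This is only a sketch: `find_invalid_utf8` is a helper name invented here, and the directory is taken from the log line above.

```python
from pathlib import Path


def find_invalid_utf8(raw: bytes) -> list:
    """Return (offset, byte value) for each byte that breaks UTF-8 decoding."""
    bad = []
    pos = 0
    while pos < len(raw):
        try:
            raw[pos:].decode("utf-8")
            break  # the rest of the data is valid UTF-8
        except UnicodeDecodeError as exc:
            offset = pos + exc.start
            bad.append((offset, raw[offset]))
            pos = offset + 1  # skip the offending byte and keep scanning
    return bad


# Self-contained demonstration with a Latin-1 encoded degree sign.
sample = '{"unit_of_measurement": "°C"}'.encode("latin-1")
print(find_invalid_utf8(sample))

# Directory taken from the log above; files are skipped if they do not exist.
storage = Path("/home/homeassistant/.homeassistant/.storage")
for name in ("core.device_registry", "core.entity_registry"):
    path = storage / name
    if path.exists():
        for offset, value in find_invalid_utf8(path.read_bytes()):
            print(f"{name}: offset {offset}, byte {value:#04x}")
```

Bytes such as 0xb0, 0xe5 and 0xf6 at the reported offsets would match "°", "å" and "ö" in Latin-1.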
Any clues?

Hi mansson,
I am brand new to HA and am just trying to build my own system on a Raspberry Pi. I have experienced the same thing as you. If I use the HA web interface from a browser with my local language setting (Hungarian), even the names of the zones and some default entities are automatically created with our national characters (á, é, í and so on). HA itself saves them to the registry files in the wrong encoding, which causes the next start to fail.
Today I activated the eWeLink integration to link some of my already working devices into HA. On every start of HA it automatically “corrects” a temperature sensor in the registry file so that the unit_of_measurement parameter contains a degree sign, which again causes the next start to fail.
I do not know what to do. I think HA should handle this internally, because the config file is written by HA itself and has not been touched by anything else.
I hope this issue will be solved, because I find the system very impressive and would like to use it in my home.

In the end I had to modify sonoff/sensors.py (line 84) to strip the degree sign from the unit_of_measurement value.

self._attr_native_unit_of_measurement = ''.join( c for c in UNITS[self.uid] if c not in '\xb0')

However, I do not think this is a convenient workaround, since it has to be redone whenever the sonoff integration is updated.

This seems to be a recently introduced issue. You should open an issue with HA core so the development team can fix it…

There was such an issue, and it was closed with the answer that the person should follow some Python rule and change locale settings to explicitly use UTF-8. That’s weird. So Debian is only for English-speaking people?
I changed to:
export LC_CTYPE=C.UTF-8
export PYTHONIOENCODING=utf-8:surrogateescape
export LC_ALL=C.UTF-8
export LANG=C.UTF-8
in /etc/profile, deleted .homeassistant and performed a complete reinstallation.
But the error persists. I still get, for example, bytes 0xE5 and 0xF6 in some JSON file the second time I try to start Home Assistant.
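It may be worth checking which encoding the Python process actually ends up with. When `open()` is called without an `encoding` argument in text mode, Python 3.9 uses `locale.getpreferredencoding(False)`, while `PYTHONIOENCODING` only affects stdin, stdout and stderr. Also, /etc/profile is read by login shells, so a process started some other way (for example as a service, which is only an assumption about this setup) may never see those exports. A small sketch to run inside the same environment and as the same user that starts Home Assistant; `describe_encodings` is a made-up helper name:

```python
import locale
import os
import sys


def describe_encodings() -> dict:
    """Collect the encoding-related settings visible to this Python process."""
    env_names = ("LANG", "LC_ALL", "LC_CTYPE", "PYTHONIOENCODING", "PYTHONUTF8")
    return {
        # Default used by open() in text mode when no encoding is given.
        "preferred": locale.getpreferredencoding(False),
        "filesystem": sys.getfilesystemencoding(),
        "stdout": getattr(sys.stdout, "encoding", None),
        "utf8_mode": bool(sys.flags.utf8_mode),
        "env": {name: os.environ.get(name) for name in env_names},
    }


for key, value in describe_encodings().items():
    print(f"{key}: {value}")
```

If "preferred" is not UTF-8 here, any file written without an explicit encoding could contain single bytes like 0xE5 and 0xF6.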

This is still a problem. And you guys think you have it tough? Try being me, with lots of æøå in the names of things, like “Kjøkken” (kitchen)! :rofl: I had to remove all those, plus the degree sign and the ² character in the core entity and device registry files, and it finally starts. But shouldn’t this be fixed after so long?
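Instead of deleting the characters by hand, the files could in principle be re-encoded. The sketch below assumes the broken files are entirely Latin-1 (the bytes reported in this thread fit that, but it is not confirmed), and `reencode_to_utf8` is a hypothetical helper. Stop Home Assistant and keep a backup copy of the files before trying anything like this.

```python
import json


def reencode_to_utf8(raw: bytes, source_encoding: str = "latin-1") -> bytes:
    """Return raw as UTF-8, converting from source_encoding only if needed."""
    try:
        raw.decode("utf-8")
        return raw  # already valid UTF-8, leave untouched
    except UnicodeDecodeError:
        pass
    text = raw.decode(source_encoding)
    json.loads(text)  # raises ValueError if the result is not valid JSON
    return text.encode("utf-8")


# Demonstration with made-up content; real use would read and write the
# registry files in binary mode ("rb" / "wb").
broken = '{"name": "Kjøkken", "unit": "m²"}'.encode("latin-1")
fixed = reencode_to_utf8(broken)
print(json.loads(fixed)["name"])
```

A file that mixes valid UTF-8 with Latin-1 bytes would come out garbled with this approach, so it only helps when the whole file was written in one encoding.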

I opened a GitHub issue with this: