Multiple DS18B20 dallas.sensor: Scratch pad checksum invalid!

Made some improvements yesterday.

I made some changes to the reset pulse in the original code, and I'm now following the Dallas specification as strictly as I can. I have not yet started to use my home-made timings; this code just takes a different approach and verifies that each state in the timing sequence is actually reached.

To stress the setup I added capacitors between Data and VCC and between Data and GND to simulate long cables. Adding 1 nF of capacitance gives a considerable rise time of ~14 us.

With the original code the sensor was not identified and no ROM code was returned. With my new code the ROM code was identified :slight_smile:.
There is still a problem reading temperatures, but I will look into that as well.

But long rise times are just one of the problems we can face. Reflections are another, so I will look into simulating reflections later. I also need to get an ESP32 chip to make sure it works on those as well, since I could see in the code that the developer had left some comments about low-level code differences between ESP32 and ESP8266.

Thank you for taking your time to dig into this :slight_smile:

I found this thread yesterday when I discovered that a freezer I am controlling with a Sonoff THD16D, flashed with ESPHome, no longer reported any temperature and was therefore no longer controlling the freezer temperature. In the logs I could see the “Scratch pad checksum invalid!” error, and searching for it led me here.
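For context on what that log message refers to: the DS18B20 scratchpad is 9 bytes, and the last byte is a Dallas/Maxim CRC-8 of the first eight. The following is a minimal sketch of such a check, not the ESPHome implementation; the function names `crc8_maxim` and `scratchpad_crc_ok` are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Dallas/Maxim CRC-8: polynomial x^8 + x^5 + x^4 + 1, processed LSB first
// (reflected polynomial 0x8C), initial value 0.
// Hypothetical helper name, not taken from ESPHome.
inline uint8_t crc8_maxim(const uint8_t *data, size_t len) {
  assert(data != nullptr || len == 0);
  uint8_t crc = 0;
  for (size_t i = 0; i < len; i++) {
    uint8_t byte = data[i];
    for (int bit = 0; bit < 8; bit++) {
      const bool mix = ((crc ^ byte) & 0x01) != 0;
      crc = static_cast<uint8_t>(crc >> 1);
      if (mix) crc ^= 0x8C;
      byte = static_cast<uint8_t>(byte >> 1);
    }
  }
  return crc;
}

// The scratchpad is valid when the CRC of bytes 0..7 equals byte 8.
// Hypothetical helper name, not taken from ESPHome.
inline bool scratchpad_crc_ok(const uint8_t (&scratchpad)[9]) {
  return crc8_maxim(scratchpad, 8) == scratchpad[8];
}
```

A single bit misread on the bus is enough to make this check fail, which is why marginal timing shows up as a checksum error rather than as a wrong temperature.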

The device is basically this one, but with a display, and is based on the ESP32. The cable to the Dallas sensor is 50 cm long.

Simply restarting the device did nothing to solve the problem, but I managed to “restore” it by re-flashing it with the same firmware it was already running.
So I can no longer reproduce the fault. But it will probably resurface at some point in the future unless something changes in ESPHome.

Details about my device, for reference

  • ESPHome: 2023.2.4
  • Board type: nodemcu-32s
  • Connected to WiFi
  • Running the Climate component to control the built-in relay
  • Worked flawlessly from ~March 2023 until 24 November 2023, when it stopped reading the temperature

I’m too new to this to understand how a re-flash could help. Was it just a re-flash, or did it also include a new build, where a lot of libraries, including the Dallas component, would probably have been updated?

It was a re-flash with the same version of ESPHome. I use the ESPHome docker container to manage and build all the firmware for my devices. I find it convenient because I can access it from anywhere, and I always have the same ESPHome version. So the code was re-compiled, but the ESPHome version and the YAML file defining the content were the same as in the build that had stopped working.

I too find it strange that a re-flash could help. But then, I do not know what the internal ESPHome code looks like. I only know it from a user perspective.

You can test my version now if you want, I have attached the code needed at the end.

Is it better than the original code? At the moment I can’t tell, since the original code now also works even when I stress the bus with capacitors.

It’s at your own risk, and I have only tested this on an ESP8266. If things go wrong (as they did for me a couple of times when I created an internal loop :slight_smile: ), you may be forced to flash using a cable.

So what have I done? The original code followed a best-practice document from Maxim: 1-Wire Communication Through Software | Analog Devices. It’s probably a good document, but it also says: “The system must be capable of generating an accurate and repeatable 1µs delay for standard speed and 0.25µs delay for overdrive speed”. The problem I have seen is that we don’t get 1 us timing accuracy on the ESPHome platform, so following this document is not that reliable. The Arduino documentation says this: “This function works very accurately in the range 3 microseconds and up to 16383. We cannot assure that delayMicroseconds will perform precisely for smaller delay-times”.

So instead of timing exactly when I should read and write, I have tried to follow the specification from Analog Devices: DS18B20 - Programmable Resolution 1-Wire Digital Thermometer (analog.com). Here I have used another approach: I listen on the bus to determine when it has reached a specific state. I have also assumed that the 1-wire devices react very quickly to slope changes from the master, so there is no delay before a device drives the bus low when it should be low, and vice versa.
I know my approach might be sensitive to reflections on the bus, but these can easily be handled with some micro delays (repeated reads) in the code.

For the specific bus commands, this is what I have done:

  • For reading, I check that the ESP has driven the pin low and then measure when the sensor releases the bus to tri-state. If that takes more than 15 us I consider the bit a 0; if the bus is released to tri-state earlier than 15 us I consider it a 1.

  • For the reset pulse I did much the same. After pulling the bus low for 480 us, I release the bus into tri-state and wait until I can read the bus as high. I then check whether the bus goes low again within 240 us of the moment I detected it as high. If it does, there is at least one sensor present. I then check that the sensor releases the bus into tri-state and wait out the second 480 us period.

  • For the write_bit()…
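The read and reset steps described above can be sketched roughly as follows. This is an illustration only, not the code from the fork: `BusHooks`, `wait_for_level`, `classify_read_slot` and `detect_presence` are hypothetical names, the timeouts other than the 15 us and 240 us figures are assumptions, and the master's own pull-low and release are assumed to happen immediately before each call.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical hardware hooks; neither name comes from ESPHome.
struct BusHooks {
  std::function<bool()> is_high;     // samples the data line
  std::function<uint32_t()> now_us;  // monotonic microsecond clock
};

// Polls until the line reaches `level`. Returns the elapsed time in us,
// or -1 if `timeout_us` passes first.
int32_t wait_for_level(const BusHooks &bus, bool level, uint32_t timeout_us) {
  assert(bus.is_high && bus.now_us);
  const uint32_t start = bus.now_us();
  while (bus.is_high() != level) {
    if (bus.now_us() - start >= timeout_us) return -1;
  }
  return static_cast<int32_t>(bus.now_us() - start);
}

// Read slot: call right after the master has pulled the line low and
// released it. Returns 1 if the bus is released in under 15 us, 0 if it
// is released later, and -1 if it never goes high (assumed 120 us limit).
int classify_read_slot(const BusHooks &bus) {
  const int32_t released_after = wait_for_level(bus, true, 120);
  if (released_after < 0) return -1;
  return released_after < 15 ? 1 : 0;
}

// Reset: call right after the master has held the line low for 480 us
// and released it. Waiting out the rest of the second 480 us window is
// left to the caller.
bool detect_presence(const BusHooks &bus) {
  if (wait_for_level(bus, true, 480) < 0) return false;   // line never rose
  if (wait_for_level(bus, false, 240) < 0) return false;  // no presence pulse
  return wait_for_level(bus, true, 480) >= 0;             // sensor released the bus
}
```

Because each step waits for an observed level instead of sampling at a fixed offset, a slow rise time delays the decision rather than corrupting it. A release at exactly 15 us is treated as a 0 here, which is an arbitrary choice.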

Since it’s my first C++ work in 15 years the code could be better and there are still things to fix. And I haven’t done the write-bit part yet either.

Other things I have done: I removed some unneeded instructions, and I moved the interrupt protection from a high level in the code down to just the three bus control functions read_bit(), write_bit(), and reset().

Next:

  • Listen to the sensors to see when they have finished the temperature conversion, instead of waiting a predefined time as the code does now.
  • Confirm that all sensors are in non-parasitic mode.
  • Try to find an ESP32 I can test this on. I have one but haven’t been able to flash it.
  • Understand if there’s a more accurate way to measure time, or if a timer or HW interrupt could be used instead.
  • Create a test setup that introduces reflections. I will probably do this with a 100 m cable and some high-ohm and low-ohm terminators. This will be fun, since it has been a long time since I last did this.
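As a rough illustration of the first item: according to the DS18B20 datasheet, an externally powered sensor answers read time slots with 0 while a conversion is in progress and with 1 once it is done, so the master can poll instead of waiting a fixed time. This does not work in parasite-power mode, which is why the second item matters. The sketch below assumes two hypothetical callbacks, `read_bit` and `delay_ms`, supplied by the caller; it is not code from ESPHome or from the fork.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Polls the bus after a Convert T command has been issued.
// Returns the number of milliseconds waited before the sensor reported
// completion, or -1 if `timeout_ms` elapsed first.
// `read_bit` and `delay_ms` are hypothetical caller-supplied hooks.
int32_t wait_for_conversion(const std::function<bool()> &read_bit,
                            const std::function<void(uint32_t)> &delay_ms,
                            uint32_t poll_interval_ms, uint32_t timeout_ms) {
  assert(read_bit && delay_ms);
  assert(poll_interval_ms > 0);  // a zero interval would never time out
  for (uint32_t waited = 0; waited <= timeout_ms; waited += poll_interval_ms) {
    if (read_bit()) return static_cast<int32_t>(waited);  // 1 = conversion done
    delay_ms(poll_interval_ms);
  }
  return -1;
}
```

A timeout of around 750 ms would cover the datasheet's maximum conversion time at 12-bit resolution; at lower resolutions the poll usually returns much earlier.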

The YAML needed to run my test code is below.

external_components:
  - source: 
      type: git
      url: https://github.com/erapade-forks/esphome
      ref: Testable_in_ESP8266
    components: [dallas]

I have been using nrandell/dallasng (GitHub) for a year.
For me it works on various ESP8266/ESP32 devices, incl. Sonoff devices, D1 mini & Shelly Uni.

Just tried on ESP32. Compiled, installed - not booting …

Thanks for testing. My device is in a place not suitable for easy re-flashing. So I will not dare to try it until someone confirms that it works on ESP32.

Thanks for testing, I hope it didn’t give you any problems.

I will definitely try this dallasng. It looks like a totally different piece of code.

I updated my device to use this library too, for now, to see if it holds up better.
It compiled ok, the device started ok, and it found the sensor without problem.
So far so good :slight_smile:

I’ve had this problem for a while. I was just searching to see if there was anything new about the error and then spotted this page. I tried the code from @erapade and everything seems to work. Installation on a Wemos D1 mini went smoothly and the checksum error is gone. I have 10 sensors running, each with a 2-5 m cable. Thanks to @erapade

Thanks for letting me know. I was about to give this up, but now I have some energy again.

I think you should also try this instead of my mod, and please report back in this thread how it works. A few posts up in this thread there is also a reference to this:

external_components:
  - source: github://nrandell/dallasng

Good work, unfortunately I only have access to ESP32 variants at the moment so can’t test, sorry.

But please test the other alternative, “nrandell”. As I understand it, that one has been proven on ESP32 for quite some time now and uses a completely different codebase.

With the “dallasng” library on my ESP32-S2, which has two Dallas strings (one with 5 sensors, the other with 4), things are much improved. It still raises an occasional scratchpad error, but probably fewer by a factor of 20…

That proves the most important thing: we need to make sure we communicate correctly with the devices, and there is room for improvement.

Why do you have to use the ESP32?

Because for many projects an ESP8266 is too slow or doesn’t have the necessary interfaces, for instance multiple ADC. Plus the cost difference now between the ESP8266/ESP32 is fairly small so unless you’re making 1000 units you may as well buy the ESP32 and be future proofed.

Absolutely, before moving over to ESPHome my hardware ran without error for 18 months using a solution built via the Arduino IDE and the standard One Wire / Dallas libs. As soon as I ported to ESPHome the scratchpad errors started, so it’s 100% a software / timing issue.