Multiple DS18B20 dallas.sensor: Scratch pad checksum invalid!

Yeah, it has to be something with ESPHome…the problem is identifying the root cause. ESPHome is pretty advanced compared to running a sketch reading the Dallas sensor. And since it relies on precise timings, there is a lot that could affect this.

Right now I am using the dallasng library mentioned above. It has been less than a week, so it is way too early to say whether it is “rock-solid” or not. I will use that library only until I see other progress being made, since I prefer to run “stock” ESPHome when it comes to libraries.

When something new is ready for testing I can probably help, as long as I know that the new SW will boot on ESP32, since it is a bit tricky to hook it up to a computer to flash it.

I tried that library as well, and it was a little better but not good enough to use. Hopefully the ESP that one of the members is sending me will arrive soon…

Jeff

So far, I only see this issue:

But it is not (yet) causing any issues reading the temperature. It does indicate that something is not entirely ok with the Dallas code though.

Thanks for the awesome advice in this thread. I’ve been struggling with 4 Dallas sensors on an AZ-Delivery ESP32 WROOM. I was having issues such as scratch pad checksum errors and failed conversions.

I tested dallasng with no luck. Finally I experimented with different pull-up resistors, as suggested in other threads (from 4.7k down to 1k). When measuring the resistance between 3.3 V and data, I noticed that I read only about half of the resistor’s value. I also had a small connection between 3.3 V and ground: my instrument didn’t beep when testing the solder joints, but when measuring I could clearly see that there was a small connection.

I cleaned the flux residue off my test board and all the sensors came alive!!

Long story short: if you read this thread and assume that your hardware or software is faulty (like I did), measure the resistance in your final circuit! It seems the current software is very picky.
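
As a rough worked example of why a reading of about half the resistor value points to leakage: if a stray path (for instance flux residue) sits in parallel with the pull-up, the meter sees the parallel combination of the two. This is only one possible reading of the measurement above. The symbols R_p (pull-up), R_m (measured value) and R_leak (stray path) are introduced here for illustration, and an in-circuit measurement also includes the sensors and the ESP pin.

```latex
R_m = \frac{R_p R_{leak}}{R_p + R_{leak}}
\quad\Longrightarrow\quad
R_{leak} = \frac{R_p R_m}{R_p - R_m},
\qquad
R_m = \tfrac{1}{2} R_p \;\Rightarrow\; R_{leak} = R_p
```

So, assuming a 4.7k pull-up, a reading near 2.35k would correspond to a stray path of roughly 4.7k, which is far too low for a clean board.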

Just a work note from my side.

I’m doing other things with HA right now, so this is not a priority at the moment. But… as I prepared some Sonoff Basic R2 units (that’s where my ESP8266s come from) to act as PID climate regulators, I added a Dallas sensor (just a single one) to each of them. All worked fine except one. The failing one was discovered with its ROM code but then got CRC failures.

As my family doesn’t like the house being cold, I didn’t investigate; I just switched to the “dallasng” code and it then worked fine. I also tried switching to my own code, and that also worked fine.

There’s more to do from an investigation side in all of this but as of now we at least have some options we can try. The options are:

external_components:
  - source: github://nrandell/dallasng

or

# Please note that this code is not intended to work with ESP32,
# even though it might.
# There are no guarantees; be prepared to flash with a serial cable
# if things go wrong.
external_components:
  - source: 
      type: git
      url: https://github.com/erapade-forks/esphome
      ref: Testable_in_ESP8266
    components: [dallas]

Just a note, though: even though I created the last code myself, I stick with the original component whenever it works fine, and I recommend that everyone else take the same approach.

The devs have come up with a possible fix. Add this to your ESPHome file:

external_components:
  - source:
      type: git
      url: https://github.com/ssieb/esphome
      ref: ds
    components: [ dallas ]
    refresh: 1min

I added the above lines to my file and it did not solve the issue. Rolling back to version: 2.0.4 in the framework section resolves all but a few random checksum errors. My setup has 8 DS18B20s on it.

If the only change was within the dallas component itself, I don’t think this will help, since reading the scratch pad was already protected from interrupts (as well as it can be) even before.
By “as well as it can be” I mean that, as I understand it, we are not able to protect against interrupts happening in e.g. the wifi communication.

But… I have some ideas. For intermittent checksum errors, there should be at least one retry whenever we read the scratch pad, since we know we can’t trust the interrupt lock to fully protect the communication with the sensor. I mean, that’s what the checksum is there for anyway…

So, this code should include a repeat if the checksum is not correct. Currently it more or less always returns true and does not check for CRC errors.
The problem is that today it only returns false when the reset pulse doesn’t go through, and that only happens if the bus isn’t high when the reset starts (i.e. it should never happen).
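
For reference, the checksum in question is the Dallas/Maxim CRC-8 carried in the ninth byte of the scratch pad. Below is a minimal sketch of such a check. It is not taken from the ESPHome source, and the names `dallas_crc8` and `scratch_pad_crc_ok` are made up for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Dallas/Maxim CRC-8: polynomial x^8 + x^5 + x^4 + 1, processed LSB first
// (reflected constant 0x8C), initial value 0, no final XOR.
inline uint8_t dallas_crc8(const uint8_t *data, std::size_t len) {
  uint8_t crc = 0;
  for (std::size_t i = 0; i < len; i++) {
    crc ^= data[i];
    for (int bit = 0; bit < 8; bit++) {
      if (crc & 0x01)
        crc = static_cast<uint8_t>((crc >> 1) ^ 0x8C);
      else
        crc = static_cast<uint8_t>(crc >> 1);
    }
  }
  return crc;
}

// A DS18B20 scratch pad is 9 bytes: bytes 0..7 are data, byte 8 is the CRC.
using ScratchPad = std::array<uint8_t, 9>;

// Illustrative helper (hypothetical name): true when the CRC over the first
// eight bytes matches the ninth byte.
inline bool scratch_pad_crc_ok(const ScratchPad &pad) {
  return dallas_crc8(pad.data(), 8) == pad[8];
}
```

The well-known power-on scratch pad 50 05 4B 46 7F FF 0C 10 1C (the 85 °C reading) passes this check, and flipping any single bit in it makes the check fail.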

While implementing a repeat in the read_scratch_pad method can be done very easily, other code is affected and needs to be adapted to this change for the whole code to make sense. Let’s see if I can manage to propose the code change or if the developer does this (preferred).
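
The retry idea can be sketched roughly as follows. This is an illustration under assumptions, not the actual ESPHome change: `read_scratch_pad_with_retry` is a hypothetical helper, and the bus read and the checksum test are passed in as callbacks so that the sketch stays independent of any particular driver.

```cpp
#include <array>
#include <cstdint>
#include <functional>

using ScratchPad = std::array<uint8_t, 9>;

// Hypothetical helper, not an ESPHome function.
// read_once: performs one complete read transaction on the bus and fills the
//            pad; returns false if the transaction itself failed (e.g. reset).
// crc_ok:    returns true when the checksum byte of the pad matches its data.
// Returns true and fills `out` as soon as one attempt yields a valid pad.
inline bool read_scratch_pad_with_retry(
    const std::function<bool(ScratchPad &)> &read_once,
    const std::function<bool(const ScratchPad &)> &crc_ok,
    ScratchPad &out, int max_attempts = 3) {
  for (int attempt = 0; attempt < max_attempts; attempt++) {
    ScratchPad pad{};
    if (read_once(pad) && crc_ok(pad)) {
      out = pad;
      return true;
    }
    // Otherwise the transfer was probably disturbed; simply try again.
  }
  return false;  // every attempt failed, report the error to the caller
}
```

Keeping the attempt count small matters here, because each retry keeps the bus (and possibly an interrupt lock) busy for another full transaction.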

Having this change will not solve all the problems, but it will protect against wifi-interrupt-related interference and other occasional borderline timing issues.
And knowing we have these issues with wifi-related interrupts, maybe we should also make sure that when we write to the scratch pad, we can read the values back correctly. Writing to the scratch pad only happens once in the process, when we set up the sensors and configure the alarm limits and the resolution, but to be on the safe side…
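
A write-then-verify step could look roughly like the sketch below. It assumes the standard DS18B20 scratch pad layout (byte 2 = TH, byte 3 = TL, byte 4 = configuration register). `BusHooks`, `SensorSettings` and `write_settings_verified` are hypothetical names for illustration and do not exist in ESPHome.

```cpp
#include <array>
#include <cstdint>
#include <functional>

using ScratchPad = std::array<uint8_t, 9>;

// The three bytes that the Write Scratchpad command (0x4E) transfers.
struct SensorSettings {
  uint8_t th;      // high alarm limit
  uint8_t tl;      // low alarm limit
  uint8_t config;  // resolution bits, e.g. 0x7F for 12 bit
};

// Hypothetical bus hooks so the sketch does not depend on a real driver.
struct BusHooks {
  std::function<bool(const SensorSettings &)> write_settings;  // Write Scratchpad
  std::function<bool(ScratchPad &)> read_checked;  // Read Scratchpad (0xBE) plus CRC check
};

// Writes the settings, reads the scratch pad back and compares bytes 2..4.
// Repeats the whole write/read cycle until it matches or attempts run out.
inline bool write_settings_verified(const BusHooks &bus, const SensorSettings &wanted,
                                    int max_attempts = 3) {
  for (int attempt = 0; attempt < max_attempts; attempt++) {
    if (!bus.write_settings(wanted))
      continue;  // write transaction failed, retry
    ScratchPad pad{};
    if (!bus.read_checked(pad))
      continue;  // read-back failed or had a bad checksum, retry
    if (pad[2] == wanted.th && pad[3] == wanted.tl && pad[4] == wanted.config)
      return true;  // the sensor holds exactly what we asked for
  }
  return false;
}
```

Since the write only happens once at setup, the extra read-back costs very little compared with the periodic temperature reads.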

That wasn’t actually true. At least on the S2, it appears that the wifi was interrupting in between those two sections and messing up the first timing. That change fixed the checksum errors I was getting while watching the OTA logs. I can’t reproduce any checksum errors now.

OK, but I think the wifi issue is so intermittent that it’s hard to say from just looking at one sensor in one environment.

I can see that I was wrong above when I said that the reset pulse only returns false when the bus isn’t initially high; there’s also the presence check that takes place after 70 µs. But… the wifi-related interrupt could step in at any time, and if it does so at this point, for example, the code will return false.
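
To make the two failure paths of the reset step concrete, here is a simplified sketch of a 1-Wire reset with presence detection. It is not the ESPHome implementation. `OneWirePin` and `one_wire_reset` are hypothetical names, and the delays (480 µs low, sample roughly 70 µs after release, then 410 µs recovery) are the commonly cited standard-speed values, used here as an assumption.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical pin abstraction so the sketch can run without hardware.
struct OneWirePin {
  std::function<bool()> read;               // true when the data line is high
  std::function<void()> drive_low;          // master pulls the line low
  std::function<void()> release;            // master lets the pull-up raise the line
  std::function<void(uint32_t)> delay_us;   // busy-wait in microseconds
};

enum class ResetResult { kPresence, kBusStuckLow, kNoPresence };

inline ResetResult one_wire_reset(const OneWirePin &pin) {
  pin.release();
  if (!pin.read())
    return ResetResult::kBusStuckLow;  // line is not idle-high, cannot start

  pin.drive_low();
  pin.delay_us(480);  // reset pulse
  pin.release();
  pin.delay_us(70);   // sensors answer by pulling low shortly after release

  // If an interrupt delays this sample past the end of the presence pulse,
  // the line reads high again and the reset is reported as failed.
  const bool present = !pin.read();

  pin.delay_us(410);  // let the presence pulse finish before the next command
  return present ? ResetResult::kPresence : ResetResult::kNoPresence;
}
```

The presence pulse of a DS18B20 lasts between 60 and 240 µs, so a delay of a couple of hundred microseconds at the sampling point is enough to miss it.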

But since we know the communication line by default (due to e.g. the wifi interrupt issue) isn’t 100% stable, I feel that using the checksum provided in the protocol for re-transmission would increase the stability of the component. It might not fix all the problems, but at least some of them. If you have the possibility, please also try to stress the bus by adding capacitors of various values around 1 nF and see if the communication is still stable. In my tests it sometimes was and sometimes was not.

It wasn’t intermittent at all. It was happening very regularly. And since after that fix I can’t reproduce any issues, there’s nothing more I can do unless someone else can provide a reproducible issue or other useful information.

I was using the dallas_ng driver with the occasional scratchpad error, perhaps 1 per hour. I reverted back to the standard dallas driver and confirmed that I immediately started getting a few errors every minute. Then switched over to the replacement and I haven’t seen a single error in the last 2 hours so for me it seems to be a fix. I’ll keep monitoring and report if I do see any errors.

For info I’m using an ESP32S2 with 2 sensor strings, one with 5, one with 4.

FANTASTIC!!! Glad it’s working.

@gedger Can you clarify which solution you are using? The thread is not really clear on the solution.

Directly from the ESPHome devs:
The devs have come up with a possible fix. Add this to your ESPHome file:

external_components:
  - source:
      type: git
      url: https://github.com/ssieb/esphome
      ref: ds
    components: [ dallas ]
    refresh: 1min

Thanks for clarifying. Will implement it when I get back from vacation and see how it works.

Not had a single warning now in 20 hours so the fix from the developer is solid.

If anyone else is still having issues, I’ll need to see at least logic analyzer traces of the data line.

Hello,
I still have problems with the DS18B20. Maybe I’m doing something wrong in the code; could you paste the .yaml for dallas with the modification?
Thanks

Same problem here.
Added the external_components: mod to my yaml but it does not fix the problem.

Running with 3x DS18B20 on NodeMCU without problem on same hardware.

Config ESP8266 NodeMCU V2 , 3 sensors on GPIO12 .

Reverted back to Nodemcu with MQTT…