Node-RED 19.0.2 crashes with UnhandledPromiseRejection

This is a new problem, and I'm not sure whether it is related to a specific version of NR or HA. I'm running HA 2025.2.4 and NR 19.0.2, and periodically I find flows not running; when I look, the add-on has crashed. Upon restart it picks up and runs fine for a while.

Below is the latest crash’s footprint in the log. The lines above the error are routine output from a frequently run flow that is quite chatty, so they are not unusual to see there; it is a flow I tweak frequently, so I want a lot of output.

I do not THINK that these crashes happen with the NR UI open, and as with the earlier occurrences I am unable to find any likely proximate cause; nothing is happening that appears to correlate with a crash.

How can I gather more information to find the cause?

In particular, is there a simple way to make a watchdog for NR, maybe an automation that checks every few minutes to see whether the add-on is running? (Is there a status that HA automations can “see”?)

This would both let me restart it and perhaps make it more likely that I catch the situation causing the crash.

Any ideas? Anyone else having this?

Linwood

PS. I use a fair number of function nodes, but have not done anything deeper in terms of modifying, calling into, or tweaking NR; it is vanilla and runs as an add-on to an HAOS installation of HA.

22 Feb 20:34:41 - [warn] [function:Manage Everything when Something Changes ] No zones show any need - resetting to turn off
22 Feb 20:35:20 - [warn] [function:Manage Everything when Something Changes ] diffAvg = -3.4999999999999964, We have decided we should Heat; Zones cooling = 0, Zones heating = 0
22 Feb 20:35:20 - [warn] [function:Manage Everything when Something Changes ] No zones show any need - resetting to turn off
22 Feb 20:35:45 - [warn] [function:Manage Everything when Something Changes ] diffAvg = -3.549999999999997, We have decided we should Heat; Zones cooling = 0, Zones heating = 0
22 Feb 20:35:45 - [warn] [function:Manage Everything when Something Changes ] No zones show any need - resetting to turn off
22 Feb 20:36:00 - [warn] [function:Manage Everything when Something Changes ] diffAvg = -3.5749999999999993, We have decided we should Heat; Zones cooling = 0, Zones heating = 0
22 Feb 20:36:00 - [warn] [function:Manage Everything when Something Changes ] No zones show any need - resetting to turn off
22 Feb 20:39:49 - [red] Uncaught Exception:
22 Feb 20:39:49 - [error] UnhandledPromiseRejection: This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). The promise rejected with the reason "3".
    at throwUnhandledRejectionsMode (node:internal/process/promises:392:7)
    at processPromiseRejections (node:internal/process/promises:475:17)
    at processTicksAndRejections (node:internal/process/task_queues:106:32)
[20:39:49] INFO: Service Node-RED exited with code 1 (by signal 0)
s6-rc: info: service legacy-services: stopping
s6-rc: info: service legacy-services successfully stopped
s6-rc: info: service nginx: stopping
[20:39:49] INFO: Service NGINX exited with code 0 (by signal 0)
s6-rc: info: service nginx successfully stopped
s6-rc: info: service init-nginx: stopping
s6-rc: info: service nodered: stopping
s6-rc: info: service init-nginx successfully stopped
s6-rc: info: service nodered successfully stopped

Well, I thought I at least had a workaround: I discovered there’s a “running” sensor you can enable.

Except it doesn’t work. It worked exactly once: after I stopped Node-RED as a test (and the sensor caught it), when I restarted NR the sensor kept showing not running even though it clearly was.

WTF?

Postscript: some time later, without any other changes, the sensor went back to true. Some weird delay?

My thinking was to use this as a workaround, but it relies on that sensor being correct:

alias: Announce if Node Red is stopped and restart
description: ""
triggers:
  - trigger: state
    entity_id:
      - binary_sensor.node_red_running
    from: "on"
    for:
      hours: 0
      minutes: 2
      seconds: 0
conditions: []
actions:
  - action: tts.google_say
    metadata: {}
    data:
      entity_id: media_player.all_google
      message: Node Red has Stopped
  - action: notify.send_lef_mail
    metadata: {}
    data:
      message: Node Red has stopped
  - action: hassio.addon_start
    metadata: {}
    data:
      addon: a0d7b954_nodered
mode: single

Home Assistant knows about Node-RED when it is run as an add-on via the Supervisor, and the Supervisor can track when add-ons are running or have updates available. The running state is clearly not checked every second, so yes, there is a delay.

If you want to know that Node-RED is alive and running, an easy way is to set up a sensor entity as a heartbeat.

[{"id":"f321ef2ba88a1a4e","type":"ha-sensor","z":"41cd9b273f93e865","name":"NR Heart Beat","entityConfig":"fd97b3b58cc72db2","version":0,"state":"$now('[FNn,3-3] [MNn,3-3] [D01] [Y] [H01]:[m01]:[s01] [z]')","stateType":"jsonata","attributes":[],"inputOverride":"allow","outputProperties":[{"property":"payload","propertyType":"msg","value":"","valueType":"data"}],"x":420,"y":2160,"wires":[[]],"server":""},{"id":"762c56089ec35c3c","type":"inject","z":"41cd9b273f93e865","name":"Every 10 seconds","props":[],"repeat":"10","crontab":"","once":true,"onceDelay":"10","topic":"","x":210,"y":2160,"wires":[["f321ef2ba88a1a4e"]]},{"id":"fd97b3b58cc72db2","type":"ha-entity-config","server":"","deviceConfig":"","name":"NR Heart Beat","version":"6","entityType":"sensor","haConfig":[{"property":"name","value":"NR Heart Beat"},{"property":"icon","value":""},{"property":"entity_picture","value":""},{"property":"entity_category","value":""},{"property":"device_class","value":"timestamp"},{"property":"unit_of_measurement","value":""},{"property":"state_class","value":""}],"resend":false,"debugEnabled":false}]

I have Node-RED running on my HA machine and also on independent Raspberry Pis, all of which report back every 10 seconds so I can see if and when something stops.

As far as your UnhandledPromiseRejection is concerned, Home Assistant changed the way service calls (now Actions) are run: any Action call has a return value, so anything that calls an action must set up a promise for that return and deal with it. The most (?) likely cause is ‘stuff’ that needs updating. This may be an NR node (check the palette), or it may be the Node-RED Companion.
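
As a generic illustration only (this is not from your flows; the slowOperation helper and the messages are made up), this is the shape of handling an asynchronous call inside a function node so that a failure cannot turn into an unhandled rejection:

// Hypothetical example - stands in for any call that returns a promise
// (an HTTP request, an action call, a database lookup, ...).
function slowOperation() {
    return new Promise((resolve, reject) => {
        // simulate a request that fails after one second
        setTimeout(() => reject(new Error("request timed out")), 1000);
    });
}

slowOperation()
    .then(result => {
        msg.payload = result;
        node.send(msg);   // pass the result on asynchronously
        node.done();      // tell Node-RED this message is finished
    })
    .catch(err => {
        // Without this .catch(), the rejection would have no handler and,
        // on current Node.js, would eventually stop the whole process.
        node.error("operation failed: " + err.message, msg);
        node.done();
    });

return null;  // nothing is returned synchronously; node.send() is used above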

@Biscuit thank you.

If the supervisor will eventually (as in say 30 minutes or less) catch this, that’s probably adequate. Though thank you for the sensor technique, I may change to that if this seems more delayed.

As to:

Let me make sure I understand… the promise stuff is inside these items, not something in how I might use them, right?

All in my palette are up to date, but this could be a relatively unmaintained contributed item that has not been updated.

Is there anything I can look for in the log that might point to a specific contributed node? I have quite a lot of them (most probably are not in use, but I am always afraid to remove them). If they don’t say “in use” and offer a remove option, can I safely delete them? I.e. is “in use” reliable?

With respect to the NR developers, it sure would be nice if a node’s failure to behave correctly didn’t crash the whole subsystem. Couldn’t they have made it a prominent warning, since it was a breaking change and one they clearly do catch?

I am certainly no expert on JavaScript, so I am sure someone else can explain it much better.

However:

When something gets done, like an HTTP call, there is often a wait before the response comes back. Rather than everything waiting, the code is usually written to issue a promise (that the result will be returned at some point), allowing other things to carry on working.

When the response to the original request comes back, the owning code is called to run again, and it must either run OK and fulfil the promise (return something), or, if something has failed (e.g. the HTTP request times out), trap this “error” and reject the promise. The outcome of a promise must be either fulfilment or rejection.

If the code as written does not catch the “error” and execute the promise rejection correctly (to deal with it), then the error is uncaught and passes upwards through the layers of code. If nothing deals with it, eventually Node-RED passes this error to the JavaScript runtime and the runtime stops.
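
A minimal, generic sketch of the difference (plain Node.js, nothing to do with any particular node; the 3 just mirrors the reason in your log):

// A rejection nobody handles: on current Node.js this is raised as a
// fatal error, which is what takes the whole of Node-RED down.
Promise.reject(3);

// The same rejection, handled: the .catch() deals with the failure and
// the process carries on.
Promise.reject(3).catch(reason => {
    console.log("request failed, reason:", reason);
});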

Finding the code that is causing this is the hard part. As you say that this error is ‘new’, it is most likely due to code that worked under Node.js v18 but not v22. There have been changes to the way Home Assistant works, and the WebSocket nodes have changed to deal with Action call returns. Node-RED has also moved towards more asynchronous processing, which requires promises.

Exactly - so turning stuff off (disabling nodes) and seeing if the problem goes away might be the only way of finding this.

Looking in the logs may not show anything of use, since this is probably happening after a node has done something and only when the action returns a result (or, more likely, times out). Hence having a more precise time of failure may help you look back to see what was doing stuff in the minute or two before the error. I would focus on API calls, HTTP reads, file processing… anything that takes time between request and response.

Any contributed node where the code has not been updated for, say, four years, and where there is a small user base, would be a candidate.

Nodes not in use are just not used, so won’t be a cause. In general it is probably a good idea to remove unwanted nodes, but clearly only if they are ‘not in use’, which I believe works correctly.

Node-RED is a complicated program, and it runs on JavaScript. The execution part - the bit that runs the nodes - has to work on the assumption that the nodes follow the rules. The rule is: if you issue a promise, you must either fulfil it or reject it, and a rejection must be handled. It is not the job of Node-RED to trap every ‘unexpected error’ and deal with it, and since this is a JavaScript code issue, Node-RED probably can’t do much about it.

It would be nice if everyone wrote perfect code. Maybe AI will achieve that. In time.

Edit:

Home Assistant is, of course, going to be a possible cause, and again it is necessary to check things like the Node-RED companion version, and the WebSocket nodes.

@Biscuit thank you.

My problem is that this crash is quite rare, maybe once every week or two. Unfortunately I have been trusting my HVAC system to Node-RED (long story of incompetent installers), so I woke up cold the last time, when it stopped in the late evening.

But being rare means it’s hard to disable something and see if the problem goes away (how long do you wait?), and it’s doubly hard if a given flow has several contributed nodes.

And of course this could be in core Node-RED, just in a lightly used path, or something that fails only in niche cases.

I understand your points on programming, but I do think this breaking change could have been implemented so that its footprints were detectable in failures: some sort of promise stack whereby unanswered cases were reported as the cause during a crash, or some mechanism to set up a “trap” for such cases so I can find them. It’s not like every contributed set of nodes is used in isolation. But all that’s moot: I didn’t write it, and I can’t fix it.

I’ll try to find proximate activity with each crash, something to indicate which flow(s) were active before it crashed.

A possible solution:

So it appears that Node-RED trapped these errors as ‘warn’ only up to v15. If you had been running on v14, NR would not fall over; it would just warn about the problem.

This was changed to ‘throw’ from v16 onwards. If you could go back to Node-RED v14 or so, the error would not be fatal. However, I very much suspect that going that far back is no longer possible.

You might gain better traction on your issue by posting on the Node-RED forum directly.

Good luck

You can capture more details about the rejection by adding this snippet to your Node-RED settings.js file:

// Log the offending promise and its rejection reason. With a listener
// registered, Node.js reports the rejection here instead of raising it
// as a fatal uncaught exception.
process.on('unhandledRejection', (reason, p) => {
  console.log('Unhandled Rejection: ', p, 'reason:', reason);
});

The promise rejected with the reason “3”.

From my experience, this error is thrown by the home-assistant-js-websocket library used by the Home Assistant nodes. The number 3 corresponds to a connection lost state.
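
For reference, those codes are plain numeric constants in that library; I am quoting the first few from memory, so please verify them against the library’s source:

// Error codes defined by home-assistant-js-websocket (from memory - check
// the library's errors module to confirm the values):
const ERR_CANNOT_CONNECT  = 1;
const ERR_INVALID_AUTH    = 2;
const ERR_CONNECTION_LOST = 3;   // the "reason 3" in the crash log above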

Here’s the issue I reported:
https://github.com/home-assistant/home-assistant-js-websocket/issues/533

I can’t say for sure if you’re facing the same issue, but this is the only pattern I’ve found when encountering this type of promise rejection.


Thank you @Kermit for the logging change, and also for this.

Am I reading this correctly: if I restart HA (from the Developer Tools restart option, not a full reboot), Node-RED might still be running (or start more quickly), and if a flow kicks off with unfortunate timing, this might then occur?

I have one flow that runs a LOT (triggered by basically any temperature change anywhere in the house), plus I have been making a lot of changes to other aspects of Home Assistant (adding Z-Wave nodes mostly, changing them around, etc.). So I do restart it somewhat frequently.

So this might be a bug, not an unmaintained contributed node?

I’ll add the logging change and see if I can get more info the next time it happens.

Yeah… it’s dying when I restart Home Assistant to make various YAML changes active.

And the automation I set up won’t fire, probably because it doesn’t see a change from running to not running, so tomorrow I’m going to do something like you did, @Biscuit. I just haven’t yet, but I almost went to bed with it off again. That would have been a cold morning.

Do you have automations that check this and deal with race conditions at startup (e.g. the automation checking during a gap before NR has actually had a chance to update the sensor)?

I was trying to figure it out and it’s getting kludgy: checking every X minutes, but within the check waiting long enough for NR to update before it actually looks, plus having to handle the sensor being unavailable (as it appears to be if HA restarts and NR hasn’t updated yet)…

I monitor several critical NR flows, and my four Node-RED instances, on a 10, 20 or 30 second cycle. This provides visual feedback and post-failure diagnosis of when something stopped. I have too few failures to need an automated restart, and I prefer manual intervention.

Cross-machine monitoring lets NR email or text me if no heartbeat is seen from another machine for 5 to 7 minutes.

Kermit has a great ‘HA has restarted’ trigger for NR that deals with the complications of waiting for HA, or for both HA and NR, to restart in sequence.

https://zachowj.github.io/node-red-contrib-home-assistant-websocket/cookbook/starting-flow-after-home-assistant-restart.html

I confess I still don’t fully understand it, but it works. Otherwise I just wait 5 to 6 minutes, which is about the maximum time for a full machine / HA / Node-RED add-on reboot.

I use MQTT to tell whether NR is running. Add the following to configuration.yaml:

mqtt:
  binary_sensor:
    - name: "Nodered running"
      state_topic: "NR/monitor"
      payload_on: "ON"
      off_delay: 65
      device_class: connectivity
      value_template: "{{ value_json.state }}"
      unique_id: 953d0088-a6fb-4a43-a01e-889e23fbe8ee

Add the following to NR


[{"id":"0448970314e21ba0","type":"mqtt out","z":"82d58e9221af0153","name":"NR monitor","topic":"NR/monitor","qos":"","retain":"","respTopic":"","contentType":"","userProps":"","correl":"","expiry":"","broker":"601bef1.d5b981","x":710,"y":60,"wires":[]},{"id":"10660dfd8e7f68a4","type":"inject","z":"82d58e9221af0153","name":"","props":[{"p":"payload"},{"p":"topic","vt":"str"}],"repeat":"60","crontab":"","once":false,"onceDelay":0.1,"topic":"","payload":"{\"state\": \"ON\"}","payloadType":"json","x":520,"y":60,"wires":[["0448970314e21ba0"]]},{"id":"601bef1.d5b981","type":"mqtt-broker","name":"Mosquitto","broker":"127.0.0.1","port":"1883","clientid":"","autoConnect":true,"usetls":false,"protocolVersion":4,"keepalive":"60","cleansession":true,"autoUnsubscribe":true,"birthTopic":"","birthQos":"0","birthPayload":"","birthMsg":{},"closeTopic":"","closeQos":"0","closePayload":"","closeMsg":{},"willTopic":"","willQos":"0","willPayload":"","willMsg":{},"userProps":"","sessionExpiry":""}]

The inject fires every minute, telling the MQTT sensor that it is connected. The off_delay key waits 65 seconds for another message; if one doesn’t arrive in that time frame, the sensor is set to off.

The MQTT approach was perfect, since it automatically changes state after a delay I can specify. I just put a check for any bad condition after 3 minutes and figure that gives everything plenty of time to restart. Thank you.