Node-RED Subflow Hogging CPU after a few days running

SebaVT · December 2, 2024, 5:17pm

Hi everyone,

I’m having trouble with a Node-RED subflow I created to control my lights based on a circadian rhythm. The subflow is designed to update the lights every 30 seconds, adjusting brightness and color temperature throughout the day. It has conditions to stop adjustments if I manually change the lights, and it also allows me to set minimum/maximum brightness and color temperature.

To make this more flexible, I designed it so that any number of light entities can be connected to the subflow’s input, and it stores these entities for ongoing updates. It also includes a virtual light entity that represents the subflow’s current status, showing the average brightness and color temperature of all the connected lights.

To avoid infinite loops when updating the virtual entity and the real lights, I implemented a system of “busy” flags. I also have a lot of function nodes with asynchronous outputs (using node.send) that I make sure to terminate properly (using node.done).

The flow works fine initially, but after a few days, Node-RED starts using a lot of CPU, and the subflow seems to get stuck in a loop, constantly sending update messages to the lights. The only way to fix it is to restart Node-RED. I’ve monitored the RAM usage, and it remains stable, so I don’t think it’s a memory leak.

Here’s my setup:

Home Assistant version: 2024.11.4
Node-RED version: 18.1.1
Installation method: HA OS
Hardware: HP EliteDesk 800 G4, Intel Core i5-8500, 16 GB RAM DDR4, 512 GB SSD

Here’s my Node-RED flow:
https://github.com/Seba-VT/SVT-Light-Controller/blob/main/SVT-Light-Controller

I have a few questions:

Are there any common reasons why a Node-RED flow might start consuming excessive CPU after running for a few days?
Could the interaction between the virtual light entity and the real light entities be contributing to this problem, even with the ‘busy’ flags?
Is it possible that my use of asynchronous outputs in the function nodes is somehow related to the CPU usage spike?

If anyone has experience with similar issues or has any suggestions on how to debug this further, I would greatly appreciate your input!

SebaVT · December 2, 2024, 5:19pm

I tried pasting the flow here, but apparently it is too big.

WallyR · December 2, 2024, 5:32pm

If you have loops, make sure to clean up the messages so you do not get a “memory leak”, where the messages just keep growing with each iteration.
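A minimal plain-JavaScript sketch of what WallyR describes, assuming a loop that re-uses one `msg` object and appends to it on every pass (the `history` property and both function names are hypothetical). Building a fresh message on each iteration keeps its size constant, and the old one can be garbage-collected.

```javascript
// Hypothetical loop that re-uses one msg and appends to it on every pass
function leakyLoop(iterations) {
  const msg = { payload: 0, history: [] };
  for (let i = 0; i < iterations; i++) {
    msg.payload = i;
    msg.history.push(i); // grows on every pass and is never cleared
  }
  return msg;
}

// Same loop, but each iteration builds a fresh message carrying only what it needs
function cleanLoop(iterations) {
  let msg = { payload: 0 };
  for (let i = 0; i < iterations; i++) {
    msg = { payload: i }; // the previous message is no longer referenced
  }
  return msg;
}

console.log(leakyLoop(1000).history.length); // 1000
console.log(cleanLoop(1000));                // { payload: 999 }
```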

SebaVT · December 2, 2024, 5:43pm

Thanks for the reply. I made sure that every message comes to an end and that a fresh message is created on each iteration. I’ve been monitoring the memory usage and, as far as I can tell, there is no memory leak: the garbage collector seems to be working fine, memory is “cleaned” regularly, and usage does not increase over time.

Biscuit · December 2, 2024, 5:45pm

There is indeed a posting limit, and it is not difficult to exceed it with a Node-RED flow. One answer is to create a GitHub account, open a repository, and upload the exported flow.json file there.

https://github.com/oatybiscuit/Node-RED-HA-Octopus-Agile-JSONata

SebaVT · December 2, 2024, 6:36pm

Thanks, I have updated the post with the addition of a link to the repository.

Biscuit · December 3, 2024, 11:20am

Ouch.

I think it has taken me over one hour just to go through the code and update the HA server in the config nodes.

As far as I know, Node-RED runtime (the back end that does the work, not the editor) is quite robust. I have several flows running sub-minute in loops for months on end. Like any running system, the two main killers will be memory leaks, and loops that never terminate. If you are running your own ‘queue of work’ inside this flow, then generating more work than is being consumed will give the CPU increasing work to do.
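To make the ‘queue of work’ point concrete, here is a sketch (not the author's code; the class names, `simulate`, and the entity IDs are hypothetical) comparing an append-only queue with one keyed by `entity_id`. With 8 lights producing 8 updates per tick and only 5 processed per tick, the append-only backlog grows by 3 every tick, while the keyed queue never holds more than one pending update per light because a newer update overwrites the older one.

```javascript
// Append-only queue: every update is kept, so backlog = produced - consumed
class AppendQueue {
  constructor() { this.items = []; }
  enqueue(update) { this.items.push(update); }
  drain(max) { return this.items.splice(0, max); }
  get size() { return this.items.length; }
}

// Coalescing queue: at most one pending update per entity (latest value wins)
class CoalescingQueue {
  constructor() { this.pending = new Map(); }
  enqueue(update) {
    // Map.set on an existing key replaces the value but keeps its place in line
    this.pending.set(update.entity_id, update);
  }
  drain(max) {
    const out = [];
    for (const [id, update] of this.pending) {
      if (out.length >= max) break;
      out.push(update);
      this.pending.delete(id); // deleting the current entry while iterating is safe
    }
    return out;
  }
  get size() { return this.pending.size; }
}

// Each tick: every light produces one update, the consumer handles `perTick` of them
function simulate(queue, ticks, lights, perTick) {
  for (let t = 0; t < ticks; t++) {
    for (let i = 0; i < lights; i++) {
      queue.enqueue({ entity_id: `light.hypothetical_${i}`, brightness: t });
    }
    queue.drain(perTick);
  }
  return queue.size;
}

console.log(simulate(new AppendQueue(), 100, 8, 5));     // 300
console.log(simulate(new CoalescingQueue(), 100, 8, 5)); // 3
```

Dropping superseded updates is only acceptable because, for a light, only the latest target matters; a queue of distinct commands could not be coalesced this way.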

The code is (in my opinion) excessive and messy, and I think that there are several opportunities for problems. However, for myself I would personally go back to the drawing board and review just what is going on here. In any system, there is data and there are transactions that are applied to the data. Where there are multiple copies of the data, it is a very good idea to have one as the master copy, and to ensure that there are no update-loops.

Home Assistant is a state machine - it keeps a copy of the state of something in the real world, and this copy is used to report and drive automations. You appear to have at least one copy of the copy in there somewhere. Of all of these copies of the ‘light-state’, which copy is the master? If you have an update flow that means if copy A is updated then update copy B, this is fine. If you then have an additional pathway that means if copy B is updated, update copy A, then there is the opportunity for a feedback loop.
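A small sketch of the A → B → A pathway, with two plain objects standing in for the Home Assistant entity and the flow's own copy (all names are hypothetical). If each side mirrors every update to the other unconditionally, the messages never stop; mirroring only when the value actually changed lets the loop die out after one round trip. The `maxHops` cap exists only so the unconditional version terminates in this demo.

```javascript
// Propagate one external change between two copies; returns how many updates were sent
function runSync(onlyOnChange, maxHops = 1000) {
  const copies = { A: { value: 0 }, B: { value: 0 } };
  const queue = [{ target: "A", value: 50 }]; // one external change to copy A
  let sent = 0;
  while (queue.length > 0 && sent < maxHops) {
    const { target, value } = queue.shift();
    sent++;
    const copy = copies[target];
    const changed = copy.value !== value;
    copy.value = value;
    if (onlyOnChange && !changed) continue; // nothing new, so do not echo it back
    const other = target === "A" ? "B" : "A";
    queue.push({ target: other, value }); // mirror the update to the other copy
  }
  return sent;
}

console.log(runSync(true));  // 3: external -> A, A -> B, B -> A (no change, stops)
console.log(runSync(false)); // 1000: only stopped by the safety cap
```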

My (first) guess is that you have a feedback loop at the data-logic level. You may also have a feedback loop at the practical code level. You are holding 8 entities inside the flow, and these all have several flow nodes connected to them, so there is plenty of opportunity for two different in-flight messages to be trying to update the same entity, and for many other in-flight messages to be influenced by the change elsewhere.

I defer this question to anyone else who understands asynchronous coding. However, I can’t see how

node.send(msg)
node.done()

is any different to

return msg

Node-RED executes flows asynchronously anyway, so how do we know that your busy locking flag gets set at the right time? Locking is an interesting subject, particularly as it does not always work due to timing issues (and your flow is doing so much for so long that there must be plenty of time for something to get out of sync).
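A deterministic sketch of how a busy flag can let two messages through. Since Node-RED 1.0, messages passed between nodes are delivered asynchronously, so two messages can both pass the ‘is it busy?’ check before either reaches the node that sets the flag. JavaScript runs one function at a time, so a check and a set inside the same function node cannot be interleaved; the gap appears when they sit in different nodes or straddle an asynchronous call. The tiny scheduler below is a hypothetical stand-in for message delivery, not Node-RED's actual implementation, and the flag is never cleared, to keep the demo short.

```javascript
// FIFO task list standing in for asynchronous delivery of messages between nodes
function makeScheduler() {
  const tasks = [];
  return {
    schedule(fn) { tasks.push(fn); },
    run() { while (tasks.length > 0) tasks.shift()(); },
  };
}

// Version 1: the check and the set live in different "nodes"
function checkThenSetLater(messageCount) {
  const sched = makeScheduler();
  const ctx = { busy: false };
  let updates = 0;
  for (let i = 0; i < messageCount; i++) {
    sched.schedule(() => {          // "check" node
      if (ctx.busy) return;         // drop the message if busy
      sched.schedule(() => {        // downstream "set flag and update" node
        ctx.busy = true;
        updates++;                  // an update sent to the light
      });
    });
  }
  sched.run();
  return updates;
}

// Version 2: check and set happen in the same synchronous step
function checkAndSetTogether(messageCount) {
  const sched = makeScheduler();
  const ctx = { busy: false };
  let updates = 0;
  for (let i = 0; i < messageCount; i++) {
    sched.schedule(() => {
      if (ctx.busy) return;
      ctx.busy = true;              // claimed before any other message can run
      sched.schedule(() => { updates++; });
    });
  }
  sched.run();
  return updates;
}

console.log(checkThenSetLater(2));   // 2: both messages got past the flag
console.log(checkAndSetTogether(2)); // 1
```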

My, somewhat random thoughts, are:

why are you using a subflow for this? Subflows are OK but they are very difficult to debug and work with. In my world, a subflow is a ‘pure function’ where it takes an input, does stuff to it, and potentially passes something back. Pure functions never ever change state values, the main program holds the data, not the subflow.
you appear to have multiple processing points. Keeping to one ‘read point’ and one ‘write point’ removes the opportunity for two messages to be running and working on the same thing at the same time. Also avoids the need for ‘busy flags’.
you are making use of flow context a lot, and reading several variables. It is possible to read and write several in one go, and there is nothing wrong with using an object to hold several things. If you save the ‘virtual state’ as an object

flow.set("data",
    {"brightnessTolerance": 15, "kelvinTolerance": 500, "brightnessMaxStep": 4}
);

then you can read and write just the fields you need using, e.g., “data.kelvinTolerance”. I personally use Change nodes to work with context so I can more easily see where I am reading and writing.
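One way to read the ‘pure function’ and ‘one read point, one write point’ suggestions, sketched in plain JavaScript: a function that takes the current state and an incoming event and returns a new state plus the commands to send, without touching any context itself, while the caller does the single read and the single write. The state shape, the event fields, `computeStep`, `handleEvent`, and the `store` Map (a hypothetical stand-in for flow context) are all assumptions; only the `kelvinTolerance` name comes from the `data` object above.

```javascript
// Pure step: same inputs always give the same outputs, and the inputs are not modified
function computeStep(state, event) {
  // Record the reported value in a new lights object; the old state stays untouched
  const lights = { ...state.lights, [event.entity_id]: event.kelvin };
  const commands = [];
  if (Math.abs(event.kelvin - state.targetKelvin) > state.data.kelvinTolerance) {
    commands.push({ entity_id: event.entity_id, kelvin: state.targetKelvin });
  }
  return { state: { ...state, lights }, commands };
}

// Hypothetical stand-in for flow context
const store = new Map([["state", {
  targetKelvin: 3000,
  data: { kelvinTolerance: 500 },
  lights: {},
}]]);

function handleEvent(event) {
  const before = store.get("state");                   // the only read
  const { state, commands } = computeStep(before, event);
  store.set("state", state);                           // the only write
  return commands;
}

// 800 K away from the target: one command to set the lamp to 3000 K
console.log(handleEvent({ entity_id: "light.hypothetical_lamp", kelvin: 2200 }));
// 100 K away, inside the 500 K tolerance: no commands
console.log(handleEvent({ entity_id: "light.hypothetical_lamp", kelvin: 2900 }));
```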

On the assumption that this code is all about gradually changing the light colour/intensity over time, I wonder if you have hysteresis in there too somewhere?

Light temperature increases by 1 degree
→ recalculate 1 degree down
→ update light temperature
→ recalculate 1 degree up
repeat ad nauseam
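A sketch of how chatter like this can come from rounding alone, assuming (hypothetically) that the light stores colour temperature in whole mireds (1,000,000 / kelvin) while the flow works in kelvin. A 2700 K request comes back as 2703 K, so an exact-equality check resends on every 30-second tick forever; a tolerance (deadband) such as the `kelvinTolerance: 500` in the `data` object above stops it. `reportedKelvin` and `countUpdates` are illustrative names only.

```javascript
// Hypothetical light that stores colour temperature in whole mireds
function reportedKelvin(requestedKelvin) {
  const mireds = Math.round(1e6 / requestedKelvin);
  return Math.round(1e6 / mireds);
}

// Count the updates the flow sends over a number of ticks
function countUpdates(targetKelvin, tolerance, ticks) {
  let current = reportedKelvin(targetKelvin); // the light after the first command
  let updates = 1;
  for (let t = 0; t < ticks; t++) {
    if (Math.abs(current - targetKelvin) > tolerance) {
      current = reportedKelvin(targetKelvin); // resend; the light rounds the same way again
      updates++;
    }
  }
  return updates;
}

console.log(reportedKelvin(2700));         // 2703
console.log(countUpdates(2700, 0, 100));   // 101: a resend on every tick
console.log(countUpdates(2700, 500, 100)); // 1: the first command only
```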

As it is a subflow, this is very difficult to debug, so you may wish to consider lifting it into a regular flow and adding some form of logging. Logic/timing issues can be difficult to locate, as they are only reproducible under a specific set of circumstances.
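For the logging, one cheap option is to count outgoing updates per entity in a sliding time window and flag any light that receives far more updates than a 30-second cycle should produce. A minimal sketch; `makeLoopDetector`, the 60-second window, and the threshold of 10 are hypothetical choices, and inside a Node-RED function node the warning would typically be raised with `node.warn()` rather than returned.

```javascript
// Returns a recorder that logs an update and reports whether the entity looks stuck in a loop
function makeLoopDetector(windowMs, maxUpdates) {
  const history = new Map(); // entity_id -> timestamps of recent updates
  return function record(entityId, now = Date.now()) {
    // Keep only timestamps inside the window, so the map cannot grow without bound
    const recent = (history.get(entityId) || []).filter((t) => now - t < windowMs);
    recent.push(now);
    history.set(entityId, recent);
    return recent.length > maxUpdates; // true = suspicious
  };
}

// With a 30 s cycle, roughly 2 updates per minute are expected per light
const looksStuck = makeLoopDetector(60_000, 10);
let flagged = false;
for (let i = 0; i < 20; i++) {
  // hypothetical runaway: one update every 100 ms
  flagged = looksStuck("light.hypothetical_desk", i * 100) || flagged;
}
console.log(flagged);                                 // true
console.log(looksStuck("light.hypothetical_hall", 0)); // false
```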

Good luck!

SebaVT · December 3, 2024, 9:35pm

@Biscuit Thanks for taking the time to analyze this, I really appreciate it.

Sorry for that, I’m not a programmer, just a hobbyist. I’m trying my best to be organized with my code and flows.

This is my second attempt to achieve this, after erasing my drawing board and starting again. My first attempt worked very well, but it was much messier and less optimized. I was just starting with Home Assistant and Node-RED at that time, so there was a lot of room for improvement. One of the biggest improvements I wanted was to eliminate the need for several Home Assistant entities like flags, status, and select entities. Now, these are automatically handled using the entities within the subflow. Previously, I needed many steps to get my lights to work as desired. Now, with this subflow, I simply connect the “event: state” node of each light to the subflow input (I have around 20 instances of this subflow).

I created a flow variable called entityMessages, which is essentially an array containing each light entity. This variable allows the subflow to know which entities to update and stores their current status. This approach avoids repeatedly using the “Current State” node to read entity statuses, reducing data retrieval from Home Assistant. The entityMessages variable is only modified when a light entity is updated. The subflow updates this variable whenever an “event: state” node sends a message to the subflow input. Therefore, the entityMessages variable serves as my master copy of the entity states.
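Since `entityMessages` is the master copy, it is worth checking that it cannot grow on its own. The flow's actual implementation is not shown here, so this is only a hypothetical sketch: if every “event: state” message were pushed onto the array, the array (and any loop over it every 30 seconds) would grow with every state change, whereas replacing the existing entry for an `entity_id` keeps one entry per light. The `msg.data.entity_id` field loosely follows what the Home Assistant state node emits, but treat the exact message shape, and both function names, as assumptions.

```javascript
// Risky pattern: appends a new entry on every state event
function addByPush(entityMessages, msg) {
  entityMessages.push(msg);
  return entityMessages;
}

// Upsert: replace the existing entry for this entity, or append it if it is new
function addByUpsert(entityMessages, msg) {
  const id = msg.data.entity_id;
  const index = entityMessages.findIndex((m) => m.data.entity_id === id);
  if (index === -1) {
    entityMessages.push(msg);
  } else {
    entityMessages[index] = msg;
  }
  return entityMessages;
}

// 3 lights, 50 state events each
const pushed = [];
const upserted = [];
for (let i = 0; i < 150; i++) {
  const msg = { data: { entity_id: `light.hypothetical_${i % 3}`, new_state: { state: "on" } } };
  addByPush(pushed, msg);
  addByUpsert(upserted, msg);
}
console.log(pushed.length);   // 150
console.log(upserted.length); // 3
```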

This is exactly what I think my problem is. I can’t find a reliable way to control this situation. I believe something in my flow is growing and taking more and more processing time, which throws off my timing completely. Initially, I suspected a memory leak, and with your previous help, I identified and fixed some memory issues.

The big difference is that node.send allows you to continue running code after sending a message, and even use it multiple times within the function. Conversely, return msg ends the function, preventing any further code execution within it.
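A sketch of both styles, runnable outside Node-RED by passing in a stand-in for the function node's `node` object (`makeFakeNode`, `withNodeSend`, and `withReturn` are hypothetical; in a real function node `node` is provided for you). Both emit the same two messages: in a function node, returning an array of messages inside the outputs array, as in `return [[m1, m2]]`, sends several messages from a single output.

```javascript
// Stand-in for the `node` object that Node-RED gives a function node
function makeFakeNode() {
  return {
    sent: [],
    doneCalled: false,
    send(msg) { this.sent.push(msg); },
    done() { this.doneCalled = true; },
  };
}

// Style 1: node.send() per light, code keeps running after each send, then node.done()
function withNodeSend(msg, node) {
  for (const entityId of msg.entities) {
    node.send({ payload: { entity_id: entityId, brightness: msg.brightness } });
  }
  node.done();
  return null; // nothing extra to send via return
}

// Style 2: build all messages first, then return them in one go
function withReturn(msg) {
  const out = msg.entities.map((entityId) => ({
    payload: { entity_id: entityId, brightness: msg.brightness },
  }));
  return [out]; // one output, several messages
}

const input = { entities: ["light.hypothetical_a", "light.hypothetical_b"], brightness: 80 };
const node = makeFakeNode();
withNodeSend(input, node);
console.log(node.sent.length, node.doneCalled); // 2 true
console.log(withReturn(input)[0].length);        // 2
```

`node.send()` earns its keep when messages go out later, for example from inside a callback or timer, after the function body itself has already finished.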

Do you have any resources that I can look into on this topic? It has been difficult for me to find any since I’m not familiar with the terminology.

To facilitate the repeatability of the implementation and to have a single point for modifications.

My previous solution was something like that, and it worked fine. I’ll try this again.

Can this approach improve the overall performance, or is it just for convenience of use?

I don’t use hysteresis here, and I don’t see why I should. I created a tolerance in case a manual change exceeds it, which stops the circadian cycle. This also acts as a buffer to prevent the cycle from stopping when the physical light and Home Assistant report slightly different values due to resolution differences.
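A sketch of one way to make the manual-change check robust, assuming (this is not confirmed by the flow itself) that the subflow remembers the last values it commanded for each light: compare an incoming state report with that last command rather than with the current circadian target, so the cycle's own gradual steps are never mistaken for a manual change. The tolerances reuse the `brightnessTolerance: 15` and `kelvinTolerance: 500` values from Biscuit's example object, on whatever brightness scale the flow uses; `isManualChange` is a hypothetical name.

```javascript
const TOLERANCE = { brightness: 15, kelvin: 500 };

// lastCommand: what the flow last sent to this light
// reported: what Home Assistant now reports for it
function isManualChange(lastCommand, reported, tolerance = TOLERANCE) {
  if (!lastCommand) return false; // nothing sent yet, nothing to compare against
  const brightnessDiff = Math.abs(reported.brightness - lastCommand.brightness);
  const kelvinDiff = Math.abs(reported.kelvin - lastCommand.kelvin);
  return brightnessDiff > tolerance.brightness || kelvinDiff > tolerance.kelvin;
}

const last = { brightness: 120, kelvin: 2700 };
console.log(isManualChange(last, { brightness: 122, kelvin: 2703 })); // false: rounding only
console.log(isManualChange(last, { brightness: 255, kelvin: 2703 })); // true: someone turned it up
```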

Thanks again for taking the time to look into this. It is greatly appreciated, and I have a couple of ideas, thanks to your answer, that I think will improve my subflow. I’ll start changing my current ‘parallel’ approach to a ‘serial’ one.

Biscuit · December 3, 2024, 10:48pm

Nothing to be sorry about. It is your code and you are entitled to do with it exactly as you please.

Node-RED is a bit of a niche subject around here, and perhaps the Node-RED forum is a better place to post your questions (although I think it would be appreciated if you divide the problem and questions into smaller parts). Your post raises a number of more challenging questions, and I certainly don’t have definitive answers, although I hope that my comments do help. I am also keen to learn myself.

I have a few significant flows now, and I am on my third iteration of tidy and improve on several of them. One flow required a bit of work as I had a serious memory leak, but otherwise I find it useful to get something working, walk away, and come back much later to review and refactor. For myself, I would want to review the high level design of the data structure and the update paths and how it all hangs together, and (again for myself) I would want to review the concepts of parallel processing and locking, but clearly I am coming at this from a different starting point and perspective!

Parallel processing is a bit of a nightmare, and I would go as far as to say it should be avoided. A single stream of processing carried out in sequence can be understood, debugged, and is predictable. Parallel processing introduces many problems and challenges.
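A minimal, synchronous sketch of ‘a single stream of processing carried out in sequence’ in plain JavaScript: every message goes into one queue and is handled to completion before the next one starts, and anything a handler wants to send on is appended to the same queue instead of being processed in the middle of the current message. `SerialProcessor` and `demo` are hypothetical names; within Node-RED, a comparable effect comes from funnelling all updates through a single node.

```javascript
// Handles one message at a time; work created while handling is queued, not nested
class SerialProcessor {
  constructor(handler) {
    this.handler = handler; // (msg, enqueue) => void
    this.queue = [];
    this.running = false;
  }

  enqueue(msg) {
    this.queue.push(msg);
    if (this.running) return; // already draining; this message waits its turn
    this.running = true;
    try {
      while (this.queue.length > 0) {
        this.handler(this.queue.shift(), (m) => this.enqueue(m));
      }
    } finally {
      this.running = false;
    }
  }
}

function demo(useQueue) {
  const log = [];
  let send;
  const handler = (msg, next) => {
    log.push(`start ${msg.id}`);
    if (msg.id === "a") next({ id: "a2" }); // e.g. an update that triggers another update
    log.push(`end ${msg.id}`);
  };
  if (useQueue) {
    const proc = new SerialProcessor(handler);
    send = (m) => proc.enqueue(m);
  } else {
    send = (m) => handler(m, send); // direct call: the follow-up runs inside "a"
  }
  send({ id: "a" });
  return log;
}

console.log(demo(false)); // [ 'start a', 'start a2', 'end a2', 'end a' ]
console.log(demo(true));  // [ 'start a', 'end a', 'start a2', 'end a2' ]
```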

I have not used parallel processing or locking myself, so I am really not in a position to comment. I am not sure about resources on the subject either - it has been too long since I did the theory and associated programming for a living.
My thoughts were to identify all opportunities for timing issues, hence the random suggestions.

Subflows are tricky as they are difficult to debug, and hold their own environment. As you started the discussion on the basis of debugging, I would still suggest that working at the top level would be easier than from within a subflow. I wrote a complicated subflow but gave up on it eventually as it was like doing key-hole surgery to find out what was going on.

Context is good, but I think it has limitations. Holding everything in one object and reading/writing it with a Change node ensures that this happens in a guaranteed sequence as part of the flow (after the prior node, before the next node).

I kind of understand node.send() and node.done(), but in the places where I saw you had them together, with nothing in between or after, I assumed that return msg would do. Again, this is just trying to identify any potential issue and to remove it as part of a general sweep.

As for hysteresis, it was just another thought regarding potential causes of ‘chatter’ and I just wondered if (I have not looked in depth at the code) the calculations were rounding up and down alternately so as to continually loop.

Enjoy!