I was sufficiently interested in these postings to go away for a few weeks (months…) and try to find some answers, mainly because I don’t want to get stuck like this myself, and because there seems to be very little information to help when NR falls over like this, at least here in the HA forum.
I run a number of Node-RED flows to monitor and control my Solar PV and battery system, and I am keen to prevent NR memory issues. I have had NR shut down and restart in the past, and traced this back to some lazy coding on my part. I had set up a big loop to repeatedly fire commands at my Pylontech battery console, using split nodes to break arrays of commands into parts. It all worked, but because I did not bother to join the splits back up at the end of each loop, the message got bigger and bigger until NR ran out of memory (about every 30 days or so). Simply deleting the ‘msg.parts’ property did not fix it, and I had to rewrite my code to join each split back up when I had done with it, before going back round the outer loop again.
Starting from the top:
Node-RED program flows run in Node-RED, which itself runs on Node.JS (hence the ‘node’ in Node-RED). Node is a JavaScript runtime, designed to run JavaScript ‘server-side’, and it has been built around the V8 JavaScript engine that was developed for use in Google Chrome.
V8 is embedded into Node and does all the JS work, including the memory management. Like many ‘interpreted’ languages, V8 uses a ‘heap’ for dynamic memory allocation, and has a garbage collection (GC) system to recover memory that is no longer required.
In an HA environment with NR as an addon, the NR addon release is based on a version of Node which in turn includes a version of V8. NR is run in a Docker container, using the base Linux operating system.
How does V8 memory and garbage collection work?
This is the long and complicated bit!
For an overview, the following has most of the details. https://v8.dev/blog/trash-talk
I have also found the following informative. https://jayconrod.com/posts/55/a-tour-of-v8-garbage-collection
JavaScript started life as a simple interpreted language (script) to run just on client web-browsers. To this end it uses self-managed memory with a garbage collector. Languages such as C and C++, by contrast, require the programmer to allocate and de-allocate (recover) memory. In a compiled language, the compiler can typically see all the required variables, their size and enclosing scope up-front. The compiler allocates space for these variables at compile time, and the execution code only has to deal with dynamic-length variables such as strings and variable-size arrays. These all go on a ‘heap’, entirely managed by the programmer.
JavaScript has moved on considerably over the past two decades, and engines such as V8 now use a just-in-time compile process, where JS is compiled to bytecode (and hot code on to optimised machine code) prior to execution. However JS still frees the programmer from having to allocate and de-allocate memory. V8 has changed over the years from a basic ‘stop everything’ GC, and now runs a sophisticated generational garbage collection system, mostly so as to improve real-time performance in high-use web pages.
V8-controlled memory sits within the overall Node RSS (resident set size), which covers the code (Node itself, Node-RED, and the flows being executed), the stack, and the heap. The stack is a last-in, first-out memory space (like a stack of plates) used to hold function calls. Every time a JS function is called, V8 pushes the function details onto the (top of the) stack; when the function ends (returns), those details are popped off again. The details for each function on the stack include all its primitive variables, as well as pointers to non-primitive objects, arrays, and strings (and function control items like closures and call-backs).
Generally the stack looks after itself. Starting with the main program (which is effectively just another function), the stack grows and shrinks in use. The only time ‘stack overflow’ can be hit is if a program nests a really large number of function calls, most commonly through unbounded recursion, where a function keeps calling itself without a terminating condition. This is very much down to the programmer to fix. The stack is not garbage collected.
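As a plain Node.js illustration (nothing Node-RED specific here), a recursive function with no base case will exhaust the stack long before the heap is involved:

```javascript
// Each call pushes a new frame onto the stack; with no base case the
// stack eventually overflows and V8 throws a RangeError.
function countDown(n) {
  return countDown(n - 1); // missing the "if (n <= 0) return;" base case
}

try {
  countDown(100000);
} catch (e) {
  console.log(e.message); // "Maximum call stack size exceeded"
}
```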
Now to the ‘heap’. This is a block of memory used to hold objects (strings, arrays, JS objects and all their associated primitive variables and further pointers). When the JS code requires an ‘object’ variable, the main pointer to the object goes onto the stack (inside the function being called) and the rest sits in space allocated in the heap. This works well until the variable is no longer required (the function ends, comes off the stack, and its variables are no longer needed). At this point the old space has to be garbage collected and returned to a ‘free list’.
V8 at the time of writing uses the ‘Orinoco’ project approach (named after the Wimbledon Common Womble I assume). If anyone is interested this is my understanding of how it all works:-
At start-up Node gets memory from the OS to load and run itself (the RSS).
- This memory is subdivided to provide code space, the stack, and the heap space.
- The ‘heap’ space allocated by V8 is divided into two parts - the New Space and the Old Space
- The New Space (generational) is divided into two sections - the Nursery and the Intermediate
- The Nursery is divided into two halves - the To space and the From space.
Thus the New Space has three equal sized sections - Nursery To, Nursery From, and Intermediate. These are only ever a few MB big, and do not change size once initialised.
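As an aside, these spaces can be seen by name from inside Node via the built-in node:v8 module. A minimal sketch (the exact set of spaces reported varies between V8 versions):

```javascript
// Plain Node.js - list the V8 heap spaces and how full each one is.
// Expect names such as new_space, old_space, code_space and
// large_object_space; the exact list depends on the V8 version.
const v8 = require('node:v8');

for (const s of v8.getHeapSpaceStatistics()) {
  console.log(
    s.space_name.padEnd(28),
    `${(s.space_used_size / 1024).toFixed(0)}k used of`,
    `${(s.space_size / 1024).toFixed(0)}k`
  );
}
```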
All new memory allocation is done in the Nursery To space. This naturally fills up quite quickly.
On the ‘generational principle’ that the most recently allocated stuff is the stuff most likely to become free soonest, the generational approach tries to free up the newest stuff first. In V8 Orinoco this is done by the scavenger, which works by regularly going through the Nursery, swapping over the From and To spaces, and moving still-required memory from the From to the To. Anything in the From (which was the written-to space) that is still required is moved to the (new) To space and marked as having been saved once.
After a while, the scavenger passes back over the Nursery, again swapping the To and From, and anything still required that was first-generation scavenged is moved to the Intermediate space and marked as saved twice.
Scavenging happens all the time, using multiple background threads taking ‘free’ execution time off the main program thread. The process keeps a master list of root pointers from the stack, and follows them into the heap, marking everything touched recursively. Anything not marked is not moved during the scavenge. The ‘From’ space thus becomes empty of anything wanted, and this space is then effectively clear. The ‘To’ space fills up with newly allocated memory as well as first-generation saved memory. Since everything is moved into an empty To space, compaction happens naturally and fragmentation is minimal. Once required memory is moved, the root pointers in the stack are updated, as is the primary root search table, so there is quite a lot going on.
The multiple-thread approach used by the scavenger requires that the root pointer table is divided up and allocated out to different threads. This, and the fact that the main program is still running and potentially allocating new variables on the heap, requires a bit of collision management using ‘write barriers’. Also V8 uses memory in page blocks, each 1MB (each page in Unix is 4k or more depending on processor architecture, so a V8 page is many Unix pages). Thus New Space is physically divided into 1MB chunks, with the scavenger only working across some of the pages at any one time (swapping individual To and From pages) and only freeing up lightly used pages to then be re-used.
Once something has been saved twice, now sitting in the Intermediate space, if it is still required when the scavenger visits for the third time it is moved to the Old Space.
The Old Space is just one large block, and this uses the major garbage collector running a Mark-Sweep(-Compact) process. Unlike the scavenger, this runs (for the key part of the process) on the main thread, stopping everything else from running. It runs only when required, and is triggered by a heuristic algorithm, so there are no specific circumstances that can be relied upon to trigger the GC. The basic idea behind Orinoco is that the Mark-Sweep GC only runs when it absolutely has to.
The Mark-Sweep GC again starts with the root pointer table, works through the entire table, and covers both the New and the Old spaces. Anything touched is first marked. Then the sweep goes through the heap and picks up anything not marked, adding it to a ‘free memory’ list. Compaction happens only to very fragmented areas, page by page, so the process is often just a Mark-Sweep. Free space in the heap is maintained as a list, sorted by size, so new allocations can be placed into spaces ideally just large enough for their requirement. In reality the Mark stage is multi-threaded and concurrent, and only the Sweep stage takes over the main execution thread. A great deal of effort has been spent on optimising all these routines and algorithms to reduce the (main thread) pause impact on JS execution, so as to give the end web-browser user the best possible experience.
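For experimentation only, Node can be started with the --expose-gc flag, which makes a manual trigger for the major collection available. A minimal sketch to watch the effect:

```javascript
// Run with: node --expose-gc gc-demo.js
// --expose-gc makes global.gc() available, which requests a full
// mark-sweep collection - useful for experiments, never for production.
function heapUsedMB() {
  return (process.memoryUsage().heapUsed / 1024 / 1024).toFixed(1);
}

// ~200,000 distinct strings - a few tens of MB on the heap
let junk = Array.from({ length: 200000 }, (_, i) => 'x'.repeat(100) + i);
console.log('allocated:', heapUsedMB(), 'MB');

junk = null;   // remove the only reference
global.gc();   // force a major collection
console.log('after gc: ', heapUsedMB(), 'MB');
```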
The scavenge cycle is continuous, and has to keep up with the demand for new memory space. The mark-sweep cycle is controlled by a set of heap sizes. The maximum heap size is set at start-up and prevents the heap (and thereby also Node) from going over a given limit. Under this is a physical heap size, which is the amount of memory currently allocated by the operating system. Since Node-RED runs in a Docker container in HA, Docker passes all memory requests to the underlying operating system (Unix) and will assume (unless set otherwise) that all OS space can be used. At first Unix will provide pages of real memory, then it will start to page out less-used pages and expand into virtual memory space. High demand for memory can introduce paging issues where the OS is trying to push out pages to disc at the same time as getting wanted pages back in, all while programmes are trying to run. Excessive demands will ultimately lead to the OS running out of usable memory completely, whereupon Unix typically starts to kill off processes. A key thing I learnt at college all those years ago was that the paging program must sit in protected (unpaged) real memory - paging out the paging program is akin to turning off the WIFI smart plug powering the WIFI router and trying to switch it back on again.
As well as the physical memory in use, V8 keeps track of the maximum-used heap size and the currently-used heap size. When the currently-used size gets close to the max-used size, the heap is expanded, growing the allocated heap up towards the fixed maximum heap size. The size of each increase is the current size multiplied by a factor derived from the rate of growth, which means that a fast-growing heap will increase the max-used heap size more quickly than a slow-growing one. Since V8 is designed to run Chrome, it is heavily optimized towards web-browser clients. Web pages, when loaded, typically demand a lot of new memory quickly, which rapidly increases the heap-used size, then plateaus. Over time memory use may reverse as web pages are closed, but clearly the max-used heap size only goes up. To counter this, another algorithm looks at heap memory plateauing behaviour and, where possible, reduces the max heap size. Clearly this reduction in max-used heap will only happen given both time and opportunity for the algorithm to work.
It is worth noting that major mark-sweep garbage collection is triggered by the heap currently-used size approaching the max-used heap limit, which will only happen after memory allocation, and that heap size adjustment is only made after a successful GC.
When the currently-used and max-used size both approach the fixed max-heap-limit, GC has to be run aggressively to try and recover memory before the heap runs out of space.
To make (in reality, to encourage) major garbage collection run, memory requests have to consume close to the current heap limit, and the GC must then be allowed the time and resource to execute and re-adjust the heap sizes.
This is a simplified interpretation of all the documents I have read, as there are also large-object spaces. In reality V8 does not allocate very much memory to the generations, and large objects (say >32k) cannot actually fit into the standard generations in the first place, so they get their own space, where they are never moved by the scavenger and are only recovered by the major GC.
There are V8 start-up settings to control the V8 heap, including the heap size, the young space size, and the max-old-space-size. This last setting can be set when starting Node, and in the HA Node-RED addon this is exposed as an advanced start-up setting. This is the only Node-RED configuration that can be used to modify heap and GC ‘behaviour’.
The default max heap size depends on the processor (64/32-bit) and the Node version. Unofficially it appears that this was 1.4GB up to Node 11, 2.0GB from v13, and 4.0GB from 14 onwards. These figures are for 64-bit processors; halve them for 32-bit - so yes, your entire heap could be just 700MB, which, once divided between the new and old space, with the new space split into three, does not give a lot of room to start with.
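The configured limit can be checked from inside Node itself. A minimal sketch (run with the flag shown in the comment; heap_size_limit is reported in bytes):

```javascript
// Run with:  node --max-old-space-size=512 limit.js
// heap_size_limit reflects the configured maximum heap, so this
// should print a figure at or a little above 512 MB.
const v8 = require('node:v8');

const limitMB = v8.getHeapStatistics().heap_size_limit / 1024 / 1024;
console.log(`heap_size_limit: ${limitMB.toFixed(0)} MB`);
```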
I have not covered the more tricky subjects of call-backs and asynchronous processing. In JS, call-backs are functions passed in with a function call, to be executed later, often after the main function has returned. This clearly gets messy for memory: the call-back holds a closure over the variables it uses, so even when the main function has ended and cleared from the stack, everything the call-back references must stay alive in the heap until the call-back has run. Asynchronous processing with promises becomes even more interesting. If anyone reading this knows how GC works around JS promises, please do explain!
So, what are the problems with just letting Node-RED / Node / V8 just get on with it?
The big challenge is ‘memory leak’. This is where memory is allocated for something, and then either the reference (pointer) is lost, or the program or programmer loses track of the memory and does not release it when genuinely no longer required. In C/C++ (manual memory management) and Python (which uses reference counting) it is very possible to leave bits of memory that neither the GC nor the programmer can get to. JS is more resilient (as it uses ‘reachability’) but it is still possible to create variables (use memory) and not clear them when the program has effectively done with them.
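A classic JS example of the second kind is a cache that is only ever added to (all names here are hypothetical, for illustration):

```javascript
// The cache object stays reachable for the life of the program, so
// nothing it holds can ever be garbage collected - a leak by
// 'forgotten reference', even though the GC is working perfectly.
const cache = {};

function getReading(sensorId) {
  const key = `${sensorId}:${Date.now()}`;   // a brand new key every call
  cache[key] = { sensorId, raw: 'x'.repeat(10000), value: Math.random() };
  return cache[key];
}
// Fix: key by sensorId alone, evict old entries, or bound the cache size.
```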
There are several conclusions that I believe can be inferred.
The first point is that major GC only happens when the heap is getting close to the limit. If the heap grows quickly, the max limit rises quickly and can possibly reach ‘full’ before the GC has had time to be operational and effective. Even if the GC runs, it does not always clear everything, so it is possible for the GC to run and still not find enough free space. If the New Space area has become over-full, then the scavenger will not work and this area will only be cleaned by the main GC - so if the main GC fails, then scavenging will also fail, and Orinoco has to give up and shut down Node.
Creating a large array in memory will take up a large block in the heap. If this array is of objects, and the objects are then re-written as bigger objects, the allocation system may move them to new space, leaving holes that are no longer big enough for re-use. In particular, working on two arrays at the same time, using a loop to repeatedly rewrite objects, could interleave one array’s writes with the other’s, particularly as JS execution becomes more and more asynchronous. Re-writing such arrays with increasingly larger objects could very quickly create a lot of holes in the heap that become unusable without compaction. Since compaction of the Old Space is the very last thing the GC does (if at all), this could fill the heap to failure before the unused space can be recovered.
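A deliberately exaggerated sketch of that anti-pattern, plus a kinder alternative:

```javascript
// Anti-pattern (sketch): two arrays of objects grown in step, each
// element repeatedly replaced by a bigger one. Every replacement
// abandons an allocation of a size nothing later quite fits into.
const a = [], b = [];
for (let size = 1; size <= 4096; size *= 2) {
  for (let i = 0; i < 1000; i++) {
    a[i] = { history: new Array(size).fill(0) }; // rewritten bigger each pass
    b[i] = { history: new Array(size).fill(0) }; // interleaved with 'a'
  }
}

// Kinder alternative: build each element once, at its final size.
const c = Array.from({ length: 1000 }, () => ({
  history: new Array(4096).fill(0),
}));
```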
How to deal with memory
In genuine cases, where there is a requirement for more space, max-old-space-size can be increased. This only changes the old space size, and it has the effect of making the Mark-Sweep GC less efficient, so it is not recommended to mess with any of the other standard default settings (allow V8 to work these out for itself). Making the old space very big just delays the inevitable GC, and may well mean that, when it happens, there is not enough resource left to allow the GC to work correctly and in time. GC run regularly on a smaller heap is better than GC as a last dying panic on a larger heap.
There are several things that can be done to mitigate or prevent heap issues.
Starting from the bottom and working up, all memory management stages can generate failure, which could be any one of:
- hardware (memory) faults
- Linux paging issues
- Docker / container allocation limitations
- V8 (stack or heap) failure
- Node.JS
- Node-RED
- C/C++ or Python code run within Node-RED
- Node-RED nodes (3rd party)
- Node-RED JavaScript (programmer written function nodes)
- Node-RED flow (poor programming)
and much of this is out of our scope, particularly when using third-party nodes that may have embedded Python or native C/C++, or asynchronous JS. It is worth noting that any native C/C++ code contained in Node-RED is memory-managed outside of the V8 heap in its own space. The programmer is entirely responsible for allocation and recovery, as there is no GC of this space.
Anyway, stuff to avoid doing (a compiled list drawn from several places, and certainly not exhaustive):
Stack
- Avoid recursive functions without clear termination (and be aware that, although tail-call optimisation is in the ES2015 spec, V8 does not implement it, so deep recursion will still grow the stack)
- Keep variables local to the function. Use ‘let’ (or ‘const’) and not ‘var’: ‘var’ is scoped to the whole enclosing function (or becomes a long-lived script-level variable at the top level), while ‘let’ is scoped to its block - see the sketch below. Never assign to undeclared variables, which become globals.
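A small sketch of the difference in scope (plain JS):

```javascript
// 'var' is scoped to the whole function; 'let' to the enclosing block.
function demo() {
  for (var i = 0; i < 3; i++) { /* ... */ }
  console.log(i); // 3 - 'i' is still alive after the loop

  for (let j = 0; j < 3; j++) { /* ... */ }
  // console.log(j); // ReferenceError - 'j' ended with the loop block
}
demo();
```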
Heap
- Copy data by value and not by reference where practical (ironically this can be more economical, as it avoids multiple pointer references to the same object, which reduce the efficiency of the scavenger and GC). In Node-RED there is a ‘deep copy’ option in the change node when using ‘Set’.
- Delete unwanted data ASAP
- Check loops for possible leakage and ensure loops terminate
- Check timers and API (http) calls to ensure they are closed / deleted when no longer required
- Avoid large data growth without allowing time for the GC to work
- Set heap size to an appropriate value (not as big as possible - should ideally be just a bit more than is required for normal operation)
- Consider long term leakage tracking with auto restart, and/or a plan to find and fix
- Consider short term debugging using multiple heap dumps prior to known failure to trace heap behaviour
To elaborate on some of these points:
Undeclared variables in JS are auto-created, but as global variables; ‘var’ at the top level of a script does much the same, while ‘var’ inside a function lives for the whole function rather than just its block. Either way, such variables can end up remaining far longer than intended - global ones for the entire duration of the Node-RED program.
Variables can be set to ‘null’ or re-assigned at any point when no longer needed. The JS ‘null’ means no pointer, which means no reference into the heap, which means the GC can recover the old memory.
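In a Node-RED function node that can be as simple as this (a sketch, assuming msg.payload arrives as a large array of { value } readings):

```javascript
// Node-RED function node sketch: keep only what the next node needs.
let readings = msg.payload;   // potentially a very large array
const average =
  readings.reduce((sum, r) => sum + r.value, 0) / readings.length;

readings = null;       // drop the reference; the big array is now
                       // unreachable and the GC can reclaim it
msg.payload = average; // pass on only the small derived value
return msg;
```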
Timers and call-backs all set up event listeners. These wonderful things keep their callback closure, and everything it references, alive in the heap until they fire or are cleared. Setting a load of timeouts for 10 seconds will leave loads of variables stuck in the heap for those 10 seconds.
I have read that a caller being cleared before its promise resolves is not an issue for the GC, but self-rescheduling timers for API and http calls are a problem.
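For timers, keeping the handle and clearing it is the simplest defence. A Node-RED function node sketch (the context key name ‘pendingTimer’ is my own):

```javascript
// Node-RED function node sketch: allow only one pending timeout.
// Keeping the handle in node context lets a later message cancel it
// instead of letting live timers (and their closures) pile up.
const pending = context.get('pendingTimer');
if (pending) clearTimeout(pending);          // cancel the previous timer

const timer = setTimeout(() => {
  context.set('pendingTimer', null);         // release the handle
  node.send({ payload: 'timed out' });
}, 10000);

context.set('pendingTimer', timer);
return null; // nothing to send yet
```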
From a Node-RED flow perspective, the following things consume and hold on to memory.
- Multiple outflow lines from a node
- API calls (timeouts)
- Timers and timeouts
- Nodes that store stuff
- Processing large files / arrays / objects entirely in memory
Multiple wires out of a node are a problem because Node-RED makes a complete copy of the message for each connection beyond the first. If the message is large, this multiplies the problem.
API calls and http calls are problematic because the returned message includes res and req. These objects are very large, include circular references, and should ideally be got rid of ASAP.
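Something like the following, once the message no longer needs them (a sketch; in an HTTP In flow only do this after the response has been sent, and the extra property names are ones I have seen from the http request node rather than a definitive list):

```javascript
// Node-RED function node sketch: strip bulky HTTP artefacts from the
// message before it travels on down the flow (and gets cloned by
// every extra wire it crosses).
delete msg.req;          // HTTP In request object
delete msg.res;          // HTTP In response object
delete msg.responseUrl;  // http request node extras, if present
delete msg.redirectList;
return msg;
```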
Large arrays, particularly of objects, should be managed to reduce their overall size and to delete unwanted data. Large files and large arrays that exceed even half the heap size are going to be a problem, and a different approach, working on just part of the array or file at a time, is required.
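One common pattern is to work through the data in bounded slices, so the intermediate results stay small and die young. A sketch (again assuming an array of { value } readings in msg.payload):

```javascript
// Node-RED function node sketch: process a large array in bounded
// slices rather than materialising all the intermediate results at once.
const CHUNK = 1000;
const source = msg.payload;   // a large array of readings
let total = 0;

for (let start = 0; start < source.length; start += CHUNK) {
  const slice = source.slice(start, start + CHUNK);
  total += slice.reduce((sum, r) => sum + r.value, 0);
  // 'slice' becomes unreachable on the next iteration, so the derived
  // data can die young in the New Space instead of piling up.
}

msg.payload = total / source.length;  // pass on only the small average
return msg;
```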
Other things to look out for are the use of ‘let’ and ‘const’, which should be tightly scoped. Functions and bare {} code blocks can constrain variable scope even further. The idea is to hold data only as long as you absolutely need it, so the GC can recover it early, during the New Space generational cycle, rather than letting it get moved into the Old Space.
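A bare block is enough to end a large intermediate’s life early:

```javascript
// A plain { } block limits how long the big intermediate stays reachable.
let summary;
{
  // big temporary structure, scoped to this block only
  const bigTable = Array.from({ length: 100000 }, (_, i) => ({ i, v: i * 2 }));
  summary = bigTable.length; // keep only the small derived value
} // bigTable is unreachable from here on, so it can die young in the
  // New Space rather than being promoted to the Old Space
console.log(summary);
```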
Looking at the problematic code posted above, I note that almost everything points to consuming lots of memory and not letting the GC do its job: many http calls, multiple wires out, lots of timers, and many complex nodes working (potentially asynchronously) together. The HA WebSocket nodes variously use API calls and a WebSocket connection to HA. The WebSocket is single, permanent, and efficient. API calls are temporary and use more memory, with timeouts and http returns, so these I believe are much more likely to cause issues with heap use and GC failure.
Memory Leak
As well as trying to avoid memory leaks in the first place, another approach is to set up a watchdog to monitor the heap memory. JavaScript itself exposes absolutely nothing about memory to the coder; however, the V8 and internal process information is available in Node, and can be accessed from JS via built-in modules.
https://nodejs.org/docs/latest-v18.x/api/v8.html#v8getheapstatistics
The node:v8 module exposes information including several calls that return memory or heap details.
Loading this module needs require('node:v8'), which cannot be called from within the JavaScript in an NR function node. However, I discovered while doing this work that recent Node-RED function nodes can be set to pull in the v8 module. Once done, the v8.getHeapStatistics() call returns detailed data. There are contrib-nodes for getting heap details or dumping a heap snapshot, but this can easily be done directly in a function node. The hard part seems to be working out what each individual field represents.
This is my project (I’m calling it ‘Project Compost’) to monitor and watch the heap.
Function node to read V8 heap memory figures
[{"id":"1a8578e1722bf90e","type":"function","z":"249a680bf3f915e2","g":"53c4c81afcce308f","name":"Read Memory Use","func":"\nmsg.payload = {\n \"memory\": process.memoryUsage(),\n \"heap\": v8.getHeapStatistics(), \n \"space\": v8.getHeapSpaceStatistics()}\n\nreturn msg;","outputs":1,"timeout":"","noerr":0,"initialize":"","finalize":"","libs":[{"var":"v8","module":"v8"},{"var":"process","module":"process"}],"x":370,"y":1560,"wires":[["f812f2fa619bc191","3c575eaca7a42b1b","c879c5ac63f83116"]]}]
This just shows how to load in the required modules ‘V8’ and ‘process’ so that they can be called.
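Unpacked from the JSON export above, the function node body is just the following (my annotations on the fields reflect the Node documentation; the modules ‘v8’ and ‘process’ are loaded on the node’s Setup tab, as the export shows):

```javascript
// Node-RED function node body - snapshot of process and V8 heap figures.
msg.payload = {
    memory: process.memoryUsage(),      // rss, heapTotal, heapUsed, external (bytes)
    heap: v8.getHeapStatistics(),       // total_heap_size, used_heap_size,
                                        // heap_size_limit and friends (bytes)
    space: v8.getHeapSpaceStatistics()  // per-space breakdown: new_space,
                                        // old_space, large_object_space, ...
};
return msg;
```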
There is no point calling this too often, since the heap sizes will mostly only change following a GC. As an experiment I am calling this monitoring flow every minute, and with other flows running every 20 seconds to poll my inverter, I see regular changes. At the moment this is just running and recording one day of data to context in an array - just the RSS size, the heap, and the used heap. I needed to restart HA following an update (a problem with an integration that stopped working and had to be updated), and the result was very interesting indeed.
On the left - my RSS/heap use after Node-RED has been running for quite some time. On the right, just after a restart. Certainly suggests memory leak over time. After start-up I would expect my memory use (RSS) to be around 200 MiB, but was seeing this climb to almost 400 MiB.
Later results:
Well, when writing this posting I went through all my Node-RED (function) nodes and replaced every var with let wherever possible. I believe that I have noticed a difference from just doing this.
Here is my current RSS/heap use after Node-RED has been running for almost three weeks, and the figure is only slightly higher than just after the restart - around 220 to 230 MiB (average).
I am now just monitoring memory use using a graph card in HA.
In conclusion, I have been quite surprised how much difference using ‘var’ can make to a long term heap memory-leak. It seems that, if I want to run Node-RED on a ‘semi-industrial’ long-term basis, then I will have to put much more thought and effort into writing my code…
Note for the Forum Moderators (assuming that anyone has actually read this far…) - I have written this without any recourse to AI. This is all my own work, having taken almost two months over this, and I accept all responsibility for my human mistakes.