A Practical Look at JavaScript Garbage Collection

How V8 manages memory: generational GC, scavenging, mark-sweep, and what it means for the code you write.

#JavaScript #Node.js

Garbage collection in JavaScript comes up in almost every interview, so I am writing this note to refresh my own memory, and maybe it will be useful to someone else. We need to know which objects the GC cannot free and how to tell them apart from the rest. That means remembering how V8 actually collects memory, and that knowledge may also improve the way we write code.

Where objects live, and what is the scavenger

Every object, array, closure, and string in JavaScript lives on the heap. V8 splits the heap into several spaces. New Space is small and fast, and new objects are allocated there. Old Space holds objects that have already survived a couple of GC cycles. There are also a few special-purpose spaces: a Large Object Space for objects too big to be worth copying, a Code Space for machine code compiled by the JIT, and a Map Space for hidden classes, V8's internal object-shape descriptors (the exact set of spaces varies between V8 versions). At runtime, most objects die young. A temporary array you built to map over some data is usually gone before the next GC cycle even runs, so most allocations become garbage almost immediately. This observation, known as the generational hypothesis, holds for almost every real JavaScript workload, so V8 builds its whole strategy around it and runs two collectors instead of one.

The first collector is the Scavenger. It handles the young generation described above, runs often, and is fast. The second is Mark-Sweep (or Mark-Compact), the major GC that handles the old generation. It runs much less often and costs more. New Space is split into two equal halves, from-space and to-space. Allocation happens in from-space with a bump pointer (just incrementing an offset), so it costs almost nothing. When from-space is full, the scavenger runs. It walks the root references (the stack, globals) and copies the live objects out of from-space into to-space. It updates every pointer to the new locations, then swaps the labels so to-space becomes the new from-space. Everything that was not copied was dead, and the old from-space is simply wiped. Objects that survive two scavenge cycles get promoted to Old Space.
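To make the copy-and-swap step concrete, here is a toy model of a semispace collector. It is only a sketch under loud assumptions: the names scavenge, fromSpace, and forwarded are mine, "objects" are plain JS objects with a refs array, and real V8 works on raw memory with forwarding pointers instead of a Map.

```javascript
// Toy semispace scavenge (hypothetical model, not V8's implementation).
// Copies everything reachable from the roots into a fresh to-space.
function scavenge(roots, fromSpace) {
  const toSpace = [];
  const forwarded = new Map(); // original object -> its copy in to-space

  function copy(obj) {
    if (!fromSpace.includes(obj)) return obj;          // not a young object
    if (forwarded.has(obj)) return forwarded.get(obj); // already moved
    const clone = { name: obj.name, refs: obj.refs };
    forwarded.set(obj, clone);
    toSpace.push(clone);
    return clone;
  }

  // Copy the roots first, then scan to-space and fix up every pointer.
  // Scanning may append more copies, which the loop picks up later.
  const newRoots = roots.map((obj) => copy(obj));
  for (let scan = 0; scan < toSpace.length; scan++) {
    toSpace[scan].refs = toSpace[scan].refs.map((obj) => copy(obj));
  }

  // The caller treats toSpace as the new from-space and drops the old one.
  return { newRoots, toSpace };
}

const a = { name: 'a', refs: [] };
const b = { name: 'b', refs: [a] };
const garbage = { name: 'garbage', refs: [] };

const result = scavenge([b], [a, b, garbage]);
console.log(result.toSpace.map((o) => o.name)); // [ 'b', 'a' ]
```

Note that the garbage object is never touched at all. The cost of a scavenge depends on how much survives, not on how much was allocated.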

The scavenger is built for intermediate arrays and short-lived objects like these:

function processItems(items) {
  // Each .map() and .filter() call creates a temporary array - exactly what the scavenger handles well
  return items
    .map(item => ({ ...item, processed: true }))
    .filter(item => item.active)
    .map(item => item.id);
}

Those intermediate arrays are created, used, and gone inside a single call. That is what the young generation is for, and this pattern costs almost nothing.

The scavenger has one tricky edge case: an old object can get a property that points at a young one.

const longLived = {}; // Promoted to Old Space
// ... later ...
longLived.cache = { temp: true }; // New object in New Space

The scavenger only looks at New Space roots, not all of Old Space. On its own, it would miss that longLived.cache is keeping { temp: true } alive.

V8 solves this with a write barrier. Whenever a reference is written into an object, V8 checks whether a pointer to a young object was just stored inside an old object. If so, it records that reference in a remembered set. When the scavenger runs, it treats the remembered set as extra roots. Scanning that small set is much cheaper than scanning all of Old Space.
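The barrier is easier to see in a toy model. In this sketch oldSpace, youngSpace, and writeField are hypothetical names I made up; in real V8 the check is emitted by the compiler around pointer stores, not called by hand.

```javascript
// Toy write barrier (hypothetical model, not V8's implementation).
const oldSpace = new Set();
const youngSpace = new Set();
const rememberedSet = new Set(); // old objects holding a pointer into young space

function writeField(holder, key, value) {
  holder[key] = value;
  // The barrier: an old object just received a pointer to a young object
  if (oldSpace.has(holder) && youngSpace.has(value)) {
    rememberedSet.add(holder);
  }
}

const longLived = {};
oldSpace.add(longLived);        // pretend it was promoted long ago

const temp = { temp: true };
youngSpace.add(temp);           // freshly allocated

writeField(longLived, 'cache', temp);

// The scavenger scans only the remembered set, not all of old space
const extraRoots = [...rememberedSet].flatMap((holder) =>
  Object.values(holder).filter((value) => youngSpace.has(value))
);
console.log(extraRoots.includes(temp)); // true
```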

What happens to objects that survive the Scavenger

Old Space works differently. Copying everything the way the scavenger does would be too expensive here, because Old Space is much bigger and its objects are expected to live a long time.

The mark phase starts from the GC roots and walks down the entire object graph, marking every object it can reach. The sweep phase then walks Old Space and returns the unmarked, dead memory to free lists for future allocations. There is an optional third step, compaction, because sweeping leaves gaps in memory over time. V8 watches the fragmentation, and when it gets bad enough it moves live objects together and fixes up the pointers. Compaction is expensive, so it happens only on the most fragmented pages.
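The mark and sweep phases can be sketched in a few lines. This is a toy model with assumed names (markSweep, a heap array of slots, a refs array on each object), not how V8 lays out memory, but it shows two things: unreachable cycles are collected, and sweeping leaves gaps.

```javascript
// Toy mark-sweep over an explicit heap array (hypothetical model).
function markSweep(heap, roots) {
  // Mark: walk the object graph from the roots
  const marked = new Set();
  const stack = [...roots];
  while (stack.length > 0) {
    const obj = stack.pop();
    if (marked.has(obj)) continue;
    marked.add(obj);
    stack.push(...obj.refs);
  }

  // Sweep: every unmarked slot is returned to the free list
  const freeList = [];
  heap.forEach((obj, slot) => {
    if (obj !== null && !marked.has(obj)) {
      heap[slot] = null;
      freeList.push(slot);
    }
  });
  return freeList;
}

const leaf = { refs: [] };
const root = { refs: [leaf] };
const cycleA = { refs: [] };
const cycleB = { refs: [cycleA] };
cycleA.refs.push(cycleB); // a cycle nothing else points at

const heap = [root, cycleA, leaf, cycleB];
console.log(markSweep(heap, [root])); // [ 1, 3 ]
```

The free slots 1 and 3 are not adjacent. That is the fragmentation compaction exists to clean up.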

Maintaining application performance

A naive mark-sweep freezes the whole application while it traces the heap, so V8 splits the work up. Incremental marking breaks marking into small chunks and runs them between pieces of your code. Concurrent marking goes further and marks the heap on helper threads while the main thread keeps running. Sweeping runs on background threads too, and it is lazy: a page is not swept until something actually needs the memory.

So major GC pauses in modern V8 are usually a few milliseconds, even for heaps in the hundreds of megabytes.

Where the memory leaks come from

Memory leaks are caused by accidentally keeping references alive. The GC is doing its job; the problem is objects we forgot we were still pointing at. Here are a few examples of how it can happen:

// 1. Forgotten event listeners
class JsonStream {
  constructor(socket) {
    // This closure captures `this` - if you never remove
    // the listener, this JsonStream instance lives forever
    socket.on('data', (chunk) => {
      this.handleChunk(chunk);
    });
  }
}

// 2. Growing data structures with no bound
const requestLog = [];
app.use((req, res, next) => {
  requestLog.push({ url: req.url, time: Date.now() });
  // This array grows forever. Every log entry is retained.
  next();
});

// 3. Closures capturing more than they need
function createHandler(hugeConfig) {
  // This closure keeps `hugeConfig` alive even though
  // it only needs one property
  return () => {
    console.log(hugeConfig.name);
  };
}

Every object in these examples is still reachable, so the GC is right to keep it alive. The usual fixes are to remove listeners you no longer need, cap or rotate requestLog, and copy out only the values a closure actually needs.
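Here is one way those fixes might look. Treat it as a sketch: the close() method, the logRequest helper, and the MAX_LOG_ENTRIES limit are my own assumptions for illustration, and a plain EventEmitter stands in for the socket.

```javascript
const { EventEmitter } = require('node:events');

// 1. Keep a reference to the listener so it can be removed later
class JsonStream {
  constructor(socket) {
    this.socket = socket;
    this.onData = (chunk) => this.handleChunk(chunk);
    socket.on('data', this.onData);
  }

  handleChunk(chunk) {
    // parse the chunk here
  }

  close() {
    // Without this, the socket keeps the JsonStream alive
    this.socket.off('data', this.onData);
  }
}

// 2. Cap the log so the oldest entries are dropped
const MAX_LOG_ENTRIES = 1000; // assumed limit, pick what fits your case
const requestLog = [];
function logRequest(url) {
  requestLog.push({ url, time: Date.now() });
  if (requestLog.length > MAX_LOG_ENTRIES) requestLog.shift();
}

// 3. Copy out the one value the closure needs
function createHandler(hugeConfig) {
  const { name } = hugeConfig;
  return () => {
    console.log(name);
  };
}

const socket = new EventEmitter();
const stream = new JsonStream(socket);
stream.close();
console.log(socket.listenerCount('data')); // 0
```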

What this means for your code

Functional patterns built on .map(), .filter(), and spreading create temporary objects, and that is fine. The scavenger handles them for almost no cost. Do not rewrite your code to avoid allocations.

Long-lived caches are more often the real source of problems. An unbounded Map or plain-object cache is what actually makes the GC suffer. Use a WeakMap when the keys are objects, or give the cache an eviction policy:

class LRUCache {
  constructor(maxSize = 1000) {
    this.cache = new Map();
    this.maxSize = maxSize;
  }

  get(key) {
    if (!this.cache.has(key)) return undefined;
    const value = this.cache.get(key);
    // Refresh position by re-inserting
    this.cache.delete(key);
    this.cache.set(key, value);
    return value;
  }

  set(key, value) {
    if (this.cache.has(key)) this.cache.delete(key);
    this.cache.set(key, value);
    if (this.cache.size > this.maxSize) {
      // Evict the least recently used entry (first in insertion order)
      const oldest = this.cache.keys().next().value;
      this.cache.delete(oldest);
    }
  }
}
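
For the WeakMap option mentioned above, a minimal sketch could look like this. The metadata map and the tag helper are hypothetical names. The point is that a WeakMap entry does not keep its key alive, so there is nothing to evict by hand.

```javascript
// Metadata keyed by object: the entry can be collected together with its key
const metadata = new WeakMap();

// Hypothetical helper that attaches extra info to an object
function tag(obj, info) {
  metadata.set(obj, info);
  return obj;
}

const session = tag({ id: 42 }, { lastSeen: Date.now() });
console.log(metadata.has(session)); // true

// Once the program drops every reference to `session`, the entry becomes
// collectable. A Map would have kept both the key and the value alive.

// WeakMap keys must be objects; a string key throws a TypeError
try {
  metadata.set('session-42', {});
} catch (err) {
  console.log(err instanceof TypeError); // true
}
```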

To adjust the heap limit, use the --max-old-space-size flag (the value is in megabytes). Node defaults to roughly 1.5-2 GB for Old Space on 64-bit systems, though the exact default depends on the Node version and the available memory. You can raise it if your workload genuinely needs the room. But raising it is not a proper way to fix a memory leak; fixing the root cause is the only real option.
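
If you are not sure what limit your process actually runs with, you can ask V8. This small sketch uses v8.getHeapStatistics() from Node's built-in v8 module. The toMB helper is just mine for readable output.

```javascript
const v8 = require('node:v8');

// heap_size_limit reflects the effective limit, including --max-old-space-size
const { heap_size_limit, used_heap_size } = v8.getHeapStatistics();

// Hypothetical helper for readable output
const toMB = (bytes) => Math.round(bytes / 1024 / 1024);

console.log(`heap limit: ${toMB(heap_size_limit)} MB`);
console.log(`heap used:  ${toMB(used_heap_size)} MB`);
```

Run it once as is and once with something like node --max-old-space-size=4096 to see the limit change.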

When something still looks wrong, profile before you touch anything. Chrome DevTools and Node's --inspect flag give you heap snapshots, allocation timelines, and GC traces:

# Log a line for every GC event
node --trace-gc app.js

# Get detailed GC info
node --trace-gc --trace-gc-verbose app.js

# Start the inspector so Chrome DevTools can attach and take heap snapshots
node --inspect app.js
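
If attaching DevTools is not convenient, for example on a server, a snapshot can also be written from code with v8.writeHeapSnapshot(). The dumpHeapSnapshot wrapper and the file naming below are my own assumptions. The resulting file can be loaded in the Memory tab of Chrome DevTools.

```javascript
const v8 = require('node:v8');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper: write a snapshot into the given directory
function dumpHeapSnapshot(dir = os.tmpdir()) {
  const file = path.join(dir, `heap-${process.pid}-${Date.now()}.heapsnapshot`);
  // writeHeapSnapshot is synchronous and blocks the event loop while it runs
  return v8.writeHeapSnapshot(file);
}

const snapshotFile = dumpHeapSnapshot();
console.log(`snapshot written to ${snapshotFile}`);
```

Taking a snapshot needs extra memory on top of the heap itself, so I would not call this on every request in a process that is already close to its limit.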

There is also a way to interact with the GC directly in Node: start Node with --expose-gc and call global.gc(). You can do it, but you almost never should. For me this capability is little more than an interesting fact, because I am almost sure my own logic will not beat V8's built-in GC heuristics.
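
For completeness, this is roughly what such a call looks like, guarded so the script still works when the flag is missing. The forceGcIfAvailable name is hypothetical. I would only use something like this in a test or a benchmark.

```javascript
// Run with: node --expose-gc script.js
function forceGcIfAvailable() {
  if (typeof global.gc === 'function') {
    global.gc(); // triggers a full collection
    return true;
  }
  return false; // flag not set, nothing to call
}

const before = process.memoryUsage().heapUsed;
const ran = forceGcIfAvailable();
const after = process.memoryUsage().heapUsed;

console.log(
  ran
    ? `heapUsed changed by ${after - before} bytes`
    : 'global.gc is not available, start Node with --expose-gc'
);
```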

There is also object pooling. In V8, allocation is a bump-pointer increment and is already nearly free. So a pool usually just adds code and keeps objects alive longer than they need to be, which pushes them into Old Space.

So the overall conclusions from this note:

1. Try to keep short-lived data in New Space.
2. Avoid listeners that are never removed.
3. Avoid huge globals.
4. Be aware of --expose-gc, but do not rely on it; there is rarely a reason to.
5. Be aware of --max-old-space-size, but do not use it as a fix for memory leaks.
6. Inspect heap snapshots in Node via the --inspect flag.