Incremental Graph Locking Design
Status
This document describes the locking model that the incremental graph SHOULD use in the follow-up implementation PR.
This PR only adds the sleeper primitives needed by the design and removes the
unsafe withoutMutex mechanism. The graph itself still uses its existing coarse
mutex until the follow-up PR wires in the design below.
Problem
The old design relied on a single graph-wide mutex plus withoutMutex, which
temporarily dropped the mutex while a computor ran. That gave concurrency, but
it also created a gap where unrelated graph work could interleave with an
in-flight recomputation and observe or produce inconsistent intermediate states.
The target behavior is:
invalidate()is exclusive with anypull(), but not with otherinvalidate()calls.- Inspection reads such as
debugGetValue()anddebugListMaterializedNodes()are allowed to run concurrently withinvalidate(). pull()is exclusive with inspection reads.pull()is exclusive with otherpull()calls on the same node.pull()calls on different nodes should not block each other.
Sleeper Primitives
The design is based on two sleeper primitives:
withMutex(key, procedure)
This is the existing exclusive mutex:
- at most one caller per key runs at a time;
- other callers queue in FIFO order.
It remains the right primitive for per-node pull exclusion.
withModeMutex(key, mode, procedure)
This PR adds a new grouped lock:
- callers with the same
(key, mode)may run concurrently; - callers with the same
keybut a differentmodeare mutually exclusive; - queued callers are served in FIFO mode groups so that a later caller in the current mode cannot skip ahead of an earlier conflicting mode.
This is the right primitive for global graph phases where we want
pull/observe exclusion without forcing all pulls to serialize with each
other.
Lock Keys
The future implementation SHOULD derive two families of keys.
1. Graph activity key
There is exactly one global key:
GRAPH_ACTIVITY_KEY
This key is acquired through withModeMutex.
Two modes are sufficient:
"observe"forinvalidate()and inspection reads;"pull"for all pull activity.
Because same-mode holders are compatible, many invalidates may overlap and many pulls may overlap. Because different modes are incompatible, no pull may overlap any invalidate or inspection read.
2. Per-node pull key
There is one exclusive key per concrete node:
PULL_NODE_KEY(nodeKeyString)
This key is acquired through withMutex.
It is used only by pull operations, and only for the concrete node currently being pulled. This is what serializes same-node pulls without blocking pulls on different nodes.
Operation Protocol
invalidate(node)
- Acquire
withModeMutex(GRAPH_ACTIVITY_KEY, "observe", ...). - Run the invalidation logic.
- Release the mode lock.
No per-node mutex is needed.
inspection read
- Acquire
withModeMutex(GRAPH_ACTIVITY_KEY, "observe", ...). - Read the requested inspection data.
- Release the mode lock.
No per-node mutex is needed.
pull(node)
- Acquire
withModeMutex(GRAPH_ACTIVITY_KEY, "pull", ...). - Acquire
withMutex(PULL_NODE_KEY(nodeKeyString), ...). - Pull dependencies and compute/write back the node value while holding both.
- Release the node mutex.
- Release the mode lock.
The computor stays inside the pull critical section. This is safe because the critical section is no longer graph-global: other pulls may still proceed on other nodes, while invalidates and inspection reads are excluded.
Why This Matches the Requested Semantics
Invalidates with invalidates
Both use GRAPH_ACTIVITY_KEY in mode "observe", so they are compatible.
Invalidates with reads
Both use GRAPH_ACTIVITY_KEY in mode "observe", so they are compatible.
Pulls with reads or invalidates
Pulls use mode "pull" while reads and invalidates use mode "observe". Those
modes conflict, so these operations are mutually exclusive.
Pulls on the same node
They contend on the same PULL_NODE_KEY(nodeKeyString), so they serialize.
Pulls on different nodes
They share the compatible global "pull" mode and use different per-node mutex
keys, so they may proceed concurrently.
Deadlock Discipline
The follow-up implementation MUST keep this acquisition discipline:
- acquire the graph activity mode lock first;
- acquire any per-node pull mutexes after that;
- never acquire
"observe"while holding a per-node pull mutex.
Inspection reads and invalidates only take the global mode lock, so they cannot participate in a node-level cycle.
Pulls may recursively pull dependencies while already holding pull locks. The
incremental graph is a DAG, so any wait edge from node A to node B implies
that A depends on B. A deadlock cycle would therefore imply a dependency
cycle, which the graph constructor already rejects.
Why withoutMutex Must Not Return
withoutMutex encoded a very different strategy: temporarily leave the critical
section and try to restore it later. That is fundamentally the wrong shape for
the new invariants because:
- it allows a pull to overlap an invalidate;
- it allows two same-node pulls to race through the same recomputation;
- it requires the caller to reason about a lock gap outside the type and API structure of the primitive itself.
The safer replacement is not a more careful "drop and reacquire" helper. The safer replacement is a pair of primitives that directly express the intended compatibility rules.