Building Concurrent Dictionary

Sep 18, 2024

Many languages such as Swift, C#, Java, C++ etc provide multi threading capabilities to enable quicker execution of computing tasks without blocking the main thread. Writing thread safe code is challenging as it leads to numerous issues such as: race condition, deadlocks, starvation, memory inconsistency, performance overheads due to context switching etc. Though, multi threading provides lots of capabilities, but due to challenges mentioned, getting it right is challenging and sometimes may lead inferior performance. Let’s try to build a Concurrent Dictionary and explore the challenges further and how they can be tackled.

Example of class ConcurrentDictionary provides a basic implementation of concurrent dictionary. At present, we have exposed publicly only AddOrUpdate method. Let’s go deeper into its implementation.

Thread executing the method takes lock to _syncRoot object if no other thread has a lock to it. Basically, it sets flag to tell OS that a thread has already taken a lock.
Until the current thread doesn’t execute the whole code block under lock, no other thread can execute the code block. This ensures that adding or updating a key/value is thread safe.

Let’s explore some issues with the AddOrUpdate method. To keep the post short, let’s focus on only 3 major issues.

As there is only one _syncRoot object, that means at a time only thread can update/add new key/value in the concurrent dictionary. Which makes current implementation not performant to many concurrent adds/updates as threads would wait for a thread having current lock to release. There would be huge contention issue. As the size of dictionary grows, the situation would be a huge bottleneck.
node.Value = value: this sets an existing node for a key with the new node value. This code is non atomic, which means, if another thread is reading or updating this node then it may lead to inconsistent data. For instance let’s consider the following struct being set assigned to node.Value = new Point(10, 20); If thread A is setting this value, and if thread B is reading node’s value it, it may happen that during read, X would have previous value and Y may have new value. This results in inconsistent data.
```
struct Point
{
    public int X;  // First integer field
    public int Y;  // Second integer field
    public Point(int x, int y)
    {
        X = x;
        Y = y;
    }
}
```
Stale data: As modern cpus have multiple cores with independent L1,2 caches and registers; common L3 cache, due to this a core may access/write the data from registers/cache instead from memory, this could lead different copies of data available to different threads.

Let’s change the code to tackle the above issues starting with 1st issue.

The above code change tackles the issue 1 mentioned earlier.

_locks array is initialised with the required concurrency, if the concurrency level passed is lesser than 1 then we use the number of cores as the concurrency level.
AddOrUpdate method: the appropriate _locks object is obtained based on the bucketIndex obtained earlier. This would ensure that concurrently multiple threads can mutate the ConcurrentDictionary. Also, it ensures that only 1 thread can mutate a key at a time since there is 1:1 mapping between a key and lockNo so that only one thread can have lock for a given a key.

Let’s fix the 2nd issue.

The changes done are to ensure that the writes are atomic.

if (!typeof(TValue).IsValueType || ConcurrentDictionaryTypeProps<TValue>.IsWriteAtomic)
{
    node._value = value;
}

Above condition checks whether value being set is reference type or value type but can be updated atomically. If true, then it directly sets the new value to the existing key Node. This is also performant as we are not creating new node instance which would require heap allocation.

If the value can’t be set atomically then we check whether prev is null, if the current node is head node then we create a new node instance and assign it the appropriate _buckets array index. Else if the node is not head node then we simply update previous Node next field with the new node.
If key is not found then we directly update _buckets array index with the new node.

Making all of the above changes ensures that AddOrUpdate does atomic changes. So that other threads reading the changes either see the complete change or old change, nothing in between.

Let’s now tackle the 3rd issue of stale data due to multiple cpu cores, as different threads can run on different cpu cores and can cache data in registers or L1, 2 caches. This maybe a big problem when the dictionary resizes to accommodate additional key/values. To avoid the mentioned issue, we can use volatile reads/writes so that read/write happen directly in memory avoiding registers and cache.

private VolatileNode[] _buckets;

private struct VolatileNode
{
    internal volatile Node? _node;
}

Instead of directly accessing Node, we will access it through VolatileNode struct as _node is now volatile, the reads/writes will directly go to memory bypassing any caching or register storage and any cpu optimisations due to cpu instruction reordering. And _buckets is now of type VolatileNode instead of Node so that we work with VolatileNode everywhere.

AddOrUpdate is updated above to work with VolatileNode instead of Node. Volatile.Write is used when value type write is non atomic and the existing key represents the head node or when the key is not found. This would ensure that during dictionary resize and during reads of dictionary keys in TryGetValue defined below, there is no multi threading issue.

public bool TryGetValue(TKey key, out TValue value)
{
    int bucketIndex = GetBucketIndex(key);
    Node node = _buckets[bucketIndex]._node;
    while (node != null)
    {
        if (EqualityComparer<TKey>.Default.Equals(node.Key, key))
        {
            value = node.Value;
            return true;
        }
        node = node.Next;
    }
    return false;
}

Above is the TryGetValue method where we can see that there is no lock, which is good as the optimisations done in AddOrUpdate is enough to take care of all multi threading issues so that TryGetValue method can be lock free ensuring faster execution of the method.

123

Code is inspired from C# ConcurrentDictionary implementation

The code provided is an edited version of initial code obtained from ChatGPT, which is not tested, so there could be inaccuracies.

The topic covered is not definitely the most comprehensive or accurate to handle multi threading, it is just a guide to understand what are potential multi threading issues and how they can be handled. Please use this along with other references to have better understanding on the topic.

Ashish’s Substack

Discussion about this post