Building Concurrent Dictionary
Many languages such as Swift, C#, Java, C++ etc provide multi threading capabilities to enable quicker execution of computing tasks without blocking the main thread. Writing thread safe code is challenging as it leads to numerous issues such as: race condition, deadlocks, starvation, memory inconsistency, performance overheads due to context switching etc. Though, multi threading provides lots of capabilities, but due to challenges mentioned, getting it right is challenging and sometimes may lead inferior performance. Let’s try to build a Concurrent Dictionary and explore the challenges further and how they can be tackled.
Example of class
ConcurrentDictionary
provides a basic implementation of concurrent dictionary. At present, we have exposed publicly only AddOrUpdate
method. Let’s go deeper into its implementation.
Thread executing the method takes lock to
_syncRoot
object if no other thread has a lock to it. Basically, it sets flag to tell OS that a thread has already taken a lock.Until the current thread doesn’t execute the whole code block under
lock
, no other thread can execute the code block. This ensures that adding or updating a key/value is thread safe.
Let’s explore some issues with the AddOrUpdate
method. To keep the post short, let’s focus on only 3 major issues.
As there is only one
_syncRoot
object, that means at a time only thread can update/add new key/value in the concurrent dictionary. Which makes current implementation not performant to many concurrent adds/updates as threads would wait for a thread having current lock to release. There would be huge contention issue. As the size of dictionary grows, the situation would be a huge bottleneck.node.Value = value:
this sets an existingnode
for akey
with the new nodevalue
. This code is non atomic, which means, if another thread is reading or updating this node then it may lead to inconsistent data. For instance let’s consider the following struct being set assigned tonode.Value = new Point(10, 20);
If thread A is setting this value, and if thread B is readingnode
’s value it, it may happen that during read, X would have previous value and Y may have new value. This results in inconsistent data.struct Point { public int X; // First integer field public int Y; // Second integer field public Point(int x, int y) { X = x; Y = y; } }
Stale data: As modern cpus have multiple cores with independent L1,2 caches and registers; common L3 cache, due to this a core may access/write the data from registers/cache instead from memory, this could lead different copies of data available to different threads.
Let’s change the code to tackle the above issues starting with 1st issue.
The above code change tackles the issue 1 mentioned earlier.
_locks
array is initialised with the required concurrency, if the concurrency level passed is lesser than 1 then we use the number of cores as the concurrency level.AddOrUpdate
method: the appropriate_locks object
is obtained based on thebucketIndex
obtained earlier.
This would ensure that concurrently multiple threads can mutate the ConcurrentDictionary. Also, it ensures that only 1 thread can mutate a key at a time since there is 1:1 mapping between akey
andlockNo
so that only one thread can have lock for a given a key.
Let’s fix the 2nd issue.
The changes done are to ensure that the writes are atomic.
if (!typeof(TValue).IsValueType || ConcurrentDictionaryTypeProps<TValue>.IsWriteAtomic)
{
node._value = value;
}
Above condition checks whether value being set is reference type or value type but can be updated atomically. If true, then it directly sets the new value to the existing key Node. This is also performant as we are not creating new node instance which would require heap allocation.
If the value can’t be set atomically then we check whether
prev is null,
if the current node is head node then we create a new node instance and assign it the appropriate_buckets
array index. Else if the node is not head node then we simply update previous Node next field with the new node.If key is not found then we directly update
_buckets
array index with the new node.
Making all of the above changes ensures that AddOrUpdate
does atomic changes. So that other threads reading the changes either see the complete change or old change, nothing in between.
Let’s now tackle the 3rd issue of stale data due to multiple cpu cores, as different threads can run on different cpu cores and can cache data in registers or L1, 2 caches. This maybe a big problem when the dictionary resizes to accommodate additional key/values. To avoid the mentioned issue, we can use volatile reads/writes so that read/write happen directly in memory avoiding registers and cache.
private VolatileNode[] _buckets;
private struct VolatileNode
{
internal volatile Node? _node;
}
Instead of directly accessing Node
, we will access it through VolatileNode
struct
as _node
is now volatile
, the reads/writes will directly go to memory bypassing any caching or register storage and any cpu optimisations due to cpu instruction reordering. And _buckets
is now of type VolatileNode
instead of Node
so that we work with VolatileNode everywhere.

AddOrUpdate
is updated above to work with VolatileNode
instead of Node. Volatile.Write
is used when value type write is non atomic and the existing key represents the head node or when the key is not found. This would ensure that during dictionary resize and during reads of dictionary keys in TryGetValue
defined below, there is no multi threading issue.
public bool TryGetValue(TKey key, out TValue value)
{
int bucketIndex = GetBucketIndex(key);
Node node = _buckets[bucketIndex]._node;
while (node != null)
{
if (EqualityComparer<TKey>.Default.Equals(node.Key, key))
{
value = node.Value;
return true;
}
node = node.Next;
}
return false;
}
Above is the TryGetValue
method where we can see that there is no lock
, which is good as the optimisations done in AddOrUpdate
is enough to take care of all multi threading issues so that TryGetValue
method can be lock
free ensuring faster execution of the method.
Code is inspired from C# ConcurrentDictionary implementation
The code provided is an edited version of initial code obtained from ChatGPT, which is not tested, so there could be inaccuracies.
The topic covered is not definitely the most comprehensive or accurate to handle multi threading, it is just a guide to understand what are potential multi threading issues and how they can be handled. Please use this along with other references to have better understanding on the topic.