Understanding Rust Thread Safety
Programming interviews can be a great learning experience for interviewers as well as interviewees. In fact, I recently had the opportunity to learn something new about Rust while interviewing an engineering candidate.
Background About our Interview Process
At OneSignal, we don't believe in handing someone a problem and watching them flounder with zero autocomplete and zero docs. Engineers who apply to join our team are allowed to use their own environments, access documentation, and treat the interviewer as a resource. If a candidate can solve a problem with some gentle assistance from the interviewer or by asking a few questions, then they're much closer to the real development experience. Senior engineers on our teams frequently act as resources for other developers, and if a candidate can get to a solution by treating the interviewer like a colleague then they're halfway home. Candidates often learn something new from the interviewer, and we've had great feedback on our process.
The Learning: Arc and rwlock
The candidate was working on a problem involving concurrency and needed to share the same std::sync::mpsc::Sender
with multiple threads concurrently. They wrote code analogous to the following (extremely simplified) example:
Now you may initially be thinking that it's just wrong to put a channel sender inside of an RwLock
— and you'd be right. Both of these types are useful for cross-thread synchronization, and it's odd to use both of them together. Each Sender
is intended to be owned by a single thread, and if you need to coordinate between multiple threads you should be calling Sender::clone()
. However, I urge you to recall that this was taking place during an interview, and concurrency was not the only thing going on in this program. The candidate had made other architectural decisions that led them here, and this small example has removed all of that context. Rust is a complex language with many types of synchronization primitives available, and during the time constraints of an interview, it's not unreasonable for someone to combine them in unusual ways.
The Rust compiler had a much stricter and less egalitarian view on this combination of types, however. When we tried to compile it, we were greeted with this error:
This confused both of us a lot. I know that Sender<T>
can be sent across threads safely — that's the whole point of it! We tried a few things (using .write()
on the RwLock
instead of .read()
) but we couldn't get it to compile. Eventually, I wondered if the code would work if we used the simpler Mutex
instead of an RwLock
. I had the candidate try this instead, and the program works. Here's what that program looks like:
Running this program produces the output that we expected from the first one:
[src/main.rs:12] rx.recv().unwrap() = 4
It's worth pointing out that if you find yourself putting a Sender
inside some type of Mutex
or locking data structure, then you may want to reconsider your program's structure. But why does the RwLock
version fail, and the Mutex
version work?
To answer this question, we need to consider the meaning of the Send
and Sync
traits, the Sender
, RwLock
and Mutex
types, and look closely at the implementations of Send
and Sync
for all of the types in question. Let's start with RwLock
and Mutex
.
RwLock vs. Mutex
Let's look at the differences between the two synchronization types that were used in this example, RwLock
and Mutex
. They are conceptually very similar — both allow you to guard a resource to ensure that it is accessed from multiple threads in a safe way. They have one important difference, however. Mutex
holds a lock for both reads and writes, whereas RwLock
treats reads and writes differently, allowing for multiple read locks to be taken in parallel but requiring exclusive access for write locks. The way that RwLock
enforces single writer, multiple reader rules is the same as the single &mut
, multiple &
reference rules for single-threaded code. Here's a visual of how the same sequence of reads & writes occurs with a Mutex
and an RwLock
.
As the diagram shows, when you have a read-heavy workload it is preferable to use an RwLock
to avoid blocking reader threads when it's not required. The same sequence of events takes much longer with a Mutex
than it does with an RwLock
.
We thought that we could get away with using an RwLock
around our Sender
because the method that we needed to call (Sender::send(&self, value: T)
) requires an immutable (read) reference to the underlying Sender
rather than a mutable &mut self
reference. This implies that we could use RwLock
and RwLock::read()
to guard calls to Sender::send()
. This would allow us to spend much less time waiting on synchronization than if we used a Mutex
around the Sender
.
Send
The Send
trait is the simpler of the two traits to explain conceptually. A type is Send
if it is safe to transfer it across thread boundaries. This means that you can construct an instance of the type on one thread, transfer ownership to another thread, and Drop
it on that other thread. Most types in Rust are Send
, but there are a few exceptions. The most common non-Send
types are Rc
and all raw pointer types. The raw pointer types are not Send
because Rust cannot make any guarantees about the synchronization mechanisms that are in the underlying types, or when the pointers will be accessed. Rc
is not Send
because it uses a non-atomic reference counter which is not safe to send between threads.
Sync
Sync
is a bit trickier to explain. A type T
is Sync
if and only if &T
is Send
. This means that it must be safe to share immutable references of T
between threads. A simpler way to think about this is to imagine if it would be safe to have multiple immutable references of type T
being read from in parallel from multiple threads. The most common non-Sync
types are UnsafeCell
, Cell
, and RefCell
. These types all rely on unsynchronized mutation, so they are inherently not thread-safe.
Sender
The std::sync::mpsc::Sender
type is the producer-half of a multi-producer, single-consumer channel. It's intended to be used by calling Sender::clone
and using a single owned Sender
per-thread. This pattern permits Sender
to be non-Sync
because each individual instance of Sender
does not need to synchronize with itself across multiple threads. It only needs to be Send
so that instances can be moved from thread to thread with ownership staying consistent.
Send + Sync in our programs
Let's look at the Send
& Sync
implementations for all of the types that are in our example programs. Our programs used Arc<Mutex<Sender<u8>>>
(works) and Arc<RwLock<Sender<u8>>>
(does not work).
impl<T: ?Sized + Send> Send for RwLock<T>
impl<T: ?Sized + Send + Sync> Sync for RwLock<T>
impl<T: ?Sized + Send> Send for Mutex<T>
impl<T: ?Sized + Send> Sync for Mutex<T>
impl<T: Send> Send for Sender<T>
impl<T> !Sync for Sender<T>
impl<T: ?Sized + Sync + Send> Send for Arc<T>
impl<T: ?Sized + Sync + Send> Sync for Arc<T>
The !Sync
syntax indicates that Sender<T>
is explicitly not Sync
. The ?Sized
syntax is not relevant to this discussion.
There are a few important things to note here:
Sender<T>
is not Sync
. It's not safe to have multiple immutable references to a Sender
live across multiple threads at the same time. Considering that Sender::send(&self)
requires only an immutable reference, there must be some kind of interior mutability going on.
Mutex<T>
is both Send
and Sync
if T
inside of it is Send
. If something is inside of a Mutex
, it will never have multiple immutable references live at the same time since Mutex
does not allow multiple locks (read or write) to be taken at the same time. Because of this, Mutex<T>
effectively bypasses the restrictions of Sync
.
RwLock<T>
is Send
where T
is Send
, but it is only Sync
if T
is both Send + Sync
. Since most types are Send + Sync
, this is a non-issue. It only causes problems because the Sender<T>
within it is not Sync
. Recall from our explanation that RwLock
allows multiple read locks to be open in parallel.
Arc<T>
is only Send
and Sync
if the underlying T
is both Send + Sync
, meaning that you cannot send an Arc<T>
across thread boundaries where T: !Sync
.
These things all cascade to cause the compiler error that we saw in the first program. Sender<T>
is not Sync
meaning that RwLock<Sender<T>>
is not Sync
, meaning that Arc<RwLock<Sender<T>>>
is neither Send
nor Sync
. That means that this type, despite containing three different supposedly thread safe synchronization types is not thread safe at all.
So what about thread safety?
I think that this compiler error and the resulting exploration of the lock types and Send
& Sync
was a fantastic demonstration of the power of Rust's type system. Coming from other systems programming languages, these ideas are often much more nebulous. A type is often labeled (only via documentation) as "thread-safe" or "not thread-safe." In Rust, the Send
and Sync
traits allow us to express much more granular ideas about thread safety, and communicate them much more clearly. A Sender<T>
is "thread safe" in that it can be sent between threads, but it cannot be shared between threads. That's what Send
and !Sync
tells us. There are so many layers of safety to multithreaded programming, and those layers exist in other systems languages, but in Rust we have the compiler to help us through them instead of relying on our own memory of how they work.
In an interview setting, would the un-safety of Arc<RwLock<Sender<u8>>>
have caused program crashes? Maybe, maybe not. It would have caused issues under a production workload for sure, and I'm glad that we have Rust to point these things out to us for those times.
I really enjoyed going on this dive into thread safety in Rust, and it was all due to collaborating with an interview candidate on solving an interesting problem. If you like solving interesting problems, check out our jobs board and join our team!
Join our Team!