In a nutshell, if two threads can access the same object concurrently (without synchronization), and at least one of them is a writer (performing a non-const operation), you have a data race. For more information on how to use synchronization well to eliminate data races, consult a good book about concurrency.
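The sketch below shows synchronization with `std::mutex`. It is for illustration only: the `Counter` class and the `run_counter` helper are hypothetical names and do not come from any library. Every access to the shared value, including reads, takes the same lock, so concurrent writers never race.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical shared counter: every access goes through the same mutex,
// so concurrent increments from several threads are synchronized.
class Counter {
public:
    void increment()
    {
        std::lock_guard<std::mutex> lock(m_);  // held until end of scope
        ++value_;
    }
    long value() const
    {
        std::lock_guard<std::mutex> lock(m_);  // readers take the lock too
        return value_;
    }
private:
    mutable std::mutex m_;  // mutable so the const reader can lock it
    long value_ = 0;
};

// Hypothetical helper: spawns `threads` writers that each call
// increment() `per_thread` times, then returns the final count.
long run_counter(int threads, int per_thread)
{
    Counter c;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&c, per_thread] {
            for (int i = 0; i < per_thread; ++i) c.increment();
        });
    for (auto& th : pool) th.join();  // all writers done before reading
    return c.value();
}
```

Without the lock, the final count could come out lower than expected, because two threads can read the same old value and both write back that value plus one.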


Example, bad

There are many examples of data races, some of which are running in production software at this very moment. One very simple example:


int get_id()
{
    static int id = 1;
    return id++;
}

The increment here is an example of a data race. This can go wrong in many ways, including:


  • Thread A loads the value of id, and the OS context switches A out for some period, during which other threads create hundreds of IDs. Thread A is then allowed to run again and writes back the value it originally read plus one, so id moves backwards and those IDs will be handed out again.

  • Threads A and B load id and increment it simultaneously. They both get the same ID.

Local static variables are a common source of data races.
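One way to remove this particular race is sketched below. It assumes a plain integer counter is all that is needed. The counter becomes a `std::atomic<int>` and is advanced with `fetch_add`, which reads and increments in one indivisible step. Initializing a function-local static is itself thread-safe since C++11; the race is in the later `id++`. The `count_distinct_ids` helper is hypothetical and exists only to check that concurrent callers never receive duplicate IDs.

```cpp
#include <atomic>
#include <cstddef>
#include <mutex>
#include <set>
#include <thread>
#include <vector>

// Race-free variant of get_id(): fetch_add performs the read and the
// increment as one atomic operation, so no two callers get the same ID.
int get_id()
{
    static std::atomic<int> id{1};
    return id.fetch_add(1);  // returns the value before the increment
}

// Hypothetical check: several threads draw IDs, and all of them are
// collected into one set to count how many distinct IDs were issued.
std::size_t count_distinct_ids(int threads, int per_thread)
{
    std::set<int> seen;
    std::mutex seen_mutex;  // protects `seen`, which every thread inserts into
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            std::vector<int> local;  // thread-private, needs no lock
            for (int i = 0; i < per_thread; ++i) local.push_back(get_id());
            std::lock_guard<std::mutex> lock(seen_mutex);
            seen.insert(local.begin(), local.end());
        });
    for (auto& th : pool) th.join();
    return seen.size();
}
```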


Example, bad

void f(fstream& fs, regex pattern)
{
    array<double, max> buf;
    int sz = read_vec(fs, buf, max);            // read from fs into buf
    gsl::span<double> s {buf};
    // ...
    auto h2 = async([&] { sort(std::execution::par, s.begin(), s.end()); });   // spawn a task to sort
    // ...
    auto h3 = async([&] { return find_all(buf, sz, pattern); });   // spawn a task to find matches
    // ...
}

Here, we have a (nasty) data race on the elements of buf: sort both reads and writes them while find_all reads them concurrently. All data races are nasty. Note that here we managed to get a data race on data on the stack. Not all data races are as easy to spot as this one.
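Below is a minimal race-free reordering. The names are hypothetical: `count_matches` stands in for `find_all`, and a `std::vector` replaces the array and span. It uses plain `std::sort` instead of the parallel overload, so it does not depend on parallel-algorithm support. The single writer finishes before any reader starts. After that point, concurrent reads are safe.

```cpp
#include <algorithm>
#include <future>
#include <vector>

// Hypothetical stand-in for find_all: counts elements equal to `value`.
// It only reads `v`, so any number of such tasks may run at the same time.
long count_matches(const std::vector<double>& v, double value)
{
    return static_cast<long>(std::count(v.begin(), v.end(), value));
}

// Returns the number of matches, or -1 if buf is somehow not sorted.
long sort_then_search(std::vector<double> buf, double value)
{
    auto sorter = std::async(std::launch::async,
                             [&] { std::sort(buf.begin(), buf.end()); });
    sorter.get();  // the only writer has finished; buf is read-only from here on

    // Two concurrent readers of buf: safe, because neither modifies it.
    auto matches = std::async(std::launch::async,
                              [&] { return count_matches(buf, value); });
    auto sorted = std::async(std::launch::async,
                             [&] { return std::is_sorted(buf.begin(), buf.end()); });
    return sorted.get() ? matches.get() : -1;
}
```

Running the sort asynchronously only to wait for it right away gains nothing here. The point is the ordering: no task may read the buffer while a writer might still be modifying it.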


Example, bad

// code not controlled by a lock

unsigned val;

if (val < 5) {
    // ... other thread can change val here ...
    switch (val) {
    case 0: // ...
    case 1: // ...
    case 2: // ...
    case 3: // ...
    case 4: // ...
    }
}

Now, a compiler that does not know that val can change will most likely implement that switch using a jump table with five entries. Then, a val outside the [0..4] range will cause a jump to an address that could be anywhere in the program, and execution would proceed there. Really, "all bets are off" if you get a data race. Actually, it can be worse still: by looking at the generated code you may be able to determine where the stray jump will go for a given value; this can be a security risk.
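Here is a sketch of one fix. It assumes the shared value can be made a `std::atomic<unsigned>` (`shared_val` and `classify` are hypothetical names). Making the variable atomic is not enough on its own. If the code still read it twice, once for the range check and once for the switch, the two reads could see different values. Taking a single snapshot into a local makes the check and the switch agree.

```cpp
#include <atomic>
#include <string>

// Hypothetical shared value that another thread may update at any time.
// Declaring it std::atomic makes concurrent loads and stores well-defined.
std::atomic<unsigned> shared_val{0};

// Read the shared value exactly once into a local copy, then make every
// decision on that copy, so the switch never sees a value outside [0..4].
std::string classify()
{
    const unsigned v = shared_val.load();  // single atomic snapshot
    if (v < 5) {
        switch (v) {
        case 0: return "zero";
        case 1: return "one";
        case 2: return "two";
        case 3: return "three";
        case 4: return "four";
        }
    }
    return "out of range";
}
```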



Enforcement

Some enforcement is possible, so do at least something. There are commercial and open-source tools that try to address this problem, but be aware that solutions have costs and blind spots. Static tools often have many false positives, and run-time tools often have a significant cost. We hope for better tools. Using multiple tools can catch more problems than a single one.


There are other ways you can mitigate the chance of data races:


  • Avoid global data

  • Avoid static variables

  • More use of value types on the stack (and don't pass pointers around too much)

  • More use of immutable data (literals, constexpr, and const)
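The short sketch below illustrates the last two points. The names `kSquares`, `sum_of_squares`, and `parallel_sum` are hypothetical. A constexpr table is shared read-only, and each task receives its input by value. As a result, the tasks share no mutable data to race on.

```cpp
#include <array>
#include <future>
#include <vector>

// Immutable lookup table: constexpr data is never written at run time,
// so any number of threads may read it concurrently without synchronization.
constexpr std::array<int, 5> kSquares{0, 1, 4, 9, 16};

// Takes its input by value: the task owns a private copy, so nothing it
// touches is shared with the caller or with other tasks.
int sum_of_squares(std::vector<int> indices)
{
    int total = 0;
    for (int i : indices) total += kSquares[i];  // read-only access to shared data
    return total;
}

int parallel_sum()
{
    std::vector<int> a{1, 2};
    std::vector<int> b{3, 4};
    // std::async copies its arguments, so each task works on its own vector.
    auto fa = std::async(std::launch::async, sum_of_squares, a);
    auto fb = std::async(std::launch::async, sum_of_squares, b);
    return fa.get() + fb.get();
}
```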

