Deadlocking in centralized counting barrier implementation

For project 2 of my advanced operating systems course, I quickly typed up what I thought was a working, simple centralized counting barrier. However, after launching the compiled executable several times in a row, I noticed that the program would sometimes hang instead of exiting … damn deadlock. Instead of loading the debugger and inspecting each thread's stack frames, I revisited the code and reasoned about why it would deadlock.

In the code below, the if (count == 0) ... else block that follows the critical section is the culprit for the race condition. Just as one thread (say thread B) is about to enter the while (count > 0) loop, another thread (the last thread to arrive) could reset count = NUM_THREADS. In this situation, count never drops back to 0, so thread B spins forever.
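To make the interleaving concrete, here is a minimal single-threaded sketch that replays the two possible orderings by hand. It is not the barrier itself and uses no real threads: the helper names arrive and waiter_stuck are made up for illustration, and the schedule is hard-coded rather than produced by a scheduler.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_THREADS 3

/* Shared counter, standing in for the barrier's count. */
static int count = NUM_THREADS;

/* One thread arrives: decrement count and report whether it was the last. */
static bool arrive(void)
{
    count = count - 1;
    return count == 0;
}

/*
 * Replays one fixed schedule and returns true if waiting thread B can
 * never get past `while (count > 0)`.
 *
 * reads_before_reset selects when thread B samples count:
 *   true  - between the last thread's decrement and its reset
 *   false - only after the last thread has already reset count
 */
static bool waiter_stuck(bool reads_before_reset)
{
    bool escaped = false;
    bool last;

    count = NUM_THREADS;
    arrive();          /* thread A: 3 -> 2, not last */
    arrive();          /* thread B: 2 -> 1, not last, heads for the spin loop */
    last = arrive();   /* last thread: 1 -> 0 */
    assert(last);

    if (reads_before_reset)
        escaped = !(count > 0);   /* B sees count == 0 and leaves the loop */

    count = NUM_THREADS;          /* last thread resets the barrier */

    /*
     * Every thread has arrived, so nothing changes count after the reset.
     * A single read therefore stands in for every later loop iteration.
     */
    if (!reads_before_reset)
        escaped = !(count > 0);   /* B sees count == 3 and keeps spinning */

    return !escaped;
}
```

Calling waiter_stuck(false) models the losing ordering, and waiter_stuck(true) models the lucky one where thread B happens to read count in the short window while it is 0.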

[Figure: centralized barrier example from the lecture slides]

Code Snippet

#include <stdbool.h>
#include <omp.h>
#include <stdio.h>

#define NUM_THREADS 3


int main(int argc, char **argv)
{
    int count = NUM_THREADS;
    bool globalsense = true;

#pragma omp parallel num_threads(NUM_THREADS) shared(count)
    {
#pragma omp critical
        {
            count = count - 1;
        }

        /*
         * Race condition possible here. Say thread B has decremented count
         * and is about to enter the while (count > 0) loop. Just before it
         * does, the last thread to arrive sees count == 0 and resets
         * count = NUM_THREADS. Thread B then never observes count == 0,
         * so it spins forever and the barrier deadlocks.
         */

        if (count == 0) {
            count = NUM_THREADS;
        } else {
            while (count > 0) {
                printf("Spinning .... count = %d\n", count);
            }
            while (count != NUM_THREADS){
                printf("Spinning on count\n");
            }
        }

    }

    printf("All done\n");
}
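One common way to close this window is sense reversal, which the unused globalsense variable in the snippet seems to hint at. Below is a minimal sketch, assuming OpenMP 4.0 or later for the seq_cst clause on atomic. The type sense_barrier and the functions sense_barrier_init and sense_barrier_wait are hypothetical names, not taken from the lecture slides or the project. Waiters spin on a sense flag instead of on count, so resetting count can no longer trap anyone.

```c
#include <assert.h>

/* Hypothetical barrier type; the names are illustrative only. */
typedef struct {
    int count;        /* threads still to arrive in the current episode */
    int num_threads;  /* number of participants per episode */
    int sense;        /* global sense, flipped once per episode */
} sense_barrier;

static void sense_barrier_init(sense_barrier *b, int num_threads)
{
    assert(num_threads > 0);
    b->count = num_threads;
    b->num_threads = num_threads;
    b->sense = 1;
}

/*
 * Each thread owns a local_sense variable, initialised to 1 so that it
 * matches the initial global sense.
 */
static void sense_barrier_wait(sense_barrier *b, int *local_sense)
{
    int remaining;
    int seen;

    /* The value the global sense will take once this episode completes. */
    *local_sense = !*local_sense;

    /* Atomically decrement count and capture the new value. */
    #pragma omp atomic capture seq_cst
    remaining = --b->count;

    if (remaining == 0) {
        /* Last arrival: reset count first, then release the waiters. */
        #pragma omp atomic write seq_cst
        b->count = b->num_threads;

        #pragma omp atomic write seq_cst
        b->sense = *local_sense;
    } else {
        /* Waiters never look at count, so the reset cannot strand them. */
        do {
            #pragma omp atomic read seq_cst
            seen = b->sense;
        } while (seen != *local_sense);
    }
}
```

In a program like the one above, each thread would declare int local_sense = 1; inside the parallel region and call sense_barrier_wait(&barrier, &local_sense), with the file compiled using an OpenMP flag such as -fopenmp. Because the sense alternates between episodes, the same barrier can be reused back to back without a second spin loop.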