Joy of Parallelism: Am I a Work Pirate (Stealer)?

Rise of Parallelism:

We are already well ahead of the curve that transformed single processor machines into multi core revolution in hardware. This transformation has not yet fully been exploited by, ordinary to critical applications running on current hardware. Thanks to Microsoft for helping even the masses to unleash the power of parallelism. This post after a long hiatus from my side, comes with a bang about the nuances that drive the parallelism into the fore of technology world. This post talks about the coveted and cool algorithm that takes the multicore machines and turns into a massive cruncher.

The Crux of Parallelism:

The CPU intensive iterative/routine/concurrent tasks(fine grained language instruction sets) need to be treated as chunks (partitioned) that run on various multi core processors simultaneously and thereby after completion, consolidate the results of individual iterations(that run on respective processors) and present it to caller. These are not I/O intensive tasks, but these are processor intensive.

Guts of Work Stealing Algorithm:

That said, work stealing algorithm efficiently deals with optimally assigning and exploiting the power of unused /idle processors and, also balances the overall load amongst the processors. How? Each processor has a special abstract data structure, double ended queue called as deque. This is a structure where one can push and pop on both the ends. There is a private deque assigned to each processor and also global deque common to all the processors. As when a task is spawned or forked (remember how a task is spawned in UNIX), that task is assigned/pushed to the local deque of the respective processor and it pops from bottom of deque and the processor monitors it by context-switching if there are more than one tasks in its deque. During execution of a single task in a given processor, it can lead to spawning of multiple tasks which are pushed into the local (private) deque. The interesting part here is if the processor happens to run out of tasks to execute or in other words- idle, then that processor in context can steal tasks from other processor’s deque (private) or global deque, keeping them busy. Now you see why the word stealing is there in its name.

Conclusion:

Existing popular solutions in multi core parallelism advocates a shared memory concept rather than private memory. The latter part comes with a drawback of data synchronization cost. On the other hand work stealing algorithm which when leveraged in solving the parallel conundrum, helps to optimally use the idle processor without the cost of accessing the public memory share but with private memory fenced deque.

Reference:

[1] Scheduling Parallel Programs by Work Stealing with Private Deque. (Arthur Chargueraud, Umut A. Acar and Mike Rainey).