Sunday, February 24, 2013

Introduction

Spawn is the open-source multithreading library, funded by Tigerspike, that was originally developed for Orthohack. Because it is generic, it has been released here as a project in its own right. Here it is on GitHub.

Spawn was designed to shield C code from the vomit of PThreads (POSIX Threads), the POSIXy interface to CPU multithreading. It does so in a 32/64-bit agnostic manner. PThreads suffers from numerous design flaws that prevent C programs from efficiently mapping their threads onto the processor and memory topology of a particular platform. Although it provides some capabilities in this regard, the information required to make efficient choices generally resides in platform-dependent data structures outside the scope of PThreads. So PThreads clients pay the price of the complexity required to support such capabilities, without the data required to exploit them effectively.
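As a small illustration of where that information lives: PThreads has no call that reports how many processors are online, so a program has to ask the OS. The following is a minimal sketch, assuming a platform whose sysconf() supports _SC_NPROCESSORS_ONLN (Linux, the BSDs and macOS do, but it is a widespread extension rather than part of PThreads). choose_thread_count() is a hypothetical helper, not part of Spawn.

```c
#include <unistd.h>

/* Hypothetical helper (not part of Spawn): choose a thread count from the
 * number of online processors. The count comes from sysconf(), i.e. from
 * the OS, because PThreads offers no such query.
 * Assumption: _SC_NPROCESSORS_ONLN is supported on this platform. */
long choose_thread_count(long max_threads)
{
    long ncpu = sysconf(_SC_NPROCESSORS_ONLN);

    if (ncpu < 1)               /* -1 on error or if the name is unsupported */
        ncpu = 1;
    if (max_threads > 0 && ncpu > max_threads)
        ncpu = max_threads;     /* optional caller-imposed cap; 0 means none */
    return ncpu;
}
```

Even this only yields a flat processor count; cache sharing and NUMA node layout have to come from further platform-specific sources (on Linux, for example, under /sys/devices/system), which is the gap described above.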

Also, it appears that PThreads and/or Linux are very inefficient at launching threads, even when the system is idle, with latencies often in the millisecond range. Spawn makes some attempt to hide this latency through pipelining, and to minimize calls to the OS that are likely to get a response of "I'm busy, so please call back later". One of PThreads' worst problems is that the target thread can receive only one pointer as an input parameter. This forces the programmer to build multiple, mostly duplicate data structures (one per thread) whenever a thread needs both a global read-only database and its thread number in order to execute (i.e., the usual case). Spawn hides this mess by providing an interface that passes both items to the target thread.
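To make the single-pointer problem concrete, here is what plain PThreads (not Spawn) requires when every thread needs both a shared read-only database and its own thread number. This is a minimal sketch: shared_db_t, thread_ctx_t, sum_worker(), parallel_sum() and the 64-thread cap are hypothetical illustrations. Each thread gets its own context struct that repeats the same database pointer and thread count, which is exactly the kind of duplication Spawn is meant to hide.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 64   /* hypothetical cap for this sketch */

/* Global read-only data that every thread needs (hypothetical example). */
typedef struct {
    const long *values;
    size_t      count;
} shared_db_t;

/* pthread_create() hands the thread exactly one void *, so each thread gets
 * its own context bundling the shared pointer with its own thread number. */
typedef struct {
    const shared_db_t *db;       /* identical in every context */
    size_t             tid;      /* thread number: 0 .. nthreads-1 */
    size_t             nthreads; /* also duplicated in every context */
    long               sum;      /* per-thread result */
} thread_ctx_t;

static void *sum_worker(void *arg)
{
    thread_ctx_t *ctx = arg;
    long sum = 0;

    /* Strided split: thread tid sums elements tid, tid+n, tid+2n, ... */
    for (size_t i = ctx->tid; i < ctx->db->count; i += ctx->nthreads)
        sum += ctx->db->values[i];
    ctx->sum = sum;
    return NULL;
}

/* Sums db->values using nthreads threads. Returns 0 on success, otherwise
 * EINVAL or the error code returned by pthread_create(). */
int parallel_sum(const shared_db_t *db, size_t nthreads, long *out)
{
    pthread_t    threads[MAX_THREADS];
    thread_ctx_t ctx[MAX_THREADS];
    size_t       started;
    int          err = 0;
    long         total = 0;

    if (nthreads == 0 || nthreads > MAX_THREADS)
        return EINVAL;
    for (started = 0; started < nthreads; started++) {
        ctx[started] = (thread_ctx_t){ db, started, nthreads, 0 };
        err = pthread_create(&threads[started], NULL, sum_worker, &ctx[started]);
        if (err != 0)
            break;               /* started == number of threads now running */
    }
    /* Join every thread that was actually created, even after a failure. */
    for (size_t i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);
        total += ctx[i].sum;
    }
    if (err == 0)
        *out = total;
    return err;
}
```

Note that the launch loop also has to remember how many threads actually started, so that a failed pthread_create() does not leave the earlier threads running unjoined.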

PThreads also opens up many ways in which a program can fail while leaving threads dangerously active. Through documentation and intuitive macros, an attempt has been made to help programmers use Spawn in a threadsafe manner, without sacrificing too much of the available parallelism and without having to worry about sundry POSIX error codes.

There are two modes (we told you it was simple): SPAWN() and SPAWN_ONE(), which are mutually exclusive when using the same handle (a spawn_t data structure, to be precise). The former spawns a group of threads. The latter spawns a single thread (the idea being to call it repeatedly). The caller then synchronizes by calling SPAWN_RETIRE_ALL() to finish. (It's not quite that simple, so please read the header comments of the functions in spawn.c as you follow along with the demo in demo.c.)
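The shape of the two modes and the final synchronization can be modelled in plain PThreads. The sketch below is NOT Spawn's API: group_t, group_spawn_all(), group_spawn_one(), group_retire_all() and square_worker() are hypothetical stand-ins for spawn_t, SPAWN(), SPAWN_ONE() and SPAWN_RETIRE_ALL(), whose real definitions and semantics are in spawn.c. It also assumes a fixed cap of 64 threads per handle.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

#define GROUP_MAX 64   /* hypothetical per-handle cap */

/* Hypothetical handle modelling the two modes; NOT Spawn's spawn_t.
 * The first spawn call binds the handle to one mode, and
 * group_retire_all() releases it, so the modes cannot be mixed. */
typedef enum { MODE_NONE, MODE_GROUP, MODE_ONE } group_mode_t;

typedef struct {
    group_mode_t mode;
    size_t       count;              /* threads launched and not yet joined */
    pthread_t    threads[GROUP_MAX];
} group_t;

void group_init(group_t *g)
{
    g->mode = MODE_NONE;
    g->count = 0;
}

/* Group mode: launch n threads in one call; thread i receives args[i]. */
int group_spawn_all(group_t *g, size_t n, void *(*fn)(void *), void **args)
{
    if (g->mode != MODE_NONE || n == 0 || n > GROUP_MAX)
        return EINVAL;
    g->mode = MODE_GROUP;
    for (size_t i = 0; i < n; i++) {
        int err = pthread_create(&g->threads[g->count], NULL, fn, args[i]);
        if (err != 0)
            return err;              /* threads already running stay in count */
        g->count++;
    }
    return 0;
}

/* Single mode: launch one thread; call repeatedly to build up the set. */
int group_spawn_one(group_t *g, void *(*fn)(void *), void *arg)
{
    int err;

    if (g->mode == MODE_GROUP || g->count >= GROUP_MAX)
        return EINVAL;
    g->mode = MODE_ONE;
    err = pthread_create(&g->threads[g->count], NULL, fn, arg);
    if (err == 0)
        g->count++;
    return err;
}

/* Synchronization point for both modes: join everything, reset the handle. */
void group_retire_all(group_t *g)
{
    for (size_t i = 0; i < g->count; i++)
        pthread_join(g->threads[i], NULL);
    g->count = 0;
    g->mode = MODE_NONE;
}

/* Example worker: squares the int that its argument points to. */
void *square_worker(void *arg)
{
    int *p = arg;
    *p *= *p;
    return NULL;
}
```

Binding the handle to whichever mode is used first, and releasing it only in group_retire_all(), is one simple way to enforce the "mutually exclusive per handle" rule; it also means that threads started before a failed launch are still joined at retirement.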

We may be able to improve Spawn to deal with cache affinity and non-uniform memory access (NUMA) proximity domains in the future. But for now, it's a lot better than dealing with PThreads.

See README.TXT for starters.