July 21, 2018
How does your OS see goroutines?
If you’re beginning your adventure with Go, you’re probably attracted by Go’s support for concurrency. That’s the main selling point of this language. If you know some Go, you probably tell your friends that Go’s model of concurrency is based on CSP (Communicating Sequential Processes). But how does it look under the hood? Does it use threads, or green threads? If someone asked you to explain how it looks from an OS perspective, what would you say? Let’s take a closer look.
In most cases, understanding a concept comes down to the ability to explain and understand the basics.
Multiple cores in a single CPU
Keeping each CPU on a separate chip isn’t really good for performance - the latency of communication between chips is too high. Plus it can produce a lot of heat in the long run. Placing multiple CPUs (called cores) into a single processor lowers the latency and enhances speed.
As Go developers we care about concurrency. One CPU can process a single task at a time, so why does the OS sometimes report twice as many cores as our processor actually has? That is most likely due to Intel® Hyper-Threading Technology (Intel® HT Technology), which delivers two processing threads per physical core. If you have a Hyper-Threading-enabled dual-core CPU, the OS will see four logical CPUs. Hardware execution resources are split and shared between the two threads - this can provide somewhat better performance. If you own an AMD processor, AMD reportedly has its own technology for virtual cores.
Like I said earlier, one CPU can process a single task at a time - this task is a process. The CPU switches quickly among processes, giving the illusion of parallelism. If we have a multicore CPU (let’s say 4 cores), we can have four processes executing at once, each one with its own control flow and each one running independently of the others.
A process is basically a program in execution with its own memory space (stack, heap, text, data) and is in one of five states at any given time: start, ready, running, waiting, or terminated (exit). Rapid switching between processes (programs), back and forth, is called multitasking.
A process must have at least one thread (called the main thread) but it usually contains multiple threads. The primary difference is that threads within the same process run in a shared memory space, while processes run in separate memory spaces. The idea is to achieve parallelism by dividing a process into multiple threads running in a quasi-parallel context, as if they were separate processes. Processes are used to group resources together; threads are the entities scheduled for execution on the CPU. Having multiple threads running in parallel in a single process context is analogous to having multiple processes running in parallel on one computer.
The term multithreading describes the situation where multiple threads run in the context of a single process. When a multithreaded process runs on a single-CPU system, the threads take turns running. By switching between multiple processes, the system gives the illusion of parallelism, and multithreading works the same way. With three threads in a process, the threads appear to run in parallel, each on its own CPU, each getting roughly 1⁄3 (it depends on the OS, the scheduling algorithm and more) of the time the CPU has scheduled for the process. On a multicore system the situation is similar, only each CPU core executes threads in the same manner and more threads can run in parallel, giving us the power of native hardware parallelism combined with multithreaded execution.
Goroutines vs threads
Ok, having grasped the basics, we can finally start talking about goroutines. Go provides goroutines for concurrency: a goroutine is a function that runs concurrently alongside other parts of the program. Program initialization runs in a single goroutine (every Go program has at least one, the main goroutine), but that goroutine may create other goroutines, which run concurrently.
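A minimal sketch of what this looks like in code (the message strings are just for illustration):

```go
package main

import "fmt"

func main() {
	done := make(chan string)
	// The go keyword starts a new goroutine; main continues
	// running without waiting for it.
	go func() {
		done <- "hello from a goroutine"
	}()
	fmt.Println("main keeps running...")
	// Receiving blocks until the goroutine sends, so main
	// doesn't exit before the goroutine has finished.
	fmt.Println(<-done)
}
```

Note the channel at the end: without it, main could exit before the goroutine ever runs.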
The Go runtime multiplexes goroutines onto a small pool of OS threads (pthreads on Linux); GOMAXPROCS limits how many of those threads can execute Go code simultaneously, and goroutines are scheduled onto these limited OS threads by the Go runtime. It’s important to understand that goroutines exist only in the virtual space of the Go runtime and not in the OS.
Threads consume a lot of memory due to their large stack size (≥ 1 MB). Creating a thousand threads means you already need about 1 GB of memory for stacks alone. A goroutine is created with an initial stack of only 2 KB. Each Go function already contains a check for whether more stack is needed, and the stack can be copied to another region of memory with twice the original size (in Go, the stack grows and shrinks as needed, much like resizing a hash table: a new, larger stack is allocated and, through some very tricky pointer manipulation, all the contents are carefully copied into the new, larger stack). This makes goroutines very light on resources.
The Go mantra
Don’t communicate by sharing memory; share memory by communicating. Go handles all of the synchronization for you: if two goroutines need to share data, they can do so safely over a channel, which exists only in the runtime’s virtual space, so the OS doesn’t have to block a thread. Channels are essentially synchronized message queues. When a goroutine blocks, such as by calling a blocking system call, the Go runtime automatically moves the other goroutines on the same operating system thread to a different, runnable thread so they won’t be blocked. The programmer sees none of this, which is the point.
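The mantra in code: instead of guarding a shared variable with a lock, each goroutine sends its result over a channel and the receiver combines them (a minimal sketch; the sum function is just an example workload):

```go
package main

import "fmt"

// sum sends the total of nums over out instead of writing
// to a shared variable guarded by a mutex.
func sum(nums []int, out chan<- int) {
	total := 0
	for _, n := range nums {
		total += n
	}
	out <- total // the send synchronizes with the receive in main
}

func main() {
	nums := []int{1, 2, 3, 4, 5}
	out := make(chan int)
	// Split the work between two goroutines.
	go sum(nums[:2], out)
	go sum(nums[2:], out)
	// Receiving blocks until each goroutine has sent a value,
	// so no further synchronization is needed.
	a, b := <-out, <-out
	fmt.Println(a + b) // prints 15
}
```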
The Go runtime
The Go runtime manages scheduling, garbage collection and the runtime environment for goroutines among other things.
The runtime keeps track of each goroutine and will schedule them to run in turn on a pool of threads belonging to the process. Goroutines are separate from threads but rely upon them to run, and scheduling goroutines onto threads effectively is crucial for the efficient performance of Go programs. So, while there might be multiple threads created for a process running a Go program, the ratio of goroutines to threads should be much higher than 1 to 1. Multiple threads are often necessary to ensure that goroutines are not blocked. It is important to know that all the OS sees is a single user-level process requesting and running multiple threads.
I hope this article helped you to understand how Go provides concurrency under the hood.