Parallelism and concurrency
In September 2022, Nvidia CEO Jensen Huang declared Moore’s Law dead, while Intel CEO Pat Gelsinger said it was still alive. This highlights the disagreement even among industry leaders.
Preemptive and cooperative multitasking
Multitasking enables concurrency between processes by time-sharing, e.g. of a single CPU core. There are two main strategies to achieve this:
Preemptive multitasking is used by most modern operating systems. A scheduler determines which process should execute next and uses an interrupt mechanism to suspend the currently running process. In this way, no single task can monopolize the CPU.
In cooperative multitasking, the running task has to yield control back to the scheduler. By returning control to the scheduler the task behaves cooperatively, so that other tasks can run as well. Here the programmer needs to ensure that tasks yield frequently enough.
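A minimal sketch of this idea in Python, using generators as cooperative tasks and a hand-rolled round-robin scheduler (the `task`/`run` names are made up for illustration, not a real API):

```python
from collections import deque

def task(name, steps):
    """A cooperative task: does a bit of work, then yields control."""
    for i in range(steps):
        print(f"{name}: step {i}")
        yield  # hand control back to the scheduler

def run(tasks):
    """Round-robin scheduler: resumes each task until all are finished."""
    queue = deque(tasks)
    while queue:
        current = queue.popleft()
        try:
            next(current)          # resume the task until its next yield
            queue.append(current)  # still running: schedule it again
        except StopIteration:
            pass                   # task finished, drop it

run([task("A", 3), task("B", 2)])
```

If a task never yields (e.g. a tight loop without a `yield`), the other tasks starve, which is exactly the risk described above.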
Co-routines
Coroutines are “functions whose execution you can pause”. The values of data local to a coroutine persist between successive calls; the execution of a coroutine is suspended as control leaves it, only to carry on where it left off when control re-enters the coroutine at some later stage.
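A Python generator illustrates this nicely: its local variables persist across suspensions, and execution resumes right after the `yield` it stopped at (a sketch, not tied to any particular coroutine library):

```python
def running_average():
    """Coroutine whose local state (total, count) persists between resumptions."""
    total, count = 0.0, 0
    average = None
    while True:
        value = yield average   # suspend here; resume when a value is sent in
        total += value
        count += 1
        average = total / count

avg = running_average()
next(avg)            # prime the coroutine: run up to the first yield
print(avg.send(10))  # 10.0
print(avg.send(20))  # 15.0
```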
Multithreading
Threads share the same resources, i.e. the memory of their process, which makes them more lightweight than using multiple processes. Multithreading specifically refers to the concurrent execution of more than one sequential set (thread) of instructions. However, the hardware has to provide multiple cores (or hardware threads) to truly run threads in parallel. Otherwise threads run “only” concurrently, usually following the preemptive multitasking paradigm (of the OS, or some scheduler).
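A small Python sketch of threads sharing their process’s memory (the counter and thread count are arbitrary); because the memory is shared, mutating state needs synchronization:

```python
import threading

counter = 0
lock = threading.Lock()

def work(n):
    global counter
    for _ in range(n):
        with lock:          # threads share process memory, so guard the update
            counter += 1

threads = [threading.Thread(target=work, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 400000: correct because the increments are serialized by the lock
```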
Hardware multithreading
https://de.wikipedia.org/wiki/Simultaneous_Multithreading
Async and event loops
While the cooperative multitasking paradigm is quite old, event loops and asynchronous programming gained popularity (again) with Node.js. Events are handled concurrently, which allows many concurrent operations like network requests and disk I/O without blocking execution.
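The core idea can be sketched in a few lines of Python (a toy illustration, not how Node.js or asyncio actually work internally): callbacks are queued and run one after another on a single thread, so no callback may block for long.

```python
import heapq
import itertools
import time

class ToyEventLoop:
    """Single-threaded loop that runs callbacks when their timers expire."""
    def __init__(self):
        self._timers = []              # heap of (due_time, seq, callback)
        self._seq = itertools.count()  # tie-breaker so callbacks never get compared

    def call_later(self, delay, callback):
        heapq.heappush(self._timers, (time.monotonic() + delay, next(self._seq), callback))

    def run(self):
        while self._timers:
            due, _, callback = heapq.heappop(self._timers)
            time.sleep(max(0.0, due - time.monotonic()))  # idle until the next event is due
            callback()                                    # callbacks must not block for long

loop = ToyEventLoop()
loop.call_later(0.2, lambda: print("second"))
loop.call_later(0.1, lambda: print("first"))
loop.run()
```

Running it prints “first” before “second”: the callback with the earlier deadline runs first, regardless of registration order.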
Node.js
Node.js is single-threaded and follows the cooperative multitasking strategy: each callback runs to completion, so a long-running callback blocks the event loop.
worker_threads module (not truly threads in the shared-memory sense):
- Each worker thread runs in its own instance of the V8 JS engine and doesn’t share memory
- Worker threads communicate with the main thread via message passing
Multiprocessing:
- child_process module
- Communicates with the parent process via IPC
Python
asyncio: Single-threaded event loop
- Performs I/O operations without blocking the main thread (see the sketch below)
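A minimal asyncio sketch (names and delays are placeholders; `asyncio.sleep` stands in for real network or disk I/O):

```python
import asyncio

async def fetch(name, delay):
    # asyncio.sleep stands in for a non-blocking I/O operation (network, disk, ...)
    await asyncio.sleep(delay)
    return f"{name} done after {delay}s"

async def main():
    # The three "requests" are awaited concurrently on one thread,
    # so the total runtime is ~1s instead of ~1.8s.
    results = await asyncio.gather(
        fetch("a", 1.0), fetch("b", 0.5), fetch("c", 0.3)
    )
    print(results)

asyncio.run(main())
```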
Threading: threading.Thread
- Cannot run in parallel (GIL)
- Can be used to wait for I/O, not for CPU-intensive workloads (see the sketch below)
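For example (with `time.sleep` standing in for any blocking I/O call; the GIL is released while a thread waits):

```python
import threading
import time

def blocking_io(task_id):
    # time.sleep stands in for blocking I/O; the GIL is released while waiting
    time.sleep(1.0)
    print(f"task {task_id} finished")

start = time.perf_counter()
threads = [threading.Thread(target=blocking_io, args=(i,)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# All five waits overlap, so this takes ~1s rather than ~5s.
print(f"elapsed: {time.perf_counter() - start:.1f}s")
```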
Multiprocessing: multiprocessing
- Each process has its own Python interpreter and memory space
- IPC via queues and pipes (see the sketch below)
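A minimal sketch of the queue-based IPC (function and variable names are made up for the example):

```python
import multiprocessing as mp

def worker(numbers, out_queue):
    # Runs in a separate process with its own interpreter and memory space;
    # results go back to the parent via IPC (a queue built on a pipe).
    for n in numbers:
        out_queue.put(n * n)

if __name__ == "__main__":
    out_queue = mp.Queue()
    p = mp.Process(target=worker, args=([1, 2, 3, 4], out_queue))
    p.start()
    results = [out_queue.get() for _ in range(4)]  # drain the queue before joining
    p.join()
    print(results)  # [1, 4, 9, 16]
```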
IronPython and Jython (Python implementations without a GIL)
If external code is thread-safe and releases the GIL (like numpy and pandas do), true parallelism can be achieved. When the external code needs to interact with Python objects or call back into Python code, it must reacquire the GIL. During these interactions, the GIL ensures thread safety but again limits parallelism.
A thread must hold the GIL to call CPython C APIs. Python code running in the interpreter, like x = f(1, 2), uses those APIs. Every == comparison, every integer addition, every list.append: it’s all calls into Python C APIs. Thus threads running Python code must hold on to the lock when running. Other threads can’t acquire the GIL, and therefore can’t run, until the currently running thread releases it, which happens every 5ms automatically. Long-running (“blocking”) extension code prevents the automatic switching. Python extensions written in C (or other low-level languages) can however explicitly release the GIL, allowing one or more threads to run in parallel to the GIL-holding thread.
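A rough way to observe this (a hedged sketch: exact numbers depend on the machine, core count, and BLAS build, and a multi-threaded BLAS can blur the comparison; numpy matrix multiplication is used here only as one example of extension code that releases the GIL):

```python
import threading
import time

import numpy as np

def pure_python(n):
    # Plain Python bytecode: the running thread holds the GIL almost the whole time,
    # so multiple threads doing this are effectively serialized.
    total = 0
    for i in range(n):
        total += i
    return total

def numpy_matmul(a, b):
    # numpy hands the work to compiled code and releases the GIL while it runs,
    # so several threads can crunch numbers in parallel.
    return a @ b

def timed(target, args, n_threads):
    threads = [threading.Thread(target=target, args=args) for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    a = np.random.rand(1200, 1200)
    b = np.random.rand(1200, 1200)
    for label, target, args in [("pure Python loop", pure_python, (3_000_000,)),
                                ("numpy matmul", numpy_matmul, (a, b))]:
        t1 = timed(target, args, n_threads=1)
        t4 = timed(target, args, n_threads=4)
        print(f"{label}: 1 thread {t1:.2f}s, 4 threads {t4:.2f}s")
```

On a multi-core machine the expectation is that four pure-Python threads take roughly four times as long as one (they serialize on the GIL), while the four numpy threads finish in roughly the single-thread time because the GIL is released during the multiplication.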
Golang
Goroutines
Lightweight threads:
- Managed by the Go runtime, not by the OS
- Start with a small stack that grows as needed (so more goroutines can exist at a time)
- Multiplexed onto OS threads, which makes context switching cheaper
- The Go runtime handles goroutine scheduling, load balancing across cores, and memory management
Tight loops or CPU-bound goroutines can prevent the scheduler from running other goroutines, limiting concurrency and parallelism. Goroutines yield control voluntarily for others to run, e.g. by blocking on I/O, channels, or sleeping; since Go 1.14, the runtime can also preempt long-running goroutines asynchronously. runtime.GOMAXPROCS(n) sets the number of OS threads that can execute Go code simultaneously.
Java
- Multithreading:
- Thread, Runnable
- Fork/join
- Parallel streams: Automatically split onto multiple threads
- ThreadPoolExecutor
- Concurrent data structures: ConcurrentHashMap, CopyOnWriteArrayList
- Synchronization mechanisms, e.g. synchronized
- Async: CompletableFuture (Java 8), callbacks and Futures
- Multiprocessing:
- Spawn multiple JVM processes
- IPC via sockets/shared memory
- Virtual threads (preview in Java 19, final in Java 21): similar idea to goroutines/coroutines
C++
- C++11: std::async function template, std::future and std::promise for asynchronous communication between threads
Threading:
- std::thread, std::mutex, std::lock_guard, and std::unique_lock for thread synchronization and mutual exclusion.
Multiprocessing:
- Spawn child processes, message-passing libraries like MPI
- OpenMP: shared-memory parallelism
- TBB: task-based parallelism
- C++ parallel STL: parallel implementations of Standard Template Library algorithms
Rust
Fearless concurrency: Rust’s ownership system, borrow checker, and trait bounds like Send and Sync enable “fearless concurrency”. The compiler catches concurrency bugs like data races and invalid sharing at compile-time, preventing many runtime issues. Rust’s concurrency primitives and patterns are designed to be safe and prevent common pitfalls.