Comments by "LoneTech" (@0LoneTech) on "Developer Voices"
channel.
I totally agree on preferring Rust over C++, but it isn't a magical silver bullet. What you want isn't just CUDA in Rust, but the best pieces of Halide, Chapel and Futhark. Chapel has a strong concept of domain subdivisions and distributed computing, Halide has algorithm rearrangements, Futhark has a less noisy language with some strong library concepts like commutative reductions and tooling that can autotune for your data. You'd also want a reasonably integrated proof system, as in Idris 2.
The core thing that Chapel and Halide bring is the ability to separate your operational algorithm from your machine optimizations. E.g. if you chunk something for optimization, the overall operation is still the same. Futhark does some of that too, but only profile-guided. Some fields approach this by separately writing formal proofs that two implementations are equivalent instead, but it's a much smoother process if you can maintain the proof as you write, as Idris attempts.
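A minimal sketch of that separation, in plain Rust rather than any of the languages named above: the "operational algorithm" is a sum over a slice, and the "machine optimization" is chunking it, as a Halide schedule or a Chapel domain map would. The function names are illustrative, not a real API. Because integer addition is associative and commutative, the chunked schedule computes exactly the same result.

```rust
// Operational algorithm: a plain reduction over the data.
fn sum_direct(data: &[i64]) -> i64 {
    data.iter().sum()
}

// Machine-optimized variant: each chunk could run on its own core or
// work-group, and the partial sums are then reduced. The observable
// operation is unchanged; only the execution shape differs.
fn sum_chunked(data: &[i64], chunk: usize) -> i64 {
    data.chunks(chunk)
        .map(|c| c.iter().sum::<i64>())
        .sum()
}

fn main() {
    let data: Vec<i64> = (1..=1000).collect();
    // The equivalence a proof system like Idris 2 would let you
    // maintain statically; here we just check it dynamically.
    assert_eq!(sum_direct(&data), sum_chunked(&data, 64));
    println!("both schedules agree: {}", sum_direct(&data));
}
```

With floating-point data the two schedules can differ in rounding, which is exactly why Futhark singles out commutative (and associative) reductions as a library concept: the compiler may only reorder what you have promised is reorderable.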
You're right, pipelining is time-division multiplexing at the function-block level. If you have some fairly complex function, it takes time to complete, as data propagates through multiple layers of logic. If we spread registers through that deep logic, each stage is shallower, so we can raise the clock frequency, but the new registers must then be fed with more data. The stages of the pipeline are like work stations along a conveyor belt. It's the same in CPUs: a pipelined CPU has multiple instructions at varying stages of completion. A barrel ("revolver") CPU, such as the XMOS XS1, interleaves instructions from multiple hardware threads so the instructions in flight are guaranteed independent (the generic term is hardware multithreading; Intel's Hyper-Threading is a simultaneous variant, SMT). MIPS instead restricts the effects, such that the instruction after a branch (in the delay slot) doesn't need to be cancelled. Throughput-oriented processors like GPUs specialize in this sort of thing, and might e.g. issue one decoded instruction to 16 ALU lanes for 4 cycles (the resulting 64-wide group is described as a wavefront or warp).