Comments by "LoneTech" (@0LoneTech) on "Developer Voices"
channel.
I totally agree on preferring Rust over C++, but it isn't a magical silver bullet. What you want isn't just CUDA in Rust, but the best pieces of Halide, Chapel and Futhark. Chapel has a strong concept of domain subdivisions and distributed computing, Halide has algorithm rearrangements, Futhark has a less noisy language with some strong library concepts like commutative reductions and tooling that can autotune for your data. You'd also want a reasonably integrated proof system, as in Idris 2.
The core thing that Chapel and Halide bring is the ability to separate your operational algorithm from your machine optimizations. E.g. if you chunk something for optimization, the overall operation is still the same. Futhark does some of that too, but only profile-guided. Some fields approach this by separately writing formal proofs that two implementations are equivalent instead, but it's a much smoother process if you can maintain the proof as you write, as Idris attempts.
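A minimal sketch of that separation, in plain Rust rather than any of the languages named above: the "operational algorithm" is a sum over a slice, and the "machine optimization" is chunking it, as a Halide schedule or a Chapel domain map would. The function names are illustrative, not a real API. Because integer addition is associative and commutative, the chunked schedule computes exactly the same result.

```rust
// Operational algorithm: a plain reduction over the data.
fn sum_direct(data: &[i64]) -> i64 {
    data.iter().sum()
}

// Machine-optimized variant: each chunk could run on its own core or
// work-group, and the partial sums are then reduced. The observable
// operation is unchanged; only the execution shape differs.
fn sum_chunked(data: &[i64], chunk: usize) -> i64 {
    data.chunks(chunk)
        .map(|c| c.iter().sum::<i64>())
        .sum()
}

fn main() {
    let data: Vec<i64> = (1..=1000).collect();
    // The equivalence a proof system like Idris 2 would let you
    // maintain statically; here we just check it dynamically.
    assert_eq!(sum_direct(&data), sum_chunked(&data, 64));
    println!("both schedules agree: {}", sum_direct(&data));
}
```

With floating-point data the two schedules can differ in rounding, which is exactly why Futhark singles out commutative (and associative) reductions as a library concept: the compiler may only reorder what you have promised is reorderable.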
You're right, pipelining is time-division multiplexing at the function-block level. If you have some fairly complex function, it takes time to complete, as data propagates through multiple layers of logic. If we spread registers through that deep logic, each stage is shallower, so we can raise the clock frequency, but the new registers must then be fed with more data. The stages of the pipeline are like work stations along a conveyor belt. It's the same in CPUs: a pipelined CPU has multiple instructions at varying stages of completion. A barrel ("revolver") CPU, such as the XMOS XS1, interleaves instructions from multiple hardware threads so the instructions in flight are guaranteed independent (the generic term is hardware multithreading; Intel's Hyper-Threading is a simultaneous variant, SMT). MIPS instead restricts the effects, such that the instruction after a branch (in the delay slot) doesn't need to be cancelled. Throughput-oriented processors like GPUs specialize in this sort of thing, and might e.g. issue one decoded instruction to 16 ALU lanes for 4 cycles (the resulting 64-wide group is described as a wavefront or warp).