Comments by "Lawrence D’Oliveiro" (@lawrencedoliveiro9104) on "Multiple Processor Systems - Computerphile" video.
6:57 There is another technique, called “pipelining”. Or, in industrial terms, you set up an “assembly line”, where one person just butters the bread, and passes the slices on to the next person to put in the cheese and slap them together. Modern CPUs use this technique as well as multiple processors. And then there’s also caching.
20
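A rough sketch of the assembly-line idea from the comment above, in Python (the stage and sandwich names are invented purely for illustration). Each stage is a generator that hands its output to the next; in real hardware the stages overlap in time, so a finished result comes off the end at the rate of the slowest stage rather than the sum of all the stages.

```python
# Toy software pipeline: each stage passes its output on to the next,
# like workers on an assembly line. Names are purely illustrative.

def butter(slices):
    for s in slices:
        yield s + " (buttered)"

def add_cheese(slices):
    for s in slices:
        yield s + " + cheese"

def assemble(slices):
    for s in slices:
        yield "sandwich[" + s + "]"

# Chain the stages together; this single-threaded sketch only shows the
# hand-off structure, not the overlapping-in-time part real pipelines add.
line = assemble(add_cheese(butter(f"slice {i}" for i in range(4))))
for sandwich in line:
    print(sandwich)
```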
“Multi computer” would imply entirely separate physical boxes connected by cables. These might be regular LAN interconnects, but custom high-speed ones are also used (e.g. Fibre Channel, I think). One also talks about “tightly coupled” versus “loosely coupled” multiprocessor machines. Typically “loosely coupled” ones don’t share RAM, so they communicate by sending messages over communication channels of some sort. “Tightly coupled” ones share (almost) their entire RAM space, so they can communicate just by reading and writing shared memory. A small step back from full tight coupling would be NUMA (“Non-Uniform Memory Access”), where each processor has some local memory, and can also access memory belonging to other processors, but the latter accesses are slower. So this logically appears just like tight coupling, and is almost as easy to program.
4
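A rough illustration of the loosely-coupled versus tightly-coupled distinction described above, using Python’s multiprocessing module (the function and variable names are mine): the loosely-coupled worker can only communicate by sending a message over a channel, while the tightly-coupled worker simply writes to memory that both sides can see.

```python
import multiprocessing as mp

def loosely_coupled_worker(channel):
    # No shared memory: the only way to communicate is to send a message.
    channel.put("result computed by worker")

def tightly_coupled_worker(shared):
    # Shared memory: just write the result where the other side can read it.
    shared.value = 42

if __name__ == "__main__":
    # Loosely coupled: message passing over a queue (standing in for a
    # network or other communication channel between separate machines).
    q = mp.Queue()
    p = mp.Process(target=loosely_coupled_worker, args=(q,))
    p.start()
    print(q.get())
    p.join()

    # Tightly coupled: both processes see the same memory word.
    v = mp.Value("i", 0)
    p = mp.Process(target=tightly_coupled_worker, args=(v,))
    p.start()
    p.join()
    print(v.value)
```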
10:35 Another kind of SIMD was introduced back in the 1970s with the Cray machines. These had instructions which could act, element-by-element, on arrays of up to 64 elements with a single invocation. However, the operations were not performed simultaneously on every element, but were pipelined so that, after the initial instruction setup, the result elements would be churned out in succession very quickly. Contrast this with PowerPC AltiVec (a more capable design than Intel’s initially rubbish MMX, which I think came out slightly earlier), where operations are performed on all elements of the operand vector simultaneously. Note also that these vectors tend to be shorter than the Cray ones: a maximum of 4 elements (maybe 8 for small integers) is typical.
3
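To show the “one operation, whole vector” idea in spirit (not the actual Cray or AltiVec instruction sets), here is a sketch using NumPy: the scalar version does one add per loop iteration, while the vector version expresses the whole element-by-element operation as a single expression. Whether that maps onto real SIMD instructions underneath is up to the library and the hardware.

```python
import numpy as np

# Scalar style: one add per loop iteration.
a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
c = [a[i] + b[i] for i in range(len(a))]

# Vector style: a single expression covering all four "lanes", in the
# spirit of a 4-element AltiVec/SSE register (or a 64-element Cray vector).
va = np.array(a, dtype=np.float32)
vb = np.array(b, dtype=np.float32)
vc = va + vb

print(c)
print(vc)
```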
@Michael Hansen I think the earlier ILLIAC IV research machine was also SIMD, with an array of processing elements executing the same instruction stream on multiple data streams. Seymour Cray looked at the difficulty of programming that before creating his architecture for the Cray-1.
2
8:16 Case in point: the Atari Falcon had a Motorola 56001 DSP, because in those days general-purpose CPUs were not fast enough to process CD-quality sound data in real time. Nowadays we don’t see DSPs much in regular PCs (except maybe embedded inside a controller for some peripheral). But new, more compute-intensive tasks have become popular, which is why we now have programmable GPUs, for example. Will the wheel turn again? Yes, I think it is possible.
2
@petitio_principii Most 1980s-vintage graphics chips were fairly simple-minded: their job was to convert the state of bits in video RAM to an analog video signal that could be fed to a CRT monitor. All the real graphics rendering was done by the CPU. The one notable exception was of course the Amiga.
2
Both! It’s a tag team of high-powered multiprocessors and low-powered ones.
2
And also more expensive. Proprietary OSes, and the proprietary server-side apps that run on them, tend to be licensed based on the number of CPU sockets in your machine. A special exemption was made for multiple CPU cores in the same socket, just for the sake of keeping the customers happy, since such a thing was unknown when those licensing terms were drawn up in the 1990s.
1
Grosch’s Law probably comes into it at some point as well.
1
It’s called “homogeneous coordinates”. Sometimes the extra component isn’t a 1; a value of 0 can be used to represent a “point at infinity” (having a direction but no actual position).
1
@aarondavis5386 It means that all affine transformations (including translation) in n dimensions can be expressed uniformly as multiplications between vectors of (n + 1) components and matrices of (n + 1) × (n + 1) components (e.g. 4×4 matrices and 4-vectors for 3D graphics). If you try to use n-vectors and n×n matrices, then rotation, scaling etc. are represented by multiplications, but translation (change of location) has to be done by addition.
1
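A small sketch of the point made in the two comments above, using NumPy (the helper name translation is mine): with 4×4 matrices and 4-component vectors, translation becomes just another multiplication, and a vector with w = 0 behaves as a point at infinity that translation cannot move.

```python
import numpy as np

def translation(tx, ty, tz):
    # 4x4 homogeneous matrix that translates by (tx, ty, tz).
    m = np.identity(4)
    m[:3, 3] = [tx, ty, tz]
    return m

T = translation(5.0, 0.0, 0.0)

point = np.array([1.0, 2.0, 3.0, 1.0])      # w = 1: an ordinary point
direction = np.array([1.0, 2.0, 3.0, 0.0])  # w = 0: a "point at infinity"

print(T @ point)      # [6. 2. 3. 1.] -- the point moves
print(T @ direction)  # [1. 2. 3. 0.] -- the direction is unaffected
```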
14:28 Threads are separate execution entities that share a common process context. It’s not the only way to parallelize a program: there are also programs that run multiple separate processes in parallel. In fact, one might argue that threads are about the last form of parallelism you should resort to: look at other solutions first if you can, then fall back to threads if nothing else will do. This is because of their propensity for the most difficult sort of bugs: the ones that are timing-dependent, so attempts to track them down can often cause them to (temporarily) disappear.
1
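A minimal sketch of the timing-dependent bug the comment above warns about, using Python threads (the names are mine): counter += 1 is a read-modify-write that the language does not guarantee to be atomic, so concurrent updates can be lost, and whether they actually are depends on interpreter version and timing, which is exactly what makes such bugs hard to reproduce.

```python
import threading

counter = 0

def bump(n):
    global counter
    for _ in range(n):
        counter += 1   # read-modify-write, not guaranteed atomic

threads = [threading.Thread(target=bump, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The "correct" answer is 400000. Without a lock the total may come out
# short, and slowing things down (e.g. with prints, or a debugger) can
# make the problem seem to vanish -- the classic timing-dependent bug.
print(counter)
```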
@kc9scott That is correct. The genius of Steve Wozniak was in coming up with such a hardware-minimal design. And not just that, but one where the software work was done on such a low-powered processor. Engineers would look at the Apple II motherboard and ask “where’s the floppy controller?”. There wasn’t one. Or rather, it was the “RWTS” routine in the ROM.
1