Comments by "mathew tedder" (@solifugus) on "Analyzing Deepseek's "undefined" NVIDIA PTX optimizations (with benchmarks!)" video.

I just have an RTX 3070 connected to my System76 Meerkat through an external enclosure over Thunderbolt (USB-C). Much more fun, though, has been my Milk-V RISC-V computer... I have my own pattern learning/prediction approach that isn't nearly as brute-force as ML. It is a much faster serial process that finds correlations without Cartesian comparisons. However, parallel compute is still a big performance boost because my approach (MindSplicer) requires a lot of ratio calculations. I really love RISC-V.
Also, MindSplicer develops pattern templates for instantiating recognitions. These templates include spatial/temporal observations (of both inputs and outputs), time delays, and variable segments. For a variable segment in a pattern, it stores the most common value and whatever is common across the values observed there before. This enables novel substitution. In other words: substitution problem solving, analogy, and conceptualization. Each hurdle (a term I use) in a pattern has an influence, and the confidence of the observation(s) in it adds to the total likelihood of the pattern being recognized, in proportion to that influence. When a pattern is only partially recognized, we need to weigh those confidences and influences over the stretch of the pattern that has been more or less matched so far, to judge how likely it is to continue.
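To make the influence/confidence idea concrete, here is a minimal sketch in Python. It is not MindSplicer's actual code: the `Hurdle` class, the function names, and the choice of an influence-weighted average are all assumptions made for illustration. It only shows one plausible way that per-hurdle confidences could add up to a score for a partially matched pattern.

```python
from dataclasses import dataclass


@dataclass
class Hurdle:
    # Hypothetical: how much this hurdle counts toward recognizing the pattern.
    influence: float


def match_score(hurdles, confidences):
    """Influence-weighted mean confidence over the hurdles matched so far.

    `confidences` holds one value in [0, 1] per hurdle already observed,
    in pattern order; later hurdles have not been seen yet.
    """
    matched = hurdles[:len(confidences)]
    total_influence = sum(h.influence for h in matched)
    if total_influence == 0:
        return 0.0
    weighted = sum(h.influence * c for h, c in zip(matched, confidences))
    return weighted / total_influence


def recognized_fraction(hurdles, confidences):
    """Share of the whole pattern's influence accounted for so far."""
    total_influence = sum(h.influence for h in hurdles)
    if total_influence == 0:
        return 0.0
    weighted = sum(h.influence * c for h, c in zip(hurdles, confidences))
    return weighted / total_influence


# Assumed example: three hurdles, the first two observed so far.
pattern = [Hurdle(2.0), Hurdle(1.0), Hurdle(1.0)]
seen = [1.0, 0.5]

print(match_score(pattern, seen))          # quality of the matched stretch
print(recognized_fraction(pattern, seen))  # progress through the whole pattern
```

The two numbers answer different questions: the first says how well the matched stretch fits, the second how much of the pattern's total influence has been covered. A real system would presumably combine them (and the time delays) in its own way.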
The "attention is all you need" thing works differently here. Instead of relying on a massive context window, patterns are recognized and those recognitions are themselves observed. Over time and multiple experiences, the specifics are weeded out by finding the commonalities between these larger abstractions. Each approach has advantages and disadvantages over the other, I think. A massive context window means the model doesn't need to learn much; it just has a spectacular memory. MindSplicer, on the other hand, actually learns, and once it has, what it can do with what it learned is much more powerful.