Comments by "clray123" (@clray123) on the "Computerphile" channel.
I'm not sure why you need complicated papers or long videos on this topic. A language model approximates the conditional probability distribution of the dataset it is trained on. Even if it (ignoring practical memory limits) mirrored the perfect probability distribution of all the words ever uttered by the human race, it would be just that: a perfect tool for reproducing words that have already been uttered, while matching the frequencies contained in the training set. As for generating anything that has not been uttered, or so-called "generalizing", it's up to the researchers to spell out what they mean by that word. As far as I'm concerned, I see no "generalizing" or "reasoning" going on whatsoever, just copying (with errors) what has already been put in during training. That out-of-distribution generations stay close to the in-distribution ones creates an illusion of being "creative" or "able to generalize", while in reality it is more like "idiotically picking the semantically closest words and hoping we get back on track [i.e. in-distribution again]". P.S. Adding more data just means that your model is less likely to "derail" into completely out-of-distribution territory and keep generating garbage. Kind of like those "roly poly" dolls that stand up again after you push them over.
13
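A minimal sketch of the "reproducing training-set frequencies" point, using a made-up toy corpus and a bigram model (the corpus, names, and output here are purely illustrative, not from the original comment):

```python
# Toy illustration: a model that estimates P(next word | previous word) from
# a corpus can only replay frequencies it has already seen. The corpus below
# is arbitrary.
from collections import Counter, defaultdict
import random

corpus = "the cat sat on the mat the cat ate the fish".split()

# Count bigram frequencies: counts[w][v] = how often v followed w.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def sample_next(word):
    """Sample the next word in proportion to its training-set frequency."""
    options = counts[word]
    if not options:          # word never seen in a "previous" position
        return None
    words, freqs = zip(*options.items())
    return random.choices(words, weights=freqs)[0]

# Generation just replays training-set statistics; nothing outside the
# observed conditional distribution can ever be produced.
word, output = "the", ["the"]
for _ in range(8):
    word = sample_next(word)
    if word is None:
        break
    output.append(word)
print(" ".join(output))
```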
They are not going for ideal, just for average. But an averagely fit person is not an ideal athlete.
3
@odomobo It's not necessary. As a human, you only have one channel of communication - your senses - and still you can (mostly) distinguish your own thoughts from foreign, injected ones and get skeptical if someone is trying to fool you. The same mechanism, whatever it is in our brains, will be sufficient for LLMs. To an extent it's already there in the "reasoning" LLMs - the "but wait," interjections and "a-ha" moments that make them backtrack and follow the right train of thought instead of the many possible invalid ones.
3
@ Correct, but as you notice this is not a problem with AI, it is a problem with critical thinking overall. In fact there is no reason to believe future AI will be any worse at critical thinking than the average person, given the training an AI receives and the average person on planet Earth doesn't.
3
Yet another word is "planning". We already know that transformers have trouble reproducing traditional planning algorithms (what they can do is copy a limited number of fixed-length traces of such algorithms), because transformers are not even Turing-complete. They have to generate their output in a single pass, whereas planning (or optimization in general) requires iteration or backtracking. So the "AI" that is perhaps capable of doing what humans are doing is currently the slow and resource-intensive optimization algorithms used to TRAIN those models, not the fast algorithms used to EXECUTE the models (aka do "inference").
3
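A rough sketch of the contrast the comment draws: the depth-first planner below abandons and revisits partial plans an input-dependent number of times, which is the kind of iteration a single fixed forward pass cannot perform in general. The graph and names are arbitrary illustrations:

```python
# Backtracking planner: the number of steps depends on the problem instance,
# not on a fixed computation budget. The toy graph below is made up.
def plan(graph, start, goal, path=None):
    """Return a path from start to goal via backtracking search, or None."""
    path = (path or []) + [start]
    if start == goal:
        return path
    for nxt in graph.get(start, []):
        if nxt not in path:              # avoid cycles
            result = plan(graph, nxt, goal, path)
            if result is not None:       # success somewhere down this branch
                return result
    return None                          # dead end -> backtrack

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": [], "E": ["G"]}
print(plan(graph, "A", "G"))   # ['A', 'C', 'E', 'G'] after backtracking out of B and D
```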
@fakecubed Not really. The basic transformer (attention) algorithm has not changed much since its introduction in 2014 (Bahdanau et al.). Even the famous "breakthrough" paper by Vaswani et al. in 2017 was already just a refinement, nothing radically new. What has been changing is the scale of training - more data, more GPUs - plus some performance optimization in the algorithm (e.g. flash attention) and in training (e.g. DPO instead of PPO-based RLHF), but no groundbreaking new ideas.
2
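For reference, a minimal NumPy version of the scaled dot-product attention from Vaswani et al. (2017) that the comment refers to; the shapes and inputs are arbitrary, and optimizations like flash attention change how this is computed rather than what is computed:

```python
# Scaled dot-product attention for a single head, no masking or batching.
import numpy as np

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V"""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (n_q, n_k) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # weighted sum of values

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
print(attention(Q, K, V).shape)   # (4, 8)
```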
Actually, I had a situation where the tax office mistakenly transferred $50K to me, without writing anything in any form. They are able to confuse themselves with their own actions. The actual problem here is how you would keep that money without becoming an inmate.
2
Yes, the poor academic guy is raving about a 2D sandbox while Nvidia already has the full 3D one.
2
@Pixelarter CoT is not "new". And multi-shot generation is not going to save you if every shot has the potential to introduce new errors. In fact I vaguely recall a more recent paper criticizing CoT as a waste of effort in terms of improving generation quality! (Sorry, no reference, but you can search for it.)
2
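A back-of-the-envelope illustration of the error-compounding worry, under the simplifying (and debatable) assumption that each generation step is independently correct with the same probability p; the numbers below are arbitrary:

```python
# If each step is independently correct with probability p, a chain of n
# steps is entirely correct with probability p**n, which decays quickly.
for p in (0.99, 0.95, 0.90):
    for n in (5, 20, 50):
        print(f"p={p:.2f}, n={n:>2}: P(all steps correct) = {p**n:.3f}")
```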
Any sort of "feedback loop" really.
2
Well, too bad he actually has this thing called a "phone book" (or Contacts, as the case may be) in his shitty little smartphone.
2
One-trick ponies are running out of their one trick (well, actually two tricks: we have transformers and we have diffusion).
1
And now imagine what would have happened if it had not been just a shitty social network website, but the world's digital money system or something of actual importance.
1
We'll know it has peaked when Nvidia stock starts crashing.
1
The generative algorithms cannot and do not do loops.
1
@joshmogil8562 Yes, pretty much so. It may be difficult to come up with, given how greatly the algorithms and the hardware supporting them depend on each other. What we already know (and have known for a long time) is that you cannot effectively parallelize algorithms where strong data dependencies exist. But the LLMs (and GPUs) are all about exploiting massively parallel computation. I believe the current bet of the industry players is that the non-parallel aspects will be "somehow less important" and will be sorted out outside of the models, but so far I've seen very little confirming this.
1
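A toy illustration of the data-dependency point: an elementwise map can be split across processors because no iteration depends on another, while a recurrence must run step by step no matter how many processors are available; the function and inputs are made up:

```python
# Elementwise map vs. sequential recurrence.
def f(v):
    return 3 * v + 1

xs = list(range(8))

# Embarrassingly parallel: each element is independent of the others
# (the kind of computation GPUs and transformers exploit).
mapped = [f(x) for x in xs]

# Strong data dependency: each value needs the previous one first,
# so the steps cannot be spread across processors.
chain = [xs[0]]
for _ in range(7):
    chain.append(f(chain[-1]))

print(mapped)
print(chain)
```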
It SEEMS to correct itself. Just like the original LLMs SEEM to understand language.
1
DeepSeek, like o1 and o3, is full of deep wishful thinking...
1