Youtube hearted comments of clray123 (@clray123).
What they are claiming is that the model is not memorizing particular sequences, but instead reconstructing a general generative algorithm which can be used to construct these (and other) sequences. Why they call the data-generating algorithm a "world model" is anyone's guess; probably because it sounds much more pretentious and cooler in an AI paper (those people are very keen to dress their wishful thinking up in fancy words).
5
I think in such evident kidnapping cases (e.g. camera footage plus eyewitnesses of a struggling child being snatched off the street and the kidnapper caught red-handed) capital punishment would be quite adequate.
4
lol at the intro ... he's been making too many "positive" videos.. alllll right.
3
Trust me bro, limited context length is not the main issue with the LLMs. Them outputting hopeless garbage at random times is. Finally, I've tried the exact approach described in the paper before (letting the LLM autonomously decide when to "store" or "retrieve" information from external storage) and it totally craps out on the LLM's inability to make these decisions at appropriate times (speaking of open source LLMs). It would either store/retrieve everything after every prompt (if trained to do so) or not at all. No intelligence was noticeable whatsoever. So I'm very surprised at the purported good results they rave about in the paper. Probably a lot of "man behind the curtain" prodding of the model to do the right memory action at the right time was involved.
3
The best thing about Qxir drawings is how characters discard and regrow their arms when required.
2
1. Use an underpowered Linux laptop. 2. Connect to a powerful cloud desktop to do all that number crunching. Unless of course by "developers" you mean "gamers", then you are shit out of luck.
2
If I train a LLM on sequences of digits of pi, will it be able to accurately generate further digits it has not ever seen?
2
Ok, I now know who the host of this channel reminds me of - Block McCloud (in his Crazy Man video).
2
Training requires craploads of memory and about 3x the computation power compared to inference. Also, making the weights changeable eliminates the possibility of reusing (read-only) KV caches across multiple user sessions. So the most obvious answer is: it does not scale at all. And training on crap input tends to destroy the oh-so-well finetuned model (e.g. you could dump in garbage and make it "forget" all the precious information; in fact, we are not sure how much of the input is forgotten/overwritten during the "regular" non-adversarial training process either).
2
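The "craploads of memory" claim above can be made concrete with a back-of-envelope sketch. The byte counts follow the common accounting for mixed-precision Adam training (fp16 weights, fp32 master copy, fp32 gradients, two fp32 optimizer moments; activations excluded); the 7B parameter count is a hypothetical example, not from the comment.

```python
# Rough memory estimate: training vs. inference for a transformer.
# Assumes mixed-precision Adam; activation memory is deliberately ignored,
# so the real training footprint is even larger.

def inference_bytes(n_params: int) -> int:
    return 2 * n_params  # fp16 weights only

def training_bytes(n_params: int) -> int:
    fp16_weights = 2 * n_params
    fp32_master  = 4 * n_params  # fp32 master copy of the weights
    fp32_grads   = 4 * n_params
    adam_moments = 2 * 4 * n_params  # first and second fp32 moments
    return fp16_weights + fp32_master + fp32_grads + adam_moments

n = 7_000_000_000  # hypothetical 7B-parameter model
print(f"inference: {inference_bytes(n) / 2**30:.0f} GiB")
print(f"training:  {training_bytes(n) / 2**30:.0f} GiB")
print(f"ratio:     {training_bytes(n) / inference_bytes(n):.0f}x")
```

Under these assumptions the parameter-related memory alone is 9x the inference footprint, before counting activations, which is why per-user weight updates don't share hardware the way read-only inference does.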
As always, "it seems a little inconsistent". IOW, it seems a little crap and will lie to you a little, which makes it a little useless. Maybe even entirely useless.
1
@barakeel If it was able to notice the digits belong to pi, find a pi-generating program, and use it to generate more digits, I might call it a sign of intelligence. Otherwise it's just a crappy recorder.
1
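The "pi-generating program" the comment alludes to does exist: pi's digits are computable without memorization. One classic example (chosen here for illustration; the comment does not name any particular algorithm) is Gibbons' unbounded streaming spigot, which emits decimal digits of pi one at a time using only integer arithmetic.

```python
from itertools import islice

def pi_digits():
    """Yield decimal digits of pi indefinitely (Gibbons' streaming spigot).

    The state (q, r, t) encodes a linear fractional transformation; a digit
    n is emitted once it is guaranteed correct, i.e. the transformation
    maps the whole remaining input interval into [n, n+1).
    """
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            # Digit is safe to emit; shift it out and rescale by 10.
            yield n
            q, r, n = 10 * q, 10 * (r - n * t), 10 * (3 * q + r) // t - 10 * n
        else:
            # Consume one more term of the series to narrow the interval.
            q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                (q * (7 * k + 2) + r * l) // (t * l), l + 2)

print("".join(str(d) for d in islice(pi_digits(), 10)))  # → 3141592653
```

An LLM that located and ran a program like this, instead of regurgitating memorized digits, is exactly the distinction the comment draws between intelligence and a "crappy recorder".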
The argument is that your brain is just the same kind of "machine" which is "simulating" its consciousness, whatever it is you designate by that word.
1
No, it just matches your bs question to the other D&D bs found among the billions of tokens it was trained on.
1
@AlexanderWeixelbaumer You tell me. What my brain produces does not look like the crap GPT produces.
1