Comments by "clray123" (@clray123) on "Tom Nicholas"
channel.
@mousethehuman7179 I doubt the applicants would be happy to wait for a decision inside their home country. That would suggest they are not really persecuted (since they can wait where they live for months), and there is already a process for that case: it's called applying for a visa.
The main reason for deterring new arrivals is not that "they're illegal, dangerous people" (although some are).
There are two reasons. The first is leeching off social systems in case they can't find suitable work, which is actually quite likely, given that many of them are unskilled, uneducated laborers, of whom there are already enough in the target country. So they do not bring much with them, but they create an immediate drain on local resources which their previous generations have not earned. Unless we agree to dispense with social support systems entirely (which I would favor), it is difficult to just open the borders to everyone.
The second reason, and this is much more critical, is how things develop once you allow such mass immigration to take place. Given the demographics and the cultural incompatibility, these populations will eventually displace the native population of whichever Western country they immigrate to, which will give them political majorities and political power. They will then look after their own people's interests (why wouldn't they?) and in turn create worse living conditions for the native populations of the target countries (as has already happened in some African countries such as South Africa and Zimbabwe).
So this is simply a pragmatic concern about increased competition for limited resources between populations which neither understand nor particularly like each other, if only for historical reasons; and those reasons, being historical, unfortunately cannot be made to disappear.
@ccaagg You imply that the regurgitation is intentional and happens only for certain texts. This is generally not true. The models can also be tricked into reproducing random training data; see e.g. "Scalable Extraction of Training Data from (Production) Language Models" by Nasr et al., 2023. In general, some (small) measured percentage of the raw input can be reproduced verbatim.
Probability distribution: the job of a language model, based on the loss function used in the training algorithm, is to faithfully reproduce, for any prompt in the training data set, the completions found there. For example, if the training data contained the word "meows" after the word "cat" in 50% of the training samples and the word "barks" in the other 50%, the model should assign 50% probability to "meows" and 50% probability to "barks" when prompted with "cat". That it works this way can be trivially demonstrated by training a small language model on such a specially prepared training set.
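To make that concrete, here is a minimal sketch in Python with PyTorch (my own toy example, not anything from a real model): the "vocabulary" is just "meows" and "barks", the only parameters are two logits for the completion of "cat", and minimizing cross-entropy on a 50/50 data set drives the predicted distribution toward 50/50.

import torch

# Toy "model": two logits, one for "meows" (index 0), one for "barks" (index 1),
# conditioned on the fixed prompt "cat". Start from a deliberately skewed guess.
logits = torch.tensor([2.0, -1.0], requires_grad=True)
opt = torch.optim.SGD([logits], lr=0.5)

# Training data: "cat meows" in 50% of samples, "cat barks" in the other 50%.
targets = torch.tensor([0, 1] * 500)

for _ in range(300):
    opt.zero_grad()
    # The prompt is always "cat", so every sample sees the same logits.
    batch = logits.unsqueeze(0).expand(len(targets), -1)
    loss = torch.nn.functional.cross_entropy(batch, targets)
    loss.backward()
    opt.step()

# The learned distribution matches the empirical one: roughly [0.5, 0.5].
print(torch.softmax(logits, dim=0))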
"Generalization" or "interpolation" to other sequences not found in the training data set only happens as a side effect (and when it yields undesirable outcome, we call it "hallucination"). For example, if the word "cat" had a similar vector representation to the word "tiger" (learned based on all the other training examples) the model might output "tiger meows" or "tiger barks" despite not having had these sequences provided in the training data directly. To what extent such "generalization" happens and in what way exactly is subject to much debate.
It can be demonstrated with constructed examples that it does work for certain sequence prediction problems that exhibit inherent symmetry. In particular, there is a paper on "grokking" which shows that a neural network may first overfit to the training data only, but after continued training find an improved "recipe" that correctly reproduces the input-output pairs that were intentionally left out of the training data set (the example they use is learning modular division; interestingly, the same approach does not work for e.g. learning the multiplication table). Colloquially, you could call it "filling in the gaps".
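For reference, a minimal sketch of that experimental setup (my own reconstruction in PyTorch; the model size, split fraction and hyperparameters here are illustrative assumptions, and actually seeing the delayed generalization typically requires long training with weight decay, as in the grokking paper by Power et al., 2022):

import torch
import torch.nn as nn

p = 97  # prime modulus
# All pairs (a, b) with b != 0, labeled with a / b mod p, i.e. a * b^(p-2) mod p.
pairs = [(a, b) for a in range(p) for b in range(1, p)]
labels = [(a * pow(b, p - 2, p)) % p for a, b in pairs]
X = torch.tensor(pairs)
y = torch.tensor(labels)

# Hold out a random fraction of pairs; the point of the experiment is whether
# the network eventually predicts these *unseen* pairs correctly.
perm = torch.randperm(len(X))
split = int(0.5 * len(X))
train_idx, test_idx = perm[:split], perm[split:]

emb = nn.Embedding(p, 64)
mlp = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, p))
opt = torch.optim.AdamW(list(emb.parameters()) + list(mlp.parameters()),
                        lr=1e-3, weight_decay=1.0)  # weight decay matters here

def forward(idx):
    e = emb(X[idx])            # (n, 2, 64): embeddings of the two operands
    return mlp(e.flatten(1))   # (n, p): logits over the possible answers

for step in range(10_000):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(forward(train_idx), y[train_idx])
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        with torch.no_grad():
            acc = (forward(test_idx).argmax(-1) == y[test_idx]).float().mean()
        print(f"step {step}: train loss {loss.item():.3f}, held-out acc {acc:.3f}")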
It has also been demonstrated that during training, language models develop so-called "induction heads", which are basically the ability to copy words from the previous context into appropriate positions in the generated output. For example, even a simple LM might respond with "Hello, John!" if you prompt it with "My name is John, hello!", although the training data set only ever contained "Hello, Kate!". You could still call it a form of regurgitation... because if the model had seen no greetings at all in the training set, it would respond with unrelated gibberish.
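The copying pattern itself is simple enough to write down by hand. The sketch below is not a trained model and says nothing about attention internals; it just spells out the rule an induction head approximates: look for an earlier occurrence of the current token and copy whatever followed it. The example tokens are made up.

# Hand-coded version of the rule an induction head approximates:
# "the current token appeared before; copy the token that came after it then".
def induction_prediction(tokens):
    current = tokens[-1]
    # Scan earlier positions from right to left for the same token.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]   # copy its old continuation
    return None                    # no earlier occurrence: nothing to copy

# "the" occurred before and was followed by "cat", so predict "cat".
print(induction_prediction("the cat sat on the".split()))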
By "producing" in context of language models I just mean the generation of a completion given a prompt (aka inference).
By "knowing" I mean the representation of training data as encoded in the model's weights - i.e. the data which allows the inference algorithm to produce output when prompted.