Comments by "clray123" (@clray123) on the "Tom Nicholas" channel.

  36.  @mousethehuman7179  I doubt that the applicants would be happy to wait for a decision inside their home country. That would rather suggest they are not really persecuted (since they can apparently wait where they live for months), and there is already a process for that case: it's called applying for a visa.
       The reason for deterring new arrivals is not mainly that "they're illegal, dangerous people" (although some are). There are two reasons. The first is leeching off social systems, in case they cannot find suitable work - which is actually quite likely, given that many of them are unskilled, uneducated laborers, of whom the target country already has enough. So they do not bring much with them, but they create an immediate drain on local resources which their previous generations have not earned. Unless we agree to dispense with social support systems entirely (which I would be in favor of), it is difficult to just open the borders to everyone.
       The second reason, and this is much more critical, is the future development once such mass immigration is allowed to take place. Given the demographics and the cultural incompatibility, these populations will eventually displace the native population of whichever Western country they immigrate to, which will give them political majorities and political power. They will then take care of their own people's interests (why wouldn't they?) and in turn create worse living conditions for the native populations of the target countries (as has already happened in some African countries, such as South Africa and Zimbabwe). So this is simply a pragmatic concern about increased competition for limited resources between populations which neither understand nor like each other very much - if only for historical reasons, but those reasons, being historical, unfortunately cannot be made to disappear.
  62.  @ccaagg  You imply that the regurgitation is intentional and happens only for certain texts. This is generally not true. The models can also be tricked into reproducing random training data - see e.g. "Scalable Extraction of Training Data from (Production) Language Models" by Nasr et al., 2023. Generally, some (small) measured percentage of the raw training data can be reproduced verbatim.
       Probability distribution: the job of a language model, given the loss function used in the training algorithm, is, for any prompt in the training data set, to faithfully reproduce the completions found there. For example, if the training data contained the word "meows" after the word "cat" in 50% of the training samples and the word "barks" in the other 50% of the samples with "cat", the model should assign 50% probability to "meows" and 50% probability to "barks" when prompted with "cat". That it works this way can be trivially demonstrated by training a small language model on such a specially prepared training set (see the sketch below).
       "Generalization" or "interpolation" to other sequences not found in the training data set only happens as a side effect (and when it yields an undesirable outcome, we call it "hallucination"). For example, if the word "cat" had a vector representation similar to that of the word "tiger" (learned from all the other training examples), the model might output "tiger meows" or "tiger barks" despite these sequences never appearing in the training data directly. To what extent such "generalization" happens, and in what way exactly, is subject to much debate. It can be demonstrated on constructed examples that it indeed works for certain sequence prediction problems that exhibit inherent symmetry. In particular, there is a paper on "grokking" which shows that a neural network might first overfit to the training data only, but after continued training find an improved "recipe" that correctly reproduces the input-output pairs intentionally left out of the training data set (the example used is learning modular division; interestingly, the same approach does not work for e.g. learning the multiplication table). Colloquially you could call it "filling in the gaps".
       It has also been demonstrated that language models develop so-called "induction heads" during training, which is basically the ability to copy words from the previous context into appropriate positions in the generated output. For example, even a simple LM might respond with "Hello, John!" when prompted with "My name is John, hello!", although the training data set only ever contained "Hello, Kate!". You could still call it a form of regurgitation... because if the model had no greetings at all in its training set, it would respond with unrelated gibberish.
       By "producing" in the context of language models I just mean the generation of a completion given a prompt (a.k.a. inference). By "knowing" I mean the representation of the training data as encoded in the model's weights - i.e. the data which allows the inference algorithm to produce output when prompted.
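       The 50/50 claim above can be checked with a toy experiment. Below is a minimal sketch (assuming PyTorch is available; the three-word vocabulary and the "meows"/"barks" split are invented purely for illustration) that trains the simplest possible next-word model on such a prepared data set and prints the predicted distribution for the prompt "cat".

           # Toy demonstration: cross-entropy training drives the predicted
           # distribution toward the empirical completion frequencies.
           import torch
           import torch.nn as nn

           vocab = {"cat": 0, "meows": 1, "barks": 2}

           # Training pairs: prompt "cat", completion "meows" in 50% of the
           # samples and "barks" in the other 50%.
           prompts = torch.tensor([vocab["cat"]] * 100)
           targets = torch.tensor([vocab["meows"]] * 50 + [vocab["barks"]] * 50)

           # The simplest possible "language model": an embedding lookup that
           # maps the prompt token directly to logits over the vocabulary.
           model = nn.Embedding(len(vocab), len(vocab))
           optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
           loss_fn = nn.CrossEntropyLoss()

           for step in range(200):
               optimizer.zero_grad()
               loss = loss_fn(model(prompts), targets)
               loss.backward()
               optimizer.step()

           with torch.no_grad():
               probs = torch.softmax(model(torch.tensor([vocab["cat"]])), dim=-1)
           print({word: round(probs[0, idx].item(), 2) for word, idx in vocab.items()})
           # Expected (approximately): {'cat': 0.0, 'meows': 0.5, 'barks': 0.5}

       The point of the sketch is only that the loss function rewards matching the completion frequencies found in the training set; it says nothing about generalization beyond it.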
  89.  @OlgaZuccati  So far it gets curated and selected by people hired and compensated by the AI companies. But yes, there are also AI models specifically designed to generate synthetic data for training, e.g. Nvidia's NeMo. What you are referring to as "cheating" is the difficulty of specifying "bulletproof" (from a human perspective) reward functions in reinforcement learning. But the same problem exists when doing things the other way around (writing down algorithms instead of describing the desired outputs) - it's called software bugs.
       As for humans checking the output "at every step of the way", this is precisely what machine learning tries to avoid. Sometimes (most of the time?) all you need is statistical assurance that the output fulfills your requirements - you don't care how it came to be and are neither interested in nor capable of checking the process (see the sketch below). When you obtain electricity from an energy company, you are not also interested in checking how exactly the power plants operate and whether they do what they should at every step. The proof of the pudding is in the eating; the only time checking matters is when you are concerned about safety. But even then you have to acknowledge that you cannot possibly have checked or avoided every possible hazard, which is why planes fall down and trains collide despite the greatest efforts to prevent it. So in that sense AI-"learned" algorithms are not that different from handcrafted ones - you check (by exhaustive testing) what urgently needs to be checked and accept the rest if it works well enough.
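       The "statistical assurance" idea above can be sketched as a black-box test: sample random inputs, check only the externally visible requirement, and estimate a failure rate, without ever inspecting the component's internals. The names below (sort_model, requirement_holds, estimate_failure_rate) are hypothetical, and sort_model is just a stand-in for any learned or handcrafted routine.

           # Black-box statistical check: estimate how often a component violates
           # its externally visible requirement, ignoring how it works inside.
           import random

           def sort_model(xs):
               # Stand-in for the component under test; could be a learned model
               # whose inner workings we never inspect.
               return sorted(xs)

           def requirement_holds(inp, out):
               # The requirement we can actually check from the outside:
               # the output is the input's elements in non-decreasing order.
               return list(out) == sorted(inp)

           def estimate_failure_rate(component, trials=10_000):
               failures = 0
               for _ in range(trials):
                   inp = [random.randint(-1000, 1000)
                          for _ in range(random.randint(0, 50))]
                   if not requirement_holds(inp, component(inp)):
                       failures += 1
               return failures / trials

           print(f"estimated failure rate: {estimate_failure_rate(sort_model):.4f}")

       This gives a statistical guarantee about behavior rather than a proof about the process, which is the trade-off the comment describes.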