Fireship
comments
Comments by "" (@gezenews) on the "DeepSeek stole our tech... says OpenAI" video.
It stole jobs from AI before AI even stole any jobs from humans.
5
@Aspeer1971 It's betrayed by the fact that the initial training of the model is entirely based on data collection in the first place. The underlying advancement that made it possible was the tokenization of web content and feeding that into a plain old model. None of that is particularly complex, new, or interesting aside from the tokenization of the web content. The entirety of the product is built on repetition and borrowed data. So no, there are no "cheaters" involved unless you include everyone.
2
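The pipeline the comment above describes, tokenize scraped web text and feed the IDs to an otherwise ordinary model, can be sketched in a few lines. The word-level vocabulary and tiny corpus here are illustrative assumptions, not how any real LLM tokenizer (which typically uses subword schemes like BPE) actually works.

```python
# Toy sketch of "tokenize web content, feed it to a plain old model".
# Corpus and vocabulary scheme are made up for illustration.

def build_vocab(corpus):
    """Assign an integer ID to every distinct lowercase word, in order seen."""
    vocab = {}
    for doc in corpus:
        for word in doc.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab

def tokenize(text, vocab):
    """Map text to token IDs; unknown words share a single <unk> ID."""
    unk = len(vocab)
    return [vocab.get(w, unk) for w in text.lower().split()]

corpus = ["the model copies the web", "the web feeds the model"]
vocab = build_vocab(corpus)          # {'the': 0, 'model': 1, ...}
ids = tokenize("the model copies itself", vocab)
```

A real pipeline replaces the word-level map with a trained subword tokenizer, but the shape of the step is the same: raw text in, integer IDs out, model after.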
@tongpoo8985 But you understand the LLM is possible because of tokenization of web content and basic machine learning.
2
@anthonyhiscox You fail to differentiate between that and the complexity of GPT. You act like there is some deeply complicated thinking machine at the heart of OpenAI; meanwhile, it's the exact same level of mass garbage input to output, copying from the existing internet forums. There is no deeper technology at the core of GPT. It is truncated forum content.
2
You're basically telling me that you can't copy a dictionary word for word, that it's still copyright protected. And I'm telling you that a dictionary is not a novel, useful invention that justifies staking half the economy on it. There are smaller models for identifying birds and plants, and, keeping with the analogy, these function as encyclopedias on those specific topics: curated data that serves a specific purpose. If you spent the time to develop them, you would expect a return on your efforts. Meanwhile, the LLM is a dictionary. There is no specific effort to collect specific data and perform a specific function. You have just taken all possible data, fed it in, and then rewarded the positive responses. Anybody can do that. Even if they were forced to start from scratch, DeepSeek could easily do that. The chore of it is in collecting the data, not in doing anything magical with your model. Which makes it a low-value piece of intellectual property. What's special is the largeness of it, not the model itself.
2
@hubertgiza3843 Yeah, but they were recently hacked. Also, during some service interruptions yesterday, DeepSeek gave me a "You have to pay for GPT premium" default message from GPT-3 or 4. Which was so suspiciously obvious as a killshot to DeepSeek's IP claims that I actually do not even believe it.
1
@Aspeer1971 It will not degrade over time. They can improve it using the existing user base and train it the same way the big LLMs are. The big LLMs, which, by the way, are using a lot of H-1B hiring to do so. Quality and intelligence are clearly not factored in; they rarely have been whenever this brand of machine learning is in play. It's not that complicated. They resolved the complex details a long time ago, and everything is published. There is no sacred IP that can't be replicated. The big LLMs' premium versions function as glorified search engines. That's all it is. Intelligence is a completely different avenue of product. Nobody wasting time on their LLM improvements is going to get to AGI using such dull repetition and 20-year-old machine learning technology. The point being, DeepSeek isn't that special because none of the LLMs are.
1
@outwithrealitytoo Four layers of cheating at well-worn IP that is neither privately held nor overly complex.
1
@Aspeer1971 You would be right if "training" didn't mean bombarding a model with as much data as you can possibly find. But as is, training is just more copying of content, including from the current set of users. It's not training like you or I do to study for a test; it's just adding more content to its refined plagiarism blob. The "big genius DeepSeek" idea that made it more efficient was simply splitting it into different categories. That's how barbarically simplistic the current GPT system is. Not even an upper-level segment of the bots; just one giant model with more data than anything ever made. It BETTER be smart, because it is taking all the repetitive training data into account, the maximum amount possible. That is a feat of architecture, data, tokenization, and the basic application of machine learning. Notice I didn't mention intelligence.
1
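The "splitting it into different categories" the comment above gestures at resembles mixture-of-experts routing: a cheap router sends each input to a specialist sub-model instead of running one monolith. This toy keyword router is only a sketch of that idea; the expert names and keyword sets are invented for illustration, and real routers are learned, not rule-based.

```python
# Toy sketch of routing a prompt to a specialist "expert" by keyword overlap.
# Expert categories and keyword sets are made up for illustration.

EXPERTS = {
    "code": {"python", "function", "bug", "compile"},
    "math": {"integral", "prove", "equation", "sum"},
    "general": set(),  # fallback expert with no keywords
}

def route(prompt):
    """Pick the expert whose keyword set overlaps the prompt the most."""
    words = set(prompt.lower().split())
    best = max(EXPERTS, key=lambda name: len(EXPERTS[name] & words))
    # If nothing matched at all, fall back to the general expert.
    return best if EXPERTS[best] & words else "general"

print(route("fix this python function"))  # -> code
print(route("hello there"))               # -> general
```

In a real mixture-of-experts model the router is itself a trained network scoring every expert per token, but the economics are the same: only a fraction of the full parameter count runs per input.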
@outwithrealitytoo Not if what they are copying is already stolen. If you make some YTP from SpongeBob clips you don't own, you can't sue someone for making a separate YTP using pieces borrowed from you in the exact same way. And you entirely skipped the part where it is all plagiarized and not at all an intelligently coordinated or organized piece of technology. It's plagiarism of existing web content, tokenized and fed to a model stupidly, not in a fascinating new way. There is no underlying novelty you could patent, besides the tokenization, and that isn't hard to recreate separately in a different way. So what exactly is the IP you're protecting? There's nothing in the model that is unique compared to other machine learning tech other than its size. The largeness of the model is not patentable.
1
@outwithrealitytoo You are not understanding what I am saying. I'm not saying it wouldn't be IP theft to copy a dictionary. I'm saying there is nothing creative in the dictionary that you would need to plagiarize in the first place. The fact that they did use an earlier model to start from is a privilege of convenience that can be easily bypassed with all the funding they just got. Remember the two big things that make the LLM? Tokenized web content (a lot of it) and machine learning techniques that have existed for a long time. Please tell me what is standing in the way of them creating their own? It's certainly not any expertise or genius. You can reproduce the same results just like you can write your own dictionary; you don't have to put any creative effort into it at all. They massively improved the existing product without trying; recreating the original from scratch would not be hard.
1