Comments by "Gilad Barlev" (@GSBarlev) on "Stop Reporting AI Generated Vulnerabilities!!" video.
I require contributors to license their code contributions under a GPL-compatible license. That can only be done by the author or someone who holds the copyright, and courts have already held that AI-generated works cannot be copyrighted.
3
It's a known problem that AI summarization tools amplify sentiment. That is, if I write a 100-page report that "X software is mostly solid but has some non-critical UI quirks," an LLM will summarize that as "X software is hot, unusable garbage!!"
3
Maybe I'm a Luddite, but I prefer rules-based code-scanners and auto-linters, because those kinds of tools obey straightforward logic and can be unit tested.
3
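To illustrate the point about rules-based scanners being unit-testable, here is a minimal sketch of one deterministic rule: flag every direct call to `eval()`. The function name `find_eval_calls` and the choice of rule are assumptions made for illustration; they do not come from the comment or from any particular linter.

```python
import ast

def find_eval_calls(source: str) -> list[int]:
    """Return the line numbers of every direct call to eval() in the source."""
    tree = ast.parse(source)
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "eval"
    )

# The rule is deterministic, so a plain assertion is enough to test it.
sample = "x = eval(user_input)\ny = len(user_input)\n"
assert find_eval_calls(sample) == [1]
```

The same input always produces the same findings, which is what makes a check like this easy to pin down in a test suite.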
Codellama is, what? A 7b model? So, like 14GB of RAM (preferably VRAM) and as many FLOPS as you can throw at it? I mean, I guess that's still lighter weight than VSCode¹
¹ Insert JetBrains or emacs as desired
2
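The 14GB figure in the comment above matches a back-of-the-envelope estimate for a 7-billion-parameter model stored at 16 bits (2 bytes) per weight. The sketch below only counts the weights; the helper name `weight_memory_gb` is hypothetical, and real usage would be higher once activations and context are included.

```python
def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Approximate memory for the weights alone, in GB (10**9 bytes)."""
    return n_params * bytes_per_param / 1e9

# 7 billion parameters at 16-bit precision (2 bytes each)
print(weight_memory_gb(7e9, 2))

# The same model with 4-bit quantized weights (0.5 bytes each), as an assumption
print(weight_memory_gb(7e9, 0.5))
```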
AI-generated code is such an enormous can of worms.
1. The submitter is much less likely to understand the code they're submitting
2. The submitter is much more likely to naïvely trust the code than something they copied from StackOverflow
3. Neither the "proompter" nor the LLM's company is considered the "author," so even if it's not verbatim-plagiarized from the training data, how can anyone transfer a copyright to the project?
1
Actually, "comprehending" is something that GPTs do really well. The magic of these algorithms is that, rather than use recurrent networks to find deep patterns, they instead focus on "attention" mechanisms that contextualize the input in as many ways as possible. This makes them ideally suited for information extraction and summarization. The issue is that they're lousy when it comes to insight and invention: their goal is to synthesize the most likely/appropriate response to a proompt from the training data, and that means that they can never exceed their training data. For a vulnerability scanner, that means that they can only ever find code similar to known vulnerabilities and can never actually discover novel threat risks.
1
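For readers unfamiliar with the "attention" mechanism mentioned in the comment above, here is a rough sketch of scaled dot-product attention in plain Python. It is a toy illustration under simplifying assumptions (a single head, no learned projections, lists instead of tensors), not how any production GPT is implemented; the function names are chosen for this example.

```python
import math

def softmax(xs):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d = len(keys[0])
    out = []
    for q in queries:
        # Score the query against every key, scaled by sqrt(d).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys
        ]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        out.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return out
```

Each output vector is a blend of all the value vectors, weighted by how well the query matches each key, which is the sense in which every token gets "contextualized" against every other token.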