Comments by "" (@grokitall) on "*NEW STUDY* Does Co-Development With AI Assistants Improve Code?" video.
-
@julianbrown1331 partly it is down to the training data, but the way these systems work does not filter that data by quality before, during, or after training, so lots of them produce code which is as bad as the average of the training data, most of which was written by newbies still learning either the language or the tools.
also, you misrepresent how copyright law works in practice. when someone claims you are using their code, they only have to show that it is a close match. to avoid summary judgement against you, you have to show that it is a convergent solution from the constraints of the problem space, and that there was no opportunity to copy the code.
given that there have been studies showing that for edge cases with very few examples they have produced identical code snippets right down to the comments, good luck proving there was no opportunity to copy. just saying i got it from microsoft copilot does not relieve you of the responsibility to audit the origins of the code.
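to make that concrete, here is a rough sketch of the kind of audit you end up having to do yourself. it is not any particular tool, and the corpus directory, window size, and threshold are all made up for illustration: you fingerprint the generated snippet and compare it against a local corpus of known open source code.

import hashlib
from pathlib import Path

def shingles(text, k=8):
    # split the code into overlapping k-token windows so a near-verbatim
    # copy still collides even when the surrounding lines differ
    tokens = text.split()
    return {hashlib.sha1(" ".join(tokens[i:i + k]).encode()).hexdigest()
            for i in range(max(len(tokens) - k + 1, 1))}

def similarity(snippet, reference):
    a, b = shingles(snippet), shingles(reference)
    return len(a & b) / max(len(a), 1)

def audit(snippet, corpus_dir="known_oss_corpus", threshold=0.3):
    # flag any corpus file sharing a suspicious fraction of token
    # windows with the generated snippet, most suspicious first
    hits = []
    for path in Path(corpus_dir).rglob("*.py"):
        score = similarity(snippet, path.read_text(errors="ignore"))
        if score >= threshold:
            hits.append((score, str(path)))
    return sorted(hits, reverse=True)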
even worse, microsoft cannot prove it was not copied either, as the nature of statistical ai obfuscates how it got from the source data to the code it gave you.
even worse, the training data does not even flag which license the original code was under, so you could find yourself with gpl code, matching comments and all, and your only choice is to release your proprietary code under the gpl to comply with the license and avoid triple damages.
on top of that, the original code is usually not written to be security or testability aware, so the generated code has security holes, is hard to test, and you can't fix it.
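as a contrived illustration of the security point (the table and field names are invented, and this is the shape of the problem rather than output from any particular model): the first version below is the pattern that dominates tutorial-grade code on the internet, the second is what a security-aware author writes.

import sqlite3

def find_user_unsafe(conn, name):
    # the pattern that dominates scraped example code: user input is
    # interpolated straight into the sql string, so a name like
    # "x' OR '1'='1" returns every row in the table
    return conn.execute(
        f"SELECT * FROM users WHERE name = '{name}'").fetchall()

def find_user_safe(conn, name):
    # the security-aware version: the driver handles quoting, and the
    # query text never changes with the input
    return conn.execute(
        "SELECT * FROM users WHERE name = ?", (name,)).fetchall()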
-
@julianbrown1331 yes, you can treat it as a black box and only keep the tests under version control, but as soon as you do you are relying on the ai to get it right 100% of the time, which even the best symbolic ai systems cannot do. also, the further your requirements drift from the ones represented in the training data, the worse the results get.
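as a toy illustration of why keeping only the tests is not enough (the function and test are deliberately trivial): both regenerated implementations below satisfy the checked-in test, but only one is correct for the inputs the test never mentions.

# checked-in test: the only artefact under version control
def test_add(add):
    assert add(2, 2) == 4

# one regeneration of the implementation
def add_v1(a, b):
    return a + b

# a later regeneration which also satisfies the test,
# but is wrong for every input the test never mentions
def add_v2(a, b):
    return a * b

test_add(add_v1)      # passes
test_add(add_v2)      # also passes, because 2 * 2 == 4
print(add_v2(3, 5))   # 15 instead of 8: the checked-in tests never notice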
also, the copyright issues are non-trivial. when your black box produces infringing code, then by design you are not aware of it, and you have no defence against it.
even worse, if someone infringes your code, by design you do not know, and you cannot prove it, because you are not saving every generated version. and if you are loud about how you work, there is nothing stopping a bad actor from copying your code, saving the current version, waiting for you to generate something different, then suing you for infringement, a claim you cannot defend against because you kept no history.
it is the wrong answer to the wrong problem, with the potential legal liabilities being huge.
-
the problem with the idea of using statistical ai for refactoring is that the entire method is about producing plausible hallucinations that conform to very superficial correlations.
to automate refactoring, you need to understand why the current code is wrong in this context. that is fundamentally outside the scope of how these systems are designed to work, and no amount of minor tweaking can add the missing understanding to the underlying technology.
the only way around this is to use symbolic ai, like expert systems or the cyc project, but that is not where the current money is going.
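to make the "wrong in this context" point concrete, here is a small, deliberately simple example of the context a pattern matcher misses: the "obvious" deduplication of a repeated expression is only valid when that expression has no side effects, which is exactly the kind of thing you need a real model of the code to know.

def first_two(stream):
    # correct: each call to next() advances the iterator,
    # so this returns the first two distinct items
    return (next(stream), next(stream))

def first_two_refactored(stream):
    # a superficially reasonable "remove the duplicated expression"
    # rewrite that silently changes behaviour, because next() has a
    # side effect the textual pattern does not capture
    item = next(stream)
    return (item, item)

print(first_two(iter([1, 2, 3])))             # (1, 2)
print(first_two_refactored(iter([1, 2, 3])))  # (1, 1)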
given the current known problems with llm generated code, lots of projects are banning it completely.
these issues include:
exact copies of the training data right down to the comments, leaving you open to copyright infringement.
producing code with massive security bugs due to the training data not being written to be security aware.
producing hard to test code, due to the training data not being written with testing in mind (see the sketch below).
the suggested code being identical to code under a different license, leaving you open to infringement claims.
when the code is identified as generated it is not copyrightable, but if you don't flag it as generated, the liability for any infringement moves to the programmer.
the only way to fix generating bad code is to completely retrain from scratch, which does not guarantee fixing the problem and risks introducing more errors.
these are just some of the issues of statistical methods, there are many more.
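as an illustration of the hard-to-test point in the list above (the example is invented and deliberately small): the first function below has its dependency on the clock buried inside it, so a test cannot control it, while the second takes the clock as a parameter and is trivial to test.

import datetime

# typical training-data shape: the dependency on the clock is buried
# inside the function, so a test cannot control it
def greeting_untestable():
    hour = datetime.datetime.now().hour
    return "good morning" if hour < 12 else "good afternoon"

# testability-aware shape: the clock is passed in, so a test can pin
# it to any value it likes
def greeting(now):
    return "good morning" if now.hour < 12 else "good afternoon"

def test_greeting():
    assert greeting(datetime.datetime(2024, 1, 1, 9, 0)) == "good morning"
    assert greeting(datetime.datetime(2024, 1, 1, 15, 0)) == "good afternoon"

test_greeting()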
-
@trignals not really. the history of programming has been a migration away from clever, untestable code which almost nobody can understand, towards code which better models the problem space and the design goals needed to do the job well. that code is easier to maintain, which matters because the cost has moved away from the hardware, then away from the initial construction, until most of it now sits in multi year maintenance.
there are lots of people in lots of threads on lots of videos about this subject who seem to buy the hype that you can just throw statistical ai at legacy code and it will suddenly create massive amounts of easy to understand tests, which you then hand to another ai that trivially creates wonderful code, replacing the big ball of mud with optimum code behind optimum tests. the whole system would then be ai generated tests and ai generated code, built by systems which can never reach the point of knowing the problem space and the design options, because they fundamentally do not work that way.
as some of those problems are analogous to the halting problem, i am fundamentally sceptical of the hype, which goes on to suggest that where there is not enough data to create enough superficial correlations, we can just use ai to fake up more data to improve the training of the other ai systems.
as you can guess, a lot of these assumptions just do not make sense. a system which cannot model the software cannot use the model it does not have to make the high level design choices that make code testable. it cannot use the analysis it does not do to tell bad code from good code, or to figure out how to partition the problem. and it cannot use the understanding it does not have to decide how big a chunk to regenerate, or whether the new code is better than the old.
for green field projects it is barely plausible that you might be able to figure out the right tests to give it so that it generates something which does not totally stink, but i have my doubts. for legacy code, everything depends on understanding what is already there and figuring out how to make it better, which is exactly what these systems are not designed to do.