Comments by @grokitall on the "Continuous Delivery" channel.
-
@lapis.lazuli. from what little info has leaked out, a number of things can be said about what went wrong.
first, the file seems to have had a big block replaced with zeros.
if it was in the driver, it would have been found by testing on the first machine you tried it on, as lots of tests would just fail, which should block the deployment.
if it was a config file or a signature file, lots of very good programmers will write a checker so that a broken file will not even be allowed to be checked in to version control.
in either case, basic good-practice testing should have caught it and stopped it before it even went out of the door.
as that did not happen, we can tell that their testing regime was not good.
then they were not running this thing in house. if they had been, the release would have been blocked almost immediately.
then they did not do canary releasing, and specifically the software did not include smoke tests to ensure it even got to the point of allowing the system to boot.
if it had, the update would have disabled itself when the machine rebooted the second time without having set a simple flag to say yes, it worked. it could then have phoned home, flagging up the problem and blocking the deployment.
according to some reports, they also made this particular update ignore customer upgrade policies. if so, they deserve everything thrown at them. some reports even go as far as to say that some manager specifically said to ship without bothering to do any tests.
in either case, a mandatory automatic update policy for anything, let alone a kernel module, is really stupid.
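to make the checker idea above concrete, here is a minimal sketch of the kind of pre-commit check good teams write for a binary config or channel file. the file format, size limit and zero-run limit are all invented for illustration, not taken from the actual incident:

    #!/usr/bin/env python3
    # hypothetical pre-commit style sanity check for a binary config/content file.
    # it rejects files that are empty, truncated, or contain a large zero-filled
    # block, which is the failure mode described above. the limits are made up.
    import sys

    MIN_SIZE = 1024        # assumed smallest plausible file size in bytes
    MAX_ZERO_RUN = 512     # assumed longest acceptable run of zero bytes

    def check_file(path):
        with open(path, "rb") as f:
            data = f.read()
        if len(data) < MIN_SIZE:
            print(f"{path}: too small ({len(data)} bytes), rejecting")
            return False
        zero_run = 0
        for byte in data:
            zero_run = zero_run + 1 if byte == 0 else 0
            if zero_run > MAX_ZERO_RUN:
                print(f"{path}: contains a zero-filled block, rejecting")
                return False
        return True

    if __name__ == "__main__":
        sys.exit(0 if all(check_file(p) for p in sys.argv[1:]) else 1)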
-
The problem with your three ring circus is that the middle ring does not gain you anything, which is why no operating system that I know of implements it.
The Windows kernel had a flag which let CrowdStrike require that their code always run in order for Windows to boot.
This is possibly a good thing for security software and critical drivers, but the problem is that on reboot after the crash, nothing in Windows kept any record of which mandatory drivers had actually completed booting. If it had, it could have flagged the broken CrowdStrike driver as buggy and booted without it.
CrowdStrike also claimed to have something in their code which identified buggy updates, but either it did not exist or it was never tested. If it had worked as they told their customers it did, it could have flagged the 291 update as broken and rolled back to the previous version until a fixed update was released. Because it did not work, this never happened.
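As a purely illustrative sketch of the missing bookkeeping (this is not how Windows or the CrowdStrike agent actually work; every path and name below is invented), the whole idea is just a flag written after a successful boot and checked on the next one:

    # illustrative sketch only, not Windows or CrowdStrike internals: record whether
    # the last boot with the current update actually completed, and fall back to the
    # previous content version if it did not. every path and name here is invented.
    import json
    import os
    import shutil

    STATE = "boot_state.json"          # hypothetical "did the last boot finish" flag
    ACTIVE = "channel_active.bin"      # hypothetical currently active update
    PREVIOUS = "channel_previous.bin"  # hypothetical last known good update

    def on_boot_start():
        # if the previous boot never set the flag, treat the active update as bad.
        if os.path.exists(STATE):
            with open(STATE) as f:
                completed = json.load(f).get("boot_completed", False)
            if not completed and os.path.exists(PREVIOUS):
                shutil.copyfile(PREVIOUS, ACTIVE)   # roll back to last known good
                # a real agent would also phone home and block the rollout here
        with open(STATE, "w") as f:
            json.dump({"boot_completed": False}, f)

    def on_boot_success():
        # called once the system is demonstrably up: the "simple flag" above.
        with open(STATE, "w") as f:
            json.dump({"boot_completed": True}, f)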
-
not really, we all know how hard it is to fix bad attitudes in bosses.
in the end it comes down to the question of which sort of bad boss you have.
if it is someone who makes bad choices because they don't know any better, train them by ramming home the points they are missing at every opportunity until they start to get it.
for example, if they want to get a feature out of the door quickly, point out that by not letting you test, future changes will be slower.
if they still don't let you test, point out that now the feature is done, you need to spend the time to test it and get it right, or the next change will be slower.
if they still don't let you test, then when the next change comes along, point out how it will now take longer to do, as you still have to do all the work you were not allowed to do before to get the code into a shape where it is easy to add the new stuff.
if after doing that for a while there is still no willingness to let you test, then you have a black-hat boss.
with a black-hat boss, their only interest is climbing the company ladder, and they will do anything to make themselves look good in the short term to get the promotion. the way to deal with this is simply to keep a paper trail of every time you advise them why something is a bad idea and they force you to do it anyway. encourage your colleagues to do the same. eventually one of the inevitable failures will get looked into, and their habit of ignoring advice and trying to shift the blame onto others will come to light.
in the worst case, you won't be able to put up with their crap any more and will look for another job. when you do, make sure you put all of their behaviour in the resignation letter, and make sure copies go directly to hr and the ceo, who will wonder what is going on and, in a good company, will look into it.
-
@evgenyamorozov no, it won't compile, but you are designing the public api, so it does not need to. Once you have an api call which is good enough, you have your failing test, which fails by not compiling, so you create a function definition which meets that api, and a minimal implementation which returns whatever value your test requires, thereby compiling and passing the test.
Then you come up with another example which expects a different result, so the test fails again, and you make just enough changes to the implementation to pass both tests. Then you keep repeating the cycle.
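A rough sketch of that cycle in Python with pytest, where parse_price and the pence-based return value are invented purely for illustration:

    # step 1: design the public api by writing the first test. parse_price does not
    # exist yet, so the test fails (in a compiled language it would fail to compile,
    # which counts as the failing test).
    def test_first_example():
        assert parse_price("£1.50") == 150

    # step 2: the minimal implementation is just "return 150", which compiles and
    # passes. step 3: add a second example that expects a different result, which
    # fails against the hard-coded version and forces a slightly more general
    # implementation that passes both tests.
    def test_second_example():
        assert parse_price("£2.75") == 275

    def parse_price(text):
        pounds, pence = text.lstrip("£").split(".")
        return int(pounds) * 100 + int(pence)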
-
@ansismaleckis1296 the problem with branching is that when you take more than a day between merges, it becomes very hard to keep the build green and pushes you towards merge hell.
the problem with code review and pull requests is that when you issue the pull request and then have to wait for code review before the merge, it slows everything down. this in turn makes it more likely that the patches will get bigger, which take longer to review, making the process slower and harder, and thus more likely to miss your 1 day merge window.
the whole thing comes back to the question of what version control is for, and the answer is to take a continuous backup of every change. however, this soon turned out to be less useful than expected, because most of those backups ended up in a broken state, sometimes going months between releasable builds, which made most of them of very little value.
the solution turned out to be smaller patches merged more often, but pre-merge manual review was found not to scale well, so a different solution was needed. that turned out to be automated regression tests against the public api, which guard against the next change breaking existing code.
this is what continuous integration is: running all of those tests to make sure nothing broke. the best time to write the tests turned out to be before you wrote the code, as then you have seen each test both fail and pass, which tells us that the code does what the developer intended it to do.
tdd adds refactoring into the cycle, which further checks the test to make sure it does not depend on the implementation.
the problem with not merging often enough is that it breaks refactoring. either you cannot do it, or whoever merges the huge patch has to manually reapply the refactoring to the unmerged code.
continuous delivery takes the output from continuous integration, which is all the deployable items, and runs every other sort of test against it, trying to prove it unfit for release. if it fails to find any issues, the build can then be deployed.
the deployment can then be done using canary releasing, with chaos engineering being used to test the resilience of the system, performing a roll back if needed.
it looks too good to be true, but is what is actually done by most of the top companies in the dora state of devops report.
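as a minimal sketch of what that pipeline boils down to, assuming a project whose tests live in tests/ and run with pytest (the layout and the deploy step are assumptions, not anyone's real pipeline):

    # minimal continuous integration / delivery skeleton: every commit runs the whole
    # regression suite, and only a green build becomes a release candidate.
    # the tests/ path and the use of pytest are assumptions about the project layout.
    import subprocess
    import sys

    def pipeline():
        if subprocess.run(["pytest", "tests/", "-q"]).returncode != 0:
            print("build is red: regression tests failed, block the merge and deploy")
            return 1
        print("build is green: the artefact becomes a release candidate for the")
        print("acceptance, performance and canary stages described above")
        return 0

    if __name__ == "__main__":
        sys.exit(pipeline())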
-
alpha, beta and releasable date back to the old days of pressed physical media, and their meanings have changed somewhat in the modern world of online updates.
originally, alpha software was not feature complete and was also buggy as hell, and thus was only usable for finding out which parts worked and which didn't.
beta software occurred when your alpha software became feature complete, and the emphasis moved from adding features to bug fixing and optimisation, but it was usable for non business critical purposes.
when beta software was optimised enough, with few enough bugs, it was then deemed releasable, and sent out for pressing in the expensive factory.
later, as more bugs were found by users and more optimisations were done you might get service packs.
this is how windows 95 was developed, and it shipped with 4 known bugs, which hit bill gates at the product announcement demo to the press, after the release had already been pressed. after customers got their hands on it, the number of known bugs in the original release eventually climbed to 15,000.
now that online updates are a thing, especially when you do continuous delivery, the meanings are completely different.
alpha software on its initial release is the same as it ever was, but now the code is updated using semantic versioning. after the initial release, both the separate features and the project as a whole have the software levels mentioned above.
on the second release, the completed features of version 1 have already moved into a beta state, with ongoing bug fixes and optimisations. the software as a whole remains alpha until it is feature complete, and the previous recommendations still apply, with one exception: if you write the code that runs on top of it yourself, you can make sure you don't use any alpha level features. if someone else is writing that code, there is no guarantee that their next update won't depend on a feature that is not yet mature, or not even implemented if the project is a compatibility library being reimplemented.
as you continue to update the software, you get more features and your minor version number goes up. bug fixes don't increase the minor number, only the patch number. in general, the project moves closer to being feature complete, while the underlying code moves from alpha to beta to maintenance mode, where it only needs bug fixes as new bugs are found.
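a tiny sketch of that versioning rule, with the change type labels invented for illustration:

    # semantic versioning as described above: breaking changes bump the major
    # number, new features bump the minor number, bug fixes bump only the patch.
    def bump(version, change_type):
        major, minor, patch = (int(n) for n in version.split("."))
        if change_type == "breaking":
            return f"{major + 1}.0.0"
        if change_type == "feature":
            return f"{major}.{minor + 1}.0"
        if change_type == "bugfix":
            return f"{major}.{minor}.{patch + 1}"
        raise ValueError(f"unknown change type: {change_type}")

    assert bump("1.4.2", "feature") == "1.5.0"
    assert bump("1.4.2", "bugfix") == "1.4.3"
    assert bump("1.4.2", "breaking") == "2.0.0"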
thus you can end up with things like reactos, where it takes the stable wine code, removes 5 core libraries which are os specific, and which it implements itself, and produces something which can run a lot of older windows programs at least as well as current wine, and current windows. however it is still alpha software because it does not fully implement the total current windows api.
wine on the other hand is regarded as stable, as can be seen from the fact that its proton variant used by steam can run thousands of games, including some that are quite new. this is because those 5 core os specific libraries do not need to implement those features, only translate them from the windows call to the underlying os calls.
the software is released as soon as each feature is complete, so releasable now does not mean ready for an expensive release process, but instead means that it does not have any major regressions as found by your ci and cd processes.
the software as a whole can remain alpha until it is feature complete, which can take a while, or if you are writing something new, it can move to beta as soon as you decide that enough of it is good enough, and when those features enter maintenance mode it can be given a major version increment. this is how most projects now reach their next major version, unless they are a compatibility library.
so now the code is split into two branches, stable and experimental, with code moved to stable once ci passes, but a feature is not turned on until it is good enough, so you are shipping the code in every release without enabling every feature.
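a minimal sketch of that ship-it-but-don't-turn-it-on-yet idea, usually called a feature flag or toggle; the flag name and checkout functions are invented:

    # the experimental code is merged and shipped in every release, but stays dark
    # until the flag is flipped; enabling it later needs no new deployment.
    FLAGS = {"new_checkout": False}

    def old_checkout(order):
        return f"processed {order} with the stable flow"

    def new_checkout(order):
        return f"processed {order} with the experimental flow"

    def checkout(order):
        if FLAGS["new_checkout"]:
            return new_checkout(order)
        return old_checkout(order)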
so now the project is alpha (which can suddenly crash or lose data), beta (which should not crash but might be slow and buggy) or stable (where it should not be slow, should not crash, and should have as few bugs as possible).
with the new way of working, alpha software is often stable as long as you don't do something unusual, in which case it might lose data or crash. beta software now does not usually crash, but can still be buggy, and the stable parts are ok for non business critical use. stable software should not crash, lose data or otherwise misbehave, and should have as few known bugs as possible, making it usable for business critical use.
a different way of working with very different results.
-
the problem with the idea of using statistical ai for refactoring is that the entire method is about producing plausible hallucinations that conform to very superficial correlations.
to automate refactoring, you need to understand why the current code is wrong in this context. this is fundamentally outside the scope of how these systems are designed to work, and no minor tweaking can remove the lack of understanding from the underlying technology.
the only way around this is to use symbolic ai, like expert systems or the cyc project, but that is not where the current money is going.
given the current known problems with llm generated code, lots of projects are banning it completely.
these issues include:
exact copies of the training data right down to the comments, leaving you open to copyright infringement.
producing code with massive security bugs due to the training data not being written to be security aware.
producing hard to test code, due to the training data not being written with testing in mind.
the code being suggested being identical to code under a different license, leaving you open to infringement claims.
when the code is identified as generated, it is not copyrightable, but if you don't flag it up it moves the liability for infringement to the programmer.
the only way to fix generating bad code is to completely retrain from scratch, which does not guarantee fixing the problem and risks introducing more errors.
these are just some of the issues with statistical methods; there are many more.
-
@ContinuousDelivery this is exactly the correct analogy to use.
In science what you are doing is crowd sourcing the tests based upon existing theories and data, and using the results to create new tests, data and theories.
Peer review is then the equivalent of running the same test suite on different machines with different operating systems and library versions to see what breaks due to unspecified assumptions and sensitivity to initial conditions.
This then demonstrates that the testing is robust, and any new data can be fed back into improving the theory.
And like with science, the goal is falsifiability of the initial assumptions.
Of course the other problem is that there is a big difference between writing code and explaining it, and people are crap at explaining things they are perfectly good at doing. Testing is just explaining it with tests, and the worst code to learn the skill on is legacy code with no tests.
So people come along and try to fit tests to legacy code only to find that the tests can only be implemented as flaky and fragile tests due to the code under test not being designed for testability, which just convinces them that testing is not worth it.
What they actually need is to take some tdd project which evolved as bugs were found, delete the tests, and compare how many and what types of bugs they find as they step through the commit history. If someone was being really nasty they could delete the code and reimplement it with a bug for every test until they got code with zero passes, and then see what percentage of those bugs they found when they implemented their own test suite.
-
Tdd comes with a number of costs and benefits, and so does not doing tdd or continuous integration.
The cost of doing tdd is that you move your regression tests to the front of the process and refactor as you go, which can cost up to 35 percent extra in time to market.
What you get back is an executable specification in the form of tests which anyone can run to reimplement the code, a codebase designed to be testable with very few bugs, and a combination that is optimised for doing continuous integration. You also spend very little time on bug hunting. It also helps in heavily regulated areas, as you can demonstrate on an ongoing basis that the code meets the regulations.
All of this helps with getting customers to come back for support, and for repeat business.
Not doing tdd also comes with benefits and costs.
The benefit is mainly that your initial code dump comes fast, giving a fast time to market.
The costs are significant. As you are not doing incremental testing, the code tends to be hard to test and modify. It also tends to be riddled with bugs which take a long time to find and fix. Because it is hard to modify, it is also hard to extend, and if someone else has to fix it, it can sometimes be quicker to just reimplement the whole thing from scratch.
This tends to work against getting support work and repeat business.
As for the snowflake code no one will touch, it will eventually break, at which point you end up having to do the same work anyway, but on an emergency basis with all the costs that implies. Testing is like planting a tree: the best time to do it was a number of years ago, the second best time is now.
The evidence for incremental development with testing is in, in the dora reports. Not testing is a disaster. Test after gives some advantages initially, while costing more, but rapidly plateaus. Test first costs very little more than comprehensive test after, but as more code is covered you get an ever accelerating speed of improvement and ease of implementing those improvements, and it is very easy for others to come along and maintain and expand the code, assuming they don't ask you to do the maintenance and extensions.
-
I doubt it, but you do not need them. If you look at history you can see multitudes of examples of new tech disrupting industries, and project that onto what effect real ai will have.
Specialisation led us away from being serfs, and automation removed horses as primary power sources and changed us from working nearly 18 hour days seven days a week towards the current 40 hour, 5 day standard.
Mechanisation also stopped us using 98 percent of the population for agriculture, moving most of them to easier, lower hour, better paying work.
This led to more office work, where word processors and then computers killed both the typing pool and the secretarial pool, as bosses became empowered to do work that previously had to be devolved to secretaries.
As computers have become more capable they have spawned multiple new industries with higher creative input, and that trend will continue, with both ai and additive manufacturing only speeding up the process.
The tricky bit is not having the industrial and work background change, but having the social, legal and ethical background move fast enough to keep up.
When my grandfather was born, the majority of people still worked on the land with horses, we did not have powered flight, and the control systems for complex mechanical systems were cam shafts and simple feedback systems.
When I was born, we had just stepped on the moon, computers had less power than a modern scientific calculator app on your smartphone, and everyone was trained at school on the assumption of a job for life.
By the time I left school, it became obvious that the job for life assumption had been on its way out since the early seventies, and that we needed to train people in school for lifelong learning instead, which a lot of countries still do not do.
By the year 2000, it became clear that low wage low skilled work was not something to map your career around, and that you needed to continually work to upgrade your skills so that when you had to change career after less than 20 years, you had options for other, higher skilled and thus higher paid employment.
Current ai is hamstrung by the fact that companies developing it are so pleased by the quantity of available data to train them with that they ignore all other considerations, and so the output is absolutely dreadful.
If you take the Grammarly app or plugin, it can be very good at spotting when you have typed something which is garbage, but it can be hilariously bad at suggesting valid alternatives which don't mangle the meaning. It is also rubbish at the task given to schoolchildren of determining whether you should use which or witch, or their, there or they're.
Copilot makes even worse mistakes, as you use it wanting quality code, but the codebases it was trained upon were largely written by programmers with less than 5 years of experience, due to the exponential growth of programming giving a doubling of the number of programmers every 5 years.
It also does nothing to determine the license the code was released under, thereby encouraging piracy and similar legal problems, and even if you could get away with claiming that it was generated by copilot and approved by you, it is not usually committed to version control that way, leaving you without an audit trail to defend yourself.
To the extent that you do commit it that way, it is not copyrightable in the US, so your company's lawyers should be screaming at you not to use it for legal reasons.
Because no attempt was made as a first step to create an ai to quantify how bad the code was, the output is typically at the level of the average inexperienced programmer. So again, it should not be accepted uncritically; you would not accept that from a new hire, so why let the ai contribute equally bad code?
The potential of ai is enormous, but the current commercial methodology would get your project laughed out of any genuinely peer reviewed journal as anything but a proof of concept, and until they start using better methods with their ai projects there are a lot of good reasons to not let them near anything you care about in anything but a trivial manner.
Also, as long as a significant percentage of lawmakers are as incompetent as your typical MAGA Republican representative, we have no chance of producing a legal framework which has any relationship to the needs of the industry, pushing development to less regulated and less desirable locations, just as currently happens with alternative nuclear power innovations.
-
@nschoem that perception definitely exists, and is based upon the intuitive feeling that writing tests with your code takes longer, which is true but not really relevant. What happens with feature addicted managers is that they start off saying get the feature to work and we can write the tests later. Then they prioritize the next feature over testing, resulting in no tests, and what few tests do get written are fragile, because the only way to test most code that was not designed with tests in mind is to rely on implementation details.
This results in code with increasing levels of technical debt which gets harder and harder to debug and extend, making everything slower. The only way to fix this is by refactoring your way out of the problem, which needs tests, and test-after tests are harder to write and fragile, so you end up writing tdd style tests for the refactored code and deleting the original tests as they cease being helpful.
You still have to write the tests in either case if you have a long lived or large code base, but tdd style test-first tests tend to be API tests which don't depend on internal implementation details, and thus don't break much.
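A small invented illustration of the difference, using a made-up ShoppingBasket class:

    class ShoppingBasket:
        def __init__(self):
            self._items = []              # internal detail, free to change

        def add(self, name, price):
            self._items.append((name, price))

        def total(self):
            return sum(price for _, price in self._items)

    # fragile, implementation-coupled test: it breaks if _items is renamed or the
    # storage changes to a dict, even though the behaviour is unchanged.
    def test_fragile():
        basket = ShoppingBasket()
        basket.add("tea", 150)
        assert basket._items == [("tea", 150)]

    # api-level test of the kind tdd pushes you towards: it only uses the public
    # interface, so it survives refactoring of the internals.
    def test_behaviour():
        basket = ShoppingBasket()
        basket.add("tea", 150)
        basket.add("milk", 80)
        assert basket.total() == 230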
-
@lucashowell7653 the tests in tdd are unit tests and integration tests that assert that the code does what it did the last time the test was run. These are called regression tests, but unless they have high coverage and are run automatically with every commit you have large areas of code where you don't know when something broke.
If the code was written before the tests, especially if the author isn't good at testing, it is hard to retrofit regression tests, and to the extent you succeed they tend to be flakier and more fragile. This is why it is better to write them first.
Assuming that the code was written by someone who understands how to write testable code, you could use A.I. to create tests automatically, but then you probably would not have tests where you could easily understand what a failure meant, due to poor naming. When you get as far as doing continuous integration the problem is even worse, as the point of the tests is to prove that the code still does what the programmer understood was needed and to document this, and software cannot understand that yet. If you go on to continuous delivery, you have additional acceptance tests whose purpose is to prove that the programmer has the same understanding of what is needed as the customer, which requires an even higher level of understanding of the problem space, and software just does not understand either the customer or the programmer that well, either now or in the near future.
This means that to do the job well, the tests need to be written by humans so that they are easily understood, and the easiest time to do this is to write one test, followed by the code to pass that test. For acceptance tests the easiest time is as soon as the code is ready for the customer to test, adding tests wherever the current version does not match customer needs. Remember, customers don't even know what they need over 60% of the time.
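A toy illustration of the two levels, with the refund rule and all the names invented:

    # production code under test (invented example)
    def refund_due(order_total, days_since_purchase):
        return order_total if days_since_purchase <= 30 else 0

    # regression (unit) test: asserts the code still does what it did last time,
    # run automatically on every commit.
    def test_no_refund_after_thirty_days():
        assert refund_due(100, 31) == 0

    # acceptance test: named and written in the customer's terms, to show the
    # programmer's understanding matches what the customer asked for.
    def test_customer_gets_full_refund_within_thirty_days_of_purchase():
        assert refund_due(100, 30) == 100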
-
@temper8281 this is the argument I've heard from everyone who can't write good tests, as well as from those who don't understand why testing matters. If you are just doing a little hobby project which nobody else is ever going to use, and is never going to be extended, then you can mostly get away with not having tests.
As the project gets bigger, or more popular, or has more developers you need the tests to stop regressions, communicate between developers, spot coupling, and numerous other things, most notably to deal with technical debt. The more any of those factors rise, the higher the cost of shipping broken code, and thus the more the tests matter. By the time you need continuous integration you cannot do without the tests, but the harder it is to retrofit them to your existing legacy code base, so it is better to learn how to do testing earlier and add it to the project sooner.