Comments by "" (@grokitall) on "Continuous Delivery"
channel.
-
Your comment demonstrates some of the reasons people don't get tdd.
First, you are treating the module in your code as the unit, then treating the module's test suite as the unit test, and then positing that you have to write the entire test suite before you write the code.
This just is not how modern testing defines a unit test.
An example of a modern unit test would be a simple test that, given the number to enter into a cell, checks whether it is between 1 and the product of the grid dimensions and returns true or false.
For example, a common sudoku uses a 3 x 3 grid, requiring that the number be less than or equal to 9, so the code would take the grid parameters, cache their product, check that the value was between 1 and 9, and return true or false based on the result. This would all be hidden behind an API, and you would test that, given a valid number, it returns true.
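As a rough sketch of what such a test might look like (the module name sudoku and the function name is_valid_cell_value are invented for illustration, not taken from any real project):

```python
# test_cell_value.py -- the first, deliberately tiny unit test.
# The sudoku module does not exist yet, so this fails when first run,
# which is exactly what the next step looks for.
from sudoku import is_valid_cell_value

def test_value_in_range_is_accepted():
    # A standard sudoku has 3 x 3 boxes, so legal values are 1..9.
    assert is_valid_cell_value(5, box_rows=3, box_cols=3) is True

def test_value_above_the_range_is_rejected():
    assert is_valid_cell_value(10, box_rows=3, box_cols=3) is False

def test_zero_is_rejected():
    assert is_valid_cell_value(0, box_rows=3, box_cols=3) is False
```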
You would then run the test and prove that it fails. A large number of tests written after the fact pass not only when you run the test, but also when you invert the condition or comment out the code which supplies the result.
You would then write the first simple code that produces the correct result, run the test, and see it pass; now you have validated your regression test in both the passing and failing modes, giving you an executable specification of the code covered by that test.
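Continuing the sketch, the first simple code to make that test pass could be as small as this:

```python
# sudoku.py -- just enough code to make the test above pass.
def is_valid_cell_value(value, box_rows, box_cols):
    # The largest legal value is the product of the box dimensions
    # (9 for the standard 3 x 3 case).
    max_value = box_rows * box_cols
    return 1 <= value <= max_value
```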
You would also have a piece of code which implements that specification, plus a documented example of how to call that module and what its parameters are, for use when writing the documentation.
Assuming that it was not your first line of code, you would then look to see whether the code could be generalized, and if it could, you would refactor it, which is now easier to do because the implemented code already has regression tests.
You would then add another unit test, which might check that the number you want to add isn't already used in a different position, go through the same routine again, and then another bit of test and another bit of code, all the while growing your test suite until you have covered the whole module.
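That next test might be another few lines, again with invented names, checking that a candidate value is not already used elsewhere in the same row:

```python
# test_row_conflict.py -- the next small test in the growing suite,
# written before the (still hypothetical) conflicts_with_row function.
from sudoku import conflicts_with_row

def test_value_already_in_the_row_is_a_conflict():
    row = [5, 3, 0, 0, 7, 0, 0, 0, 0]   # 0 marks an empty cell
    assert conflicts_with_row(5, row) is True

def test_unused_value_is_not_a_conflict():
    row = [5, 3, 0, 0, 7, 0, 0, 0, 0]
    assert conflicts_with_row(4, row) is False
```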
This is where test first wins: it rapidly produces the test suite and the code it tests, and it makes sure that the next change doesn't break something you have already written. It does require you to write the tests first, which some people regard as slowing you down, but if you want to know that your code works before you give it to someone else, you either have to take the risk that it is full of bugs or you have to write the tests anyway for continuous integration, so writing them first does not actually cost you anything.
It does, however, gain you a lot.
First, you know your tests can fail, because you have already seen them fail.
Second, you know that when the code is right they will pass.
Third, you can use your tests as examples when you write your documentation.
Fourth, you know that the code you wrote is testable, as you have already tested it.
Fifth, you can now easily refactor, as the code you wrote is covered by tests.
Sixth, it discourages the use of various anti-patterns which produce hard-to-test code.
There are other positives, like making debugging fairly easy, but you get my point.
As your codebase gets bigger and more complex, or your problem domain is less well understood initially, the advantages rapidly expand while the disadvantages largely evaporate.
The test suite is needed for CI and refactoring, and the refactoring step is needed to handle technical debt.
-
@julianbrown1331 partly it is down to the training data, but the nature of how these systems work does not filter for quality either before, during, or after training, so a lot of them produce code that is as bad as the average of the training data, most of which was written by newbies learning either the language or the tools.
Also, you misrepresent how copyright law works in practice. When someone claims you are using their code, they only have to show that it is a close match. To avoid summary judgement against you, you have to show that it is a convergent solution arising from the constraints of the problem space, and that there was no opportunity to copy the code.
Given that studies have shown these systems producing identical code snippets for edge cases with very few examples, right down to the comments in the code, good luck proving there was no chance to copy it. Just saying you got it from Microsoft Copilot does not relieve you of the responsibility to audit the origins of the code.
Even worse, Microsoft cannot prove it was not copied either, as the nature of statistical AI obscures how it got from the source data to the code it gave you.
Worse still, the training data does not even flag which license the original code was under, so you could find yourself with GPL code, complete with matching comments, leaving your only option being to release your proprietary code under the GPL to comply with the license and avoid triple damages.
On top of that, the original code is usually not written with security or testability in mind, so it has security holes, is hard to test, and you can't fix it.
-
That one is actually quite easy to answer if you look at how GUI code has historically been developed.
First, you write too much of the UI, which does nothing yet.
Second, you write some code which does something, but you embed it in the UI code.
Third, you don't do any testing.
Fourth, due to the lack of testing, you don't do any refactoring.
Fifth, you eventually throw the whole mess over the wall to the QA department, who moan that it is an untestable piece of garbage.
Sixth, you don't require the original author to fix up this mess before allowing it to be used.
When you eventually decide that you need to start doing continuous integration, those developers have no experience of how to write good code, how to test it, or why it matters, so they fight back against it.
Unfortunately for them, professional programmers working for big companies need continuous integration, so they then need to learn how to do unit testing to develop regression tests, or they risk being unproductive and getting fired.
-
@julianbrown1331 yes, you can treat it as a black box, only doing version control on the tests, but as soon as you do you are relying on the AI to get it right 100% of the time, which even the best symbolic AI systems cannot do. Also, the further your requirements get from the ones defining the training data, the worse the results get.
The copyright issues are also non-trivial. When your black box creates infringing code, then by design you are not aware of it and have no defence against it.
Even worse, if someone infringes your code, you again do not know, by design, and cannot prove it, as you are not saving every version. And if you shout about how you work, there is nothing stopping a bad actor from copying the code, saving the current version, letting you generate something else, and then suing you for infringement, which you cannot defend against because you are not saving your history.
It is the wrong answer to the wrong problem, and the potential legal liabilities are huge.
-
@ulrichborchers5632 dealing with your points in no particular order.
1, the EU did not favour competition over quality. Having previously found Microsoft guilty of trying to extend their OS monopoly to give them control of the browser market, they flagged to Microsoft that the proposed change was equally guilty, and thus illegal, with regard to the security market. All they said to Microsoft was: go have another think and look for a legal way to do what you want to do.
Making the API accessible to all security firms equally would have solved that problem, removing the attempt to expand their monopoly, but instead Microsoft chose to abandon the proposed API. This was Microsoft's choice, not the EU's. When Apple faced the same choice later, they provided a kernel API to all userland programs, which could then move their code out of the kernel, which is usually a good idea.
The fact that the change was proposed at all, and that Apple later did basically the same thing, shows that the idea had merit. The fact that Microsoft chose to roll back the proposed change rather than share it with everyone was a commercial decision at Microsoft, not a technical one, as can be seen from the fact that much later they basically had to copy Apple.
The EU literally had nothing to say about the technical merits or the options Microsoft had; they just said you can't do that, it's illegal, have another go and get back to us. There were multiple options for Microsoft, who chose for commercial reasons to just roll back the change, but it was their choice.
2, third-party code in the kernel. There are various good reasons to have third-party code in the kernel, and multiple ways to handle it. One reason is speed, which is the case for low-level drivers like graphics. Another is the level of access needed to do the job, which is the case for security software.
Linux handles it by requiring this code to be added to the kernel under the same GPL2 license as the rest of the code; if you want to keep it private, you get a lower level of access to some of the APIs.
Closed-source systems like Microsoft's have a number of options. They tried just letting any old garbage in, which was one of the reasons the Win95-to-Windows-ME consumer kernel was such a flaky and crash-prone piece of garbage.
They tried forcing the graphics drivers into user space, which is why Windows 3.11 did not really crash that much, but it made the system slow.
They tried the driver-signing route, but that is underfunded and thus both slow and expensive; more importantly, CrowdStrike showed that by allowing the code to be a binary blob, it was easy to subvert.
The backdoor, as you called it, was basically a repl inside the driver, which partly makes sense for this use case, but the way it was implemented was as another binary blob which their own code did not get to see.
The only safe way to handle that starts with creating a cryptographic hash at build time, which is easy to check at download time to spot corruption; they did not do this (a sketch of such a check is below). That check only tells you the blob did not get corrupted in flight, so you also need testing; they claimed to be doing this, but did not. Next, as it is kernel code, you provide a crash-detection method which rolls back to an earlier version if the machine can't boot; again they claimed to do this but didn't. Lastly, you allow critical systems to run the previous version. Again they claimed to do this, but the repl binary blobs were explicitly excluded, so when the primary systems crashed due to the bad update, customers switched to the systems which were supposed to be running the previous version, found that the policy only applied to the core code and not to the repl blobs, and watched those machines crash as well as soon as the bad blobs reached them.
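A download-time integrity check of that kind is only a few lines; a sketch along these lines (purely illustrative, nothing to do with any vendor's actual tooling):

```python
# verify_update.py -- compare a freshly downloaded update blob against the
# hash that was published when the blob was built.
import hashlib

def blob_matches_published_hash(blob_path, published_sha256):
    with open(blob_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return digest == published_sha256

# The updater would refuse to install the blob unless this returns True.
# It only catches corruption in flight; it does not replace testing the content.
```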
I find their argument that they did not have time for testing and canary releasing as specious as the security-through-obscurity argument. If you don't have time for tests, you really don't have time to manually reboot all the machines you crashed. The testing time needed to discover that your new update crashes the system is minimal: install the update on a few local machines, reboot, and have each machine ping another machine to say it booted up OK, and you are done. They could have done this easily, but we know from the way they crashed the Linux kernel with a previous bug (which happened to be in the kernel) that they did not.
We have known how not to do this for at least three decades for user-space code, and for kernel-space code you should be even more careful, but they just could not be bothered to do any of the things which would have turned this into a non-event, including doing what they told their customers they were doing.
This is why the lawsuits are coming, and why at least some of them have a good chance of winning against them.
-
Legacy code, almost by definition, is code where testing was at best an afterthought, so retrofitting it to be testable is a pain.
Luckily, you don't have to. Not all code is created equal, so you write new code using tdd, and as part of the refactoring step you move duplicated code out of the legacy section, modifying and writing just enough tests to make sure it continues to work.
This results in a codebase where the amount of untestable code keeps shrinking while the code under test keeps growing. More importantly, you only modify the legacy code when it needs changing; the rest stays the same.
Working any other way is basically chasing code coverage for a system not designed to be tested, which is why Dave says trying to force it under tdd-style tests is a bad idea.
Over time, more and more code needs modifying, and thus comes under test.
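As a concrete sketch of that workflow (every name here is invented for the example): when a change forces you to touch the legacy module, you pull the duplicated piece out behind one small function and pin its current behaviour with a test before refactoring the callers.

```python
# pricing.py -- duplicated discount logic, pulled out of the legacy module
# into one small function.
def apply_loyalty_discount(price, loyalty_years):
    # Pinned to what the old copy-pasted code actually did:
    # 10% off after 2 full years of loyalty, otherwise full price.
    if loyalty_years > 2:
        return price * 0.9
    return price


# test_pricing.py -- characterisation tests that lock in the existing
# behaviour before any further refactoring of the legacy callers.
from pricing import apply_loyalty_discount

def test_long_standing_customers_get_ten_percent_off():
    assert apply_loyalty_discount(price=100.0, loyalty_years=3) == 100.0 * 0.9

def test_new_customers_pay_full_price():
    assert apply_loyalty_discount(price=100.0, loyalty_years=0) == 100.0
```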
-
You call them requirements, which generally implies big upfront design, but if you call them specifications it makes things clearer.
Tdd has three phases.
In the first phase, you write a simple and fast test to document the specification of the next bit of code you are going to write. Because you know what that is, you should understand the specification well enough to write a test that is going to fail, and then it fails. This gives you an executable specification of that piece of code. If it doesn't fail, you fix the test.
Then you write just enough code to meet the specification, and the test passes, proving the test good because it behaves as expected and the code good because it meets the specification. If it still fails, you fix the code.
Finally you refactor the code, reducing technical debt and proving that the test you wrote is testing the API, not an implementation detail. If a valid refactoring breaks the test, you fix the test, and keep fixing it until you get it right.
At any point you can spot another test, make a note of it, and carry on, and when you have completed the cycle you can pick another test from your notes or write a different one. In this way you grow your specification with your code, and use it incrementally to feed back into the higher-level design of your code.
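A tiny illustration of the refactoring step, using an invented example: the test only talks to the public function, so a valid refactoring of the internals leaves it green.

```python
# A test written against the public API only.
def total_price(items):
    # First, deliberately simple implementation: a plain loop.
    total = 0.0
    for price, quantity in items:
        total += price * quantity
    return total

# A later, valid refactoring keeps the same API, e.g.:
#     def total_price(items):
#         return sum(price * quantity for price, quantity in items)
# and the test below stays green, proving it tests behaviour, not internals.

def test_total_price_sums_the_line_items():
    assert total_price([(2.0, 3), (1.5, 2)]) == 9.0
```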
Nothing stops you from using A.I. tools to produce higher level documentation from your code to give hints at the direction your design is going in.
This is the value of test first, and even more so of tdd. It encourages the creation of an executable specification of the entirety of your covered codebase, which you can then throw out and reimplement if you wish. Because test after, or worse, does not produce this implementation-independent executable specification, it is inherently weaker.
The biggest win from tdd is that people doing classical tdd well do not generally write any new legacy code, which is not something you can generally say about those who don't practice it.
If you are doing any form of incremental development, you should have a good idea of the specification of the next bits of code you want to add. If you don't, you have much bigger problems than testing. This is different from knowing all of the requirements for the entire system up front; you just need to know enough to do the next bit.
As for multi-threading and microservices, don't do them until you have to, and then do just enough. Anything else multiplies the problems massively before you need to.
-
@Storytelless first, you separate out the code that does the search, as it is not part of the UI, and write tests for it.
Then you have some ideas as to what parameters you want to pass to the search, which are also not part of the UI, and add tests for those.
Finally, you know what those parameters are going to be, which hints at the UI, but you have the UI build the search query and test that the query is valid.
Only after all of this do you need to finalise the shape of the UI, which is a lot easier to test because of all the other stuff you have already removed from it.
The UI should be a thin shim at the edge of the code which just interacts with the user to get the parameters. This way it is easier to replace it with a different one if your UI design changes, because everything that is not UI dependent has already been moved out.
You can then test the UI using one of the GUI test frameworks, just checking that the automated steps you recorded actually select what you expected and return the correct values.
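A rough sketch of that split, with all names invented: the query building lives outside the UI and gets ordinary unit tests, while the UI shrinks to collecting the parameters.

```python
# search_core.py -- the query-building rules live here, with no UI code at all.
def build_search_query(text, max_results=10):
    # The UI only gathers these parameters; this function owns the rules.
    if not text.strip():
        raise ValueError("search text must not be empty")
    return {"q": text.strip(), "limit": max_results}


# test_search_core.py -- ordinary unit tests, no GUI framework needed.
import pytest
from search_core import build_search_query

def test_query_is_built_from_the_parameters():
    assert build_search_query("  cats ", max_results=5) == {"q": "cats", "limit": 5}

def test_empty_search_text_is_rejected():
    with pytest.raises(ValueError):
        build_search_query("   ")
```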
-
@tediustimmy that is the entire point of testing. We automate because doing it manually is not repeatable. We test to prove that the code is not fit for release, and we block the release if any of the tests fail, until you figure out why it failed and fix the underlying problem.
The history of testing tells us that tests fundamentally fall into two categories.
Type 1 tests are deterministic and repeatable, and provide a gold-standard signal: if one fails, it should block the release.
Type 2 tests are non-deterministic, flaky, and a pain to trace back to the source of the problem. They will usually pass when you rerun them a couple of times.
Due to the different failure modes, you can easily partition your tests into these two types, blocking on Type 1 and not blocking on Type 2.
You should do everything you can to migrate code coverage from Type 2 tests to Type 1, including refactoring the code so that less of it falls under the Type 2 scope. The reason is simple: Type 2 tests spot heisenbugs, which will eventually find a way to come back and bite you. As such, they provide a valuable source of data about the risks in your code, but they should only be allowed to exist while you search for the source of the bug. Long-standing Type 2 tests are therefore a really big red flag, though every piece of non-trivial code has some.
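One way to make that partition concrete, sketched here with pytest markers (just one possible mechanism, and the marker name is my own): tag the known-flaky tests so that only the deterministic suite gates the release.

```python
# pytest.ini (register the custom marker so pytest doesn't warn about it):
#   [pytest]
#   markers = flaky_type2: non-deterministic test, quarantined, does not block release

import pytest

def test_order_total_is_deterministic():
    # Type 1: same input, same answer, every time -- a failure blocks the release.
    assert sum([2, 2]) == 4

@pytest.mark.flaky_type2
def test_depends_on_an_external_service():
    # Type 2: timing / network dependent, so it is tracked and investigated,
    # but not allowed to block the release while the root cause is hunted down.
    ...
```

The blocking CI stage then runs pytest -m "not flaky_type2", and a separate non-blocking stage runs pytest -m flaky_type2, which keeps the long-standing Type 2 tests visible while you work on moving that coverage across.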