Comments by "" (@grokitall) on "ThePrimeTime" channel.
-
Absolutely right. Unit tests do automated regression testing of the public API of your code, asserting input/output combinations to provide an executable specification of that API. When well named, the value of these tests is as follows:
1, Because they test only one thing, generally they are individually blindingly fast.
2, when named well, they are the equivalent of executable specifications of the API, so when it breaks you know what broke, and what it did wrong.
3, they are designed to black box test a stable public API, even if you just started writing it. Anything that relies on private APIs is not a unit test.
4, they prove that you are actually writing code that can be tested, and when written before the code, also prove that the test can fail.
5, they give you examples of code use for your documentation.
6, they tell you about changes that break the API before your users have to.
Points 4 and 6 are actually why people like tdd. Point 2 is why people working in large teams like lots of unit tests.
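As a rough illustration of points 1 and 2, a well named unit test might look something like this in C (account_withdraw and its behaviour are invented for the example):
```c
#include <assert.h>

/* account_withdraw is invented for the example: it returns the new
   balance, or -1 if the withdrawal is rejected */
static int account_withdraw(int balance, int amount)
{
    return (amount > balance) ? -1 : balance - amount;
}

/* the test names double as the executable specification: when one
   fails, the name tells you what broke and what it did wrong */
static void test_withdraw_more_than_balance_is_rejected(void)
{
    assert(account_withdraw(100, 150) == -1);
}

static void test_withdraw_within_balance_returns_the_remainder(void)
{
    assert(account_withdraw(100, 40) == 60);
}

int main(void)
{
    test_withdraw_more_than_balance_is_rejected();
    test_withdraw_within_balance_returns_the_remainder();
    return 0;
}
```
Each test is a single assert against the public API, so it runs in microseconds and reads as a specification of one behaviour.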
Everyone I have encountered who does not like tests, thinks they are fragile, hard to maintain, and otherwise a pain, and who was willing to talk to me about why, usually turned out to be writing hard to test code, with tests at too high a level, and often had code with one of many bad smells about it. Examples included constantly changing public APIs, overuse of global variables, brain functions or non-deterministic code.
the main outputs of unit testing are code that you know is testable, tests that you know can fail, and knowing that your API is stable. As a side effect of this, it pushes you away from coding styles which make testing hard, and discourages constantly changing published public APIs. A good suite of unit tests will let you completely throw away the implementation of the API, while letting your users continue to use it without problems. It will also tell you how much of the reimplemented code has been completed.
A small point about automated regression tests. Like trunk based development, they are a foundational technology for continuous integration, which in turn is foundational to continuous delivery and DevOps, so not writing regression tests fundamentally limits quality on big, fast moving projects with lots of contributors.
-
no, waterfall was not a thought experiment, it was something that emerged from repeatedly discovering that we also need to do x, then just tacking it on the end.
the original paper said "this is what people are doing, and here is why it is a really bad idea".
people then took the diagram from the original paper and used it as a how to document.
the problem with it is that it cannot work for 60% of projects, and does not work very well for a lot of the others.
they tried fixing it by making it nearly impossible to change the specs after the initial stage, and while it made sure projects got built, it ended up with a lot of stuff which was delivered obsolete due to the world changing and the specs staying the same.
the agile manifesto came about in direct response to this, saying: here is what we currently do, but if we do this instead it works better, making incremental development work for that other 60%.
-
i think he is mainly wrong about just about everything here.
there is a lot of reimplementation in open source, as people want open source support for preexisting filetypes and apis, but this is useful.
technical quality matters, otherwise you end up with bit rot.
people who can do cool things is not the same as people who can explain cool things and otherwise communicate well.
most of the rest of it sounds like someone who sends in a 90k pull request then feels butt hurt that nobody is willing to take it without review, when it is just too big to review.
20 years ago we had open source developed with authorised contributors, and it just caused way too many problems, which is why we now do it with distributed version control and very small pull requests. this also removes most of the "how dare he criticise me" attitude, due to being less invested in a 10 line patch than a 10k patch.
also, text does not transmit subtlety very well, and when it does, most people miss it. this leads to unsubtle code criticism, to maintain the engineering value.
-
the issue of what makes code bad is important, and has to do with how much of the complexity of the code is essential vs accidental. obviously some code has more essential complexity than others, but this is exactly when you need to get a handle on that complexity.
we have known since brooks wrote the mythical man month back in the 1970s that information hiding matters, and every new development in coding has reinforced the importance of this, which is why abstraction is important, as it enables this information hiding.
oop, functional programming, tdd, and refactoring all build on top of this basic idea of hiding the information, but in different ways, and they all bring something valuable to the table.
when you have been in the industry for even a short while, you soon encounter a few very familiar anti patterns: spaghetti code, the big ball of mud, and the worst one, the piece of snowflake code that everyone is afraid to touch because it will break.
all of these are obviously bad code, and are full of technical debt, and the way to deal with them is abstraction, refactoring, and thus testing.
given your previously stated experience with heavily ui dependent untestable frameworks, therefore requiring heavy mocking, i can understand your dislike of testing, but that is due to the fact that you are dealing with badly designed legacy code, and fragile mocking is often the only way to start getting a handle on legacy code.
i think we can all agree that trying to test legacy code sucks, as it was never designed with testing or lots of other useful things in mind.
lots of the more advanced ideas in programming start indirectly from languages where testing was easier, and looked at what made testing harder than it needed to be, then adopted a solution to that particular part of the problem.
right from the start of structured programming, it became clear that naming mattered, and that code reuse makes things easier, first by using subroutines more, then by giving them names, and letting them accept and return parameters.
you often ended up with a lot of new named predicates, which were used throughout the program. these were easy to test, and by moving them into well named functions it made the code more readable. later this code could be extracted out into libraries for reuse across multiple programs.
this led directly to the ideas of functional programming and extending the core language to also contain domain specific language code.
later, the realisation that adding an extra field broke apis a lot led to the idea of structs, where there is a primary key field and multiple additional fields. when a struct is passed to functions, adding a new field makes no difference to the api, which made them really popular.
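a minimal sketch of that point, with an invented order struct: because callers pass the struct, adding a field later changes neither the function signature nor any call site.
```c
#include <stdio.h>

struct order {
    int id;         /* the primary key field */
    int quantity;
    int unit_price;
    /* adding, say, "int discount;" here later changes neither the
       signature of order_total() nor any of the code that calls it */
};

static int order_total(const struct order *o)
{
    return o->quantity * o->unit_price;
}

int main(void)
{
    struct order o = { 1, 3, 250 };
    printf("total: %d\n", order_total(&o));
    return 0;
}
```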
often these functions were so simple that they could be fully tested, and because they were moved to external libraries, those tests could be kept and reused. this eventually led to opdyke and others finding ways to handle technical debt which should not break good tests. this came to be known as refactoring.
when the test breaks under refactoring, it usually means one of two things:
1, you were testing how it did it, breaking information hiding.
2, your tool's refactoring implementation is broken, as a refactoring by definition does not change the functional behaviour of the code, and thus should not break the test.
when oop came along, instead of working from the program structure end of the problem, it worked on the data structure side, specifically by taking the structs, adding in the struct specific functions, and calling the result classes and methods.
again when done right, this should not break the tests.
with the rise of big code bases, and recognition of the importance of handling technical debt, we end up with continuous integration handling the large number of tests and yelling at us when doing something over here broke something over there.
ci is just running all of the tests after you make a change, to demonstrate that you did not break any of the code under test when you made a seemingly unrelated change.
tdd just adds an extra refactoring step to the code and test cycle, to handle technical debt, and make sure your tests deal with what is being tested, rather than how it works.
cd just goes one step further and adds acceptance testing on top of the functional testing from ci to make sure that your code not only still does what it did before, but has not made any of the non functional requirements worse.
testing has changed a lot since the introduction of ci, and code developed test first is much harder to write in a way that contains the more prominent anti patterns.
-
I think llvm did not do itself any favours. Originally it used gcc as its backend to provide support for multiple triples, but later defined them in an incompatible way. Seems silly to me.
It has long been possible for compiler writers to define int64 and int32 for when it matters and let the programmer use int when it does not matter for portability. The compiler writer should then use the default sizes for the architecture, rather than just using int.
At abi implementation time, it matters, so there should not be any "it depends" values in any abi implementation.
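This is exactly the split C has offered since C99 via stdint.h; a small sketch of the idea (the struct is invented for the illustration):
```c
#include <stdint.h>
#include <stdio.h>

/* where the width is part of the abi (file formats, wire formats,
   exported structs), spell it out instead of writing int and hoping */
struct record_header {
    uint32_t length;   /* exactly 32 bits on every platform */
    int64_t  offset;   /* exactly 64 bits on every platform */
};

int main(void)
{
    /* plain int is fine for a loop counter, where "whatever is natural
       for this architecture" is exactly what you want */
    for (int i = 0; i < 3; i++)
        printf("record %d, header size %zu bytes\n", i, sizeof(struct record_header));
    return 0;
}
```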
Of course that case mentioned is not the only time the glibc people broke the abi.
I think it was the version 5 to 6 update, where they left the parts that worked like c the same but broke the parts that worked like c++, yet did not declare it as a major version bump, so every c program still worked, but every c++ library had to be recompiled, as did anything which used the c++ abis.
Another instance of full recompile required, and it has become obvious that the glibc authors don't care about breaking users programs.
-
@noblebearaw actually, the bigger problem is that black box statistical ai has the issue that even though it might give you the right answer, it might do so for the wrong reason. there was an early example where they took photos of a forest with and without tanks hiding in it, and it worked. they then went back and took more photos of camouflaged tanks, and it didn't work at all. they managed to find out why, and the system had learned that the tank photos were taken on a sunny day, and the no tank photos were taken on a cloudy day, so the model learned how to spot sunny vs cloudy forest pics.
while the tech has improved massively, because statistical ai has no model except likelihood, it has no way to know why the answer was right, or to fix it when it is found to get the answers wrong. white box symbolic ai works differently, creating a model, and using the knowledge graph to figure out why the answer is right.
-
ci came from the realisation, going back to the original paper from the 70s, that the waterfall development model, while common, was fundamentally broken, and agile realised that to fix it you had to move things that appear late in the process to an earlier point, hence the meme about shift left.
the first big change was to implement continuous backups, now referred to as version control.
another big change was to move tests earlier, and ci takes this to the extreme by making them the first thing you do after a commit.
these two things together mean that your fast unit tests find bugs very quickly, and the version control lets you figure out where you broke it.
this promotes the use of small changes to minimise the differences in patches, and results in your builds being green most of the time.
long lived feature branches subvert this process, especially when you have multiple of them, and they go a long time between merges to the mainline (which you say you rebase from).
specifically, you create a pattern of megamerges, which get bigger the longer the delay. also, when you rebase, you are only merging the completed features into your branch, while leaving all the stuff in the other megamerges in their own branch.
this means when you finally do your megamerge, while you probably don't break mainline, you have the potential to seriously break any and all other branches when they rebase, causing each of them to have to dive into your megamerge to find out what broke them.
as a matter of practice it has been observed time and again that to avoid this you cannot delay merging all branches for much longer than a day, as it gives the other branches time to break something else, resulting in the continual red build problem.
-
@alst4817 my point about black box ai is not that it cannot be useful, but that due to the black box nature it is hard to have confidence that the answer is right and is anything more than coincidence, and the most you can get from it is a probability for how plausible the answer is.
this is fine in some domains where that is good enough, but completely rules it out for others where the answer needs to be right, and the reasoning chain needs to be available.
i am also not against the use of statistical methods in the right place. probabilistic expert systems have a long history, as do fuzzy logic expert systems.
my issue is the way these systems are actually implemented. the first problem is that lots of them work in a generative manner.
using the yast config tool of suse linux as an example, it is a very good tool, but only for the parameters it understands. at one point in time, if you made any change using this tool, it regenerated every file it knew about from its internal database, so if you needed to set any unmanaged parameters in any of those files, you then could not use yast at all, or your manual change would disappear.
this has the additional disadvantage that now those managed config files are not the source of truth, this is hidden in yasts internal binary database.
it also means that using version control on any of those files is pointless as the source of truth is hidden, and they are now generated files.
as the code's behaviour is controlled by the options in those config files, they should stay in text format and be version controlled, and any tool that manipulates them should update only the fields it understands, and only in files which have changed parameters.
similarly, these systems are not modular, instead being implemented as one big monolithic black box, which cannot be easily updated. this project is being discussed in a way that suggests that they will just throw lots of data at it and see what sticks. this approach is inherently limited. when you train something like chatgpt, where you do not organise the data, and let it figure out which of the 84000 free variables it is going to use to hallucinate a plausible answer, you are throwing away most of the value in that data, which never makes it into the system.
you then have examples like copilot, where having trained on crap code, it on average outputs crap code. some of the copilot like coding assistants are actually worse, where they replace the entire code block with a completely different one, rather than just fixing the bug, making a mockery of version control, and a lot of the time this code then does not even pass the tests the previous code passed.
then we have the semantic mismatch between the two languages. between any two languages, natural or synthetic, there is not an identity of function. some things can't be done at all in one language, and some stuff which is simple in one language can be really hard in another one. only symbolic ai has the rich model needed to understand this.
my scepticism about this is well earned, with lots of ai being ever optimistic to begin with, and then plateauing with no idea what to do next. i expect this to be no different, with it being the wrong problem, with a bad design, badly implemented. i wish them luck, but am not optimistic about their chances.
-
ada was commissioned because lots of government projects were being written in niche or domain specific languages, resulting in lots of mission critical software which was in effect write only code, but still had to be maintained for decades. the idea was to produce one language which all the code could be written in, killing the maintainability problem, and it worked.
unfortunately exactly the thing which made it work for the government kept it from more widespread adoption.
first, it had to cover everything from embedded to ai, and literally everything else. this required the same functions to be implemented in multiple ways, as something that works on a huge and powerful ai workstation with few time constraints needs to be different from a similar function in an embedded, resource limited and time critical setting.
this makes the language huge, inconsistent, and unfocused. it also made it a pain to implement the compiler, as you could not release it until absolutely everything had been finalised, and your only customers were government contractors, meaning the only way to recover costs was to sell it at a very high price, and due to the compiler size, it would only run on the most capable machines.
and yes, it had to be designed by committee, due to the kitchen sink design requirement. fulfilling its design goal of being good enough for coding all projects required experts on the requirements for all the different problem types, stating that x needs this function to be implemented like this, but y needs it implemented like that, and the two use cases were incompatible for these reasons.
rather than implementing the language definition so you could code a compiler for ada embedded, and a different one for ada ai, they put it all in one badly written document which really did not distinguish the use case specific elements, making it hard to implement, hard to learn, and just generally a pain to work with. it also was not written with the needs of compiler writers in mind.
also, because of the scope of the multiple use cases baked into the language design, it took way too long to define, and due to the above mentioned problems, even longer to implement. other simpler languages had already come along in the interim and taken over a lot of the markets the language was meant to cover, making it an also-ran for those areas outside of mandated government work.
-
There is a reason people use simple coding katas to demonstrate automated regression testing, tdd, and every other new method of working. Every new AI game technology starts with tic tac toe, moves on to checkers, and ends up at chess or go. It does this because the problems start out simple and get progressively harder, so you don't need to understand a new complex problem as well as a new approach to solving it.
Also, the attempt to use large and complex problems as examples has been proven not to work, as you have so much attention going on the problem that you muddy attempts to understand the proposed solution.
Also, there is a problem within a lot of communities that they use a lot of terminology in specific ways that differ from general usage, and different communities use that terminology to mean different things, but to explain new approaches you need to understand how both communities use the terms and address the differences, which a lot of people are really bad at.
-
@chudchadanstud like ci, unit testing is simple in principle. when i first started doing it i did not have access to a framework, and every test was basically a standalone program with main and a call, which then just returned a pass or fail, stored in either unittest or integrationtest directories, with a meaningful name so that when it failed the name told me how it failed, all run from a makefile.
each test was a functional test, and was run against the public api. i even got the expected return values i did not know by writing the test to always fail and print the result, and then verifying that it matched the result next time. when a new library was being created because the code would be useful in other projects, it then had the public api in the header file, and all of the tests were compiled and linked against the library, and all had to pass for the library to be used.
all of this done with nothing more than a text editor, a compiler, and the make program. this was even before version control took off. version control and a framework can help, but the important part is to test first, then add code to pass the test, then if something breaks, fix it before you do anything else.
remember, you are calling your public api, and checking that it returns the same thing it did last time you passed it the same parameters. you are testing what it does, not how fast, or how much memory it uses, or any other non functional property. what you get in return is a set of self testing code which you know works the same, because it still returns the same values. you also get for free an executable specification using the tests and the header file, so if you wished you could throw away the library code and use the tests to drive the rewrite to the same api.
but it all starts with test first, so that you don't write untestable code in the first place.
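a minimal sketch of that style, with an invented parse_price function standing in for the library's public api: one standalone program per test, no framework, pass or fail reported through the exit code so make can run it and stop on the first failure.
```c
#include <stdio.h>

/* stand-in for the library's public api; in practice this would come
   from the library's header and be linked in from the library itself */
static int parse_price(const char *text)
{
    int pounds = 0, pence = 0;
    sscanf(text, "%d.%d", &pounds, &pence);
    return pounds * 100 + pence;
}

int main(void)
{
    /* 1234 was captured by printing the result on the first run,
       then locked in so any change in behaviour fails the test */
    int got = parse_price("12.34");
    if (got != 1234) {
        printf("unittest/parse_price_handles_pounds_and_pence: FAIL (got %d)\n", got);
        return 1;   /* non zero exit code makes the makefile stop here */
    }
    printf("unittest/parse_price_handles_pounds_and_pence: PASS\n");
    return 0;
}
```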
-
@TimothyWhiteheadzm for airlines it is the knock on effects which kill you. say you cannot have the passengers board the plane.
at this point, you need to take care of the passengers until you can get them on another flight.
this might involve a couple of days staying at a hotel.
then the flight does not leave. at this point neither the plane or the pilots are going to be in the right place for the next flights they are due to take. as some of these pilots will be relief crew for planes where the crew are nearing their flight time limit, that plane now cannot leave either, so now you have to do the same with their passengers as well.
in the case of delta airlines, it went one step further, actually killing the database of which pilots were where, and you could not start rebuilding it from scratch until all the needed machines were back up and running.
the lawsuit from delta alone is claiming 500 million in damages, targeting crowdstrike for taking down the machines, and microsoft for not fixing the boot loop issue which caused them to stay down.
i know of 5 star hotels which could not check guests in and out, and of public house chains where no food or drinks could be sold for the entire day, as the ordering and payment systems were both down, and they had no on site technical support.
i am sure the damages quoted will turn out to be under estimates.
-
There is a lot of talking past each other and marketing inspired misunderstanding of terminology going on here, so I will try and clarify some of it.
When windows 95 was being written in 1992, every developer had a fork of the code, and developed their part of windows 95 in total isolation. Due to networking not really being a thing on desktop computers at the time, this was the standard way of working.
After 18 months of independent work, they finally started trying to merge this mess together, and as you can imagine the integration hell was something that had to be seen to be believed. Amongst other things, you had multiple cases where one developer needed some code and wrote it for his fork, while another developer did the same, but in an incompatible way. This led to there being multiple incompatible implementations of the same basic code in the operating system.
At the same time, they did not notice either the rise of networking or its importance, so it had no networking stack until somebody asked Bill Gates about networking in windows 95, at which point he basically took the open source networking stack from bsd Unix and put it into windows.
This release of a network enabled version of windows and the endemic use of networking on every other os enabled the development of centralised version control, and feature branches were just putting these forks into the same repository, without dealing with the long times between integrations, and leaving all the resulting problems unaddressed.
If you only have one or two developers working in their own branches this is an easily mitigated problem, but as the numbers go up, it does not scale.
These are the long lived feature branches which both Dave and primagen dislike. It is worth noting that the hp laser jet division was spending 5 times more time integrating branches than it was spending developing new features.
Gitflow was one attempt to deal with the problem, which largely works by slowing down the integration of code, and making sure that when you develop your large forks, they do not get merged until all the code is compatible with trunk. This leads to races to get your large chunk of code into trunk before someone else does, forcing them to suffer merge hell instead of you. It also promotes rushing to get the code merged when you hear that someone else is close to merging.
Merging from trunk helps a bit, but fundamentally the issue is with the chunks being too big, and there being too many of them, all existing only in their own fork.
With the rise in the recognition of legacy code being a problem, and the need for refactoring to deal with technical debt, it was realised that this did not work, especially as any refactoring work which was more than trivial made it more likely that the merge could not be done at all. One project set up a refactoring branch which had 7 people working on it for months, and when it was time to merge it, the change was so big that it could not be done.
An alternative approach was developed called continuous integration, which instead of slowing down merges was designed to speed them up. It recognised that the cause of merge hell was the size of the divergence, and thus advocated for the reduction in size of the patches, and merging them more often. It was observed that as contributions got faster, manual testing did not work, requiring a move from the ice cream cone model of testing used by a lot of Web developers towards the testing pyramid model.
Even so, it was initially found that the test suite spent most of its time failing, due to the amount of legacy code and the fragility of tests written for legacy code, which led to a more tests required and test first mode of working, which moves the shape of the code away from being shaped like legacy code and into a shape which is designed to be testable.
One rule introduced was that if the build breaks, the number one job of everyone is to get it back to passing all of the automated tests. Code coverage being good enough was also found to be important.
Another thing that was found is that once you started down the route of keeping the tests green, there was a maximum delay between merges which did not adversely affect this, which turned out to be about one day.
Testing became increasingly important, and slow test times were dealt with the same way slow build times were, by making the testing incremental. So you made a change, only built the bit which it changed, ran only those unit tests which were directly related to it, and once it passed, built and tested the bits that depended on it.
Because the code was all in trunk, refactoring did not usually break the merge any more, which is the single most important benefit of continuous integration: it lets you deal with technical debt much more easily.
Once all of the functional tests (both unit tests and integration tests) pass, which should happen within no more than 10 minutes, and preferably less than 5, you now have a release candidate which can then be handed over for further testing. The idea is that every change should ideally be able to go into this release candidate, but some bigger features are not ready yet, which is where feature flags come in. They replace branches with long lived unmerged code by a flag which hides the feature from the end user.
Because your patch takes less than 15 minutes from creation to integration, this is not a problem. The entire purpose of continuous integration is to try to prove that the patch you submitted is not fit for release, and if so, it gets rejected and you get to have another try, but as it is very small, this also is not really a problem. The goal is to make integration problems basically a non event, and it works.
The functional tests show that the code does what the programmer intended it to do. At this point it enters the deployment pipeline described in continuous delivery. The job of this is to run every other test needed, including acceptance tests, whose job is to show that what the customer intended and what the programmer intended match. Again the aim is to try to prove that the release candidate is not fit to be released.
In the same way that continuous delivery takes the output from continuous integration, continuous deployment takes the output from continuous delivery and puts it into a further pipeline designed to take the rolling release product of continuous delivery and put it through things like canary releasing so that it eventually ends up in the hands of the end users.
Again it is designed to try it out, and if problems are found, stop them from being deployed further. This is where crowdstrike got it wrong so spectacularly. In the worst case, you just roll back to the previous version, but at all stages you do the fix on trunk and start the process again, so the next release is only a short time away, and most of your customers will never even see the bug.
This process works even at the level of doing infrastructure as a service, so if you think that your project is somehow unique, and it cannot work for you, you are probably wrong.
Just because it can be released, delivered, and deployed, it does not mean it has to be. That is a business decision, but that comes back to the feature flags. In the meantime you are using feature flags to do dark launching, branch by abstraction to move between different solutions, and enabling the exact same code to go to beta testers and top tier users, just without some of the features being turned on.
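At its simplest, a feature flag is just a runtime check around the new code path. A minimal sketch (the flag name and the environment variable lookup are invented; a real system would more likely ask a config service or per user targeting rules):
```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* invented lookup: real systems usually consult a config service or a
   per user targeting rule rather than an environment variable */
static bool feature_enabled(const char *name)
{
    const char *flags = getenv("ENABLED_FEATURES");
    return flags != NULL && strstr(flags, name) != NULL;
}

int main(void)
{
    /* the new code ships to everyone, merged into trunk, but only
       beta testers and top tier users have the flag turned on */
    if (feature_enabled("new_checkout"))
        printf("using the new checkout flow\n");
    else
        printf("using the old checkout flow\n");
    return 0;
}
```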
-
@phillipsusi1791 it is entirely about code churn. every branch is basically a fork of upstream (the main branch in centralised version control). the problem with forks is that the code in them diverges, and this causes all sorts of problems with incompatible changes.
one proposed solution to this is to rebase from upstream, which is intended to sort out the problem of your branch not being mergable with upstream, and to an extent this works if the implicit preconditions for doing so are met.
where it falls over is with multiple long lived feature branches which don't get merged until the entire feature is done. during the lifetime of each branch, you have the potential for code in any of the branches to produce incompatible changes with any other branch. the longer the code isn't merged and the bigger the size of the changes, the higher the risk that the next merge will break something in another branch.
The only method found to mitigate this risk is continuous integration, and the only way this works is by having the code guarded by regression tests, and having everybody merge at least once a day. without the tests you are just hoping nobody broke anything, and if the merge is less often than every day, the build from running all the tests has been observed to be mostly broken, thus defeating the purpose of trying to minimise the risks.
the problem is not with the existence of the branch for a long period of time, but with the risk profile of many branches which don't merge for a long time. also, because every branch is a fork of upstream, any large scale changes like refactoring the code are by definition not fully applied to the unmerged code, potentially breaking the completeness and correctness of the refactoring.
this is why people doing continuous integration insist on at worst daily merges with tests which always pass. anything else just does not mitigate the risk that someone in one fork will somehow break things for either another fork, or for upstream refactorings.
it also prevents code sharing between the new code in the unmerged branches, increasing technical debt, and as projects get bigger, move faster, and have more contributors, this problem of unaddressed technical debt grows extremely fast. the only way to address it is with refactoring, which is the additional step added to test driven development, and which is broken by long lived branches full of unmerged code.
this is why all the tech giants have moved to continuous integration, to handle the technical debt in large codebases worked on by lots of people, and it is why feature branching is being phased out in favour of merging and hiding the new feature behind a feature flag until it is done.
-
The best way to answer is to look how it works with linus Torvalds branch for developing the Linux kernel. Because you are using version control, your local copy is essentially a branch, so you don't need to create a feature branch.
You make your changes in main, which is essentially a branch of Linus's branch, add your tests, and run all of the tests. If this fails, fix the bug. If it works, rebase and quickly rerun the tests, then push to your online repository. This then uses hooks to automatically submit a pull request, and linus gets a whole queue of them, which are then applied in the order in which they came in.
When it is your turn, either it merges ok and becomes part of everyone else's next rebase, or it doesn't, the pull is rejected, linus moves on to the next request, and you get to go back, do another rebase and test, and push your new fixes back up to your remote copy, which will then automatically generate another pull request. Repeat the process until it merges successfully, and then your local system is a rebased local copy of upstream.
Because you are writing small patches, rather than full features, the chances of a merge conflict are greatly reduced, often to zero if nobody else is working on the code you changed. It is this which allows the kernel to get new changes every 30 seconds all day every day.
Having lots of small fast regression tests is the key to this workflow, combined with committing every time the tests pass, upstreaming with every commit, and having upstream do ci on the master branch.
-
In principle, I agree that all code should be that clean, but that means that there are a bunch of string functions you must not use because msvc uses a different function than gcc for the same functionality.
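For example, MSVC deprecates several of the classic string functions in favour of its _s variants, so portable code often hides the difference behind one small wrapper. A sketch of the idea (copy_string is invented for the illustration):
```c
#include <stdio.h>
#include <string.h>

/* one wrapper, so the rest of the code never cares which compiler it is built with */
static void copy_string(char *dst, size_t dst_size, const char *src)
{
#ifdef _MSC_VER
    strcpy_s(dst, dst_size, src);      /* msvc's "secure" variant */
#else
    strncpy(dst, src, dst_size - 1);   /* plain c library elsewhere */
    dst[dst_size - 1] = '\0';
#endif
}

int main(void)
{
    char buf[16];
    copy_string(buf, sizeof buf, "portable");
    printf("%s\n", buf);
    return 0;
}
```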
In practice, people write most code on a specific machine with a specific tool chain, and have a lot of it. Having to go and fix every error right away because the compiler writer has made a breaking change is, in effect, a bug. So is an optimisation where a test breaks only because optimisation is turned on.
In this case, what happened is that a minor version update introduced a breaking change to existing code, and instead of having it as an option you could enable, made it the default.
How most compilers do this is they wrap these changes in a feature flag, which you can then enable.
On the next major version, they enable it by default when you do the equivalent of -Wall, but let you disable it.
On the one after that it becomes the default, but you can still override it for legacy code which has not been fixed yet.
Most programmers live in this latter world, where you only expect the compiler to break stuff on a major version bump, and you expect there to be a way to revert to the old behaviour for legacy code.
-
it is not business ethics which require the shift in your company policy, but the resiliency lessons learned after 9/11 which dictate it.
many businesses with what were thought to be good enough plans had them fail dramatically when faced with the loss of the data centers duplicated between the twin towers, the loss of the main telephone exchange covering a large part of the city, and being locked out of their buildings until the area was safe while their backup diesel generators had their air intake filters clog and thus the generator fail due to the dust.
for the businesses it did not kill, recovery times were often on the order of weeks to get access to their equipment, and months to get back to the levels they were at previously, directly leading to the rise of chaos engineering to identify and test systems for single points of failure and for graceful degradation and recovery, as seen with the simian army of tools at netflix.
load balancing against multiple suppliers across multiple areas is just a mitigation strategy against single points of failure, and in this case the bad actors at cloudflare were clearly a single point of failure.
with a good domain name registrar, you can not only add new nameservers, which i would have done as part of looking for new providers, but you can shorten the time that other people looking up your domain cache the name server entries to under an hour, which i would have also done as soon as potential new hosting was being explored and trialed.
as long as your domain registrar is trustworthy, and you practice resiliency, the mitigation could have been really fast. changing the name server ordering could have been done as soon as they received the 24 hour ransom demand, giving time for the caches to move and making the move invisible for most people.
not only did they not do that, or have any obvious resiliency policy, but they also built critical infrastructure around products from external suppliers without any plan for what to do if there was a problem.
clearly cloudflare's behaviour was dodgy, but the casino shares some of the blame for being an online business with insufficient plans for how to stay online.
-
@373323 there are a number of companies who were running n-1 or n-2 versions of the driver, which crowdstrike supports, but the issue here is that it was company policy, as stated by the ceo, to immediately push the signature files out to everyone in one go, without further testing.
the information from crowdstrike is that the engineer in question picked up an untested template, modified it for the case in hand, ran a validator program against it which had not been updated to cover that template (and thus should have failed it), and once that passed, picked up the files, and shipped them out to everyone with no further testing, as per company policy.
it then took them 90 minutes to spot that there was a problem, and do a 2 minute fix to roll back the update to stop the rollout and fix any machines with the bad update that had not yet rebooted.
it took them 6 hours from the rollout to have a solution to the problem of how to fix the rebooted machines, but it only really worked on basic desktops which did not need security.
at least one company reported spending 15 hours manually rebooting and fixing 40,000 machines. some were worse.
-
we now know what should have happened, and what actually happened, and they acted like amateurs.
first, they generated the file, which went wrong.
then they did the right thing, and ran a home built validator against it, but not as part of ci.
then after passing the validation test they built the deliverable.
then they shipped it out to 8.5 million mission critical systems with no further testing whatsoever which is a level of stupid which has to be seen to be believed.
this then triggered some really poor code in the driver, crashing windows, and their setting it into boot critical mode caused the whole thing to go into the boot loop.
this all could have been stopped before it even left the building.
after validating the file, you should then continue on with the other testing just like if you had changed the file. this would have caught it.
having done some tests, and created the deployment script, you could have installed it on test machines. this also would have caught it.
finally, you start a canary release process, starting with putting it on the machines in your own company. this also would have caught it.
if any of these steps had been done it would never have got out the door, and they would have learned a few things.
1, their driver was rubbish and boot looped if certain things went wrong. this could then have been fixed so it will never boot loop again.
2, their validator was broken. this could then have been fixed.
3, whatever created the file was broken. this could also have been fixed.
instead they learned different lessons.
1, they are a bunch of unprofessional amateurs.
2, their release methodology stinks.
3, shipping without testing is really bad, and causes huge reputational damage.
4, that damage makes the share price drop off a cliff.
5, it harms a lot of your customers, some with very big legal departments and a will to sue. some lawsuits are already announced as pending.
6, lawsuits hurt profits. we just don't know how bad yet.
7, hurting profits makes the share price drop even further.
not a good day to be crowdstrike.
some of those lawsuits could also target microsoft for letting the boot loop disaster happen, as this has happened before, and they still have not fixed it.
-
the main problem here is that prime and his followers are responding to the wrong video. this video is aimed at people who already understand 10+ textbooks worth of stuff with lots of agreed upon terminology, and is aimed at explaining to them why the tdd haters don't get it, most of which comes down to the fact that the multiple fields involved build on top of each other, and the haters don't actually share the same definitions for many of the terms, or of the processes involved.
in fact in a lot of cases, especially within this thread, the definitions the commentators use directly contradict the standard usage within the field.
in the field of testing, testing is split into lots of different types, including unit testing, integration testing, acceptance testing, regression testing, exploratory testing, and lots of others.
if you read any textbook on testing, a unit test is very small, blindingly fast, does not usually include io in any form, and does not usually include state across calls or long involved setup and teardown stages.
typically a unit test will only address one line of code, and will be a single assert that when given a particular input, it will respond with the same output every time. everything else is usually an integration test. you will then have a set of unit tests that provide complete coverage for a function.
this set of unit tests is then used as regression tests to determine if the latest change to the codebase has broken the function by virtue of asserting as a group that the change to the codebase has not changed the behaviour of the function.
pretty much all of the available research says that the only way to scale this is to automate it.
tdd uses this understanding by asserting that the regression test for the next line of code should be written before you write that line of code, and because the tests are very simple and very fast, you can run them against the file at every change and still work fast. because you keep them around, and they are fast, you can quickly determine if a change in behaviour in one place broke behaviour somewhere else, as soon as you make the change. this makes debugging trivial, as you know exactly what you just changed, and because you gave your tests meaningful names, you know exactly what that broke.
continuous integration reruns the tests on every change, and runs both unit tests and integration tests to show that the code continues to do what it did before, nothing more. this is designed to run fast, and fail faster. when all the tests pass, the build is described as being green.
when you add the new test, but not the code, you now have a failing test, and the entire build fails, showing that the system as a whole is not ready to release, nothing more. the build is then described as being red.
this is where the red-green terminology comes from, and it is used to show that the green build is ready to check in to version control, which is an integral part of continuous integration. this combination of unit and integration tests is used to show that the system does what the programmer believes the code should do. if this is all you do, you still accumulate technical debt, so tdd adds the refactoring step to manage and reduce technical debt.
refactoring is defined as changing the code in such a way that the functional requirements do not change, and this is tested by rerunning the regression tests to demonstrate that indeed the changes to the code have improved the structure without changing the functional behaviour of the code. this can be deleting dead code, merging duplicate code so you only need to maintain it in one place, or one of hundreds of different behaviour preserving changes in the code which improves it.
during the refactoring step, no functional changes to the code are allowed. adding a test for a bug, or to make the code do something more, happens at the start of the next cycle.
continuous delivery then builds on top of this by adding acceptance tests which confirm that the code does what the customer thinks it should be doing.
continuous deployment builds on top of continuous delivery to make it so that the whole system can be deployed with a single push of a button, and this is what is used by netflix for software, hp for printer development, tesla and spacex for their assembly lines, and lots of other companies for lots of things.
the people in this thread have conflated unit tests, integration tests and acceptance tests all under the heading of unit tests, which is not how the wider testing community uses the term. they have also advocated for the deletion of all regression tests based on unit tests. a lot of the talk about needing to know about the requirements in advance is based upon this idea that a unit test is a massive, slow, complex thing with large setup and teardown, but that is not how it is used in tdd. there you are only required to understand how to write the next line of code well enough that you can write a unit test for that line which will act as a regression test.
this appears to be where a lot of the confusion seems to be coming from.
in short, in tdd you have three steps (sketched in code below):
1, understand the needs of the next line of code well enough that you can write a regression test for it, write the test, and confirm that it fails.
2, write enough of that line that it makes the test pass.
3, use functionally preserving refactorings to improve the organisation of the codebase.
then go around the loop again. if during stages 2 and 3 you think of any other changes to make to the code, add them to a todo list, and then you can pick one to do on the next cycle. this expanding todo list is what causes the tests to drive the design. you do something extra for flakey tests, but that is outside the scope of tdd, and is part of continuous integration.
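a minimal sketch of one trip around that loop, using an invented word_count function, with the three steps marked in the comments:
```c
#include <assert.h>

/* step 2: just enough code to make the tests below pass (the first run
   was done against a stub returning -1, to prove the tests could fail) */
static int word_count(const char *s)
{
    int count = 0, in_word = 0;
    for (; *s; s++) {
        if (*s == ' ') {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;
            count++;
        }
    }
    return count;
}

int main(void)
{
    /* step 1: the regression tests, written first, pinning down behaviour */
    assert(word_count("") == 0);
    assert(word_count("two words") == 2);
    /* step 3 happens here: tidy up word_count without touching these
       asserts, rerun them, then pick the next item off the todo list */
    return 0;
}
```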
it should be pointed out that android and chromeos both use the ideas of continuous integration with extremely high levels of unit testing.
tdd fits naturally in this process, which is why so many companies using ci also use tdd, and why so many users of tdd do not want to go back to the old methods.
-
Code coverage is a measure of how happy you are to push crap to your users. The lower the percentage, the less you care about quality.
Anyone writing a test just to up the numbers will write a bad and fragile test, which they will then be required to debug and fix when it breaks.
Good testing only exercises the public API and tests it for stability. Everything else is either already covered by the API tests, is a sign of a missing test, or is dead code.
As to coverage vs code review, code review doesn't scale, especially in agile workflows, like continuous integration and trunk based development. At that point you need automated regression testing, which you can combine with a ratchet test to take advantage of improvements while not allowing regressions in your coverage numbers.
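A ratchet test can be as small as comparing the current number with the best one recorded so far. A sketch, assuming the coverage tool has already written the current percentage to coverage.txt and the baseline lives in coverage_baseline.txt (both file names are invented):
```c
#include <stdio.h>

/* fail the build if coverage drops, and move the baseline up when it improves */
int main(void)
{
    double current = 0.0, baseline = 0.0;
    FILE *cur = fopen("coverage.txt", "r");
    FILE *base = fopen("coverage_baseline.txt", "r");
    if (!cur || !base || fscanf(cur, "%lf", &current) != 1
                      || fscanf(base, "%lf", &baseline) != 1) {
        fprintf(stderr, "coverage ratchet: missing or unreadable numbers\n");
        return 1;
    }
    fclose(cur);
    fclose(base);
    if (current < baseline) {
        fprintf(stderr, "coverage regressed: %.1f%% < %.1f%%\n", current, baseline);
        return 1;
    }
    if (current > baseline) {   /* the ratchet only ever moves up */
        FILE *out = fopen("coverage_baseline.txt", "w");
        if (out) { fprintf(out, "%.1f\n", current); fclose(out); }
    }
    return 0;
}
```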
Remember you always start out with two guaranteed tests. 1, does it build without failing the build. 2, does it run the successfully built code without crashing. Anything better than that is an advantage you can build on top of.
-
Every branch is essentially forking the entire codebase for the project, with all of the negative connotations implied by that statement. In distributed version control systems, this fork is moved from being implicit in centralized version control to being explicit.
When two forks exist (for simplicity call them upstream and branch), there are only two ways to avoid having them become permanently incompatible. Either you slow everything down and make it so that nothing moves from the branch to upstream until it is perfect, which results in long lived branches with big patches, or you speed things up by merging every change as soon as it does something useful, which leads to continuous integration.
When doing the fast approach, you need a way to show that you have not broken anything with your new small patch. The way this is done is with small fast unit tests which act as regression tests against the new code; you write them before you commit the code for the new patch and commit them at the same time, which is why people using continuous integration end up with a codebase which has extremely high levels of code coverage.
What happens next is you run all the tests, and when they pass, it tells you it is safe to commit the change. This can then be rebased and pushed upstream, which then runs all the new tests against any new changes, and you end up producing a testing candidate which could be deployed, and it becomes the new master.
When you want to make the next change, as you have already rebased before pushing upstream, you can trivially rebase again before you start, and make new changes. This makes the cycle very fast, ensures that everyone stays in sync, and works even at the scale of the Linux kernel, which has new changes upstreamed every 30 seconds.
In contrast, the slow version works not by having small changes guarded by tests, but by having nothing moved to upstream until it is both complete and as perfect as can be detected. As it is not guarded by tests, it is not designed with testing in mind, which makes any testing slow and fragile, further discouraging testing, and is why followers of the slow method dislike testing.
It also leads to merge hell, as features without tests get delivered with a big code dump all in one go, which may then cause problems for those on other branches which have incompatible changes. You then have to spend a lot of time finding which part of this large patch with no tests broke your branch. This is avoided with the fast approach as all of the changes are small.
Even worse, all of the code in all of the long lived branches is invisible to anyone taking upstream and trying to do refactoring to reduce technical debt, adding another source of breaking your branch with the next rebase.
Pull requests with peer review add yet another source of delay, as you cannot submit your change upstream until someone else approves your changes, which can take tens to hundreds of minutes depending on the size of your patch. The fast approach replaces manual peer review with comprehensive automated regression testing which is both faster, and more reliable. In return they get to spend a lot less time bug hunting.
The unit tests and integration tests in continuous integration get you to a point where you have a release candidate which does all of the functions the programmer understood was wanted. This does not require all of the features to be enabled by default, only that the code is in the main codebase, and this is usually done by replacing the idea of the long lived feature branch with short lived (in the sense of between code merges) branches with code shipped but hidden behind feature flags, which also allows the people on other branches to reuse the code from your branch rather than having to duplicate it in their own branch.
Continuous delivery goes one step further, and takes the release candidate output from continuous integration and does all of the non functional tests to demonstrate a lack of regressions for performance, memory usage, etc and then adds on top of this a set of acceptance tests that confirm that what the programmer understood matches what the user wanted.
The output from this is a deployable set of code which has already been packaged and deployed to testing, and can thus be deployed to production. Continuous deployment goes one step further and automatically deploys it to your oldest load sharing server, and uses the ideas of chaos engineering and canary deployments to gradually increase the load taken by this server while reducing the load to the next oldest server until either it has moved all of the load from the oldest to the newest, or a new unspotted problem is observed, and the rollout is reversed.
Basically though all of this starts with replacing the slow long lived feature branches with short lived branches which causes the continuous integration build to almost always have lots of regression tests always passing, which by definition cannot be done against code hidden away on a long lived feature branch which does not get committed until the entire feature is finished.
-
it clearly stated that the first email was saying there was a problem affecting the network, and when they turned up it was a meeting with a completely different department, sales, and there was no problem. also, no mention of the enterprise offering being mandatory.
at that point i would return to my company and start putting resiliency measures in place to minimise exposure to cloudflare, with the intent to migrate but the option to stay if they were not complete dicks.
the second contact was about potential issues with multiple national domains, with a clear response that it is due to differing national regulations requiring that.
the only other issue mentioned was a potential tos violation which they refused to name, and an immediate attempt to force a contract with a 120k price tag with only 24 hours notice and a threat to kill your websites if you did not comply.
at this point i would then have immediately triggered the move.
on the legal view, they are obviously trying to force a contract, which others have said is illegal in the us where cloudflare has its hardware based. it is thus subject to those laws.
by only giving 24 hours from the time that they were informed it was mandatory, cloudflare is clearly guilty of trying to force the contract, and the casino is thus likely to win.
if they can win on that, then their threat to pull the plug on their business on short notice in pursuit of an illegal act also probably makes them guilty of tortious interference, for which they would definitely get actual damages, which would cover loss of business earnings, probably get reputational damages, probably get to include all the costs for having to migrate to new providers, and legal costs.
when i sued them, i would also go after not only cloudflare, but the entire board individually, seeking to make them jointly and severally liable, so that when they tried to delay payment, you could go after them personally.
the lesson is clear: for resiliency, always have a second supplier in the wings which you can move to on short notice, and have that move be a simple yes or no decision that can be acted upon immediately. by the same token, don't get overly reliant on external tools to allow the business to continue to be able to work to mitigate the disaster if it happens. also keep onsite backups of any business critical information.
most importantly, make sure you test the backups. at least one major business i know of did everything right including testing the backup recovery process, but kept the only copy of the recovery key file on the desktop of one machine in one office, with the only backup of this key being inside the encrypted backups.
this killed the business.
-
waterfall can work as described under exactly one scenario, when you know exactly what you need the specs to be, like in the nasa case mentioned in the video.
the problem is that this is amazingly expensive even when you can do it, and evidence going all the way back to the original paper points out that for something like 60% of projects even the customer does not know what they need, so it fails miserably.
also, it cannot cope well with changing requirements. this led to the myth of change control as a way to deal with those cases.
this resulted in products being built, but they were increasingly wrong, as you could not react to the changing needs.
when the manifesto was created, it was realised that it was not working, and that you needed to do incremental development, and the manifesto was a writeup of why it does not work, and what to do instead to work incrementally.
if you do it properly like the companies in the dora state of devops report, agile works, but too many companies don't even learn enough about agile to be able to tell if they are really doing it, resulting in some sort of mangled waterfall process.
1
-
1
-
@dominikvonlavante6113 i don't know of anyone seriously doing continuous integration who does not use the testing pyramid with high levels of code coverage.
as to tests higher up the pyramid being better than the testing pyramid, it depends how you define better.
years of testing research have told us conclusively a number of things:
1, the higher you go, the slower you get, often by orders of magnitude.
2, the higher you go, the more setup and tear down you need around the actual test to be able to do it.
3, the higher you go, the more levels of indirection between the fact that the test broke, and why it broke. for example if you are testing the middle module in a three module chain in an end to end fashion, and the test broke, was it because module 1 broke its output to module 2, that module 2 broke its transform code, or did module three break its input api?
as a consequence of these points, it is better to have thousands of unit tests against the stable api of your functions with known inputs and outputs than dozens of end to end tests which don't give you anything like as detailed feedback as "the change you just made caused this function to stop returning this value when given these parameters".
this is before you even consider the fragility of end to end tests and the difficulty of testing new legacy code which was not designed with testability in mind.
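as a minimal sketch of that kind of feedback at the bottom of the pyramid (the clamp function and the test names are invented purely for illustration), each test pins one known input to one known output, needs no setup or tear down, and so runs in microseconds:

```c
#include <assert.h>

/* hypothetical pure function with a stable public api:
   clamp a value into the range [lo, hi] */
static int clamp(int value, int lo, int hi) {
    if (value < lo) return lo;
    if (value > hi) return hi;
    return value;
}

/* one behaviour per test, named so that a failure tells you
   exactly which behaviour stopped holding */
static void clamp_returns_lower_bound_when_value_is_below_it(void) {
    assert(clamp(-5, 0, 10) == 0);
}

static void clamp_returns_value_unchanged_when_inside_range(void) {
    assert(clamp(7, 0, 10) == 7);
}

int main(void) {
    clamp_returns_lower_bound_when_value_is_below_it();
    clamp_returns_value_unchanged_when_inside_range();
    return 0;
}
```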
1
-
1
-
@noblebearaw it used all the points in all the images to come up with a set of weighted values which together enabled a curve to be drawn with all the images in one set on one side of the curve, and all the images in the other set on the other side of the curve.
that is the nature of statistical ai, it does not care about why it comes to the answer, only that the answer fits the training data. the problem with this approach is that you are creating a problem space with as many dimensions as you have free variables, and then trying to draw a curve in that phase space, but there are many curves that fit the historical data, and you only find out which is the right one when you provide additional data which varies from the training data.
symbolic ai works in a completely different way. because it is a white box system, it can still use the same statistical techniques to determine the category the image falls into, but that only acts as the starting point. you then use this classification as a basis for looking at why it is in that category, wrapping the statistical ai inside another process which takes the images fed into it, uses humans to spot where it got things wrong, and looks for patterns of wrong answers which help identify features within that multi dimensional problem space that are likely to match one side of the line or the other.
this builds up a knowledge graph analogous to the structure of the statistical ai, but as each feature is recognised, named, and added to the model, it adds new data points to the model, with the difference being that you can drill down from the result to query which features are important, and why. this also provides extra chances for extra feedback loops not found in statistical ai.
if we look at compiled computer programs as an example, using c and makefiles to keep it simple, you would start off by feeding the statistical ai the code and makefile, plus the result of the ci / cd pipeline, and have it determine whether the change just made was releasable or not. eventually, it might get good at predicting the answer, but you would not know why.
the code contains additional data implicit within it which provides more useful answers. each step in the process gives usable additional data which can be queried later.
was it a change in the makefile which stopped it building correctly?
did it build ok, but segfault when it was run?
how good is the code coverage of the tests on the code which was changed?
does some test fail, and is it well enough named that it tells you why it failed?
and so on. also a lot of these failures will give you line numbers and positions within specific files as part of the error message.
if you are using version control, you also know what the code was before and after the change, and if the error report is not good enough, you can feed the difference into a tool to improve the tests so that it can identify not only where the error is, but how to spot it next time.
basically, you are using a human to encode information from the tools into an explicit knowledge graph. that graph ends up detecting that the code got it wrong because the change on line 75 of query.c makes a specific function return the wrong answer when passed specific data, because a branch which should have been taken to return the right answer was not taken, because the test on that line had one less = sign than was needed at position 12, making it an assignment statement rather than a test, so the test never passed. it could then also suggest replacing the = with == in the new code, thus fixing the problem.
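for concreteness, here is a minimal sketch of that kind of defect; only the = versus == mistake comes from the example above, while the function and helper names are invented:

```c
#include <stdio.h>

#define STATUS_OK    0
#define STATUS_ERROR 1

/* hypothetical helper standing in for whatever query.c really calls */
static int fetch_status(int code) { return code % 2; }

/* buggy version: '=' where '==' was meant, so the condition assigns 0,
   always evaluates false, and the success branch is never taken */
static int lookup_status_buggy(int code) {
    int status = fetch_status(code);
    if (status = 0)                /* bug: assignment, not a test */
        return STATUS_OK;
    return STATUS_ERROR;
}

/* fixed version: the single extra '=' restores the intended test */
static int lookup_status_fixed(int code) {
    int status = fetch_status(code);
    if (status == 0)
        return STATUS_OK;
    return STATUS_ERROR;
}

int main(void) {
    /* fetch_status(2) is 0, yet the buggy version reports an error */
    printf("buggy: %d fixed: %d\n", lookup_status_buggy(2), lookup_status_fixed(2));
    return 0;
}
```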
none of that information could be got from the statistical ai, as any features in the code used to find the problem are implicit in the internal model, but it contains none of the feedback loops needed to do more than identify that there is a problem.
going back to the tank example, the symbolic ai would not only be able to identify that there was a camouflaged tank, but point out where it was hiding, using the fact that trees don't have straight edges, and then push the identified parts of the tank through a classification system to try and recognise the make and model of the tank, thus providing you with the capabilities and limitations of the identified vehicle as well as its presence and location.
often when it gets stuck, it resorts to the fallback option of presenting the data to the human and asking "what do you know in this case which i don't?", adding that information explicitly into the knowledge graph, and trying again to see if it alters the result.
1
-
There is some confusion about branches. Every branch is essentially a fork of the entire codebase from upstream.
In centralized version control, upstream is the main branch, and everyone working on different features has their own branch which eventually merges back into the main branch.
In decentralized version control, which repository counts as the main branch is a matter of convention rather than a feature of the tool, but the process works the same. When you clone upstream, you still get a copy of the entire codebase, but you do not have to bother creating a name for your branch, so people work in the local copy of master.
They then write their next small commit, add tests, run them, rebase, and assuming the tests pass push to an online copy of their local repository and generate a pull request. If the merge succeeds, when they next rebase the local copy will match upstream which will have all of their completed work in it.
At this point, you have no unsynchronized code in your branch, and you can delete the named branch, or if distributed, the entire local copy, and you don't have to worry about it. If later you need to make new changes you can either respawn the branch from main / upstream, or clone from upstream and you are ready to go with every upstream change.
If you leave the branch inactive for a while, you have to remember to do a rebase before you start your new work to get to the same position.
It is having lots of unsynchronized code living for a long time in the branch which causes all of the problems, because by definition anything living in a branch is not integrated and so does not enjoy the benefits granted by being merged. Those benefits include not having multiple branches making incompatible changes, and finding out promptly that things broke because someone did a refactoring and your code was not covered, so you get to fix that problem right away.
1
-
1
-
1
-
1
-
1
-
1
-
@ProfessorThock using the calculator example, if you have a test called two times two equals four, and it fails, you know what you are testing, what the answer should be, and what code you just changed which broke it. After that, it is fairly easy to find the bug.
If you write code test first, first you prove the test fails, then you prove the code makes it pass. If you do tdd, you also prove that your refactoring didn't break it. Finally, when a new change breaks it, you know right away how it broke and which new code did it.
Much easier than having 18 months spent doing waterfall development, throwing the code over the wall, and hoping integration or testing departments can find all the bugs.
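For concreteness, a minimal sketch of that calculator test; the multiply function is assumed purely for illustration, and under test first it would not exist when the test is first run, so the test provably fails before the implementation makes it pass:

```c
#include <assert.h>

/* hypothetical calculator function under test; written only after
   the failing test below has been seen to fail */
static int multiply(int a, int b) {
    return a * b;
}

/* named after the behaviour it specifies, so a failure tells you
   exactly what broke and what the answer should have been */
static void two_times_two_equals_four(void) {
    assert(multiply(2, 2) == 4);
}

int main(void) {
    two_times_two_equals_four();
    return 0;
}
```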
1
-
1
-
1
-
1
-
1
-
1
-
1
-
1
-
1
-
1
-
1
-
1