Comments by "" (@grokitall) on "Git Flow is Bad | Prime Reacts" video.

  2. CI came from the realisation, going back to a paper from the 1970s, that the waterfall development model, while common, was fundamentally broken, and agile recognised that to fix it you had to move things that appear late in the process to an earlier point, hence the meme about "shift left". The first big change was to implement continuous backups, now referred to as version control. Another big change was to move tests earlier, and CI takes this to the extreme by making them the first thing you run after a commit. These two things together mean that your fast unit tests find bugs very quickly, and version control lets you figure out where you broke it. This promotes small changes that minimise the differences between patches, and results in your builds being green most of the time.
Long-lived feature branches subvert this process, especially when you have several of them and they go a long time between merges to the mainline (which you say you rebase from). Specifically, you create a pattern of megamerges, which get bigger the longer the delay. Also, when you rebase you are only merging the completed features into your branch, while leaving everything in the other megamerges in their own branches. This means that when you finally do your megamerge, you probably don't break mainline, but you have the potential to seriously break any and all other branches when they rebase, forcing each of them to dig into your megamerge to find out what broke them. In practice it has been observed time and again that to avoid this you cannot delay merging any branch for much longer than a day; otherwise the other branches have time to break something else, resulting in the continual red build problem.
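To make the "fast unit tests run first after every commit" point concrete, here is a minimal sketch; the function, the test values, and the use of Python's unittest are just illustrative assumptions, not anything specific from the video.
```python
# Minimal sketch of a fast regression test, the kind that gates every commit.
# discount() and the expected values are made-up examples.
import unittest

def discount(price: float, percent: float) -> float:
    """Tiny pure function: the kind of code fast unit tests guard."""
    return round(price * (1 - percent / 100), 2)

class DiscountRegressionTests(unittest.TestCase):
    def test_typical_discount(self):
        self.assertEqual(discount(100.0, 20), 80.0)

    def test_zero_discount_is_identity(self):
        self.assertEqual(discount(59.99, 0), 59.99)

if __name__ == "__main__":
    unittest.main()  # runs in milliseconds, so it can run on every small change
```
Because tests like these take milliseconds, running the whole suite after each small patch is cheap, and a red result points straight at the last change rather than at a megamerge.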
  3. There is a lot of talking past each other, and marketing-inspired misunderstanding of terminology, going on here, so I will try to clarify some of it.
When Windows 95 was being written in 1992, every developer had a fork of the code and developed their part of Windows 95 in total isolation. With networking not really being a thing on desktop computers at the time, this was the standard way of working. After 18 months of independent work they finally started trying to merge this mess together, and as you can imagine the integration hell was something that had to be seen to be believed. Among other things, you had multiple cases where one developer needed some code and wrote it for his fork, while another developer did the same, but in an incompatible way. This led to there being multiple incompatible implementations of the same basic code in the operating system. At the same time they did not notice either the rise of networking or its importance, so there was no networking stack until somebody asked Bill Gates about networking in Windows 95, at which point he basically took the open source networking stack from BSD Unix and put it into Windows. This release of a network-enabled version of Windows, and the endemic use of networking on every other OS, enabled the development of centralised version control. Feature branches were just these forks moved into the same repository, without dealing with the long times between integrations, and leaving all the resulting problems unaddressed. If you only have one or two developers working in their own branches this is an easily mitigated problem, but as the numbers go up it does not scale. These are the long-lived feature branches which both Dave and Primagen dislike. It is worth noting that the HP LaserJet division was spending five times more time integrating branches than it was spending developing new features.
Gitflow was one attempt to deal with the problem. It largely works by slowing down the integration of code and making sure that when you develop your large forks, they do not get merged until all the code is compatible with trunk. This leads to races to get your large chunk of code into trunk before someone else does, forcing them to suffer merge hell instead of you. It also promotes rushing to get the code merged when you hear that someone else is close to merging. Merging from trunk helps a bit, but fundamentally the issue is that the chunks are too big, there are too many of them, and each exists only in its own fork. With the rise in the recognition of legacy code as a problem, and the need for refactoring to deal with technical debt, it was realised that this did not work, especially as any refactoring work which was more than trivial made it more likely that the merge could not be done at all. One project set up a refactoring branch which had 7 people working on it for months, and when it was time to merge it, the change was so big that it could not be done.
An alternative approach was developed, called continuous integration, which instead of slowing down merges was designed to speed them up. It recognised that the cause of merge hell was the size of the divergence, and thus advocated reducing the size of the patches and merging them more often. It was observed that as contributions got faster, manual testing did not work, requiring a move from the ice cream cone model of testing used by a lot of web developers towards the testing pyramid model.
Even so, it was initially found that the test suite spent most of its time failing, due to the amount of legacy code and the fragility of the tests written against it, which led to a more test-required, test-first way of working. That moves the shape of the code away from the shape of legacy code and towards a shape which is designed to be testable. One rule introduced was that if the build breaks, the number one job of everyone is to get it back to passing all of the automated tests. Good enough code coverage was also found to be important. Another thing that was found is that once you started down the route of keeping the tests green, there was a maximum delay between merges which did not adversely affect this, and it turned out to be about one day. Testing became increasingly important, and slow test times were dealt with the same way slow build times were: by making the testing incremental. So you made a change, built only the bit which changed, ran only those unit tests which were directly related to it, and once it passed, built and tested the bits that depended on it. Because the code was all in trunk, refactoring did not usually break the merge any more, which is the single most important benefit of continuous integration: it lets you deal with technical debt much more easily.
Once all of the functional tests (both unit tests and integration tests) pass, which should happen within no more than 10 minutes and preferably less than 5, you have a release candidate which can then be handed over for further testing. The idea is that every change should ideally be able to go into this release candidate, but some bigger features are not ready yet, which is where feature flags come in. They replace branches full of long-lived unmerged code with a flag which hides the feature from the end user. Because your patch takes less than 15 minutes from creation to integration, this is not a problem. The entire purpose of continuous integration is to try to prove that the patch you submitted is not fit for release; if it is not, it gets rejected and you get to have another try, but as it is very small, this also is not really a problem. The goal is to make integration problems basically a non-event, and it works. The functional tests show that the code does what the programmer intended it to do.
At this point it enters the deployment pipeline described in continuous delivery. The job of this is to run every other test needed, including acceptance tests, whose job is to show that what the customer intended and what the programmer intended match. Again, the aim is to prove that the release candidate is not fit to be released. In the same way that continuous delivery takes the output from continuous integration, continuous deployment takes the output from continuous delivery and puts it into a further pipeline designed to take the rolling release product of continuous delivery and put it through things like canary releasing, so that it eventually ends up in the hands of the end users. Again it is designed to try the release out and, if problems are found, stop it from being deployed further. This is where CrowdStrike got it so spectacularly wrong. In the worst case you just roll back to the previous version, but at all stages you do the fix on trunk and start the process again, so the next release is only a short time away, and most of your customers will never even see the bug.
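For the feature flags mentioned above, the mechanism can be as small as a lookup the code checks before exposing a path. A minimal sketch follows; the flag name and the environment-variable store are illustrative assumptions, real systems usually use a config service.
```python
# Minimal feature-flag sketch: unfinished code ships to trunk but stays
# invisible to end users until the flag is turned on.
import os

def flag_enabled(name: str) -> bool:
    """Read flags from the environment; a stand-in for a real flag service."""
    return os.environ.get(f"FEATURE_{name.upper()}", "off") == "on"

def render_dashboard() -> str:
    if flag_enabled("new_dashboard"):
        return "new dashboard (merged daily, hidden until ready)"
    return "old dashboard (still the default for end users)"

print(render_dashboard())
```
The point is that the half-finished feature lives on trunk, covered by the same tests and refactorings as everything else, while the flag keeps it out of users' hands.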
This process works even at the level of doing infrastructure as a service, so if you think that your project is somehow unique and it cannot work for you, you are probably wrong. Just because something can be released, delivered, and deployed does not mean it has to be. That is a business decision, and it comes back to the feature flags. In the meantime you are using feature flags to do dark launching, using branch by abstraction to move between different solutions, and letting the exact same code go to beta testers and top-tier users, just with some of the features turned off.
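As a sketch of the branch by abstraction idea mentioned here: both the old and new implementations live on trunk behind one interface, and a flag picks which one runs. The class names and the flag are invented for the example.
```python
# Branch-by-abstraction sketch: old and new code paths coexist on trunk
# behind a single abstraction; a flag (or config) selects between them.
from abc import ABC, abstractmethod
import os

class PaymentGateway(ABC):
    @abstractmethod
    def charge(self, cents: int) -> str: ...

class LegacyGateway(PaymentGateway):
    def charge(self, cents: int) -> str:
        return f"legacy charge of {cents} cents"

class NewGateway(PaymentGateway):
    def charge(self, cents: int) -> str:
        return f"new charge of {cents} cents"   # dark-launched until proven

def gateway() -> PaymentGateway:
    use_new = os.environ.get("FEATURE_NEW_GATEWAY", "off") == "on"
    return NewGateway() if use_new else LegacyGateway()

print(gateway().charge(1999))
```
Once the new implementation has proven itself behind the flag, the legacy class and the seam can be deleted in another small, easily merged change.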
  5.  @phillipsusi1791  It is entirely about code churn. Every branch is basically a fork of upstream (the main branch in centralised version control). The problem with forks is that the code in them diverges, and this causes all sorts of problems with incompatible changes. One proposed solution is to rebase from upstream, which is intended to sort out the problem of your branch not being mergeable with upstream, and to an extent this works if the implicit preconditions for doing so are met. Where it falls over is with multiple long-lived feature branches which don't get merged until the entire feature is done. During the lifetime of each branch, you have the potential for code in any of the branches to be incompatible with any other branch. The longer the code isn't merged and the bigger the changes, the higher the risk that the next merge will break something in another branch.
The only method found to mitigate this risk is continuous integration, and the only way this works is by having the code guarded by regression tests and having everybody merge at least once a day. Without the tests you are just hoping nobody broke anything, and if merges happen less often than every day, the build from running all the tests has been observed to be broken most of the time, thus defeating the purpose of trying to minimise the risks. The problem is not the existence of a branch for a long period of time, but the risk profile of many branches which don't merge for a long time. Also, because every branch is a fork of upstream, any large-scale change like a refactoring is by definition not fully applied to the unmerged code, potentially breaking the completeness and correctness of the refactoring. This is why people doing continuous integration insist on at worst daily merges, with tests which always pass; anything else does not mitigate the risk that someone in one fork will somehow break things for another fork, or for upstream refactorings. It also prevents code sharing between the new code in the unmerged branches, increasing technical debt, and as projects get bigger, move faster, and have more contributors, this problem of unaddressed technical debt grows extremely fast. The only way to address it is with refactoring, which is the additional step added to test driven development, and which is broken by long-lived branches full of unmerged code. This is why all the tech giants have moved to continuous integration, to handle the technical debt in large codebases worked on by lots of people, and it is why feature branching is being phased out in favour of merging and hiding the new feature behind a feature flag until it is done.
  6. The best way to answer is to look at how it works with Linus Torvalds' branch for developing the Linux kernel. Because you are using version control, your local copy is essentially a branch, so you don't need to create a feature branch. You make your changes in main, which is essentially a branch of Linus's branch, add your tests, and run all of the tests. If this fails, fix the bug. If it works, rebase and quickly rerun the tests, then push to your online repository. This then uses hooks to automatically submit a pull request, and Linus gets a whole queue of them, which are applied in the order in which they came in. When it is your turn, either it merges ok and becomes part of everyone else's next rebase, or it doesn't: the pull is reverted, Linus moves on to the next request, and you get to go back, do another rebase and test, and push your new fixes back up to your remote copy, which will then automatically generate another pull request. Repeat the process until it merges successfully, and then your local system is a rebased local copy of upstream. Because you are writing small patches rather than full features, the chances of a merge conflict are greatly reduced, often to zero if nobody else is working on the code you changed. It is this which allows the kernel to take new changes every 30 seconds, all day, every day. Having lots of small fast regression tests is the key to this workflow, combined with committing every time the tests pass, upstreaming with every commit, and having upstream run CI on the master branch.
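A sketch of the local half of that loop is below, assuming a remote named "origin", an upstream branch called "master", and a test suite runnable with "python -m unittest"; the server-side hook that turns the push into a pull request is not shown, and this is not the kernel's actual tooling, just an illustration of the test-rebase-retest-push rhythm.
```python
# Sketch of the local loop: test, rebase onto upstream, retest, push.
import subprocess, sys

def run(*cmd: str) -> bool:
    return subprocess.run(cmd).returncode == 0

def tests_pass() -> bool:
    return run(sys.executable, "-m", "unittest", "discover", "-q")

if not tests_pass():
    sys.exit("fix the bug before going any further")
if not (run("git", "fetch", "origin") and run("git", "rebase", "origin/master")):
    sys.exit("resolve the rebase, then rerun")
if not tests_pass():
    sys.exit("the rebase broke something; fix it on top of upstream")
run("git", "push", "origin", "HEAD")  # upstream hooks queue the pull request
```
Because the patch is small and the rebase happened seconds before the push, the window for someone else's merge to conflict with yours is tiny.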
  8. Every branch is essentially a fork of the entire codebase for the project, with all of the negative connotations implied by that statement. In distributed version control systems this fork moves from being implicit, as it is in centralized version control, to being explicit. When two forks exist (for simplicity call them upstream and branch), there are only two ways to avoid them becoming permanently incompatible. Either you slow everything down and make it so that nothing moves from the branch to upstream until it is perfect, which results in long-lived branches with big patches, or you speed things up by merging every change as soon as it does something useful, which leads to continuous integration.
When doing the fast approach, you need a way to show that you have not broken anything with your new small patch. This is done with small fast unit tests which act as regression tests for the new code; you write them before you commit the code for the new patch and commit them at the same time, which is why people using continuous integration end up with a codebase which has extremely high levels of code coverage. What happens next is you run all the tests, and when they pass, it is safe to commit the change. This can then be rebased and pushed upstream, which runs all the tests against the new changes, and you end up producing a testing candidate which could be deployed, and it becomes the new master. When you want to make the next change, as you have already rebased before pushing upstream, you can trivially rebase again before you start and make new changes. This makes the cycle very fast, ensures that everyone stays in sync, and works even at the scale of the Linux kernel, which has new changes upstreamed every 30 seconds.
In contrast, the slow version works not by having small changes guarded by tests, but by having nothing moved to upstream until it is both complete and as perfect as can be detected. As it is not guarded by tests, it is not designed with testing in mind, which makes any testing slow and fragile, further discouraging testing, and is why followers of the slow method dislike testing. It also leads to merge hell, as features without tests get delivered as one big code dump all in one go, which may then cause problems for those on other branches which have incompatible changes. You then have to spend a lot of time finding which part of this large patch with no tests broke your branch. This is avoided with the fast approach, as all of the changes are small. Even worse, all of the code in all of the long-lived branches is invisible to anyone taking upstream and trying to do refactoring to reduce technical debt, adding another source of breaking your branch with the next rebase. Pull requests with peer review add yet another source of delay, as you cannot submit your change upstream until someone else approves it, which can take tens to hundreds of minutes depending on the size of your patch. The fast approach replaces manual peer review with comprehensive automated regression testing, which is both faster and more reliable, and in return you get to spend a lot less time bug hunting. The unit tests and integration tests in continuous integration get you to the point where you have a release candidate which does all of the functions the programmer understood were wanted.
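On the CI side, "passing functional tests produces a release candidate" can be as little as running the suite and tagging the commit. The sketch below assumes a tests/unit and tests/integration layout, an "origin" remote, and an invented rc-timestamp tag scheme; it is an illustration of the gate, not any particular CI product.
```python
# CI-side sketch: if unit and integration tests pass, tag the commit as a
# release candidate for the delivery pipeline; otherwise reject the patch.
import subprocess, sys, time

def passes(*cmd: str) -> bool:
    return subprocess.run(cmd).returncode == 0

unit_ok = passes(sys.executable, "-m", "unittest", "discover", "-q", "-s", "tests/unit")
integration_ok = unit_ok and passes(
    sys.executable, "-m", "unittest", "discover", "-q", "-s", "tests/integration")

if not (unit_ok and integration_ok):
    sys.exit("not fit for release: reject the patch and try again")

tag = time.strftime("rc-%Y%m%d-%H%M%S")
subprocess.run(["git", "tag", tag])            # mark the release candidate
subprocess.run(["git", "push", "origin", tag])
print(f"release candidate {tag} handed to the delivery pipeline")
```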
Getting to that release candidate does not require all of the features to be enabled by default, only that the code is in the main codebase. This is usually done by replacing the long-lived feature branch with short-lived branches (short-lived in the sense of time between merges) whose code is shipped but hidden behind feature flags, which also allows the people on other branches to reuse the code from your branch rather than having to duplicate it in their own. Continuous delivery goes one step further: it takes the release candidate output from continuous integration, runs all of the non-functional tests to demonstrate a lack of regressions in performance, memory usage, etc., and then adds on top of this a set of acceptance tests that confirm that what the programmer understood matches what the user wanted. The output from this is a deployable set of code which has already been packaged and deployed to testing, and can thus be deployed to production. Continuous deployment goes one step further still: it automatically deploys this to your oldest load-sharing server, and uses the ideas of chaos engineering and canary deployments to gradually increase the load taken by this server while reducing the load on the next oldest, until either all of the load has moved from the oldest to the newest, or a previously unspotted problem is observed and the rollout is reversed. Basically, though, all of this starts with replacing slow long-lived feature branches with short-lived branches, which lets the continuous integration build keep lots of regression tests passing almost all of the time, something which by definition cannot be done against code hidden away on a long-lived feature branch that does not get committed until the entire feature is finished.
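The canary part of that rollout boils down to shifting traffic in steps and reversing if the monitoring looks bad. A minimal sketch, where the step size, the error threshold, and the randomised error_rate() stand-in for real monitoring are all invented:
```python
# Canary-rollout sketch: move traffic from the old server to the new one in
# steps, rolling back if the new build's observed error rate crosses a limit.
import random

def error_rate(weight_to_new: float) -> float:
    """Stand-in for real monitoring of the canary at this traffic weight."""
    return random.uniform(0.0, 0.02)

def canary_rollout(step: float = 0.1, threshold: float = 0.01) -> bool:
    weight = 0.0
    while weight < 1.0:
        weight = min(1.0, weight + step)
        if error_rate(weight) > threshold:
            print(f"problem at {weight:.0%} traffic: rolling back to the old build")
            return False
        print(f"new build healthy at {weight:.0%} traffic")
    print("rollout complete: the oldest server can be retired")
    return True

canary_rollout()
```
In a real pipeline the fix for any failure still lands on trunk first, so the next candidate is only minutes away.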
  10. There is some confusion about branches. Every branch is essentially a fork of the entire codebase from upstream. In centralized version control, upstream is the main branch, and everyone working on different features has their own branch which eventually merges back into it. In decentralized version control, which copy counts as the main branch is a matter of convention rather than a feature of the tool, but the process works the same. When you clone upstream you still get a copy of the entire codebase, but you do not have to bother creating a name for your branch, so people work in the local copy of master. They then write their next small commit, add tests, run them, rebase, and, assuming the tests pass, push to an online copy of their local repository and generate a pull request. If the merge succeeds, the next time they rebase their local copy will match upstream, which will have all of their completed work in it. At this point you have no unsynchronized code in your branch, and you can delete the named branch, or if distributed, the entire local copy, and not worry about it. If you later need to make new changes you can either respawn the branch from main / upstream, or clone from upstream, and you are ready to go with every upstream change. If you leave the branch inactive for a while, you have to remember to do a rebase before you start your new work to get to the same position. It is having lots of unsynchronized code living for a long time in the branch which causes all of the problems, because by definition anything living in a branch is not integrated and so does not enjoy the benefits granted by being merged. Those benefits include not having multiple branches making incompatible changes, and not finding out that things broke because someone did a refactoring and your code was not covered, leaving you to fix that problem.