Lee Rothman
Continuous Delivery
comments
Comments by "Lee Rothman" (@leesoftwareengineer) on "Software's HUGE Impact On The World | Crowdstrike Global IT Outage" video.
@Max-wk7cg If they had used canary deployments then they would have limited the impact to the first batch of deployments rather than this mess.
2
And that’s a reason why you do canary deployments. Don’t deploy an update worldwide in one hit.
2
@Max-wk7cg Well, I’d like to hear of a reason not to do it in their situation. If they’ve not heard of it then I’d really question their engineering team. It’s not a new thing; here’s a post from 10 years ago: https://martinfowler.com/bliki/CanaryRelease.html
1
The main points here cover canary deployments, which would have stopped the deployment from reaching most of the 8.5m affected devices. Further information shows that the fault was with one of the data files, which contained nothing but zeros. There was no validation of the inputs to the kernel code, so it crashed. I’d say that it was most definitely a fault on their part and easily avoided.
1
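To illustrate the input validation point in the comment above, here is a minimal sketch in Python. The function names and the minimum-size parameter are hypothetical, and the real data file format is not described here, so this only shows the general idea of rejecting an all-zero payload before any parsing code sees it.

```python
def validate_content_file(data: bytes, min_size: int = 1) -> None:
    """Raise ValueError if the payload is too short or contains only zero bytes.

    Hypothetical check: a real loader would also verify headers, checksums, etc.
    """
    if len(data) < min_size:
        raise ValueError("content file is empty or too short")
    # any() over bytes iterates the integer value of each byte,
    # so it is False only when every byte is 0.
    if not any(data):
        raise ValueError("content file contains only zero bytes")


def load_content(data: bytes) -> bytes:
    """Validate first, and only then hand the payload on for further processing."""
    validate_content_file(data)
    return data
```

The design choice is simply to fail early with a clear error rather than let untrusted input reach code that cannot recover from it.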
Read the State of DevOps report. There is a correlation between the size of the change that is deployed and the failure rate: the bigger the change, the bigger the failure rate.
1
@SteveBurnap Yup, agreed. Have monitoring make a call every x seconds to check that it’s getting a successful response.
1
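A rough sketch of the "call every x seconds" idea from the comment above, in Python. The `probe` callable stands in for whatever request the monitoring makes; it and the other names are assumptions for illustration, not part of any particular monitoring tool.

```python
import time
from typing import Callable


def monitor(
    probe: Callable[[], bool],
    interval_seconds: float,
    max_checks: int,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe up to max_checks times, waiting interval_seconds between calls.

    Returns False as soon as one probe fails, True if every probe succeeded.
    The sleep function is injectable so the loop can be exercised without waiting.
    """
    for attempt in range(max_checks):
        if not probe():
            return False
        # No need to wait after the final check.
        if attempt < max_checks - 1:
            sleep(interval_seconds)
    return True
```

In practice the probe would make a network request and treat anything other than a successful response as a failure.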
I just caught up with his update. It’s definitely worth a watch to understand what’s going on with software running at a low level.
1
It could have, yes. The initial deployment should have gone to a small number of customers. It’s easy to set up monitoring so that an external system checks something like a health check endpoint. If it doesn’t get a successful response, then something has gone wrong and all further deployments should stop.
1
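The staged rollout described in the comment above could be sketched like this in Python. Everything here is hypothetical: `deploy` and `is_healthy` are placeholders for a real deployment step and a real health check, and the batch sizes are arbitrary.

```python
from typing import Callable, Iterable, List, Sequence


def rollout_in_batches(
    hosts: Sequence[str],
    batch_sizes: Iterable[int],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> List[str]:
    """Deploy to hosts one batch at a time, smallest batch first.

    After each batch, every host in it is health-checked. If any check
    fails, the rollout stops and no later batch is touched.
    Returns the hosts that received the update.
    """
    updated: List[str] = []
    position = 0
    for size in batch_sizes:
        batch = hosts[position:position + size]
        if not batch:
            break  # ran out of hosts
        for host in batch:
            deploy(host)
            updated.append(host)
        if not all(is_healthy(host) for host in batch):
            break  # halt: something has gone wrong in this batch
        position += size
    return updated
```

With batch sizes such as 1, 3 and 6 over ten hosts, a bad update that fails the first health check reaches only one host instead of all ten.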
The same company also bricked Linux systems earlier this year (Debian & Rocky). However, it feels like they need to provide another way of accessing the kernel to make it more robust.
1