Comments by "Michael Nurse" (@michaelnurse9089) on "Stop Making THESE MISTAKES With BIG DATA" video.
Humans tend to be unable to understand older code as it gets larger. I remember the Boeing 777 had a dangerous bug somewhere in its 200,000 lines (if I recall correctly), and it took them years to find it and delayed the release of the plane, costing them serious $. Somebody where you worked should have abstracted the SP into smaller layers long before this point. Of course, they didn't.
4
I studied data science to completion before moving to software engineering. To be fair, these concepts and traps were brought up many times in the subject matter - remember, most of the lecturers were battle-hardened software engineers before becoming data scientists. The question therefore remains: why are they struggling to do this in practice? Part of the problem is the size of the datasets, part of the problem was the Gold Rush in this area 2016-2019, part of the problem is the individuals concerned, and part of the problem is that Dave's experience is not representative of what every data department is doing. The field of Data Engineering exists to solve this problem, and firms that hire these people will naturally avoid these problems. All that said - every point in the video is golden.
2
I think he means that no copy was kept of the dataset on which the notebook was based - maybe it is 22TB, etc. Of course the notebooks themselves are backed up.
1
I think for small home projects Docker (or even full virtual machines) is a great technology - whether for Data Science or Software Engineering.
1