Comments by "Mikko Rantalainen" (@MikkoRantalainen) on "Thriving Technologist"
channel.
-
12
-
11
-
9
-
7
-
6
-
5
-
5
-
5
-
4
-
3
-
3
-
3
-
3
-
2
-
2
-
2
-
2
-
2
-
2
-
2
-
2
-
2
-
1
-
1
-
1
-
1
-
1
-
1
-
1
-
1
-
1
-
1
-
1
-
I fully agree that comments/documentation should be considered mandatory for any code that's supposed to live long – that is, maintained and developed further.
However, I don't believe in commenting individual lines but whole functions/methods. My rule of thumb is that if a method is public (usable by external code) it should have documentation/promise about what it does and basically state Eiffel-like design-by-contract about the supported inputs. Whether you write it as docblocks above the method implementation or in form of automated tests really doesn't make a huge difference but you should have clear documentation about what the code is supposed to do. That way you can figure out if the implementation actually matches the original intent when you later need to modify the code. Without documentation you cannot know if handling of some specific edge case is intentional or a bug in implementation.
I prefer docblock-style comments in mostly English but I'm getting more and more strict about having to declare if any input parameter and results are trusted data or not. All input (user generated data, files, network sockets, config files) should be considered untrusted and anything directly computed from untrusted data shall be considered tainted, too, and as such, untrusted data, too. If you write all code like this, you end up having a lot less security issues in your code. And for all input and output string values, you have to declare the encoding in the documentation. The input encoding might be untrusted UNICODE string and output could be trusted HTML text fragment – in that case the implementation must encode all the HTML metacharacters or there's a bug in the implementation. Without a docblock you cannot know if that's intented or not.
That said, private methods (in case of class/object oriented programming) do not need to have any documentation because those are just part of the implementation. I also don't think automated unit tests should even bother testing private methods directly but just the behavior or any public methods. I'm on a borderline if even protected methods should be tested with unit tests – I'm currently thinking that if no public method actually uses any private or protected method, those methods are just dead code and should be deleted instead of writing unit tests.
In the end, when I write some code and a team member needs to ask me about the implementation (during code review or later) then I usually end up fixing the implementation to be more readable. In that case I believe in "self-documenting code" that I only use comments within the method as the last resolt – it's much better to write understandable implementation that can be fully understood without comments within the function/method body.
1
-
1
-
1
-
1
-
1
-
1
-
1
-
I'm strongly in the camp "write once, read multiple times" when it comes to code and its documentation. Once you accept that any single of line of code will be read more than once in long term, you should make code as readable and clear as possible. And in practice that may require adding comments. However, comments should be added if they make code more clear, not just because the boss said so. My bar for how much more clear the code would become with comments is pretty low, though. If the code is any easier to understand with extra comments, add those comments. That said, it's a balancing act because extra comments just add extra letters to read if the code would be perfectly understandable without the comment.
If you truly write temporary solution that will be thrown away in near future for real, sure. In my experience, nothing is as permanent as temporary solution that seems to work, though.
1
-
1
-
1
-
1
-
1
-
1
-
I totally agree with you except for the automatic code formatting. I strongly believe that there should be code formatting rules for a project but it shouldn't be enforced by automated process. Sure, have an automated check to tell if you've broken the rules and optionally allow automatically reformatting the new code but there will be always situations where code can be more readable by breaking the arbitrary rules your project ended up with. A basic example could be line length rules: if the code would be more readable if you go over the line length limit by 7 characters, do it. Wrapping that code on two or more lines might follow the formatting rules but it would result in less readable code.
Code readability is the most important thing. Any important piece of code will be written once but re-read many times and you have to make sure that every reader understands it the same way.
I strongly believe in self-documenting code but the bar should be that if any member of your team ever has to ask about the code, it's poorly written. If any member of the team fails to understand your code then it's not self-documenting. It's that simple. And in such cases, I always prefer fixing the code and as a last resort, I write some comments. That said, I also try to write short documentation for every function or method (DocBlock) to help people using editors that can show that documentation while modifying the code on the calling site and design-by-contract rules for the caller, without needing to even see the actual well written code. And after writing server software for a couple of decades, I've come to conclusion that all parameters should have explicit info about if the argument is untrusted (raw user input is okay) or trusted (never ever pass any unfiltered user input here). And note that raw user input may come from TCP/IP socket, file, environment string, command line argument, SQL connection or REST API request. If the bytes in the RAM can be affected by entities outside your code, it's untrusted. And untrusted data is contagious so if your programming language doesn't have something akin to Perl taint mode, you have to track untrusted data by yourself from variable to variable.
Also, a string is just stream of unknown bytes unless you know the encoding and intent. Many security issues happen because programmers fail to understand the data. For example, SQL injection attacks and XSS attacks are actually the same security problem under the hood: missing or wrong encoding for the context. In case of SQL injection attack, typical problem is using raw string when the actual context is "constant UNICODE string within a string in SQL query" and XSS attack is caused by using raw string when the actual context is "constant UNICODE string within a JavaScript string embedded in SVG embedded in data-URL embedded in attribute string embedded in HTML5 document".
Not every context can support raw binary strings but if your function or method takes untrusted string as input, it's your task to encode or otherwise make it safe. If your method cannot accept any random binary input, your method will need to test for binary crap and throw an exception or handle the problem in some other way. Remember that if you don't write this safe code, then every calling site must re-implement it or you'll have security vulnerability waiting in the code.
I'm nowadays writing my functions or methods so that any data passed in must be random binary safe unless the parameter is explicitly marked as trusted and in that case the caller takes the responsibility for data safety. And automated tests for that code should actually use random binary test strings to make sure the code doesn't bitrot in the future.
1
-
1
-
1
-
1