[The context for this piece is slightly dated, but I was hospitalized shortly after I started writing this, and apparently hospitals don’t provide their patients with wireless (or even wired) Internet connections. Go figure.]
I’m concerned that some code may trigger false positives, like some places (destination buffers) that don’t seem to check bounds, but this is because their source of data is already of guaranteed limited length. Someone that goes looking blindly for strcpy would panic at first glance.
Fortunately, good static code analysis tools such as Coverity Prevent, which is at issue here, or FlexeLint, which I use at work, will not give false positives in cases like this, provided that the “guarantee” is sufficiently obvious and sufficiently close to the strcpy call. Claudio Valderrama acknowledges this in his next sentence:
However, Coverity scans seem to be very careful and didn’t report false positives.
The crux of the issue is what constitutes “sufficiently obvious” and “sufficiently close”. Some tools are better than others at tracking code and data flows, but it’s not just about tools. It’s also about people, especially maintenance programmers who may not know the program as well as the original author, and may not have time to focus on more than a small portion of it. That person may even be the original author, who, a year or two down the line, probably does not remember all the details of his own code.
(I will refrain from making the obligatory joke about “assumption”; if it matters to you, just pretend that I did.)
Allow me to digress a little bit and explain, very briefly, how program verification works (using logical inference, the way I was taught). Trust me, it’s relevant.
The effect of a program (or a subroutine, since correctness is usually proved one subroutine at a time) is to move the finite state machine that is your computer from one state into another. You prove its correctness by showing (using a formal method of reasoning such as Hoare logic) that the operations it performs will transform the expected initial state into the intended final state.
Program verification would be extremely difficult (and not very useful) if you had to enumerate all possible initial and final states, so instead, you provide a set of predicates describing the aspects of those states that are relevant to the subroutine whose correctness you are trying to prove. In practical terms, “relevant” usually means the state of the variables and data structures the subroutine examines and modifies, which may be far smaller than the state of the entire system.
The important thing to note is that a proof of correctness only tells you that if the preconditions are satisfied when the subroutine starts, then the postconditions will be satisfied when the subroutine ends. Without the preconditions, no proof is possible, except for very trivial cases such as variable initialization.
(you prove the correctness of a larger piece of code by dividing it up and showing that the postconditions of the first piece satisfy the preconditions of the next, etc.)
Programmers are usually pretty good at thinking in terms of postconditions, because after all, the entire purpose of your subroutine is to ensure that the postcondition is true. The postcondition is on your mind all the time while you write that code (though sometimes you forget it, especially when making small changes to a large piece of old code). In fact, the odds are that the postcondition is described (in varying detail) in external documentation, code comments, and even the subroutine name.
The preconditions, on the other hand…
The preconditions are where we get it wrong. They’re the bits we take for granted. They’re the parts we assume are correct… perhaps because we believe (rightly or wrongly) that preceding code guarantees it.
(there, now you see how this all connects!)
This is why we—programmers—need to document our assumptions, the preconditions without which our code doesn’t do what it’s supposed to do. Documenting them in your design document or API reference is a good start. Documenting them in code comments is even better, because they will be right there for other programmers to read when they need to fix your code (and as I mentioned earlier, one of those “other programmers” could be yourself a few months or years down the road).
The very best option, though, is to document them in the code itself, through the use of assertions—either C’s assert() macro (or the equivalent in whichever language you’re using) or a custom-made assertion check. For Varnish, for instance, I wrote a macro which, if the test fails, calls a “panic” function which dumps a significant amount of internal state before terminating the program, greatly facilitating subsequent debugging.
My advice to Claudio is that whenever you use strcpy() instead of strlcpy() because you know the buffer is big enough, throw in an assertion check. Not only will this make your assumption clear to both humans and static analysis tools, but… you never know. Maybe what you knew wasn’t truly so.