About a year ago, I blogged about writing unit tests for OpenPAM to weed out bugs in the configuration file parser, and mentioned a specific bug which my unit tests caught, and which I wouldn’t have noticed without them.
Yesterday, this came back to bite me.
The patch in question fixed the following edge case:
hello "" world
This is supposed to be parsed as three words (“hello”, “” and “world”), but openpam_readword() would incorrectly return NULL instead of the second, empty word, because it wouldn’t allocate a string buffer until it had something to put in it.
Unfortunately, there is a serious problem with that patch:
/* edge case: empty quoted string */ if (word == NULL && (word = malloc(1)) == NULL) { openpam_log(PAM_LOG_ERROR, "malloc(): %m"); errno = ENOMEM; return (NULL); } *word = '\0'; size = 1;
It may be hard to spot if you’re not already familiar with the code, but the last two lines, which were intended to initialize the newly allocated empty string, run regardless of whether a string already existed. What this means is that if your input mixes quoted and unquoted text, such as this:
auth required pam_unix.so try_first_pass authtok_prompt="Password for %u@%h: "
then the moment openpam_readword() hits that double quote, it will overwrite the “a” in “authtok” with a NUL character. No matter how much text is later appended, the caller will get an empty string.
Even worse, since that code set size to 1 and openpam_straddch() trusts the caller not to mess with size and len, there is a risk of heap corruption. The next time openpam_straddch() is called, it will see that len (the length of the string) exceeds size (the size of the buffer) and blindly resize the buffer to what it thinks is twice its current size—but in reality, it shrinks the buffer to two bytes, and proceeds to write outside the buffer. Luckily, this is unlikely to have practical consequences since modern allocators will usually not reuse that memory until the entire buffer is freed, and subsequent openpam_straddch() calls will quickly reclaim it.
The code was meant to look like this:
/* edge case: empty quoted string */ if (word == NULL) { if ((word = malloc(1)) == NULL) { openpam_log(PAM_LOG_ERROR, "malloc(): %m"); errno = ENOMEM; return (NULL); } *word = '\0'; size = 1; }
However, I chose a different solution: I modified openpam_straddch() so it can be used to bootstrap an empty string. This allowed me to reduce the code to this:
/* edge case: empty quoted string */ if (openpam_straddch(&word, &size, &len, 0) != 0) return (NULL);
The issue of trusting the caller not to mess with size and len is thornier. In the end, I decided that there is no way to guard against all possible cases of caller idiocy. In this particular case, I could have noticed that size <= len, but I wouldn’t have known which of the two—if any!—had the correct value.
There are several lessons here, but the most important one is probably something I will call “test blindness”: trusting that because your tests passed, the code is correct. This is a fallacy: it assumes that your tests are perfect, but they aren’t, because you aren’t. After all, if you were, you wouldn’t need tests in the first place!
Anyway, I’ve added six unit tests (regression tests, technically) for input that mixes quoted and unquoted text within a single word. I’m also going to set up Bitten to automatically build and test OpenPAM after every commit—this wouldn’t have helped me catch this bug, but it will help me catch regressions.