Not up to our usual standards

For a few years now, I’ve been working on and off on a set of libraries which collect cryptography- and security-related code I’ve written for other projects as well as functionality which is not already available under a permissive license, or where existing implementations do not meet my expectations of cleanliness, readability, portability and embeddability.

(Aside: the reasons why this has taken years, when I initially expected to publish the first release in the spring or summer of 2014, are too complex to explain here; I may write about them at a later date. Keywords are health, family and world events.)

Two of the major features of that collection are the OATH Authentication Methods (which includes the algorithm used by Google Authenticator and a number of commercial one-time code fobs) and the Common Platform Enumeration, part of the Security Content Automation Protocol. I implemented the former years ago for my employer, and it has languished in the OpenPAM repository since 2012. The latter, however, has proven particularly elusive and frustrating, to the point where it has existed for two years as merely a header file and a set of mostly empty functions, just to sketch out the API. I decided to have another go at it yesterday, and actually made quite a bit of progress, only to hit the wall again. And this morning, I realized why.

The CPE standard exists as a set of NIST Interagency reports: NISTIR 7695 (naming), NISTIR 7696 (name matching), NISTIR 7697 (dictionary) and NISTIR 7698 (applicability language). The one I’ve been struggling with is 7695—it is the foundation for the other three, so I can’t get started on them until I’m done with 7695.

It should have been a breeze. On the surface, the specification seems quite thorough: basic concepts, representations, conversion between representations (including pseudocode). You know the kind of specification that you can read through once, then sit down at the computer, start from the top, and code your way down to the bottom? RFC 4226 and RFC 6238, which describe OATH event-based and time-based one-time passwords respectively, are like that. NISTIR 7695 looks like it should be. But it isn’t. And I’ve been treating it like it was, with my nose so close to the code that I couldn’t see the big picture and realize that it is actually not very well written at all, and that the best way to implement it is to read it, understand it, and then set it aside before coding.

One sign that NISTIR 7695 is a bad specification is the pseudocode. It is common for specifications to describe algorithms, protocols and / or interfaces in the normative text and provide examples, pseudocode and / or a reference implementation (sometimes of dubious quality, as is the case for RFC 4226 and RFC 6238) as non-normative appendices. NISTIR 7695, however, eschews natural-language descriptions and includes pseudocode and examples in the normative text. By way of example, here is the description of the algorithm used to convert (“bind”, in their terminology) a well-formed name to a formatted string, in its entirety:

6.2.2.1 Summary of algorithm

The procedure iterates over the eleven allowed attributes in a fixed order. Corresponding attribute values are obtained from the input WFN and conversions of logical values are applied. A result string is formed by concatenating the attribute values separated by colons.

This is followed by one page of pseudocode and two pages of examples. But the examples are far from exhaustive; as unit tests, they wouldn’t even cover all of the common path, let alone any of the error handling paths. And the pseudocode looks like it was written by someone who learned Pascal in college thirty years ago and hasn’t programmed since.

The description of the reverse operation, converting a formatted string to a well-formed name, is slightly better in some respects and much worse in others. There is more pseudocode, and the examples include one—one!—instance of invalid input… but the pseudocode includes two functions—about one third of the total—which consist almost entirely of comments describing what the functions should do, rather than actual code.

You think I’m joking? Here is one of them:

function get_comp_fs(fs,i)
  ;; Return the i’th field of the formatted string. If i=0,
  ;; return the string to the left of the first forward slash.
  ;; The colon is the field delimiter unless prefixed by a
  ;; backslash.
  ;; For example, given the formatted string:
  ;; cpe:2.3:a:foo:bar\:mumble:1.0:*:...
  ;; get_comp_fs(fs,0) = "cpe"
  ;; get_comp_fs(fs,1) = "2.3"
  ;; get_comp_fs(fs,2) = "a"
  ;; get_comp_fs(fs,3) = "foo"
  ;; get_comp_fs(fs,4) = "bar\:mumble"
  ;; get_comp_fs(fs,5) = "1.0"
  ;; etc.
end.

This function shouldn’t even exist. It should just be a lookup in an associative array, or a call to an accessor if the pseudocode was object-oriented. So why does it exist? Because the main problem with NISTIR 7695, which I should have identified on my first read-through but stupidly didn’t, is that it assumes that implementations would use well-formed names—a textual representation of a CPE name—as their internal representation. The bind and unbind functions, which should be described in terms of how to format and parse URIs and formatted strings, are instead described in terms of how to convert to and from WFNs. I cannot overstate how wrong this is. A specification should never describe a particular internal representation, except in a non-normative reference implementation, because it prevents conforming implementations from choosing more efficient representations, or representations which are better suited to a particular language and environment, and because it leads to this sort of nonsense.

So, is the CPE naming specification salvageable? Well, it includes complete ABNF grammars for URIs and formatted strings, which is good, and a partial ABNF grammar for well-formed names, which is… less good, but fixable. It also explains the meanings of the different fields; it would be useless otherwise. But apart from that, and the boilerplate at the top and bottom, it should be completely rewritten, including the pseudocode and examples, which should reference fictional, rather than real, vendors and products. Here is how I would structure it (text in italic is carried over from the original):

  1. Introduction
    1.1. Purpose and scope
    1.2. Document structure
    1.3. Document conventions
    1.4. Relationship to existing specifications and standards
  2. Definitions and abbreviations
  3. Conformance
  4. CPE data model
    4.1 Required attributes
    4.2 Optional attributes
    4.3 Special attribute values
  5. Textual representations
    5.1. Well-formed name
    5.2. URI
    5.3. Formatted string
  6. API
    6.1. Creating and destroying names
    6.2. Setting and getting attributes
    6.3. Binding and unbinding
  7. Non-normative examples
    7.1. Valid and invalid attribute values
    7.2. Valid and invalid well-formed names
    7.3. Valid and invalid URIs
    7.4. Valid and invalid formatted strings
  8. Non-normative pseudo-code
  9. References
  10. Change log

I’m still going to implement CPE naming, but I’m going to implement it the way I think the standard should have been written, not the way it actually was written. Amusingly, the conformance chapter is so vague that I can do this and still claim conformance with the Terrible, Horrible, No Good, Very Bad specification. And it should only take a few hours.

By the way, if anybody from MITRE or NIST reads this and genuinely wants to improve the specification, I’ll be happy to help.

PS: possibly my favorite feature of NISTIR 7695, and additional proof that the authors are not programmers: the specification mandates that WFNs are UTF-8 strings, which are fine for storage and transmission but horrible to work with in memory. But in the next sentence, it notes that only characters with hexadecimal values between x00 and x7F may be used, and subsequent sections further restrict the set of allowable characters. In case you didn’t know, the normalized UTF-8 representation of a sequence of characters with hexadecimal values between x00 and x7F is identical, bit by bit, to the ASCII representation of the same sequence.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.