RFC 4648 defines the Base16, Base32 and Base64 encodings. Base16 (aka hex) and Base64 are widely known and used, but Base32 is an odd duck. It is rarely used, and there are several incompatible variants, of which the RFC acknowledges two: [A-Z2-7] and [0-9A-V].
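To make the difference between the two alphabets concrete, here is a small Python sketch. The names `BASE32`, `BASE32HEX` and `to_base32hex` are made up for this illustration; only `base64.b32encode` comes from the standard library.

```python
import base64
import string

# RFC 4648 section 6: the standard Base32 alphabet, A-Z followed by 2-7.
BASE32 = string.ascii_uppercase + "234567"
# RFC 4648 section 7: the "extended hex" alphabet, 0-9 followed by A-V.
BASE32HEX = string.digits + string.ascii_uppercase[:22]


def to_base32hex(encoded: str) -> str:
    """Re-map a standard Base32 string onto the extended hex alphabet.

    Characters outside the alphabet, such as '=' padding, pass through as-is.
    """
    return encoded.translate(str.maketrans(BASE32, BASE32HEX))


standard = base64.b32encode(b"foobar").decode("ascii")
print(standard)                # MZXW6YTBOI======
print(to_base32hex(standard))  # CPNMUOJ1E8======
```

Both strings carry the same bits; only the mapping from 5-bit values to characters differs, which is why a decoder for one variant cannot read the other.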
One of the uses of Base32, and the reason for my interest in it, is in Google’s otpauth URI scheme for exchanging HOTP and TOTP keys. I needed a Base32 codec for my OATH library, so when a cursory search for a lightweight permissive-licensed implementation failed to turn up anything, I wrote my own.
My OATH implementation is currently deployed in an environment in which OTP keys for new users (or new OTP keys for existing users) are generated by the primary provisioning system, which passes them on to a smaller provisioning system in charge of firewalls and authentication (codenamed Nexus), which passes them on to a RADIUS server, which uses my code to validate user responses. When we transitioned from generating OTP keys manually to having the provisioning system generate them for us, we ran into trouble: some keys worked, others didn’t. It turned out to be a combination of factors:
- The keys generated by the provisioning system were syntactically correct but out of spec. Most importantly, their length was not always a multiple of 40 bits, so their Base32 representation included padding.
- Nexus performed only cursory validation of the keys it received from the provisioning system, so it accepted the out-of-spec keys.
- The Google Authenticator app (at least the Android version, but possibly the iOS version as well) does not handle padded keys well. If I recall correctly, the original Android app rejected them outright; the current version simply rounds them down. (Why don’t the Android system libraries provide Base32 encoding and decoding?)
- My Base32 decoder didn’t handle padding correctly either… and of course, I only had tests for the encoder, because I was in a rush when I wrote it and I didn’t need decoding until later. Yes, this is stupid. Yes, I fixed it and now have 100% condition/decision coverage (thanks to BullseyeCoverage, with a caveat: 100% C/D coverage of table-driven code does not guarantee correctness, because it only checks the code, not the table).
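The link between key length and padding in the first point is simple arithmetic: Base32 turns every 40 bits (5 bytes) of input into 8 characters, and any leftover bytes produce a short final group that is filled out with `=`. A minimal Python sketch, in which `base32_padding` is a hypothetical helper and the key sizes are arbitrary examples:

```python
import base64


def base32_padding(n_bytes: int) -> int:
    """Number of '=' characters RFC 4648 Base32 appends to n_bytes of input."""
    tail_bits = (n_bytes % 5) * 8      # bits left over after whole 40-bit groups
    if tail_bits == 0:
        return 0
    tail_chars = -(-tail_bits // 5)    # ceiling division: 5 bits per character
    return 8 - tail_chars


for n in (10, 16, 20):
    key = bytes(n)                     # all-zero stand-in for a real key
    encoded = base64.b32encode(key).decode("ascii")
    # Cross-check the arithmetic against the standard library encoder.
    assert encoded.count("=") == base32_padding(n)
    print(n * 8, "bits ->", len(encoded), "characters,", base32_padding(n), "padding")
```

With these sizes, the 80-bit and 160-bit keys encode without padding, while the 128-bit key ends in six `=` characters.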
Having fixed both the provisioning system and the OATH verification tool, I decided to add stronger input validation to Nexus. The easiest way to validate a Base32-encoded key, I figured, is to decode it. And wouldn’t you know, there are not one but two Perl implementations of Base32!
Unfortunately, they’re both broken, and have been for years.
- MIME::Base32 (the latest release is dated 2010-08-25, but the code hasn’t changed since the original release on 2003-12-10) does not generate padding, and decodes it into garbage. In addition, it does not accept lower-case code.
- Convert::Base32 (the latest release is dated 2012-04-22, but the code hasn’t changed since the original release on 2001-07-17) does not generate padding, and dies when it encounters what it calls “non-base32 characters”. In addition, while it accepts lower-case code (which is commendable, even though the RFC specifies an upper-case alphabet), it also generates lower-case code, which is wrong.
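For contrast, validating a key by decoding it could look roughly like the following. This is a Python sketch, not the Perl code Nexus would need. `decode_key` and its `allow_padding` flag are invented for this example, and rejecting padded keys is an assumed policy based on the 40-bit rule mentioned earlier.

```python
import base64
import binascii


def decode_key(encoded: str, allow_padding: bool = False) -> bytes:
    """Decode a Base32 key, raising ValueError if it is malformed."""
    try:
        # casefold=True accepts lower-case input, as a lenient decoder may.
        key = base64.b32decode(encoded.encode("ascii"), casefold=True)
    except (binascii.Error, UnicodeEncodeError) as exc:
        raise ValueError(f"not valid Base32: {exc}") from exc
    # Assumed policy: a padded key is not a whole number of 40-bit groups.
    if not allow_padding and encoded.endswith("="):
        raise ValueError("key length is not a multiple of 40 bits")
    return key


print(decode_key("MZXW6YTB"))                      # b'fooba'
print(decode_key("mzxw6ytb"))                      # b'fooba'
print(decode_key("MY======", allow_padding=True))  # b'f'
```

Input with missing padding (`MY`) or with characters outside the alphabet (`M1XW6YTB`) raises `ValueError` instead of producing garbage.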
Both packages ship with tests. MIME::Base32’s test simply encodes a string, decodes the result, and checks that it got the original string back. Convert::Base32’s tests are more complex and include length and padding tests, but they define padding as the lower, unused bits of the last non-padding character in the output.
MIME::Base32 references RFC 3548 (the predecessor to RFC 4648) but does not come close to implementing it correctly. Convert::Base32 predates the RFC and conforms to the old RACE Internet draft, which is small consolation since RACE was never standardized and was eventually replaced by Punycode.
I wrote a script which runs the RFC 4648 test vectors through either or both MIME::Base32 and Convert::Base32, depending on what’s available. The first two columns are the input and output to and from the encoder, and the last two are the input and output to and from the decoder. Note that the script adds the correct amount of padding before feeding the encoded string back to the decoder.
MIME::Base32
1 f | 2 MY | 8 MY====== | 7 fOOOOO
2 fo | 4 MZXQ | 8 MZXQ==== | 6 fo����
3 foo | 5 MZXW6 | 8 MZXW6=== | 6 foo���
4 foob | 7 MZXW6YQ | 8 MZXW6YQ= | 5 foob
5 fooba | 8 MZXW6YTB | 8 MZXW6YTB | 5 fooba
6 foobar | 10 MZXW6YTBOI | 16 MZXW6YTBOI====== | 12 foobarOOOOO
Convert::Base32
Data contains non-base32 characters at base32-test.pl line 16
1 f | 2 my | 8 my====== | %
(The final % is my shell indicating that the output did not end with a line feed.)
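For comparison, the same round trip can be run against a codec that follows the RFC. The sketch below uses Python's standard `base64` module, not the Perl script that produced the output above. `VECTORS` holds the Base32 test vectors from RFC 4648 section 10, and `pad` is a hypothetical helper that mimics the padding step described earlier.

```python
import base64

# Base32 test vectors from RFC 4648, section 10.
VECTORS = [
    (b"", ""),
    (b"f", "MY======"),
    (b"fo", "MZXQ===="),
    (b"foo", "MZXW6==="),
    (b"foob", "MZXW6YQ="),
    (b"fooba", "MZXW6YTB"),
    (b"foobar", "MZXW6YTBOI======"),
]


def pad(encoded: str) -> str:
    """Add '=' until the length is a multiple of 8 characters."""
    return encoded + "=" * (-len(encoded) % 8)


for plain, expected in VECTORS:
    encoded = base64.b32encode(plain).decode("ascii")
    assert encoded == expected          # the encoder generates padding
    unpadded = encoded.rstrip("=")      # what a non-padding encoder would emit
    decoded = base64.b32decode(pad(unpadded))
    assert decoded == plain             # the decoder consumes padding cleanly
    print(f"{len(plain)} {plain.decode()} | {len(encoded)} {encoded} "
          f"| {len(decoded)} {decoded.decode()}")
```

Every vector survives the round trip, and the decoded length in the last column always matches the input length in the first.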
The same test, with forced conversion to upper-case before decoding:
MIME::Base32
1 f | 2 MY | 8 MY====== | 7 fOOOOO
2 fo | 4 MZXQ | 8 MZXQ==== | 6 fo����
3 foo | 5 MZXW6 | 8 MZXW6=== | 6 foo���
4 foob | 7 MZXW6YQ | 8 MZXW6YQ= | 5 foob
5 fooba | 8 MZXW6YTB | 8 MZXW6YTB | 5 fooba
6 foobar | 10 MZXW6YTBOI | 16 MZXW6YTBOI====== | 12 foobarOOOOO
Convert::Base32
Data contains non-base32 characters at base32-test.pl line 17
1 f | 2 my | 8 MY====== | %
Once again, with forced conversion to lower-case:
MIME::Base32
1 f | 2 MY | 8 my====== | 8 my======
2 fo | 4 MZXQ | 8 mzxq==== | 7 mz{����
3 foo | 5 MZXW6 | 8 mzxw6=== | 7 mz{�O
4 foob | 7 MZXW6YQ | 8 mzxw6yq= | 6 mz{��^
5 fooba | 8 MZXW6YTB | 8 mzxw6ytb | 6 mz{��]
6 foobar | 10 MZXW6YTBOI | 16 mzxw6ytboi====== | 14 mz{��]���zzzzz
Convert::Base32
Data contains non-base32 characters at base32-test.pl line 17
1 f | 2 my | 8 my====== | %
sigh