
Bump

Time for my annual “oh shit, I forgot to bump the copyright year again” round-up!

In the F/OSS community, there are two different philosophies when it comes to applying copyright statements to a project. If the code base consists exclusively (or almost exclusively) of code developed for that specific project by the project’s author or co-authors, many projects will have a single file (usually named LICENSE) containing the license, a list of copyright holders, and the copyright dates or ranges. However, if the code base incorporates a significant body of code taken from other projects or contributed by parties outside the project, it is customary to include the copyright statements and either the complete license or a reference to it in each individual file. In my experience, projects that use the BSD, ISC, MIT, or adjacent licenses tend to use the latter model regardless.

The advantage of the second model is that it’s hard to get wrong. You might forget to add a name to a central list, but you’re far less likely to forget to state the name of the author when you add a new file. The disadvantage is that it’s really, really easy to forget to update the copyright year when you commit a change to an existing file that hasn’t been touched in a while.

So, how can we automate this?

One possibility is to have a pre-commit hook that updates the copyright year for you (generally a bad idea), or one that rejects the commit if it thinks you forgot (better, but not perfect; what if you’re adding a file from an outside source?), or one that prints a big fat warning if it thinks you forgot (much better, especially with Git, since you can commit --amend once you’ve fixed it, before pushing).
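
For the warning variant, here is a minimal sketch (my own illustration, not battle-tested: it checks the working tree rather than the staged content, and assumes file names without whitespace):

#!/bin/sh
# pre-commit hook: warn about files modified without a copyright bump
year=$(date +%Y)
for f in $(git diff --cached --name-only --diff-filter=M); do
    if grep -q Copyright "$f" && ! grep -q "Copyright.*$year" "$f"; then
        echo "WARNING: $f modified, but its copyright year is not $year" >&2
    fi
done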

But how do you fix the mistake retroactively, without poring over commit logs to figure out what was modified when?

Let’s start by assuming that you have a list of files that were modified in 2017, and that each file only has one copyright statement that needs to be updated to reflect that fact. The following Perl one-liner should do the trick:

perl -p -i -e 'if (/Copyright/) { s/ ([0-9]{4})-20(?:0[0-9]|1[0-6]) / $1-2017 /; s/ (20(?:0[0-9]|1[0-6])) / $1-2017 /; }'

It should be fairly self-explanatory if you know regular expressions. The first substitution handles the case where the existing statement contains a range, in which case we extend it to include 2017, and the second substitution handles the case where the existing statement contains a single year, which we replace with a range starting with the original year and ending with 2017. The complexity stems mostly from having to take care not to replace 2018 (or later) with 2017; our regexes only match years in the range 2000-2016.
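
To see the effect, we can feed the one-liner a couple of made-up copyright lines on standard input (dropping -i):

$ echo '# Copyright (c) 2004-2015 John Doe' | perl -p -e 'if (/Copyright/) { s/ ([0-9]{4})-20(?:0[0-9]|1[0-6]) / $1-2017 /; s/ (20(?:0[0-9]|1[0-6])) / $1-2017 /; }'
# Copyright (c) 2004-2017 John Doe
$ echo '# Copyright (c) 2012 Jane Roe' | perl -p -e 'if (/Copyright/) { s/ ([0-9]{4})-20(?:0[0-9]|1[0-6]) / $1-2017 /; s/ (20(?:0[0-9]|1[0-6])) / $1-2017 /; }'
# Copyright (c) 2012-2017 Jane Roe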

OK, so now we know how to fix the files, but how do we figure out which ones need fixing?

With Git, we could try something like this:

git diff --name-only 'HEAD@{2017-01-01}..HEAD@{2018-01-01}'

This is… imperfect, though. The first problem is that it will list every file that was touched, including files that were added, moved, renamed, or deleted. Files that were added should be assumed to have had a correct copyright statement at the time they were added; files that were only moved or renamed should not be updated, since their contents did not change; and files that were deleted are no longer there to be updated.¹ We should restrict our search to files that were actually modified:

git diff --name-only --diff-filter=M 'HEAD@{2017-01-01}..HEAD@{2018-01-01}'

Some of those changes might be too trivial to copyright, though. This is a fairly complex legal matter, but to simplify, if the change was inevitable and there was no room for creative expression — for instance, a function in a third-party library you are using was renamed, so both the reason for the change and the nature of the change are external to the work itself — then it is not protected. So perhaps you should remove --name-only and review the diff, which is when you realize that half those files were only modified to update their copyright statements because you forgot to do so in 2016. Let’s try to exclude them mechanically, rather than manually. Unfortunately, git diff does not have anything that resembles diff -I, so we have to write our own diff command which does that, and ask git to use it:

$ echo 'exec diff -u -ICopyright "$@"' >diff-no-copyright
$ chmod a+rx diff-no-copyright
$ git difftool --diff-filter=M --extcmd $PWD/diff-no-copyright 'HEAD@{2017-01-01}..HEAD@{2018-01-01}'

This gives us a diff, though, not a list of files. We can try to extract the names as follows:

$ git difftool --diff-filter=M --no-prompt --extcmd $PWD/diff-no-copyright 'HEAD@{2017-01-01}..HEAD@{2018-01-01}' | awk '/^---/ { print $2 }'

Wait… no, that’s just garbage. The thing is, git difftool works by checking out both versions of a file and diffing them, so what we get is a list of the names of the temporary files it created. We have to be a little more creative:

$ echo '/usr/bin/diff -q -ICopyright "$@" >/dev/null || echo "$BASE"' >list-no-copyright
$ chmod a+rx list-no-copyright
$ git difftool --diff-filter=M --no-prompt --extcmd $PWD/list-no-copyright 'HEAD@{2017-01-01}..HEAD@{2018-01-01}'

Much better. We can glue this together with our Perl one-liner using xargs, then repeat the process for 2018.
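
Concretely, the whole pipeline looks something like this (again assuming no file names contain whitespace, which would confuse xargs):

$ git difftool --diff-filter=M --no-prompt --extcmd $PWD/list-no-copyright 'HEAD@{2017-01-01}..HEAD@{2018-01-01}' | xargs perl -p -i -e 'if (/Copyright/) { s/ ([0-9]{4})-20(?:0[0-9]|1[0-6]) / $1-2017 /; s/ (20(?:0[0-9]|1[0-6])) / $1-2017 /; }'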

Finally, how about Subversion? On the one hand, Subversion is far simpler than Git, so we can get 90% of the way much more easily. On the other hand, Subversion is far less flexible than Git, so we can’t go the last 10% of the way. Here’s the best I could do:

$ echo 'exec diff -u -ICopyright "$@"' >diff-no-copyright
$ chmod a+rx diff-no-copyright
$ svn diff --ignore-properties --diff-cmd $PWD/diff-no-copyright -r'{2017-01-01}:{2018-01-01}' | awk '/^---/ { print $2 }'

This will not work properly if you have files with names that contain whitespace; you’ll have to use sed with a much more complicated regex, which I leave as an exercise.


¹ I will leave the issue of move-and-modify being incorrectly recorded as delete-and-add to the reader. One possibility is to include added files in the list by using --diff-filter=AM, and review them manually before committing.

DNS over TLS in FreeBSD 12

With the arrival of OpenSSL 1.1.1, an upgraded Unbound, and some changes to the setup and init scripts, FreeBSD 12.0, currently in beta, now supports DNS over TLS out of the box.

DNS over TLS is just what it sounds like: DNS over TCP, but wrapped in a TLS session. It encrypts your requests and the server’s replies, and optionally allows you to verify the identity of the server. The advantages are protection against eavesdropping and manipulation of your DNS traffic; the drawbacks are a slight performance degradation and potential firewall traversal issues, as it runs over a non-standard port (TCP port 853) which may be blocked on some networks. Let’s take a look at how to set it up.

Basic setup

As a simple test case, let’s set up our 12.0-ALPHA10 VM to use Cloudflare’s DNS service:

# uname -r
12.0-ALPHA10
# cat >/etc/rc.conf.d/local_unbound <<EOF
local_unbound_enable="YES"
local_unbound_tls="YES"
local_unbound_forwarders="1.1.1.1@853 1.0.0.1@853"
EOF
# service local_unbound start
Performing initial setup.
destination:
/var/unbound/forward.conf created
/var/unbound/lan-zones.conf created
/var/unbound/control.conf created
/var/unbound/unbound.conf created
/etc/resolvconf.conf not modified
Original /etc/resolv.conf saved as /var/backups/resolv.conf.20181021.192629
Starting local_unbound.
Waiting for nameserver to start... good
# host www.freebsd.org
www.freebsd.org is an alias for wfe0.nyi.freebsd.org.
wfe0.nyi.freebsd.org has address 96.47.72.84
wfe0.nyi.freebsd.org has IPv6 address 2610:1c1:1:606c::50:15
wfe0.nyi.freebsd.org mail is handled by 0 .

Note that this is not a configuration you want to run in production—we will come back to this later.

Performance

The downside of DNS over TLS is the performance hit of the TCP and TLS session setup and teardown. We demonstrate this by flushing our cache and (rather crudely) measuring a cache miss and a cache hit:

# local-unbound-control reload
ok
# time host www.freebsd.org >x
host www.freebsd.org > x 0.00s user 0.00s system 0% cpu 0.553 total
# time host www.freebsd.org >x
host www.freebsd.org > x 0.00s user 0.00s system 0% cpu 0.005 total

Compare this to querying our router, a puny Soekris net5501 running Unbound 1.8.1 on FreeBSD 11.1-RELEASE:

# time host www.freebsd.org gw >x
host www.freebsd.org gw > x 0.00s user 0.00s system 0% cpu 0.232 total
# time host www.freebsd.org gw >x
host www.freebsd.org gw > x 0.00s user 0.00s system 0% cpu 0.008 total

or to querying Cloudflare directly over UDP:

# time host www.freebsd.org 1.1.1.1 >x      
host www.freebsd.org 1.1.1.1 > x 0.00s user 0.00s system 0% cpu 0.272 total
# time host www.freebsd.org 1.1.1.1 >x
host www.freebsd.org 1.1.1.1 > x 0.00s user 0.00s system 0% cpu 0.013 total

(Cloudflare uses anycast routing, so it is not so unreasonable to see a cache miss during off-peak hours.)

This clearly shows the advantage of running a local caching resolver—it absorbs the cost of DNSSEC and TLS. And speaking of DNSSEC, we can separate that cost from that of TLS by reconfiguring our server without the latter:

# cat >/etc/rc.conf.d/local_unbound <<EOF
local_unbound_enable="YES"
local_unbound_tls="NO"
local_unbound_forwarders="1.1.1.1 1.0.0.1"
EOF
# service local_unbound setup
Performing initial setup.
destination:
Original /var/unbound/forward.conf saved as /var/backups/forward.conf.20181021.205328
/var/unbound/lan-zones.conf not modified
/var/unbound/control.conf not modified
Original /var/unbound/unbound.conf saved as /var/backups/unbound.conf.20181021.205328
/etc/resolvconf.conf not modified
/etc/resolv.conf not modified
# service local_unbound start
Starting local_unbound.
Waiting for nameserver to start... good
# time host www.freebsd.org >x
host www.freebsd.org > x 0.00s user 0.00s system 0% cpu 0.080 total
# time host www.freebsd.org >x
host www.freebsd.org > x 0.00s user 0.00s system 0% cpu 0.004 total

So does TLS add nearly half a second to every cache miss? Not quite, fortunately—in our previous tests, our first query was not only a cache miss but also the first query after a restart or a cache flush, resulting in a complete load and validation of the entire path from the name we queried to the root. The difference between a first and second cache miss is quite noticeable:

# time host www.freebsd.org >x 
host www.freebsd.org > x 0.00s user 0.00s system 0% cpu 0.546 total
# time host www.freebsd.org >x
host www.freebsd.org > x 0.00s user 0.00s system 0% cpu 0.004 total
# time host repo.freebsd.org >x
host repo.freebsd.org > x 0.00s user 0.00s system 0% cpu 0.168 total
# time host repo.freebsd.org >x
host repo.freebsd.org > x 0.00s user 0.00s system 0% cpu 0.004 total

Revisiting our configuration

Remember when I said that you shouldn’t run the sample configuration in production, and that I’d get back to it later? This is later.

The problem with our first configuration is that while it encrypts our DNS traffic, it does not verify the identity of the server. Our ISP could be routing all traffic to 1.1.1.1 to its own servers, logging it, and selling the information to the highest bidder. We need to tell Unbound to validate the server certificate, but there’s a catch: Unbound only knows the IP addresses of its forwarders, not their names. We have to provide it with names that will match the X.509 certificates used by the servers we want to use. Let’s double-check the certificate:

# :| openssl s_client -connect 1.1.1.1:853 |& openssl x509 -noout -text |& grep DNS
DNS:*.cloudflare-dns.com, IP Address:1.1.1.1, IP Address:1.0.0.1, DNS:cloudflare-dns.com, IP Address:2606:4700:4700:0:0:0:0:1111, IP Address:2606:4700:4700:0:0:0:0:1001

This matches Cloudflare’s documentation, so let’s update our configuration:

# cat >/etc/rc.conf.d/local_unbound <<EOF
local_unbound_enable="YES"
local_unbound_tls="YES"
local_unbound_forwarders="1.1.1.1@853#cloudflare-dns.com 1.0.0.1@853#cloudflare-dns.com"
EOF
# service local_unbound setup
Performing initial setup.
destination:
Original /var/unbound/forward.conf saved as /var/backups/forward.conf.20181021.212519
/var/unbound/lan-zones.conf not modified
/var/unbound/control.conf not modified
/var/unbound/unbound.conf not modified
/etc/resolvconf.conf not modified
/etc/resolv.conf not modified
# service local_unbound restart
Stopping local_unbound.
Starting local_unbound.
Waiting for nameserver to start... good
# host www.freebsd.org
www.freebsd.org is an alias for wfe0.nyi.freebsd.org.
wfe0.nyi.freebsd.org has address 96.47.72.84
wfe0.nyi.freebsd.org has IPv6 address 2610:1c1:1:606c::50:15
wfe0.nyi.freebsd.org mail is handled by 0 .

How can we confirm that Unbound actually validates the certificate? Well, we can run Unbound in debug mode (/usr/sbin/unbound -dd -vvv) and read the debugging output… or we can confirm that it fails when given a name that does not match the certificate:

# perl -p -i -e 's/cloudflare/cloudfire/g' /etc/rc.conf.d/local_unbound
# service local_unbound setup
Performing initial setup.
destination:
Original /var/unbound/forward.conf saved as /var/backups/forward.conf.20181021.215808
/var/unbound/lan-zones.conf not modified
/var/unbound/control.conf not modified
/var/unbound/unbound.conf not modified
/etc/resolvconf.conf not modified
/etc/resolv.conf not modified
# service local_unbound restart
Stopping local_unbound.
Waiting for PIDS: 33977.
Starting local_unbound.
Waiting for nameserver to start... good
# host www.freebsd.org
Host www.freebsd.org not found: 2(SERVFAIL)

But is this really a failure to validate the certificate? Actually, no. When provided with a server name, Unbound will pass it to the server during the TLS handshake, and the server will reject the handshake if that name does not match any of its certificates. To truly verify that Unbound validates the server certificate, we have to confirm that it fails when it cannot do so. For instance, we can remove the root certificate used to sign the DNS server’s certificate from the test system’s trust store. Note that we cannot simply remove the trust store entirely, as Unbound will refuse to start if the trust store is missing or empty.
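
Here is a rough sketch of such a test. I am assuming that the generated Unbound configuration takes its trust bundle from /etc/ssl/cert.pem; check /var/unbound/unbound.conf before trying this. The openssl invocation simply manufactures a worthless self-signed certificate to stand in as the sole trust anchor:

# mv /etc/ssl/cert.pem /etc/ssl/cert.pem.orig
# openssl req -x509 -newkey rsa:2048 -nodes -days 1 -subj /CN=bogus -keyout /dev/null -out /etc/ssl/cert.pem
# service local_unbound restart
# host www.freebsd.org

If Unbound really validates the server certificate, the last command should fail with SERVFAIL just like the cloudfire test above; remember to move the original bundle back and restart Unbound afterwards.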

While we’re talking about trust stores, I should point out that you currently must have ca_root_nss installed for DNS over TLS to work. However, 12.0-RELEASE will ship with a pre-installed copy.
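
If it is not already present:

# pkg install ca_root_nss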

Conclusion

We’ve seen how to set up Unbound—specifically, the local_unbound service in FreeBSD 12.0—to use DNS over TLS instead of plain UDP or TCP, using Cloudflare’s public DNS service as an example. We’ve looked at the performance impact, and at how to ensure (and verify) that Unbound validates the server certificate to prevent man-in-the-middle attacks.

The question that remains is whether it is all worth it. There is undeniably a performance hit, though this may improve with TLS 1.3. More importantly, there are currently very few DNS-over-TLS providers—only one, really, since Quad9 filter their responses—and you have to weigh the advantage of encrypting your DNS traffic against the disadvantage of sending it all to a single organization. I can’t answer that question for you, but I can tell you that the parameters are evolving quickly, and if your answer is negative today, it may not remain so for long. More providers will appear. Performance will improve with TLS 1.3 and QUIC. Within a year or two, running DNS over TLS may very well become the rule rather than the experimental exception.

Twenty years

Yesterday was the twentieth anniversary of my FreeBSD commit bit, and tomorrow will be the twentieth anniversary of my first commit. I figured I’d split the difference and write a few words about it today.

My level of engagement with the FreeBSD project has varied greatly over the twenty years I’ve been a committer. There have been times when I worked on it full-time, and times when I did not touch it for months. The last few years, health issues and life events have consumed my time and sapped my energy, and my contributions have come in bursts. Commit statistics do not tell the whole story, though: even when not working on FreeBSD directly, I have worked on side projects which, like OpenPAM, may one day find their way into FreeBSD.

My contributions have not been limited to code. I was the project’s first Bugmeister; I’ve served on the Security Team for a long time, and have been both Security Officer and Deputy Security Officer; I managed the last four Core Team elections and am doing so again this year.

In return, the project has taught me much about programming and software engineering. It taught me code hygiene and the importance of clarity over cleverness; it taught me the ins and outs of revision control; it taught me the importance of good documentation, and how to write it; and it taught me good release engineering practices.

Last but not least, it has provided me with the opportunity to work with some of the best people in the field. I have the privilege today to count several of them among my friends.

For better or worse, the FreeBSD project has shaped my career and my life. It set me on the path to information security in general and IAA in particular, and opened many a door for me. I would not be where I am now without it.

I won’t pretend to be able to tell the future. I don’t know how long I will remain active in the FreeBSD project and community. It could be another twenty years; or it could be ten, or five, or less. All I know is that FreeBSD and I still have things to teach each other, and I don’t intend to call it quits any time soon.


Yes, all men

Since Susan Fowler blogged about her experience at Uber in February, the debate about sexism in tech has dominated IT and business news. This debate is not new, of course, and Fowler’s story isn’t all that different from many other stories we’ve heard before. It’s just that for once, finally, people were paying attention, and There Were Consequences. Then matters escalated in June with a string of revelations about sexism—not just discrimination, but full on sexual harassment.

Then came the apologies. Let me tell you about the apologies. The average response from a manager or venture capitalist accused of sexism went something like this:

I apologize unreservedly for my treatment of X. I realize now that my innocent jokes may have been misinterpreted. I’m actually a pretty nice guy, and X’s refusal to sleep with me had no impact whatsoever on my decision not to invest in her startup.

Then came the White Knights:

As a VC, I’m appalled to hear about my colleagues’ behavior towards women. I would like to reassure you all that Not All Men are like that. I myself am actually a pretty nice guy and completely innocent in all this.

Guys, it’s time to face the music. We have all been That Guy. We have all made sexist jokes, or laughed when others made them, or stood by silently while our male bosses, coworkers and colleagues ignored or patronized or belittled or humiliated women. We have all benefited from a system that eliminates close to 50% of our competition before the race has even started.

We are all complicit. We are all guilty.

So what do we do? Where do we go next?

First, take a deep breath, do a little soul-searching, and re-read that paragraph until any impulse, however minor, to say to yourself “OK, but not me” is gone. Yes, you too.

Next, if you’ve ever acted inappropriately towards a female coworker or friend or acquaintance, or stood by silently while others did, consider apologizing.

Third, vow to never do so again, and work hard to keep that vow. Respect the women around you as much as you respect the men. If someone around you acts or speaks inappropriately, speak up, even if there are no women present. Be proactive: make sure that women are given equal opportunity to join those career-building projects, and are included in those backstage chats where decisions are made. If you are hiring, seek out female candidates, keeping in mind that women have a tendency to underestimate their abilities and experience just as men have a tendency to overestimate them. If you are teaching, encourage and mentor female students. Reach out to them if they seem discouraged. Don’t wait until they drop out.

Open your eyes. Open your ears. Listen to the women around you. Believe them. Respect them. Be someone they can vent to and someone they can count on for support when push comes to shove.

You will slip up. When you do, apologize and vow to do better.

As a man, you are, and always have been, part of the problem. Accept it, and start being part of the solution.

Not up to our usual standards

For a few years now, I’ve been working on and off on a set of libraries which collect cryptography- and security-related code I’ve written for other projects, as well as functionality which is either not available elsewhere under a permissive license or for which existing implementations do not meet my expectations of cleanliness, readability, portability and embeddability.

(Aside: the reasons why this has taken years, when I initially expected to publish the first release in the spring or summer of 2014, are too complex to explain here; I may write about them at a later date. Keywords are health, family and world events.)

Two of the major features of that collection are the OATH Authentication Methods (which includes the algorithm used by Google Authenticator and a number of commercial one-time code fobs) and the Common Platform Enumeration, part of the Security Content Automation Protocol. I implemented the former years ago for my employer, and it has languished in the OpenPAM repository since 2012. The latter, however, has proven particularly elusive and frustrating, to the point where it has existed for two years as merely a header file and a set of mostly empty functions, just to sketch out the API. I decided to have another go at it yesterday, and actually made quite a bit of progress, only to hit the wall again. And this morning, I realized why.

The CPE standard exists as a set of NIST Interagency reports: NISTIR 7695 (naming), NISTIR 7696 (name matching), NISTIR 7697 (dictionary) and NISTIR 7698 (applicability language). The one I’ve been struggling with is 7695—it is the foundation for the other three, so I can’t get started on them until I’m done with 7695.

It should have been a breeze. On the surface, the specification seems quite thorough: basic concepts, representations, conversion between representations (including pseudocode). You know the kind of specification that you can read through once, then sit down at the computer, start from the top, and code your way down to the bottom? RFC 4226 and RFC 6238, which describe OATH event-based and time-based one-time passwords respectively, are like that. NISTIR 7695 looks like it should be. But it isn’t. And I’ve been treating it like it was, with my nose so close to the code that I couldn’t see the big picture and realize that it is actually not very well written at all, and that the best way to implement it is to read it, understand it, and then set it aside before coding.

One sign that NISTIR 7695 is a bad specification is the pseudocode. It is common for specifications to describe algorithms, protocols and / or interfaces in the normative text and provide examples, pseudocode and / or a reference implementation (sometimes of dubious quality, as is the case for RFC 4226 and RFC 6238) as non-normative appendices. NISTIR 7695, however, eschews natural-language descriptions and includes pseudocode and examples in the normative text. By way of example, here is the description of the algorithm used to convert (“bind”, in their terminology) a well-formed name to a formatted string, in its entirety:

6.2.2.1 Summary of algorithm

The procedure iterates over the eleven allowed attributes in a fixed order. Corresponding attribute values are obtained from the input WFN and conversions of logical values are applied. A result string is formed by concatenating the attribute values separated by colons.

This is followed by one page of pseudocode and two pages of examples. But the examples are far from exhaustive; as unit tests, they wouldn’t even cover all of the common path, let alone any of the error handling paths. And the pseudocode looks like it was written by someone who learned Pascal in college thirty years ago and hasn’t programmed since.

The description of the reverse operation, converting a formatted string to a well-formed name, is slightly better in some respects and much worse in others. There is more pseudocode, and the examples include one—one!—instance of invalid input… but the pseudocode includes two functions—about one third of the total—which consist almost entirely of comments describing what the functions should do, rather than actual code.

You think I’m joking? Here is one of them:

function get_comp_fs(fs,i)
  ;; Return the i’th field of the formatted string. If i=0,
  ;; return the string to the left of the first forward slash.
  ;; The colon is the field delimiter unless prefixed by a
  ;; backslash.
  ;; For example, given the formatted string:
  ;; cpe:2.3:a:foo:bar\:mumble:1.0:*:...
  ;; get_comp_fs(fs,0) = "cpe"
  ;; get_comp_fs(fs,1) = "2.3"
  ;; get_comp_fs(fs,2) = "a"
  ;; get_comp_fs(fs,3) = "foo"
  ;; get_comp_fs(fs,4) = "bar\:mumble"
  ;; get_comp_fs(fs,5) = "1.0"
  ;; etc.
end.

This function shouldn’t even exist. It should just be a lookup in an associative array, or a call to an accessor if the pseudocode was object-oriented. So why does it exist? Because the main problem with NISTIR 7695, which I should have identified on my first read-through but stupidly didn’t, is that it assumes that implementations would use well-formed names—a textual representation of a CPE name—as their internal representation. The bind and unbind functions, which should be described in terms of how to format and parse URIs and formatted strings, are instead described in terms of how to convert to and from WFNs. I cannot overstate how wrong this is. A specification should never describe a particular internal representation, except in a non-normative reference implementation, because it prevents conforming implementations from choosing more efficient representations, or representations which are better suited to a particular language and environment, and because it leads to this sort of nonsense.
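
To illustrate just how little code this needs once the parsed fields live in an array, here is my own two-line Perl sketch (mine, not the spec’s):

my @fields = split /(?<!\\):/, $fs;  # split on colons not preceded by a backslash
# get_comp_fs($fs, $i) is then just $fields[$i]; for the spec's example,
# $fields[4] is "bar\:mumble", exactly as required.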

So, is the CPE naming specification salvageable? Well, it includes complete ABNF grammars for URIs and formatted strings, which is good, and a partial ABNF grammar for well-formed names, which is… less good, but fixable. It also explains the meanings of the different fields; it would be useless otherwise. But apart from that, and the boilerplate at the top and bottom, it should be completely rewritten, including the pseudocode and examples, which should reference fictional, rather than real, vendors and products. Here is how I would structure it (text in italic is carried over from the original):

  1. Introduction
    1.1. Purpose and scope
    1.2. Document structure
    1.3. Document conventions
    1.4. Relationship to existing specifications and standards
  2. Definitions and abbreviations
  3. Conformance
  4. CPE data model
    4.1 Required attributes
    4.2 Optional attributes
    4.3 Special attribute values
  5. Textual representations
    5.1. Well-formed name
    5.2. URI
    5.3. Formatted string
  6. API
    6.1. Creating and destroying names
    6.2. Setting and getting attributes
    6.3. Binding and unbinding
  7. Non-normative examples
    7.1. Valid and invalid attribute values
    7.2. Valid and invalid well-formed names
    7.3. Valid and invalid URIs
    7.4. Valid and invalid formatted strings
  8. Non-normative pseudo-code
  9. References
  10. Change log

I’m still going to implement CPE naming, but I’m going to implement it the way I think the standard should have been written, not the way it actually was written. Amusingly, the conformance chapter is so vague that I can do this and still claim conformance with the Terrible, Horrible, No Good, Very Bad specification. And it should only take a few hours.

By the way, if anybody from MITRE or NIST reads this and genuinely wants to improve the specification, I’ll be happy to help.

PS: possibly my favorite feature of NISTIR 7695, and additional proof that the authors are not programmers: the specification mandates that WFNs are UTF-8 strings, which are fine for storage and transmission but horrible to work with in memory. But in the next sentence, it notes that only characters with hexadecimal values between x00 and x7F may be used, and subsequent sections further restrict the set of allowable characters. In case you didn’t know, the normalized UTF-8 representation of a sequence of characters with hexadecimal values between x00 and x7F is identical, bit by bit, to the ASCII representation of the same sequence.