Netlink, auditing, and counting bytes

I’ve been messing around with Linux auditing lately, because of reasons, and ended up having to replicate most of libaudit, because of other reasons, and in the process I found bugs in both the kernel and userspace parts of the Linux audit subsystem.

Let us start with what Netlink is, for readers who aren’t very familiar with Linux: it is a mechanism for communicating directly with kernel subsystems using the BSD socket API, rather than by opening device nodes or files in a synthetic filesystem such as /proc. It has pros and cons, but mostly pros, especially as a replacement for ioctl(2), since Netlink sockets are buffered, can be poll(2)ed, and can more easily accommodate variable-length messages and partial reads.

Note: all links to source code in this post point to the versions used in Ubuntu 18.04 as of 2020-08-21: kernel 5.4, userspace 2.8.2.

Netlink messages start with a 16-byte header which looks like this (source, man page):

struct nlmsghdr {
	__u32		nlmsg_len;	/* Length of message including header */
	__u16		nlmsg_type;	/* Message content */
	__u16		nlmsg_flags;	/* Additional flags */
	__u32		nlmsg_seq;	/* Sequence number */
	__u32		nlmsg_pid;	/* Sending process port ID */
};

The same header file, <linux/netlink.h>, also provides a few macros to help populate and interpret Netlink messages (source, man page):

#define NLMSG_ALIGNTO	4U
#define NLMSG_ALIGN(len) ( ((len)+NLMSG_ALIGNTO-1) & ~(NLMSG_ALIGNTO-1) )
#define NLMSG_HDRLEN	 ((int) NLMSG_ALIGN(sizeof(struct nlmsghdr)))
#define NLMSG_LENGTH(len) ((len) + NLMSG_HDRLEN)
#define NLMSG_SPACE(len) NLMSG_ALIGN(NLMSG_LENGTH(len))
#define NLMSG_DATA(nlh)  ((void*)(((char*)nlh) + NLMSG_LENGTH(0)))
#define NLMSG_NEXT(nlh,len)	 ((len) -= NLMSG_ALIGN((nlh)->nlmsg_len), \
				  (struct nlmsghdr*)(((char*)(nlh)) + NLMSG_ALIGN((nlh)->nlmsg_len)))
#define NLMSG_OK(nlh,len) ((len) >= (int)sizeof(struct nlmsghdr) && \
			   (nlh)->nlmsg_len >= sizeof(struct nlmsghdr) && \
			   (nlh)->nlmsg_len <= (len))
#define NLMSG_PAYLOAD(nlh,len) ((nlh)->nlmsg_len - NLMSG_SPACE((len)))

Going by these definitions and the documentation, it is clear that the length field of the message header reflects the total length of the message, header included. What is somewhat less clear is that Netlink messages are supposed to be padded out to a multiple of four bytes before transmission or storage.

The Linux audit subsystem not only breaks these rules, but does not even agree with itself on precisely how to break them.

The userspace tools (auditctl(8), auditd(8)…) all use libaudit to communicate with the kernel audit subsystem. When passing a message with a payload of size bytes to the kernel, libaudit copies the payload into a large pre-zeroed buffer, sets the type, flags, and sequence number fields to the appropriate values, sets the pid field to zero (which is probably a bad idea but, strictly speaking, permitted), and finally sets the length field to NLMSG_SPACE(size), which evaluates to sizeof(struct nlmsghdr) + size rounded up to a multiple of four. It then writes that exact number of bytes to the socket.

Bug #1: The length field should not be rounded up; the purpose of the NLMSG_SPACE() and NLMSG_NEXT() macros is to ensure proper alignment of subsequent message headers when multiple messages are stored or transmitted consecutively. The length field should be computed using NLMSG_LENGTH(), which simply adds the length of the header to its single argument.

Note: to my understanding, Netlink supports sending multiple messages in a single send / receive provided that they are correctly aligned, that they all have the NLM_F_MULTI flag set, and that the last message in the sequence is a zero-length message of type NLMSG_DONE. The audit subsystem does not use this feature.

Moving on: NETLINK_AUDIT messages essentially fall into one of four categories:

Requests from userspace; for instance, an AUDIT_GET message which requests the current status of the audit subsystem, an AUDIT_SET message which changes parameters, or an AUDIT_LIST_RULES message which requests a list of currently active auditing rules.
Responses from the kernel; these usually have the same type as the request. For instance, the kernel will respond to an AUDIT_GET request with a message of the same type containing a struct audit_status, and to an AUDIT_LIST_RULES request with a sequence of messages of the same type each containing a single struct audit_rule_data.
Errors and acknowledgements. These use standard Netlink message types: NLMSG_ERROR in response to an invalid request (or a valid request with the NLM_F_ACK flag set), or NLMSG_DONE at the end of a multi-part response.
Audit data. Every event that matches an auditing rule will trigger a series of messages with varying types: usually one which describes the system call that triggered the event, one each for every file or directory affected by the call, one or more describing the process, etc. Each message consists of a header of the form audit(timestamp:serial):␣ which uniquely identifies the event, followed by a space-separated list of key-value pairs. The final message has the type AUDIT_EOE and has the same header, trailing space included, but no data.

The kernel pads responses, errors and acknowledgements, but does not include that padding in the length reported in the message header. So far, so good. However…

Bug #2: Audit data messages are sent from the kernel without padding.

This is not critical, but it does mean that an implementation that batches up incoming messages and stores them consecutively must take extra care to keep them properly aligned.

Bug #3: The length field on audit data messages does not include the length of the header.

This is jaw-dropping. It is so fundamentally wrong. It means that anyone who wants to talk to the audit subsystem using their own code instead of libaudit will have to add a workaround to the Netlink layer of their stack to either fix or ignore the error, and apply that workaround only for certain message types.

How has this gone unnoticed? Well, libaudit doesn’t do much input validation. It relies on the NLMSG_OK() macro, which checks only three things:

That the length of the buffer (as returned by recvfrom(2), for instance) is no less than the length of a Netlink message header.
That the length field in the message header is no less than the length of a Netlink message header.
That the length field in the message header is less than or equal to the length of the buffer.

Since every audit data message, even the empty AUDIT_EOE message, begins with a timestamp and serial number, the length of the payload is never less than 25-30 bytes, and NLMSG_OK() is always satisfied. And since the audit subsystem never sends multiple messages in a single send / receive, it does not matter that NLMSG_NEXT() will be off by 16 bytes.

Consumers of libaudit don’t notice either because they never look at the header; libaudit wraps the message in its own struct audit_reply with its own length and type fields and pointers of the appropriate types for messages that contain binary data (this is a bad idea for entirely different reasons which we won’t go into here). The only case in which the caller needs to know the length of the message is for audit events, when the length field just happens to be the length of the payload, just like the caller expects.

The odds of these bugs getting fixed are approximately zero, because existing applications will break in interesting ways if the kernel starts setting the length field correctly.

Turing wept.

THIS IS WHY WE CAN’T HAVE NICE THINGS
