Wednesday, August 8, 2018

Working with UUIDs in the Linux Kernel

Universally Unique Identifiers in the Linux Kernel

UUIDs are 128-bit numbers that are (virtually) unique. The UUID format was developed during the '80s as part of Remote Procedure Call (RPC) technology. Today it is standardized by several organizations, including the IETF in RFC 4122.

(Yes, I do personally remember Apollo workstations and the Network Computing System - "The Network is the Computer." I was just getting started with networking company Novell at the time.)

UUIDs (and variations) are in use more than ever today because there is always a need for unique numbers and the UUID format is:

  1. Easy to generate
  2. Standardized across platforms and architectures
  3. Reliable

The third item--reliability--is worth discussing briefly. The most frequently used UUIDs are generated from a 60-bit time stamp and a MAC address. There is a theoretical chance that such a UUID could collide (have the same 128-bit value) with one generated by another node, but in practice this has not been a concern. Some algorithms generate UUIDs that are provably unique. (For more information, read RFC 4122.)
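
As a rough illustration of the time-based scheme, the sketch below packs a 60-bit time stamp, a 14-bit clock sequence, and a 6-byte node ID into the 16-byte field layout described in RFC 4122. The name pack_uuid_v1 is hypothetical (it is not a kernel or libuuid interface), and obtaining the time stamp and MAC address is left to the caller.

```c
#include <stdint.h>
#include <string.h>

/*
 * Pack a version-1 (time-based) UUID using the RFC 4122 field layout.
 *   ts        - count of 100 ns intervals since 1582-10-15; only the
 *               low 60 bits are used
 *   clock_seq - only the low 14 bits are used
 *   node      - 6-byte node ID, normally the MAC address
 * pack_uuid_v1 is a hypothetical helper written for this example.
 */
static void pack_uuid_v1(uint8_t out[16], uint64_t ts, uint16_t clock_seq,
                         const uint8_t node[6])
{
    uint32_t time_low = (uint32_t)(ts & 0xffffffffu);
    uint16_t time_mid = (uint16_t)((ts >> 32) & 0xffff);
    uint16_t time_hi  = (uint16_t)((ts >> 48) & 0x0fff);

    /* Every field is stored most significant byte first. */
    out[0] = (uint8_t)(time_low >> 24);
    out[1] = (uint8_t)(time_low >> 16);
    out[2] = (uint8_t)(time_low >> 8);
    out[3] = (uint8_t)(time_low);
    out[4] = (uint8_t)(time_mid >> 8);
    out[5] = (uint8_t)(time_mid);
    out[6] = (uint8_t)(0x10 | (time_hi >> 8));          /* version 1 */
    out[7] = (uint8_t)(time_hi & 0xff);
    out[8] = (uint8_t)(0x80 | ((clock_seq >> 8) & 0x3f)); /* variant 10x */
    out[9] = (uint8_t)(clock_seq & 0xff);
    memcpy(&out[10], node, 6);
}
```

Two nodes can only produce the same value here if they share a node ID, a clock sequence, and a time stamp, which is why collisions are a theoretical rather than a practical concern.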

UUID kernel module

I uploaded the source code for a demonstration kernel module that shows how to use the interfaces I've made use of in my current project.  This demo code does the following:
  1. generate a random binary UUID
  2. convert the UUID to a 36-character string
  3. parse the 36-character string into a second binary UUID
  4. compare the 128-bit numbers from steps (1) and (3)
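
The same four steps can be sketched in portable user-space C. This is not the demo module itself: every demo_* name below is a hypothetical stand-in, and rand() substitutes for the kernel's random source. In the kernel, the counterparts would be generate_random_uuid and snprintf with %pUb, plus (an assumption about what the module calls) uuid_parse and uuid_equal from linux/uuid.h.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_UUID_STRING_LEN 36   /* 32 hex digits plus 4 hyphens */

struct demo_uuid { uint8_t b[16]; };

/* Step 1: random bytes, then stamp version 4 and the RFC 4122 variant. */
static void demo_uuid_generate(struct demo_uuid *u)
{
    for (int i = 0; i < 16; i++)
        u->b[i] = (uint8_t)(rand() & 0xff);   /* not a secure source */
    u->b[6] = (uint8_t)((u->b[6] & 0x0f) | 0x40);
    u->b[8] = (uint8_t)((u->b[8] & 0x3f) | 0x80);
}

/* Step 2: binary -> "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" plus NUL. */
static void demo_uuid_format(const struct demo_uuid *u,
                             char out[DEMO_UUID_STRING_LEN + 1])
{
    static const char digits[] = "0123456789abcdef";
    char *p = out;

    for (int i = 0; i < 16; i++) {
        if (i == 4 || i == 6 || i == 8 || i == 10)
            *p++ = '-';
        *p++ = digits[u->b[i] >> 4];
        *p++ = digits[u->b[i] & 0x0f];
    }
    *p = '\0';
}

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Step 3: string -> binary; returns 0 on success, -1 on malformed input. */
static int demo_uuid_parse(const char *s, struct demo_uuid *u)
{
    size_t pos = 0;

    if (strlen(s) != DEMO_UUID_STRING_LEN)
        return -1;
    for (int i = 0; i < 16; i++) {
        if ((i == 4 || i == 6 || i == 8 || i == 10) && s[pos++] != '-')
            return -1;
        int hi = hex_value(s[pos]);
        int lo = hex_value(s[pos + 1]);
        if (hi < 0 || lo < 0)
            return -1;
        u->b[i] = (uint8_t)((hi << 4) | lo);
        pos += 2;
    }
    return 0;
}

/* Step 4: returns 1 when the round trip preserved all 128 bits. */
static int demo_uuid_round_trip(void)
{
    struct demo_uuid first, second;
    char text[DEMO_UUID_STRING_LEN + 1];

    demo_uuid_generate(&first);
    demo_uuid_format(&first, text);
    if (demo_uuid_parse(text, &second) != 0)
        return 0;
    return memcmp(first.b, second.b, sizeof(first.b)) == 0;
}
```

Note that the string buffer is 37 bytes: 36 characters for the text form plus one for the terminating NUL.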

The Linux Kernel's UUID interface

The kernel's UUID API is defined in /include/linux/uuid.h and implemented in /lib/uuid.c. When I needed to make use of the UUID format in my current Linux kernel project, I found it straightforward, with one exception:
  • There is no uuid_unparse function in the Linux kernel. In user space, it is standard practice to convert a 16-byte binary UUID into a 36-character string by calling uuid_unparse, and to convert a 36-character string back into a 16-byte binary UUID by calling uuid_parse.
Instead, the kernel provides a special printk format that works with the kernel's string library. It is documented in /Documentation/printk-formats.txt:

UUID/GUID addresses
===================
%pUb 00010203-0405-0607-0809-0a0b0c0d0e0f
%pUB 00010203-0405-0607-0809-0A0B0C0D0E0F
%pUl 03020100-0504-0706-0809-0a0b0c0d0e0f
%pUL 03020100-0504-0706-0809-0A0B0C0D0E0F
For printing 16-byte UUID/GUIDs addresses. The additional 'l', 'L', 'b' and 'B' specifiers are used to specify a little endian order in lower ('l') or upper case ('L') hex characters - and big endian order in lower ('b') or upper case ('B') hex characters. Where no additional specifiers are used the default big endian order with lower case hex characters will be printed.

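
To make the effect of the four specifiers concrete, here is a small user-space sketch of the byte ordering. It is not the kernel's code: demo_puuid and the two order tables are names invented for this example, and the function produces only the text that the specifier would print.

```c
#include <stdint.h>

/* Order in which the 16 raw bytes are emitted, left to right. */
static const uint8_t be_order[16] = {0, 1, 2, 3,  4, 5,  6, 7,
                                     8, 9, 10, 11, 12, 13, 14, 15};
static const uint8_t le_order[16] = {3, 2, 1, 0,  5, 4,  7, 6,
                                     8, 9, 10, 11, 12, 13, 14, 15};

/*
 * Format raw[] the way the 'b', 'B', 'l', or 'L' specifier would.
 * out must hold 37 bytes. Any other spec is treated like 'b'.
 * demo_puuid is a hypothetical helper, not a kernel function.
 */
static void demo_puuid(char out[37], const uint8_t raw[16], char spec)
{
    const uint8_t *order = (spec == 'l' || spec == 'L') ? le_order : be_order;
    const char *digits = (spec == 'B' || spec == 'L') ? "0123456789ABCDEF"
                                                      : "0123456789abcdef";
    char *p = out;

    for (int i = 0; i < 16; i++) {
        uint8_t byte = raw[order[i]];

        if (i == 4 || i == 6 || i == 8 || i == 10)
            *p++ = '-';
        *p++ = digits[byte >> 4];
        *p++ = digits[byte & 0x0f];
    }
    *p = '\0';
}
```

Only the first three groups are byte-swapped in the little-endian form; the last eight bytes are printed in the same order either way.
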
For example, in user space, to convert a 16-byte (128-bit) binary number to a 36-character string, you would do the following:

#include <uuid/uuid.h>

uuid_t uuid;
char uuid_string[37] = {0};   /* 36 characters plus the terminating NUL */

/* create a random uuid */
uuid_generate_random(uuid);

/* convert the binary uuid into a properly formatted string */
uuid_unparse(uuid, uuid_string);

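But in the Linux kernel, we must instead use the string library in combination with the special UUID printk formatter. Here uuid is the kernel's uuid_t, filled in by generate_random_uuid(uuid.b), and uuid_string is a buffer of UUID_STRING_LEN + 1 bytes: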

/* convert the binary uuid into a properly formatted string */
snprintf(uuid_string, UUID_STRING_LEN + 1, "%pUb", &uuid);

Two More Minor Points

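  • The kernel APIs pass uuid_t values by reference, as explicit pointers, whereas user-space calls are written as if passing by value. (In user space uuid_t is an array type, so no address-of operator appears at the call site.)
  • The kernel by default generates and parses UUIDs in big-endian format, although little-endian is also available, for example through the %pUl and %pUL specifiers.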
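
The first point comes down to how uuid_t is declared on each side. The sketch below uses two stand-in typedefs (user_uuid_t and kernel_uuid_t are hypothetical names mirroring libuuid's array type and the kernel's struct type) to show why the call sites look different even though, in both cases, the callee ends up working on the caller's 16 bytes.

```c
#include <string.h>

/* Stand-in for libuuid's uuid_t: an array type, which decays to a pointer. */
typedef unsigned char user_uuid_t[16];

/* Stand-in for the kernel's uuid_t: a struct wrapping the 16 bytes. */
typedef struct { unsigned char b[16]; } kernel_uuid_t;

/*
 * User-space style: the caller writes the argument without '&', yet the
 * callee receives a pointer to the caller's bytes and can modify them.
 */
static void user_style_clear(user_uuid_t uu)
{
    memset(uu, 0, 16);
}

/* Kernel style: the pointer is explicit in the signature and at the call. */
static void kernel_style_clear(kernel_uuid_t *u)
{
    memset(u->b, 0, sizeof(u->b));
}
```

A caller would write user_style_clear(a) in the first case and kernel_style_clear(&k) in the second.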