Wednesday, August 8, 2018

Working with UUIDs in the Linux Kernel

Universally Unique Identifiers in the Linux Kernel

UUIDs are 128-bit numbers that are (virtually) unique. The UUID format was developed during the '80s as part of Remote Procedure Call (RPC) technology. It is standardized today by several organizations, including the IETF with RFC 4122.

(Yes, I do personally remember Apollo workstations and the Network Computing System - "The Network is the Computer." I was just getting started with networking company Novell at the time.)

UUIDs (and variations) are in use more than ever today because there is always a need for unique numbers and the UUID format is:

  1. Easy to generate
  2. Standardized across platforms and architectures
  3. Reliable

The third item, reliability, is worth discussing briefly. The most frequently used UUIDs are generated from a 60-bit timestamp and a MAC address. There is a theoretical chance that such a UUID could collide (have the same 128-bit value) with one generated by another node, but in practice this has not been a concern. Some algorithms generate UUIDs that are provably unique. (For more information, read RFC 4122.)

UUID kernel module

I uploaded the source code for a demonstration kernel module that shows how to use the interfaces I rely on in my current project. This demo code does the following:
  1. generate a random binary UUID
  2. convert the UUID to a 36-character string
  3. parse the 36-character string into a second binary UUID
  4. compare the 128-bit numbers from steps (1) and (3)
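
The four steps above can be tried out in user space without any kernel headers. What follows is only a minimal sketch of the idea, not the demo module itself: every demo_uuid_* name is hypothetical, and rand() stands in for the kernel's random source purely to keep the example portable.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define UUID_SIZE 16
#define UUID_STRING_LEN 36

/* Step 1: fill 16 bytes with random data, then set the version (4) and
 * variant bits that RFC 4122 defines for random UUIDs.
 * Assumption: rand() is good enough for a demo; it is not a secure source. */
void demo_uuid_generate(unsigned char uuid[UUID_SIZE])
{
	for (int i = 0; i < UUID_SIZE; i++)
		uuid[i] = (unsigned char)(rand() & 0xff);
	uuid[6] = (unsigned char)((uuid[6] & 0x0f) | 0x40); /* version 4 */
	uuid[8] = (unsigned char)((uuid[8] & 0x3f) | 0x80); /* RFC 4122 variant */
}

/* Step 2: binary UUID -> 36-character string in big-endian byte order. */
void demo_uuid_unparse(const unsigned char uuid[UUID_SIZE],
		       char out[UUID_STRING_LEN + 1])
{
	char *p = out;

	for (int i = 0; i < UUID_SIZE; i++) {
		if (i == 4 || i == 6 || i == 8 || i == 10)
			*p++ = '-';
		p += sprintf(p, "%02x", uuid[i]);
	}
	*p = '\0';
}

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/* Step 3: 36-character string -> binary UUID. Returns 0 on success,
 * -1 if the string is malformed. */
int demo_uuid_parse(const char *in, unsigned char uuid[UUID_SIZE])
{
	size_t pos = 0;

	if (strlen(in) != UUID_STRING_LEN)
		return -1;
	for (int i = 0; i < UUID_SIZE; i++) {
		int hi, lo;

		if ((i == 4 || i == 6 || i == 8 || i == 10) && in[pos++] != '-')
			return -1;
		hi = hex_value(in[pos]);
		lo = hex_value(in[pos + 1]);
		if (hi < 0 || lo < 0)
			return -1;
		uuid[i] = (unsigned char)((hi << 4) | lo);
		pos += 2;
	}
	return 0;
}

/* Steps 1 through 4 together: returns 1 when the parsed copy is
 * byte-for-byte equal to the generated original, 0 otherwise. */
int demo_uuid_roundtrip(void)
{
	unsigned char first[UUID_SIZE], second[UUID_SIZE];
	char text[UUID_STRING_LEN + 1];

	demo_uuid_generate(first);
	demo_uuid_unparse(first, text);
	if (demo_uuid_parse(text, second) != 0)
		return 0;
	return memcmp(first, second, UUID_SIZE) == 0;
}
```

Calling demo_uuid_roundtrip() exercises all four steps in order. The final comparison is a plain memcmp over the 16 bytes, which is the simplest way to compare two 128-bit values held as byte arrays.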

The Linux Kernel's UUID interface

The kernel's UUID API is defined in /include/linux/uuid.h and implemented in /lib/uuid.c. When I needed to use the UUID format in my current Linux kernel project, I found the API straightforward, with one exception:
  • There is no uuid_unparse function in the Linux kernel. In user space, it is standard practice to convert a 16-byte binary UUID into a 36-byte string by calling uuid_unparse, and to convert a 36-byte string into a 16-byte binary UUID by calling uuid_parse.
Instead, the kernel provides a special printk format that works with the kernel's string library, which you can read about in /Documentation/printk-formats.txt.

UUID/GUID addresses
===================
%pUb 00010203-0405-0607-0809-0a0b0c0d0e0f
%pUB 00010203-0405-0607-0809-0A0B0C0D0E0F
%pUl 03020100-0504-0706-0809-0a0b0c0d0e0f
%pUL 03020100-0504-0706-0809-0A0B0C0D0E0F
For printing 16-byte UUID/GUIDs addresses. The additional 'l', 'L','b' and 'B' specifiers are used to specify a little endian order in lower ('l') or upper case ('L') hex characters - and big endian order in lower ('b') or upper case ('B') hex characters.
Where no additional specifiers are used the default big endian order with lower case hex characters will be printed.

For example, in user space, to convert a 16-byte (128-bit) binary UUID to a 36-byte string you would do the following:

#include <uuid/uuid.h>

uuid_t uuid;
char uuid_string[37] = {0}; /* 36 characters plus the terminating NUL */

/* create a random uuid */
uuid_generate_random(uuid);

/* convert the binary uuid into a properly formatted string */
uuid_unparse(uuid, uuid_string);

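But in the Linux kernel, where uuid is a kernel uuid_t (filled in by generate_random_uuid(uuid.b)) and uuid_string is a buffer of UUID_STRING_LEN + 1 bytes, we must instead use the string library in combination with the special UUID printk formatter: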

/* convert the binary uuid into a properly formatted string */
snprintf(uuid_string, UUID_STRING_LEN + 1, "%pUb", &uuid);
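
To make the difference between the 'b' and 'l' specifiers concrete, here is a small user-space sketch of the byte ordering that the documentation's sample output implies. It is not the kernel's implementation: sketch_format_uuid and the two index tables are names I made up for illustration, and the tables were derived from the documented examples, in which the little-endian form swaps the bytes within the first three fields and leaves the last eight bytes alone.

```c
#include <stdio.h>

/* Byte positions emitted left to right for each ordering (hypothetical
 * tables, written out to match the documented sample output). */
static const int be_index[16] = { 0, 1, 2, 3, 4, 5, 6, 7,
				  8, 9, 10, 11, 12, 13, 14, 15 };
static const int le_index[16] = { 3, 2, 1, 0, 5, 4, 7, 6,
				  8, 9, 10, 11, 12, 13, 14, 15 };

/* Format a 16-byte UUID into out (36 characters plus NUL).
 * little_endian selects the 'l'/'L' ordering, upper selects upper-case
 * hex digits ('B'/'L'). */
void sketch_format_uuid(const unsigned char uuid[16], int little_endian,
			int upper, char out[37])
{
	const int *index = little_endian ? le_index : be_index;
	char *p = out;

	for (int i = 0; i < 16; i++) {
		unsigned int byte = uuid[index[i]];

		if (i == 4 || i == 6 || i == 8 || i == 10)
			*p++ = '-';
		p += upper ? sprintf(p, "%02X", byte)
			   : sprintf(p, "%02x", byte);
	}
	*p = '\0';
}
```

Feeding it the bytes 0x00 through 0x0f with each combination of flags gives one string per specifier, which makes it easy to see which bytes move and which stay put.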

Two More Minor Points

  • The kernel APIs pass uuid_t values by reference, whereas in user space they are passed by value. 
  • The kernel by default generates and parses UUIDs in big-endian format, although you can change this to be little-endian.

Tuesday, July 31, 2018

A Linux Kernel Module File Reader

A Kernel Module that Reads the Host's File Systems

In my previous blog post I discussed reading files from within a kernel module, which remains a controversial topic among kernel developers. I am following through with a kernel module that will read both normal files and virtual files in /proc, /sys, and other sysfs-related file systems.

The source code for this module is located on my GitHub: https://github.com/ncultra/procfs.

Reading Normal Files

Reading normal files is as easy as:

void *buffer = NULL;
loff_t max_size = 0x1000, size = 0;

/* on success the kernel allocates buffer; release it with vfree() when done */
int ccode = kernel_read_file_from_path(name, &buffer, &size, max_size,
				       READING_MODULE);

The result code will be returned by the function, and if successful the contents of the file will be contained in buffer, and the size of the file will be stored in size.

It is worth looking at the last parameter: READING_MODULE is an enumerated value that tells the kernel what it is reading and why. Other enumerated values include READING_FIRMWARE, READING_KEXEC_IMAGE, and READING_X509_CERTIFICATE. (See enum kernel_read_file_id in /include/linux/fs.h.)

The idea is that a kernel module needs a very specific reason to open and read a file. Otherwise, it's easier and safer to open and read the file from a user space program.


Reading Virtual Files

Some entries in the kernel's virtual file systems don't stat normally (for example, /proc/cpuinfo or /proc/1/mounts). When you call vfs_getattr, the returned struct kstat shows the file as having zero size and zero blocks. But you know many of these files can be read using system calls, because you can cat them from user space.
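
The same mismatch is visible from user space. As a rough sketch (sketch_compare_sizes is a hypothetical helper, not part of my module), the function below records the size that stat() reports and then counts the bytes that read() actually returns:

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Store the size stat() reports in *reported and the number of bytes
 * actually readable in *actual. Returns 0 on success, -1 on error. */
int sketch_compare_sizes(const char *path, long long *reported,
			 long long *actual)
{
	struct stat st;
	char chunk[4096];
	ssize_t n;
	int fd;

	if (stat(path, &st) != 0)
		return -1;
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	*reported = (long long)st.st_size;
	*actual = 0;
	/* read until end of file (0) or an error (negative) */
	while ((n = read(fd, chunk, sizeof(chunk))) > 0)
		*actual += n;
	close(fd);
	return n < 0 ? -1 : 0;
}
```

For a normal file the two numbers agree. For an entry such as /proc/cpuinfo, I would expect the reported size to be zero while the byte count is not, which is exactly the situation described above.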

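If you call kernel_read_file_from_path using "/proc/cpuinfo", for example, it will return -EINVAL because of checks like these inside the kernel (paraphrased here). A virtual file that reports zero size fails the second one:

	if (!S_ISREG(file_inode(file)->i_mode) || max_size < 0)
		return -EINVAL;

	if (i_size_read(file_inode(file)) <= 0)
		return -EINVAL;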

The solution is to call the lower-level kernel_read function in a loop, managing file handles, memory, file size, and errors yourself.

My kernel module does just that with its vfs_read_file function:

ssize_t vfs_read_file(char *name, void **buf, size_t max_count, loff_t *pos)


vfs_read_file loops through the file in chunks, reallocating memory as needed. It exits the loop when a read fails or after reading the last chunk. The result is similar to that of the earlier kernel_read_file_from_path, except that we must manage the file handle and memory ourselves.
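
The loop is easier to picture with code in front of you. Below is a user-space analog of the pattern, built on open, read, and realloc rather than the kernel calls. It is only a sketch under my own assumptions: sketch_read_file and its parameters are hypothetical and do not mirror the module's vfs_read_file line for line.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Read the file at name in pieces of at most chunk bytes, growing the
 * buffer with realloc until end of file or until max_count bytes are read.
 * On success returns the byte count and stores the buffer in *buf (the
 * caller frees it). Returns -1 on error, with *buf set to NULL. */
ssize_t sketch_read_file(const char *name, void **buf, size_t chunk,
			 size_t max_count)
{
	char *data = NULL;
	size_t total = 0;
	int fd;

	*buf = NULL;
	if (chunk == 0)
		return -1;
	fd = open(name, O_RDONLY);
	if (fd < 0)
		return -1;

	while (total < max_count) {
		size_t want = max_count - total < chunk ? max_count - total
							: chunk;
		char *grown = realloc(data, total + want);
		ssize_t n;

		if (!grown)
			goto fail;
		data = grown;

		n = read(fd, data + total, want);
		if (n < 0)
			goto fail;
		if (n == 0)
			break; /* end of file: the last chunk was already read */
		total += (size_t)n;
	}

	close(fd);
	*buf = data;
	return (ssize_t)total;

fail:
	free(data);
	close(fd);
	return -1;
}
```

Because the loop never consults the file's reported size, it works equally well on files that stat as zero bytes. The max_count cap plays the role of the maximum buffer size, so a file that never ends cannot grow the buffer without bound.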

Building and Running procfs.ko

The kernel module accepts module (command-line) parameters for the file name, type, chunk size, and maximum buffer size. It simply dumps the file contents into /var/log/messages. The same code is being used in a security research project.

Tuesday, June 26, 2018

More Things You Should Never Do in the Kernel

Some things should never be done in the (Linux) kernel

One should never read a file from within the kernel, said Greg Kroah-Hartman in 2005. Nevertheless, he published some code showing how to read a file from within the kernel. That's the reality, and advantage, of open-source software: You can do anything you can code, just don't expect it to be merged upstream.

As part of a research project, I recently had to read an entire file system from within the kernel. The idea is to investigate the kernel's view of the system--files, processes, resources--and compare the results with those reported from user space. If there is a discrepancy, it could be due to malware or instability.

I'm not the only one

The code published by Kroah-Hartman was broken long ago by changes in the Linux kernel, so I started looking around in the current upstream sources for options.

It turns out that the current upstream kernel (4.17.2 at the time of this writing) does export symbols for some of the things that "should never be done":
  • reading and writing files
  • reading and writing sockets
  • investigating open files
These symbols were likely merged because of the need for kernel drivers to load firmware or other binary material, which is becoming an increasingly common thing in Linux.

Reading and Writing Files

If you need to read from and write to files, you can use the symbols kernel_read and kernel_write, which are prototyped in include/linux/fs.h. Here are user-like wrapper functions I wrote to generalize these two symbols. They allow you to open the file using a path:

#include <linux/err.h>
#include <linux/fs.h>
#include <linux/printk.h>

/* Write count bytes from buf to the file at path name, starting at *pos. */
ssize_t
write_file(char *name, void *buf, size_t count, loff_t *pos)
{
	ssize_t ccode;
	struct file *f;

	/* filp_open returns an ERR_PTR-encoded errno on failure, never NULL */
	f = filp_open(name, O_WRONLY, 0);
	if (IS_ERR(f)) {
		ccode = PTR_ERR(f);
		pr_err("Unable to open file: %s (%zd)\n", name, ccode);
		return ccode;
	}

	ccode = kernel_write(f, buf, count, pos);
	if (ccode < 0)
		pr_err("Unable to write file: %s (%zd)\n", name, ccode);

	/* close the file on both the success and the error path */
	filp_close(f, NULL);
	return ccode;
}

/* Read up to count bytes into buf from the file at path name, starting at *pos. */
ssize_t
read_file(char *name, void *buf, size_t count, loff_t *pos)
{
	ssize_t ccode;
	struct file *f;

	f = filp_open(name, O_RDONLY, 0);
	if (IS_ERR(f)) {
		ccode = PTR_ERR(f);
		pr_err("Unable to open file: %s (%zd)\n", name, ccode);
		return ccode;
	}

	ccode = kernel_read(f, buf, count, pos);
	if (ccode < 0)
		pr_err("Unable to read file: %s (%zd)\n", name, ccode);

	filp_close(f, NULL);
	return ccode;
}


/proc and /sys

It turns out that these functions work with /proc/ and /sys/ files, with some additional helper code that I will publish shortly. They also work with all of the kernel's VFS file systems. They will not work with user-space file systems such as FUSE.

For example, my research kernel module can read /proc/1/mounts and send that file to an external monitor that will compare the data to the output of the user space mount command.
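
The comparison step itself can be quite simple. As an illustration only, and not the external monitor's actual code, the sketch below counts the lines of one text buffer that have no identical line in another. The name sketch_count_missing is hypothetical, and it assumes both views have already been normalized to the same line format, which the raw /proc/1/mounts contents and the mount command's output are not.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if the len bytes at line appear as one whole line of text. */
static int has_line(const char *text, const char *line, size_t len)
{
	while (*text) {
		const char *end = strchr(text, '\n');
		size_t n = end ? (size_t)(end - text) : strlen(text);

		if (n == len && memcmp(text, line, len) == 0)
			return 1;
		if (!end)
			break;
		text = end + 1;
	}
	return 0;
}

/* Count the non-empty lines of kernel_view that have no identical line in
 * user_view. A non-zero result marks a discrepancy worth investigating. */
size_t sketch_count_missing(const char *kernel_view, const char *user_view)
{
	size_t missing = 0;

	while (*kernel_view) {
		const char *end = strchr(kernel_view, '\n');
		size_t n = end ? (size_t)(end - kernel_view)
			       : strlen(kernel_view);

		if (n > 0 && !has_line(user_view, kernel_view, n))
			missing++;
		if (!end)
			break;
		kernel_view = end + 1;
	}
	return missing;
}
```

Running the check in both directions catches entries that only the kernel sees as well as entries that only user space reports.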

You probably shouldn't do this, unless you should

It's so much easier to read and write files from user space, and the consequences of a bug in your code are less severe. But the entire point of my current project is to get data from within the kernel and then compare that to what should be the same data obtained from user space.