Tuesday, June 26, 2018

More Things You Should Never Do in the Kernel

Some things should never be done in the (Linux) kernel

One should never read a file from within the kernel, said Greg Kroah-Hartman in 2005. Nevertheless, he published some code showing how to do exactly that. That's the reality, and the advantage, of open-source software: you can do anything you can code; just don't expect it to be merged upstream.

As part of a research project, I recently had to read an entire file system from within the kernel. The idea is to investigate the kernel's view of the system--files, processes, resources--and compare the results with those reported from user space. If there is a discrepancy, it could be due to malware or instability.

I'm not the only one

The code published by Kroah-Hartman was broken long ago by changes in the Linux kernel, so I started looking around in the current upstream sources for options.

It turns out that the current upstream kernel (4.17.2 at the time of this writing) does export symbols for some of the things that "should never be done":
  • reading and writing files
  • reading and writing sockets
  • investigating open files
These symbols were likely merged because kernel drivers need to load firmware or other binary material, which is increasingly common in Linux.

Reading and Writing Files

If you need to read from and write to files, you can use the symbols kernel_read and kernel_write, which are prototyped in include/linux/fs.h. Here are wrapper functions I wrote around these two symbols, modeled on their user-space counterparts. They let you open the file by path:

#include <linux/err.h>
#include <linux/fs.h>
#include <linux/printk.h>

/* Open the file at name and write count bytes from buf at offset *pos. */
ssize_t
write_file(char *name, void *buf, size_t count, loff_t *pos)
{
	ssize_t ccode;
	struct file *f;

	/* filp_open() returns an ERR_PTR() on failure, never NULL. */
	f = filp_open(name, O_WRONLY, 0);
	if (IS_ERR(f)) {
		ccode = PTR_ERR(f);
		pr_err("Unable to open file: %s (%zd)\n", name, ccode);
		return ccode;
	}

	ccode = kernel_write(f, buf, count, pos);
	if (ccode < 0)
		pr_err("Unable to write file: %s (%zd)\n", name, ccode);

	/* Close the file on success as well as on failure. */
	filp_close(f, NULL);
	return ccode;
}

/* Open the file at name and read up to count bytes into buf from *pos. */
ssize_t
read_file(char *name, void *buf, size_t count, loff_t *pos)
{
	ssize_t ccode;
	struct file *f;

	f = filp_open(name, O_RDONLY, 0);
	if (IS_ERR(f)) {
		ccode = PTR_ERR(f);
		pr_err("Unable to open file: %s (%zd)\n", name, ccode);
		return ccode;
	}

	ccode = kernel_read(f, buf, count, pos);
	if (ccode < 0)
		pr_err("Unable to read file: %s (%zd)\n", name, ccode);

	filp_close(f, NULL);
	return ccode;
}


/proc and /sys

It turns out that these functions work with /proc/ and /sys/ files, given some additional helper code that I will publish shortly. They also work with all of the in-kernel VFS file systems. They will not work with user-space file systems such as FUSE.

For example, my research kernel module can read /proc/1/mounts and send that file to an external monitor that will compare the data to the output of the user space mount command.
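The comparison on the monitor side could look something like the following. This is only a minimal user-space sketch, not the monitor used in the project: the function names (count_missing_lines, views_match) are hypothetical, and it assumes that both views have already been normalized to the same line-oriented format, which the raw output of the mount command is not.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if line (len bytes, no newline) is a complete line of text. */
static bool has_line(const char *text, const char *line, size_t len)
{
    const char *p = text;

    while (*p) {
        const char *end = strchr(p, '\n');
        size_t n = end ? (size_t)(end - p) : strlen(p);

        if (n == len && memcmp(p, line, len) == 0)
            return true;
        p += n;
        if (*p == '\n')
            p++;
    }
    return false;
}

/* Count the non-empty lines of a that do not appear anywhere in b. */
size_t count_missing_lines(const char *a, const char *b)
{
    size_t missing = 0;
    const char *p = a;

    while (*p) {
        const char *end = strchr(p, '\n');
        size_t n = end ? (size_t)(end - p) : strlen(p);

        if (n > 0 && !has_line(b, p, n))
            missing++;
        p += n;
        if (*p == '\n')
            p++;
    }
    return missing;
}

/*
 * Hypothetical check: the two views agree when neither one contains a
 * line the other lacks. Line order is ignored.
 */
bool views_match(const char *kernel_view, const char *user_view)
{
    return count_missing_lines(kernel_view, user_view) == 0 &&
           count_missing_lines(user_view, kernel_view) == 0;
}
```

Here kernel_view would hold the buffer the module read from /proc/1/mounts and user_view the normalized data gathered from user space. A false return from views_match is the kind of discrepancy worth a closer look, such as a mount that one side reports and the other does not.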

You probably shouldn't do this, unless you should

It's so much easier to read and write files from user space, and the consequences of a bug in your code are less severe. But the entire point of my current project is to get data from within the kernel and then compare that to what should be the same data obtained from user space.