
Develop a system call in Linux

Ech0

Mar 25, 2022

In this post we will see how to develop a system call by:

Coding a function in C
Compiling it with the Linux kernel
Registering it in the syscall table
Calling it from user-land

In other words: we will add our own system call to the Linux kernel.

Prerequisites

If you want to follow the development and test by yourself, there are a few prerequisites.

Operating system running on a relatively recent Linux kernel.
Usual development tools (gcc, make, …)
A text editor (vim, VSCode, …)

To develop and test the system call, we will recompile the Linux kernel and boot on it. If you don’t know what you’re doing, don’t do this on your main machine: you risk no longer being able to boot it. That said, you can still follow the development without compiling anything. The best approach is to work in a VM, so that your native system is never altered!

Preparing the field

If you don’t want to compile and test the code, you can skip this step. Take into account that the compilation of the kernel can be very long…

Before starting to develop our system call, we have to prepare our environment a little. The steps differ depending on whether you are on a VM or on your native system. I strongly recommend that you do all of this on a Linux VM.

On a Linux VM (recommended)

For this article, I downloaded an Ubuntu Server 20.04.4 ISO from the official website and installed it in a VirtualBox VM. You can choose the distribution, version or emulator/hypervisor you want, but what you see may then differ from what I show next.

I configured the VM with bridged network access, 4 GB of RAM, and I connect to it by SSH to work.

ech0@host$ ssh [email protected]
ech0@ubuntu-vm:~$

At this stage, I recommend that you turn off your virtual machine and take a snapshot, so that you don’t have to reinstall the VM if you break your bootloader along the way. I’ll let you search the internet for how to do it, but it’s quite intuitive.

By default, Linux distributions do not ship kernel source code, only headers and object files. You must install the linux-source package with APT to get the sources of the current kernel. The package installs an archive, which you then have to extract to get the directory containing the sources. Personally, I am on Linux kernel 5.4.0 (you can check yours with the command uname -r).

sudo apt install linux-source
cd /usr/src/linux-source-5.4.0
sudo bunzip2 linux-source-5.4.0.tar.bz2
sudo tar xf linux-source-5.4.0.tar
cd linux-source-5.4.0/

All that is missing to compile the kernel is its configuration. We will generate a default, minimal one.

sudo make defconfig

You will need the flex, bison, libelf-dev and libssl-dev packages to compile the kernel later. These packages are to be installed in the classic way through APT.

On your native system (not recommended)

For security reasons we will not modify the Linux kernel on which our system is currently booted… It would be quite irresponsible to alter and overwrite stable code for this demo.

For this reason we will clone the official Linux kernel git repository and work on a “dirty” version.

git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

We will then switch to the latest stable version shipped with our distribution, i.e. 5.13 on Ubuntu focal at the time of writing (check your version with the uname -r command).

cd linux
git checkout v5.13

With this version checked out, we are going to prepare the kernel configuration, and we have two options for that.

You will need the flex, bison, libelf-dev and libssl-dev packages to compile the kernel later. These packages are to be installed in the classic way through APT.

Option 1 (recommended): generation of a minimal configuration

As a first option, we will generate a minimal configuration. Indeed, we don’t really need a fully featured operating system just to test our system call. For example, we do not need KVM, touchpad, touchscreen, camera/microphone, etc.

The advantage of this option is also to be able to compile the kernel relatively quickly and to have a relatively light image. I recommend this option for this exercise.

make defconfig

If your hard disk is encrypted via cryptsetup, you will have to activate an option which is not present by default on the minimal version: DM_CRYPT. To do this, you can type the command make menuconfig and activate the option named "Crypt target support" (or search for DM_CRYPT to find it). This is necessary to be able to decrypt your disk after booting to the new kernel.

Option 2: copy the current configuration

As a second option we will take the configuration of the kernel on which we are currently booted. The advantage of this option is that it will allow us to obtain a functional operating system with all its characteristics. But the compilation will be very long (> 1h30) and the result very heavy (> 10 GB).

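If the source tree has no .config file yet, first copy the configuration of your running kernel into it (on Ubuntu it is stored as /boot/config-$(uname -r)). Then run the command below; if it asks about new options, keep the default values (Enter key).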

make oldconfig

Finally, to avoid compilation errors, open the generated .config file with a text editor and assign the CONFIG_SYSTEM_TRUSTED_KEYS and CONFIG_SYSTEM_REVOCATION_KEYS variables to an empty string (""), this way:

CONFIG_SYSTEM_TRUSTED_KEYS=""
CONFIG_SYSTEM_REVOCATION_KEYS=""

Conclusion of field preparation

And there you go! Our working environment is ready and we can start developing our system call. Please note that it is this new Linux kernel that we will compile and boot on, and not the kernel already installed with our distribution!

If you have Secure Boot enabled in your BIOS and you are working on your native system, you will not be able to boot into the newly compiled kernel without generating a signature for it. I will not explain this step here since it is not the purpose of this article; you will find the needed information on the internet.

Develop the system call

Like any system call, ours must have a specific purpose, take arguments and return something. We are not going to do anything very complicated, but it will still be more interesting than a simple "hello world".

In short, our system call will take a process ID (PID) as a parameter and return a whole bunch of information about it.

Basically, a system call (like open, write, read…) is nothing but a function (or series of functions) in ring 0 (kernel-land). We use the term system call if this function can be called from the user-land.
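To make this idea concrete, here is a minimal user-land sketch that invokes an existing system call by its number through the generic syscall() function instead of the usual libc wrapper. The helper name raw_getpid is mine, chosen only for illustration.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>   /* SYS_getpid */
#include <sys/types.h>     /* pid_t */
#include <unistd.h>        /* syscall(), getpid() */

/* Hypothetical helper: ask the kernel for our PID without the libc wrapper. */
pid_t raw_getpid(void)
{
    return (pid_t)syscall(SYS_getpid);
}
```

Calling raw_getpid() should give the same value as getpid(): both end up in the same kernel function, only the path to reach it differs.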

We will therefore first develop a simple function (system call) in the kernel without worrying about the fact that it must be called from the user-land.

File structure of the system call

At the root of our Linux kernel sources, we create a new folder named "infopid" that will contain the sources of our system call. We directly create the infopid.c, infopid.h and Makefile files there, respectively the source code, the header and the build file. The Makefile only has to tell the kernel build system to compile our file into the kernel; a single line such as obj-y := infopid.o does that.

$ tree infopid/
infopid/
├── Makefile
├── infopid.c
└── infopid.h
0 directories, 3 files

Writing the code of the system call

We will start by writing the infopid.h header file, which declares the structure holding the data that we will return to the user. Explanations are in the comments.

#ifndef INFOPID_H
#define INFOPID_H

#include <linux/sched.h>
#include <linux/limits.h>
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/fs_struct.h>
#include <linux/slab.h>

/*
 * pid: pid of the process
 * name: name of the process (its "comm")
 * state: raw value of task_struct->state (0 means running or runnable)
 * stack: address of the process's kernel stack
 * age: start time in nanoseconds (monotonic clock)
 * child: pids of the child processes (at most 256, zero-terminated)
 * ppid: parent process id
 * root: root path of the process
 * pwd: working directory of the process
 */

struct info_pid {
    pid_t pid;
    char name[TASK_COMM_LEN];
    long state;
    void *stack;
    uint64_t age;
    pid_t child[256];
    pid_t ppid;
    char root[PATH_MAX];
    char pwd[PATH_MAX];
};

#endif

infopid.h
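Since the kernel will copy this structure byte for byte into user memory, a user-land program needs a definition with exactly the same layout. As a sketch, the user-land mirror below (the names info_pid_user, print_info_pid_layout and the DEMO_ constants are mine) prints the size and a few field offsets so that they can be compared with the kernel side. It assumes TASK_COMM_LEN is 16 and PATH_MAX is 4096, the values used later in this article.

```c
#include <stddef.h>     /* offsetof, size_t */
#include <stdint.h>     /* uint64_t */
#include <stdio.h>
#include <sys/types.h>  /* pid_t */

#define DEMO_TASK_COMM_LEN 16    /* assumed value of TASK_COMM_LEN */
#define DEMO_PATH_MAX      4096  /* assumed value of PATH_MAX */
#define DEMO_MAX_CHILDREN  256

/* Hypothetical user-land mirror of the kernel's struct info_pid. */
struct info_pid_user {
    pid_t pid;
    char name[DEMO_TASK_COMM_LEN];
    long state;
    void *stack;
    uint64_t age;
    pid_t child[DEMO_MAX_CHILDREN];
    pid_t ppid;
    char root[DEMO_PATH_MAX];
    char pwd[DEMO_PATH_MAX];
};

/* Print the total size and where some fields start. */
void print_info_pid_layout(FILE *out)
{
    fprintf(out, "size   : %zu\n", sizeof(struct info_pid_user));
    fprintf(out, "state  : %zu\n", offsetof(struct info_pid_user, state));
    fprintf(out, "child  : %zu\n", offsetof(struct info_pid_user, child));
    fprintf(out, "root   : %zu\n", offsetof(struct info_pid_user, root));
    fprintf(out, "pwd    : %zu\n", offsetof(struct info_pid_user, pwd));
}
```

The compiler inserts padding between some fields (for example after name, so that state is properly aligned), which is why both sides should be built for the same architecture with the same field order.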

Then we write the C code: the sys_infopid function, which fills this structure and copies it back to the user in user-land.

Note that it is important to use the SYSCALL_DEFINE2 macro to define our system call with two parameters. You will also notice the unusual layout of the arguments: each type and its name are passed as separate macro arguments, separated by commas.
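To see why the types and names are separated by commas, here is a simplified user-land imitation of such a macro. It is not the kernel's implementation: the real SYSCALL_DEFINE2 generates more than a plain function (on x86-64, for instance, the entry point gets the __x64_sys_ prefix that appears later in the syscall table). The DEMO_SYSCALL_DEFINE2 name and the demo_sys_ prefix are mine.

```c
/*
 * Simplified stand-in for the kernel macro, for illustration only.
 * Each parameter is given as two macro arguments (type, name) so that the
 * macro can paste them back together into a normal parameter list.
 */
#define DEMO_SYSCALL_DEFINE2(name, t1, a1, t2, a2) \
    long demo_sys_##name(t1 a1, t2 a2)

/* Expands to: long demo_sys_add(int lhs, int rhs) { ... } */
DEMO_SYSCALL_DEFINE2(add, int, lhs, int, rhs)
{
    return (long)lhs + rhs;
}
```

With this sketch, demo_sys_add(2, 3) returns 5. Receiving the types and names separately is what lets a macro like this generate several functions with different signatures from a single definition.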

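#include "infopid.h"
#include <linux/syscalls.h>
#include <linux/sched/task.h>   /* put_task_struct, task_lock, tasklist_lock */
#include <linux/uaccess.h>      /* copy_to_user */
#include <linux/dcache.h>       /* dentry_path_raw */
#include <linux/path.h>         /* path_put */
#include <linux/err.h>          /* IS_ERR, PTR_ERR */
#include <linux/string.h>       /* strscpy */

SYSCALL_DEFINE2(infopid, struct info_pid __user *, ret_pid, int, pid)
{
    struct task_struct *cur, *child;
    struct info_pid *new;
    struct path root, pwd;
    struct pid *spid;
    char *tmp, *buffer;
    long ret = 0;
    int i = 0;

    /* Look the process up and take a reference on its task_struct */
    spid = find_get_pid(pid);
    if (!spid)
        return -ESRCH;

    cur = get_pid_task(spid, PIDTYPE_PID);
    put_pid(spid);
    if (!cur)
        return -ESRCH;

    /* Zeroed allocation: no stale kernel memory may reach user-land.
     * The path buffer is too large to live on the kernel stack. */
    new = kzalloc(sizeof(*new), GFP_KERNEL);
    buffer = kmalloc(PATH_MAX, GFP_KERNEL);
    if (!new || !buffer) {
        ret = -ENOMEM;
        goto out_free;
    }

    /* cur->fs is NULL once the task has exited */
    task_lock(cur);
    if (!cur->fs) {
        task_unlock(cur);
        ret = -ESRCH;
        goto out_free;
    }
    get_fs_root(cur->fs, &root);
    get_fs_pwd(cur->fs, &pwd);
    task_unlock(cur);

    get_task_comm(new->name, cur);
    new->pid = task_pid_nr(cur);
    new->state = cur->state;
    new->stack = cur->stack;
    new->age = cur->start_time;

    /* The children list and parent pointer are protected by tasklist_lock */
    read_lock(&tasklist_lock);
    list_for_each_entry(child, &cur->children, sibling) {
        if (i >= ARRAY_SIZE(new->child))
            break;
        new->child[i++] = task_pid_nr(child);
    }
    new->ppid = task_pid_nr(cur->parent);
    read_unlock(&tasklist_lock);

    /* dentry_path_raw() does its own locking; it may return an error pointer */
    tmp = dentry_path_raw(root.dentry, buffer, PATH_MAX);
    if (IS_ERR(tmp)) {
        ret = PTR_ERR(tmp);
        goto out_put;
    }
    strscpy(new->root, tmp, sizeof(new->root));

    tmp = dentry_path_raw(pwd.dentry, buffer, PATH_MAX);
    if (IS_ERR(tmp)) {
        ret = PTR_ERR(tmp);
        goto out_put;
    }
    strscpy(new->pwd, tmp, sizeof(new->pwd));

    if (copy_to_user(ret_pid, new, sizeof(*new)))
        ret = -EFAULT;

out_put:
    /* Drop the references taken by get_fs_root() and get_fs_pwd() */
    path_put(&root);
    path_put(&pwd);
out_free:
    kfree(buffer);
    kfree(new);
    put_task_struct(cur);
    return ret;
}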

infopid.c
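The function above reports failures by returning a negative error code such as -ESRCH or -ENOMEM. From user-land you never see that negative value directly: the libc syscall() function returns -1 and stores the positive code in errno. Here is a small sketch of that convention using an existing system call, close, on an invalid file descriptor. The helper name raw_close_errno is mine.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>   /* SYS_close */
#include <unistd.h>        /* syscall() */

/*
 * Hypothetical helper: invoke close by number and report the errno value
 * that libc derived from the kernel's negative return code.
 * Returns 0 when the call succeeded.
 */
int raw_close_errno(int fd)
{
    errno = 0;
    if (syscall(SYS_close, fd) == -1)
        return errno;
    return 0;
}
```

For example, raw_close_errno(-1) yields EBADF, because the kernel side of close returns -EBADF for a descriptor that is not open. Our own system call will behave the same way with its error codes.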

System call export

However, in order to test our system call, we need to make some changes to the kernel itself. First we need to tell the kernel Makefile that our folder exists and that it should be compiled. To do this, we modify the core-y variable in the top-level kernel Makefile and append our folder to it.

core-y += kernel/ certs/ mm/ fs/ ipc/ security/ crypto/ block/ infopid/

Makefile

Then we must modify the file include/linux/syscalls.h, which contains the prototypes of the kernel system calls, to add our function there (use the path of your own source tree, don’t copy and paste mine).

/* Other declarations */
#include "/usr/src/linux-source-5.4.0/linux-source-5.4.0/infopid/infopid.h"
asmlinkage long sys_infopid(struct info_pid __user *, int);

include/linux/syscalls.h

Finally, we modify the arch/x86/entry/syscalls/syscall_64.tbl file to add our system call to it. Place it in the last position, with an index one higher than the previous system call (335 in my case), and follow the naming of the existing lines. The index must not be used by any other entry: it is the number that user-land will pass to invoke our call.

# Index  Arch  Name     Entrypoint
  335    64    infopid  __x64_sys_infopid

arch/x86/entry/syscalls/syscall_64.tbl
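Conceptually, this table maps a number to a function. The user-land sketch below imitates that idea with an array of function pointers; it is only a model of the principle, not the kernel's actual dispatch code, and all the demo_ names are mine.

```c
#include <errno.h>    /* ENOSYS */
#include <stddef.h>   /* size_t, NULL */

typedef long (*demo_syscall_fn)(long arg0, long arg1);

static long demo_add(long a, long b) { return a + b; }
static long demo_sub(long a, long b) { return a - b; }

/* Hypothetical table: the array index plays the role of the syscall number. */
static const demo_syscall_fn demo_table[] = {
    [0] = demo_add,
    [1] = demo_sub,
    /* index 2 is left unassigned (NULL) */
    [3] = demo_add,
};

/* Look up the handler for a number, or fail like an unknown system call. */
long demo_dispatch(unsigned long nr, long arg0, long arg1)
{
    size_t count = sizeof(demo_table) / sizeof(demo_table[0]);

    if (nr >= count || demo_table[nr] == NULL)
        return -ENOSYS;
    return demo_table[nr](arg0, arg1);
}
```

Here demo_dispatch(1, 5, 3) returns 2, while an unassigned number such as 2 or 99 returns -ENOSYS. That is also the error a user-land program gets when it passes a number that the running kernel does not know.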

Compile the system call

Now that we have developed our system call and exported it as a kernel-level system call, we can start the full kernel compilation. We return to the root of the kernel sources and run the make command.

If you have several physical CPU cores on your machine or on your VM, do not hesitate to use the -j <number_of_cores> option in order to speed up the compilation since it can take quite a long time depending on the performance of your machine (especially if you are in a VM).

Once the compilation is complete, you will need to install the kernel because we are going to boot on it!

sudo make
sudo make modules_install
sudo make install

Now that the kernel has been compiled and installed, you need to make sure that you are going to boot into the new kernel on the next reboot.

In my case, the new kernel is version 5.4.174, which GRUB ranks above the generic version 5.4.0-105. GRUB therefore puts it first and boots on it without any action from me. If this is not your case, you will need the GRUB menu to choose the new version, and for that you must modify the /etc/default/grub file:

#GRUB_TIMEOUT_STYLE=hidden
GRUB_TIMEOUT=4

/etc/default/grub

Concretely, you must comment out the GRUB_TIMEOUT_STYLE variable and make sure that the GRUB_TIMEOUT is high enough (in seconds) to allow you to choose another entry in the grub menu. Don’t forget to run the sudo update-grub command after your changes.

I therefore suggest that you now restart the machine and test our system call!

Execution and testing

Immediately after the restart we check that we are indeed on the correct version of the kernel:

$ uname -r
5.4.174

Then we just have to write a little code in C which will call our system call by giving it a PID as an argument, and will display the returned information.

As we saw earlier, our system call takes two parameters: a structure that will be filled in and returned to user-land via the copy_to_user() function, and a PID of the process for which we are requesting the information.

I therefore propose the C code below which we will name test_infopid.c.

#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <inttypes.h>
#include <stdlib.h>
#include <string.h>

#define TASK_COMM_LEN 16
#define PATH_MAX 4096
#define MAX_CHILDREN 256
#define NR_INFOPID 335  /* index chosen in syscall_64.tbl */

/* Must have the same layout as struct info_pid in the kernel's infopid.h */
struct info_pid {
    pid_t pid;
    char name[TASK_COMM_LEN];
    long state;
    void *stack;
    uint64_t age;
    pid_t child[MAX_CHILDREN];
    pid_t ppid;
    char root[PATH_MAX];
    char pwd[PATH_MAX];
};

/* Walk up the process tree and print every ancestor until PID 0 */
void print_parents(pid_t pid)
{
    struct info_pid new;
    static int index = 0;

    printf("\tParent %d : %d\n", index++, pid);

    if (!pid)
        return;
    if (syscall(NR_INFOPID, &new, pid)) {
        perror("syscall failed");
        exit(EXIT_FAILURE);
    }
    print_parents(new.ppid);
}

int main(int ac, char **av)
{
    pid_t pid;
    struct info_pid new;

    memset(&new, 0, sizeof(new));

    /* Without an argument, inspect our own process */
    if (ac == 1)
        pid = getpid();
    else
        pid = atoi(av[1]);

    if (syscall(NR_INFOPID, &new, pid)) {
        perror("syscall failed");
        return EXIT_FAILURE;
    }

    printf("Printing struct info_pid...\n");

    printf("PID       : %d\n", new.pid);
    printf("Name      : %s\n", new.name);
    printf("State     : %ld\n", new.state);
    printf("Stack     : %p\n", new.stack);
    printf("Birthtime : %" PRIu64 "\n", new.age);

    /* The child array is zero-terminated unless it is completely full */
    for (int j = 0; j < MAX_CHILDREN; j++)
    {
        if (!new.child[j])
            break;
        printf("\tChild %d  : %d\n", j, new.child[j]);
    }

    print_parents(new.ppid);

    printf("Root      : %s\n", new.root);
    printf("PWD       : %s\n", new.pwd);

    return EXIT_SUCCESS;
}

test_infopid.c

You’ll notice that we use the syscall() function specifying the index of our system call, since we don’t have a wrapper in libc to call it in more conventional ways (such as open(), read(), etc.).

The code is relatively simple: we call our function for a given process id (PID), and we display a lot of information about this process, including child and parent processes.

$ gcc test_infopid.c -o test_infopid
$ ./test_infopid # With its own PID by default

Printing struct info_pid...
PID       : 1354
Name      : test_infopid
State     : 0
Stack     : 0xffffb76ac0750000
Birthtime : 1203877852032
	Parent 0 : 776
	Parent 1 : 775
	Parent 2 : 656
	Parent 3 : 551
	Parent 4 : 1
	Parent 5 : 0
Root      : /
PWD       : /home/ech0

$ ./test_infopid 1 # With PID 1

Printing struct info_pid...
PID       : 1
Name      : systemd
State     : 1
Stack     : 0xffffb76ac0010000
Birthtime : 15000000
	Child 0  : 290
	Child 1  : 317
	Child 2  : 500
	Child 3  : 509
	Child 4  : 511
	Child 5  : 523
	Child 6  : 526
	Child 7  : 527
	Child 8  : 533
	Child 9  : 535
	Child 10  : 538
	Child 11  : 541
	Child 12  : 543
	Child 13  : 544
	Child 14  : 551
	Child 15  : 570
	Child 16  : 573
	Child 17  : 579
	Child 18  : 659
	Parent 0 : 0
Root      : /
PWD       : /

Our system call seems to be working fine.
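The State value printed above is the raw content of task_struct->state. As a sketch, the helper below turns the lowest bits into a readable label. The numeric values are those of include/linux/sched.h in the 5.4 kernel, and they may differ in other versions; the function name and the DEMO_ constants are mine.

```c
/* Low bits of task_struct->state in Linux 5.4 (assumption for other versions) */
#define DEMO_TASK_RUNNING         0x0000
#define DEMO_TASK_INTERRUPTIBLE   0x0001
#define DEMO_TASK_UNINTERRUPTIBLE 0x0002
#define DEMO_TASK_STOPPED         0x0004
#define DEMO_TASK_TRACED          0x0008

/* Hypothetical helper: map a raw state value to a short description. */
const char *demo_state_name(long state)
{
    if (state == DEMO_TASK_RUNNING)
        return "running or runnable";
    if (state & DEMO_TASK_INTERRUPTIBLE)
        return "interruptible sleep";
    if (state & DEMO_TASK_UNINTERRUPTIBLE)
        return "uninterruptible sleep";
    if (state & DEMO_TASK_STOPPED)
        return "stopped";
    if (state & DEMO_TASK_TRACED)
        return "traced";
    return "other";
}
```

With this helper, the state 0 of test_infopid reads as "running or runnable" (it is executing the system call), and the state 1 of systemd reads as "interruptible sleep".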

Conclusion

That’s it, we are already done: you now know how to develop a system call, integrate it into the Linux kernel and call it.

I did not go into the details of the system call code itself. It is not very important in this article since it only serves as an example; it could have been any other code. The goal was mainly to show you how a system call is integrated into the kernel, and to help you understand more generally what a system call is.

I hope this article has helped you understand a few things. Do not hesitate to write to me if you have any questions, whether about my explanations or about the code.

I now advise you to read the article on developing a kernel module, which is a good follow-up to this one.

Thank you for reading.