In this post we will see how to develop a system call by:
- Coding a function in C
- Compiling it with the Linux kernel
- Registering it in the syscall table
- Calling it from user-land
In other words: we will add our own system call to the Linux kernel.
1. Prerequisites
If you want to follow the development and test by yourself, there are a few prerequisites.
- Operating system running on a relatively recent Linux kernel.
- Usual development tools (gcc, make, …)
- A text editor (vim, VSCode, …)
2. Preparing the ground
Before we start developing our system call, we need to prepare our environment. The steps differ depending on whether you work in a VM or on your native system. I strongly recommend doing all of this in a Linux VM.
2.1. On a Linux VM (recommended)
For this article, I downloaded an Ubuntu Server 20.04.4 ISO from the official website and installed it in a Virtual Box VM. You can choose the distribution, version or emulator/hypervisor you want, but you may have some differences with what I will show next.
I configured the VM with bridged network access, 4 GB of RAM, and I connect to it by SSH to work.
ech0@host$ ssh ech0@<vm-ip-address>
ech0@ubuntu-vm:~$
At this stage, I recommend that you shut down your virtual machine and take a snapshot. If you break your bootloader during the following steps, you can restore the snapshot instead of reinstalling the VM. I’ll let you look up how to do it, but it’s quite intuitive.
By default, Linux distributions do not ship the kernel source code, only headers and object files. You must install the linux-source package with APT to get the sources of the current kernel. The package installs a compressed archive, which you then have to extract to get the directory containing the sources. Personally, I am on Linux kernel 5.4.0 (you can check your version with the command uname -r).
sudo apt install linux-source
cd /usr/src/linux-source-5.4.0
sudo bunzip2 linux-source-5.4.0.tar.bz2
sudo tar xf linux-source-5.4.0.tar
cd linux-source-5.4.0/
To compile the kernel, all we still need is a configuration, so we will generate a default, minimal one.
sudo make defconfig
2.2. On your native system (not recommended)
For safety reasons, we will not modify the Linux kernel our system is currently booted on. It would be quite irresponsible to alter and overwrite stable code for this demo.
Instead, we will clone the official Linux kernel git repository and work on a “dirty” copy.
git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
We will then switch to the latest stable version deployed with our distribution, i.e. 5.13 on Ubuntu focal at the time of writing (check your version with the uname -r command).
cd linux
git checkout v5.13
From this checkout, we are going to prepare the kernel configuration. We have two options for that.
Option 1 (recommended): generation of a minimal configuration
As a first option, we will generate a minimal configuration. Indeed, we don’t really need to have a working operating system just to test our Linux system call. For example, we do not need KVM, touchpad, touchscreen, camera/microphone etc.
The advantage of this option is also to be able to compile the kernel relatively quickly and to have a relatively light image. I recommend this option for this exercise.
make defconfig
make menuconfig
In the menu, enable the option named “Crypt target support” (or search for DM_CRYPT to find it). This is necessary to be able to decrypt your disk after booting the new kernel.
Option 2: copy the current configuration
As a second option, we will reuse the configuration of the kernel on which we are currently booted. The advantage of this option is that it gives us a functional operating system with all its features. However, the compilation will be very long (> 1h30) and the result very heavy (> 10 GB).
If make oldconfig asks you to set new variables, keep the default values by pressing Enter.
make oldconfig
Finally, to avoid compilation errors, open the generated .config file with a text editor and assign the CONFIG_SYSTEM_TRUSTED_KEYS and CONFIG_SYSTEM_REVOCATION_KEYS variables to an empty string (“”), this way:
CONFIG_SYSTEM_TRUSTED_KEYS=""
CONFIG_SYSTEM_REVOCATION_KEYS=""
2.3. Environment ready
And there you go! Our working environment is ready and we can start developing our system call. Note that we will compile and boot this new Linux kernel, not the kernel already installed with our distribution!
3. Develop the system call
Like any system call, ours must have a specific purpose, take arguments and return something. We are not going to do anything very complicated, but it will still be more interesting than a simple “hello world”.
In short, our system call will take a process ID (PID) as a parameter and return a whole bunch of information about it.
Basically, a system call (like open, write, read…) is nothing but a function (or series of functions) in ring 0 (kernel-land). We use the term system call if this function can be called from the user-land.
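To make the user-land side of that definition concrete, here is a minimal sketch that reaches an existing system call, getpid, in two ways: through its libc wrapper and through the generic syscall() entry point. The helper names raw_getpid and wrapper_matches_raw_call are hypothetical and only serve the illustration; both paths should end up in the same kernel function.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>   /* SYS_getpid */
#include <sys/types.h>     /* pid_t */
#include <unistd.h>        /* syscall(), getpid() */

/* Hypothetical helper: invoke getpid by number, bypassing the libc wrapper. */
long raw_getpid(void)
{
	return syscall(SYS_getpid);
}

/* Returns 1 when the raw call and the libc wrapper agree, 0 otherwise. */
int wrapper_matches_raw_call(void)
{
	return raw_getpid() == (long)getpid();
}
```

Our own system call will have no libc wrapper, so the second form is the one we will rely on later.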
We will therefore first develop a simple function (system call) in the kernel without worrying about the fact that it must be called from the user-land.
3.1. File structure of the system call
At the root of our Linux kernel sources, we will create a new folder named “infopid” that will contain the sources of our system call. We will create the infopid.c, infopid.h and Makefile files directly in it: respectively the source code, the header and the build file. The Makefile only needs one line telling the kernel build system to compile our object into the kernel image: obj-y := infopid.o
$ tree infopid/
infopid/
├── Makefile
├── infopid.c
└── infopid.h
0 directories, 3 files
3.2. Writing the code of the system call
We will start by writing the infopid.h header file which will contain the structure containing the data that we will return to the user. Explanations are in the comments.
#ifndef INFOPID_H
#define INFOPID_H
#include <linux/sched.h>
#include <linux/limits.h>
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/fs_struct.h>
#include <linux/slab.h>
/*
* pid: pid of process
* name: name of the process
* state: unrunnable, runnable, stopped
* stack: pointer to the beginning of process's stack
* age: birth time in nanoseconds
* child: array of all child processes pid
* ppid: parent process id
* root: root path of process
* pwd: working directory of process
*/
struct info_pid {
pid_t pid;
char name[TASK_COMM_LEN];
long state;
void *stack;
uint64_t age;
pid_t child[256];
pid_t ppid;
char root[PATH_MAX];
char pwd[PATH_MAX];
};
#endif
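Since this structure crosses the kernel/user boundary, any user-land program that receives it has to declare it with exactly the same layout. The sketch below is a user-land mirror of the structure with a few layout checks. It assumes the kernel and the program are built for the same 64-bit ABI, and it redefines TASK_COMM_LEN as 16 because that kernel constant is not available in user-land headers; info_pid_size is a hypothetical helper.

```c
#include <limits.h>      /* PATH_MAX on Linux */
#include <stddef.h>      /* offsetof, size_t */
#include <stdint.h>      /* uint64_t */
#include <sys/types.h>   /* pid_t */

#ifndef PATH_MAX
#define PATH_MAX 4096            /* fallback, same value as the kernel's */
#endif
#define TASK_COMM_LEN 16         /* assumption: kernel value, copied by hand */

/* User-land mirror of the kernel's struct info_pid (same order, same types). */
struct info_pid {
	pid_t pid;
	char name[TASK_COMM_LEN];
	long state;
	void *stack;
	uint64_t age;
	pid_t child[256];
	pid_t ppid;
	char root[PATH_MAX];
	char pwd[PATH_MAX];
};

/* Compile-time sanity checks: char arrays need no padding before them. */
_Static_assert(offsetof(struct info_pid, name) == sizeof(pid_t),
	       "name must directly follow pid");
_Static_assert(offsetof(struct info_pid, pwd) ==
	       offsetof(struct info_pid, root) + PATH_MAX,
	       "pwd must directly follow root");

/* Hypothetical helper: size of the buffer the kernel will copy into. */
size_t info_pid_size(void)
{
	return sizeof(struct info_pid);
}
```

If one side changes a field and the other does not, the copy still succeeds but the fields read in user-land are silently shifted, which is why checks like these are cheap insurance.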
Then we write the C code and the sys_infopid function which will fill this structure and return it to the user in user-land.
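#include "infopid.h"
#include <linux/syscalls.h>
#include <linux/uaccess.h>
#include <linux/sched/task.h>
#include <linux/err.h>
SYSCALL_DEFINE2(infopid, struct info_pid __user *, ret_pid, int, pid)
{
	struct task_struct *cur, *child;
	struct info_pid *new;
	struct path root = {}, pwd = {};
	struct pid *spid;
	char *tmp, *buffer;
	long ret = 0;
	int i = 0;
	/* Resolve the PID, then pin the task so it cannot be freed while we read it. */
	spid = find_get_pid(pid);
	if (!spid)
		return -ESRCH;
	cur = get_pid_task(spid, PIDTYPE_PID);
	put_pid(spid);
	if (!cur)
		return -ESRCH;
	/* kzalloc zeroes the structure, so no stale kernel memory reaches user-land. */
	new = kzalloc(sizeof(*new), GFP_KERNEL);
	/* PATH_MAX bytes are too large for the kernel stack, so allocate them. */
	buffer = kmalloc(PATH_MAX, GFP_KERNEL);
	if (!new || !buffer) {
		ret = -ENOMEM;
		goto out;
	}
	get_task_comm(new->name, cur);
	new->pid = task_pid_nr(cur);
	new->state = cur->state;
	new->stack = cur->stack;
	new->age = cur->start_time;
	/* The parent pointer and the children list are protected by tasklist_lock. */
	read_lock(&tasklist_lock);
	new->ppid = task_pid_nr(cur->parent);
	list_for_each_entry(child, &cur->children, sibling) {
		if (i > 255)
			break;
		new->child[i++] = child->pid;
	}
	read_unlock(&tasklist_lock);
	/* An exiting task may have no fs_struct left: check it under task_lock. */
	task_lock(cur);
	if (cur->fs) {
		get_fs_root(cur->fs, &root);
		get_fs_pwd(cur->fs, &pwd);
	}
	task_unlock(cur);
	/* dentry_path_raw() does its own locking and may return an ERR_PTR. */
	if (root.dentry) {
		tmp = dentry_path_raw(root.dentry, buffer, PATH_MAX);
		if (!IS_ERR(tmp))
			strscpy(new->root, tmp, sizeof(new->root));
		path_put(&root);
	}
	if (pwd.dentry) {
		tmp = dentry_path_raw(pwd.dentry, buffer, PATH_MAX);
		if (!IS_ERR(tmp))
			strscpy(new->pwd, tmp, sizeof(new->pwd));
		path_put(&pwd);
	}
	/* A bad user pointer is reported as -EFAULT, not -ESRCH. */
	if (copy_to_user(ret_pid, new, sizeof(*new)))
		ret = -EFAULT;
out:
	kfree(buffer);
	kfree(new);
	put_task_struct(cur);
	return ret;
}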
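One convention in this code matters to whoever consumes the result: the child array is zero-filled before being populated and capped at 256 entries, so the first zero marks the end of the list, while a completely full array has no terminator at all. A minimal user-land sketch of a reader that respects both cases could look like this (count_children and INFOPID_MAX_CHILDREN are hypothetical names, not part of the kernel code above):

```c
#include <stddef.h>      /* size_t */
#include <sys/types.h>   /* pid_t */

#define INFOPID_MAX_CHILDREN 256   /* mirrors the child[256] array */

/*
 * Count the PIDs stored in a zero-terminated child array.
 * The loop also stops at cap, because a full array has no zero entry.
 */
size_t count_children(const pid_t *child, size_t cap)
{
	size_t n = 0;

	while (n < cap && child[n] != 0)
		n++;
	return n;
}
```

A process with more than 256 children is simply truncated; the structure carries no flag to signal that, so a caller cannot tell a process with exactly 256 children from one with more.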
4. System call export
To test our system call, however, we need to make some changes to the kernel itself. First, we need to tell the kernel Makefile that our folder exists and that it should be compiled. To do this, we append our folder to the core-y variable in the top-level kernel Makefile.
core-y += kernel/ certs/ mm/ fs/ ipc/ security/ crypto/ block/ infopid/
Then we must modify the file include/linux/syscalls.h containing the prototypes of the kernel system calls, to add our function there (obviously respect your own path, don’t copy paste mine).
/* Other declarations */
#include "/usr/src/linux-source-5.4.0/linux-source-5.4.0/infopid/infopid.h"
asmlinkage long sys_infopid(struct info_pid *, int);
Finally, we are going to modify the arch/x86/entry/syscalls/syscall_64.tbl file to add our system call to it. Take care to give it an index that is not already used (here, the index following the last 64-bit entry, i.e. + 1) and to follow the format of the neighbouring entries exactly.
# Index Arch Name Entrypoint
335 64 infopid __x64_sys_infopid
5. Compile the system call
Now that we have developed our system call and exported it at kernel level, we can start the full kernel compilation. We return to the root of the kernel sources and run the make command.
If you have several physical CPU cores on your machine or on your VM, do not hesitate to use the -j <number_of_cores> option in order to speed up the compilation since it can take quite a long time depending on the performance of your machine (especially if you are in a VM).
Once the compilation is complete, you will need to install the kernel because we are going to boot on it!
sudo make
sudo make modules_install
sudo make install
Now that the kernel has been compiled and installed, you need to make sure that you are going to boot into the new kernel on the next reboot.
In my case, the new kernel is version 5.4.174, which grub considers newer than the generic version 5.4.0-105. Grub therefore gives it first priority and boots it without any action on my part. If this is not your case, you will need the grub menu to choose the new version, and for that you must modify the /etc/default/grub file:
#GRUB_TIMEOUT_STYLE=hidden
GRUB_TIMEOUT=4
Concretely, you must comment out the GRUB_TIMEOUT_STYLE variable and make sure that GRUB_TIMEOUT is high enough (in seconds) to let you choose another entry in the grub menu. Don’t forget to run the sudo update-grub command after your changes.
I suggest that you now restart the machine so that we can test our system call!
6. Execution and testing
Immediately after the restart we check that we are indeed on the correct version of the kernel:
$ uname -r
5.4.174
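The same check can be done from C, which is handy if the test program should refuse to run on the wrong kernel. This is only a sketch built on the standard uname(2) call; the helper names kernel_release and kernel_release_starts_with are hypothetical.

```c
#include <string.h>        /* strncpy(), strncmp(), strlen() */
#include <sys/utsname.h>   /* uname() */

/* Copy the running kernel release (what `uname -r` prints) into buf.
 * Returns 0 on success, -1 on failure. */
int kernel_release(char *buf, size_t len)
{
	struct utsname u;

	if (len == 0 || uname(&u) != 0)
		return -1;
	strncpy(buf, u.release, len - 1);
	buf[len - 1] = '\0';
	return 0;
}

/* Returns 1 if the release starts with prefix, 0 if not, -1 on error. */
int kernel_release_starts_with(const char *prefix)
{
	struct utsname u;

	if (uname(&u) != 0)
		return -1;
	return strncmp(u.release, prefix, strlen(prefix)) == 0;
}
```

On the machine used in this article, kernel_release_starts_with("5.4.174") would be the call to make before going any further.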
Then we just have to write a little code in C which will call our system call by giving it a PID as an argument, and will display the returned information.
As we saw earlier, our system call takes two parameters: a structure that will be filled in and returned to user-land via the copy_to_user() function, and the PID of the process for which we are requesting the information.
I therefore propose the C code below which we will name test_infopid.c.
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <inttypes.h>
#include <stdlib.h>
#include <string.h>
#define TASK_COMM_LEN 16
#define PATH_MAX 4096
struct info_pid {
pid_t pid;
char name[TASK_COMM_LEN];
long state;
void *stack;
uint64_t age;
pid_t child[256];
pid_t ppid;
char root[PATH_MAX];
char pwd[PATH_MAX];
};
void print_parents(pid_t pid)
{
struct info_pid new;
static int index = 0;
printf("\tParent %d : %d\n", index++, pid);
if (!pid)
return ;
int ret = syscall(335, &new, pid);
if (ret) {
printf("syscall failed...\n");
perror("");
exit(EXIT_FAILURE);
}
print_parents(new.ppid);
}
int main(int ac, char **av)
{
pid_t pid;
struct info_pid new;
memset(&new, 0, sizeof(new));
new.age = 0;
if (ac == 1)
pid = getpid();
else
pid = atoi(av[1]);
int ret = syscall(335, &new, pid);
if (ret) {
printf("syscall failed...\n");
perror("");
return EXIT_FAILURE;
}
printf("Printing struct info_pid...\n");
printf("PID : %d\n", new.pid);
printf("Name : %s\n", new.name);
printf("State : %ld\n", new.state);
printf("Stack : %p\n", new.stack);
printf("Birthtime : %" PRIu64 "\n", new.age);
for (int j = 0; j < 256; j++)
{
if (!new.child[j])
break ;
printf("\tChild %d : %d\n", j, new.child[j]);
}
print_parents(new.ppid);
printf("Root : %s\n", new.root);
printf("PWD : %s\n", new.pwd);
return EXIT_SUCCESS;
}
You’ll notice that we use the syscall() function, specifying the index of our system call, since we don’t have a wrapper in libc to call it in more conventional ways (such as open(), read(), etc.).
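Nothing prevents us from writing that missing wrapper ourselves. The sketch below is one possible shape for it, not something the kernel or libc provides: infopid_call and infopid_errno_hint are hypothetical names, and the syscall number is passed in as a parameter so that it can follow whatever index was chosen in syscall_64.tbl. As with any syscall() invocation, a failure is reported as -1 with the reason in errno.

```c
#define _GNU_SOURCE
#include <errno.h>       /* errno, ENOSYS, ESRCH, ENOMEM */
#include <stddef.h>      /* NULL */
#include <string.h>      /* strerror() */
#include <sys/types.h>   /* pid_t */
#include <unistd.h>      /* syscall() */

struct info_pid;   /* defined in test_infopid.c; only a pointer is needed */

/* Hypothetical wrapper: nr is the index chosen in syscall_64.tbl.
 * Returns 0 on success, -1 on failure with errno set. */
long infopid_call(long nr, struct info_pid *out, pid_t pid)
{
	return syscall(nr, out, pid);
}

/* Translate the errno values this system call can produce into hints. */
const char *infopid_errno_hint(int err)
{
	switch (err) {
	case ENOSYS:
		return "the running kernel has no system call at this number";
	case ESRCH:
		return "no process with this PID";
	case ENOMEM:
		return "the kernel could not allocate its temporary buffer";
	default:
		return strerror(err);
	}
}
```

With the index used in this article, a call would read infopid_call(335, &info, getpid()). Check that number against your own syscall_64.tbl first: an index that is free in one kernel version may be assigned to another system call in a different one.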
The code is relatively simple: we call our function for a given process id (PID), and we display a lot of information about this process, including child and parent processes.
$ gcc test_infopid.c -o test_infopid
$ ./test_infopid # With its own PID by default
Printing struct info_pid...
PID : 1354
Name : test_infopid
State : 0
Stack : 0xffffb76ac0750000
Birthtime : 1203877852032
Parent 0 : 776
Parent 1 : 775
Parent 2 : 656
Parent 3 : 551
Parent 4 : 1
Parent 5 : 0
Root : /
PWD : /home/ech0
$ ./test_infopid 1 # With PID 1
Printing struct info_pid...
PID : 1
Name : systemd
State : 1
Stack : 0xffffb76ac0010000
Birthtime : 15000000
Child 0 : 290
Child 1 : 317
Child 2 : 500
Child 3 : 509
Child 4 : 511
Child 5 : 523
Child 6 : 526
Child 7 : 527
Child 8 : 533
Child 9 : 535
Child 10 : 538
Child 11 : 541
Child 12 : 543
Child 13 : 544
Child 14 : 551
Child 15 : 570
Child 16 : 573
Child 17 : 579
Child 18 : 659
Parent 0 : 0
Root : /
PWD : /
7. Conclusion
That’s it, we are already done: you now know how to develop a system call, integrate it into the Linux kernel and call it.
I did not go into the details of the system call code itself. It is not very important in this article since it only serves as an example and could have been any other code. The goal was mainly to show you how a system call is integrated into the kernel, and to help you understand more generally what a system call is.
I hope this has helped you understand a few things. Do not hesitate to write to me if you have any questions, whether about my explanations or about the code.
I now advise you to read the article on developing a kernel module, which is a good follow-up to this one.
Thank you for reading.