Level: Intermediate
M. Tim Jones ([email protected]), Consultant Engineer, Emulex
21 Mar 2007
Linux® system calls -- we use them every day. But do you know how a system call is performed from user-space to the kernel? Explore the Linux system call interface (SCI), learn how to add new system calls (and alternatives for doing so), and discover utilities related to the SCI.
A system call is an interface between a user-space application and a service that the kernel provides. Because the service is provided in the kernel, a direct call cannot be performed; instead, you must use a process of crossing the user-space/kernel boundary. The way you do this differs based on the particular architecture. For this reason, I'll stick to the most common architecture, i386.
In this article, I explore the Linux SCI, demonstrate adding a system call to the 2.6.20 kernel, and then use this function from user-space. I also investigate some of the functions that you'll find useful for system call development and alternatives to system calls. Finally, I look at some of the ancillary mechanisms related to system calls, such as tracing their usage from a given process.
The SCI
The implementation of system calls in Linux is varied based on the architecture, but it can also differ within a given architecture. For example, older x86 processors used an interrupt mechanism to migrate from user-space to kernel-space, but new IA-32 processors provide instructions that optimize this transition (using sysenter
and sysexit
instructions). Because so many options exist and the end-result is so complicated, I'll stick to a surface-level discussion of the interface details. See the Resources at the end of this article for the gory details.
You needn't fully understand the internals of the SCI to amend it, so I explore a simple version of the system call process (see Figure 1). Each system call is multiplexed into the kernel through a single entry point. The eax register is used to identify the particular system call that should be invoked, which is specified in the C
library (per the call from the user-space application). When the C
library has loaded the system call index and any arguments, a software interrupt is invoked (interrupt 0x80), which results in execution (through the interrupt handler) of the system_call
function. This function handles all system calls, as identified by the contents of eax. After a few simple tests, the actual system call is invoked using the system_call_table
and index contained in eax. Upon return from the system call, syscall_exit
is eventually reached, and a call to resume_userspace
transitions back to user-space. Execution resumes in the C
library, which then returns to the user application.
Figure 1. The simplified flow of a system call using the interrupt method
At the core of the SCI is the system call demultiplexing table. This table, shown in Figure 2, uses the index provided in eax to identify which system call to invoke from the table (sys_call_table
). A sample of the contents of this table and the locations of these entities is also shown. (For more about demultiplexing, see the sidebar, "System call demultiplexing.")
Figure 2. The system call table and various linkages共8页: 上一页 1 [2] [3] [4] [5] [6] [7] [8] 下一页