CAPABILITIES(7) Linux Programmer's Manual CAPABILITIES(7)

NAME capabilities - overview of Linux capabilities

DESCRIPTION For the purpose of performing permission checks, traditional Unix implementations distinguish two categories of processes: privileged processes (whose effective user ID is 0, referred to as superuser or root), and unprivileged processes (whose effective UID is non-zero). Privileged processes bypass all kernel permission checks, while unprivileged processes are subject to full permission checking based on the process's credentials (usually: effective UID, effective GID, and supplementary group list).

Starting with kernel 2.2, Linux divides the privileges traditionally associated with superuser into distinct units, known as capabilities, which can be independently enabled and disabled. Capabilities are a per-thread attribute.

Capabilities List As at Linux 2.6.14, the following capabilities are implemented:

CAP_AUDIT_CONTROL (since Linux 2.6.11) Enable and disable kernel auditing; change auditing filter rules; retrieve auditing status and filtering rules.

CAP_AUDIT_WRITE (since Linux 2.6.11) Allow records to be written to the kernel auditing log.

CAP_CHOWN Allow arbitrary changes to file UIDs and GIDs (see chown(2)).

CAP_DAC_OVERRIDE Bypass file read, write, and execute permission checks. (DAC = "discretionary access control".)

CAP_DAC_READ_SEARCH Bypass file read permission checks and directory read and execute permission checks.

CAP_FOWNER Bypass permission checks on operations that normally require the file system UID of the process to match the UID of the file (e.g., chmod(2), utime(2)), excluding those operations covered by CAP_DAC_OVERRIDE and CAP_DAC_READ_SEARCH; set extended file attributes (see chattr(1)) on arbitrary files; set Access Control Lists (ACLs) on arbitrary files; ignore directory sticky bit on file deletion; specify O_NOATIME for arbitrary files in open(2) and fcntl(2).

CAP_FSETID Don't clear set-user-ID and set-group-ID bits when a file is modified; permit setting of the set-group-ID bit for a file whose GID does not match the file system GID or any of the supplementary GIDs of the calling process.

CAP_IPC_LOCK Permit memory locking (mlock(2), mlockall(2), mmap(2), shmctl(2)).

CAP_IPC_OWNER Bypass permission checks for operations on System V IPC objects.

CAP_KILL Bypass permission checks for sending signals (see kill(2)). This includes use of the KDSIGACCEPT ioctl.

CAP_LEASE (Linux 2.4 onwards) Allow file leases to be established on arbitrary files (see fcntl(2)).

CAP_LINUX_IMMUTABLE Allow setting of the EXT2_APPEND_FL and EXT2_IMMUTABLE_FL extended file attributes (see chattr(1)).

CAP_MKNOD (Linux 2.4 onwards) Allow creation of special files using mknod(2).

CAP_NET_ADMIN Allow various network-related operations (e.g., setting privileged socket options, enabling multicasting, interface configuration, modifying routing tables).

CAP_NET_BIND_SERVICE Allow binding to Internet domain reserved socket ports (port numbers less than 1024).

CAP_NET_BROADCAST (Unused) Allow socket broadcasting, and listening to multicasts.

CAP_NET_RAW Permit use of RAW and PACKET sockets.

CAP_SETGID Allow arbitrary manipulations of process GIDs and supplementary GID list; allow forged GID when passing socket credentials via Unix domain sockets.

CAP_SETPCAP Grant or remove any capability in the caller's permitted capability set to or from any other process.

CAP_SETUID Allow arbitrary manipulations of process UIDs (setuid(2), setreuid(2), setresuid(2), setfsuid(2)); allow forged UID when passing socket credentials via Unix domain sockets.

CAP_SYS_ADMIN Permit a range of system administration operations including: quotactl(2), mount(2), umount(2), swapon(2), swapoff(2), sethostname(2), setdomainname(2); perform IPC_SET and IPC_RMID operations on arbitrary System V IPC objects; perform operations on trusted and security Extended Attributes (see attr(5)); call lookup_dcookie(2); use ioprio_set(2) to assign IOPRIO_CLASS_RT and IOPRIO_CLASS_IDLE I/O scheduling classes; perform keyctl(2) KEYCTL_CHOWN and KEYCTL_SETPERM operations; allow forged UID when passing socket credentials; exceed /proc/sys/fs/file-max, the system-wide limit on the number of open files, in system calls that open files (e.g., accept(2), execve(2), open(2), pipe(2); without this capability these system calls will fail with the error ENFILE if this limit is encountered); employ CLONE_NEWNS flag with clone(2) and unshare(2).

CAP_SYS_BOOT Permit calls to reboot(2) and kexec_load(2).

CAP_SYS_CHROOT Permit calls to chroot(2).

CAP_SYS_MODULE Allow loading and unloading of kernel modules (see init_module(2) and delete_module(2)); allow modifications to capability bounding set.

CAP_SYS_NICE Allow raising process nice value (nice(2), setpriority(2)) and changing of the nice value for arbitrary processes; allow setting of real-time scheduling policies for calling process, and setting scheduling policies and priorities for arbitrary processes (sched_setscheduler(2), sched_setparam(2)); set CPU affinity for arbitrary processes (sched_setaffinity(2)); set I/O scheduling class and priority for arbitrary processes (ioprio_set(2)); allow migrate_pages(2) to be applied to arbitrary processes and allow processes to be migrated to arbitrary nodes; allow move_pages(2) to be applied to arbitrary processes; use the MPOL_MF_MOVE_ALL flag with mbind(2) and move_pages(2).

CAP_SYS_PACCT Permit calls to acct(2).

CAP_SYS_PTRACE Allow arbitrary processes to be traced using ptrace(2).

CAP_SYS_RAWIO Permit I/O port operations (iopl(2) and ioperm(2)); access /proc/kcore.

CAP_SYS_RESOURCE Permit: use of reserved space on ext2 file systems; ioctl(2) calls controlling ext3 journaling; disk quota limits to be overridden; resource limits to be increased (see setrlimit(2)); RLIMIT_NPROC resource limit to be overridden; msg_qbytes limit for a message queue to be raised above the limit in /proc/sys/kernel/msgmnb (see msgop(2) and msgctl(2)).

CAP_SYS_TIME Allow modification of system clock (settimeofday(2), stime(2), adjtimex(2)); allow modification of real-time (hardware) clock.

CAP_SYS_TTY_CONFIG Permit calls to vhangup(2).
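Capability sets are stored as bit masks with one bit per capability. The following is a minimal sketch of decoding such a mask into the names listed above. The bit numbers are an assumption taken from the kernel header include/linux/capability.h in the 2.6 series, not from this page, and decode_caps is a hypothetical helper name.

```python
# Capability names indexed by bit number.
# ASSUMPTION: numbering follows include/linux/capability.h (Linux 2.6 series).
CAP_NAMES = [
    "CAP_CHOWN",             # 0
    "CAP_DAC_OVERRIDE",      # 1
    "CAP_DAC_READ_SEARCH",   # 2
    "CAP_FOWNER",            # 3
    "CAP_FSETID",            # 4
    "CAP_KILL",              # 5
    "CAP_SETGID",            # 6
    "CAP_SETUID",            # 7
    "CAP_SETPCAP",           # 8
    "CAP_LINUX_IMMUTABLE",   # 9
    "CAP_NET_BIND_SERVICE",  # 10
    "CAP_NET_BROADCAST",     # 11
    "CAP_NET_ADMIN",         # 12
    "CAP_NET_RAW",           # 13
    "CAP_IPC_LOCK",          # 14
    "CAP_IPC_OWNER",         # 15
    "CAP_SYS_MODULE",        # 16
    "CAP_SYS_RAWIO",         # 17
    "CAP_SYS_CHROOT",        # 18
    "CAP_SYS_PTRACE",        # 19
    "CAP_SYS_PACCT",         # 20
    "CAP_SYS_ADMIN",         # 21
    "CAP_SYS_BOOT",          # 22
    "CAP_SYS_NICE",          # 23
    "CAP_SYS_RESOURCE",      # 24
    "CAP_SYS_TIME",          # 25
    "CAP_SYS_TTY_CONFIG",    # 26
    "CAP_MKNOD",             # 27
    "CAP_LEASE",             # 28
    "CAP_AUDIT_WRITE",       # 29
    "CAP_AUDIT_CONTROL",     # 30
]


def decode_caps(mask):
    """Return the names of the capabilities whose bits are set in mask."""
    return [name for bit, name in enumerate(CAP_NAMES) if mask & (1 << bit)]


if __name__ == "__main__":
    # Bits 0 and 10 set.
    print(decode_caps(0x00000401))
```

Bits beyond those in the table are ignored by this sketch, so it only covers the capabilities listed above.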

Capability Sets Each thread has three capability sets containing zero or more of the above capabilities:

Effective: the capabilities used by the kernel to perform permission checks for the thread.

Permitted: the capabilities that the thread may assume (i.e., a limiting superset for the effective and inheritable sets). If a thread drops a capability from its permitted set, it can never reacquire that capability (unless it exec()s a set-user-ID-root program).

Inheritable: the capabilities preserved across an execve(2).

A child created via fork(2) inherits copies of its parent's capability sets. See below for a discussion of the treatment of capabilities during exec().

Using capset(2), a thread may manipulate its own capability sets, or, if it has the CAP_SETPCAP capability, those of a thread in another process.
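For a quick look at the three sets without calling capget(2), one option is to read them from /proc/PID/status. This is a sketch under the assumption that the kernel exposes the CapInh, CapPrm and CapEff fields there as hexadecimal masks. Those field names do not come from this page, and parse_cap_sets and SAMPLE are hypothetical names used only for illustration.

```python
def parse_cap_sets(status_text):
    """Extract the three capability masks from /proc/PID/status text.

    ASSUMPTION: the fields are named CapInh, CapPrm and CapEff and hold
    hexadecimal bit masks.
    """
    fields = {
        "CapInh": "inheritable",
        "CapPrm": "permitted",
        "CapEff": "effective",
    }
    sets = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key in fields:
            sets[fields[key]] = int(value.strip(), 16)
    return sets


# Made-up sample text, used when /proc is not available.
SAMPLE = """Name:\tcat
CapInh:\t0000000000000000
CapPrm:\t00000000fffffeff
CapEff:\t00000000fffffeff
"""

if __name__ == "__main__":
    try:
        with open("/proc/self/status") as f:
            text = f.read()
    except OSError:
        text = SAMPLE
    for name, mask in parse_cap_sets(text).items():
        print(f"{name}: {mask:#010x}")
```

The sketch only reads the sets; changing them still requires capset(2) or the libcap routines.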

Capability bounding set When a program is execed, the permitted and effective capabilities are ANDed with the current value of the so-called capability bounding set, defined in the file /proc/sys/kernel/cap-bound. This parameter can be used to place a system-wide limit on the capabilities granted to all subsequently executed programs. (Confusingly, this bit mask parameter is expressed as a signed decimal number in /proc/sys/kernel/cap-bound.)

Only the init process may set bits in the capability bounding set; other than that, the superuser may only clear bits in this set.

On a standard system the capability bounding set always masks out the CAP_SETPCAP capability. To remove this restriction (dangerous!), modify the definition of CAP_INIT_EFF_SET in include/linux/capability.h and rebuild the kernel.
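Because /proc/sys/kernel/cap-bound holds a bit mask printed as a signed decimal number, a conversion step helps when reading or writing it. The sketch below assumes a 32-bit mask and that CAP_SETPCAP is bit 8 (taken from include/linux/capability.h, not from this page). The value -257 is used purely as an illustration: it is the signed form of all 32 bits set except bit 8. The function names are hypothetical.

```python
CAP_SETPCAP_BIT = 8  # ASSUMPTION: bit number from include/linux/capability.h


def cap_bound_to_mask(signed_value, width=32):
    """Convert the signed decimal form to an unsigned bit mask."""
    return signed_value & ((1 << width) - 1)


def mask_to_cap_bound(mask, width=32):
    """Convert an unsigned bit mask back to the signed decimal form."""
    mask &= (1 << width) - 1
    if mask & (1 << (width - 1)):
        # Top bit set: the value reads as negative in two's complement.
        return mask - (1 << width)
    return mask


if __name__ == "__main__":
    mask = cap_bound_to_mask(-257)
    print(hex(mask))
    print("CAP_SETPCAP allowed:", bool(mask & (1 << CAP_SETPCAP_BIT)))
```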

The capability bounding set feature was added to Linux starting with kernel version 2.2.11.

Current and Future Implementation A full implementation of capabilities requires:

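1. that for all privileged operations, the kernel check whether the process has the required capability in its effective set.

2. that the kernel provide system calls allowing a thread's capability sets to be changed and retrieved.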

3. file system support for attaching capabilities to an executable file, so that a process gains those capabilities when the file is execed.

As at Linux 2.6.14, only the first two of these requirements are met.

Eventually, it should be possible to associate three capability sets with an executable file, which, in conjunction with the capability sets of the thread, will determine the capabilities of a thread after an exec():

Inheritable (formerly known as allowed): this set is ANDed with the thread's inheritable set to determine which inheritable capabilities are permitted to the thread after the exec().

Permitted (formerly known as forced): the capabilities automatically permitted to the thread, regardless of the thread's inheritable capabilities.

Effective: those capabilities in the thread's new permitted set that are also to be set in the new effective set. (F(effective) would normally be either all zeroes or all ones.)

In the meantime, since the current implementation does not support file capability sets, during an exec():

1. All three file capability sets are initially assumed to be cleared.

2. If a set-user-ID-root program is being execed, or the real user ID of the process is 0 (root), then the file inheritable and permitted sets are defined to be all ones (i.e., all capabilities enabled).

3. If a set-user-ID-root program is being executed, then the file effective set is defined to be all ones.
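The three rules above can be sketched as a small function. This is an illustrative model rather than kernel code: assumed_file_sets is a hypothetical name, and "all ones" is taken to mean a 32-bit mask.

```python
ALL_CAPS = 0xFFFFFFFF  # "all ones", ASSUMING a 32-bit capability mask


def assumed_file_sets(is_setuid_root, real_uid):
    """Return (F_inheritable, F_permitted, F_effective) for an exec()."""
    # Rule 1: all three file sets start out cleared.
    f_inheritable = f_permitted = f_effective = 0
    # Rule 2: set-user-ID-root program, or real UID 0.
    if is_setuid_root or real_uid == 0:
        f_inheritable = f_permitted = ALL_CAPS
    # Rule 3: only a set-user-ID-root program gets a full effective set.
    if is_setuid_root:
        f_effective = ALL_CAPS
    return f_inheritable, f_permitted, f_effective


if __name__ == "__main__":
    for setuid_root, ruid in [(False, 1000), (False, 0), (True, 1000)]:
        sets = assumed_file_sets(setuid_root, ruid)
        print(setuid_root, ruid, [hex(s) for s in sets])
```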

Transformation of Capabilities During exec() During an exec(), the kernel calculates the new capabilities of the process using the following algorithm:

P'(permitted) = (P(inheritable) & F(inheritable)) | (F(permitted) & cap_bset)

P'(effective) = P'(permitted) & F(effective)

P'(inheritable) = P(inheritable) [i.e., unchanged]

where:

P denotes the value of a thread capability set before the exec()

P' denotes the value of a capability set after the exec()

F denotes a file capability set

cap_bset is the value of the capability bounding set.

In the current implementation, the upshot of this algorithm is that when a process exec()s a set-user-ID-root program, or when a process with an effective UID of 0 exec()s a program, it gains all capabilities in its permitted and effective capability sets, except those masked out by the capability bounding set (i.e., CAP_SETPCAP). This provides semantics that are the same as those provided by traditional Unix systems.
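The exec() transformation can be worked through on plain integers. The sketch below is a model of the three equations, not kernel code. The function name, the 32-bit mask width, and the bounding-set value with only bit 8 cleared (assumed to be CAP_SETPCAP) are illustrative assumptions.

```python
def exec_transform(p_inheritable, f_inheritable, f_permitted, f_effective,
                   cap_bset):
    """Return the thread's (inheritable, permitted, effective) sets after exec()."""
    new_permitted = (p_inheritable & f_inheritable) | (f_permitted & cap_bset)
    new_effective = new_permitted & f_effective
    new_inheritable = p_inheritable  # unchanged across exec()
    return new_inheritable, new_permitted, new_effective


if __name__ == "__main__":
    ALL_CAPS = 0xFFFFFFFF   # ASSUMPTION: 32-bit masks
    CAP_BSET = 0xFFFFFEFF   # ASSUMPTION: every bit except bit 8

    # A thread with empty sets execs a set-user-ID-root program,
    # so all three file sets are all ones.
    sets = exec_transform(0, ALL_CAPS, ALL_CAPS, ALL_CAPS, CAP_BSET)
    print([hex(s) for s in sets])

    # The same thread execs an ordinary program: all file sets are cleared.
    sets = exec_transform(0, 0, 0, 0, CAP_BSET)
    print([hex(s) for s in sets])
```

In the first case the permitted and effective sets come out equal to the bounding set, and in the second they stay empty, which matches the traditional behavior described above.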

Effect of User ID Changes on Capabilities To preserve the traditional semantics for transitions between 0 and non-zero user IDs, the kernel makes the following changes to a thread's capability sets on changes to the thread's real, effective, saved set, and file system user IDs (using setuid(2), setresuid(2), or similar):

1. If one or more of the real, effective or saved set user IDs was previously 0, and as a result of the UID changes all of these IDs have a non-zero value, then all capabilities are cleared from the permitted and effective capability sets.

2. If the effective user ID is changed from 0 to non-zero, then all capabilities are cleared from the effective set.

3. If the effective user ID is changed from non-zero to 0, then the permitted set is copied to the effective set.

4. If the file system user ID is changed from 0 to non-zero (see setfsuid(2)) then the following capabilities are cleared from the effective set: CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_DAC_READ_SEARCH, CAP_FOWNER, and CAP_FSETID. If the file system UID is changed from non-zero to 0, then any of these capabilities that are enabled in the permitted set are enabled in the effective set.

If a thread that has a 0 value for one or more of its user IDs wants to prevent its permitted capability set being cleared when it resets all of its user IDs to non-zero values, it can do so using the prctl() PR_SET_KEEPCAPS operation.
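The four rules and the PR_SET_KEEPCAPS exception can be modeled as follows. This is a sketch with several assumptions: the five file-related capabilities are taken to occupy bits 0 to 4 (from include/linux/capability.h), apply_uid_change and the dictionary keys are hypothetical names, and PR_SET_KEEPCAPS is modeled as simply skipping rule 1 while rules 2 to 4 still apply. The text above only states that the permitted set is kept.

```python
# ASSUMPTION: CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_DAC_READ_SEARCH, CAP_FOWNER
# and CAP_FSETID are bits 0..4, as in include/linux/capability.h.
FS_CAPS = 0b11111


def apply_uid_change(old, new, permitted, effective, keepcaps=False):
    """Return (permitted, effective) after a change of user IDs.

    old and new are dicts with the keys 'ruid', 'euid', 'suid', 'fsuid'.
    """
    ids = ("ruid", "euid", "suid")
    # Rule 1: some ID was 0 and now all are non-zero (skipped with keepcaps).
    if (any(old[k] == 0 for k in ids) and all(new[k] != 0 for k in ids)
            and not keepcaps):
        permitted = 0
        effective = 0
    # Rule 2: effective UID goes from 0 to non-zero.
    if old["euid"] == 0 and new["euid"] != 0:
        effective = 0
    # Rule 3: effective UID goes from non-zero to 0.
    if old["euid"] != 0 and new["euid"] == 0:
        effective = permitted
    # Rule 4: file system UID transitions.
    if old["fsuid"] == 0 and new["fsuid"] != 0:
        effective &= ~FS_CAPS
    elif old["fsuid"] != 0 and new["fsuid"] == 0:
        effective |= permitted & FS_CAPS
    return permitted, effective


if __name__ == "__main__":
    root = {"ruid": 0, "euid": 0, "suid": 0, "fsuid": 0}
    user = {"ruid": 1000, "euid": 1000, "suid": 1000, "fsuid": 1000}
    caps = 0xFFFFFEFF  # illustrative starting sets
    print([hex(s) for s in apply_uid_change(root, user, caps, caps)])
    print([hex(s) for s in apply_uid_change(root, user, caps, caps, keepcaps=True)])
```

With keepcaps set, the second call keeps the permitted set but still ends with an empty effective set, so the thread would have to raise the capabilities it needs again (for example with capset(2)).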

NOTES The libcap package provides a suite of routines for setting and getting capabilities that is more comfortable and less likely to change than the interface provided by capset(2) and capget(2).

CONFORMING TO No standards govern capabilities, but the Linux capability implementation is based on the withdrawn POSIX.1e draft standard.

BUGS There is as yet no file system support allowing capabilities to be associated with executable files.

SEE ALSO capget(2), prctl(2), setfsuid(2), pthreads(7)

Linux 2.6.18 2006-07-31 CAPABILITIES(7)