====================
CREDENTIALS IN LINUX
====================
				4
				5	By: David Howells <dhowells@redhat.com>
				6
				7	Contents:
				8
				9	(*) Overview.
				10
				11	(*) Types of credentials.
				12
				13	(*) File markings.
				14
				15	(*) Task credentials.
				16
				17	- Immutable credentials.
				18	- Accessing task credentials.
				19	- Accessing another task's credentials.
				20	- Altering credentials.
				21	- Managing credentials.
				22
				23	(*) Open file credentials.
				24
				25	(*) Overriding the VFS's use of credentials.
				26
				27
				28	========
				29	OVERVIEW
				30	========
				31
				32	There are several parts to the security check performed by Linux when one
				33	object acts upon another:
				34
				35	(1) Objects.
				36
				37	Objects are things in the system that may be acted upon directly by
				38	userspace programs. Linux has a variety of actionable objects, including:
				39
				40	- Tasks
				41	- Files/inodes
				42	- Sockets
				43	- Message queues
				44	- Shared memory segments
				45	- Semaphores
				46	- Keys
				47
				48	As a part of the description of all these objects there is a set of
				49	credentials. What's in the set depends on the type of object.
				50
				51	(2) Object ownership.
				52
				53	Amongst the credentials of most objects, there will be a subset that
				54	indicates the ownership of that object. This is used for resource
				55	accounting and limitation (disk quotas and task rlimits for example).
				56
				57	In a standard UNIX filesystem, for instance, this will be defined by the
				58	UID marked on the inode.
				59
				60	(3) The objective context.
				61
Also amongst the credentials of those objects, there will be a subset that
indicates the 'objective context' of that object. This may or may not be
the same set as in (2) - in standard UNIX files, for instance, this is
defined by the UID and the GID marked on the inode.
				66
				67	The objective context is used as part of the security calculation that is
				68	carried out when an object is acted upon.
				69
				70	(4) Subjects.
				71
				72	A subject is an object that is acting upon another object.
				73
				74	Most of the objects in the system are inactive: they don't act on other
				75	objects within the system. Processes/tasks are the obvious exception:
				76	they do stuff; they access and manipulate things.
				77
				78	Objects other than tasks may under some circumstances also be subjects.
				79	For instance an open file may send SIGIO to a task using the UID and EUID
				80	given to it by a task that called fcntl(F_SETOWN) upon it. In this case,
				81	the file struct will have a subjective context too.
				82
				83	(5) The subjective context.
				84
				85	A subject has an additional interpretation of its credentials. A subset
				86	of its credentials forms the 'subjective context'. The subjective context
				87	is used as part of the security calculation that is carried out when a
				88	subject acts.
				89
				90	A Linux task, for example, has the FSUID, FSGID and the supplementary
				91	group list for when it is acting upon a file - which are quite separate
				92	from the real UID and GID that normally form the objective context of the
				93	task.
				94
				95	(6) Actions.
				96
				97	Linux has a number of actions available that a subject may perform upon an
				98	object. The set of actions available depends on the nature of the subject
				99	and the object.
				100
				101	Actions include reading, writing, creating and deleting files; forking or
				102	signalling and tracing tasks.
				103
				104	(7) Rules, access control lists and security calculations.
				105
				106	When a subject acts upon an object, a security calculation is made. This
				107	involves taking the subjective context, the objective context and the
				108	action, and searching one or more sets of rules to see whether the subject
				109	is granted or denied permission to act in the desired manner on the
				110	object, given those contexts.
				111
				112	There are two main sources of rules:
				113
				114	(a) Discretionary access control (DAC):
				115
				116	Sometimes the object will include sets of rules as part of its
				117	description. This is an 'Access Control List' or 'ACL'. A Linux
				118	file may supply more than one ACL.
				119
				120	A traditional UNIX file, for example, includes a permissions mask that
				121	is an abbreviated ACL with three fixed classes of subject ('user',
				122	'group' and 'other'), each of which may be granted certain privileges
				123	('read', 'write' and 'execute' - whatever those map to for the object
				124	in question). UNIX file permissions do not allow the arbitrary
				125	specification of subjects, however, and so are of limited use.
				126
				127	A Linux file might also sport a POSIX ACL. This is a list of rules
				128	that grants various permissions to arbitrary subjects.
				129
				130	(b) Mandatory access control (MAC):
				131
				132	The system as a whole may have one or more sets of rules that get
				133	applied to all subjects and objects, regardless of their source.
				134	SELinux and Smack are examples of this.
				135
				136	In the case of SELinux and Smack, each object is given a label as part
				137	of its credentials. When an action is requested, they take the
				138	subject label, the object label and the action and look for a rule
				139	that says that this action is either granted or denied.
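
As an illustration of the calculation described in (7), the following is a
minimal userspace sketch of the traditional UNIX permissions-mask check. It is
not kernel code: the struct names, the MAY_* constants and dac_permits() are
assumptions made for this example, and capabilities, POSIX ACLs and MAC hooks
are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical, simplified stand-ins for the two contexts. */
struct subj_ctx {			/* subjective context of a task */
	uid_t fsuid;
	gid_t fsgid;
	const gid_t *groups;		/* supplementary groups */
	size_t ngroups;
};

struct obj_ctx {			/* objective context of a file */
	uid_t uid;
	gid_t gid;
	unsigned int mode;		/* low nine bits: rwxrwxrwx */
};

/* The action, as a bitmask (names chosen for this sketch). */
enum { MAY_EXEC = 1, MAY_WRITE = 2, MAY_READ = 4 };

static bool in_group(const struct subj_ctx *s, gid_t gid)
{
	if (s->fsgid == gid)
		return true;
	for (size_t i = 0; i < s->ngroups; i++)
		if (s->groups[i] == gid)
			return true;
	return false;
}

/* Pick exactly one class ('user', 'group' or 'other') and test its bits. */
static bool dac_permits(const struct subj_ctx *s, const struct obj_ctx *o,
			unsigned int action)
{
	unsigned int bits;

	assert(s && o);
	if (s->fsuid == o->uid)
		bits = (o->mode >> 6) & 7;
	else if (in_group(s, o->gid))
		bits = (o->mode >> 3) & 7;
	else
		bits = o->mode & 7;

	return (bits & action) == action;
}
```

Note that only one class is consulted: a subject that matches 'user' is judged
by the user bits alone, even if the group or other bits would be more generous.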
				140
				141
				142	====================
				143	TYPES OF CREDENTIALS
				144	====================
				145
				146	The Linux kernel supports the following types of credentials:
				147
				148	(1) Traditional UNIX credentials.
				149
				150	Real User ID
				151	Real Group ID
				152
The UID and GID are carried by most, if not all, Linux objects, even if in
some cases they have to be invented (FAT or CIFS files for example, which are
derived from Windows). These (mostly) define the objective context of
that object, with tasks being slightly different in some cases.
				157
				158	Effective, Saved and FS User ID
				159	Effective, Saved and FS Group ID
				160	Supplementary groups
				161
				162	These are additional credentials used by tasks only. Usually, an
				163	EUID/EGID/GROUPS will be used as the subjective context, and real UID/GID
				164	will be used as the objective. For tasks, it should be noted that this is
				165	not always true.
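
From userspace, the four user IDs of a task can be inspected through the
"Uid:" line of /proc/self/status, which lists the real, effective, saved and
filesystem UIDs in that order. The following is a small sketch under the
assumption that /proc is mounted; struct uid_set and read_uid_set() are names
invented for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

struct uid_set {
	uid_t real, effective, saved, fs;
};

/* Parse the "Uid:" line of /proc/self/status; returns 0 on success. */
static int read_uid_set(struct uid_set *ids)
{
	char line[256];
	unsigned int r, e, s, f;
	int ret = -1;
	FILE *fp;

	assert(ids);
	fp = fopen("/proc/self/status", "r");
	if (!fp)
		return -1;

	while (fgets(line, sizeof(line), fp)) {
		/* Whitespace in the format also matches the tabs in the file. */
		if (sscanf(line, "Uid: %u %u %u %u", &r, &e, &s, &f) == 4) {
			ids->real = r;
			ids->effective = e;
			ids->saved = s;
			ids->fs = f;
			ret = 0;
			break;
		}
	}
	fclose(fp);
	return ret;
}
```

For an ordinary, non-setuid program all four values are normally equal; they
diverge after executing a SUID binary or after calls such as seteuid().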
				166
				167	(2) Capabilities.
				168
				169	Set of permitted capabilities
				170	Set of inheritable capabilities
				171	Set of effective capabilities
				172	Capability bounding set
				173
				174	These are only carried by tasks. They indicate superior capabilities
				175	granted piecemeal to a task that an ordinary task wouldn't otherwise have.
				176	These are manipulated implicitly by changes to the traditional UNIX
				177	credentials, but can also be manipulated directly by the capset() system
				178	call.
				179
The permitted capabilities are those caps that the process might grant
itself to its effective set through capset(). The inheritable set might
also be so constrained.
				183
				184	The effective capabilities are the ones that a task is actually allowed to
				185	make use of itself.
				186
				187	The inheritable capabilities are the ones that may get passed across
				188	execve().
				189
				190	The bounding set limits the capabilities that may be inherited across
				191	execve(), especially when a binary is executed that will execute as UID 0.
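
The relationship between the permitted and effective sets can be modelled with
plain bitmasks. This is a simplified sketch, not the kernel's kernel_cap_t
handling: struct cap_sets and both helpers are hypothetical, bit numbers are
arbitrary, and the execve() transformation involving the inheritable and
bounding sets is intentionally not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one bit per capability. */
struct cap_sets {
	uint64_t permitted;
	uint64_t effective;
};

/* A task may only make effective what it is already permitted. */
static bool cap_set_effective(struct cap_sets *c, uint64_t want)
{
	assert(c);
	if (want & ~c->permitted)
		return false;		/* asks for a cap outside permitted */
	c->effective = want;
	return true;
}

/* Only the effective set decides whether a capability can be used now. */
static bool cap_usable(const struct cap_sets *c, unsigned int cap)
{
	assert(c && cap < 64);
	return (c->effective >> cap) & 1;
}
```

A task can therefore hold a capability in its permitted set without it being
usable until it has been raised into the effective set.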
				192
				193	(3) Secure management flags (securebits).
				194
				195	These are only carried by tasks. These govern the way the above
				196	credentials are manipulated and inherited over certain operations such as
				197	execve(). They aren't used directly as objective or subjective
				198	credentials.
				199
				200	(4) Keys and keyrings.
				201
				202	These are only carried by tasks. They carry and cache security tokens
				203	that don't fit into the other standard UNIX credentials. They are for
				204	making such things as network filesystem keys available to the file
				205	accesses performed by processes, without the necessity of ordinary
				206	programs having to know about security details involved.
				207
				208	Keyrings are a special type of key. They carry sets of other keys and can
				209	be searched for the desired key. Each process may subscribe to a number
				210	of keyrings:
				211
Per-thread keyring
				213	Per-process keyring
				214	Per-session keyring
				215
				216	When a process accesses a key, if not already present, it will normally be
				217	cached on one of these keyrings for future accesses to find.
				218
				219	For more information on using keys, see Documentation/keys.txt.
				220
				221	(5) LSM
				222
				223	The Linux Security Module allows extra controls to be placed over the
				224	operations that a task may do. Currently Linux supports two main
				225	alternate LSM options: SELinux and Smack.
				226
				227	Both work by labelling the objects in a system and then applying sets of
				228	rules (policies) that say what operations a task with one label may do to
				229	an object with another label.
				230
				231	(6) AF_KEY
				232
				233	This is a socket-based approach to credential management for networking
				234	stacks [RFC 2367]. It isn't discussed by this document as it doesn't
				235	interact directly with task and file credentials; rather it keeps system
				236	level credentials.
				237
				238
				239	When a file is opened, part of the opening task's subjective context is
				240	recorded in the file struct created. This allows operations using that file
				241	struct to use those credentials instead of the subjective context of the task
				242	that issued the operation. An example of this would be a file opened on a
				243	network filesystem where the credentials of the opened file should be presented
				244	to the server, regardless of who is actually doing a read or a write upon it.
				245
				246
				247	=============
				248	FILE MARKINGS
				249	=============
				250
				251	Files on disk or obtained over the network may have annotations that form the
				252	objective security context of that file. Depending on the type of filesystem,
				253	this may include one or more of the following:
				254
				255	(*) UNIX UID, GID, mode;
				256
				257	(*) Windows user ID;
				258
				259	(*) Access control list;
				260
				261	(*) LSM security label;
				262
				263	(*) UNIX exec privilege escalation bits (SUID/SGID);
				264
				265	(*) File capabilities exec privilege escalation bits.
				266
				267	These are compared to the task's subjective security context, and certain
				268	operations allowed or disallowed as a result. In the case of execve(), the
				269	privilege escalation bits come into play, and may allow the resulting process
				270	extra privileges, based on the annotations on the executable file.
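
As a rough illustration of the SUID/SGID escalation bits, the sketch below
computes the effective IDs a process would have after execve() from the mode
and ownership marked on the executable. The struct and function names are
assumptions for this example, and the model ignores everything else the kernel
takes into account (nosuid mounts, file capabilities, ptrace state, LSM hooks).

```c
#include <assert.h>
#include <sys/stat.h>
#include <sys/types.h>

struct exec_ids {
	uid_t euid;
	gid_t egid;
};

/* Apply the SUID/SGID markings of an executable to the current IDs. */
static struct exec_ids ids_after_exec(struct exec_ids cur, mode_t mode,
				      uid_t file_uid, gid_t file_gid)
{
	struct exec_ids new = cur;

	if (mode & S_ISUID)
		new.euid = file_uid;

	/* SGID without group-execute is not treated as a setgid executable. */
	if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP))
		new.egid = file_gid;

	return new;
}
```

With a root-owned file of mode 04755, for instance, a task with EUID 1000 would
come out of this function with EUID 0 and an unchanged EGID.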
				271
				272
				273	================
				274	TASK CREDENTIALS
				275	================
				276
In Linux, all of a task's credentials are held in a refcounted structure of
type 'struct cred', either directly (uid, gid) or through pointers (groups,
keys, LSM security). Each task points to its credentials by a pointer called
'cred' in its task_struct.
				281
				282	Once a set of credentials has been prepared and committed, it may not be
				283	changed, barring the following exceptions:
				284
				285	(1) its reference count may be changed;
				286
				287	(2) the reference count on the group_info struct it points to may be changed;
				288
				289	(3) the reference count on the security data it points to may be changed;
				290
				291	(4) the reference count on any keyrings it points to may be changed;
				292
				293	(5) any keyrings it points to may be revoked, expired or have their security
				294	attributes changed; and
				295
				296	(6) the contents of any keyrings to which it points may be changed (the whole
				297	point of keyrings being a shared set of credentials, modifiable by anyone
				298	with appropriate access).
				299
				300	To alter anything in the cred struct, the copy-and-replace principle must be
				301	adhered to. First take a copy, then alter the copy and then use RCU to change
				302	the task pointer to make it point to the new copy. There are wrappers to aid
				303	with this (see below).
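
The copy-and-replace principle can be sketched in userspace as follows. This
is only a single-threaded model: struct sim_cred, struct sim_task and the
sim_*() helpers are invented for illustration, a plain pointer store stands in
for the RCU pointer update, and a plain integer stands in for the atomic
reference count.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; the real struct cred has many more fields. */
struct sim_cred {
	int usage;			/* reference count */
	unsigned int uid, suid;
};

struct sim_task {
	const struct sim_cred *cred;	/* published, hence const */
};

/* Step 1: take a private copy that may still be altered. */
static struct sim_cred *sim_prepare(const struct sim_task *t)
{
	struct sim_cred *new;

	assert(t && t->cred);
	new = malloc(sizeof(*new));
	if (!new)
		return NULL;
	*new = *t->cred;
	new->usage = 1;
	return new;
}

/* Drop a reference; the last one frees the structure. */
static void sim_put(const struct sim_cred *c)
{
	struct sim_cred *m = (struct sim_cred *)c;

	if (--m->usage == 0)
		free(m);
}

/* Step 3: swing the task's pointer, then drop its old reference. */
static void sim_commit(struct sim_task *t, struct sim_cred *new)
{
	const struct sim_cred *old = t->cred;

	t->cred = new;
	sim_put(old);
}
```

Step 2, altering the copy, happens between sim_prepare() and sim_commit().
Anyone still holding a reference to the old set continues to see its original
values, which is the point of never modifying a published set in place.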
				304
A task may only alter its _own_ credentials; it is no longer permitted for a
task to alter another's credentials. This means the capset() system call is no
longer permitted to take any PID other than the one of the current process.
Also, the keyctl_instantiate() and keyctl_negate() functions no longer permit
attachment to process-specific keyrings in the requesting process, as the
instantiating process may need to create them.
				311
				312
				313	IMMUTABLE CREDENTIALS
				314	---------------------
				315
				316	Once a set of credentials has been made public (by calling commit_creds() for
				317	example), it must be considered immutable, barring two exceptions:
				318
				319	(1) The reference count may be altered.
				320
				321	(2) Whilst the keyring subscriptions of a set of credentials may not be
				322	changed, the keyrings subscribed to may have their contents altered.
				323
To catch accidental credential alteration at compile time, struct task_struct
has _const_ pointers to its credential sets, as does struct file. Furthermore,
certain functions such as get_cred() and put_cred() operate on const pointers,
thus rendering casts unnecessary in their callers, but internally they have to
temporarily ditch the const qualification to be able to alter the reference
count.
				329
				330
				331	ACCESSING TASK CREDENTIALS
				332	--------------------------
				333
				334	A task being able to alter only its own credentials permits the current process
				335	to read or replace its own credentials without the need for any form of locking
				336	- which simplifies things greatly. It can just call:
				337
				338	const struct cred *current_cred()
				339
				340	to get a pointer to its credentials structure, and it doesn't have to release
				341	it afterwards.
				342
				343	There are convenience wrappers for retrieving specific aspects of a task's
				344	credentials (the value is simply returned in each case):
				345
	uid_t current_uid(void)                 Current's real UID
	gid_t current_gid(void)                 Current's real GID
	uid_t current_euid(void)                Current's effective UID
	gid_t current_egid(void)                Current's effective GID
	uid_t current_fsuid(void)               Current's file access UID
	gid_t current_fsgid(void)               Current's file access GID
	kernel_cap_t current_cap(void)          Current's effective capabilities
	void *current_security(void)            Current's LSM security pointer
	struct user_struct *current_user(void)  Current's user account
				355
				356	There are also convenience wrappers for retrieving specific associated pairs of
				357	a task's credentials:
				358
	void current_uid_gid(uid_t *, gid_t *);
	void current_euid_egid(uid_t *, gid_t *);
	void current_fsuid_fsgid(uid_t *, gid_t *);
				362
				363	which return these pairs of values through their arguments after retrieving
				364	them from the current task's credentials.
				365
				366
				367	In addition, there is a function for obtaining a reference on the current
				368	process's current set of credentials:
				369
				370	const struct cred *get_current_cred(void);
				371
				372	and functions for getting references to one of the credentials that don't
				373	actually live in struct cred:
				374
				375	struct user_struct *get_current_user(void);
				376	struct group_info *get_current_groups(void);
				377
				378	which get references to the current process's user accounting structure and
				379	supplementary groups list respectively.
				380
				381	Once a reference has been obtained, it must be released with put_cred(),
				382	free_uid() or put_group_info() as appropriate.
				383
				384
				385	ACCESSING ANOTHER TASK'S CREDENTIALS
				386	------------------------------------
				387
				388	Whilst a task may access its own credentials without the need for locking, the
				389	same is not true of a task wanting to access another task's credentials. It
				390	must use the RCU read lock and rcu_dereference().
				391
				392	The rcu_dereference() is wrapped by:
				393
	const struct cred *__task_cred(struct task_struct *task);
				395
				396	This should be used inside the RCU read lock, as in the following example:
				397
	void foo(struct task_struct *t, struct foo_data *f)
	{
		const struct cred *tcred;
		...
		rcu_read_lock();
		tcred = __task_cred(t);
		/* Plain values can simply be copied out under the lock. */
		f->uid = tcred->uid;
		f->gid = tcred->gid;
		/* A pointer member needs its own reference to outlive the lock. */
		f->groups = get_group_info(tcred->groups);
		rcu_read_unlock();
		...
	}
				410
				411	A function need not get RCU read lock to use __task_cred() if it is holding a
				412	spinlock at the time as this implicitly holds the RCU read lock.
				413
				414	Should it be necessary to hold another task's credentials for a long period of
				415	time, and possibly to sleep whilst doing so, then the caller should get a
				416	reference on them using:
				417
	const struct cred *get_task_cred(struct task_struct *task);
				419
				420	This does all the RCU magic inside of it. The caller must call put_cred() on
				421	the credentials so obtained when they're finished with.
				422
				423	There are a couple of convenience functions to access bits of another task's
				424	credentials, hiding the RCU magic from the caller:
				425
	uid_t task_uid(task)            Task's real UID
	uid_t task_euid(task)           Task's effective UID
				428
				429	If the caller is holding a spinlock or the RCU read lock at the time anyway,
				430	then:
				431
				432	__task_cred(task)->uid
				433	__task_cred(task)->euid
				434
should be used instead. Similarly, if multiple aspects of a task's credentials
need to be accessed, RCU read lock or a spinlock should be used, __task_cred()
called, the result stored in a temporary pointer and then the credential
aspects read through that pointer before dropping the lock. This prevents the
potentially expensive RCU magic from being invoked multiple times.
				440
				441	Should some other single aspect of another task's credentials need to be
				442	accessed, then this can be used:
				443
				444	task_cred_xxx(task, member)
				445
				446	where 'member' is a non-pointer member of the cred struct. For instance:
				447
				448	uid_t task_cred_xxx(task, suid);
				449
				450	will retrieve 'struct cred::suid' from the task, doing the appropriate RCU
				451	magic. This may not be used for pointer members as what they point to may
				452	disappear the moment the RCU read lock is dropped.
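
To show why such an accessor is limited to non-pointer members, here is a
userspace mock of how a macro of this kind could be put together. Everything
prefixed with sim_ is a stand-in invented for this sketch (the fake read lock
is just a nesting counter), and the macro relies on GNU C statement
expressions, so it assumes gcc or clang.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins (assumptions) so the macro can be exercised. */
struct sim_cred {
	unsigned int uid, euid, suid;
};

struct sim_task {
	const struct sim_cred *cred;
};

static int sim_rcu_depth;	/* nesting depth of the fake read lock */

static void sim_rcu_read_lock(void)
{
	sim_rcu_depth++;
}

static void sim_rcu_read_unlock(void)
{
	assert(sim_rcu_depth > 0);
	sim_rcu_depth--;
}

static const struct sim_cred *sim_task_cred(const struct sim_task *task)
{
	assert(sim_rcu_depth > 0);	/* only valid under the read lock */
	return task->cred;
}

/* Copy one member out by value, then drop the lock again. */
#define sim_task_cred_xxx(task, member)				\
({									\
	__typeof__(((struct sim_cred *)NULL)->member) val_;		\
	sim_rcu_read_lock();						\
	val_ = sim_task_cred(task)->member;				\
	sim_rcu_read_unlock();						\
	val_;								\
})
```

The value is copied while the lock is held and only the copy leaves the macro.
Had the member been a pointer, the copy would still refer to memory that may be
freed once the lock has been dropped.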
				453
				454
				455	ALTERING CREDENTIALS
				456	--------------------
				457
				458	As previously mentioned, a task may only alter its own credentials, and may not
				459	alter those of another task. This means that it doesn't need to use any
				460	locking to alter its own credentials.
				461
				462	To alter the current process's credentials, a function should first prepare a
				463	new set of credentials by calling:
				464
				465	struct cred *prepare_creds(void);
				466
This locks current->cred_replace_mutex and then allocates and constructs a
duplicate of the current process's credentials, returning with the mutex still
held if successful. It returns NULL if not successful (out of memory).
				470
The mutex prevents ptrace() from altering the ptrace state of a process whilst
security checks on credentials construction and changing are taking place, as
the ptrace state may alter the outcome, particularly in the case of execve().
				474
				475	The new credentials set should be altered appropriately, and any security
				476	checks and hooks done. Both the current and the proposed sets of credentials
				477	are available for this purpose as current_cred() will return the current set
				478	still at this point.
				479
				480
				481	When the credential set is ready, it should be committed to the current process
				482	by calling:
				483
				484	int commit_creds(struct cred *new);
				485
				486	This will alter various aspects of the credentials and the process, giving the
				487	LSM a chance to do likewise, then it will use rcu_assign_pointer() to actually
				488	commit the new credentials to current->cred, it will release
				489	current->cred_replace_mutex to allow ptrace() to take place, and it will notify
				490	the scheduler and others of the changes.
				491
				492	This function is guaranteed to return 0, so that it can be tail-called at the
				493	end of such functions as sys_setresuid().
				494
				495	Note that this function consumes the caller's reference to the new credentials.
				496	The caller should _not_ call put_cred() on the new credentials afterwards.
				497
				498	Furthermore, once this function has been called on a new set of credentials,
				499	those credentials may _not_ be changed further.
				500
				501
				502	Should the security checks fail or some other error occur after prepare_creds()
				503	has been called, then the following function should be invoked:
				504
				505	void abort_creds(struct cred *new);
				506
				507	This releases the lock on current->cred_replace_mutex that prepare_creds() got
				508	and then releases the new credentials.
				509
				510
				511	A typical credentials alteration function would look something like this:
				512
	int alter_suid(uid_t suid)
	{
		struct cred *new;
		int ret;

		new = prepare_creds();
		if (!new)
			return -ENOMEM;

		new->suid = suid;
		ret = security_alter_suid(new);
		if (ret < 0) {
			abort_creds(new);
			return ret;
		}

		return commit_creds(new);
	}
				531
				532
				533	MANAGING CREDENTIALS
				534	--------------------
				535
				536	There are some functions to help manage credentials:
				537
(*) void put_cred(const struct cred *cred);
				539
				540	This releases a reference to the given set of credentials. If the
				541	reference count reaches zero, the credentials will be scheduled for
				542	destruction by the RCU system.
				543
(*) const struct cred *get_cred(const struct cred *cred);
				545
				546	This gets a reference on a live set of credentials, returning a pointer to
				547	that set of credentials.
				548
(*) struct cred *get_new_cred(struct cred *cred);
				550
				551	This gets a reference on a set of credentials that is under construction
				552	and is thus still mutable, returning a pointer to that set of credentials.
				553
				554
				555	=====================
				556	OPEN FILE CREDENTIALS
				557	=====================
				558
				559	When a new file is opened, a reference is obtained on the opening task's
				560	credentials and this is attached to the file struct as 'f_cred' in place of
				561	'f_uid' and 'f_gid'. Code that used to access file->f_uid and file->f_gid
				562	should now access file->f_cred->fsuid and file->f_cred->fsgid.
				563
				564	It is safe to access f_cred without the use of RCU or locking because the
				565	pointer will not change over the lifetime of the file struct, and nor will the
				566	contents of the cred struct pointed to, barring the exceptions listed above
				567	(see the Task Credentials section).
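
The effect of pinning the opener's credentials can be modelled in a few lines
of userspace C. This is a sketch only: struct sim_cred, struct sim_file and
the sim_*() helpers are hypothetical, and a plain integer stands in for the
atomic reference count.

```c
#include <assert.h>

struct sim_cred {
	int usage;			/* reference count */
	unsigned int fsuid, fsgid;
};

struct sim_file {
	const struct sim_cred *f_cred;	/* fixed for the file's lifetime */
};

/* Take a reference through a const pointer; only the count is touched. */
static const struct sim_cred *sim_get_cred(const struct sim_cred *cred)
{
	assert(cred);
	((struct sim_cred *)cred)->usage++;
	return cred;
}

/* Record the opener's credentials in the file at open time. */
static void sim_open(struct sim_file *file, const struct sim_cred *opener)
{
	assert(file);
	file->f_cred = sim_get_cred(opener);
}
```

If the task later commits a new set of credentials, file->f_cred still points
at the set that was current when the file was opened, so operations on the
file keep seeing the opener's fsuid and fsgid.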
				568
				569
				570	=======================================
				571	OVERRIDING THE VFS'S USE OF CREDENTIALS
				572	=======================================
				573
Under some circumstances it is desirable to override the credentials used by
the VFS, and that can be done by calling into functions such as vfs_mkdir()
with a different set of credentials. This is done in the following places:
				577
				578	(*) sys_faccessat().
				579
				580	(*) do_coredump().
				581
				582	(*) nfs4recover.c.