Blame - Documentation/development-process/4.Coding - android_kernel_htc_msm8960

blob: 2278693c8ffa4dc56baf265514efc798fbafd2ea

Jonathan Corbet	75b0214	2008-09-30 15:15:56 -0600	[diff] [blame]	1	4: GETTING THE CODE RIGHT
				2
				3	While there is much to be said for a solid and community-oriented design
				4	process, the proof of any kernel development project is in the resulting
				5	code. It is the code which will be examined by other developers and merged
				6	(or not) into the mainline tree. So it is the quality of this code which
				7	will determine the ultimate success of the project.
				8
				9	This section will examine the coding process. We'll start with a look at a
				10	number of ways in which kernel developers can go wrong. Then the focus
				11	will shift toward doing things right and the tools which can help in that
				12	quest.
				13
				14
				15	4.1: PITFALLS
				16
				17	* Coding style
				18
				19	The kernel has long had a standard coding style, described in
				20	Documentation/CodingStyle. For much of that time, the policies described
				21	in that file were taken as being, at most, advisory. As a result, there is
				22	a substantial amount of code in the kernel which does not meet the coding
				23	style guidelines. The presence of that code leads to two independent
				24	hazards for kernel developers.
				25
				26	The first of these is to believe that the kernel coding standards do not
				27	matter and are not enforced. The truth of the matter is that adding new
				28	code to the kernel is very difficult if that code does not conform to
				29	the standard; many developers will request that the code be reformatted
				30	before they will even review it. A code base as large as the kernel
				31	requires some uniformity of code to make it possible for developers to
				32	quickly understand any part of it. So there is no longer room for
				33	strangely-formatted code.
				34
				35	Occasionally, the kernel's coding style will run into conflict with an
				36	employer's mandated style. In such cases, the kernel's style will have to
				37	win before the code can be merged. Putting code into the kernel means
				38	giving up a degree of control in a number of ways - including control over
				39	how the code is formatted.
				40
				41	The other trap is to assume that code which is already in the kernel is
				42	urgently in need of coding style fixes. Developers may start to generate
				43	reformatting patches as a way of gaining familiarity with the process, or
				44	as a way of getting their name into the kernel changelogs - or both. But
				45	pure coding style fixes are seen as noise by the development community;
				46	they tend to get a chilly reception. So this type of patch is best
				47	avoided. It is natural to fix the style of a piece of code while working
				48	on it for other reasons, but coding style changes should not be made for
				49	their own sake.
				50
				51	The coding style document also should not be read as an absolute law which
				52	can never be transgressed. If there is a good reason to go against the
				53	style (a line which becomes far less readable if split to fit within the
				54	80-column limit, for example), just do it.
				55
				56
				57	* Abstraction layers
				58
				59	Computer Science professors teach students to make extensive use of
				60	abstraction layers in the name of flexibility and information hiding.
				61	Certainly the kernel makes extensive use of abstraction; no project
				62	involving several million lines of code could do otherwise and survive.
				63	But experience has shown that excessive or premature abstraction can be
				64	just as harmful as premature optimization. Abstraction should be used to
				65	the level required and no further.
				66
				67	At a simple level, consider a function which has an argument which is
				68	always passed as zero by all callers. One could retain that argument just
				69	in case somebody eventually needs to use the extra flexibility that it
				70	provides. By that time, though, chances are good that the code which
				71	implements this extra argument has been broken in some subtle way which was
				72	never noticed - because it has never been used. Or, when the need for
				73	extra flexibility arises, it does not do so in a way which matches the
				74	programmer's early expectation. Kernel developers will routinely submit
				75	patches to remove unused arguments; they should, in general, not be added
				76	in the first place.
				77
				78	Abstraction layers which hide access to hardware - often to allow the bulk
				79	of a driver to be used with multiple operating systems - are especially
				80	frowned upon. Such layers obscure the code and may impose a performance
				81	penalty; they do not belong in the Linux kernel.
				82
				83	On the other hand, if you find yourself copying significant amounts of code
				84	from another kernel subsystem, it is time to ask whether it would, in fact,
				85	make sense to pull out some of that code into a separate library or to
				86	implement that functionality at a higher level. There is no value in
				87	replicating the same code throughout the kernel.
				88
				89
				90	* #ifdef and preprocessor use in general
				91
				92	The C preprocessor seems to present a powerful temptation to some C
				93	programmers, who see it as a way to efficiently encode a great deal of
				94	flexibility into a source file. But the preprocessor is not C, and heavy
				95	use of it results in code which is much harder for others to read and
				96	harder for the compiler to check for correctness. Heavy preprocessor use
				97	is almost always a sign of code which needs some cleanup work.
				98
				99	Conditional compilation with #ifdef is, indeed, a powerful feature, and it
				100	is used within the kernel. But there is little desire to see code which is
				101	sprinkled liberally with #ifdef blocks. As a general rule, #ifdef use
				102	should be confined to header files whenever possible.
				103	Conditionally-compiled code can be confined to functions which, if the code
				104	is not to be present, simply become empty. The compiler will then quietly
				105	optimize out the call to the empty function. The result is far cleaner
				106	code which is easier to follow.
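
As a rough illustration of this pattern, the sketch below keeps its single
#ifdef in a header-style section and supplies an empty stub when the
feature is configured out. CONFIG_FOO_DEBUG and every foo_* name are
hypothetical, chosen only for this example.

```c
#include <stddef.h>

/*
 * Header-style section: the only #ifdef lives here.
 * CONFIG_FOO_DEBUG and all foo_* names are hypothetical.
 */
#ifdef CONFIG_FOO_DEBUG
static unsigned long foo_debug_events;

static inline void foo_debug_record(size_t len)
{
	foo_debug_events += len;
}
#else
/* Feature disabled: an empty stub which the compiler can discard. */
static inline void foo_debug_record(size_t len)
{
	(void)len;
}
#endif

/* Caller code: no #ifdef needed, whichever way the option is set. */
static size_t foo_process(const char *buf, size_t len)
{
	size_t i, nonzero = 0;

	for (i = 0; i < len; i++)
		if (buf[i])
			nonzero++;

	foo_debug_record(len);
	return nonzero;
}
```

The body of foo_process() reads the same in both configurations, which is
the point of the exercise.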
				107
				108	C preprocessor macros present a number of hazards, including possible
				109	multiple evaluation of expressions with side effects and no type safety.
				110	If you are tempted to define a macro, consider creating an inline function
				111	instead. The code which results will be the same, but inline functions are
				112	easier to read, do not evaluate their arguments multiple times, and allow
				113	the compiler to perform type checking on the arguments and return value.
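
The multiple-evaluation hazard can be seen in a small sketch. All of the
names below (SQUARE_MACRO, square_inline(), next_value(), and the two
calls_via_* helpers) are hypothetical and exist only to make the effect
countable.

```c
/* Hypothetical macro: "x" appears twice, so it is evaluated twice. */
#define SQUARE_MACRO(x)	((x) * (x))

/* Inline equivalent: the argument is evaluated once and type-checked. */
static inline int square_inline(int x)
{
	return x * x;
}

static int calls;

/* A function with a side effect: it counts its own invocations. */
static int next_value(void)
{
	return ++calls;
}

/* Returns how many times next_value() ran when passed to the macro. */
static int calls_via_macro(void)
{
	calls = 0;
	(void)SQUARE_MACRO(next_value());
	return calls;
}

/* Returns how many times next_value() ran when passed to the inline. */
static int calls_via_inline(void)
{
	calls = 0;
	(void)square_inline(next_value());
	return calls;
}
```

The macro version runs the side effect twice; the inline version runs it
once.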
				114
				115
				116	* Inline functions
				117
				118	Inline functions present a hazard of their own, though. Programmers can
				119	become enamored of the perceived efficiency inherent in avoiding a function
				120	call and fill a source file with inline functions. Those functions,
				121	however, can actually reduce performance. Since their code is replicated
				122	at each call site, they end up bloating the size of the compiled kernel.
				123	That, in turn, creates pressure on the processor's memory caches, which can
				124	slow execution dramatically. Inline functions, as a rule, should be quite
				125	small and relatively rare. The cost of a function call, after all, is not
				126	that high; the creation of large numbers of inline functions is a classic
				127	example of premature optimization.
				128
				129	In general, kernel programmers ignore cache effects at their peril. The
				130	classic time/space tradeoff taught in beginning data structures classes
				131	often does not apply to contemporary hardware. Space is time, in that a
				132	larger program will run slower than one which is more compact.
				133
				134
				135	* Locking
				136
				137	In May, 2006, the "Devicescape" networking stack was, with great
				138	fanfare, released under the GPL and made available for inclusion in the
				139	mainline kernel. This donation was welcome news; support for wireless
				140	networking in Linux was considered substandard at best, and the Devicescape
				141	stack offered the promise of fixing that situation. Yet, this code did not
				142	actually make it into the mainline until June, 2007 (2.6.22). What
				143	happened?
				144
				145	This code showed a number of signs of having been developed behind
				146	corporate doors. But one large problem in particular was that it was not
				147	designed to work on multiprocessor systems. Before this networking stack
				148	(now called mac80211) could be merged, a locking scheme needed to be
				149	retrofitted onto it.
				150
				151	Once upon a time, Linux kernel code could be developed without thinking
				152	about the concurrency issues presented by multiprocessor systems. Now,
				153	however, this document is being written on a dual-core laptop. Even on
				154	single-processor systems, work being done to improve responsiveness will
				155	raise the level of concurrency within the kernel. The days when kernel
				156	code could be written without thinking about locking are long past.
				157
				158	Any resource (data structures, hardware registers, etc.) which could be
				159	accessed concurrently by more than one thread must be protected by a lock.
				160	New code should be written with this requirement in mind; retrofitting
				161	locking after the fact is a rather more difficult task. Kernel developers
				162	should take the time to understand the available locking primitives well
				163	enough to pick the right tool for the job. Code which shows a lack of
				164	attention to concurrency will have a difficult path into the mainline.
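
A minimal sketch of the basic rule follows. It is written for user space,
with a pthread mutex standing in for whichever kernel primitive would be
appropriate; the hit_counter structure and its functions are hypothetical.

```c
#include <pthread.h>

/*
 * Hypothetical shared object.  Locking rule: "lock" protects "count";
 * "count" must not be touched without holding it.
 */
struct hit_counter {
	pthread_mutex_t lock;
	unsigned long count;
};

static struct hit_counter hits = {
	.lock = PTHREAD_MUTEX_INITIALIZER,
	.count = 0,
};

/* Safe to call from any number of threads concurrently. */
static void hit_counter_add(struct hit_counter *hc, unsigned long n)
{
	pthread_mutex_lock(&hc->lock);
	hc->count += n;		/* read-modify-write done under the lock */
	pthread_mutex_unlock(&hc->lock);
}

static unsigned long hit_counter_read(struct hit_counter *hc)
{
	unsigned long val;

	pthread_mutex_lock(&hc->lock);
	val = hc->count;
	pthread_mutex_unlock(&hc->lock);
	return val;
}
```

Designing the structure with its lock from the start, as here, is far
easier than adding one after callers have multiplied.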
				165
				166
				167	* Regressions
				168
				169	One final hazard worth mentioning is this: it can be tempting to make a
				170	change (which may bring big improvements) which causes something to break
				171	for existing users. This kind of change is called a "regression," and
				172	regressions have become most unwelcome in the mainline kernel. With few
				173	exceptions, changes which cause regressions will be backed out if the
				174	regression cannot be fixed in a timely manner. Far better to avoid the
				175	regression in the first place.
				176
				177	It is often argued that a regression can be justified if it causes things
				178	to work for more people than it creates problems for. Why not make a
				179	change if it brings new functionality to ten systems for each one it
				180	breaks? The best answer to this question was expressed by Linus in July,
				181	2007:
				182
				183	So we don't fix bugs by introducing new problems. That way lies
				184	madness, and nobody ever knows if you actually make any real
				185	progress at all. Is it two steps forwards, one step back, or one
				186	step forward and two steps back?
				187
				188	(http://lwn.net/Articles/243460/).
				189
				190	An especially unwelcome type of regression is any sort of change to the
				191	user-space ABI. Once an interface has been exported to user space, it must
				192	be supported indefinitely. This fact makes the creation of user-space
				193	interfaces particularly challenging: since they cannot be changed in
				194	incompatible ways, they must be done right the first time. For this
				195	reason, a great deal of thought, clear documentation, and wide review for
				196	user-space interfaces are always required.
				197
				198
				199
				200	4.2: CODE CHECKING TOOLS
				201
				202	For now, at least, the writing of error-free code remains an ideal that few
				203	of us can reach. What we can hope to do, though, is to catch and fix as
				204	many of those errors as possible before our code goes into the mainline
				205	kernel. To that end, the kernel developers have put together an impressive
				206	array of tools which can catch a wide variety of obscure problems in an
				207	automated way. Any problem caught by the computer is a problem which will
				208	not afflict a user later on, so it stands to reason that the automated
				209	tools should be used whenever possible.
				210
				211	The first step is simply to heed the warnings produced by the compiler.
				212	Contemporary versions of gcc can detect (and warn about) a large number of
				213	potential errors. Quite often, these warnings point to real problems.
				214	Code submitted for review should, as a rule, not produce any compiler
				215	warnings. When silencing warnings, take care to understand the real cause
				216	and try to avoid "fixes" which make the warning go away without addressing
				217	its cause.
				218
				219	Note that not all compiler warnings are enabled by default. Build the
				220	kernel with "make EXTRA_CFLAGS=-W" to get the full set.
				221
				222	The kernel provides several configuration options which turn on debugging
				223	features; most of these are found in the "kernel hacking" submenu. Several
				224	of these options should be turned on for any kernel used for development or
				225	testing purposes. In particular, you should turn on:
				226
				227	- ENABLE_WARN_DEPRECATED, ENABLE_MUST_CHECK, and FRAME_WARN to get an
				228	extra set of warnings for problems like the use of deprecated interfaces
				229	or ignoring an important return value from a function. The output
				230	generated by these warnings can be verbose, but one need not worry about
				231	warnings from other parts of the kernel.
				232
				233	- DEBUG_OBJECTS will add code to track the lifetime of various objects
				234	created by the kernel and warn when things are done out of order. If
				235	you are adding a subsystem which creates (and exports) complex objects
				236	of its own, consider adding support for the object debugging
				237	infrastructure.
				238
				239	- DEBUG_SLAB can find a variety of memory allocation and use errors; it
				240	should be used on most development kernels.
				241
				242	- DEBUG_SPINLOCK, DEBUG_SPINLOCK_SLEEP, and DEBUG_MUTEXES will find a
				243	number of common locking errors.
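
To give a feel for what ENABLE_MUST_CHECK turns on, here is a user-space
sketch of the underlying GCC attribute. The example_must_check macro is a
stand-in for the kernel's __must_check annotation, and reserve_slot() and
reserve_two() are hypothetical.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Userspace stand-in for the kernel's __must_check annotation, which
 * maps to this GCC attribute when ENABLE_MUST_CHECK is set.
 */
#define example_must_check __attribute__((warn_unused_result))

/* Hypothetical function whose failure must not be ignored. */
static int example_must_check reserve_slot(int *table, size_t len, int id)
{
	size_t i;

	for (i = 0; i < len; i++) {
		if (table[i] == 0) {
			table[i] = id;
			return 0;
		}
	}
	return -ENOSPC;
}

static int reserve_two(int *table, size_t len)
{
	int ret;

	/* Writing "reserve_slot(table, len, 1);" here would draw a warning. */
	ret = reserve_slot(table, len, 1);
	if (ret)
		return ret;
	return reserve_slot(table, len, 2);
}
```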
				244
				245	There are quite a few other debugging options, some of which will be
				246	discussed below. Some of them have a significant performance impact and
				247	should not be used all of the time. But some time spent learning the
				248	available options will likely be paid back many times over in short order.
				249
				250	One of the heavier debugging tools is the locking checker, or "lockdep."
				251	This tool will track the acquisition and release of every lock (spinlock or
				252	mutex) in the system, the order in which locks are acquired relative to
				253	each other, the current interrupt environment, and more. It can then
				254	ensure that locks are always acquired in the same order, that the same
				255	interrupt assumptions apply in all situations, and so on. In other words,
				256	lockdep can find a number of scenarios in which the system could, on rare
				257	occasion, deadlock. Problems of this kind can be painful (for both
				258	developers and users) in a deployed system; lockdep allows them to be found
				259	in an automated manner ahead of time. Code with any sort of non-trivial
				260	locking should be run with lockdep enabled before being submitted for
				261	inclusion.
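
The lock-ordering rule which lockdep checks can be sketched in user space
as follows. This is only an illustration of a consistent ordering, not of
lockdep itself; the account structure and account_transfer() are
hypothetical, and pthread mutexes stand in for kernel locks.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical object with its own lock; "lock" protects "balance". */
struct account {
	pthread_mutex_t lock;
	long balance;
};

/*
 * Move "amount" between two distinct accounts.  Both locks are needed;
 * always taking the lower-addressed one first gives every caller the
 * same ordering, so two concurrent transfers cannot deadlock.
 */
static void account_transfer(struct account *from, struct account *to,
			     long amount)
{
	struct account *first = from, *second = to;

	if ((uintptr_t)first > (uintptr_t)second) {
		first = to;
		second = from;
	}

	pthread_mutex_lock(&first->lock);
	pthread_mutex_lock(&second->lock);
	from->balance -= amount;
	to->balance += amount;
	pthread_mutex_unlock(&second->lock);
	pthread_mutex_unlock(&first->lock);
}
```

Had the function simply locked "from" and then "to", a transfer in each
direction at the same moment could leave each thread holding one lock and
waiting forever for the other.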
				262
				263	As a diligent kernel programmer, you will, beyond doubt, check the return
				264	status of any operation (such as a memory allocation) which can fail. The
				265	fact of the matter, though, is that the resulting failure recovery paths
				266	are, probably, completely untested. Untested code tends to be broken code;
				267	you could be much more confident of your code if all those error-handling
				268	paths had been exercised a few times.
				269
				270	The kernel provides a fault injection framework which can do exactly that,
				271	especially where memory allocations are involved. With fault injection
				272	enabled, a configurable percentage of memory allocations will be made to
				273	fail; these failures can be restricted to a specific range of code.
				274	Running with fault injection enabled allows the programmer to see how the
				275	code responds when things go badly. See
				276	Documentation/fault-injection/fault-injection.txt for more information on
				277	how to use this facility.
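
The idea can be imitated in user space with a few lines of code. The
sketch below is not the kernel framework's interface; test_alloc(),
fail_interval, alloc_calls, and join_strings() are all hypothetical names
invented for this example.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical userspace analogue of allocation fault injection; this
 * is not the kernel framework's interface.  When fail_interval is N > 0,
 * every Nth call to test_alloc() fails.
 */
static unsigned int fail_interval;
static unsigned int alloc_calls;

static void *test_alloc(size_t size)
{
	if (fail_interval && ++alloc_calls % fail_interval == 0)
		return NULL;		/* injected failure */
	return malloc(size);
}

/* Code under test: must clean up properly if either allocation fails. */
static char *join_strings(const char *a, const char *b)
{
	size_t la = strlen(a), lb = strlen(b);
	char *tmp, *out;

	tmp = test_alloc(la + 1);
	if (!tmp)
		return NULL;
	memcpy(tmp, a, la + 1);

	out = test_alloc(la + lb + 1);
	if (!out) {
		free(tmp);		/* the rarely-exercised error path */
		return NULL;
	}
	memcpy(out, tmp, la);
	memcpy(out + la, b, lb + 1);
	free(tmp);
	return out;
}
```

Setting fail_interval to 2 forces the second allocation to fail, so the
cleanup branch actually runs instead of merely being assumed correct.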
				278
				279	Other kinds of errors can be found with the "sparse" static analysis tool.
				280	With sparse, the programmer can be warned about confusion between
				281	user-space and kernel-space addresses, mixture of big-endian and
				282	little-endian quantities, the passing of integer values where a set of bit
				283	flags is expected, and so on. Sparse must be installed separately (it can
Justin P. Mattock	0ea6e61	2010-07-23 20:51:24 -0700	[diff] [blame]	284	be found at https://sparse.wiki.kernel.org/index.php/Main_Page if your
Jonathan Corbet	75b0214	2008-09-30 15:15:56 -0600	[diff] [blame]	285	distributor does not package it); it can then be run on the code by adding
				286	"C=1" to your make command.
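
The endianness checks depend on annotated types. The sketch below imitates
the approach in a self-contained way: the example_* macros and types are
hypothetical stand-ins for the kernel's own annotations, and they expand
to nothing unless sparse (which defines __CHECKER__) is doing the parsing.

```c
#include <stdint.h>

/*
 * Sparse defines __CHECKER__; an ordinary compiler sees empty macros.
 * The example_* names are hypothetical stand-ins for the kernel's
 * __bitwise and __force annotations.
 */
#ifdef __CHECKER__
#define example_bitwise	__attribute__((bitwise))
#define example_force	__attribute__((force))
#else
#define example_bitwise
#define example_force
#endif

/* A 32-bit little-endian quantity, distinct from a plain uint32_t. */
typedef uint32_t example_bitwise example_le32;

/* Assemble a CPU-order value from little-endian bytes, on any host. */
static inline uint32_t example_le32_to_cpu(example_le32 v)
{
	const unsigned char *p = (const unsigned char *)&v;

	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Produce the little-endian representation of a CPU-order value. */
static inline example_le32 example_cpu_to_le32(uint32_t v)
{
	union { unsigned char b[4]; uint32_t raw; } u;

	u.b[0] = v & 0xff;
	u.b[1] = (v >> 8) & 0xff;
	u.b[2] = (v >> 16) & 0xff;
	u.b[3] = (v >> 24) & 0xff;
	return (example_force example_le32)u.raw;
}
```

With types like these, code which mixes an example_le32 into ordinary
integer arithmetic without converting it first is the sort of thing sparse
is meant to flag.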
				287
				288	Other kinds of portability errors are best found by compiling your code for
				289	other architectures. If you do not happen to have an S/390 system or a
				290	Blackfin development board handy, you can still perform the compilation
				291	step. A large set of cross compilers for x86 systems can be found at
				292
				293	http://www.kernel.org/pub/tools/crosstool/
				294
				295	Some time spent installing and using these compilers will help avoid
				296	embarrassment later.
				297
				298
				299	4.3: DOCUMENTATION
				300
				301	Documentation has often been more the exception than the rule with kernel
				302	development. Even so, adequate documentation will help to ease the merging
				303	of new code into the kernel, make life easier for other developers, and
				304	be helpful for your users. In many cases, the addition of
				305	documentation has become essentially mandatory.
				306
				307	The first piece of documentation for any patch is its associated
				308	changelog. Log entries should describe the problem being solved, the form
				309	of the solution, the people who worked on the patch, any relevant
				310	effects on performance, and anything else that might be needed to
				311	understand the patch.
				312
				313	Any code which adds a new user-space interface - including new sysfs or
				314	/proc files - should include documentation of that interface which enables
				315	user-space developers to know what they are working with. See
				316	Documentation/ABI/README for a description of how this documentation should
				317	be formatted and what information needs to be provided.
				318
				319	The file Documentation/kernel-parameters.txt describes all of the kernel's
				320	boot-time parameters. Any patch which adds new parameters should add the
				321	appropriate entries to this file.
				322
				323	Any new configuration options must be accompanied by help text which
				324	clearly explains the options and when the user might want to select them.
				325
				326	Internal API information for many subsystems is documented by way of
				327	specially-formatted comments; these comments can be extracted and formatted
				328	in a number of ways by the "kernel-doc" script. If you are working within
				329	a subsystem which has kerneldoc comments, you should maintain them and add
				330	them, as appropriate, for externally-available functions. Even in areas
				331	which have not been so documented, there is no harm in adding kerneldoc
				332	comments for the future; indeed, this can be a useful activity for
				333	beginning kernel developers. The format of these comments, along with some
				334	information on how to create kerneldoc templates, can be found in the file
				335	Documentation/kernel-doc-nano-HOWTO.txt.
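
For those who have not seen one, a kerneldoc comment looks roughly like
the following. The function itself, example_count_set_bits(), is
hypothetical; the HOWTO file named above remains the reference for the
format.

```c
#include <stddef.h>

/**
 * example_count_set_bits - count the set bits in a byte buffer
 * @buf: buffer to examine; may be NULL only if @len is zero
 * @len: number of bytes in @buf
 *
 * Walks @buf one byte at a time and adds up the bits which are set.
 * The function name and its arguments are hypothetical; only the comment
 * layout is the point here.
 *
 * Returns the total number of set bits.
 */
static size_t example_count_set_bits(const unsigned char *buf, size_t len)
{
	size_t i, bits = 0;
	unsigned char b;

	for (i = 0; i < len; i++)
		for (b = buf[i]; b; b &= b - 1)	/* clear lowest set bit */
			bits++;
	return bits;
}
```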
				336
				337	Anybody who reads through a significant amount of existing kernel code will
				338	note that, often, comments are most notable by their absence. Once again,
				339	the expectations for new code are higher than they were in the past;
				340	merging uncommented code will be harder. That said, there is little desire
				341	for verbosely-commented code. The code should, itself, be readable, with
				342	comments explaining the more subtle aspects.
				343
				344	Certain things should always be commented. Uses of memory barriers should
				345	be accompanied by a line explaining why the barrier is necessary. The
				346	locking rules for data structures generally need to be explained somewhere.
				347	Major data structures need comprehensive documentation in general.
				348	Non-obvious dependencies between separate bits of code should be pointed
				349	out. Anything which might tempt a code janitor to make an incorrect
				350	"cleanup" needs a comment saying why it is done the way it is. And so on.
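
As a sketch of what such comments might look like, consider the user-space
fragment below. C11 atomics stand in for the kernel's barrier primitives,
and the mailbox structure and its functions are hypothetical; what matters
is that each ordering requirement carries an explanation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical single-producer, single-consumer mailbox.
 * Rule: "payload" may be read only after "ready" has been seen as true.
 */
struct mailbox {
	int payload;
	atomic_bool ready;
};

static void mailbox_post(struct mailbox *mb, int value)
{
	mb->payload = value;
	/*
	 * Release: the payload store above must be visible before
	 * "ready" is; pairs with the acquire in mailbox_poll().
	 */
	atomic_store_explicit(&mb->ready, true, memory_order_release);
}

static bool mailbox_poll(struct mailbox *mb, int *value)
{
	/*
	 * Acquire: do not read the payload until "ready" has been
	 * observed; pairs with the release in mailbox_post().
	 */
	if (!atomic_load_explicit(&mb->ready, memory_order_acquire))
		return false;
	*value = mb->payload;
	return true;
}
```

Without the two comments, a later reader could easily conclude that the
ordering constraints are unnecessary and "clean them up."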
				351
				352
				353	4.4: INTERNAL API CHANGES
				354
				355	The binary interface provided by the kernel to user space cannot be broken
				356	except under the most severe circumstances. The kernel's internal
				357	programming interfaces, however, are highly fluid and can be changed when
				358	the need arises. If you find yourself having to work around a kernel API,
				359	or simply not using a specific functionality because it does not meet your
				360	needs, that may be a sign that the API needs to change. As a kernel
				361	developer, you are empowered to make such changes.
				362
				363	There are, of course, some catches. API changes can be made, but they need
				364	to be well justified. So any patch making an internal API change should be
				365	accompanied by a description of what the change is and why it is
				366	necessary. This kind of change should also be broken out into a separate
				367	patch, rather than buried within a larger patch.
				368
				369	The other catch is that a developer who changes an internal API is
				370	generally charged with the task of fixing any code within the kernel tree
				371	which is broken by the change. For a widely-used function, this duty can
				372	lead to literally hundreds or thousands of changes - many of which are
				373	likely to conflict with work being done by other developers. Needless to
				374	say, this can be a large job, so it is best to be sure that the
				375	justification is solid.
				376
				377	When making an incompatible API change, one should, whenever possible,
Jonathan Corbet	d5b5243	2009-01-08 16:32:13 -0700	[diff] [blame]	378	ensure that code which has not been updated is caught by the compiler.
Jonathan Corbet	75b0214	2008-09-30 15:15:56 -0600	[diff] [blame]	379	This will help you to be sure that you have found all in-tree uses of that
				380	interface. It will also alert developers of out-of-tree code that there is
				381	a change that they need to respond to. Supporting out-of-tree code is not
				382	something that kernel developers need to be worried about, but we also do
Jonathan Corbet	d5b5243	2009-01-08 16:32:13 -0700	[diff] [blame]	383	not have to make life harder for out-of-tree developers than it needs to
				384	be.
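
One way to get the compiler's help, sketched here with hypothetical names
(example_dev, example_register(), example_setup()), is to make the change
visible in the function's prototype rather than altering the meaning of
an existing argument.

```c
#include <stddef.h>

struct example_dev {
	const char *name;
	unsigned int flags;
};

/*
 * Old interface (hypothetical), shown for reference only:
 *
 *	int example_register(struct example_dev *dev);
 *
 * The new interface adds a required "flags" argument.  Since the
 * prototype changed, any caller still using the one-argument form
 * fails to compile instead of misbehaving at run time.
 */
static int example_register(struct example_dev *dev, unsigned int flags)
{
	if (!dev || !dev->name)
		return -1;
	dev->flags = flags;
	return 0;
}

/* An updated caller. */
static int example_setup(struct example_dev *dev)
{
	return example_register(dev, 0x1);
}
```

A change which kept the same prototype but gave an argument a new meaning
would compile cleanly everywhere and fail only at run time, which is the
outcome to avoid.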