All about Linux, including: an introduction, booting to the Linux kernel, user space applications, kernel insides and kernel driver development. (Not finished yet; updates are ongoing.)
Kernel
User Space Applications
  ▶ System administration
  ▶ Package management
  ▶ Graphical user interface
  ▶ Networking
  ▶ GNU Software
  ▶ Misc apps
  ▶ Console improvements
Kernel Insides (covered in the future)
  ▶ Intro
  ▶ Kernel sources
  ▶ Compiling kernel
  ▶ Kernel data structure
  ▶ Scheduler
  ▶ MM
  ▶ Virtual file system
  ▶ Networking
  ▶ Processes, scheduling and interrupts
Kernel Driver Development (covered in the future)
  ▶ Intro
  ▶ kernel module
  ▶ useful general-purpose kernel APIs
  ▶ Linux device and driver model
  ▶ kernel framework for device driver
  ▶ character drivers
  ▶ subsystem
Recommended Readings
  ▶ Books
  ▶ Slides
I'm doing a (free) operating system (just a hobby, won't be big and professional like gnu) for 386(486) AT clones. This has been brewing since april, and is starting to get ready. I'd like any feedback on things people like/dislike in minix, as my OS resembles it somewhat (same physical layout of the file-system (due to practical reasons) among other things).

I've currently ported bash(1.08) and gcc(1.40), and things seem to work. This implies that I'll get something practical within a few months, and I'd like to know what features most people would want. Any suggestions are welcome, but I won't promise I'll implement them :-)

Linus (torvalds@kruuna.helsinki.fi)

PS. Yes - it's free of any minix code, and it has a multi-threaded fs. It is NOT portable (uses 386 task switching etc), and it probably never will support anything other than AT-harddisks, as that's all I have :-(.

-- Linus Torvalds

▶ a collection of various artifacts from the period in which Linux first began to take shape
  ▶ http://www.cs.cmu.edu/~awb/linux.history.html
magic power button
▶ motherboard sends a signal to the power supply
▶ power good signal
▶ predefined data in CPU registers after the computer resets (80386 and later CPUs)
    IP           0xfff0
    CS selector  0xf000
    CS base      0xffff0000
Definition (IP, CS)
▶ IP: Instruction Pointer
▶ CS: Code Segment
▶ Reset vector
  ▶ the address at which the CPU expects to find the first instruction to execute after reset
  ▶ it holds a jump instruction, which usually points to the BIOS entry point
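As a rough illustration of how the reset values listed above determine where execution starts: the first instruction is fetched from CS base + IP. This is only a sketch of the arithmetic; the function name is made up for the example.

```python
def first_instruction_address(cs_base: int, ip: int) -> int:
    """Return the linear address CS base + IP, truncated to 32 bits."""
    return (cs_base + ip) & 0xFFFFFFFF


# Reset values quoted on the slide (80386 and later CPUs).
CS_BASE = 0xFFFF0000
IP = 0xFFF0

reset_vector = first_instruction_address(CS_BASE, IP)
print(hex(reset_vector))     # 0xfffffff0
print(2**32 - reset_vector)  # 16
```

The second value shows that the reset vector sits 16 bytes below the 4 GiB boundary, which leaves just enough room for a jump instruction.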
controlling which devices the BIOS attempts to boot from
▶ hard drive: boot sector
  ▶ MBR partition layout
  ▶ bootstrap code: first 446 bytes of the first sector (which is 512 bytes)
  ▶ last two bytes of the first sector are 0x55 and 0xaa: they signal to the BIOS that this device is bootable
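The boot-sector layout above can be checked with a few lines of code. The sketch below works on an in-memory sector rather than a real disk; the helper names are hypothetical and only the sizes and the signature come from the slide.

```python
SECTOR_SIZE = 512                     # size of the first sector in bytes
BOOT_CODE_SIZE = 446                  # bytes reserved for bootstrap code
BOOT_SIGNATURE = bytes([0x55, 0xAA])  # last two bytes of a bootable sector


def is_bootable_sector(sector: bytes) -> bool:
    """True if the sector is 512 bytes long and ends with 0x55 0xaa."""
    return len(sector) == SECTOR_SIZE and sector[-2:] == BOOT_SIGNATURE


def boot_code(sector: bytes) -> bytes:
    """Return the first 446 bytes of the sector."""
    return sector[:BOOT_CODE_SIZE]


# Build a fake sector in memory instead of reading a real device.
fake = bytearray(SECTOR_SIZE)
fake[-2:] = BOOT_SIGNATURE

print(is_bootable_sector(bytes(fake)))         # True
print(is_bootable_sector(bytes(SECTOR_SIZE)))  # False
print(len(boot_code(bytes(fake))))             # 446
```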
; for ELF
extern main

; Multiboot header constants: magic number + checksum + flags should equal 0
MAGIC_NUMBER      equ 0x1BADB002            ; Multiboot magic number
FLAGS             equ 0x0                   ; Multiboot flags
CHECKSUM          equ -(MAGIC_NUMBER + FLAGS)
KERNEL_STACK_SIZE equ 4096                  ; size of stack in bytes

section .text                               ; start of the text (code) section
align 4                                     ; the code must be 4 byte aligned
    dd MAGIC_NUMBER                         ; write the magic number to the machine code,
    dd FLAGS                                ; the flags,
    dd CHECKSUM                             ; and the checksum

loader:                                     ; the loader label
    mov eax, 0xCAFEBABE                     ; place the number 0xCAFEBABE in the register eax
    mov esp, kernel_stack + KERNEL_STACK_SIZE ; point esp to the top of the stack
    push dword 2                            ; second argument of main
    push dword 1                            ; first argument of main
    call main                               ; start_kernel() for the Linux kernel
.loop:
    jmp .loop                               ; loop forever

section .bss
align 4
kernel_stack:
    resb KERNEL_STACK_SIZE
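The rule stated in the loader, that magic number + flags + checksum should equal 0, is 32-bit arithmetic. A minimal sketch of that calculation follows; the value 0x1BADB002 is the Multiboot 1 magic number from the Multiboot specification, and the function name is only illustrative.

```python
MASK32 = 0xFFFFFFFF  # header fields are 32-bit values


def multiboot_checksum(magic: int, flags: int) -> int:
    """Return the 32-bit value that makes magic + flags + checksum wrap to 0."""
    return (-(magic + flags)) & MASK32


MAGIC_NUMBER = 0x1BADB002  # Multiboot 1 magic number (from the specification)
FLAGS = 0x0                # no optional features requested

checksum = multiboot_checksum(MAGIC_NUMBER, FLAGS)
print(hex(checksum))                                # 0xe4524ffe
print((MAGIC_NUMBER + FLAGS + checksum) & MASK32)   # 0
```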
protected mode is,
▶ some preparation for the transition into it,
▶ the heap and console initialization,
▶ memory detection, CPU validation, keyboard initialization
▶ and much, much more.
first added to the x86 architecture in 1982 and was the main mode of Intel processors from the 80286 processor until Intel 64 and long mode came.
▶ real mode
  ▶ limited access to the RAM: 2^20 bytes (1 megabyte), 20-bit address bus
  ▶ memory management is limited
▶ protected mode
  ▶ 4 gigabytes: 32-bit address bus (80386 and later)
  ▶ paging and segmentation
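The sizes above follow directly from the width of the address bus. The sketch below also shows the real-mode segment:offset translation (segment * 16 + offset), which is not on the slide and is added here only as background; both function names are invented for the example.

```python
def address_space_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses for a given address-bus width."""
    return 2 ** address_bits


def real_mode_physical(segment: int, offset: int) -> int:
    """Real-mode translation: shift the segment left by 4 bits, add the offset."""
    return (segment << 4) + offset


print(address_space_bytes(20))                  # 1048576    (1 MiB, real mode)
print(address_space_bytes(32))                  # 4294967296 (4 GiB, protected mode)
print(hex(real_mode_physical(0xF000, 0xFFF0)))  # 0xffff0
```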
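Linux kernel x86 boot executable bzImage, version 4.3.3-2-ARCH (builduser@tobias) #1 SMP PREEMPT Wed Dec 23 20:09, RO-rootFS, swap_dev 0x4, Normal VGA
▶ compile the kernel and then compress it; bzImage stands for "big zImage", and the compression method is a configuration choice (gzip, bzip2, xz, ...), so the name does not refer to bzip
▶ decompression into memory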
boot
▶ daemon process that continues running until the system is shut down
▶ typically assigned process ID 1
▶ init (integrated): systemd
▶ traditional inits (script): SysVinit
▶ service managers: OpenRC
OpenSSH Daemon
   Loaded: loaded (/usr/lib/systemd/system/sshd.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2016-01-19 09:50:56 HKT; 1 day 9h ago
 Main PID: 479 (sshd)
   CGroup: /system.slice/sshd.service
           └─479 /usr/bin/sshd -D

Jan 19 09:50:56 pc89160 systemd[1]: Started OpenSSH Daemon.
Jan 19 09:50:56 pc89160 sshd[479]: Server listening on 0.0.0.0 port 22.
Jan 19 09:50:56 pc89160 sshd[479]: Server listening on :: port 22.
the basic framework for a GUI environment: drawing and moving windows and interacting with a mouse and keyboard
▶ X originated at MIT in 1984
▶ X.Org Foundation implementation: Xorg
a graphical environment
▶ DE contains additional components that may be considered necessary for a complete user experience
  ▶ window manager
  ▶ panel
  ▶ file manager
  ▶ terminal emulator
  ▶ text editor
  ▶ icons
  ▶ utilities
▶ GNOME, KDE, LXDE, Xfce, Cinnamon, Unity
of system resources
▶ simplify DE to a window manager
▶ standalone, complete freedom over the choice of the other applications
  ▶ desktop icons, fonts, toolbars, desktop widgets
▶ types
  ▶ stacking (aka floating)
  ▶ tiling: "tile" the windows so that none are overlapping
  ▶ dynamic
▶ unixporn: http://www.reddit.com/r/unixporn/
software; that is, it respects users’ freedom. The development of GNU made it possible to use a computer without software that would trample your freedom.
▶ GNU is a Unix-like operating system. That means it is a collection of many programs: applications, libraries, developer tools, even games. The development of GNU, started in January 1984, is known as the GNU Project. Many of the programs in GNU are released under the auspices of the GNU Project; those we call GNU packages.
The GNU C Compiler
▶ GDB: The GNU Debugger
▶ Coreutils: a set of basic UNIX-style utilities, such as ls, cat and chmod
▶ Findutils: to search and find files
▶ Fontutils: to convert fonts from one format to another or make new fonts
▶ The Gimp: GNU Image Manipulation Program
▶ Gnome: the GNU desktop environment
▶ Emacs: a very powerful editor
▶ Ghostscript and Ghostview: interpreter and graphical frontend for PostScript files
▶ GNU Photo: software for interaction with digital cameras
▶ Octave: a programming language, primarily intended to perform numerical computations and image processing
▶ GNU SQL: relational database system
▶ Radius: a remote authentication and accounting server
sharing command history among all running shells
▶ editing multi-line commands
▶ spelling correction
▶ themeable prompts
▶ extended file globbing
▶ autosuggestions
▶ autojump
▶ story about the name
system, which also requires libraries and applications to provide features to end users.
▶ The Linux kernel was created as a hobby in 1991 by a Finnish student, Linus Torvalds.
  ▶ Linux quickly started to be used as the kernel for free software operating systems
▶ Linus Torvalds has been able to create a large and dynamic developer and user community around Linux.
▶ Nowadays, more than one thousand people contribute to each kernel release, individuals or companies big and small.
on most architectures.
▶ Scalability. Can run on supercomputers as well as on tiny devices (4 MB of RAM is enough).
▶ Compliance with standards and interoperability.
▶ Exhaustive networking support.
▶ Security. It can’t hide its flaws. Its code is reviewed by many experts.
▶ Stability and reliability.
▶ Modularity. Can include only what a system needs, even at run time.
▶ Easy to program. You can learn from existing code. Many useful resources on the net.
CPU, memory, I/O.
▶ Provide a set of portable, architecture- and hardware-independent APIs to allow user space applications and libraries to use the hardware resources.
▶ Handle concurrent accesses and usage of hardware resources from different applications.
  ▶ Example: a single network interface is used by multiple user space applications through various network connections. The kernel is responsible for "multiplexing" the hardware resource.
user space is the set of system calls
▶ About 300 system calls that provide the main kernel services
  ▶ File and device operations, networking operations, inter-process communication, process management, memory mapping, timers, threads, synchronization primitives, etc.
▶ This interface is stable over time: only new system calls can be added by the kernel developers
▶ This system call interface is wrapped by the C library, and user space applications usually never make a system call directly but rather use the corresponding C library function
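To make the layering concrete, here is a small sketch using Python's os module, whose functions are thin wrappers that end up in the corresponding system calls (pipe, write, read, close, getpid) on a typical Linux system. The helper name is invented for the example.

```python
import os


def roundtrip_through_pipe(message: bytes) -> bytes:
    """Send a short message through a kernel pipe and read it back."""
    read_fd, write_fd = os.pipe()              # wrapper around pipe(2)
    try:
        os.write(write_fd, message)            # wrapper around write(2)
        return os.read(read_fd, len(message))  # wrapper around read(2)
    finally:
        os.close(read_fd)                      # wrapper around close(2)
        os.close(write_fd)


print(roundtrip_through_pipe(b"hello kernel"))  # b'hello kernel'
print(os.getpid() > 0)                          # True
```

The application never issues a system call instruction itself; it only calls library functions, which is the usual situation described above.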
in user space through pseudo filesystems, sometimes also called virtual filesystems
▶ Pseudo filesystems allow applications to see directories and files that do not exist on any real storage: they are created and updated on the fly by the kernel
▶ The two most important pseudo filesystems are
  ▶ proc, usually mounted on /proc: Operating system related information (processes, memory management parameters…)
  ▶ sysfs, usually mounted on /sys: Representation of the system as a set of devices and buses. Information about these devices.
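Because these files are plain text generated by the kernel, reading them needs nothing more than ordinary file I/O. The sketch below parses text in the style of /proc/meminfo; the sample values are made up, and the parser is only an illustration of the idea.

```python
def parse_meminfo(text: str) -> dict:
    """Parse 'Key:   value kB' lines into a {key: value} dictionary."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue                      # skip lines without a colon
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


# Made-up sample in the format used by /proc/meminfo.
SAMPLE = """MemTotal:       16384000 kB
MemFree:         1024000 kB
"""

print(parse_meminfo(SAMPLE)["MemTotal"])  # 16384000

# On a Linux machine the same parser can read the live file:
# with open("/proc/meminfo") as f:
#     print(parse_meminfo(f.read())["MemTotal"])
```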
the Linux kernel, as released by Linus Torvalds, are available at http://www.kernel.org
  ▶ These versions follow the development model of the kernel
  ▶ However, they may not contain the latest development from a specific area yet. Some features in development might not be ready for mainline inclusion yet.
▶ Many chip vendors supply their own kernel sources
  ▶ Focusing on hardware support first
  ▶ Can diverge substantially from mainline Linux
  ▶ Useful only when mainline hasn’t caught up yet.
▶ Many kernel sub-communities maintain their own kernel, with usually newer but less stable features
  ▶ Architecture communities (ARM, MIPS, PowerPC, etc.), device driver communities (I2C, SPI, USB, PCI, network, etc.), other communities (real-time, etc.)
  ▶ No official releases, only development trees are available
from http://kernel.org/pub/linux/kernel as full tarballs (complete kernel sources) and patches (differences between two kernel versions).
▶ However, more and more people use the git version control system. Absolutely needed for kernel development!
  ▶ Fetch the entire kernel sources and history
      git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
  ▶ Create a branch that starts at a specific stable version
      git checkout -b <name-of-branch> v3.11
  ▶ Web interface available at http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/.
  ▶ Read more about Git at http://git-scm.com/
573 MB (43,000 files, approx 15,800,000 lines)
    gzip compressed tar archive:   105 MB
    bzip2 compressed tar archive:   83 MB (better)
    xz compressed tar archive:      69 MB (best)
▶ Minimum Linux 3.17 compiled kernel size, booting on the ARM Versatile board (hard drive on PCI, ext2 filesystem, ELF executable support, framebuffer console and input devices): 876 KB (compressed), 2.3 MB (raw)
▶ Why are these sources so big? Because they include thousands of device drivers, many network protocols, support many architectures and filesystems…
▶ The Linux core (scheduler, memory management…) is pretty small!
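The archive sizes quoted above can be turned into compression ratios with a little arithmetic. The second half of this sketch runs the same three compressors from the Python standard library on a small, highly repetitive sample; its sizes are not representative of real kernel sources and are shown only to illustrate how one might compare formats.

```python
import bz2
import gzip
import lzma

SOURCE_MB = 573                                     # uncompressed size from the slide
ARCHIVES_MB = {"gzip": 105, "bzip2": 83, "xz": 69}  # archive sizes from the slide


def percent_of_original(compressed_mb: float, original_mb: float = SOURCE_MB) -> float:
    """Compressed size as a percentage of the original size."""
    return round(100 * compressed_mb / original_mb, 1)


for name, size in ARCHIVES_MB.items():
    print(name, percent_of_original(size))
# gzip 18.3
# bzip2 14.5
# xz 12.0


def compressed_sizes(data: bytes) -> dict:
    """Compress the same data with gzip, bzip2 and xz and report the sizes."""
    return {
        "gzip": len(gzip.compress(data)),
        "bzip2": len(bz2.compress(data)),
        "xz": len(lzma.compress(data)),
    }


sample = b"static int example_function(void) { return 0; }\n" * 2000
print(len(sample), compressed_sizes(sample))
```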
systems. (C was created to implement the first Unix systems)
▶ A little Assembly is used too:
  ▶ CPU and machine initialization, exceptions
  ▶ Critical library routines.
▶ No C++ used, see http://www.tux.org/lkml/#s15-3 (Section 15 - Programming Religion)
▶ All the code is compiled with gcc
  ▶ Many gcc-specific extensions are used in the kernel code, so not every ANSI C compiler can compile the kernel
  ▶ A few alternate compilers are supported (Intel and Marvell)
  ▶ See http://gcc.gnu.org/onlinedocs/gcc-4.9.0/gcc/C-Extensions.html
▶ Ongoing work to compile the kernel with the LLVM compiler.
in C/assembly?
2. Why don’t we rewrite it all in assembly language for processor Mega666?
3. Why don’t we rewrite the Linux kernel in C++?
4. Finally, while Linus maintains the development kernel, he is the one who makes the final call. In case there are any doubts on what his opinion is, here is what he said in 2004:

In fact, in Linux we did try C++ once already, back in 1992.

It sucks. Trust me - writing kernel code in C++ is a BLOODY STUPID IDEA.

The fact is, C++ compilers are not trustworthy. They were even worse in 1992, but some fundamental facts haven't changed:

- the whole C++ exception handling thing is fundamentally broken. It's _especially_ broken for kernels.
- any compiler or language that likes to hide things like memory allocations behind your back just isn't a good choice for a kernel.
- you can write object-oriented code (useful for filesystems etc) in C, _without_ the crap that is C++.

In general, I'd say that anybody who designs his kernel modules for C++ is either
(a) looking for problems
(b) a C++ bigot that can't see what he is writing is really just C anyway
(c) was given an assignment in CS class to do so.
Feel free to make up (d).

-- Linus Torvalds
and can’t use user space code.
▶ User space is implemented on top of kernel services, not the opposite.
▶ Kernel code has to supply its own library implementations (string utilities, cryptography, decompression…)
▶ So, you can’t use standard C library functions (printf(), memset(), malloc()…) in kernel code.
▶ Fortunately, the kernel provides similar C functions for your convenience, like printk(), memset(), kmalloc()…
to implement kernel code can undergo changes between two releases.
▶ In-tree drivers are updated by the developer proposing the API change: works great for mainline code.
▶ An out-of-tree driver compiled for a given version may no longer compile or work on a more recent one.
▶ See Documentation/stable_api_nonsense.txt in kernel sources for reasons why.
▶ Of course, the kernel to user space API does not change (system calls, /proc, /sys), as it would break existing programs.
memory locations result in (often fatal) kernel oopses.
▶ Fixed size stack (8 or 4 KB). Unlike in user space, there’s no way to make it grow.
▶ Kernel memory can’t be swapped out.
under the GNU General Public License version 2
▶ This license gives you the right to use, study, modify and share the software freely
▶ However, when the software is redistributed, either modified or unmodified, the GPL requires that you redistribute the software under the same license, with the source code
  ▶ If modifications are made to the Linux kernel (for example to adapt it to your hardware), it is a derivative work of the kernel, and therefore must be released under GPLv2
  ▶ The validity of the GPL on this point has already been verified in courts
  ▶ https://en.wikipedia.org/wiki/GNU_General_Public_License#Legal_status
▶ However, you’re only required to do so
  ▶ At the time the device starts to be distributed
  ▶ To your customers, not to the entire world
distribute a binary kernel that includes statically compiled proprietary drivers
▶ The kernel modules are a gray area: are they derived works of the kernel or not?
  ▶ The general opinion of the kernel community is that proprietary drivers are bad: http://j.mp/fbyuuH
  ▶ From a legal point of view, each driver is probably a different case
  ▶ Is it really useful to keep your drivers secret?
▶ There are some examples of proprietary drivers, like the Nvidia graphics drivers: https://www.youtube.com/watch?v=iYWzMvlj2RQ (Comments from Linus)
  ▶ They use a wrapper between the driver and the kernel
  ▶ Unclear whether it makes it legal or not
(not drivers)
▶ README
  ▶ Overview and building instructions
▶ REPORTING-BUGS
  ▶ Bug report instructions
▶ samples/
  ▶ Sample code (markers, kprobes, kobjects…)
▶ scripts/
  ▶ Scripts for internal or external use
▶ security/
  ▶ Security model implementations (SELinux…)
▶ sound/
  ▶ Sound support code and drivers
▶ tools/
  ▶ Code for various user space tools (mostly C)
[Figure: kernel source tree (v2.4) showing drivers/, include/ (asm/, linux/sched.h), kernel/ (timer.c), fs/, init/, net/, ppc/]
The header supplies the following to C code:
▶ Function declarations (i.e., prototypes)
▶ Exporting global variables using extern
▶ Providing macro definitions using #define
By the way, do you know what extern means?
▶ E.g., the variable jiffies is defined in kernel/timer.c (in v2.4), line 68:
    unsigned long volatile jiffies;
▶ Then, the kernel states its existence in include/linux/sched.h, line 585:
    extern unsigned long volatile jiffies;
▶ By the way: what is "jiffies"?
  ▶ The timer inside the kernel! (a counter of timer ticks since boot)
List of all system calls.
include/linux/sched.h: struct task_struct definition, the structure of a process inside the kernel.
include/asm-i386/current.h: current definition, the process that calls the system call.
include/linux/keyboard.h: Keyboard keymap.
include/linux/list.h: A generic doubly linked list.
include/linux/rbtree.h: A not-so-generic red-black tree.
grep, ag
▶ find
▶ Tool to browse source code (mainly C, but also C++ or Java)
▶ Supports huge projects like the Linux kernel. Typically takes less than 1 min. to index the whole Linux sources.
▶ Allows searching for a symbol, a definition, functions, strings, files, etc.
▶ Integration with editors like vim and emacs.
▶ Generic source indexing tool and code browser
▶ Web server based, very easy and fast to use
▶ declaration, implementation, and usage of symbols
▶ many headers
▶ many source files
▶ They are all compiled into one executable called the kernel image.
▶ Don’t believe me? Let’s find its "main.c"!
  ▶ "start_kernel" is the first function to be called
  ▶ Inside "main.c", you can even locate the lines that kick-start the first process, "/sbin/init".
and build system is based on multiple Makefiles
▶ One only interacts with the main Makefile, present at the top directory of the kernel source tree
▶ Interaction takes place
  ▶ using the make tool, which parses the Makefile
  ▶ through various targets, defining which action should be done (configuration, compilation, installation, etc.). Run make help to see all available targets.
▶ Example
  ▶ cd linux-3.6.x/
  ▶ make <target>
device drivers, filesystem drivers, network protocols and other configurable items
▶ Thousands of options are available, that are used to selectively compile parts of the kernel source code
▶ The kernel configuration is the process of defining the set of options with which you want your kernel to be compiled
▶ The set of options depends
  ▶ On your hardware (for device drivers, etc.)
  ▶ On the capabilities you would like to give to your kernel (network capabilities, filesystems, real-time, etc.)
the .config file at the root of kernel sources
  ▶ Simple text file, key=value style
▶ As options have dependencies, typically never edited by hand, but through graphical or text interfaces:
  ▶ make xconfig, make gconfig (graphical) (xorg, gtk)
  ▶ make menuconfig, make nconfig (text) (ncurses interface)
  ▶ You can switch from one to another, they all load/save the same .config file, and show the same set of options
▶ To modify a kernel in a GNU/Linux distribution: the configuration files are usually released in /boot/, together with kernel images: /boot/config-3.2.0-31-generic
▶ If the running kernel exposes its configuration in /proc/config.gz:
    $ zcat /proc/config.gz > .config
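Since .config is a simple key=value text file, it is easy to inspect from a script. The parser below is a minimal sketch: it also treats the "# CONFIG_X is not set" comment form as a disabled option, and the sample lines are illustrative rather than taken from a real configuration.

```python
NOT_SET_SUFFIX = " is not set"


def parse_kernel_config(text: str) -> dict:
    """Parse .config-style text into {option name: value}."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(NOT_SET_SUFFIX):
            # Disabled options are recorded as a special comment.
            name = line[2:-len(NOT_SET_SUFFIX)]
            options[name] = "n"
        elif line and not line.startswith("#") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value.strip('"')  # string values are quoted
    return options


SAMPLE = """# Illustrative excerpt
CONFIG_SMP=y
CONFIG_EXT4_FS=m
# CONFIG_DEBUG_KERNEL is not set
CONFIG_HZ=250
CONFIG_DEFAULT_HOSTNAME="(none)"
"""

config = parse_kernel_config(SAMPLE)
print(config["CONFIG_EXT4_FS"])       # m
print(config["CONFIG_DEBUG_KERNEL"])  # n
```

Editing the file this way is still discouraged for the reason given above: a script like this knows nothing about dependencies between options.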
file, resulting from the linking of all object files that correspond to features enabled in the configuration
  ▶ This is the file that gets loaded in memory by the bootloader
  ▶ All included features are therefore available as soon as the kernel starts, at a time where no filesystem exists
▶ Some features (device drivers, filesystems, etc.) can however be compiled as modules
  ▶ These are plugins that can be loaded/unloaded dynamically to add/remove features to the kernel
  ▶ Each module is stored as a separate file in the filesystem, and therefore access to a filesystem is mandatory to use modules
  ▶ This is not possible in the early boot procedure of the kernel, because no filesystem is available
types of options
  ▶ bool options, they are either
    ▶ true (to include the feature in the kernel) or
    ▶ false (to exclude the feature from the kernel)
  ▶ tristate options, they are either
    ▶ true (to include the feature in the kernel image) or
    ▶ module (to include the feature as a kernel module) or
    ▶ false (to exclude the feature)
  ▶ int options, to specify integer values
  ▶ hex options, to specify hexadecimal values
  ▶ string options, to specify string values
▶ There are dependencies between kernel options
  ▶ For example, enabling a network driver requires the network stack to be enabled
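One way to picture tristate options and dependencies is to order the three values (n < m < y) and cap an option at the lowest value among the options it depends on. The sketch below is a deliberately simplified model of that idea, not the actual Kconfig implementation; the function and option roles are hypothetical.

```python
from typing import Iterable

# Tristate values ordered from "excluded" to "built into the kernel image".
ORDER = {"n": 0, "m": 1, "y": 2}
BY_LEVEL = {level: value for value, level in ORDER.items()}


def effective_value(requested: str, dependencies: Iterable[str]) -> str:
    """Cap the requested tristate value at the lowest dependency value."""
    limit = min((ORDER[dep] for dep in dependencies), default=ORDER["y"])
    return BY_LEVEL[min(ORDER[requested], limit)]


# A driver requested as built-in, with the stack it depends on disabled.
print(effective_value("y", ["n"]))  # n
# The same driver when its dependency is only built as a module.
print(effective_value("y", ["m"]))  # m
# No dependencies: the requested value is kept.
print(effective_value("y", []))     # y
```

This is why the configuration interfaces hide or grey out a network driver until the network stack itself is enabled.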
source directory
  ▶ Remember to run multiple jobs in parallel if you have multiple CPU cores. Example: make -j 4
▶ No need to run as root!
▶ Generates
  ▶ vmlinux, the raw uncompressed kernel image, in the ELF format, useful for debugging purposes, but cannot be booted
  ▶ arch/<arch>/boot/*Image, the final, usually compressed, kernel image that can be booted
    ▶ bzImage for x86, zImage for ARM, vmImage.gz for Blackfin, etc.
  ▶ arch/<arch>/boot/dts/*.dtb, compiled Device Tree files (on some architectures)
  ▶ All kernel modules, spread over the kernel source tree, as .ko files.
for the host system by default, so needs to be run as root. Generally not used when compiling for an embedded system, as it installs files on the development workstation.
▶ Installs
  ▶ /boot/vmlinuz-<version>
    Compressed kernel image. Same as the one in arch/<arch>/boot
  ▶ /boot/System.map-<version>
    Stores kernel symbol addresses
  ▶ /boot/config-<version>
    Kernel configuration for this version
▶ Typically re-runs the bootloader configuration utility to take the new kernel into account.
for the host system by default, so needs to be run as root
▶ Installs all modules in /lib/modules/<version>/
  ▶ kernel/
    Module .ko (Kernel Object) files, in the same directory structure as in the sources.
  ▶ modules.alias
    Module aliases for module loading utilities. Example line:
    alias sound-service-?-0 snd_mixer_oss
  ▶ modules.dep, modules.dep.bin (binary hashed)
    Module dependencies
  ▶ modules.symbols, modules.symbols.bin (binary hashed)
    Tells which module a given symbol belongs to
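The text files listed above are simple enough to read by hand or from a script. The sketch below resolves an alias using shell-style wildcard matching and parses modules.dep-style lines ("module: dependency dependency ..."). The alias line is the one quoted above; the dependency line and all function names are illustrative and not copied from a real system.

```python
from fnmatch import fnmatchcase


def parse_aliases(text: str) -> list:
    """Return (pattern, module) pairs from 'alias <pattern> <module>' lines."""
    aliases = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == "alias":
            aliases.append((parts[1], parts[2]))
    return aliases


def resolve_alias(name: str, aliases: list):
    """Return the first module whose wildcard pattern matches the name."""
    for pattern, module in aliases:
        if fnmatchcase(name, pattern):
            return module
    return None


def parse_deps(text: str) -> dict:
    """Parse 'module: dep1 dep2' lines into {module: [dependencies]}."""
    deps = {}
    for line in text.splitlines():
        module, sep, rest = line.partition(":")
        if sep:
            deps[module.strip()] = rest.split()
    return deps


ALIASES = parse_aliases("alias sound-service-?-0 snd_mixer_oss\n")
print(resolve_alias("sound-service-1-0", ALIASES))  # snd_mixer_oss
print(resolve_alias("sound-service-1-1", ALIASES))  # None

# Illustrative dependency line in the modules.dep format.
DEPS = parse_deps("kernel/a/driver.ko: kernel/b/core.ko kernel/c/bus.ko\n")
print(DEPS["kernel/a/driver.ko"])  # ['kernel/b/core.ko', 'kernel/c/bus.ko']
```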