Linux System Programming
Second Edition
Robert Love
Linux System Programming, Second Edition
by Robert Love

Copyright © 2013 Robert Love. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://my.safaribooksonline.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Editors: Andy Oram and Maria Gulick
Production Editor: Rachel Steely
Copyeditor: Amanda Kersey
Proofreader: Charles Roumeliotis
Indexer: WordCo Indexing Services, Inc.
Cover Designer: Randy Comer
Interior Designer: David Futato
Illustrator: Rebecca Demarest

May 2013: Second Edition

Revision History for the Second Edition:
2013-05-10: First release

See http://oreilly.com/catalog/errata.csp?isbn=9781449339531 for release details.

Nutshell Handbook, the Nutshell Handbook logo, and the O’Reilly logo are registered trademarks of O’Reilly Media, Inc. Linux System Programming, Second Edition, the image of a man in a flying machine, and related trade dress are trademarks of O’Reilly Media, Inc.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O’Reilly Media, Inc., was aware of a trademark claim, the designations have been printed in caps or initial caps.

While every precaution has been taken in the preparation of this book, the publisher and author assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

ISBN: 978-1-449-33953-1
[LSI]
Table of Contents

Foreword
Preface

1. Introduction and Essential Concepts
    System Programming
    Why Learn System Programming
    Cornerstones of System Programming
    System Calls
    The C Library
    The C Compiler
    APIs and ABIs
    APIs
    ABIs
    Standards
    POSIX and SUS History
    C Language Standards
    Linux and the Standards
    This Book and the Standards
    Concepts of Linux Programming
    Files and the Filesystem
    Processes
    Users and Groups
    Permissions
    Signals
    Interprocess Communication
    Headers
    Error Handling
    Getting Started with System Programming

2. File I/O
    Opening Files
    The open() System Call
    Owners of New Files
    Permissions of New Files
    The creat() Function
    Return Values and Error Codes
    Reading via read()
    Return Values
    Reading All the Bytes
    Nonblocking Reads
    Other Error Values
    Size Limits on read()
    Writing with write()
    Partial Writes
    Append Mode
    Nonblocking Writes
    Other Error Codes
    Size Limits on write()
    Behavior of write()
    Synchronized I/O
    fsync() and fdatasync()
    sync()
    The O_SYNC Flag
    O_DSYNC and O_RSYNC
    Direct I/O
    Closing Files
    Error Values
    Seeking with lseek()
    Seeking Past the End of a File
    Error Values
    Limitations
    Positional Reads and Writes
    Error Values
    Truncating Files
    Multiplexed I/O
    select()
    poll()
    poll() Versus select()
    Kernel Internals
    The Virtual Filesystem
    The Page Cache
    Page Writeback
    Conclusion

3. Buffered I/O
    User-Buffered I/O
    Block Size
    Standard I/O
    File Pointers
    Opening Files
    Modes
    Opening a Stream via File Descriptor
    Closing Streams
    Closing All Streams
    Reading from a Stream
    Reading a Character at a Time
    Reading an Entire Line
    Reading Binary Data
    Writing to a Stream
    Writing a Single Character
    Writing a String of Characters
    Writing Binary Data
    Sample Program Using Buffered I/O
    Seeking a Stream
    Obtaining the Current Stream Position
    Flushing a Stream
    Errors and End-of-File
    Obtaining the Associated File Descriptor
    Controlling the Buffering
    Thread Safety
    Manual File Locking
    Unlocked Stream Operations
    Critiques of Standard I/O
    Conclusion

4. Advanced File I/O
    Scatter/Gather I/O
    readv() and writev()
    Event Poll
    Creating a New Epoll Instance
    Controlling Epoll
    Waiting for Events with Epoll
    Edge- Versus Level-Triggered Events
    Mapping Files into Memory
    mmap()
    munmap()
    Mapping Example
    Advantages of mmap()
    Disadvantages of mmap()
    Resizing a Mapping
    Changing the Protection of a Mapping
    Synchronizing a File with a Mapping
    Giving Advice on a Mapping
    Advice for Normal File I/O
    The posix_fadvise() System Call
    The readahead() System Call
    Advice Is Cheap
    Synchronized, Synchronous, and Asynchronous Operations
    Asynchronous I/O
    I/O Schedulers and I/O Performance
    Disk Addressing
    The Life of an I/O Scheduler
    Helping Out Reads
    Selecting and Configuring Your I/O Scheduler
    Optimizing I/O Performance
    Conclusion

5. Process Management
    Programs, Processes, and Threads
    The Process ID
    Process ID Allocation
    The Process Hierarchy
    pid_t
    Obtaining the Process ID and Parent Process ID
    Running a New Process
    The Exec Family of Calls
    The fork() System Call
    Terminating a Process
    Other Ways to Terminate
    atexit()
    on_exit()
    SIGCHLD
    Waiting for Terminated Child Processes
    Waiting for a Specific Process
    Even More Waiting Versatility
    BSD Wants to Play: wait3() and wait4()
    Launching and Waiting for a New Process
    Zombies
    Users and Groups
    Real, Effective, and Saved User and Group IDs
    Changing the Real or Saved User or Group ID
    Changing the Effective User or Group ID
    Changing the User and Group IDs, BSD Style
    Changing the User and Group IDs, HP-UX Style
    Preferred User/Group ID Manipulations
    Support for Saved User IDs
    Obtaining the User and Group IDs
    Sessions and Process Groups
    Session System Calls
    Process Group System Calls
    Obsolete Process Group Functions
    Daemons
    Conclusion

6. Advanced Process Management
    Process Scheduling
    Timeslices
    I/O- Versus Processor-Bound Processes
    Preemptive Scheduling
    The Completely Fair Scheduler
    Yielding the Processor
    Legitimate Uses
    Process Priorities
    nice()
    getpriority() and setpriority()
    I/O Priorities
    Processor Affinity
    sched_getaffinity() and sched_setaffinity()
    Real-Time Systems
    Hard Versus Soft Real-Time Systems
    Latency, Jitter, and Deadlines
    Linux’s Real-Time Support
    Linux Scheduling Policies and Priorities
    Setting Scheduling Parameters
    sched_rr_get_interval()
    Precautions with Real-Time Processes
    Determinism
    Resource Limits
    The Limits
    Setting and Retrieving Limits

7. Threading
    Binaries, Processes, and Threads
    Multithreading
    Costs of Multithreading
    Alternatives to Multithreading
    Threading Models
    User-Level Threading
    Hybrid Threading
    Coroutines and Fibers
    Threading Patterns
    Thread-per-Connection
    Event-Driven Threading
    Concurrency, Parallelism, and Races
    Race Conditions
    Synchronization
    Mutexes
    Deadlocks
    Pthreads
    Linux Threading Implementations
    The Pthread API
    Linking Pthreads
    Creating Threads
    Thread IDs
    Terminating Threads
    Joining and Detaching Threads
    A Threading Example
    Pthread Mutexes
    Further Study

8. File and Directory Management
    Files and Their Metadata
    The Stat Family
    Permissions
    Ownership
    Extended Attributes
    Extended Attribute Operations
    Directories
    The Current Working Directory
    Creating Directories
    Removing Directories
    Reading a Directory’s Contents
    Links
    Hard Links
    Symbolic Links
    Unlinking
    Copying and Moving Files
    Copying
    Moving
    Device Nodes
    Special Device Nodes
    The Random Number Generator
    Out-of-Band Communication
    Monitoring File Events
    Initializing inotify
    Watches
    inotify Events
    Advanced Watch Options
    Removing an inotify Watch
    Obtaining the Size of the Event Queue
    Destroying an inotify Instance

9. Memory Management
    The Process Address Space
    Pages and Paging
    Memory Regions
    Allocating Dynamic Memory
    Allocating Arrays
    Resizing Allocations
    Freeing Dynamic Memory
    Alignment
    Managing the Data Segment
    Anonymous Memory Mappings
    Creating Anonymous Memory Mappings
    Mapping /dev/zero
    Advanced Memory Allocation
    Fine-Tuning with malloc_usable_size() and malloc_trim()
    Debugging Memory Allocations
    Obtaining Statistics
    Stack-Based Allocations
    Duplicating Strings on the Stack
    Variable-Length Arrays
    Choosing a Memory Allocation Mechanism
    Manipulating Memory
    Setting Bytes
    Comparing Bytes
    Moving Bytes
    Searching Bytes
    Frobnicating Bytes
    Locking Memory
    Locking Part of an Address Space
    Locking All of an Address Space
    Unlocking Memory
    Locking Limits
    Is a Page in Physical Memory?
    Opportunistic Allocation
    Overcommitting and OOM

10. Signals
    Signal Concepts
    Signal Identifiers
    Signals Supported by Linux
    Basic Signal Management
    Waiting for a Signal, Any Signal
    Examples
    Execution and Inheritance
    Mapping Signal Numbers to Strings
    Sending a Signal
    Permissions
    Examples
    Sending a Signal to Yourself
    Sending a Signal to an Entire Process Group
    Reentrancy
    Guaranteed-Reentrant Functions
    Signal Sets
    More Signal Set Functions
    Blocking Signals
    Retrieving Pending Signals
    Waiting for a Set of Signals
    Advanced Signal Management
    The siginfo_t Structure
    The Wonderful World of si_code
    Sending a Signal with a Payload
    Signal Payload Example
    A Flaw in Unix?

11. Time
    Time’s Data Structures
    The Original Representation
    And Now, Microsecond Precision
    Even Better: Nanosecond Precision
    Breaking Down Time
    A Type for Process Time
    POSIX Clocks
    Time Source Resolution
    Getting the Current Time of Day
    A Better Interface
    An Advanced Interface
    Getting the Process Time
    Setting the Current Time of Day
    Setting Time with Precision
    An Advanced Interface for Setting the Time
    Playing with Time
    Tuning the System Clock
    Sleeping and Waiting
    Sleeping with Microsecond Precision
    Sleeping with Nanosecond Resolution
    An Advanced Approach to Sleep
    A Portable Way to Sleep
    Overruns
    Alternatives to Sleeping
    Timers
    Simple Alarms
    Interval Timers
    Advanced Timers

A. GCC Extensions to the C Language
B. Bibliography
Index
Foreword

There is an old line that Linux kernel developers like to throw out when they are feeling grumpy: “User space is just a test load for the kernel.”

By muttering this line, the kernel developers aim to wash their hands of all responsibility for any failure to run user-space code as well as possible. As far as they’re concerned, user-space developers should just go away and fix their own code, as any problems are definitely not the kernel’s fault.

To prove that it usually is not the kernel that is at fault, one leading Linux kernel developer has been giving a “Why User Space Sucks” talk to packed conference rooms for more than three years now, pointing out real examples of horrible user-space code that everyone relies on every day. Other kernel developers have created tools that show how badly user-space programs are abusing the hardware and draining the batteries of unsuspecting laptops.

But while user-space code might be just a “test load” for kernel developers to scoff at, it turns out that all of these kernel developers also depend on that user-space code every day. If it weren’t present, all the kernel would be good for would be to print out alternating ABABAB patterns on the screen.

Right now, Linux is the most flexible and powerful operating system that has ever been created, running everything from the tiniest cell phones and embedded devices to more than 90 percent of the world’s top 500 supercomputers. No other operating system has ever been able to scale so well and meet the challenges of all of these different hardware types and environments.

And along with the kernel, code running in user space on Linux can also operate on all of those platforms, providing the world with real applications and utilities people rely on.

In this book, Robert Love has taken on the unenviable task of teaching the reader about almost every system call on a Linux system. In so doing, he has produced a tome that will allow you to fully understand how the Linux kernel works from a user-space perspective, and also how to harness the power of this system.

The information in this book will show you how to create code that will run on all of the different Linux distributions and hardware types. It will allow you to understand how Linux works and how to take advantage of its flexibility.

In the end, this book teaches you how to write code that doesn’t suck, which is the best thing of all.

Greg Kroah-Hartman
Preface

This book is about system programming on Linux. System programming is the practice of writing system software, which is code that lives at a low level, talking directly to the kernel and core system libraries. Put another way, the topic of the book is Linux system calls and low-level functions such as those defined by the C library.

While many books cover system programming for Unix systems, few tackle the subject with a focus solely on Linux, and fewer still address the very latest Linux releases and advanced Linux-only interfaces. Moreover, this book benefits from a special touch: I have written a lot of code for Linux, both for the kernel and for system software built thereon. In fact, I have implemented some of the system calls and other features covered in this book. Consequently, this book carries a lot of insider knowledge, covering not just how the system interfaces should work, but how they actually work and how you can use them most efficiently. This book, therefore, combines in a single work a tutorial on Linux system programming, a reference manual covering the Linux system calls, and an insider’s guide to writing smarter, faster code. The text is fun and accessible, and regardless of whether you code at the system level on a daily basis, this book will teach you tricks that will enable you to be a better software engineer.

Audience and Assumptions

The following pages assume that the reader is familiar with C programming and the Linux programming environment: not necessarily well-versed in the subjects, but at least acquainted with them. If you are not comfortable with a Unix text editor (Emacs and vim being the most common and highly regarded), start playing with one. You’ll also want to be familiar with the basics of using gcc, gdb, make, and so on. Plenty of other books on tools and practices for Linux programming are out there; Appendix B at the end of this book lists several useful references.
I’ve made few assumptions about the reader’s knowledge of Unix or Linux system programming. This book will start from the ground up, beginning with the basics, and winding its way up to the most advanced interfaces and optimization tricks. Readers of all levels, I hope, will find this work worthwhile and learn something new. In the course of writing the book, I certainly did.

Similarly, I make few assumptions about the persuasion or motivation of the reader. Engineers wishing to program (better) at the system level are obviously targeted, but higher-level programmers looking for a stronger foundation will also find a lot to interest them. Merely curious hackers are also welcome, for this book should satiate that hunger, too. This book aims to cast a net wide enough to satisfy most programmers.

Regardless of your motives, above all else, have fun.

Contents of This Book

This book is broken into 11 chapters and two appendices.

Chapter 1, Introduction and Essential Concepts
    This chapter serves as an introduction, providing an overview of Linux, system programming, the kernel, the C library, and the C compiler. Even advanced users should visit this chapter.

Chapter 2, File I/O
    This chapter introduces files, the most important abstraction in the Unix environment, and file I/O, the basis of the Linux programming model. It covers reading from and writing to files, along with other basic file I/O operations. The chapter culminates with a discussion of how the Linux kernel implements and manages files.

Chapter 3, Buffered I/O
    This chapter discusses an issue with the basic file I/O interfaces, buffer size management, and introduces buffered I/O in general, and standard I/O in particular, as solutions.

Chapter 4, Advanced File I/O
    This chapter completes the I/O troika with a treatment of advanced I/O interfaces, memory mappings, and optimization techniques. The chapter is capped with a discussion of avoiding seeks and the role of the Linux kernel’s I/O scheduler.

Chapter 5, Process Management
    This chapter introduces Unix’s second most important abstraction, the process, and the family of system calls for basic process management, including the venerable fork.

Chapter 6, Advanced Process Management
    This chapter continues the treatment with a discussion of advanced process management, including real-time processes.