Search This Blog
Tuesday, April 29, 2008
Linux reset forgotten root password
Friday, April 25, 2008
'Why Should I Hire You?' and Other Favorite Interview Questions
Thursday, April 17, 2008
klogd: The Kernel Logging Dæmon
The syslog dæmon takes messages sent through the syslog applications programming interface (the openlog, syslog and closelog library calls) and dispatches them to various files, sockets, users, pipes, or the bit-bucket (meaning a “discard”--any log message that doesn't match a selector in the syslog.conf file is discarded). Kernel messages, however, cannot use the syslog API.
The basic reason for this is that the syslog API is provided by the kernel itself. Code that can be called by itself is said to be “reentrant”. In order to be reentrant, a secondary call must have no effect on a prior call. If code makes any use of global or static variables, for example, it cannot be reentrant, since a change to a static variable in the secondary call will change it in the primary call.
Think about it this way: what would happen if a program were in the middle of a syslog call when the kernel needed to call the syslog function to report some kernel event? Unless the syslog call were totally reentrant, the kernel call would clobber the user-space call.
Many parts of the kernel are reentrant, either by being truly reentrant or through the use of selective locking. It would be very bad news indeed if a user-space program could block kernel execution merely by getting stuck in a system call. For this reason (among others), a very clean separation is maintained between the functions that kernel-space code may use and the code that user-space code may use. No kernel API function is dependent on the state of any user-space call (or, to be a bit more precise, those dependencies that exist are predictable and well understood). Thus, the kernel doesn't have to worry about what user-space programs are calling what user-space API functions from moment to moment. The kernel code may simply get on with business.
The kernel does, however, need a way to send messages to report abnormal situations, and, when debugging kernel layer code, to “see how far we got”. Thus, the kernel has its own logging API.
(For the curious, the user-space syslog call is provided by a function in the kernel called sys_syslog. The kernel syslog call is called printk. You can find the source code for both in your own copy of the Linux kernel source, which is more than likely to be in the /usr/src/linux directory on your system. Go ahead and look. Remember, it's your source!)
The kernel is built for simplicity and speed. Thus, the kernel's conception of logging is a bit more basic than that of syslog. Kernel messages are simple text with the convention that a priority of 0 to 7 will be encoded in
Sometimes, as when a protection fault occurs, the kernel logs contain memory addresses. The protection fault report from a Linux kernel isn't much use to anyone in debugging your problem, because kernels are almost certainly locally compiled. Even if you have never recompiled your kernel, the sheer number of distributions and versions of distributions out there makes it impossible for someone to help you with a raw protection fault log.
Fortunately, Linux kernels since 1.3.43 have reported addresses in a standard format. The klogd program recognizes that format and attempts to resolve addresses to symbol names so that one can actually find the object or code an address refers to. Later on, we'll cover how klogd does this.
These aspects of kernel messages (separate API, non-syslog attributes and memory address resolution) are the reason for a separate dæmon for kernel messages. There were (and are) patched versions of syslog that pick up kernel messages, but this practice is no longer popular, and with good reason. A clean separation of user-space and kernel-space features makes sense. If syslogd did some of these things, it would tie syslogd fairly tightly to the kernel version. Of course, klogd is bound fairly tightly, but this is considered much more acceptable, since it “encapsulates” this “dependent” code and then passes it on to a standard logging mechanism (syslogd).
So, what klogd does, basically, is read kernel log messages, transforms them slightly (by resolving kernel memory addresses to symbols) and then calls the user-space syslog API with the kernel facility and the priority as encoded in the kernel message. This is the dæmon's default behavior. Let's take a look at how the default behavior can be modified.
First off, klogd does not have a configuration file as syslogd does. Its behavior can be modified only through command-line switches and signals. We'll cover the switches first, then we'll discuss address resolution. Finally, we'll go over the signals to which klogd responds.
-c Default console logging level. The kernel writes log messages not only to the kernel message buffer, but also to the system console (usually /dev/console). The default level for the kernel is 7, which means that messages of a value lower than 7 (higher priority) are written to the console. Often, you will want to change this once klogd is running, so the console isn't always scrolling through a lot of low-priority messages. The klogd/syslogd combination gives you quite a bit more control over your kernel messages than simply dumping them to a screen. You can specify a number n here (ex. -c 4) where messages of a value lower than but not equal to n will go to the console. Note that klogd doesn't route messages to the console itself. It merely provides this interface to change the kernel's setting of the console logging level. Keep in mind that lower values of n are higher priority messages.
-d Debugging mode. This generates lots of output on stderr. Give it a try if you're curious, although I do not recommend running this way for any length of time.
-f Log messages to file. This switch allows you to bypass the syslogd interface and log kernel messages directly to a file. You lose all of syslogd's ability to separate messages by facility and priority, to route a message to multiple destinations, and to route to pipes, sockets and users. It has obvious value, however, if for some reason you aren't running syslogd! (Ex. -f /var/log/kernel.log)
-i, -I Signal the currently running klogd. We'll go over these two switches (they are distinct!) in the section on memory address resolution.
-n Do not auto-background. There are three ways you might run a dæmon: by command at the console, by startup script, or directly with the System V init model (/etc/inittab). When you run with init, you don't want the process to “fork and die” (which is how a *nix process puts itself in the background; see chapter 2.6 of W. Richard Stevens' excellent book UNIX Network Programming if none of this makes any sense) as you would in the other two cases. Generally, this need not concern you if klogd is already running on your box.
-o One-shot mode. When started with this option, klogd will read all the messages presently in the kernel log buffer, and then it will exit.
-p Paranoia mode. This changes when klogd loads kernel symbol data. We'll cover this in more detail in the section on memory address resolution.
-s Force system call mode. Normally, klogd checks at startup for the existence of the /proc/kmsg file. If it is there, this is opened as the place to read kernel messages. If it is not there, klogd will poll the kernel through a system call for kernel messages. The /proc/kmsg is favored because it has lower overhead, especially when there are no kernel messages (which is a common case). You can override the preference for the /proc/kmsg interface and force klogd to use the system call instead with this switch.
-k Kernel symbol file. See the section on memory address resolution.
-v Print version and exit. This document is based on klogd 1.3-3.
-x Do not resolve addresses. See the section on memory address resolution for more information.
(The following discussion presumes Linux running on an x86 processor. I would imagine other processors are similar, but I have not examined the code for them, so I'm not prepared to state that the following holds true for those processors.)
Let's begin by noting that real protection exceptions resulting in kernel logs are very rare events. Most protection faults occur in user-space code. User-space protection faults result in a program termination and core file dump. You can use the core file and your favorite debugger to post-mortem the application. These events hardly bother the Linux kernel, which merrily goes on handling all the other applications in the system.
The faults we are talking about here are processor exceptions that happen in kernel code. These are so rare that I have seen only five since I started using Linux in 1993. Three of them occurred when I was using the “TAMU” Linux release from Texas A&M University. We're talking pre-0.99 Linux. I think that was to be expected. The next occurred when I had a dying hard drive and my swap partition was the defective area. The fifth and last occurred when I had an overheating CPU in my laptop. Since 1994, I haven't seen one for any reason, excepting a hardware failure.
That said, they do happen. Some never-before-used combination of hardware leads to a combination of kernel code never previously run; or perhaps you are a daring soul and you are running a development kernel. Whatever the reason, sometimes good code goes bad. The good news is Linux is an open-source OS. You can fix the bug. Or if not, you can post a bug report that goes directly to the people who can fix the bug. Try that with Windows!
When a protection fault occurs, Linux dumps out a dump of the processor state, including all the registers and the last several entries of the system stack. The latter is critical for finding the source of the problem. Trouble is, the raw dump consists entirely of memory addresses. Since Linux is an open-source system and since many installations have been custom compiled, the likelihood that these addresses will help anyone at a support desk to figure out the problem is small indeed.
Luckily, if you built your kernel in a normal way, there is a file called System.map installed with your kernel (probably in /boot). This maps code and symbols to physical addresses. The klogd dæmon reads this file. This takes care of all the “compiled-in” kernel code, but since the 2.0.x kernel series, Linux has supported kernel modules, which are dynamically loaded kernel-code modules. These could be at any address, depending on which are loaded at a given moment and in what order.
At program start, or in response to a signal, klogd will query the kernel for a list of modules and their load addresses. Kernel modules may register individual function or identifier addresses with the kernel when they are loaded. The klogd dæmon will use this information to report addresses in a fault dump. It is important to note that module addresses from klogd can be out of date! If modules are loaded or unloaded after klogd is initialized, then these module/address resolutions will be incorrect. Your distribution may take care of this for you by providing scripted utilities to refresh klogd automatically. If it does not, then some of the switches we skipped over earlier come into play to help you keep the memory map up to date.
The -i switch tells klogd to reload the module symbols. The -I tells klogd to reload the System.map file. The -p switch enables “paranoia” mode. What this does is cause klogd to attempt to reload the module symbols whenever it sees “Oops” in the kernel message stream. Protection faults have this string in them. I personally consider this kludgy, and I don't use it. Also, if there has been a protection fault, it is possible that the kernel is about to halt or the memory map may be in a corrupt state. It is available if you want it. The -k option allows you to specify the file that contains the kernel symbol information. See the section on multiple kernels below. The -x switch tells klogd to not read kernel and module symbols and simply to dump the protection fault messages untranslated.
Linux beginners are unlikely to encounter this, but more seasoned users will often have more than one bootable kernel on their system at a time. If the box is a hobbyist or kernel hacker's box, it is likely to have a number of stable and a number of development series kernels on it. I myself always keep three generations of stable kernels on my systems, so that if a bug should show up, I can immediately reboot into the older kernel.
When klogd starts, it identifies the kernel version (all kernels since 1.3.43 put version information in the map files) and then looks at:
/boot/System.map /System.map /usr/src/linux/System.map
It will use the kernel version information to choose the correct one, if possible. When I have a development series kernel on a box, I leave the stable kernel map in /boot and I leave the development kernel map in /usr/src/linux. As for my two “old” stable kernels, I just live with the fact that klogd will not be able to resolve addresses in the event of a fault if I boot into them. Remember, you can use the -k switch on klogd to force it to use a particular map file if you build an archive of them.
Just bear in mind as you read this discussion that these events are so rare that in 15 machine years (three machines running Linux 24x7 for five years), I have seen this happen twice, and both times it was due to failing hardware.
In addition to the command-line switches, klogd will respond to certain signals. You send signals with the kill command.
The signals klogd responds to are:
SIGTSTP/SIGCONT
SIGTSTP suspends and SIGCONT resumes kernel logging. The resuming includes re-initialization, so you can use this to, for example, unmount the /proc file system without killing klogd:
kill -TSTPumount /proc kill -CONT
SIGUSR1/SIGUSR2
See the Memory Address Resolution section for more information.
SIGINT/SIGHUP/SIGKILL/SIGTERM
These signals all gracefully shut down klogd.
The klogd works with syslogd to handle the dispatch of kernel messages. It exists solely because the kernel itself is unable to use the syslog API directly. Klogd provides for resolving raw memory addresses into kernel symbol names.
"this command was executed by me on Nilesh Machine to stop all those messages which was splashed on his screen through syslogd, i just redirected that message to another file in root named krnl.msg" [root@nilesh log]# klogd -c 3 -f /.krnl.msg