Douane has been released as 100% open source for more than a week now. The application works well but is, unfortunately, not yet fully stable.
When a kernel panic occurs, the kernel stops scheduling tasks, which means nothing more is written to the log files.
So when you reboot your machine, you can't expect to find useful information there.
The first solution found on the web is to use kgdb with a second computer, so that when the freeze occurs, the second machine can start gdb on the kernel and let the user look at the issue and understand what's wrong in the code.
I tried to set up this configuration using a virtual machine (to avoid having two computers running), but it's really hard.
It was a very nice tool while developing the kernel module of Douane, but it's useless when a user reports an issue to you.
Kdump + crash
Finally I found a solution that helps me on my local machine to debug or test some code, and that also lets users send me the dump of their crash so that I can look at the issue as if I were on their machine!
This solution starts with kdump. When it is installed and enabled (I will describe how later for Ubuntu), as soon as the kernel crashes, a second kernel boots immediately, the dump is created in the /var/crash/ folder, and you are then switched to this new kernel.
With the dump saved, you're ready to analyze the issue (or to send it to the developer ... :-)).
Crash is then the tool that opens this dump and lets you analyze it in order to understand the issue. Crash needs, as its first argument, the path to the vmlinux file of the kernel.
Then you have to pass the path to the dump file. For example, on my machine after the simulated crash, the path to the dump file is /var/crash/201405051934/dump.201405051934 and the file is about 180MB.
Once crash has opened the dump in the kernel environment, you have a set of commands to analyse different kinds of information.
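The argument order described above can be sketched with two small shell helpers. This is only an illustration: the function names newest_dump and crash_command are hypothetical, and the vmlinux location assumes the /usr/lib/debug/boot/ path used later in this article.

```shell
#!/bin/sh
# Hypothetical helpers; the paths follow the examples used in this article.

# Print the most recent dump file found under a crash directory.
# Directory names are timestamps, so a plain sort puts the newest last.
newest_dump() {
    crash_dir="${1:-/var/crash}"
    ls -1 "$crash_dir"/*/dump.* 2>/dev/null | sort | tail -n 1
}

# Print the crash command line: vmlinux first, dump file second.
crash_command() {
    kernel_release="$1"
    dump_file="$2"
    printf 'crash /usr/lib/debug/boot/vmlinux-%s %s\n' "$kernel_release" "$dump_file"
}
```

For example, crash_command "$(uname -r)" "$(newest_dump)" prints the command to run (with sudo) for the latest dump, provided that dump was taken with the currently running kernel.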
Installation and configuration of Kdump and crash for Ubuntu
As I'm an Ubuntu user, I describe here how I installed it, but you can find plenty of more complete documentation on the web.
Regarding the installation part, first of all you need kdump:
$ sudo apt-get install linux-crashdump
During the installation, you will be asked whether you want it to manage the reboot of the machine. If you answer yes to this question, then when you ask your computer to reboot, it will not do a full reboot (so it skips the whole BIOS step) but will boot a new kernel directly (hot reboot). If you desire to perform a full reboot, then you need to execute
If you answer no to the question, you're going to keep the normal behavior.
Now you need to change the configuration file in order to enable it. Open the file
/etc/default/kdump-tools and update the
# kdump-tools configuration
# ---------------------------------------------------------------------------
# USE_KDUMP - controls kdump will be configured
#     0 - kdump kernel will not be loaded
#     1 - kdump kernel will be loaded and kdump is configured
# KDUMP_SYSCTL - controls when a panic occurs, using the sysctl
#     interface. The contents of this variable should be the
#     "variable=value ..." portion of the 'sysctl -w ' command.
#     If not set, the default value "kernel.panic_on_oops=1" will
#     be used. Disable this feature by setting KDUMP_SYSCTL=" "
#     Example - also panic on oom:
#         KDUMP_SYSCTL="kernel.panic_on_oops=1 vm.panic_on_oom=1"
#
USE_KDUMP=1
#KDUMP_SYSCTL="kernel.panic_on_oops=1"
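A quick way to confirm the setting before rebooting is to look for an uncommented USE_KDUMP=1 line. The helper below is a minimal sketch; the name kdump_enabled is hypothetical and it only inspects the file, it does not ask kdump itself.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) when the given kdump-tools
# configuration file contains an uncommented USE_KDUMP=1 line.
kdump_enabled() {
    config_file="${1:-/etc/default/kdump-tools}"
    grep -Eq '^[[:space:]]*USE_KDUMP=1[[:space:]]*$' "$config_file"
}
```

Typical use would be: kdump_enabled && echo "kdump is enabled in the configuration".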
After a reboot, you can ensure that the configuration is working by executing the following:
$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic root=UUID=6152f535-12cb-4bc1-a0c1-909bec9f66f6 ro quiet splash crashkernel=384M-:128M
As you can see here, the last part is crashkernel=384M-:128M, which means that, on a machine with at least 384M of RAM, 128M of memory is reserved for the second kernel so that it can boot when a crash occurs.
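The same check can be scripted. The sketch below extracts the crashkernel value from a kernel command line; the function name crashkernel_param is hypothetical, and it reads /proc/cmdline only when no argument is given.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the crashkernel= parameter found
# in a kernel command line. Prints nothing when the parameter is absent.
crashkernel_param() {
    cmdline="${1:-$(cat /proc/cmdline)}"
    # Unquoted on purpose: split the command line into words.
    for word in $cmdline; do
        case "$word" in
            crashkernel=*) printf '%s\n' "${word#crashkernel=}" ;;
        esac
    done
}
```

An empty result after the reboot would suggest that the memory reservation was not added to the boot parameters.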
The last installation step is the kernel debug information:
$ sudo tee /etc/apt/sources.list.d/ddebs.list << EOF
deb http://ddebs.ubuntu.com/ $(lsb_release -cs) main restricted universe multiverse
deb http://ddebs.ubuntu.com/ $(lsb_release -cs)-security main restricted universe multiverse
deb http://ddebs.ubuntu.com/ $(lsb_release -cs)-updates main restricted universe multiverse
deb http://ddebs.ubuntu.com/ $(lsb_release -cs)-proposed main restricted universe multiverse
EOF
$ sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys ECDCAD72428D7C01
$ sudo apt-get update
$ sudo apt-get install linux-image-$(uname -r)-dbgsym
Try the configuration
To make sure that the whole installation and configuration works, we are going to force a crash on the machine and use crash to analyse the dump! :-)
Forcing the crash!
$ sudo su -
$ echo 1 > /proc/sys/kernel/sysrq
$ echo c > /proc/sysrq-trigger
After hitting the Enter key, your computer will stop responding and freeze!
During the next minute(s), your computer will create a dump in the swap partition, then reboot, move the dump to the /var/crash folder, and finally boot to your environment.
Analysing the crash
Now we would like to check why our computer has crashed! We are amnesic, and don't remember the crash test ... :-D
In a terminal we start crash:
$ sudo crash /usr/lib/debug/boot/vmlinux-3.13.0-24-generic /var/crash/201405051934/dump.201405051934
[sudo] password for zedtux:

crash 7.0.3
Copyright (C) 2002-2013 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.

GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

      KERNEL: /usr/lib/debug/boot/vmlinux-3.13.0-24-generic
    DUMPFILE: /var/crash/201405051934/dump.201405051934 [PARTIAL DUMP]
        CPUS: 4
        DATE: Mon May 5 19:34:38 2014
      UPTIME: 00:54:46
LOAD AVERAGE: 0.14, 0.07, 0.05
       TASKS: 495
    NODENAME: zUbuntu
     RELEASE: 3.13.0-24-generic
     VERSION: #46-Ubuntu SMP Thu Apr 10 19:11:08 UTC 2014
     MACHINE: x86_64 (2675 Mhz)
      MEMORY: 8 GB
       PANIC: "Oops: 0002 [#1] SMP " (check log for details)
         PID: 7826
     COMMAND: "tee"
        TASK: ffff8800a2ef8000 [THREAD_INFO: ffff8800a2e68000]
         CPU: 2
       STATE: TASK_RUNNING (PANIC)

crash>
Here you already have a lot of interesting information. The panic was Oops: 0002 [#1] SMP, on CPU 2, in the tee command.
The next step is to look at all the available tools that we can use. So let's execute the help command:
crash> help

*              files          mach           repeat         timer
alias          foreach        mod            runq           tree
ascii          fuser          mount          search         union
bt             gdb            net            set            vm
btop           help           p              sig            vtop
dev            ipcs           ps             struct         waitq
dis            irq            pte            swap           whatis
eval           kmem           ptob           sym            wr
exit           list           ptov           sys            q
extend         log            rd             task

crash version: 7.0.3    gdb version: 7.6
For help on any command above, enter "help <command>".
For help on input options, enter "help input".
For help on output options, enter "help output".

crash>
You can find a description of each available command in the Red Hat whitepaper on crash.
The most important command is bt, as described in its documentation.
But here, in order to prove what happened, we only need the log command, which shows the following:
[ 8.207165] cgroup: "memory" requires setting use_hierarchy to 1 on the root.
[ 8.369282] IPv6: ADDRCONF(NETDEV_UP): virbr0: link is not ready
[ 9.344282] r8169 0000:02:00.0 eth0: link up
[ 9.344295] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 9.834872] douane: Kernel module loaded
[ 10.152439] init: plymouth-stop pre-start process (1925) terminated with status 1
[ 3288.251889] SysRq : Trigger a crash <============= HERE IS THE INTERESTING LINE ==========
[ 3288.251905] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 3288.251907] IP: [<ffffffff8144de76>] sysrq_handle_crash+0x16/0x20
[ 3288.251913] PGD b95e0067 PUD 3607d067 PMD 0
[ 3288.251916] Oops: 0002 [#1] SMP
[ 3288.251919] Modules linked in: douane(OF) ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp ip6table_filter ip6_tables ebtable_nat ebtables xt_addrtype xt_conntrack iptable_filter ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables bridge stp llc aufs binfmt_misc snd_hda_codec_hdmi snd_usb_audio snd_usbmidi_lib snd_hda_codec_via snd_seq_midi snd_seq_midi_event snd_rawmidi snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_page_alloc snd_seq coretemp kvm_intel kvm snd_seq_device snd_timer nvidia(POF) i7core_edac psmouse serio_raw edac_core snd lpc_ich drm soundcore mac_hid asus_atk0110 lp parport pata_acpi hid_generic usbhid hid r8169 ahci mii libahci pata_jmicron
[ 3288.251955] CPU: 2 PID: 7826 Comm: tee Tainted: PF O 3.13.0-24-generic #46-Ubuntu
[ 3288.251957] Hardware name: System manufacturer System Product Name/P7P55D LE, BIOS 2003 12/16/2010
[ 3288.251958] task: ffff8800a2ef8000 ti: ffff8800a2e68000 task.ti: ffff8800a2e68000
[ 3288.251960] RIP: 0010:[<ffffffff8144de76>] [<ffffffff8144de76>] sysrq_handle_crash+0x16/0x20
[ 3288.251963] RSP: 0018:ffff8800a2e69e88 EFLAGS: 00010082
[ 3288.251964] RAX: 000000000000000f RBX: ffffffff81c9f6a0 RCX: 0000000000000000
[ 3288.251965] RDX: ffff88021fc4ffe0 RSI: ffff88021fc4e3c8 RDI: 0000000000000063
[ 3288.251966] RBP: ffff8800a2e69e88 R08: 0000000000000096 R09: 0000000000000387
[ 3288.251968] R10: 0000000000000386 R11: 0000000000000003 R12: 0000000000000063
[ 3288.251969] R13: 0000000000000246 R14: 0000000000000004 R15: 0000000000000000
[ 3288.251971] FS: 00007fb0f665b740(0000) GS:ffff88021fc40000(0000) knlGS:0000000000000000
[ 3288.251972] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 3288.251973] CR2: 0000000000000000 CR3: 00000000368fd000 CR4: 00000000000007e0
[ 3288.251974] Stack:
[ 3288.251975] ffff8800a2e69ec0 ffffffff8144e5f2 0000000000000002 00007fff3cea3850
[ 3288.251978] ffff8800a2e69f50 0000000000000002 0000000000000008 ffff8800a2e69ed8
[ 3288.251980] ffffffff8144eaff ffff88021017a900 ffff8800a2e69ef8 ffffffff8121f52d
[ 3288.251983] Call Trace:
[ 3288.251986] [<ffffffff8144e5f2>] __handle_sysrq+0xa2/0x170
[ 3288.251988] [<ffffffff8144eaff>] write_sysrq_trigger+0x2f/0x40
[ 3288.251992] [<ffffffff8121f52d>] proc_reg_write+0x3d/0x80
[ 3288.251996] [<ffffffff811b9534>] vfs_write+0xb4/0x1f0
[ 3288.251998] [<ffffffff811b9f69>] SyS_write+0x49/0xa0
[ 3288.252001] [<ffffffff8172663f>] tracesys+0xe1/0xe6
[ 3288.252002] Code: 65 34 75 e5 4c 89 ef e8 f9 f7 ff ff eb db 0f 1f 80 00 00 00 00 66 66 66 66 90 55 c7 05 94 68 a6 00 01 00 00 00 48 89 e5 0f ae f8 <c6> 04 25 00 00 00 00 01 5d c3 66 66 66 66 90 55 31 c0 c7 05 be
[ 3288.252025] RIP [<ffffffff8144de76>] sysrq_handle_crash+0x16/0x20
[ 3288.252028] RSP <ffff8800a2e69e88>
[ 3288.252029] CR2: 0000000000000000
The interesting line is the following:
[ 3288.251889] SysRq : Trigger a crash
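On a longer log, scrolling to find that line gets tedious. As a rough sketch, assuming the log output has been saved to a text file (the file name crash.log and the function name crash_markers are hypothetical), a small filter can keep only the lines that usually mark the start of a crash:

```shell
#!/bin/sh
# Hypothetical helper: read a kernel log on stdin and print only the lines
# that typically announce a crash (SysRq trigger, BUG and Oops messages).
crash_markers() {
    grep -E 'SysRq : |BUG: |Oops: '
}
```

It would be used as crash_markers < crash.log. The three patterns are simply the ones visible in the log above; a real crash in a module may need others.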
Of course this is a very simple example, and I'm not going deep enough into how to find the cause of a freeze, but I plan to post more articles as I discover interesting things.