Wednesday, January 19, 2011

Linux (Debian unstable) system: some apps have started segfaulting for no apparent reason

I have a Xen domU running Debian Unstable on a Xen 3.4.2 host.

This morning I noticed that various apps have started segfaulting.

In particular, running "aptitude safe-upgrade" causes a segfault in aptitude-curses with the following error:

aptitude[1035]: segfault at 7f1006ed13f8 ip 0000000000544293 sp 00007fff94b37140 error 4 in aptitude-curses[400000+331000]

This segfault is totally reproducible.
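
The kernel line above already carries some information before any debugger is involved. The following is a minimal sketch of how its fields can be decoded, assuming the usual x86 page-fault error code layout (bit 0: page was present, bit 1: write access, bit 2: fault happened in user mode, bit 4: instruction fetch). The function name decode_segfault and the dictionary keys are made up for illustration.

```python
import re

# Layout of the line the Linux kernel logs on x86 when a user process segfaults.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
    r"(?: in (?P<obj>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


def decode_segfault(line):
    """Split a kernel segfault line into its fields (hypothetical helper)."""
    match = SEGFAULT_RE.search(line)
    if match is None:
        raise ValueError("not a kernel segfault line")
    err = int(match.group("err"), 16)  # the kernel prints the error code in hex
    info = {
        "process": match.group("comm"),
        "pid": int(match.group("pid")),
        "fault_address": int(match.group("addr"), 16),
        "instruction_pointer": int(match.group("ip"), 16),
        # x86 page-fault error code bits (assumed layout, see lead-in)
        "page_present": bool(err & 1),
        "access": "write" if err & 2 else "read",
        "mode": "user" if err & 4 else "kernel",
        "instruction_fetch": bool(err & 16),
    }
    if match.group("obj"):
        base = int(match.group("base"), 16)
        info["object"] = match.group("obj")
        # Offset of the faulting instruction from the start of the mapping.
        info["offset_in_object"] = info["instruction_pointer"] - base
    return info


line = ("aptitude[1035]: segfault at 7f1006ed13f8 ip 0000000000544293 "
        "sp 00007fff94b37140 error 4 in aptitude-curses[400000+331000]")
info = decode_segfault(line)
print(info["mode"], info["access"], hex(info["offset_in_object"]))
```

For the line quoted above this reports a user-mode read of a page that was not mapped, with the faulting instruction at offset 0x144293 into the aptitude-curses mapping. That narrows down where the crash happens, but not why the pointer was bad.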

Other apps (such as reportbug, and trying to log into KDE) also cause reproducible segfaults.

I have another Xen domU running Debian Unstable on the same Xen host, running the same kernel (2.6.32.2), but running "aptitude safe-upgrade" doesn't cause a segfault. Both domUs seem equivalent, yet only one of them segfaults. Here are some points to note:

  • Both domUs use exactly the same kernel (64-bit)
  • Both have the same binaries for /usr/bin/aptitude-curses and all the shared libs it depends on (I used md5sum to compare files on both systems, and ldd to see which shared libs aptitude depends on)
  • I did e2fsck -f on the domU root volume that has the problems and there were no reported errors
  • Both domUs have the same amount of RAM and VCPUs allocated to them
  • I know that segfaults can point to hardware failure. However, these segfaults are reproducible and an equivalent domU has no problems, even when I change the order in which the domUs are created (hoping to force each to occupy a different part of physical RAM). That strongly suggests that hardware is not the problem
  • I also wonder whether some files are corrupt, but as I said, aptitude and all the libs it depends on (as reported by ldd) seem OK
  • I have rebooted the problem domU many times, and rebooted the host Xen OS once
  • I have tried booting the segfaulting domU in single-user mode (by setting the default level to 1 in /etc/inittab) and "aptitude safe-upgrade" still segfaults.
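
The ldd step from the list above can be scripted so that the same set of files is checked on both domUs. This is only a sketch: the function names parse_ldd_output and libraries_of are invented here, and the parsing assumes the usual "name => /path (address)" layout of ldd output.

```python
import subprocess


def parse_ldd_output(text):
    """Return (resolved_paths, missing_names) from the text printed by ldd."""
    resolved, missing = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if "=>" in line:
            name, _, target = line.partition("=>")
            target = target.strip()
            if target.startswith("not found"):
                missing.append(name.strip())
            elif target.startswith("/"):
                # Drop the trailing " (0x...)" load address.
                resolved.append(target.split(" (")[0])
            # Anything else (e.g. linux-vdso) has no file on disk: skip it.
        elif line.startswith("/"):
            # The dynamic loader is listed by absolute path without "=>".
            resolved.append(line.split(" (")[0])
    return resolved, missing


def libraries_of(binary):
    """Run ldd on a binary and return the parsed result."""
    output = subprocess.run(
        ["ldd", binary], check=True, capture_output=True, text=True
    ).stdout
    return parse_ldd_output(output)
```

Calling libraries_of("/usr/bin/aptitude-curses") on each domU gives a list of paths that can be fed to md5sum. Note that ldd only reports libraries linked at load time, so anything opened later with dlopen will not appear in the list.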

I don't think this is a Xen problem, but without knowing what's causing this I can't be sure.

I'm totally perplexed by why one virtual machine should keep segfaulting, and another similar VM doesn't.

Any help would be greatly appreciated.

Thanks.

  • Run the segfaulting program in gdb, with debugging symbols for all the relevant libraries installed, and diagnose the cause of the problem from there.

    From womble
  • This could be a failing memory module corrupting the memory of running apps. Run memtest86+ to check that your RAM modules are okay.

    If the RAM is fine, start checking the libraries under /lib, /usr/lib, etc. An easy way is to run md5sum on them and diff the results against a working Linux box: some of the files may really be corrupted.

    From o_O Tync
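
The "md5sum + diff" idea from the last answer can be sketched as follows, assuming both root filesystems can be read from one place (for example with the domU volumes mounted side by side). The function names are invented for illustration, and the comparison only covers regular files present in both trees.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large libraries do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each regular file under root (path relative to root) to its MD5."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks and special files; only real file contents matter.
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            relative = os.path.relpath(full, root)
            try:
                manifest[relative] = md5_of_file(full)
            except OSError:
                manifest[relative] = "unreadable"
    return manifest


def diff_manifests(good, suspect):
    """Return the paths present in both manifests whose checksums differ."""
    common = good.keys() & suspect.keys()
    return sorted(path for path in common if good[path] != suspect[path])
```

Building one manifest for /usr/lib on the working system and one for the same directory on the broken system, then passing both to diff_manifests, lists the files worth a closer look. Differences are expected wherever the two systems have different package versions installed, so a mismatch alone does not prove corruption.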
