I have a Xen domU running Debian Unstable on a Xen 3.4.2 host.
This morning I noticed that various apps have started seg-faulting.
In particular running "aptitude safe-upgrade" causes a segfault in aptitude-curses with the following error:
aptitude[1035]: segfault at 7f1006ed13f8 ip 0000000000544293 sp 00007fff94b37140 error 4 in aptitude-curses[400000+331000]
This segfault is totally reproducible.
Other apps (such as reportbug, and trying to log into KDE) also cause reproducible segfaults.
I have another Xen domU running Debian Unstable on the same Xen host, running the same kernel (2.6.32.2), but running "aptitude safe-upgrade" doesn't cause a segfault. Both domUs seem equivalent, yet only one of them segfaults. Here are some points to note:
- Both domUs use exactly the same kernel (64-bit)
- Both have the same binaries for /usr/bin/aptitude-curses and all the shared libs it depends on (I used md5sum to compare files on both systems, and ldd to see which shared libs aptitude depends on)
- I did e2fsck -f on the domU root volume that has the problems and there were no reported errors
- Both domUs have the same amount of RAM and VCPUs allocated to them
- I know that segfaults can point to hardware failure. However, these segfaults are reproducible and an equivalent domU has no problems, even if I change the order in which the domUs are created (hoping to force each to occupy a different part of physical RAM). That strongly suggests that hardware is not the problem
- I also wonder whether some files are corrupt, but as I said aptitude and all its dependent libs (as reported by ldd) seem ok
- I have rebooted the problem domU many times, and rebooted the host Xen OS once
- I have tried booting the segfaulting domU in single-user mode (by setting the default level to 1 in /etc/inittab) and "aptitude safe-upgrade" still segfaults.
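The binary comparison described in the list above can be scripted so that it is repeatable on both domUs. What follows is only a minimal sketch, assuming Python 3 is installed in each guest; the helper names `parse_ldd_output`, `md5_of` and `checksum_dependencies` are made up for illustration and are not part of any existing tool. It parses `ldd` output, hashes the binary and every library that `ldd` resolved, and returns lines in the same `hash  path` layout that `md5sum` prints.

```python
import hashlib
import re
import subprocess

# Matches ldd lines that carry an absolute path, for example
#   "libz.so.1 => /usr/lib/libz.so.1 (0x00007f...)"
#   "/lib64/ld-linux-x86-64.so.2 (0x00007f...)"
# Lines without a path (the vdso, "not found") do not match.
_LDD_LINE = re.compile(r"^\s*(?:\S+\s+=>\s+)?(/\S+)\s+\(0x[0-9a-fA-F]+\)\s*$")


def parse_ldd_output(text):
    """Return the absolute library paths listed in ldd output, in order."""
    paths = []
    for line in text.splitlines():
        match = _LDD_LINE.match(line)
        if match:
            paths.append(match.group(1))
    return paths


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_dependencies(binary):
    """Hash a binary and the libraries ldd resolves for it."""
    ldd = subprocess.run(
        ["ldd", binary], capture_output=True, text=True, check=True
    )
    targets = [binary] + parse_ldd_output(ldd.stdout)
    return [f"{md5_of(path)}  {path}" for path in targets]
```

Calling `checksum_dependencies("/usr/bin/aptitude-curses")` on each domU and comparing the two results would repeat the check above. Note that `ldd` only reports libraries linked at load time; anything the program opens later (plugins loaded with `dlopen`, cache or data files) is not covered by this comparison.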
I don't think this is a Xen problem, but without knowing what's causing this I can't be sure.
I'm totally perplexed by why one virtual machine should keep segfaulting, and another similar VM doesn't.
Any help would be greatly appreciated.
Thanks.
-
Run the segfaulting program in gdb, with debugging symbols for all the relevant libraries installed, and diagnose the cause of the problem from there.
From womble -
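Before reaching for gdb, the kernel log line quoted in the question already narrows things down a little. The sketch below decodes such a line; it assumes an x86-64 guest (where the `error` value is the page-fault error code, printed in hex) and the function name `decode_segfault` is hypothetical. It extracts the faulting address, the instruction pointer, and the offset of that instruction inside the mapped object.

```python
import re

_SEGFAULT = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<error>[0-9a-f]+)"
    r"(?: in (?P<object>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


def decode_segfault(line):
    """Decode a kernel 'segfault at ...' log line into a dict."""
    match = _SEGFAULT.search(line)
    if match is None:
        raise ValueError("not a segfault log line")

    error = int(match["error"], 16)
    ip = int(match["ip"], 16)
    info = {
        "process": match["comm"],
        "pid": int(match["pid"]),
        "fault_address": int(match["addr"], 16),
        "ip": ip,
        # x86 page-fault error code bits (assumption: x86-64 guest).
        "cause": "protection violation" if error & 1 else "page not present",
        "access": "write" if error & 2 else "read",
        "mode": "user" if error & 4 else "kernel",
        "instruction_fetch": bool(error & 16),
    }
    if match["object"] is not None:
        base = int(match["base"], 16)
        info["object"] = match["object"]
        # Offset of the faulting instruction from the start of the mapping.
        info["offset_in_mapping"] = ip - base
    return info
```

For the line in the question this reports a user-mode read of a page that is not mapped, at an instruction inside `aptitude-curses` itself rather than inside a shared library. That is consistent with the program following a bad pointer, but it does not by itself say where the bad value came from; that is what the gdb session is for.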
It could be a failing memory module that corrupts the memory of running apps. Try memtest86+ to make sure your RAM modules are okay.
If the RAM is fine, you should probably start checking the libs in /lib, /usr/lib, etc. An easy way is to run md5sum and diff against other, working Linux boxes: maybe some of the files really are corrupted?
From o_O Tync
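The md5sum and diff comparison suggested above can also be done on saved checksum lists, which avoids copying whole library trees between machines. This is a minimal sketch under the assumption that each box has written plain `md5sum` output to a text file; `parse_manifest` and `compare_manifests` are illustrative names, and file names that `md5sum` escapes with a leading backslash are not handled.

```python
def parse_manifest(text):
    """Parse md5sum output into a {path: digest} dict."""
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, _, rest = line.partition(" ")
        # md5sum writes "<digest> <mode><path>", where <mode> is a
        # space (text) or "*" (binary); drop that one marker character.
        entries[rest[1:]] = digest.lower()
    return entries


def compare_manifests(good, suspect):
    """Compare two {path: digest} dicts from a good and a suspect box."""
    shared = good.keys() & suspect.keys()
    return {
        "changed": sorted(p for p in shared if good[p] != suspect[p]),
        "missing": sorted(good.keys() - suspect.keys()),
        "extra": sorted(suspect.keys() - good.keys()),
    }
```

A manifest could be produced on each box with something like `find /lib /usr/lib -type f -exec md5sum {} + > manifest.txt`. Files listed under `changed` are the ones worth inspecting first, bearing in mind that two Debian Unstable systems upgraded at different times will legitimately differ in some packages.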