How to open a Java coredump in gdb

After dozens of experiments, I finally arrived at the following sequence of steps with gdb as the most universal and reliable one.

Good case: we have access to the client's libraries.

  1. Check the gdb version. It should be 6.x or later; the later the better.
  2. Create a directory D inside your working directory.
  3. Create a D/gdbrc file containing the *full* path to your directory:
    set solib-absolute-prefix /home/dms/Sept12/12_09_2008_20_00_node4/D 
    

    Note: set substitute-path doesn't work here because gdb applies it to source files only.

  4. Symlink the appropriate java binary as D/java.
  5. Run:
       gdb -x D/gdbrc D/java core
    
  6. Type:
        info shared
    

You will see something like:

(gdb) info shared
From        To          Syms Read   Shared Object Library
                        No          /lib/tls/libpthread.so.0
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
0x00a05bb0  0x00a068c4  Yes /home/dms/Sept12/D/lib/libdl.so.2
...
0x008c9c00  0x009b9800  Yes /home/dms/Sept12/D/lib/tls/libc.so.6
                        No          /lib/libnsl.so.1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  7. Leave gdb and copy or link the missing libraries (the ones marked "No") under D.

In my case:

D/lib/tls/libpthread.so.0
D/lib/libnsl.so.1
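
The step of finding the missing libraries can be sketched as a small filter. This is only an illustration, not part of the original procedure: the function name missing_libs is made up, and it assumes that unreadable libraries appear as lines starting with "No" followed by an absolute path, as in the listing above.

```shell
#!/bin/sh
# Hypothetical helper: read `info shared` output on stdin and print, for every
# library marked "No" in the "Syms Read" column, the location under the prefix
# directory (first argument) where gdb will look for it.
missing_libs() {
    prefix=$1
    # Libraries that were not read have no From/To columns,
    # so "No" is the first field and the path is the second.
    awk -v prefix="$prefix" '$1 == "No" && $2 ~ /^\// { print prefix $2 }'
}

# Sample input copied from the listing above; in practice the input would be
# the output of: gdb -batch -x D/gdbrc --eval "info shared" D/java core
missing_libs D <<'EOF'
From        To          Syms Read   Shared Object Library
                        No          /lib/tls/libpthread.so.0
0x00a05bb0  0x00a068c4  Yes /home/dms/Sept12/D/lib/libdl.so.2
0x008c9c00  0x009b9800  Yes /home/dms/Sept12/D/lib/tls/libc.so.6
                        No          /lib/libnsl.so.1
EOF
```

For the sample input this prints the two locations listed above, D/lib/tls/libpthread.so.0 and D/lib/libnsl.so.1.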

To gather all the libraries necessary to open a coredump on another machine, run:

 gdb -batch --eval "info shared"  D/java core 2> /dev/null |\
 sed -n -e 's/^.*Yes[^\/]*\//\//p' -e 's/^.*No[^\/]*\//\//p'  > filelist

on your own machine and then:

 cat filelist | zip zipme.zip -@

on the client's machine.
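
To see what the sed filter keeps, here is the same expression wrapped in a function and fed with sample lines. The function name extract_lib_paths is illustrative only, and the sample lines are taken from the listings in this note.

```shell
#!/bin/sh
# The same sed filter as in the command above: for every line that contains
# "Yes" or "No" followed by an absolute path, keep only the path.
extract_lib_paths() {
    sed -n -e 's/^.*Yes[^\/]*\//\//p' -e 's/^.*No[^\/]*\//\//p'
}

# The header line has no path and is dropped; the "(*)" marker is skipped.
extract_lib_paths <<'EOF'
From        To          Syms Read   Shared Object Library
                        No          /lib/tls/libpthread.so.0
0x00a05bb0  0x00a068c4  Yes /lib/libdl.so.2
0xb6bf2bd0  0xb71fe250  Yes (*)  /opt/jdk1.6.0_18/jre/lib/i386/server/libjvm.so
EOF
```

Each output line is one absolute library path, which is the form zip -@ expects on its standard input. A path that itself contains "Yes" or "No" could confuse the filter, so it is worth checking filelist by eye before sending it.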

Bad case: we don't have access to the original libraries. We can still restore the JVM part of the stack trace manually.

To do it:

  1. Run some Java app (e.g. Java2Demo.jar) with exactly the same JDK version and the same JVM options on the command line.
  2. Kill it with kill -BUS to get a core.
  3. Open the core with gdb and check whether the upper part of the stack trace matches the customer's one:
#0  0xffffe424 in __kernel_vsyscall ()
#1  0xb76f96e0 in raise () from /lib/libc.so.6
#2  0xb76faf15 in abort () from /lib/libc.so.6
#3  0xb70abbaf in os::abort(bool)
#4  0xb71de555 in VMError::report_and_die()
#5  0xb70b257c in JVM_handle_linux_signal ()
#6  0xb70ae7a4 in signalHandler(int, siginfo*, void*) ()
#7  <signal handler called>

Even without symbols, you should see the same number of entries before <signal handler called> (frames #0 to #6 above), and os::abort should come right after the libc abort.

  • Type info shared and get the addresses where the JVM is loaded:
    0xb6bf2bd0  0xb71fe250  Yes (*)  /opt/jdk1.6.0_18/jre/lib/i386/server/libjvm.so
    
  • Calculate the offset of os::abort: 0xb70abbaf - 0xb6bf2bd0 = 0x4b8fdf
  • Calculate the JVM size: 0xb71fe250 - 0xb6bf2bd0 = 0x60b680
  • Go to the customer's core and calculate the JVM start and end addresses:
    <os::abort> - 0x4b8fdf = N (start), N + 0x60b680 (end)
    

  • Calculate the offset between your JVM and the customer's JVM:
    <os::abort> - 0xb70abbaf (os::abort from my core; or compare the two JVM start addresses)
  • Take an address from the customer's stack trace, check that it is within the JVM range, and apply the offset.
  • Go to your coredump and type:
    info symbol <recalculated_address>
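
The arithmetic in the steps above can be sketched in plain shell. The addresses from my core are the ones shown in the listings; the two customer addresses (cu_os_abort and the frame passed to translate) are HYPOTHETICAL values chosen only to make the example runnable, and the helper names hex and translate are made up for this sketch.

```shell
#!/bin/sh
# Addresses from my own core (see the listings above).
my_jvm_start=0xb6bf2bd0
my_jvm_end=0xb71fe250
my_os_abort=0xb70abbaf

# HYPOTHETICAL: address of the os::abort frame (#3) in the customer's core.
cu_os_abort=0xb70c5baf

# Evaluate an arithmetic expression and print the result in hex.
hex() { printf '0x%x\n' "$(( $1 ))"; }

abort_offset=$(hex "$my_os_abort - $my_jvm_start")   # 0x4b8fdf
jvm_size=$(hex "$my_jvm_end - $my_jvm_start")        # 0x60b680

# JVM range in the customer's core.
cu_jvm_start=$(hex "$cu_os_abort - $abort_offset")
cu_jvm_end=$(hex "$cu_jvm_start + $jvm_size")

# Map an address from the customer's stack trace to the matching address
# in my core; fail if the address is not inside the JVM range.
translate() {
    addr=$1
    if [ "$(( $addr >= $cu_jvm_start && $addr < $cu_jvm_end ))" -eq 1 ]; then
        hex "$addr - $cu_os_abort + $my_os_abort"
    else
        echo "address $addr is outside libjvm.so" >&2
        return 1
    fi
}

echo "offset of os::abort: $abort_offset, JVM size: $jvm_size"
echo "customer's JVM range: $cu_jvm_start - $cu_jvm_end"
# HYPOTHETICAL frame address from the customer's stack trace.
translate 0xb6f1a123
```

The translated address is the one to look up in your own coredump in the last step. The translation is computed in one expression rather than through a separate difference so that it also works when the customer's JVM is loaded at a lower address than yours.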