Towards backtracing through signal trampolines and fresh libc++


March 09, 2020 posted by Michał Górny

Upstream describes LLDB as a next-generation, high-performance debugger. It is built on top of the LLVM/Clang toolchain and features great integration with it. At the moment, it primarily supports debugging C, C++ and ObjC code, and there is interest in extending it to more languages.

In February 2019, I started working on LLDB, as contracted by the NetBSD Foundation. So far I've worked on reenabling continuous integration, squashing bugs, improving NetBSD core file support, extending NetBSD's ptrace interface to cover more register types and fix compat32 issues, fixing watchpoint and threading support, and porting to i386.

During the last month, I finally managed to create proper reproducers (and tests) for the remaining concurrent signal delivery problems. I also started working on backtracing through signal trampolines and prepared a libc++ update.

NetBSD concurrent signal updates

While finishing the last report, I was trying to reproduce some of the concurrent test failures in LLDB with plain ptrace(). I finally managed to do that, and thereby discovered the factor that caused all my earlier attempts to fail: concurrent signal delivery works fine unless the signal is actually delivered to the process and handled by it.

Let me explain this a bit. When a signal is delivered to a debugged process (or one of its threads), the process is stopped and the debugger is notified of the stop via waitpid(). Now, if the debugger wishes the signal to be delivered to the process (thread), it needs to pass the signal number as the data argument to PT_CONTINUE. If it neglects to do so (passes 0), the signal is discarded.

My tests so far were doing precisely that: discarding the signal. However, once I modified them to pass it back, they started failing in the same way as the LLDB tests.

Whenever the debugged program receives concurrent signals to different threads and the debugger requests their delivery, the process is stopped multiple times for some of the signals. Curiously enough, during my testing every signal to a thread was reported at least once, which means no signals were lost. I suspect that, in an attempt to deliver pending concurrent signals, the kernel passes them to the debugger again rather than to the process itself.

I've used this research to extend testing of concurrent behavior. More specifically, I have:

  1. Turned the signal concurrency test into a reusable factory.

  2. Started testing passing the signal back to the process.

  3. Extended the test to verify that the signal is actually delivered.

  4. Included catching newly-created processes in the test.

  5. Added concurrent breakpoints to the test.

  6. Added concurrent watchpoints to the test.

  7. Finally, started testing a combination of simultaneous signals, breakpoints and watchpoints.

Research into backtracing through signal trampolines

The most important of the remaining tasks was to enhance LLDB with NetBSD signal trampoline support.

Signal trampolines on NetBSD

Signal trampolines are briefly covered by the Signal delivery chapter of NetBSD Internals.

When a signal is delivered to a running program, the system needs to interrupt its execution and run its defined signal handler. Once the signal handler finishes, the program execution resumes where it left off. How this is achieved differs from system to system.

On NetBSD, a so-called signal trampoline is used. The kernel saves the program context and executes the signal handler; on newer ABIs, this is done by the sendsig_siginfo() function (e.g. in amd64/machdep.c). When the signal handler returns, it returns to a trampoline function defined by libc that restores the saved context and therefore resumes program execution.

From the debugger's perspective, the backtrace for a process interrupted in the midst of a signal handler ends at this trampoline function. However, it is often useful to know the state of the process just before the signal was received, and therefore the point where program execution will continue. The goal here was to make LLDB aware of NetBSD's trampoline design and capable of locating and using the saved context to produce a full backtrace.

The two possible solutions

There are two approaches to implementing signal trampoline handling:

  1. Explicitly detecting and processing signal trampolines in the debugger.

  2. Adding CFI annotations to the signal trampoline implementation in order to store the necessary information in libc itself.

GDB on NetBSD currently uses the first approach. The code (found in nbsd-tdep.c and e.g. amd64-nbsd-tdep.c) explicitly determines whether the current frame corresponds to a signal trampoline, finds the saved context and processes it.
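Conceptually, approach 1 boils down to something like the following sketch. Everything here is hypothetical and is not GDB's actual code. The saved_context layout, the CTX_OFFSET_FROM_SP constant and the flat memory reader stand in for the real per-ABI context layout and for reading the inferior's memory. The point is only the control flow: recognize the trampoline by its PC range, then take the caller's registers from the saved context instead of applying regular unwind rules.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical saved context; real layouts are per-ABI (ucontext/mcontext). */
struct saved_context {
	uint64_t pc;
	uint64_t sp;
};

/* Address range of the libc trampoline, e.g. looked up in the symbol table. */
struct tramp_range {
	uint64_t start, end;
};

struct frame {
	uint64_t pc, sp;
};

/* Hypothetical distance from the trampoline frame's SP to the saved context. */
#define CTX_OFFSET_FROM_SP 16

/* A flat buffer standing in for the inferior's address space. */
struct mem {
	const unsigned char *base;
	uint64_t addr;		/* inferior address of base[0] */
	size_t len;
};

static int read_mem(const struct mem *m, uint64_t addr, void *out, size_t n)
{
	if (addr < m->addr || addr - m->addr > m->len ||
	    n > m->len - (addr - m->addr))
		return -1;
	memcpy(out, m->base + (addr - m->addr), n);
	return 0;
}

/* Returns 1 and fills *caller if cur is a trampoline frame, 0 if it is not
 * (fall back to ordinary unwinding), -1 if the context cannot be read. */
int unwind_sigtramp(const struct frame *cur, const struct tramp_range *t,
    const struct mem *m, struct frame *caller)
{
	struct saved_context ctx;

	if (cur->pc < t->start || cur->pc >= t->end)
		return 0;
	if (read_mem(m, cur->sp + CTX_OFFSET_FROM_SP, &ctx, sizeof ctx) == -1)
		return -1;
	caller->pc = ctx.pc;	/* where execution resumes after the handler */
	caller->sp = ctx.sp;
	return 1;
}
```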

Long-term, the second approach is preferable. Instead of writing explicit platform-specific code, we add CFI annotations to the trampoline code (e.g. in __sigtramp2.S). The toolchain consumes those annotations and uses them to construct frame information inside the executable, which the debugger can afterwards consume.

Functionally, both approaches are roughly equivalent. The main difference is that approach 1 keeps the platform-specific logic in the debugger, while approach 2 stores it in the executable for all debuggers to consume.

libc++ update

Another task during this period was to update libc++ in the NetBSD src tree. It was last imported in 2015, at a version roughly corresponding to the LLVM 3.7 release. This version is dated and has some bugs; in particular, it is prone to miscompilation due to undefined behavior (e.g. a segfault in std::map). I decided to upgrade to the commit corresponding to the most recent LLVM/Clang update.

max_align_t visibility

The first problem I hit after upgrading is that max_align_t is declared on NetBSD only for C11/C++11. However, libc++ on NetBSD exposes it unconditionally.

Kamil Rytarowski proposed exposing max_align_t unconditionally in our headers as well. Joerg Sonnenberger, on the other hand, wants to change libc++ instead.
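From a consumer's point of view, this visibility rule means that code built in older language modes cannot rely on max_align_t being declared. A small C sketch of guarding for it follows. The fallback union and the my_max_align_t and max_fundamental_alignment() names are hypothetical, and this is neither what libc++ does nor what NetBSD's headers do.

```c
#include <stddef.h>

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
/* C11 and later: <stddef.h> is required to declare max_align_t. */
typedef max_align_t my_max_align_t;
#else
/* Older modes: max_align_t may be missing, so approximate it
 * (hypothetical fallback for illustration only). */
typedef union {
	long long ll;
	long double ld;
	void *p;
	void (*fp)(void);
} my_max_align_t;
#endif

/* After a lone char, the member's offset equals its alignment. */
struct align_probe {
	char c;
	my_max_align_t m;
};

size_t max_fundamental_alignment(void)
{
	return offsetof(struct align_probe, m);
}
```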

Missing errno constants

Another issue I found is that NetBSD is missing the two errno constants for robust mutexes: EOWNERDEAD and ENOTRECOVERABLE. While libc++ has a hack to define them when missing, it seemed a better idea to add them on our end.

I've learned that adding errno constants involves a few changes besides adding the constants themselves:

  1. Adding a mapping to Linux compat in sys/compat/linux/common/linux_errno.c.

  2. Adding descriptions to the manpage lib/libc/sys/intro.2.

  3. Adding messages to libc catalogs.

  4. Enabling appropriate features in libstdc++.

  5. Adding new error codes to libdtrace.

  6. Adding an errno mapping to NFS support in sys/nfs/nfs_subs.c.

While at it, I've made it harder to accidentally miss some of these steps in the future. Notably:

  1. I've added ATF tests to make sure that libc catalogs stay in sync with errno and signal descriptions in code.

  2. I've added a script to autogenerate libdtrace errno lists.

  3. I've added a compile-time assertion that NFS errno mapping covers all values.
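The idea behind such a compile-time assertion can be illustrated with a small sketch. If the mapping table has exactly one positional entry per errno value, its size must equal the highest errno plus one, and static_assert turns a forgotten entry into a build failure. The enum, the table contents and the wire codes below are all hypothetical; the actual mapping in sys/nfs/nfs_subs.c works with the system's real errno values.

```c
#include <assert.h>

/* Hypothetical errno numbering standing in for the system's errno.h. */
enum local_errno {
	L_OK = 0,
	L_EPERM,
	L_ENOENT,
	L_EOWNERDEAD,
	L_ENOTRECOVERABLE,
	L_ELAST = L_ENOTRECOVERABLE	/* highest defined value */
};

/* Hypothetical generic code used when no better match exists. */
#define WIRE_ERR_IO 5

/* One positional entry per local errno value, in order. */
static const unsigned short wire_errmap[] = {
	0,		/* L_OK */
	1,		/* L_EPERM */
	2,		/* L_ENOENT */
	WIRE_ERR_IO,	/* L_EOWNERDEAD: no exact equivalent */
	WIRE_ERR_IO,	/* L_ENOTRECOVERABLE: no exact equivalent */
};

/* Breaks the build if an errno is added without extending the table. */
static_assert(sizeof(wire_errmap) / sizeof(wire_errmap[0]) == L_ELAST + 1,
    "wire_errmap must cover every errno value");

unsigned short map_errno(int e)
{
	if (e < 0 || e > L_ELAST)
		return WIRE_ERR_IO;
	return wire_errmap[e];
}
```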

The complete list of commits:

  1. Sync errno messages between catalog and errno.h

  2. Sync signal messages between catalog and sys_siglist

  3. Add tests for missing libc catalog entries

  4. PR standards/44921: Add errno consts for robust mutexes

  5. Enable EOWNERDEAD & ENOTRECOVERABLE in libstdc++

  6. Update dtrace errno.d mapping and add a script for it

  7. Update NFS errno mapping and add assert for correctness

The update

I have sent the libc++ update (to commit 01f3a59fb3e2542fce74c768718f594d0debd0da) to the mailing list for review. The proposed patch set includes:

  1. Adjusting the cleanup script for the new version.

  2. Cleaning up extraneous files from the old import (to make the diff clearer).

  3. Importing the new version and updating Makefiles.

  4. Moving headers to standard /usr/include/c++/v1 location for better interoperability.

  5. Moving libc++ to the apache2 license group.

Future plans

This is the final month of my contract, and therefore I would like to focus primarily on importing LLDB into the src tree. As time permits, I will continue to improve support for backtracing through signal trampolines.

The exact list of remaining tasks in my contract follows:

  1. Add support for backtracing through signal trampolines and extend the support to libexecinfo and unwind implementations (LLVM, nongnu). Examine adding CFI support to interfaces that need it to provide more stable backtraces (both kernel and userland).

  2. Add support for aarch64 target.

  3. Stabilize LLDB and address breaking tests from the test suite.

  4. Merge LLDB with the base system (under LLVM-style distribution).

This work is sponsored by The NetBSD Foundation

The NetBSD Foundation is a non-profit organization and welcomes any donations to help us continue funding projects and services to the open-source community. Please consider visiting the following URL to chip in what you can:

https://netbsd.org/donations/#how-to-donate
