LLDB: watchpoints, XSTATE in ptrace() and core dumps

July 07, 2019 posted by Michał Górny

Upstream describes LLDB as a next generation, high-performance debugger. It is built on top of the LLVM/Clang toolchain, and features great integration with it. At the moment, it primarily supports debugging C, C++ and ObjC code, and there is interest in extending it to more languages.

In February, I started working on LLDB, as contracted by the NetBSD Foundation. So far I've been working on reenabling continuous integration, squashing bugs, improving NetBSD core file support and, lately, extending NetBSD's ptrace interface to cover more register types and fix compat32 issues. You can read more about that in my May 2019 report.

In June, I finally finished the remaining ptrace() work for xstate and got it merged on both the NetBSD and LLDB ends (meaning it's going to make it into NetBSD 9). I also worked on debug register support in LLDB, effectively fixing watchpoint support. Once again I had to fight some upstream regressions.

ptrace() XSTATE interface

In the previous report, I compared two approaches to resolving unpredictable XSAVE data offsets. Both solutions had their merits, but I eventually went with a pair of requests sharing a single predictable, extensible structure. As a result, I have implemented two new ptrace() requests:

  • PT_GETXSTATE that obtains full FPU state and stores it in struct xstate,

  • PT_SETXSTATE that updates FPU state as requested from struct xstate.

The main features of this API are:

  1. It provides a single call to obtain all supported XSAVE components. This is especially useful for YMM or ZMM registers whose contents are split between disjoint XSAVE components.

  2. It provides an xs_rfbm bitfield that clearly indicates which XSAVE components were available, and which can be used to issue partial updates via PT_SETXSTATE.

  3. It requires the caller to explicitly specify the structure size. As a result, new fields (= component types) can be added to it without breaking compatibility with already built programs.

  4. It provides an identical API to i386 and amd64 programs, removing the need for code duplication.

  5. It provides backwards compatibility with FSAVE- and FXSAVE-only systems, with xs_rfbm clearly indicating which fields were filled.

  6. It can replace disjoint PT_GETFPREGS and PT_GETXMMREGS APIs on i386/amd64 with a single convenient method.
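To illustrate points 1 and 2, here is a minimal sketch of how a consumer could reassemble a YMM register from its two XSAVE components after checking a component bitmask. The struct mock_xstate type, its field names and the XC_* macros are hypothetical stand-ins invented for this example and do not reproduce the real struct xstate layout; only the bit positions follow the x86 XCR0 component numbering (x87 = bit 0, SSE = bit 1, upper YMM halves = bit 2).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Component bits, numbered as in the x86 XCR0 register. */
#define XC_X87       (UINT64_C(1) << 0)
#define XC_SSE       (UINT64_C(1) << 1)  /* XMM registers (low 128 bits) */
#define XC_YMM_HI128 (UINT64_C(1) << 2)  /* upper 128 bits of YMM registers */

/* Hypothetical, simplified stand-in; NOT the real struct xstate. */
struct mock_xstate {
    uint64_t rfbm;            /* which components below are valid */
    uint8_t  xmm[16][16];     /* low halves (16 registers on amd64) */
    uint8_t  ymm_hi[16][16];  /* high halves */
};

/*
 * Assemble the full 32-byte YMM register n into out.
 * Returns 0 on success, or -1 if either half was not provided.
 */
static int read_ymm(const struct mock_xstate *xs, unsigned n, uint8_t out[32])
{
    const uint64_t need = XC_SSE | XC_YMM_HI128;

    assert(n < 16);
    if ((xs->rfbm & need) != need)
        return -1;

    memcpy(out, xs->xmm[n], 16);         /* bits 0..127 */
    memcpy(out + 16, xs->ymm_hi[n], 16); /* bits 128..255 */
    return 0;
}
```

With a single structure carrying both halves and a mask, the caller needs one check and no knowledge of XSAVE offsets, which is the convenience the list above describes.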

From the user's perspective, the main gain is the ability to read YMM (AVX) registers. The code supports ZMM (AVX-512) registers as well, but I have not been able to test it due to lack of hardware. That said, if one of the readers is running NetBSD on an AVX-512 capable CPU and is willing to help, please contact me and I'll give you some tests to run.

The two relevant commits are:

The two new calls are covered by tests for reading and writing MM (MMX), XMM (SSE) and YMM (AVX) registers. I have also done some work on a ZMM (AVX-512) test, but I did not complete it due to the aforementioned lack of hardware.

On the LLDB end, the change was preceded by some bugfixes and cleanup suggested by Pavel Labath. The relevant commits are:

XSTATE in core dumps

The ptrace() XSTATE support provides the ability to introspect registers in running programs. However, to improve the support for debugging crashed programs, the same data also needs to be added to core dumps.

NetBSD core dumps are built on the ELF file format, with additional process information stored in ELF notes. Notes can be conveniently read via readelf -n. Each note is uniquely identified by a pair of name and numeric type identifier. NetBSD-specific notes are split into two groups:

  • process-specific notes (shared by all LWPs) use the NetBSD-CORE name and successive type numbers defined in sys/exec_elf.h,

  • LWP-specific notes use NetBSD-CORE@nn where nn is the LWP number, and type numbers corresponding to ptrace() requests.
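The naming scheme above can be sketched as a small classifier. This is only an illustration: the function name netbsd_note_lwp and its return convention are made up for this example, and it assumes nn is written as a plain positive decimal number.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Classify a note name (hypothetical helper):
 *   returns 0   for the process-wide name "NetBSD-CORE",
 *   returns nn  for the LWP-specific name "NetBSD-CORE@nn" (nn >= 1),
 *   returns -1  for anything else.
 */
static long netbsd_note_lwp(const char *name)
{
    static const char prefix[] = "NetBSD-CORE";
    const size_t plen = sizeof(prefix) - 1;
    char *end;
    long lwp;

    assert(name != NULL);
    if (strncmp(name, prefix, plen) != 0)
        return -1;
    if (name[plen] == '\0')
        return 0;
    if (name[plen] != '@')
        return -1;
    /* Require a leading digit: strtol() alone would accept spaces or a sign. */
    if (name[plen + 1] < '0' || name[plen + 1] > '9')
        return -1;

    errno = 0;
    lwp = strtol(name + plen + 1, &end, 10);
    if (errno != 0 || *end != '\0' || lwp <= 0)
        return -1;
    return lwp;
}
```

A reader of a core file could use such a check to decide whether a note describes the whole process or a single thread before interpreting its type number.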

Two process-specific notes are used at the moment:

  1. ELF_NOTE_NETBSD_CORE_PROCINFO containing process information — including killing signal information, PIDs, UIDs, GIDs…

  2. ELF_NOTE_NETBSD_CORE_AUXV containing auxiliary information provided by the dynamic linker.

The LWP-specific notes currently contain register dumps. They are stored in the same format as returned by ptrace() calls, and use the same numeric identifiers as PT_GET* requests.

Previously, only PT_GETREGS and PT_GETFPREGS dumps were supported. This implies that i386 coredumps do not include MMX register values. Both requests were handled via common code, with a TODO for providing machdep (arch-specific) hooks.

My work on core dumps involved three aspects:

  1. Writing ATF tests for their correctness.

  2. Providing machdep API for injecting additional arch-specific notes.

  3. Injecting PT_GETXSTATE data into x86 core dumps.

To implement the ATF tests, I've used PT_DUMPCORE to dump core into a temporary file with a predictable filename. Afterwards, I've used libelf to process the ELF file and locate notes in it. I had to process the note format myself, so I have included a reusable function to find and read a specific note in the tests.
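For readers unfamiliar with the note format, the following is a minimal sketch of such a lookup over a note segment that is already in memory. It is not the function from the ATF tests. The names note_hdr and find_note are invented here, and the code assumes the common ELF note layout: three 32-bit words (name size, descriptor size, type) in host byte order, followed by the name and the descriptor, each padded to a 4-byte boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed note header layout: three 32-bit words. */
struct note_hdr {
    uint32_t namesz;  /* name length including the trailing NUL */
    uint32_t descsz;  /* descriptor (payload) length */
    uint32_t type;
};

static size_t align4(size_t n) { return (n + 3) & ~(size_t)3; }

/*
 * Find the note with the given name and type in buf[0..len).
 * On success, returns a pointer to its descriptor and stores the
 * descriptor length in *desc_len. Returns NULL if the note is not
 * found or the buffer is truncated.
 */
static const void *find_note(const void *buf, size_t len, const char *name,
                             uint32_t type, size_t *desc_len)
{
    const unsigned char *p = buf;
    const size_t want = strlen(name) + 1;
    size_t off = 0;

    assert(buf != NULL && name != NULL && desc_len != NULL);
    while (len - off >= sizeof(struct note_hdr)) {
        struct note_hdr h;
        size_t name_sz, desc_sz;

        memcpy(&h, p + off, sizeof(h));  /* avoids unaligned access */
        off += sizeof(h);
        name_sz = align4(h.namesz);
        desc_sz = align4(h.descsz);
        if (name_sz > len - off || desc_sz > len - off - name_sz)
            return NULL;                 /* truncated note */

        if (h.type == type && h.namesz == want &&
            memcmp(p + off, name, want) == 0) {
            *desc_len = h.descsz;
            return p + off + name_sz;
        }
        off += name_sz + desc_sz;        /* skip to the next note */
    }
    return NULL;
}
```

Matching on both name and type mirrors how the notes are identified: the name selects the process or the LWP, and the type selects which piece of data is wanted.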

Firstly, I wrote a test for process information. Then, I refactored register tests to reduce code duplication and make writing additional variants much easier, and created matching core dump tests for all existing PT_GET* register tests. Finally, I implemented the support for dumping PT_GETXSTATE information.

Of this work, only the first test was merged. The relevant commits and patches are:

LLDB debug register / watchpoint support

The next item on my TODO was fixing debug register support in LLDB. There are six debug registers on x86, and they are used to support up to four hardware breakpoints or watchpoints (each can serve as either). Those are:

  • DR0 through DR3 registers used to specify the breakpoint or watchpoint address,

  • DR6 acting as status register, indicating which debug conditions have occurred,

  • DR7 acting as control register, used to enable and configure breakpoints or watchpoints.

DR4 and DR5 are obsolete synonyms for DR6 and DR7.

For each breakpoint, the control register provides the following options:

  1. Enabling it as global or local breakpoint. Global breakpoints remain active through hardware task switches, while local breakpoints are disabled on task switches.

  2. Setting it to trigger on code execution (breakpoint), on memory write, or on memory read or write (watchpoints). Read-only hardware watchpoints are not supported on x86, and are normally emulated via read/write watchpoints.

  3. Setting the size of the watched memory to 1, 2, 4 or 8 bytes. 8-byte watchpoints are not supported on i386.
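The three options above map onto DR7 bit fields roughly as in the following sketch. The helper names (dr7_len_bits, dr7_enable) and the enum are invented for this example and are not taken from LLDB or NetBSD; the bit positions follow the x86 architecture manuals: local and global enable bits at 2n and 2n+1, and a 4-bit R/W plus LEN field starting at bit 16+4n for slot n.

```c
#include <assert.h>
#include <stdint.h>

/* R/W field encodings defined by the x86 architecture. */
enum dr7_rw {
    DR7_RW_EXEC      = 0x0,  /* break on instruction execution */
    DR7_RW_WRITE     = 0x1,  /* break on data writes */
    DR7_RW_READWRITE = 0x3   /* break on data reads or writes */
};

/* Map a size in bytes to the 2-bit LEN encoding, or -1 if unsupported. */
static int dr7_len_bits(unsigned size)
{
    switch (size) {
    case 1:  return 0x0;
    case 2:  return 0x1;
    case 8:  return 0x2;  /* 64-bit mode only */
    case 4:  return 0x3;
    default: return -1;
    }
}

/*
 * Return a copy of dr7 with slot (0..3) enabled as a local or global
 * breakpoint or watchpoint of the given type and size. Execution
 * breakpoints should be requested with size 1.
 */
static uint64_t dr7_enable(uint64_t dr7, unsigned slot, int global,
                           enum dr7_rw rw, unsigned size)
{
    const int len = dr7_len_bits(size);
    const unsigned ctl_shift = 16 + 4 * slot;

    assert(slot < 4 && len >= 0);
    dr7 &= ~(UINT64_C(0x3) << (2 * slot));  /* clear L and G bits */
    dr7 &= ~(UINT64_C(0xf) << ctl_shift);   /* clear R/W and LEN */
    dr7 |= UINT64_C(1) << (2 * slot + (global ? 1 : 0));
    dr7 |= ((uint64_t)rw | ((uint64_t)len << 2)) << ctl_shift;
    return dr7;
}
```

For instance, a global 4-byte write watchpoint in slot 0 sets the G0 bit together with R/W0 = 01 and LEN0 = 11; the address itself goes into DR0 separately.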

According to my initial examination, watchpoint support was already present in LLDB (most likely copied from relevant Linux code) but it was not working correctly. More specifically, the accesses were reported as opaque tracepoints rather than as watchpoints. While the program was correctly stopped, LLDB was not aware which watchpoint was triggered.

Upon investigating this further, I've noticed that this happens specifically because LLDB is using local watchpoints. After switching it to use global watchpoints, NetBSD started reporting triggered watchpoints correctly.

As a result, a new branch of LLDB code started being used… and turned out to segfault. Therefore, my next goal was to locate the invalid memory use and correct it. In this case, the problem lay in the way thread data was stored in a list. Specifically, the program wrongly assumed that the list index would match the LWP number exactly. This had two implications.

Firstly, it suffered from an off-by-one error. Since LWPs start with 1, and list indexes start with 0, a single-threaded program crashed trying to access past the end of the list. Secondly, the assumption that the thread list would always be in order seemed fragile. After all, it relied on LWPs being reported with successive numbers. Therefore, I've decided to rewrite the code to iterate through the thread list and locate the correct LWP explicitly.
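The difference between the two lookups can be sketched in a few lines of C. This is an illustration only: LLDB itself is written in C++, and the thread_entry type and find_thread_by_lwp function are hypothetical names, not LLDB code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-thread record. */
struct thread_entry {
    int lwp_id;        /* LWP numbers start at 1 */
    const char *name;
};

/*
 * Locate a thread by LWP number with an explicit search. Unlike
 * threads[lwp_id], this does not assume that indexes match LWP
 * numbers or that the list is ordered. Returns NULL if the LWP is
 * not in the list.
 */
static struct thread_entry *find_thread_by_lwp(struct thread_entry *threads,
                                               size_t count, int lwp_id)
{
    assert(threads != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (threads[i].lwp_id == lwp_id)
            return &threads[i];
    }
    return NULL;
}
```

In a single-threaded process the list holds one entry with LWP number 1, so indexing by LWP number would read element 1 of a one-element list, whereas the search returns element 0.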

With those two fixes, some of the watchpoint tests started passing. However, some are still failing because we are not handling threads correctly yet. According to earlier research done by Kamil Rytarowski, we need to copy debug register values into new LWPs as they are created. I am planning to work on this shortly.

Additionally, NetBSD normally disallows unprivileged processes from modifying debug registers. This can be changed by enabling security.models.extensions.user_set_dbregs. Since LLDB tests are normally run by unprivileged users, I had to detect this condition from within the LLDB test suite and skip watchpoint tests appropriately.

The LLDB commits relevant to this topic are:

Regressions caught by buildbot

Finally, let's go over the regressions that were caught by our buildbot instance over the past month:

Future plans

Since Kamil has managed to move the kernel part of threading support forward, I'm going to focus on improving threading support in LLDB right now. Most notably, this includes ensuring that LLDB can properly handle multithreaded applications, and that all thread-level actions (stepping, resuming, signalling) are correctly handled. As mentioned above, this also includes handling watchpoints in threads.

Of course, I am also going to finish the work on XSTATE in coredumps, and handle any possible bugs I might have introduced in my earlier work.

Afterwards, I will work on the remaining TODO items, which are:

  1. Add support to backtrace through signal trampoline and extend the support to libexecinfo, unwind implementations (LLVM, nongnu). Examine adding CFI support to interfaces that need it to provide more stable backtraces (both kernel and userland).

  2. Add support for i386 and aarch64 targets.

  3. Stabilize LLDB and address breaking tests from the test suite.

  4. Merge LLDB with the base system (under LLVM-style distribution).

This work is sponsored by The NetBSD Foundation

The NetBSD Foundation is a non-profit organization and welcomes any donations to help us continue funding projects and services to the open-source community. Please consider visiting the following URL to chip in what you can:


