LLDB now works on i386


February 08, 2020 posted by Michał Górny

Upstream describes LLDB as a next generation, high-performance debugger. It is built on top of the LLVM/Clang toolchain, and features great integration with it. At the moment, it primarily supports debugging C, C++ and ObjC code, and there is interest in extending it to more languages.

In February 2019, I started working on LLDB, as contracted by the NetBSD Foundation. So far I've been working on reenabling continuous integration, squashing bugs, improving NetBSD core file support, extending NetBSD's ptrace interface to cover more register types and fix compat32 issues, and fixing watchpoint and threading support.

The original NetBSD port of LLDB was focused on amd64 only. In January, I extended it to support i386 executables. This includes both 32-bit builds of LLDB (running natively on an i386 kernel or via compat32) and debugging 32-bit programs from 64-bit LLDB.

Build bot failure report

I finished the previous report by noting that upstream had broken libc++ builds with gcc. The change in question was reverted afterwards and recommitted with the necessary fixes.

A test breakage was caused by a new clang driver test that uses env -u. The problem has been resolved by setting the variable to an empty value instead of unsetting it. However, maybe it is time to implement env -u on NetBSD?

Yet another problem was a basic_string copy constructor optimization that broke programs at runtime, in particular TableGen. The commit in question has been reverted.

Lastly, adding sigaltstack interception in compiler-rt broke our builds. Missing bits for NetBSD have been added afterwards.

I would like to thank all upstream contributors who are putting in the effort to fix their patches to work with NetBSD.

LLDB i386 support

Onto the mysterious UserArea

LLDB uses quite an interesting approach to support reading and writing registers on Linux. It defines two register lists, for i386 and amd64 respectively. Those lists contain offsets to the appropriate fields in the data returned by ptrace. When debugging a 32-bit program on amd64, it uses a hybrid: it takes the i386 register list and combines it with offsets specific to the amd64 structures. The offsets themselves are not written explicitly but are instead derived from the UserData structure defined in the plugin.
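To make the hybrid list more concrete, here is a minimal sketch of the idea, not the actual LLDB code. All names below (GPR64, RegisterInfo, g_i386_on_amd64, find_register) and the three-register layout are hypothetical simplifications; only the principle of i386 register names paired with offsets computed from a 64-bit structure is taken from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical, heavily simplified stand-in for the 64-bit ptrace layout.
struct GPR64 {
  std::uint64_t rax;
  std::uint64_t rbx;
  std::uint64_t rip;
};

// Hypothetical UserData: the structure the offsets are derived from.
struct UserData {
  GPR64 gpr;
};

struct RegisterInfo {
  const char *name;
  std::size_t byte_size;
  std::size_t byte_offset; // offset into UserData
};

// Hybrid list: i386 names and sizes, but offsets of the amd64 fields.
// On little-endian x86 the low 32 bits sit at the start of each 64-bit slot.
static const RegisterInfo g_i386_on_amd64[] = {
    {"eax", 4, offsetof(UserData, gpr) + offsetof(GPR64, rax)},
    {"ebx", 4, offsetof(UserData, gpr) + offsetof(GPR64, rbx)},
    {"eip", 4, offsetof(UserData, gpr) + offsetof(GPR64, rip)},
};

// Look up a register by name; returns nullptr if it is not in the list.
const RegisterInfo *find_register(const char *name) {
  assert(name != nullptr);
  for (const RegisterInfo &info : g_i386_on_amd64)
    if (std::strcmp(info.name, name) == 0)
      return &info;
  return nullptr;
}
```

Because the offsets come from offsetof rather than hand-written numbers, they follow the structure definition automatically if a field is moved.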

The NetBSD plugin uses a different approach. Rather than using binary offsets, it explicitly accesses appropriate fields in ptrace structures. However, the plugin needs to declare UserData nevertheless in order to fill the offsets in register lists. What are those offsets used for then? That's the first problem I had to answer.

According to LLDB upstream developer Pavel Labath, those offsets are additionally used to serialize and deserialize register values in gdb protocol packets. This raised the question of improving protocol-level compatibility between LLDB and GDB, which I discuss separately below. However, the immediate implication seemed to be that the precise field order does not matter and can be changed arbitrarily.

My first attempts at reordering the fields to improve GDB compatibility resulted in new test failures. The offsets must be used for something else as well! After further research, I realized that our plugin has two register reading/writing interfaces: an interface for operating on a single register, and an interface for reading/writing all registers (in this case, just the general-purpose registers). While the former uses explicit field names/indices, the latter just passes the whole structure as an abstract blob, and apparently the offsets are used to access data in this blob.

This meant that the initial portion of UserData must match the GPR structure as returned by ptrace. However, the remaining registers can be ordered and structured arbitrarily.
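The constraint can be illustrated with a small sketch. The structures and the read_from_blob helper are hypothetical and reduced to two registers; they only show why a consumer that knows nothing but offsets needs the start of UserData to match the structure that ptrace fills in.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for the GPR structure filled in by ptrace.
struct PtraceGPR {
  std::uint64_t rax;
  std::uint64_t rbx;
};

// Hypothetical UserData: the GPR part comes first and matches PtraceGPR;
// whatever follows can be ordered freely.
struct UserData {
  PtraceGPR gpr;
  std::uint64_t other_regs[4];
};

static_assert(std::is_standard_layout<UserData>::value,
              "offsetof requires a standard-layout type");
static_assert(offsetof(UserData, gpr) == 0,
              "the GPR block must start the structure");

// Read one register out of an opaque "all registers" blob, knowing only
// its offset and size. Assumes a little-endian host when size < 8.
std::uint64_t read_from_blob(const void *blob, std::size_t blob_size,
                             std::size_t offset, std::size_t size) {
  assert(size <= sizeof(std::uint64_t));
  assert(offset + size <= blob_size);
  std::uint64_t value = 0;
  std::memcpy(&value, static_cast<const unsigned char *>(blob) + offset, size);
  return value;
}
```

If the fields of the GPR block were reordered in UserData only, the offsets would point at the wrong bytes of the blob, which is consistent with the test failures described above.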

Native i386 support

I decided to follow the ideas used in the Linux plugin. Most importantly, this meant having a single plugin for both the 32-bit and 64-bit x86 variants. This is useful because, on one hand, both ptrace interfaces are similar, and on the other, a 64-bit debugger uses the 64-bit ptrace interface on 32-bit programs. The resulting code uses preprocessor conditions to distinguish between the 32-bit and 64-bit API wherever necessary, and the ABI of the debugged program to switch between the appropriate register data.

Initially, I implemented a minimal proof of concept for debugging a 32-bit program with a 64-bit debugger. The aim was to ensure that I would not have to change the design later in order to support both variants. Once this version started working, I stashed it and focused on getting native i386 working first.

The result was a patch introducing i386 support in the NetBSD Process plugin. It added i386 register definitions and the code to handle them. To reduce code duplication, the functions operate almost exclusively on amd64 constants, and map i386 constants to them when debugging 32-bit programs.

The main differences are in the GPR structure for i386/amd64, in using PT_GETXMMREGS on i386 (rather than PT_GETFPREGS), and in abstracting out debug register constants. The actual floating-point and debug registers are handled via common code.

The second part improves debugging of 32-bit programs on amd64. It adds two features: explicitly recognizing 32-bit executables, and providing a 32-bit-like register context for them. The first feature reuses existing LLDB routines to read the header of the underlying executable and determine whether it is a 32-bit or a 64-bit ELF file. The second feature reuses the approach from Linux: it takes the 32-bit register context and updates it with 64-bit offsets.
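The plugin relies on existing LLDB routines for this check, so the following is only a standalone illustration of what the check boils down to in the ELF format: the fifth byte of the file (EI_CLASS) is 1 for 32-bit and 2 for 64-bit objects. The ElfClass type and classify_elf function are hypothetical names used for this sketch.

```cpp
#include <cstddef>
#include <cstdint>

enum class ElfClass { Invalid, Elf32, Elf64 };

// Inspect the start of an ELF file: the magic is 0x7f 'E' 'L' 'F',
// followed by the EI_CLASS byte at index 4.
ElfClass classify_elf(const std::uint8_t *data, std::size_t size) {
  if (data == nullptr || size < 5)
    return ElfClass::Invalid;
  if (data[0] != 0x7f || data[1] != 'E' || data[2] != 'L' || data[3] != 'F')
    return ElfClass::Invalid;
  switch (data[4]) {
  case 1: // ELFCLASS32
    return ElfClass::Elf32;
  case 2: // ELFCLASS64
    return ElfClass::Elf64;
  default:
    return ElfClass::Invalid;
  }
}
```

A debugger could use a result like this to choose between the 32-bit and 64-bit register context for the debugged program.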

More on LLDB/GDB packet compatibility

I have mentioned the compatibility between LLDB and GDB protocols. In fact, LLDB is using a modified version of the GDB protocol that is only partially compatible with the original. This incompatibility particularly applies to handling registers.

The register packets transmit register values as binary data inside the packet. On NetBSD, the layout used by LLDB is different from the one used by GDB, rendering them incompatible. This incompatibility also means that LLDB cannot be successfully used to connect to other implementations of the GDB protocol server, e.g. in qemu.
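To show why the layout matters, here is a small sketch of how register data ends up in such a packet. In the GDB remote protocol, register contents are sent as pairs of hex digits, one pair per byte, in target byte order, concatenated in layout order. The function name is hypothetical, and the sketch assumes a little-endian target such as x86.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Append one register value to a packet payload as hex digit pairs,
// least significant byte first (little-endian target assumed).
void append_register_hex(std::string &out, std::uint64_t value,
                         std::size_t size) {
  static const char digits[] = "0123456789abcdef";
  assert(size <= sizeof(value));
  for (std::size_t i = 0; i < size; ++i) {
    unsigned byte = static_cast<unsigned>((value >> (8 * i)) & 0xff);
    out.push_back(digits[byte >> 4]);
    out.push_back(digits[byte & 0xf]);
  }
}
```

Since the payload carries no register names, a client and server that disagree on the order or size of the registers will decode the same bytes into different registers.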

Both GDB and LLDB support an additional abstraction over the register packet layout, making it possible to work with a layout different from the default. However, each implements a different protocol for exposing this abstraction. LLDB has an explicit register layout packet that uses JSON, while GDB transmits the target definition as a series of XML files. Ideally, LLDB should grow support for the latter in order to improve its compatibility with different servers.

i386 outside NetBSD

While working on i386 support in the NetBSD plugin, I noticed a number of failing tests that do not seem to be specific to NetBSD. Indeed, upstream indicates that i386 is not actively tested on any platform nowadays.

In order to improve its state a little, I have applied a few small fixes that could be done quickly:

Future plans

I am currently trying to build minimal reproducers for the remaining race conditions in concurrent event handling (in particular, signal delivery to the debugged program).

The remaining tasks in my contract are:

  1. Add support to backtrace through signal trampoline and extend the support to libexecinfo, unwind implementations (LLVM, nongnu). Examine adding CFI support to interfaces that need it to provide more stable backtraces (both kernel and userland).

  2. Add support for aarch64 target.

  3. Stabilize LLDB and address breaking tests from the test suite.

  4. Merge LLDB with the base system (under LLVM-style distribution).

This work is sponsored by The NetBSD Foundation

The NetBSD Foundation is a non-profit organization and welcomes any donations to help us continue funding projects and services to the open-source community. Please consider visiting the following URL to chip in what you can:

https://netbsd.org/donations/#how-to-donate

Comments:

Nice work.

Posted by darktrym on February 09, 2020 at 09:27 AM UTC #
