Work-in-progress threading support in LLDB

August 02, 2019 posted by Michał Górny

Upstream describes LLDB as a next generation, high-performance debugger. It is built on top of LLVM/Clang toolchain, and features great integration with it. At the moment, it primarily supports debugging C, C++ and ObjC code, and there is interest in extending it to more languages.

In February, I have started working on LLDB, as contracted by the NetBSD Foundation. So far I've been working on reenabling continuous integration, squashing bugs, improving NetBSD core file support, extending NetBSD's ptrace interface to cover more register types and fix compat32 issues, and lately fixing watchpoint support. You can read more about that in my June 2019 report.

My July's work has been focused on improving support for NetBSD threads in LLDB. This involved a lot of debugging and fighting hanging tests, and I have decided to delay committing the results until I manage to provide fixes for all the immediate issues.

Buildbot updates

During July, upstream has made two breaking changes to the build system:

  1. Automatic switching to libc++abi when present in ebuild tree was initially removed in D63883. I needed to force it explicitly because of this. However, upstream has eventually reverted the change.

  2. LLDB has enabled Python 3 support, and started requiring SWIG 2+ in the process (D64782). We had to upgrade SWIG on the build host, and eventually switched to Python 3 as well.

As a result of earlier watchpoint fixes, a number of new tests started running. Due to lacking multithreading support, I had to XFAIL a number of LLDB tests in r365338.

A few days later, upstream has fixed the issue causing TestFormattersSBAPI to fail. I un-XFAILED it in r365991.

The breaking xfer:libraries-svr4:read change has been reapplied and broke NetBSD process plugin again. And I've reapplied my earlier fix as r366889.

Lit maintainers have broken NetBSD support in tests by starting to use env -u VAR syntax in r366980. The -u switch is not specified by POSIX, and not supported by NetBSD env(1). In order to fix the problem, I've changed FileCheck's behavior to consider empty envvar as equivalent to disabled (r367122), and then switched lit to set both envvars to empty instead (r367123).

Finally, I've investigated a number of new test failures by the end of the month:

  1. New functionalities/signal/handle-abrt test was added in r366580. Since trampolines are not properly supported at the moment, I've marked it XFAIL in r367228.

  2. Two functionalities/exec tests started failing since upstream fixed @skipIfSanitized that previously caused the test to be skipped unconditionally (r366903). Since it's not a regression, I've marked it XFAIL in r367228.

  3. Same happened for one of the python_api/hello_world tests. It was clearly related to another failing test, so I've marked it XFAIL in r367285.

  4. Two tools/lldb-vscode tests were failing since upstream compared realpath'd path with normal path, and our build path happens to include symlinks as one of the parent directories. I've fixed the test to compare realpath in r367291. While at it, I've replaced weird os.path.split(...)[0] with clearer os.path.dirname(...) as suggested by Pavel Labath, in r367290.

NetBSD ptrace() interfaces for thread control

NetBSD currently provides two different methods for thread-related operations.

The legacy method consists of the following requests:

  • PT_CONTINUE with negative data argument. It is used to resume execution of a single thread while suspending all other threads.

  • PT_STEP with positive data argument. It is used to single-step the specified thread, while all other threads continue execution.

  • PT_STEP with negative data argument. It is used to single-step the specified thread, while all other threads remain suspended.

This means that using those methods, you can effectively either:

  • run all threads, and optionally send signal to the process,

  • run one thread, while keeping other threads suspended,

  • single-step one thread, with all other threads either running or being suspended as a whole.

Furthermore, it is impossible to combine single-stepping with syscall tracing via PT_SYSCALL.

The new method introduced by Kamil Rytarowski during his ptrace(2) work is more flexible, and includes the following requests:

  • PT_RESUME that sets the specified thread to continue running after PT_CONTINUE.

  • PT_SUSPEND that sets the specified thread to remain suspended after PT_CONTINUE.

  • PT_SETSTEP that enables single-stepping for the specified thread after PT_CONTINUE.

  • PT_CLEARSTEP that disables single-stepping for the specified thread.

Using the new API, it is possible to control both execution and single- stepping per thread, and to combine syscall tracing with that. It is also possible to deliver a single signal either to the whole process or to one of the threads.

Implementing threading in LLDB NetBSD process plugin

When I started my work, the support for threads in the NetBSD plugin was minimal. Technically, the code had structures needed to keep the threads and filled it in at start. However, it did not register new or terminated threads, and it did not support per-thread execution control.

The first change necessary was therefore to implement support for reporting new and terminated threads. I've prepared an initial patch in D65555. With this patch enabled, the thread list command now correctly reports the list of threads at any moment.

The second change necessary is to fix process resuming routine to support multiple threads properly. The routine is passed a data structure containing requested action for each thread. The old code simply took the action for the first thread, and applied it to the whole process. D64647 is my work-in-progress attempt at using the new ptrace calls to apply correct action for each thread.

However, the patch is currently buggy as it assumed that LLDB should provide explicit eStateSuspended action for each thread that is supposed to be supposed. The current LLDB implementation, on the other hand, assumes that thread should be suspended if no action is specified for it. I am currently discussing with upstream whether the current approach is correct, or should be changed to the explicit eStateSuspended usage.

The third change necessary is that we need to explicitly copy debug registers to newly created threads, in order to enable watchpoints on them. However, I haven't gotten to writing a patch for this yet.

Fixing nasty process interrupt bug

While debugging my threading code, I've hit a nasty bug in LLDB. After issuing process interrupt command from remote LLDB session, the server terminated. After putting a lot of effort into debugging why the server terminates with no obvious error, I've discovered that it's terminating because… the client has disconnected.

Further investigation with help of Pavel Labath uncovered that the client is silently disconnecting because it expects a packet indicating that the process has stopped and times out waiting for it. In order to make the server send this packet, NetBSD process plugin needed to explicitly mark process as stopped in the SIGSTOP handler. I've fixed it in r367047.

Future plans

The initial 6 months of my LLDB contract have passed. I am currently taking a month's break from the work, then I will resume it for 3 more months. During that time, I will continue working on threading support and my remaining goals.

The remaining TODO items are:

  1. Add support to backtrace through signal trampoline and extend the support to libexecinfo, unwind implementations (LLVM, nongnu). Examine adding CFI support to interfaces that need it to provide more stable backtraces (both kernel and userland).

  2. Add support for i386 and aarch64 targets.

  3. Stabilize LLDB and address breaking tests from the test suite.

  4. Merge LLDB with the base system (under LLVM-style distribution).

This work is sponsored by The NetBSD Foundation

The NetBSD Foundation is a non-profit organization and welcomes any donations to help us continue funding projects and services to the open-source community. Please consider visiting the following URL to chip in what you can:



Post a Comment:
Comments are closed for this entry.