Work-in-progress threading support in LLDB
Upstream describes LLDB as a next generation, high-performance debugger. It is built on top of LLVM/Clang toolchain, and features great integration with it. At the moment, it primarily supports debugging C, C++ and ObjC code, and there is interest in extending it to more languages.
In February, I started working on LLDB, as contracted by the NetBSD Foundation. So far I've been working on reenabling continuous integration, squashing bugs, improving NetBSD core file support, extending NetBSD's ptrace interface to cover more register types and fix compat32 issues, and lately fixing watchpoint support. You can read more about that in my June 2019 report.
My work in July focused on improving support for NetBSD threads in LLDB. This involved a lot of debugging and fighting hanging tests, and I have decided to delay committing the results until I manage to provide fixes for all the immediate issues.
Buildbot updates
During July, upstream made two breaking changes to the build system:
-
Automatic switching to libc++abi when present in the build tree was initially removed in D63883. I needed to force it explicitly because of this. However, upstream eventually reverted the change.
-
LLDB has enabled Python 3 support, and started requiring SWIG 2+ in the process (D64782). We had to upgrade SWIG on the build host, and eventually switched to Python 3 as well.
As a result of earlier watchpoint fixes, a number of new tests started running. Because multithreading support is still lacking, I had to XFAIL a number of LLDB tests in r365338.
A few days later, upstream fixed the issue causing TestFormattersSBAPI to fail. I un-XFAILed it in r365991.
The breaking xfer:libraries-svr4:read change was reapplied and broke the NetBSD process plugin again. I've reapplied my earlier fix as r366889.
Lit maintainers broke NetBSD support in tests by starting to use the env -u VAR syntax in r366980. The -u switch is not specified by POSIX, and not supported by NetBSD env(1). In order to fix the problem, I've changed FileCheck's behavior to consider an empty envvar as equivalent to disabled (r367122), and then switched lit to set both envvars to empty instead (r367123).
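The idea behind this workaround can be sketched in a few lines of Python. This is only an illustration of the approach, not the actual FileCheck or lit code: the helper names and the variable name EXAMPLE_OPTS are hypothetical.

```python
def option_enabled(environ, name):
    """Treat an unset variable and an empty one identically: as disabled."""
    return bool(environ.get(name, ""))


def env_with_disabled(base_env, names):
    """Return a copy of base_env with each variable in names set to "".

    Setting a variable to the empty string is portable, whereas removing
    it with `env -u NAME` relies on a switch POSIX does not specify.
    """
    env = dict(base_env)
    for name in names:
        env[name] = ""
    return env


# EXAMPLE_OPTS is a hypothetical variable name used only for this sketch.
child_env = env_with_disabled({"EXAMPLE_OPTS": "-v", "PATH": "/bin"},
                              ["EXAMPLE_OPTS"])
```

The resulting dictionary could then be passed to a child process, for example through the env argument of subprocess.run(), and the consumer would see the option as disabled.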
Finally, I've investigated a number of new test failures by the end of the month:
-
New functionalities/signal/handle-abrt test was added in r366580. Since trampolines are not properly supported at the moment, I've marked it XFAIL in r367228.
-
Two functionalities/exec tests started failing since upstream fixed @skipIfSanitized that previously caused the test to be skipped unconditionally (r366903). Since it's not a regression, I've marked it XFAIL in r367228.
-
Same happened for one of the python_api/hello_world tests. It was clearly related to another failing test, so I've marked it XFAIL in r367285.
-
Two tools/lldb-vscode tests were failing since upstream compared realpath'd path with normal path, and our build path happens to include symlinks as one of the parent directories. I've fixed the test to compare realpath in r367291. While at it, I've replaced weird os.path.split(...)[0] with clearer os.path.dirname(...) as suggested by Pavel Labath, in r367290.
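The lldb-vscode path problem is easy to reproduce in isolation. The following is a minimal sketch, not the code from the actual tests: the helper names and the file name main.c are made up for illustration, and it assumes a platform where os.symlink() is available.

```python
import os
import tempfile


def same_location(path_a, path_b):
    """Compare two paths after resolving symlinks in both of them."""
    return os.path.realpath(path_a) == os.path.realpath(path_b)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        real_dir = os.path.join(tmp, "real")
        os.mkdir(real_dir)
        link_dir = os.path.join(tmp, "link")
        os.symlink(real_dir, link_dir)  # a symlinked parent directory

        via_link = os.path.join(link_dir, "main.c")
        via_real = os.path.join(real_dir, "main.c")

        # Plain string comparison fails although both name the same file.
        naive_equal = via_link == via_real
        # Resolving symlinks on both sides makes the comparison robust.
        resolved_equal = same_location(via_link, via_real)
        # dirname() returns the same value as split()[0], but reads clearer.
        same_parent = os.path.dirname(via_link) == os.path.split(via_link)[0]
        return naive_equal, resolved_equal, same_parent
```

Calling demo() returns a tuple of three booleans: whether the naive comparison matched, whether the realpath comparison matched, and whether the two ways of taking the parent directory agree.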
NetBSD ptrace() interfaces for thread control
NetBSD currently provides two different methods for thread-related operations.
The legacy method consists of the following requests:
-
PT_CONTINUE with negative data argument. It is used to resume execution of a single thread while suspending all other threads.
-
PT_STEP with positive data argument. It is used to single-step the specified thread, while all other threads continue execution.
-
PT_STEP with negative data argument. It is used to single-step the specified thread, while all other threads remain suspended.
This means that using those methods, you can effectively either:
-
run all threads, and optionally send signal to the process,
-
run one thread, while keeping other threads suspended,
-
single-step one thread, with all other threads either running or being suspended as a whole.
Furthermore, it is impossible to combine single-stepping with syscall tracing via PT_SYSCALL.
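To make the encoding of the legacy interface more tangible, here is a small Python model of it. It does not call ptrace() at all; it only returns the request name and data argument that would correspond to each mode listed above. The function name and the mode strings are hypothetical, and the use of a non-negative data value as the signal number for PT_CONTINUE is my reading of ptrace(2) rather than something shown in this post.

```python
def legacy_request(mode, lwp_id=None, signal=0):
    """Model the (request, data) pair of the legacy thread interface.

    mode is one of the hypothetical strings used in this sketch:
    "run_all", "run_one", "step_others_running", "step_others_suspended".
    """
    if mode == "run_all":
        # Assumption: non-negative data is the signal to deliver (0 = none).
        return ("PT_CONTINUE", signal)
    if lwp_id is None or lwp_id <= 0:
        raise ValueError("a positive LWP ID is required for mode %r" % mode)
    if mode == "run_one":
        # Negative data: resume only this thread, keep the others suspended.
        return ("PT_CONTINUE", -lwp_id)
    if mode == "step_others_running":
        # Positive data: step this thread, the others keep running.
        return ("PT_STEP", lwp_id)
    if mode == "step_others_suspended":
        # Negative data: step this thread, the others stay suspended.
        return ("PT_STEP", -lwp_id)
    raise ValueError("mode %r cannot be expressed with the legacy API" % mode)
```

The model makes the limitation visible: every mode involves at most one thread ID, so per-thread combinations such as stepping two threads at once cannot be expressed.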
The new method introduced by Kamil Rytarowski during his ptrace(2) work is more flexible, and includes the following requests:
-
PT_RESUME that sets the specified thread to continue running after PT_CONTINUE.
-
PT_SUSPEND that sets the specified thread to remain suspended after PT_CONTINUE.
-
PT_SETSTEP that enables single-stepping for the specified thread after PT_CONTINUE.
-
PT_CLEARSTEP that disables single-stepping for the specified thread.
Using the new API, it is possible to control both execution and single-stepping per thread, and to combine syscall tracing with that. It is also possible to deliver a single signal either to the whole process or to one of the threads.
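As a rough illustration of how the new requests combine, the following Python sketch turns a per-thread action table into a sequence of ptrace requests. It only models the sequence and performs no system calls. The function name, the action strings, and the choice to always issue PT_CLEARSTEP for threads that merely run are assumptions of this sketch, and per-thread signal delivery is not modeled.

```python
def plan_resume(actions, signal=0):
    """Return the list of (request, argument) pairs for one resume.

    actions maps an LWP ID to "run", "step" or "suspend" (hypothetical
    names for this sketch). The per-thread requests are followed by a
    single PT_CONTINUE for the whole process.
    """
    for lwp_id, action in actions.items():
        if action not in ("run", "step", "suspend"):
            raise ValueError("unknown action %r for LWP %d" % (action, lwp_id))

    calls = []
    for lwp_id, action in sorted(actions.items()):
        if action == "suspend":
            calls.append(("PT_SUSPEND", lwp_id))
            continue
        calls.append(("PT_RESUME", lwp_id))
        if action == "step":
            calls.append(("PT_SETSTEP", lwp_id))
        else:
            # Assumption: clear any stepping left over from a previous resume.
            calls.append(("PT_CLEARSTEP", lwp_id))
    calls.append(("PT_CONTINUE", signal))
    return calls
```

For example, plan_resume({1: "run", 2: "step", 3: "suspend"}) describes a resume in which thread 1 runs freely, thread 2 single-steps and thread 3 stays stopped, which the legacy interface cannot express.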
Implementing threading in LLDB NetBSD process plugin
When I started my work, the support for threads in the NetBSD plugin was minimal. Technically, the code had the structures needed to keep track of threads and filled them in at start. However, it did not register new or terminated threads, and it did not support per-thread execution control.
The first change necessary was therefore to implement support for reporting new and terminated threads. I've prepared an initial patch in D65555. With this patch enabled, the thread list command now correctly reports the list of threads at any moment.
The second change necessary is to fix process resuming routine to support multiple threads properly. The routine is passed a data structure containing requested action for each thread. The old code simply took the action for the first thread, and applied it to the whole process. D64647 is my work-in-progress attempt at using the new ptrace calls to apply correct action for each thread.
However, the patch is currently buggy as it assumes that LLDB should provide an explicit eStateSuspended action for each thread that is supposed to be suspended. The current LLDB implementation, on the other hand, assumes that a thread should be suspended if no action is specified for it. I am currently discussing with upstream whether the current approach is correct, or should be changed to the explicit eStateSuspended usage.
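The difference between the two interpretations can be shown with a small Python model. It is not LLDB code: the function name and the action strings are hypothetical, and treating a thread without an action as "run" under the explicit interpretation is an assumption made only to show the contrast.

```python
def effective_actions(thread_ids, requested, implicit_suspend):
    """Fill in actions for threads that the request does not mention.

    implicit_suspend=True models the current LLDB behavior described
    above: a thread with no action is suspended. implicit_suspend=False
    models the patch's assumption that suspension is always requested
    explicitly, so an unmentioned thread is left running (an assumption
    of this sketch).
    """
    default = "suspend" if implicit_suspend else "run"
    return {tid: requested.get(tid, default) for tid in thread_ids}


threads = [1, 2, 3]
request = {2: "step"}  # only thread 2 has an explicit action

lldb_view = effective_actions(threads, request, implicit_suspend=True)
patch_view = effective_actions(threads, request, implicit_suspend=False)
```

With the same request, the first interpretation steps thread 2 while threads 1 and 3 stay stopped, whereas the second lets threads 1 and 3 run, which is why the mismatch shows up as misbehaving tests.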
The third change necessary is that we need to explicitly copy debug registers to newly created threads, in order to enable watchpoints on them. However, I haven't gotten to writing a patch for this yet.
Fixing nasty process interrupt bug
While debugging my threading code, I've hit a nasty bug in LLDB. After issuing the process interrupt command from a remote LLDB session, the server terminated. After putting a lot of effort into debugging why the server terminates with no obvious error, I've discovered that it's terminating because… the client has disconnected.
Further investigation with the help of Pavel Labath uncovered that the client is silently disconnecting because it expects a packet indicating that the process has stopped, and times out waiting for it. In order to make the server send this packet, the NetBSD process plugin needed to explicitly mark the process as stopped in the SIGSTOP handler. I've fixed it in r367047.
Future plans
The initial 6 months of my LLDB contract have passed. I am currently taking a month's break from the work, then I will resume it for 3 more months. During that time, I will continue working on threading support and my remaining goals.
The remaining TODO items are:
-
Add support to backtrace through signal trampoline and extend the support to libexecinfo, unwind implementations (LLVM, nongnu). Examine adding CFI support to interfaces that need it to provide more stable backtraces (both kernel and userland).
-
Add support for i386 and aarch64 targets.
-
Stabilize LLDB and address breaking tests from the test suite.
-
Merge LLDB with the base system (under LLVM-style distribution).
This work is sponsored by The NetBSD Foundation
The NetBSD Foundation is a non-profit organization and welcomes any donations to help us continue funding projects and services to the open-source community. Please consider visiting the following URL to chip in what you can:
https://netbsd.org/donations/#how-to-donate