Improving the ptrace(2) API and preparing for LLVM-10.0
This month I have improved the NetBSD ptrace(2) API, removing one legacy interface with a few flaws and replacing it with two new calls with new features, and removing technical debt.
As LLVM 10.0 is branching soon (Jan 15th 2020), I worked on proper support of the LLVM features for NetBSD 9.0 (today RC1) and NetBSD HEAD (future 10.0).
ptrace(2) API changes
There are around 20 Machine Independent ptrace(2) calls.
The origin of some of these calls traces back to 4.3BSD.
The PT_LWPINFO call was introduced in 2003 and was loosely inspired by a similar interface in HP-UX ttrace(2).
As that was early in the history of POSIX threads and SMP support, not every bit of the interface remained ideal for current computing needs.
The PT_LWPINFO call was originally intended to retrieve the thread (LWP) information inside a traced process.
This call was designed to work as an iterator over threads to retrieve the LWP id + event information. The event information is received in a raw format (PL_EVENT_NONE, PL_EVENT_SIGNAL, PL_EVENT_SUSPENDED).
Problems:
1. PT_LWPINFO shares the operation name with PT_LWPINFO from FreeBSD that works differently and is used for different purposes:
  - On FreeBSD, PT_LWPINFO returns pieces of information for the suspended thread, not the next thread in the iteration.
  - FreeBSD uses a custom interface for iterating over threads (actually retrieving the threads is done with PT_GETNUMLWPS + PT_GETLWPLIST).
  - There is almost no overlapping correct usage of PT_LWPINFO on NetBSD and PT_LWPINFO on FreeBSD, and this causes confusion and misuse of the interfaces (recently I fixed such misuse in the DTrace code).
2. pl_event can only return whether a signal was emitted to all threads or a single one. There is no information whether this is a per-LWP signal or per-PROC signal, no siginfo_t information is attached etc.
3. Syncing our behavior with FreeBSD would mean complete breakage of our PT_LWPINFO users and it is actually unnecessary, as we receive full siginfo_t through the Linux-like PT_GET_SIGINFO, instead of reimplementing siginfo_t inside ptrace_lwpinfo in FreeBSD-style. (FreeBSD wanted to follow NetBSD and adopt some of our APIs in ptrace(2) and signals.)
4. Our PT_LWPINFO is unable to list LWP ids in a traced process.
5. The PT_LWPINFO semantics cannot be used in core files as-is (as our PT_LWPINFO returns the next LWP, not the indicated one) and pl_event is redundant with netbsd_elfcore_procinfo.cpi_siglwp, and still less powerful (as it cannot distinguish between a per-LWP and a per-PROC signal in a single-threaded application).
6. PT_LWPINFO is already documented in the BUGS section of ptrace(2), as it contains additional flaws.
Solution:
1. Remove PT_LWPINFO from the public ptrace(2) API, keeping it only as a hidden namespaced symbol for legacy compatibility.
2. Introduce PT_LWPSTATUS, which queries the kernel about a specific thread and retrieves useful information about that LWP.
3. Introduce PT_LWPNEXT with the iteration semantics from PT_LWPINFO, namely return the next LWP.
4. Include per-LWP information in core(5) files as "PT_LWPSTATUS@nnn".
5. Fix flattening the signal context in netbsd_elfcore_procinfo in core(5) files, and move per-LWP signal information to the per-LWP structure "PT_LWPSTATUS@nnn".
6. Do not bother with FreeBSD-like PT_GETNUMLWPS + PT_GETLWPLIST calls, as this is a micro-optimization. We intend to retrieve the list of threads once on attach/exec and later trace them through the LWP events (PTRACE_LWP_CREATE, PTRACE_LWP_EXIT). It's more important to keep compatibility with current usage of PT_LWPINFO.
7. Keep the existing ATF tests for PT_LWPINFO to avoid rot.
PT_LWPSTATUS and PT_LWPNEXT operate over the newly introduced "struct ptrace_lwpstatus". This structure is inspired by:
- SmartOS lwpstatus_t,
- struct ptrace_lwpinfo from NetBSD,
- struct ptrace_lwpinfo from FreeBSD
and their usage in real existing open-source software.
#define PL_LNAMELEN 20	/* extra 4 for alignment */

struct ptrace_lwpstatus {
	lwpid_t		pl_lwpid;		/* LWP described */
	sigset_t	pl_sigpend;		/* LWP signals pending */
	sigset_t	pl_sigmask;		/* LWP signal mask */
	char		pl_name[PL_LNAMELEN];	/* LWP name, may be empty */
	void		*pl_private;		/* LWP private data */
	/* Add fields at the end */
};
- pl_lwpid is picked from PT_LWPINFO.
- pl_event is removed entirely as useless, misleading and harmful.
- pl_sigpend and pl_sigmask are mainly intended to untangle the cpi_sig* fields from "struct ptrace_lwpstatus" (fix "XXX" in the kernel code).
- pl_name is an easy-to-use API to retrieve the LWP name, replacing sysctl() retrieval. (Previous algorithm: retrieve the number of LWPs; retrieve all LWPs; iterate over them; find the matching ID; copy the LWP name.) pl_name will also supply the currently missing LWP name information in core(5) files.
- pl_private implements the currently missing interface to read the TLS base value.
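The difference between the two calls (describe exactly the named LWP versus describe the one after it) can be sketched with a small in-memory model. This is not kernel or ptrace(2) code: the struct is a reduced stand-in for struct ptrace_lwpstatus, every `model_` name is hypothetical, and the convention that an LWP id of 0 both starts the walk and marks its end is assumed to carry over from PT_LWPINFO.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_LNAMELEN 20

/* Reduced stand-in for struct ptrace_lwpstatus (signal sets omitted). */
struct model_lwpstatus {
	int	pl_lwpid;			/* LWP described; 0 means "none" */
	char	pl_name[MODEL_LNAMELEN];	/* LWP name, may be empty */
	void	*pl_private;			/* LWP private data (TLS base) */
};

/* Hypothetical stand-in for the list of LWPs of a traced process. */
struct model_proc {
	const struct model_lwpstatus *lwps;
	size_t nlwps;
};

/* PT_LWPSTATUS-like: describe exactly the LWP named in pl->pl_lwpid. */
int
model_lwpstatus(const struct model_proc *p, struct model_lwpstatus *pl)
{
	assert(p != NULL && pl != NULL);
	for (size_t i = 0; i < p->nlwps; i++) {
		if (p->lwps[i].pl_lwpid == pl->pl_lwpid) {
			*pl = p->lwps[i];
			return 0;
		}
	}
	return -1;	/* no such LWP */
}

/*
 * PT_LWPNEXT-like: describe the LWP that follows pl->pl_lwpid.
 * An input id of 0 starts the walk; an output id of 0 ends it.
 */
int
model_lwpnext(const struct model_proc *p, struct model_lwpstatus *pl)
{
	size_t next = 0;

	assert(p != NULL && pl != NULL);
	if (pl->pl_lwpid != 0) {
		size_t i;
		for (i = 0; i < p->nlwps; i++)
			if (p->lwps[i].pl_lwpid == pl->pl_lwpid)
				break;
		if (i == p->nlwps)
			return -1;	/* starting point does not exist */
		next = i + 1;
	}
	if (next < p->nlwps)
		*pl = p->lwps[next];
	else
		memset(pl, 0, sizeof(*pl));	/* end of list */
	return 0;
}

/* Count the threads the way a tracer might do once on attach. */
size_t
model_count_lwps(const struct model_proc *p)
{
	struct model_lwpstatus pl;
	size_t n = 0;

	memset(&pl, 0, sizeof(pl));
	while (model_lwpnext(p, &pl) == 0 && pl.pl_lwpid != 0)
		n++;
	return n;
}
```

In a real tracer the two model functions would be replaced by ptrace(2) requests against the traced process; the loop shape in model_count_lwps() is the part that is meant to carry over.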
I have decided to avoid a writable version of PT_LWPSTATUS that rewrites signals, name, or private pointer. These options are practically unused in existing open-source software. There are two exceptions that I am familiar with, but both are specific to kludges overusing ptrace(2). If these operations are needed, they can be implemented without a writable version of PT_LWPSTATUS, by patching the tracee's code.
I have switched GDB (in base), LLDB, picotrace and sanitizers to the new API.
As NetBSD 9.0 is nearing release, this API change will land in NetBSD 10.0, and existing ptrace(2) software will use PT_LWPINFO for now.
New interfaces are ensured to be stable and continuously verified by the ATF infrastructure.
pthreadtracer
Early in the history of libpthread, the NetBSD developers designed and programmed a libpthread_dbg library.
Its use case was initially to handle user-space scheduling of threads in the M:N threading model inspired by Solaris.
After the switch of the internals to the new SMP design (1:1 model) by Andrew Doran, this library lost its purpose and was no longer used (except being linked for some time in a local base system GDB version).
I removed libpthread_dbg when I modernized the ptrace(2) API, as it no longer had any use (and it had been broken in several ways for years without being noticed).
As I have introduced the PT_LWPSTATUS call, I have decided to verify this interface in a fancy way. I have mapped ptrace_lwpstatus::pl_private into the tls_tcb structure as it is defined in the sys/tls.h header:
struct tls_tcb {
#ifdef __HAVE_TLS_VARIANT_I
	void	**tcb_dtv;
	void	*tcb_pthread;
#else
	void	*tcb_self;
	void	**tcb_dtv;
	void	*tcb_pthread;
#endif
};
The pl_private pointer is in fact a pointer into the debuggee's address space, pointing to a tls_tcb structure.
This is not true universally in every environment, but it is true in regular programs using the ELF loader and the libpthread library.
Now, with the tcb_pthread field we can reference a regular C-style pthread_t object.
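As a rough sketch of that step, a tracer can compute where the tcb_pthread slot lives in the traced process and decode it from the bytes it copies out. The struct below is a local copy of the layout quoted above, renamed so it does not clash with sys/tls.h. Both function names are hypothetical, the copy from the traced process (for example via a ptrace(2) I/O request) is left out, and the tracer and the traced process are assumed to share the same ABI and pointer size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local copy of the tls_tcb layout; the name is changed on purpose. */
struct model_tls_tcb {
#ifdef __HAVE_TLS_VARIANT_I
	void	**tcb_dtv;
	void	*tcb_pthread;
#else
	void	*tcb_self;
	void	**tcb_dtv;
	void	*tcb_pthread;
#endif
};

/* Address, in the traced process, of the tcb_pthread slot. */
uintptr_t
tcb_pthread_slot(uintptr_t pl_private)
{
	return pl_private + offsetof(struct model_tls_tcb, tcb_pthread);
}

/*
 * Decode tcb_pthread from raw bytes copied out of the traced process.
 * Returns 0 on success, -1 if the buffer is too short.
 */
int
tcb_pthread_from_bytes(const void *buf, size_t len, uintptr_t *out)
{
	struct model_tls_tcb tcb;

	assert(buf != NULL && out != NULL);
	if (len < sizeof(tcb))
		return -1;
	memcpy(&tcb, buf, sizeof(tcb));
	*out = (uintptr_t)tcb.tcb_pthread;
	return 0;
}
```

The returned value is an address that is only meaningful inside the traced process, so reading the pthread_t fields requires another copy from that address.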
Now, wrapping it into a real tracer, I have implemented a program that can either start a debuggee or attach to a process and, on demand (as a SIGINFO handler, usually triggered in the BSD environment with ctrl-t), dump the full state of pthread_t objects within a process. A part of the example usage is below:
$ ./pthreadtracer -p `pgrep nslookup`
[ 21088.9252645] load: 2.83 cmd: pthreadtracer 6404 [wait parked] 0.00u 0.00s 0% 1600k
DTV=0x7f7ff7ee70c8 TCB_PTHREAD=0x7f7ff7e94000 LID=4 NAME='sock-0' TLS_TSD=0x7f7ff7eed890
pt_self = 0x7f7ff7e94000
pt_tls = 0x7f7ff7eed890
pt_magic = 0x11110001 (= PT_MAGIC=0x11110001)
pt_state = 1
pt_lock = 0x0
pt_flags = 0
pt_cancel = 0
pt_errno = 35
pt_stack = {.ss_sp = 0x7f7fef9e0000, ss_size = 4194304, ss_flags = 0}
pt_stack_allocated = YES
pt_guardsize = 65536
The full log is stored here. The source code of this program, built on top of picotrace, is here.
The problem with this utility is that it requires the libpthread sources to be available and reachable by the build rules.
pthreadtracer reaches each field of pthread_t knowing its exact internal structure.
This is enough for validation of PT_LWPSTATUS, but is it enough for shipping it to users and finding a real-world use case?
Debuggers (GDB, LLDB) using debug information can reach the same data with DWARF, but supporting DWARF in pthreadtracer is currently harder than it ought to be for the interface tests.
There is also an option to revive libpthread_dbg(3) at some point, revamping it for modern libpthread(3). This would help avoid DWARF introspection, and it could find some use in self-introspection programs, but are there any?
LLD
I keep searching for a solution to properly support lld (the LLVM linker).
NetBSD's major issue with LLVM lld is the lack of standalone linker support, which prevents it from being a real GNU ld replacement. I was forced to publish a standalone wrapper for lld, called lld-standalone, and host it on GitHub for the time being, at least until we sort out the talks with LLVM developers.
LLVM sanitizers
As the NetBSD code is evolving, there is a need to support multiple kernel versions starting from 9.0 with the LLVM sanitizers. I have introduced the following changes:
- [compiler-rt] [netbsd] Switch to syscall for ThreadSelfTlsTcb()
- [compiler-rt] [netbsd] Add support for versioned statvfs interceptors
- [compiler-rt] Sync NetBSD ioctl definitions with 9.99.26
- [compiler-rt] [fuzzer] Include stdarg.h for va_list
- [compiler-rt] [fuzzer] Enable LSan in libFuzzer tests on NetBSD
- [compiler-rt] Enable SANITIZER_CAN_USE_PREINIT_ARRAY on NetBSD
- [compiler-rt] Adapt stop-the-world for ptrace changes in NetBSD-9.99.30
- [compiler-rt] Adapt for ptrace(2) changes in NetBSD-9.99.30
The purpose of these changes is as follows:
- Stop using an internal interface to retrieve the tls_tcb struct (TLS base) and switch to the public API with the syscall _lwp_getprivate(2). While there, I have harmonized the namespacing of __lwp_getprivate_fast() and __lwp_gettcb_fast() in the NetBSD distribution. Now, every port will need to use the same define (-D_RTLD_SOURCE, -D_LIBC_SOURCE or -D__LIBPTHREAD_SOURCE__). Previously these interfaces conflicted with the public namespaces (affecting kernel builds) and wrongly suggested that they might be available to public third-party code. Initially I used it in LLVM sanitizers, but switched it to the full-syscall _lwp_getprivate().
- Nowadays almost every mainstream OS implements support for preinit/initarray/finitarray in all ports, regardless of ABI requirements. NetBSD originally supported these features only when they were mandated by an ABI specification. Christos Zoulas in 2018 enabled these features for all CPUs, and this eventually allowed us to enable this feature unconditionally for consumption in the sanitizer code. This allows use of the same interface as Linux or Solaris, rather than relying on C++-style constructors that have their own issues (the need to abuse constructor priorities, and the lack of a guarantee that our code will be called before other constructors, which can be fatal).
- Support for kernels between 9.0 and 9.99.30 (and later, unless there are breaking changes).
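The preinit array mechanism mentioned above can be illustrated with a short sketch. It is not the sanitizer code: the names are hypothetical, and it assumes an ELF executable built with GCC or Clang (the .preinit_array section is only honored in the main executable, not in shared libraries). An ordinary constructor is added only to observe that the preinit entry ran before it.

```c
static int early_init_done;
static int ctor_saw_early_init;

/* Runs from the preinit array, before ordinary constructors. */
static void
early_init(void)
{
	early_init_done = 1;
}

/* Place a pointer to early_init() into the .preinit_array section. */
__attribute__((section(".preinit_array"), used))
static void (*early_init_entry)(void) = early_init;

/* An ordinary constructor, present only to record the ordering. */
__attribute__((constructor))
static void
ordinary_ctor(void)
{
	ctor_saw_early_init = early_init_done;
}

/* Returns 1 if early_init() ran, and ran before the constructor. */
int
early_init_ran_first(void)
{
	return early_init_done && ctor_saw_early_init;
}
```

No priority has to be attached to the constructor for this ordering to hold, which is the property the sanitizer runtime relies on.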
There is still one portability issue in the sanitizers, as we hard-code the offset of the link_map field within the internal dlopen handle pointer.
The dlopen handle is an object internal to the ELF loader, of type Obj_Entry.
This type is not available to third-party code and it is not stable.
It also has a different layout depending on the CPU architecture.
The same problem exists for at least FreeBSD, and to some extent for Linux.
I have prepared a patch that utilizes the dlinfo(3) call with option RTLD_DI_LINKMAP.
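A minimal sketch of that approach, not the patch itself: ask the dynamic linker for the link_map that belongs to a dlopen() handle instead of reading it at a hard-coded offset. The helper name is hypothetical. The _GNU_SOURCE define is there because glibc only declares dlinfo() when it is set, and older glibc versions may additionally need -ldl at link time.

```c
#define _GNU_SOURCE	/* glibc declares dlinfo() only with this set */
#include <assert.h>
#include <dlfcn.h>
#include <link.h>
#include <stddef.h>

/* Return the link_map behind a dlopen() handle, or NULL on failure. */
struct link_map *
handle_to_link_map(void *handle)
{
	struct link_map *map = NULL;

	assert(handle != NULL);
	if (dlinfo(handle, RTLD_DI_LINKMAP, &map) != 0)
		return NULL;
	return map;
}
```

For example, `handle_to_link_map(dlopen(NULL, RTLD_LAZY))` is expected to yield the entry for the main program, without any knowledge of the loader's internal object layout.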
Unfortunately there is a regression with MSan on NetBSD HEAD (it works on 9.0rc1) that makes it harder for me to finalize the patch.
I suspect that after the switch to GCC 8, there is now incompatible behavior that causes a recursive call sequence: _Unwind_Backtrace() calling _Unwind_Find_FDE(), calling search_object, and triggering the __interceptor_malloc interceptor again, which calls _Unwind_Backtrace(), resulting in deadlock.
The offending code is located in src/external/gpl3/gcc/dist/libgcc/unwind-dw2-fde.c and needs proper investigation.
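The shape of that cycle can be modelled in a few lines, together with a per-thread re-entrancy guard, which is one common way to break this kind of recursion. This is purely illustrative: it is neither the MSan interceptor nor libgcc code, every `model_` name is hypothetical, and the guard is not claimed to be the workaround that was actually tried.

```c
#include <stddef.h>
#include <stdlib.h>

static _Thread_local int in_interceptor;	/* per-thread guard */
static int backtraces_taken;			/* counters for the demo */
static int depth, max_depth;

void *model_interceptor_malloc(size_t size);

/* Stand-in for the unwinder: it allocates while collecting a trace. */
static void
model_unwind_backtrace(void)
{
	backtraces_taken++;
	free(model_interceptor_malloc(16));
}

/* Stand-in for a malloc interceptor that records a stack trace. */
void *
model_interceptor_malloc(size_t size)
{
	void *p;

	depth++;
	if (depth > max_depth)
		max_depth = depth;
	if (!in_interceptor) {
		/* Outermost call on this thread: take the backtrace. */
		in_interceptor = 1;
		model_unwind_backtrace();
		in_interceptor = 0;
	}
	/* Nested calls skip the backtrace and only allocate. */
	p = malloc(size);
	depth--;
	return p;
}
```

Without the in_interceptor check, model_unwind_backtrace() and model_interceptor_malloc() would call each other without bound.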
A quick workaround to stop recursive stack unwinding unfortunately did not work, as
there is another (related?) problem:
==4629==MemorySanitizer CHECK failed: /public/llvm-project/llvm/projects/compiler-rt/lib/msan/msan_origin.h:104 "((stack_id)) != (0)" (0x0, 0x0)
This shows that this low-level code is very sensitive to slight changes, and needs maintenance power. We keep improving the coverage of tested scenarios on the LLVM buildbot, and we enabled sanitizer tests on 9.0 NetBSD/amd64; however we could make use of more manpower in order to reach full Linux parity in the toolchain.
Other changes
As my project in LLVM and ptrace(2) is slowly concluding, I'm trying to finalize the related tasks that were left behind.
I've finished researching why we couldn't use syscall restart on the kevent(2) call in LLDB and improved the system documentation on it.
I have also fixed small nits in the NetBSD wiki page on kevent(2).
I have updated the list of ELF defines for CPUs and OS ABIs in sys/exec_elf.h.
Plan for the next milestone
Port remaining ptrace(2) test scenarios from Linux, FreeBSD and OpenBSD to ATF and ensure that they are properly operational.
This work was sponsored by The NetBSD Foundation.
The NetBSD Foundation is a non-profit organization and welcomes any donations to help us continue funding projects and services to the open-source community. Please consider visiting the following URL to chip in what you can:
http://netbsd.org/donations/#how-to-donate