Improving the ptrace(2) API and preparing for LLVM-10.0

January 13, 2020 posted by Kamil Rytarowski

This month I have improved the NetBSD ptrace(2) API, removing one legacy interface with a few flaws and replacing it with two new calls with new features, and removing technical debt.

As LLVM 10.0 is branching now soon (Jan 15th 2020), I worked on proper support of the LLVM features for NetBSD 9.0 (today RC1) and NetBSD HEAD (future 10.0).

ptrace(2) API changes

There are around 20 Machine Independent ptrace(2) calls. The origin of some of these calls trace back to BSD4.3. The PT_LWPINFO call was introduced in 2003 and was loosely inspired by a similar interface in HP-UX ttrace(2). As that was the early in the history of POSIX threads and SMP support, not every bit of the interface remained ideal for the current computing needs.

The PT_LWPINFO call was originally intended to retrieve the thread (LWP) information inside a traced process.

This call was designed to work as an iterator over threads to retrieve the LWP id + event information. The event information is received in a raw format (PL_EVENT_NONE, PL_EVENT_SIGNAL, PL_EVENT_SUSPENDED).


1. PT_LWPINFO shares the operation name with PT_LWPINFO from FreeBSD that works differently and is used for different purposes:

  • On FreeBSD PT_LWPINFO returns pieces of information for the suspended thread, not the next thread in the iteration.
  • FreeBSD uses a custom interface for iterating over threads (actually retrieving the threads is done with PT_GETNUMLWPS + PT_GETLWPLIST).
  • There is almost no overlapping correct usage of PT_LWPINFO on NetBSD and PL_LWPINFO on FreeBSD, and this causes confusion and misuse of the interfaces (recently I fixed such misuse in the DTrace code).

2. pl_event can only return whether a signal was emitted to all threads or a single one. There is no information whether this is a per-LWP signal or per-PROC signal, no siginfo_t information is attached etc.

3. Syncing our behavior with FreeBSD would mean complete breakage of our PT_LWPINFO users and it is actually unnecessary, as we receive full siginfo_t through Linux-like PT_GET_SIGINFO, instead of reimplementing siginfo_t inside ptrace_lwpinfo in FreeBSD-style. (FreeBSD wanted to follow NetBSD and adopt some of our APIs in ptrace(2) and signals.).

4. Our PT_LWPINFO is unable to list LWP ids in a traced process.

5. The PT_LWPINFO semantics cannot be used in core files as-is (as our PT_LPWINFO returns next LWP, not the indicated one) and pl_event is redundant with netbsd_elfcore_procinfo.cpi_siglwp, and still less powerful (as it cannot distinguish between a per-LWP and a per-PROC signal in a single-threaded application).

6. PT_LWPINFO is already documented in the BUGS section of ptrace(2), as it contains additional flaws.


1. Remove PT_LWPINFO from the public ptrace(2) API, keeping it only as a hidden namespaced symbol for legacy compatibility.

2. Introduce the PT_LWPSTATUS that prompts the kernel about exact thread and retrieves useful information about LWP.

3. Introduce PT_LWPNEXT with the iteration semantics from PT_LWPINFO, namely return the next LWP.

4. Include per-LWP information in core(5) files as "PT_LWPSTATUS@nnn".

5. Fix flattening the signal context in netbsd_elfcore_procinfo in core(5) files, and move per-LWP signal information to the per-LWP structure "PT_LWPSTATUS@nnn".

6. Do not bother with FreeBSD like PT_GETNUMLWPS + PT_GETLWPLIST calls, as this is a micro-optimization. We intend to retrieve the list of threads once on attach/exec and later trace them through the LWP events (PTRACE_LWP_CREATE, PTRACE_LWP_EXIT). It's more important to keep compatibility with current usage of PT_LWPINFO.

7. Keep the existing ATF tests for PT_LWPINFO to avoid rot.

PT_LWPSTATUS and PT_LWPNEXT operate over newly introduced "struct ptrace_lwpstatus". This structure is inspired by: - SmartOS lwpstatus_t, - struct ptrace_lwpinfo from NetBSD, - struct ptrace_lwpinfo from FreeBSD

and their usage in real existing open-source software.

#define PL_LNAMELEN 20 /* extra 4 for alignment */

struct ptrace_lwpstatus {
 lwpid_t  pl_lwpid;  /* LWP described */
 sigset_t pl_sigpend;  /* LWP signals pending */
 sigset_t pl_sigmask;  /* LWP signal mask */
 char  pl_name[PL_LNAMELEN]; /* LWP name, may be empty */
 void  *pl_private;  /* LWP private data */
 /* Add fields at the end */

  • pt_lwpid is picked from PT_LWPINFO.
  • pl_event is removed entirely as useless, misleading and harmful.
  • pl_sigpend and pl_sigmask are mainly intended to untangle the cpi_sig* fields from "struct ptrace_lwpstatus" (fix "XXX" in the kernel code).
  • pl_name is an easy to use API to retrieve the LWP name, replacing sysctl() retrieval. (Previous algorithm: retrieve the number of LWPs, retrieve all LWPs; iterate over them; finding the matching ID; copy the LWP name.) pl_name will also be included with the missing LWP name information in core(5) files.
  • pl_private implements currently missing interface to read the TLS base value.

I have decided to avoid a writable version of PT_LWPSTATUS that rewrites signals, name, or private pointer. These options are practically unused in existing open-source software. There are two exceptions that I am familiar with, but both are specific to kludges overusing ptrace(2). If these operations are needed, they can be implemented without a writable version of PT_LWPSTATUS, patching tracee's code.

I have switched GDB (in base), LLDB, picotrace and sanitizers to the new API. As NetBSD 9.0 is nearing release, this API change will land NetBSD 10.0 and existing ptrace(2) software will use PT_LWPINFO for now.

New interfaces are ensured to be stable and continuously verified by the ATF infrastructure.


In the early in the history of libpthread, the NetBSD developers designed and programmed a libpthread_dbg library. It's use-case was initially intended to handle user-space scheduling of threads in the M:N threading model inspired by Solaris.

After the switch of the internals to new SMP design (1:1 model) by Andrew Doran, this library lost its purpose and was no longer used (except being linked for some time in a local base system GDB version). I removed the libpthread_dbg when I modernized the ptrace(2) API, as it no longer had any use (and it was broken in several ways for years without being noticed).

As I have introduced the PT_LWPSTATUS call, I have decided to verify this interface in a fancy way. I have mapped ptrace_lwpstatus::pl_private into the tls_base structure as it is defined in the sys/tls.h header:

struct tls_tcb {   
        void    **tcb_dtv;
        void    *tcb_pthread;
        void    *tcb_self;
        void    **tcb_dtv;
        void    *tcb_pthread;

The pl_private pointer is in fact a pointer to a structure in debugger's address space, pointing to a tls_tcl structure. This is not true universally in every environment, but it is true in regular programs using the ELF loader and the libpthread library. Now, with the tcb_pthread field we can reference a regular C-style pthread_t object. Now, wrapping it into a real tracer, I have implemented a program that can either start a debuggee or attach to a process and on demand (as a SIGINFO handler, usually triggered in the BSD environment with ctrl-t) dump the full state of pthread_t objects within a process. A part of the example usage is below:

$ ./pthreadtracer -p `pgrep nslookup` 
[ 21088.9252645] load: 2.83  cmd: pthreadtracer 6404 [wait parked] 0.00u 0.00s 0% 1600k
DTV=0x7f7ff7ee70c8 TCB_PTHREAD=0x7f7ff7e94000
LID=4 NAME='sock-0' TLS_TSD=0x7f7ff7eed890
pt_self = 0x7f7ff7e94000
pt_tls = 0x7f7ff7eed890
pt_magic = 0x11110001 (= PT_MAGIC=0x11110001)
pt_state = 1
pt_lock = 0x0
pt_flags = 0
pt_cancel = 0
pt_errno = 35
pt_stack = {.ss_sp = 0x7f7fef9e0000, ss_size = 4194304, ss_flags = 0}
pt_stack_allocated = YES
pt_guardsize = 65536

Full log is stored here. The source code of this program, on top of picotrace is here.

The problem with this utility is that it requires libpthread sources available and reachable by the build rules. pthreadtracer reaches each field of pthread_t knowing its exact internal structure. This is enough for validation of PT_LWPSTATUS, but is it enough for shipping it to users and finding its real world use-case? Debuggers (GDB, LLDB) using debug information can reach the same data with DWARF, but supporting DWARF in pthreadtracer is currently harder than it ought to be for the interface tests. There is also an option to revive at some point libpthread_dbg(3), revamping it for modern libpthread(3), this would help avoid DWARF introspection and it could find some use in self-introspection programs, but are there any?


I keep searching for a solution to properly support lld (LLVM linker).

NetBSD's major issue with LLVM lld is the lack of standalone linker support, therefore being a real GNU ld replacement. I was forced to publish a standalone wrapper for lld, called lld-standalone and host it on GitHub for the time being, at least until we will sort out the talks with LLVM developers.

LLVM sanitizers

As the NetBSD code is evolving, there is a need to support multiple kernel versions starting from 9.0 with the LLVM sanitizers. I have introduced the following changes:

  • [compiler-rt] [netbsd] Switch to syscall for ThreadSelfTlsTcb()
  • [compiler-rt] [netbsd] Add support for versioned statvfs interceptors
  • [compiler-rt] Sync NetBSD ioctl definitions with 9.99.26
  • [compiler-rt] [fuzzer] Include stdarg.h for va_list
  • [compiler-rt] [fuzzer] Enable LSan in libFuzzer tests on NetBSD
  • [compiler-rt] Enable SANITIZER_CAN_USE_PREINIT_ARRAY on NetBSD
  • [compiler-rt] Adapt stop-the-world for ptrace changes in NetBSD-9.99.30
  • [compiler-rt] Adapt for ptrace(2) changes in NetBSD-9.99.30

The purpose of these changes is as follows:

  • Stop using internal interface to retrieve the tcl_tcb struct (TLS base) and switch to public API with the syscall _lwp_getprivate(2). While there, I have harmonized the namespacing of __lwp_getprivate_fast() and __lwp_gettcb_fast() in the NetBSD distribution. Now, every port will need to use the same define (-D_RTLD_SOURCE, -D_LIBC_SOURCE or -D__LIBPTHREAD_SOURCE__). Previously these interfaces were conflicting with the public namespaces (affecting kernel builds) and wrongly suggesting that these interfaces might be available to public third party code. Initially I used it in LLVM sanitizers, but switched it to full-syscall _lwp_getspecific().
  • Nowadays almost every mainstream OS implements support for preinit/initarray/finitarray in all ports, regardless of ABI requirements. NetBSD originally supported these features only when they were mandated by an ABI specification. Christos Zoulas in 2018 enabled these features for all CPUs, and this eventually allowed to enable this feature unconditionally for consumption in the sanitizer code. This allows use of the same interface as Linux or Solaris, rather than relying on C++-style constructors that have their own issues (need to abuse priorities of constructors and lack of guarantee that our code will be called before other constructors, which can be fatal).
  • Support for kernels between 9.0 and 9.99.30 (and later, unless there are breaking changes).

There is still one portability issue in the sanitizers, as we hard-code the offset of the link_map field within the internal dlopen handle pointer. The dlopen handler is internal to the ELF loader object of type Obj_Entry. This type is not available to third party code and it is not stable. It also has a different layout depending on the CPU architecture. The same problem exists for at least FreeBSD, and to some extent to Linux. I have prepared a patch that utilizes the dlinfo(3) call with option RTLD_DI_LINKMAP. Unfortunately there is a regression with MSan on NetBSD HEAD (it works on 9.0rc1) that makes it harder for me to finalize the patch. I suspect that after the switch to GCC 8, there is now incompatible behavior that causes a recursive call sequence: _Unwind_Backtrace() calling _Unwind_Find_FDE(), calling search_object, and triggering the __interceptor_malloc interceptor again, which calls _Unwind_Backtrace(), resulting in deadlock. The offending code is located in src/external/gpl3/gcc/dist/libgcc/unwind-dw2-fde.c and needs proper investigation. A quick workaround to stop recursive stack unwinding unfortunately did not work, as there is another (related?) problem:

==4629==MemorySanitizer CHECK failed:
/public/llvm-project/llvm/projects/compiler-rt/lib/msan/msan_origin.h:104 "((stack_id)) != (0)" (0x0, 0x0)

This shows that this low-level code is very sensitive to slight changes, and needs maintenance power. We keep improving the coverage of tested scenarios on the LLVM buildbot, and we enabled sanitizer tests on 9.0 NetBSD/amd64; however we could make use of more manpower in order to reach full Linux parity in the toolchain.

Other changes

As my project in LLVM and ptrace(2) is slowly concluding, I'm trying to finalize the related tasks that were left behind.

I've finished researching why we couldn't use syscall restart on kevent(2) call in LLDB and improved the system documentation on it. I have also fixed small nits in the NetBSD wiki page on kevent(2).

I have updated the list of ELF defines for CPUs and OS ABIs in sys/exec_elf.h.

Plan for the next milestone

Port remaining ptrace(2) test scenarios from Linux, FreeBSD and OpenBSD to ATF and ensure that they are properly operational.

This work was sponsored by The NetBSD Foundation.

The NetBSD Foundation is a non-profit organization and welcomes any donations to help us continue funding projects and services to the open-source community. Please consider visiting the following URL to chip in what you can: [0 comments]


Post a Comment:
  • HTML Syntax: NOT allowed