LLDB restoration and return to ptrace(2)


March 01, 2018 posted by Kamil Rytarowski

I've managed to unbreak the LLDB debugger as much as possible with the current kernel and hit problems with ptrace(2) that are causing issues with further work on proper NetBSD support. Meanwhile, I've upstreamed all the planned NetBSD patches to sanitizers and helped other BSDs to gain better or initial support.

LLDB

Since the last time I worked on LLDB, we have introduced many changes to the kernel interfaces (most notably related to signals) that apparently fixed some bugs in Go and introduced regressions in ptrace(2). Some of the regressions were caught by the existing ATF tests. However, the breakage was only marked as a new problem to resolve. For completeness, the ptrace(2) code was also cleaned up by Christos Zoulas, and we fixed some bugs with compat32.

I've fixed a crash in NativeProcessNetBSD::Factory::Launch(), triggered on startup of the lldb-server application.

Here is the commit message:

We cannot call process_up->SetState() inside
the NativeProcessNetBSD::Factory::Launch
function because it triggers a NULL pointer
dereference.

The generic code for launching a process in:
GDBRemoteCommunicationServerLLGS::LaunchProcess
sets the m_debugged_process_up pointer after
a successful call to  m_process_factory.Launch().
If we attempt to call process_up->SetState()
inside a platform specific Launch function we
end up dereferencing a NULL pointer in
NativeProcessProtocol::GetCurrentThreadID().

Use the proper call process_up->SetState(,false)
that sets notify_delegates to false.

Differential Revision: D42868
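
The ordering problem described in the commit message can be illustrated with a toy model in C. This is only a sketch under stated assumptions: LLDB itself is C++, and every name below (struct server, struct process, server_launch() and so on) is hypothetical and only mirrors the roles named in the commit message. The delegate callback returns -1 where the real code would dereference a NULL pointer.

```c
#include <assert.h>
#include <stddef.h>

struct process {
    int state;
    int tid;
};

struct server {
    struct process *debugged; /* NULL until server_launch() stores it */
};

/* Delegate callback: looks up the current thread through the server. */
int server_state_changed(struct server *srv)
{
    if (srv->debugged == NULL)
        return -1; /* the real code would dereference NULL here */
    return srv->debugged->tid;
}

/* Set the state and optionally notify the delegate. */
int process_set_state(struct process *p, struct server *srv, int state, int notify)
{
    assert(p != NULL && srv != NULL);
    p->state = state;
    return notify ? server_state_changed(srv) : 0;
}

/* Platform launch: runs before the server has stored the pointer. */
int factory_launch(struct process *p, struct server *srv, int notify)
{
    p->tid = 1;
    return process_set_state(p, srv, 1 /* stopped */, notify);
}

/* Generic launch: the pointer is stored only after the factory returns. */
int server_launch(struct server *srv, struct process *p, int notify)
{
    int rc = factory_launch(p, srv, notify);
    if (rc == 0)
        srv->debugged = p;
    return rc;
}
```

Calling server_launch() with notify set to 1 fails in this model, because the callback runs while srv->debugged is still NULL. With notify set to 0 the launch succeeds, and later notifications work because the pointer has been stored by then.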

I've synchronized the logging interfaces in PlatformNetBSD.cpp with the Linux counterpart, switching to a more generic and modern API and reducing the needless code differences between the two platforms.

Differential Revision: D42912

I've submitted a patch to fix recognition of NetBSD images (programs and userland core(5) files). The patch is still pending review and is now marked as "Changes Planned", because I was asked to ship tests, and I would feel more comfortable shipping tests with a more functional debugger.

Differential Revision: D42870

The immediate kernel tracing bug is that invalid signals are generated: the tracee reports SIGSTOP instead of SIGTRAP, apparently under abnormal conditions. This is the reason why I decided to return to ptrace(2) and correct all the problems.

The abnormal breakage looks like this:

Process 1369 stopped
* thread #1, stop reason = signal SIGSTOP
    frame #0: 0x00007f7efc000770 ld.elf_so`.rtld_start
ld.elf_so`.rtld_start:
->  0x7f7efc000770 <+0>: subq   $0x10, %rsp
    0x7f7efc000774 <+4>: movq   %rsp, %r12
    0x7f7efc000777 <+7>: pushq  %rbx
    0x7f7efc000778 <+8>: andq   $-0x10, %rsp

I can step a few instructions, but after a nondeterministic number of steps the process is killed and lldb-server detaches abnormally.

Process 1369 stopped
* thread #1, stop reason = instruction step over
    frame #0: 0x00007f7efc000774 ld.elf_so`.rtld_start + 4
ld.elf_so`.rtld_start:
->  0x7f7efc000774 <+4>:  movq   %rsp, %r12
    0x7f7efc000777 <+7>:  pushq  %rbx
    0x7f7efc000778 <+8>:  andq   $-0x10, %rsp
    0x7f7efc00077c <+12>: leaq   0x21087d(%rip), %rax      ; _GLOBAL_OFFSET_TABLE_
(lldb)  
 <  16> send packet: $vCont;s:0001#b3
Process 1369 exited with status = -1 (0xffffffff) lost connection

My observation is that without fixing the kernel we won't make much more progress.

Sanitizers

I suspended development of new features in sanitizers last month, but I was still in the process of upstreaming local patches. This process was time-consuming, as it required rebasing patches, adding dedicated tests, and addressing all other requests and comments from the upstream developers.

A fairly complete list of changes that landed upstream:

  • Add new interceptors: strlcpy(3) and strlcat(3)
  • Add new NetBSD interceptors: devname(3), devname_r(3)
  • Handle NetBSD symbol mangling devname -> __devname50
  • Correct a bug in GetArgsAndEnv() for NetBSD
  • Add new interceptor: lstat(2)
  • Add NetBSD syscall hooks skeleton in sanitizers
  • Prevent recursive MSan interceptors in fgets(3)
  • Prevent recursive MSan interceptors in strftime(3) like functions
  • Teach sanitizer about NetBSD specific ioctl(2) calls
  • Enable syscall-specific functions in TSan/NetBSD
  • Enable test/asan for NetBSD
  • Implement a large part of NetBSD syscalls of netbsd_syscall_hooks.h
  • Add initial XRay support for NetBSD
  • Recognize all NetBSD architectures in UBSan
  • Stop intercepting forkpty(3) and openpty(3) on NetBSD
  • Add new interceptor: fgetln(3)
  • Add new interceptor: strmode(3)
  • Correct ctype(3) functions with NLS on NetBSD
  • Skip two more ioctl interceptors for NetBSD
  • Add new interceptors: getttyent(3) family
  • Add new interceptors: getprotoent(3) family
  • Add new interceptors: getnetent(3) family
  • Disable ASan exceptions on NetBSD
  • Stop linking sanitized applications with -lutil and -lkvm on NetBSD
  • Handle the NetBSD case in ToolChain::getOSLibName()
  • Skip two more ioctl interceptors for NetBSD
  • Mark the textdomain.cc test as unsupported on BSDs

I'm not counting hot fixes, as some changes were triggering build or test issues on !NetBSD hosts. Thankfully all these issues were addressed quickly. The final result is a reduction of the local delta from almost 1MB to less than 100KB (1205 lines of diff). The remaining patches are rescheduled for later, mostly because they depend on extra work on cross-OS tests and on prior integration of sanitizers with the basesystem distribution. I didn't want to put extra work here in the current state of affairs, and I've registered as a mentor for Google Summer of Code for the NetBSD Foundation and prepared Software Quality improvement tasks in order to outsource part of the labour.

Userland changes

Part of the work landed in the basesystem tree. Here is a list:

  • Install GCC (gcc.old/) headers for Sanitizers
  • Install GCC (gcc) headers for Sanitizers
  • Introduce _UC_MACHINE_FP() as a macro
  • Stop installing dbregs.h
  • Add new tests in lib/libc/sys/t_ucontext

I've also improved the documentation of some NetBSD features described in the man pages. These pieces of information were sometimes wrong or incomplete, and a mismatch between the actual code and its documentation makes it harder to cover the NetBSD system with features such as sanitizers.

Some pieces of software also require better namespacing support, these days mostly for the POSIX standard. I've fixed a few low-hanging fruits there and requested pullups to NetBSD-8(BETA).

I thank the developers for improving the landed code in order to ship the best solutions for users.

mdnsd - Multicast and Unicast DNS daemon

I've been debugging connectivity issues between the lldb client and lldb-server. I've observed the connection dying on one particular message (these programs communicate using the GDB remote protocol, with LLDB extensions): qHostInfo. This message, emitted by the client, asks the server for host information.

The communication looks like this:

$ ./lldb
(lldb) log enable gdb-remote packets
(lldb) process connect connect://localhost:1234
                 <   1> send packet: +
                 history[1] tid=0x0001 <   1> send packet: +
                 <  19> send packet: $QStartNoAckMode#b0
                 <   1> read packet: +
                 <   6> read packet: $OK#9a
                 <   1> send packet: +
                 <  41> send packet: $qSupported:xmlRegisters=i386,arm,mips#12
                 < 124> read packet: $PacketSize=20000;QStartNoAckMode+;QThreadSuffixSupported+;QListThreadsInStopReply+;qEcho+;QPassSignals+;qXfer:auxv:read+#be
                 <  26> send packet: $QThreadSuffixSupported#e4
                 <   6> read packet: $OK#9a
                 <  27> send packet: $QListThreadsInStopReply#21
                 <   6> read packet: $OK#9a
                 <  13> send packet: $qHostInfo#9b

The communication is now hanging and data is no longer being received by the client.
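
The packets in the log above use the GDB remote protocol framing `$payload#xx`, where xx is the sum of the payload bytes modulo 256, written as two hex digits. Here is a minimal sketch of that framing in C. The function names rsp_checksum() and rsp_frame() are made up for this illustration and are not LLDB APIs, and the escaping of special characters inside the payload is deliberately left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Checksum of a packet payload: sum of all bytes modulo 256. */
unsigned char rsp_checksum(const char *payload)
{
    unsigned sum = 0;

    assert(payload != NULL);
    for (const unsigned char *p = (const unsigned char *)payload; *p != '\0'; p++)
        sum += *p;
    return (unsigned char)(sum & 0xff);
}

/*
 * Frame a payload as "$payload#xx".
 * Returns the packet length, or -1 if the output buffer is too small.
 */
int rsp_frame(const char *payload, char *out, size_t outsz)
{
    int n = snprintf(out, outsz, "$%s#%02x", payload,
        (unsigned)rsp_checksum(payload));

    return (n < 0 || (size_t)n >= outsz) ? -1 : n;
}
```

Framing the payload qHostInfo this way gives $qHostInfo#9b, 13 octets long, which matches the last packet sent in the log.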

I debugged the server and the client side, and even tapped the wire to check whether there was ongoing communication and to find the lost answer. The answer looks like this and is 400-500 octets long (originally sent as a single line, divided here into 100-octet rows):

$triple:7838365f36342d756e6b6e6f776e2d6e6574627364382e39392e3132;ptrsize:8;watchpoint_exceptions_rec
eived:after;endian:little;os_version:8.99.12;os_build:30383939303031323030;os_kernel:4e6574425344203
82e39392e3132202847454e45524943292023303a20576564204665622032382030383a30363a33332043455420323031382
020726f6f744063686965667465633a2f7075626c69632f6e65746273642d726f6f742f7379732f617263682f616d6436342
f636f6d70696c652f47454e45524943;hostname:6c6f63616c686f7374;#88

We can decode the string using e.g. radare2 tools:

$ rax2 -s 7838365f36342d756e6b6e6f776e2d6e6574627364382e39392e3132;echo
x86_64-unknown-netbsd8.99.12
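
The same decoding can be done without external tools. Below is a minimal sketch in C that looks up a key in the `key:value;` list of the reply and hex-decodes its value. The helper names hex_decode() and hostinfo_decode_field() are hypothetical, and the code assumes the reply payload has already been stripped of the leading `$` and the trailing `#88` checksum.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if the character is not a hex digit. */
static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Decode hexlen hex digits into out and NUL-terminate the result.
 * Returns the decoded length, or -1 on malformed input or a short buffer.
 */
int hex_decode(const char *hex, size_t hexlen, char *out, size_t outsz)
{
    assert(hex != NULL && out != NULL);
    if (hexlen % 2 != 0 || hexlen / 2 + 1 > outsz)
        return -1;
    for (size_t i = 0; i < hexlen; i += 2) {
        int hi = hex_nibble(hex[i]);
        int lo = hex_nibble(hex[i + 1]);
        if (hi < 0 || lo < 0)
            return -1;
        out[i / 2] = (char)((hi << 4) | lo);
    }
    out[hexlen / 2] = '\0';
    return (int)(hexlen / 2);
}

/* Find "key:" in a "k:v;k:v;" reply and hex-decode its value into out. */
int hostinfo_decode_field(const char *reply, const char *key, char *out, size_t outsz)
{
    size_t keylen = strlen(key);
    const char *p = reply;

    while (*p != '\0') {
        const char *end = strchr(p, ';');
        if (end == NULL)
            end = p + strlen(p);
        if ((size_t)(end - p) > keylen && strncmp(p, key, keylen) == 0 &&
            p[keylen] == ':')
            return hex_decode(p + keylen + 1, (size_t)(end - p) - keylen - 1, out, outsz);
        p = (*end == ';') ? end + 1 : end;
    }
    return -1; /* key not present */
}
```

Applied to the reply above, the hostname field decodes to localhost. Fields such as ptrsize are sent as plain text rather than hex, so this helper rejects them.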

The debugging took a while, and after finding no bugs on either the client or the server side I finally found the root cause of the problem: mdnsd. An upgrade in the development branch broke mdnsd and it couldn't resolve the hostname anymore. LLDB was running the following logic on the server side:

#include <stdio.h>
#include <unistd.h>
#include <netdb.h>
#include <limits.h>

int
main(int argc, char **argv)
{
  char hostname[PATH_MAX];

  /* gethostname(3) may not NUL-terminate on truncation, so do it up front. */
  hostname[sizeof(hostname) - 1] = '\0';
  if (gethostname(hostname, sizeof(hostname) - 1) == 0) {
    printf("gethostname done\n");
    /* This is the lookup that stalled while mdnsd was broken. */
    struct hostent *h = gethostbyname(hostname);
    printf("gethostbyname done\n");
    if (h)
      printf("h: '%s'\n", h->h_name);
    else
      printf("!h: '%s'\n", hostname);
  } else {
    printf("gethostname error\n");
  }

  return 0;
}

The gethostbyname(3) operation was taking 10 seconds, instead of returning a proper result almost immediately. I've verified that LLDB expects a response within 1 second and GDB within 2 seconds. This was a good sign that something was broken on the NetBSD side. Thanks to the excellent ktruss(1) tool, I tracked down the root cause quickly, and with feedback from more experienced networking engineers we concluded that mdnsd was broken.
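
The comparison between the lookup time and the debugger's timeout can be reproduced with a small timing helper. This is only a sketch: the function names are made up for this illustration, and the 1-second budget is taken from the LLDB timeout mentioned above. It measures a single gethostbyname(3) call with a monotonic clock.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <netdb.h>
#include <stddef.h>
#include <time.h>

/* Seconds elapsed between two timestamps. */
double elapsed_seconds(const struct timespec *start, const struct timespec *end)
{
    assert(start != NULL && end != NULL);
    return (double)(end->tv_sec - start->tv_sec) +
        (double)(end->tv_nsec - start->tv_nsec) / 1e9;
}

/*
 * Time one gethostbyname(3) lookup with a monotonic clock.
 * If resolved is not NULL, it is set to 1 when the lookup returned an entry.
 */
double time_lookup(const char *name, int *resolved)
{
    struct timespec start, end;
    struct hostent *h;

    clock_gettime(CLOCK_MONOTONIC, &start);
    h = gethostbyname(name);
    clock_gettime(CLOCK_MONOTONIC, &end);
    if (resolved != NULL)
        *resolved = (h != NULL);
    return elapsed_seconds(&start, &end);
}

/* Non-zero if the measured time is over the debugger's reply budget. */
int exceeds_budget(double seconds, double budget_seconds)
{
    return seconds > budget_seconds;
}
```

Passing the result of time_lookup() for the local hostname to exceeds_budget() with a budget of 1.0 flags the 10-second stall described above, while a healthy resolver stays well under the budget.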

I've found a workaround: defining my host in /etc/hosts and ensuring that /etc/nsswitch.conf lists files before dns & mdnsd for the hosts option.

The mdnsd problem has been reported to developers and was quickly fixed by Christos Zoulas.

The name resolution with mdnsd is quick and correct again:

$ time getent hosts rugged.local
192.168.0.241     rugged.local
    0.03s real     0.00s user     0.00s system

BSD collaboration in LLVM

A one-man show is usually less fun and less productive than collaboration in a team. This is also true in software development. Last month I helped, as a reviewer, to port LLVM features to FreeBSD and, when possible, to OpenBSD. This included MSan/FreeBSD, libFuzzer/FreeBSD, XRay/FreeBSD and UBSan/OpenBSD.

I've landed most of the submitted and reviewed code in the upstream LLVM tree.

Part of that code also confirmed the choices made in the existing NetBSD porting efforts and revealed new options for improvement. This is the reason why I've landed preliminary XRay/NetBSD code and added the missing NetBSD bits to ToolChain::getOSLibName(). The lack of the latter caused setup issues with the prebuilt LLVM toolchain, as the compiler-rt goodies were located in a path like ./lib/clang/7.0.0/lib/netbsd8.99.12, with a varying OS version. This could stop working after upgrades, so I've simplified it to "netbsd", similar to FreeBSD and Solaris.
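
The effect of that simplification can be sketched in C as a helper that drops the version suffix from the OS component of the target triple. This is only an illustration: os_lib_name() is a hypothetical name, LLVM itself is written in C++, and the upstream change may well use a different mechanism, such as a fixed name per OS.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the OS component without its trailing version,
 * e.g. "netbsd8.99.12" becomes "netbsd".
 * Returns the length of the name, or -1 if the buffer is too small.
 */
int os_lib_name(const char *os, char *out, size_t outsz)
{
    size_t n = 0;

    assert(os != NULL && out != NULL);
    /* The name ends at the first digit, where the version begins. */
    while (os[n] != '\0' && !isdigit((unsigned char)os[n]))
        n++;
    if (n + 1 > outsz)
        return -1;
    memcpy(out, os, n);
    out[n] = '\0';
    return (int)n;
}
```

With a stable directory name, a toolchain built on one NetBSD version keeps finding its compiler-rt libraries after the system is upgraded to another.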

Prebuilt toolchain for testers

I've prepared a build of Clang/LLVM with LLDB and compiler-rt features prebuilt on NetBSD/amd64 v. 8.99.12:

llvm-clang-compilerrt-lldb-7.0.0beta_2018-02-28.tar.bz2

Plan for the next milestone

With the approaching NetBSD 8.0 release I plan to finish backporting a few changes there from HEAD:

  • Remove one unused feature from ptrace(2): PT_SET_SIGMASK & PT_GET_SIGMASK. I originally introduced these operations with criu/rr-like software in mind, but such tools misuse or even abuse ptrace(2) and are not regular process debuggers. I plan to remove these operations from HEAD and backport the removal to NetBSD-8(BETA) before the release, so no compat will be required for these calls. Future ports of criu/rr should involve dedicated kernel support for such requirements.
  • Finish the backport of _UC_MACHINE_FP() to NetBSD-8. This will allow use of the same code in sanitizers in HEAD and NetBSD-8.0.
  • By popular demand, improve the regnsub(3) and regasub(3) API, adding support for more or fewer than 10 substitutions.

Once done, I will return to ptrace(2) debugging and corrections.

This work was sponsored by The NetBSD Foundation.

The NetBSD Foundation is a non-profit organization and welcomes any donations to help us continue funding projects and services to the open-source community. Please consider visiting the following URL, and chip in what you can:

http://netbsd.org/donations/#how-to-donate

 


