Increasing coverage of signal semantics in regression tests


March 04, 2019 posted by Kamil Rytarowski

Kernel signal code is a complex maze, and it is very difficult to introduce non-trivial changes without regressions. Over the past month I worked on covering missing elementary scenarios involving the ptrace(2) API. Some of the new tests are marked as expected to succeed, while a number of them are expected to fail.

The NetBSD distribution changes

I've also introduced changes unrelated to ptrace(2), namely kernel sanitizers, kernel fixes and the corresponding ATF tests. I won't discuss them further, as they are beyond the ptrace(2) scope. These changes were largely stimulated by students preparing for summer work as part of Google Summer of Code.

The following ptrace(2) ATF commits landed in the repository:

  • Define PTRACE_ILLEGAL_ASM for NetBSD/amd64 in ptrace.h
  • Enable 3 new ptrace(2) tests for SIGILL
  • Refactor GPR and FPR tests in t_ptrace_wait* tests
  • Refactor definition of PT_STEP tests into single macro
  • Correct a style in description of PT_STEP tests in t_ptrace_wait*
  • Refactor kill* test in t_ptrace_wait*
  • Add infinite_thread() for ptrace(2) ATF tests
  • Add initial pthread(3) tests in ATF t_ptrace_wait* tests
  • Link t_ptrace_wait* tests with -pthread
  • Initial refactoring of siginfo* tests in t_ptrace_wait*
  • Drop siginfo5 from ATF tests in t_ptrace_wait*
  • Merge siginfo6 into other PT_STEP tests in t_ptrace_wait*
  • Rename the siginfo4 test in ATF t_ptrace_wait*
  • Refactor lwp_create1 and lwp_exit1 into trace_thread* in ptrace(2) tests
  • Rename signal1 to signal_mask_unrelated in t_ptrace_wait*
  • Add new regression scenarios for crash signals in t_ptrace_wait*
  • Replace signal2 in t_ptrace_wait* with new tests
  • Add new ATF tests traceme_raisesignal_ignored in t_ptrace_wait*
  • Add new ATF tests traceme_signal{ignored,masked}_crash* in t_ptrace_wait*
  • Add additional assert in traceme_signalmasked_crash t_ptrace_wait* tests
  • Add additional assert in traceme_signalignored_crash t_ptrace_wait* tests
  • Remove redundant test from ATF t_ptrace_wait*
  • Add new ATF t_ptrace_wait* vfork(2) tests
  • Add minor improvements in unrelated_tracer_sees_crash in t_ptrace_wait*
  • Add more tests for variations of unrelated_tracer_sees_crash in ATF
  • Replace signal4 (PT_STEP) test with refactored ones with extra asserts
  • Add signal masked and ignored variations of traceme_vfork_exec in ATF tests
  • Add signal masked and ignored variations of traceme_exec in ATF tests
  • Drop signal5 test-case from ATF t_ptrace_wait*
  • Refactor signal6-8 tests in t_ptrace_wait*
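
Many of the crash-signal scenarios listed above share one basic shape: a child asks to be traced, triggers a crash signal, and the tracer checks what a wait(2)-like call reports. The following is a minimal sketch of that shape, not code taken from the ATF tests. The function name traceme_crash_stop_signal() is hypothetical, and touching a PROT_NONE page is just one assumed way to provoke a fault. Depending on the platform the reported signal may be SIGSEGV or SIGBUS.

```c
#define _DEFAULT_SOURCE		/* glibc: expose MAP_ANON with strict -std= */
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child that requests tracing and then
 * faults. Returns the signal the tracer sees the child stopped with,
 * 0 if the child ended without a stop being reported, or -1 if the
 * scenario could not be set up (for example, ptrace(2) not permitted).
 */
static int
traceme_crash_stop_signal(void)
{
	pid_t child = fork();
	if (child == -1)
		return -1;

	if (child == 0) {
		/* Become a tracee of the parent. */
		if (ptrace(PT_TRACE_ME, 0, NULL, 0) == -1)
			_exit(2);

		/* A page with no access rights: any read faults. */
		volatile char *p = mmap(NULL, (size_t)sysconf(_SC_PAGESIZE),
		    PROT_NONE, MAP_PRIVATE | MAP_ANON, -1, 0);
		if (p == MAP_FAILED)
			_exit(2);

		char c = *p;	/* crash signal is raised here */
		(void)c;
		_exit(1);	/* only reached if no fault was delivered */
	}

	int status;
	if (waitpid(child, &status, 0) != child)
		return -1;

	if (WIFSTOPPED(status)) {
		int sig = WSTOPSIG(status);
		/* Terminate the stopped tracee and reap it. */
		kill(child, SIGKILL);
		waitpid(child, &status, 0);
		return sig;
	}
	if (WIFEXITED(status) && WEXITSTATUS(status) == 2)
		return -1;	/* child-side setup failed */
	return 0;
}
```

A caller would compare the returned value against the crash signal it expects. The real tests additionally inspect siginfo and repeat the scenario for each wait(2)-like function, which this sketch leaves out.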

Trap signals processing without signal context reset

The current NetBSD kernel approach to processing crash signals (SEGV, FPE, BUS, ILL, TRAP) is to reset the signal context. This behavior was introduced as an intermediate and partially legitimate fix for cases where masking a crash signal caused an infinite loop in a dying process.
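
To make the masked-crash case concrete, here is a small userland sketch of the situation outside a debugger. It assumes a POSIX system where a synchronous fault on a blocked crash signal terminates the process instead of restarting the faulting instruction forever. The function name masked_crash_termination_signal() is hypothetical, and the PROT_NONE mapping is only an assumed way to provoke the fault.

```c
#define _DEFAULT_SOURCE		/* glibc: expose MAP_ANON with strict -std= */
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child that blocks SIGSEGV and SIGBUS and
 * then touches an inaccessible page. Returns the signal that terminated
 * the child, 0 if it was not terminated by a signal, or -1 if fork(2)
 * or waitpid(2) failed.
 */
static int
masked_crash_termination_signal(void)
{
	pid_t child = fork();
	if (child == -1)
		return -1;

	if (child == 0) {
		sigset_t set;

		/* Mask the crash signals before faulting. */
		sigemptyset(&set);
		sigaddset(&set, SIGSEGV);
		sigaddset(&set, SIGBUS);
		if (sigprocmask(SIG_BLOCK, &set, NULL) == -1)
			_exit(2);

		/* A page with no access rights: any read faults. */
		volatile char *p = mmap(NULL, (size_t)sysconf(_SC_PAGESIZE),
		    PROT_NONE, MAP_PRIVATE | MAP_ANON, -1, 0);
		if (p == MAP_FAILED)
			_exit(2);

		char c = *p;	/* synchronous fault while the signal is blocked */
		(void)c;
		_exit(1);	/* not reached if the kernel kills the process */
	}

	int status;
	if (waitpid(child, &status, 0) != child)
		return -1;
	return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

If the kernel honored the mask literally, the child would fault on the same instruction again and again, which is the loop that the context reset was meant to break.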

The expected behavior is to never reset the signal context of a trap signal (or any other signal) when the process runs under a debugger. To achieve these semantics I introduced a fix for the first time last year, but I had to revert it quickly, as it caused side-effect breakage that was not covered by the ATF ptrace(2) regression tests existing at that time. This time I made sure to cover upfront almost all interesting scenarios that are required to function properly. Surprisingly, after picking up the old faulty fix and improving it locally, the current signal maze code caused various side effects in corner cases, such as translating SIGKILL in certain tests into the previous trap signal (like SIGSEGV). In other cases the side effects are even stranger: one test hangs only against a certain wait(2)-like function (waitid(2)) and runs without hangs against the other wait(2)-like functions.

For reference, such surprises can be reproduced with the following patch:

Index: sys/kern/kern_sig.c
===================================================================
RCS file: /cvsroot/src/sys/kern/kern_sig.c,v
retrieving revision 1.350
diff -u -r1.350 kern_sig.c
--- sys/kern/kern_sig.c	29 Nov 2018 10:27:36 -0000	1.350
+++ sys/kern/kern_sig.c	3 Mar 2019 19:26:54 -0000
@@ -911,13 +911,25 @@
 	KASSERT(!cpu_intr_p());
 	mutex_enter(proc_lock);
 	mutex_enter(p->p_lock);
+
+	if (ISSET(p->p_slflag, PSL_TRACED) &&
+	    !(p->p_pptr == p->p_opptr && ISSET(p->p_lflag, PL_PPWAIT))) {
+		p->p_xsig = signo;
+		p->p_sigctx.ps_faked = true; // XXX
+		p->p_sigctx.ps_info._signo = signo;
+		p->p_sigctx.ps_info._code = ksi->ksi_code;
+		sigswitch(0, signo, false);
+		// XXX ktrpoint(KTR_PSIG)
+		mutex_exit(p->p_lock);
+		return;
+	}
+
 	mask = &l->l_sigmask;
 	ps = p->p_sigacts;
 
-	const bool traced = (p->p_slflag & PSL_TRACED) != 0;
 	const bool caught = sigismember(&p->p_sigctx.ps_sigcatch, signo);
 	const bool masked = sigismember(mask, signo);
-	if (!traced && caught && !masked) {
+	if (caught && !masked) {
 		mutex_exit(proc_lock);
 		l->l_ru.ru_nsignals++;
 		kpsendsig(l, ksi, mask);

Such changes need proper investigation, along with fixes for the bugs that are now easier to detect with the extended test suite.

Plan for the next milestone

Keep preparing kernel fixes and, after thorough verification, apply them to the mainline.

This work was sponsored by The NetBSD Foundation.

The NetBSD Foundation is a non-profit organization and welcomes any donations to help us continue funding projects and services to the open-source community. Please consider visiting the following URL to chip in what you can:

http://netbsd.org/donations/#how-to-donate

 


