September 28, 2009 posted by Alistair Crooks
The 2009 Summer of Code
project to implement efficient wide-character regular expressions
for NetBSD was carried out by Matthias-Christian Ott, mentored by me. This
blog entry gives an overview of the progress and results of the project.
The goal of the project was to enable wide character regular expressions
to be added to the NetBSD base system.
The project outline was evaluated by Matthias-Christian and the possible
ways of accomplishing the project were discussed.
As there were a number of wide-character regular expression libraries
available, with differing licenses, the first part of the project was
spent investigating the advantages and disadvantages of the individual
libraries, and a comparison matrix was produced. This period was also
spent gathering regular expression test data, writing fuzz-testing
software, looking at timing
tests, and discussing how best to transition NetBSD from its
traditional regular expression implementation to a new one.
Testing identified the tre library from Ville Laurikari as the best-performing
regular expression library, which confirmed the results others had
obtained. After we discussed the licensing of tre with Ville, he very kindly agreed
to change the license on tre from LGPL to the NetBSD Foundation's recommended
2-clause license. Thanks, Ville.
The final part of the project dealt with porting tre to be lint-free
and to build with WARNS=4 on NetBSD-current. Manual pages were written and
revised, regression tests were run, and timing information was gathered.
The project goal has been met, and much work was put into
measuring performance, into documentation, and into regular expression
testing.
At the end of the project, it was found that, while extended regular expressions
were very efficiently implemented by the tre library, there were some regression
test differences with basic regular expressions, and the work to quantify these
differences continues. Following that, the tre library will be merged into
the NetBSD base source tree.
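
For readers unfamiliar with the distinction between basic and extended
regular expressions, here is a minimal sketch using only the standard POSIX
regcomp() and regexec() interfaces. It does not use the tre API, and it does
not reproduce the regression differences mentioned above; the helper name
re_matches is hypothetical and exists only for this illustration.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper, for illustration only.
 * Returns 1 if `text` matches `pattern`, 0 if it does not, and -1 if the
 * pattern fails to compile. `cflags` is passed to regcomp(): 0 selects
 * basic regular expressions, REG_EXTENDED selects extended ones.
 */
int
re_matches(const char *pattern, const char *text, int cflags)
{
	regex_t re;
	int rc;

	/* REG_NOSUB: only a yes/no answer is needed, not match offsets. */
	if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
		return -1;
	rc = regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	return rc == 0 ? 1 : 0;
}
```

With this helper, the pattern "ab+c" compiled with REG_EXTENDED matches
the text "abbc", because "+" means "one or more". Compiled as a basic
regular expression (cflags of 0), POSIX treats the same "+" as an ordinary
character, so the pattern matches the literal text "ab+c" instead. Any
replacement library has to preserve both behaviours.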