Summer of Code results: Efficient Wide-Character Regular Expressions


September 28, 2009 posted by Alistair Crooks

The 2009 Summer of Code project to implement efficient wide-character regular expressions for NetBSD was carried out by Matthias-Christian Ott, with me as mentor. This blog entry gives an overview of the progress and results of the project.

Goals

The goal of the project was to enable wide character regular expressions to be added to the NetBSD base system.

Progress

Matthias-Christian began by evaluating the initial project outline, and the possible ways of accomplishing the project were discussed. Because several wide-character regular expression libraries were available, under differing licenses, the first part of the project was spent investigating the advantages and disadvantages of each library, and a comparison matrix was produced. This period was also spent gathering regular expression test data, writing fuzz-testing software, looking at timing tests, and discussing how best to transition NetBSD from its traditional regular expression implementation to a new one.

Testing identified the tre library from Ville Laurikari as the best-performing regular expression library, which confirmed the results others had obtained. When the licensing of tre was discussed with Ville, he very kindly agreed to change the license on tre from LGPL to the NetBSD Foundation's recommended 2-clause license. Thanks, Ville.

The final part of the project dealt with making tre lint-free and getting it to build with WARNS=4 on NetBSD-current. Manual pages were written and revised, regression tests were run, and timing information was obtained.

Results

The project goal has been met, and much work was put into measuring performance, writing documentation, and fuzz testing of regular expressions.

At the end of the project, it was found that, while extended regular expressions were very efficiently implemented by the tre library, there were some regression test differences with basic regular expressions, and the work to quantify these differences continues. Following that, the tre library will be merged into the NetBSD base source tree.
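To give a feel for what a basic-versus-extended comparison looks like, here is a minimal sketch that uses only the standard POSIX <regex.h> interface (regcomp, regexec, regfree). It is not the project's regression suite and does not use tre's own API; the helper names re_matches and bre_ere_differ are made up for this illustration. It relies on one well-known difference between the two syntaxes: in a basic regular expression an unescaped `+` is an ordinary character, while in an extended one it means "one or more".

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of tre or NetBSD): compile `pattern` with
 * the given cflags and report whether it matches anywhere in `text`.
 * Returns 1 on a match, 0 on no match, -1 if the pattern fails to compile.
 */
static int
re_matches(const char *pattern, const char *text, int cflags)
{
	regex_t re;
	int rc;

	/* REG_NOSUB: only a yes/no answer is needed, no match offsets. */
	if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
		return -1;
	rc = regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	return rc == 0 ? 1 : 0;
}

/*
 * Hypothetical helper: returns 1 when the basic (cflags 0) and extended
 * (REG_EXTENDED) readings of the same pattern disagree on `text`.
 */
static int
bre_ere_differ(const char *pattern, const char *text)
{
	return re_matches(pattern, text, 0) !=
	    re_matches(pattern, text, REG_EXTENDED);
}
```

For example, bre_ere_differ("a+", "aaa") reports a difference, because the extended form matches a run of a's while the basic form looks for a literal "a+". A regression run compares results like these between two implementations of the same syntax, rather than between the two syntaxes.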

Comments:

How can one get the test environment and/or the performance test results, please? Thank you.

Posted by André Dolenc on October 04, 2009 at 05:43 PM UTC
