Bug #3668

Compilation with -DWT_HAVE_GNU_REGEX needs some patches

Added by Laurence Withers about 8 years ago. Updated almost 8 years ago.



I've been working to reduce the footprint of an application that uses Wt on my embedded system, and explored the -DWT_HAVE_GNU_REGEX option. I have hit two issues with wt-3.3.4-rc1. First, there is no way to turn off Boost::Regex detection in CMake. Second, a type mismatch (seemingly introduced in commit b794e1e12d866d209ba153699f35b10fa00c5358) prevents src/web/WebController.C from compiling: a std::string was changed to a const char*, but the GNU regex code path was not updated to match.

I will attach the patch I am using. It is not suitable for general use, since it simply drops Boost::Regex from CMake altogether (I don't know enough CMake to implement an option), but the changes to the source should be correct.


wt-3.3.4-rc1-disable-boost-regex.patch (1.88 KB), Laurence Withers, 11/17/2014 06:26 PM
wt-3.3.4-rc1-gnu-regex-fcgi.patch (1.46 KB), Laurence Withers, 11/18/2014 06:41 PM
wt-3.3.4-rc1-gnu-regex-refencoder.patch (1.72 KB), Laurence Withers, 11/18/2014 06:41 PM
wt-3.3.4-rc1-WRegExp-full-match.patch (830 Bytes), Laurence Withers, 12/10/2014 12:20 PM

Updated by Laurence Withers about 8 years ago

It turns out that src/fcgi/Server.C and src/web/RefEncoder.C also need patching. I have some experimental patches that are about to be tested, and can add them later.

I suppose the saving is significant: the application is built with static Boost, static FastCGI and static Wt (but dynamic C/C++ libraries), and the difference in size with and without boost::regex is around 100KiB. Still, given the ugliness and potential bugginess of maintaining two code paths, the option may not be worth keeping.

Without boost::regex:

lwithers@rhodium Pt-web-build $ size Pt-web/obj/Pt-web.wt
   text    data     bss     dec     hex filename
3099086    6800   17376 3123262  2fa83e Pt-web/obj/Pt-web.wt

With boost::regex:

lwithers@rhodium Pt-web-build $ !size
size Pt-web/obj/Pt-web.wt
   text    data     bss     dec     hex filename
3201493    8264   17328 3227085  313dcd Pt-web/obj/Pt-web.wt
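As a quick check of the "around 100KiB" figure, the two `dec` totals above can simply be subtracted. The constant and function names below are illustrative only; the numbers are copied from the size(1) output.

```cpp
// Totals from the "dec" column of the two size(1) runs above.
constexpr long kWithoutRegex = 3123262;  // bytes, built without boost::regex
constexpr long kWithRegex    = 3227085;  // bytes, built with boost::regex

// Extra bytes attributable to linking boost::regex.
constexpr long regexCostBytes() { return kWithRegex - kWithoutRegex; }

// The same difference in KiB (1 KiB = 1024 bytes).
constexpr double regexCostKiB() { return regexCostBytes() / 1024.0; }
```

This gives 103823 bytes, roughly 101.4 KiB, almost all of it in the text segment (3201493 - 3099086 = 102407 bytes).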

Updated by Laurence Withers about 8 years ago

As mentioned, there are a couple of other places where boost::regex is currently used directly.

The FastCGI connector uses it to extract the session ID; attached is a patch that implements the same thing with GNU regex. If I deliberately make that code do something else, the session never loads, so it seems correct.
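For readers unfamiliar with the POSIX API, here is a minimal sketch of extracting a session ID with regcomp(3)/regexec(3) instead of boost::regex. It is not the code from the attached patch or from src/fcgi/Server.C: the parameter name `wtd` and the function name `extractSessionId` are assumptions made for illustration.

```cpp
#include <regex.h>
#include <string>

// Hypothetical helper: return the value of a "wtd=" parameter (assumed here
// to carry the session ID) from a query string, or "" if it is absent.
std::string extractSessionId(const std::string& query)
{
    // Prefixing '&' lets a single pattern also match the first parameter.
    const std::string s = "&" + query;

    regex_t re;
    if (regcomp(&re, "&wtd=([^&]*)", REG_EXTENDED) != 0)
        return std::string();

    regmatch_t m[2];  // m[0] = whole match, m[1] = the captured value
    std::string result;
    if (regexec(&re, s.c_str(), 2, m, 0) == 0 && m[1].rm_so != -1)
        result = s.substr(m[1].rm_so, m[1].rm_eo - m[1].rm_so);

    regfree(&re);  // release the compiled pattern on every path
    return result;
}
```

Compared with boost::regex, the main extra burden is manual resource management: every successful regcomp() must be paired with a regfree().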

The code that hides the session ID from the referrer URL when pulling in an absolute URL via inline CSS also used boost::regex directly; another patch for this is attached. This time it seemed better to avoid a regex altogether and use a different method. I tested it with lynx, which uses the session-ID-in-URL scheme, and inspected the generated HTML to make sure that the absolute URL had been correctly replaced by a redirect trampoline.
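A rough sketch of the "no regex" idea: a plain prefix test decides whether a CSS URL is absolute, and only absolute URLs are routed through a redirect trampoline, so the page URL containing the session ID is not sent as the Referer. This is not the RefEncoder.C patch. The trampoline format (`trampolinePrefix` followed by the percent-encoded target) and all function names are hypothetical.

```cpp
#include <cctype>
#include <cstdio>
#include <string>

// Percent-encode everything except RFC 3986 unreserved characters.
std::string urlEncode(const std::string& s)
{
    std::string out;
    for (unsigned char c : s) {
        if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
            out += static_cast<char>(c);
        } else {
            char buf[4];  // "%XX" plus terminating NUL
            std::snprintf(buf, sizeof buf, "%%%02X", c);
            out += buf;
        }
    }
    return out;
}

// Simple prefix comparison instead of a regex.
bool isAbsoluteUrl(const std::string& url)
{
    return url.compare(0, 7, "http://") == 0
        || url.compare(0, 8, "https://") == 0;
}

// Hypothetical rewrite: relative URLs are left alone; absolute ones are
// replaced by a trampoline URL on our own server that redirects onwards.
std::string hideReferrer(const std::string& url,
                         const std::string& trampolinePrefix)
{
    if (!isAbsoluteUrl(url))
        return url;
    return trampolinePrefix + urlEncode(url);
}
```

A prefix check is cheaper and easier to audit than a regex here, since the only question is which scheme the URL starts with.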


Updated by Koen Deforche about 8 years ago

  • Status changed from New to InProgress
  • Assignee set to Koen Deforche
  • Target version set to 3.3.4


Thanks for the patches. This is indeed a code path that isn't covered by our build server.

As for code size, it of course depends on how many of Wt's features and widgets you use. Am I right to assume that you have followed the advice on this page?

It's a bit outdated, and Wt may have been growing some fat, but I would expect you to still get results of the same order of magnitude for the example mentioned there. And then of course the additional 200K does matter a lot (for no loss in functionality).




Updated by Laurence Withers about 8 years ago


Yes, I did follow the steps on that page (yes, a little outdated, but nothing that some digging and experimentation can't solve). On the x86_64 platform there has been a modest increase in file size over time (note: all sizes are after stripping debug symbols):

lwithers@amethyst 2014-11-01-wt-filesize $ ls -lh */src/lib*.so.?.?.?
-rwxr-xr-x 1 lwithers lwithers 6.9M Nov 17 14:28 3.3.0/src/
-rwxr-xr-x 1 lwithers lwithers 8.0M Nov 17 14:28 3.3.1/src/
-rwxr-xr-x 1 lwithers lwithers 8.5M Nov 17 14:28 3.3.2/src/
-rwxr-xr-x 1 lwithers lwithers 8.5M Nov 17 14:28 3.3.3/src/
-rwxr-xr-x 1 lwithers lwithers 8.8M Nov 17 14:28 3.3.4-rc1/src/

On ARM, however, the growth was massively amplified (we have a PXA270 and an ARM926EJ-S platform): 3.3.0 was 6.8M, whereas 3.3.4-rc1 was 20M. I wasn't able to determine why the code size increased so significantly. Building with hidden symbol visibility reduced the library to around 12M. The largest object file in the 3.3.4-rc1 build (8M unstripped) is certainly the Spirit-based CSS parser, but all of the object files seemed to grow massively.
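For context, "hidden visibility" with GCC or Clang usually means compiling with `-fvisibility=hidden` (and often `-fvisibility-inlines-hidden`) and explicitly re-exporting the public API. A minimal sketch, assuming an ELF target; the `MYLIB_API` macro and function names are hypothetical and not Wt's own export macros.

```cpp
// Assumed build: g++ -fPIC -fvisibility=hidden -fvisibility-inlines-hidden -shared ...
// With those flags only symbols marked MYLIB_API are exported from the .so,
// which shrinks the dynamic symbol table and lets the compiler treat the
// remaining functions as internal.
#if defined(__GNUC__)
#  define MYLIB_API __attribute__((visibility("default")))
#else
#  define MYLIB_API
#endif

namespace {
// Internal helper: never exported, whatever the build flags.
int scale(int x) { return x * 2; }
}

// Public entry point: stays visible even under -fvisibility=hidden.
MYLIB_API int publicEntryPoint(int x) { return scale(x) + 1; }
```

Running `nm -D` on the resulting shared library is a simple way to confirm which symbols remain exported.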

The most elegant solution was to switch to a static build and use function/data sections. This made the executable much smaller, so that's what I ended up doing. As a bonus, it now starts noticeably faster, and I can afford to ship a version with debug symbols on the x86 platform.
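To illustrate what "function/data sections" refers to: with GCC/Clang and a GNU-compatible linker, each function and data object can be placed in its own ELF section, and the linker can then discard sections that nothing references. A minimal sketch, assuming that toolchain; the file and function names are hypothetical.

```cpp
// Assumed build commands:
//   g++ -Os -ffunction-sections -fdata-sections -c app.cpp
//   g++ -static -Wl,--gc-sections app.o -o app
// Adding -Wl,--print-gc-sections lists what the linker removed. In a static
// link the same garbage collection also applies to unused parts of the
// statically linked libraries.

// Referenced by usedFeature(), so its section is kept.
static const int kTable[] = {1, 2, 3};

int usedFeature() { return kTable[0] + kTable[2]; }

// Never called: with the flags above its section can be discarded at link time.
int unusedFeature() { return 42; }
```

The code itself behaves the same either way; only the size of the final executable changes.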

The regex work was a step along that route. As I say, 100K is not much in the grand scheme of things; it's really your call whether the option is worth maintaining. The potential for bugs is high!


Updated by Laurence Withers about 8 years ago

During testing I found another issue: boost::regex_match() and regexec(3) have different semantics. The Boost version requires the expression to match the whole string, whereas the POSIX version does not. I worked around this by adding start/end anchors (^ and $) within WRegExp, and also added error reporting for when the expression fails to compile. See the new attached patch.
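A sketch of that workaround, assuming POSIX extended regular expressions: the user's pattern is wrapped as `^(...)$` so that regexec() only succeeds on a whole-string match, and regerror() turns compile failures into a readable message. The parentheses matter: `^cat|dog$` would still match "hotdog". Without `REG_NEWLINE`, `^` and `$` anchor at the start and end of the whole string. The class name is hypothetical; this is not the actual WRegExp code.

```cpp
#include <regex.h>
#include <stdexcept>
#include <string>

// Emulates boost::regex_match() whole-string semantics on top of regexec(3).
class FullMatchRegex {
public:
    explicit FullMatchRegex(const std::string& pattern)
    {
        const std::string anchored = "^(" + pattern + ")$";
        int rc = regcomp(&re_, anchored.c_str(), REG_EXTENDED | REG_NOSUB);
        if (rc != 0) {
            char msg[256];
            regerror(rc, &re_, msg, sizeof msg);  // explain why compilation failed
            throw std::runtime_error("invalid regex '" + pattern + "': " + msg);
        }
    }

    ~FullMatchRegex() { regfree(&re_); }

    FullMatchRegex(const FullMatchRegex&) = delete;
    FullMatchRegex& operator=(const FullMatchRegex&) = delete;

    // True only if the entire string matches the pattern.
    bool matches(const std::string& s) const
    {
        return regexec(&re_, s.c_str(), 0, nullptr, 0) == 0;
    }

private:
    regex_t re_;
};
```

Usage: `FullMatchRegex("[0-9]+").matches("12a45")` is false, whereas a plain regexec() with the unanchored pattern would report a match on "12".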


Updated by Koen Deforche almost 8 years ago

  • Status changed from InProgress to New


Perhaps it's more worthwhile to look at other things that carry less risk of introducing bugs. For sure, a static build is the best way to deploy to embedded systems.


