Bug #11321

Semantics of boost::split and Utils::split are not properly preserved when translating to Java

Added by Roel Standaert about 1 month ago. Updated about 1 month ago.

Implemented @Emweb

boost::split's behavior is notably different from Java's String#split:

  • Splitting "," on the ',' character yields a list of two empty strings with boost::split (as it does in many other programming languages, e.g. Python, JavaScript, and Rust), whereas Java's String#split yields an empty array, because it discards trailing empty strings.
  • We use boost::is_any_of, which interprets its arguments as a sequence of characters to split on, but String#split treats its argument as a regular expression!
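
Both differences can be reproduced with a few lines of plain Java. The sketch below is only an illustration: the class name NaiveSplit and the method naiveSplit are hypothetical, and the method body simply mirrors the shape of the current translation (wrapping String#split in an ArrayList).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that mirrors the current translation of boost::split.
final class NaiveSplit {
    private NaiveSplit() {}

    // Passes `delimiters` straight to String#split, which compiles it as a regex
    // and drops trailing empty strings from the result.
    static List<String> naiveSplit(String input, String delimiters) {
        return new ArrayList<String>(Arrays.asList(input.split(delimiters)));
    }
}
```

With this helper, naiveSplit(",", ",") returns an empty list rather than two empty strings. naiveSplit("a,b;c", ",;") returns the single element "a,b;c", because the regex ",;" only matches a comma immediately followed by a semicolon. naiveSplit("a.b", ".") returns an empty list, because the regex "." matches every character.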

This means that:

  1. Translating Utils::split from web/StringUtils.h to eu.webtoolkit.jwt.StringUtils#split is incorrect, since StringUtils#split simply uses String#split
  2. Translating boost::split($0;,$1;,boost::is_any_of($2;)) to $0; = new ArrayList<String>(Arrays.asList($1;.split($2;))) is incorrect

We should write a StringUtils#split that does have the correct semantics, unit test it, and translate all of our usages of Utils::split and boost::split to use it.
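
A minimal sketch of what such a method could look like follows. It is not the implementation that was committed: the class name SplitSketch is hypothetical, and it assumes the C++ side uses boost::is_any_of with the default token_compress_off, so adjacent delimiters produce empty tokens and an empty input produces a single empty token.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a split with boost::split-like semantics.
final class SplitSketch {
    private SplitSketch() {}

    // Splits `input` at every character contained in `delimiters`.
    // Each delimiter is a literal character, never a regular expression,
    // and empty tokens (leading, trailing, or in between) are kept.
    static List<String> split(String input, String delimiters) {
        List<String> tokens = new ArrayList<String>();
        int start = 0;
        for (int i = 0; i < input.length(); i++) {
            if (delimiters.indexOf(input.charAt(i)) >= 0) {
                tokens.add(input.substring(start, i));
                start = i + 1;
            }
        }
        // The remainder after the last delimiter is always a token, even if empty.
        tokens.add(input.substring(start));
        return tokens;
    }
}
```

Under these assumptions, split(",", ",") returns two empty strings, split("a,b;c", ",;") returns "a", "b" and "c", and split("a.b", ".") returns "a" and "b". The sketch compares UTF-16 code units, which is sufficient for ASCII delimiters such as ',' and ';'.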

I noticed this while working on WEmailEdit and WEmailValidator, where JWt would validate lists of multiple email addresses differently from C++ (issue #7279).


Updated by Roel Standaert about 1 month ago

  • Status changed from InProgress to Review
  • Assignee deleted (Roel Standaert)

Updated by Roel Standaert about 1 month ago

  • Status changed from Review to Implemented @Emweb
  • Assignee set to Roel Standaert
  • % Done changed from 0 to 100
