Thursday, July 9, 2009

Mark text inside XML

This week I had to implement a really simple feature: marking selected words in XHTML content. Let's say that we need to wrap the term 'door' as <strong>door</strong>. The XHTML snippet is:

<a href='/door.html'>Our door</a>

A plain string replace breaks the structure of the XHTML, because it also rewrites the 'door' inside the href attribute:

<a href='/<strong>door</strong>.html'>Our <strong>door</strong></a>
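To see the problem concretely, here is a minimal sketch of the naive approach. The class name NaiveReplaceDemo and the helper naiveMark are made up for illustration and are not part of the original code.

```java
public class NaiveReplaceDemo {
    // Hypothetical helper: wraps every occurrence of the term,
    // with no awareness of where the markup is.
    static String naiveMark(String content, String term) {
        return content.replace(term, "<strong>" + term + "</strong>");
    }

    public static void main(String[] args) {
        String snippet = "<a href='/door.html'>Our door</a>";
        // The term inside the href attribute gets wrapped as well.
        System.out.println(naiveMark(snippet, "door"));
    }
}
```

Running it on the snippet above prints the broken markup, with the href value damaged along with the link text.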

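Another solution is to use a regular expression: (<.[^>]*>)|(\bdoor\b). This regexp has two parts:
  1. (<.[^>]*>) - matches any tag

  2. (\bdoor\b) - matches the word 'door'

Because the first part consumes a whole tag in a single match, a 'door' that sits inside a tag (for example in an attribute value) is never matched on its own. Only occurrences in the text between tags reach the second part.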
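A quick way to check what each part picks up is to list the matches for the sample snippet. This is only a sketch: the class MatchListing and the helper listMatches are hypothetical names, and it assumes attribute values never contain a '>' character, which the tag part of the regexp cannot handle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchListing {
    // Hypothetical helper: returns every match, labelled TAG or WORD.
    static List<String> listMatches(String content) {
        Pattern pattern = Pattern.compile("(<.[^>]*>)|(\\bdoor\\b)", Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(content);
        List<String> found = new ArrayList<>();
        while (matcher.find()) {
            // Group 1 is non-null when the tag part matched,
            // otherwise the word part (group 2) matched.
            String kind = matcher.group(1) != null ? "TAG" : "WORD";
            found.add(kind + " " + matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        for (String match : listMatches("<a href='/door.html'>Our door</a>")) {
            System.out.println(match);
        }
    }
}
```

For the sample snippet this lists the opening tag (including its href), then the word 'door' from the link text, then the closing tag. The 'door' in the href never shows up as a separate WORD match.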

The method that wraps the word 'door' is:

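// Needs java.util.regex.Pattern and java.util.regex.Matcher
// rawContent contains input string
Pattern pattern = Pattern.compile("(<.[^>]*>)|(\\bdoor\\b)", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
Matcher matcher = pattern.matcher(rawContent);

int lastMatchEnd = 0;
boolean result = matcher.find();
if (result) {
    StringBuffer sb = new StringBuffer();
    do {
        // Add text before match
        sb.append(rawContent.substring(lastMatchEnd, matcher.start()));
        lastMatchEnd = matcher.end();
        // The matched text is either a whole tag or the word itself
        String temp = matcher.group();
        if (!temp.startsWith("<")) {
            // A word outside any tag: wrap it
            sb.append("<strong>").append(temp).append("</strong>");
        } else {
            // A tag: copy it unchanged
            sb.append(temp);
        }
        result = matcher.find();
    } while (result);
    // Add text after the last match
    sb.append(rawContent.substring(lastMatchEnd));
    return sb.toString();
}
// No tags and no matching words: nothing to change
return rawContent;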
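
For comparison, the same idea can also be written with Matcher.appendReplacement and Matcher.appendTail, which do the "copy text between matches" bookkeeping themselves. This is a variation and not the code from this post: the class TermMarker, the method markTerm, and its term parameter are all hypothetical, and Pattern.quote is used on the assumption that the term should be matched literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TermMarker {
    // Hypothetical method: wraps whole-word occurrences of term that
    // appear outside tags in <strong>...</strong>.
    static String markTerm(String rawContent, String term) {
        Pattern pattern = Pattern.compile(
                "(<.[^>]*>)|(\\b" + Pattern.quote(term) + "\\b)",
                Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(rawContent);
        StringBuffer sb = new StringBuffer();
        while (matcher.find()) {
            String match = matcher.group();
            // Group 2 is non-null only when the word part matched.
            String replacement = matcher.group(2) != null
                    ? "<strong>" + match + "</strong>"
                    : match;
            // quoteReplacement keeps any '$' or '\' in the match literal.
            matcher.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        // Copy whatever follows the last match.
        matcher.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(markTerm("<a href='/door.html'>Our door</a>", "door"));
    }
}
```

The wrapped text keeps the casing found in the input, since the replacement is built from the match and not from the search term.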