Java Regex Search and Replace

12 August 2008 ~ blog java

For many years, I felt that there was nothing "regular" about Regular Expressions, but lately I have been warming
up to them a bit. The QuickRex Eclipse plug-in
has really helped make them easier to manage, but that's not what this post is about.

I recently needed to do a regex-based search and replace to convert all the HTML entities in a string to their
actual character equivalents, in other words to unescape every entity in an HTML string (don't ask why). With a little regex
and a little documentation browsing, I found that it is very easy to do. Start out with the pattern, which should
be a static class member (a compiled Pattern is immutable and thread-safe):

private static final Pattern entityPattern = Pattern.compile("(&[a-z]*;)");

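The pattern matches named HTML entities of the form &name; with lowercase names (for example &amp; or &lt;). Numeric entities such as &#169;, and names containing uppercase letters or digits, are not matched. Next we need the search and replace code: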

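private String unescapeEntities(final String html){
    final StringBuffer buffer = new StringBuffer();
    final Matcher matcher = entityPattern.matcher(html);
    while (matcher.find()) {
        // quoteReplacement keeps any '$' or '\' in the unescaped text from
        // being treated as a group reference or escape by appendReplacement
        matcher.appendReplacement(
            buffer,
            Matcher.quoteReplacement(StringEscapeUtils.unescapeHtml(matcher.group()))
        );
    }
    // copy whatever follows the last match
    matcher.appendTail(buffer);
    return buffer.toString();
}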

The StringBuffer ends up holding the input string with each matched entity replaced. The StringEscapeUtils class
(org.apache.commons.lang.StringEscapeUtils) is from the Jakarta Commons - Lang API. Sorry, this isn't much of a tutorial... it's more of a
code snippet for future use.
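
If you want to try the find/appendReplacement/appendTail loop without pulling in Commons Lang, here is a minimal, self-contained sketch. The class name EntityUnescapeSketch and the tiny ENTITIES lookup table are my own stand-ins for StringEscapeUtils (they are not part of any library), and Map.of assumes Java 9 or later. Unknown entities are simply left as they are.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EntityUnescapeSketch {
    // Same pattern as in the post: '&', zero or more lowercase letters, ';'
    private static final Pattern ENTITY_PATTERN = Pattern.compile("(&[a-z]*;)");

    // Hypothetical stand-in for StringEscapeUtils.unescapeHtml: a few entities only
    private static final Map<String, String> ENTITIES = Map.of(
        "&amp;", "&",
        "&lt;", "<",
        "&gt;", ">",
        "&quot;", "\""
    );

    static String unescapeEntities(final String html) {
        final StringBuffer buffer = new StringBuffer();
        final Matcher matcher = ENTITY_PATTERN.matcher(html);
        while (matcher.find()) {
            final String entity = matcher.group();
            // Entities missing from the table are copied through unchanged
            final String replacement = ENTITIES.getOrDefault(entity, entity);
            // Appends the text before the match, then the (quoted) replacement
            matcher.appendReplacement(buffer, Matcher.quoteReplacement(replacement));
        }
        // Appends the text after the last match
        matcher.appendTail(buffer);
        return buffer.toString();
    }

    public static void main(String[] args) {
        // prints: <b>Tom & Jerry</b>
        System.out.println(unescapeEntities("&lt;b&gt;Tom &amp; Jerry&lt;/b&gt;"));
    }
}
```

The Matcher.quoteReplacement call matters whenever the replacement text could contain a '$' or a backslash, since appendReplacement would otherwise read those as group references or escapes.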


Creative Commons License CoffeaElectronica.com content is copyright © 2016 Christopher J. Stehno and available under a Creative Commons Attribution-ShareAlike 4.0 International License.