Regular Expressions: Groups

In Python, you can capture groups of characters with regular expressions like this:

>>> import re
>>> print("Match is '"
... + re.search('\\s([a-z]+)\\s',
... 'My text string.').group(1) + "'")
Match is 'text'

This is quite straightforward. In C#, you can write something similar.
using System.Text.RegularExpressions;
namespace MhNeifer.Samples.CSharp {
    public class MyRegex {
        static void Main() {
            Regex rgx = new Regex(@"\s([a-z]+)\s");
            System.Console.WriteLine("Match is '"
                           + rgx.Matches("My text string.")[0].Groups[1].Value
                           + "'");
        }
    }
}
If you ignore that C# is more verbose in general (namespace, class definition, and so on), this is straightforward as well.
I thought it would be just as straightforward in Java, but there seems to be a catch. Or I'm too dumb to see the simple solution. This is what I found.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Regex {
    public static void main(String[] args) {
        Pattern p = Pattern.compile(".*\\s([a-z]+)\\s.*");
        Matcher m = p.matcher("My text string.");
        // matches() has to succeed before group() may be called
        if (m.matches()) {
            System.out.println("Match is '" + m.group(1) + "'");
        }
    }
}
While it looks straightforward, it is not. You have to call Matcher.matches() before Matcher.group(), or you get an IllegalStateException. I was surprised by this. Please also note the '.*' at the beginning and the end of the regular expression: matches() only succeeds if the regular expression matches the whole string. For me, this took a while to remember.
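
For comparison, here is a minimal sketch of what may be the simpler solution: Matcher.find() looks for the pattern anywhere in the input, so the same regular expression as in the Python and C# samples can be used without the surrounding '.*'. The class name FindGroup and the helper firstInnerWord are made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindGroup {
    // Hypothetical helper: returns the first run of lowercase letters
    // that is surrounded by whitespace, or null if there is none.
    static String firstInnerWord(String input) {
        Pattern p = Pattern.compile("\\s([a-z]+)\\s");
        Matcher m = p.matcher(input);
        // find() scans for a match anywhere in the input;
        // group(1) is only valid after find() has returned true.
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println("Match is '" + firstInnerWord("My text string.") + "'");
    }
}
```

Run against the sample string, this should print the same line as the other versions. As with matches(), calling group() before a successful find() raises an exception, so the match still has to be attempted first.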
