Friday, April 29, 2011

Groovy syntax for regular expression matching

What is the Groovy equivalent of the following Perl code?

my $txt = "abc : groovy : def";
if ($txt =~ / : (.+?) : /) {
  my $match = $1;
  print "MATCH=$match\n"; 
  # should print "MATCH=groovy\n"
}

I know that TMTOWTDI (including the regular Java way) - but what is the "Groovy way" of doing it?

This is one way of doing it, but it feels a bit clumsy, especially the array notation (m[0][1]), which feels a bit strange. Is there a better way to do it? If not, please describe the logic behind m[0][1].

def txt = "java : groovy : grails"
if ((m = txt =~ / : (.+?) :/)) {
  def match = m[0][1]
  println "MATCH=$match"
}
From stackoverflow
  • The "groovy way" is documented on their wiki.

    Personally, I find such syntax to be overly arcane, and would just use standard Java.

    knorv : Please see my clarification.
  • This is my best understanding of how to do this using Groovy syntax (but see lfaraone's response too):

    import java.util.regex.Matcher
    
    def txt = 'abc : groovy : def'
    if (txt =~ ~/ : (.+?) : /) {
        def match = Matcher.lastMatcher[0][1]
        println "MATCH=$match"
    }
    
    knorv : Thanks for your reply! I think your code will fail with an IndexOutOfBoundsException if there is no match. I edited my post before I saw your reply - so you might want to revisit your post. Please see the stuff about the m[0][1] notation.
    Chris Jester-Young : Thanks, I've updated my answer. Also, a Matcher in Java can have multiple matches. (Think of the /g flag for Perl matches.) The first index allows you to specify the match you care about.
  • This was the closest match to the Perl code that I could achieve:

    def txt = "abc : groovy : def"
    if ((m = txt =~ / : (.+?) : /)) {
      def match = m.group(1)
      println "MATCH=$match"
    }
    
  • m[0] is the first match object.
    m[0][0] is everything that matched in this match.
    m[0][1] is the first capture in this match.
    m[0][2] is the second capture in this match.

    Based on what I have read (I don't program in Groovy or have a copy handy), given

    def m = "barbaz" =~ /(ba)([rz])/;
    

    m[0][0] will be "bar"
    m[0][1] will be "ba"
    m[0][2] will be "r"
    m[1][0] will be "baz"
    m[1][1] will be "ba"
    m[1][2] will be "z"

    I couldn't stand not knowing whether I was right, so I downloaded Groovy and wrote an example:

    def m = "barbaz" =~ /(ba)([rz])/;
    
    println "m[0][0] " + m[0][0]
    println "m[0][1] " + m[0][1]
    println "m[0][2] " + m[0][2]
    println "m[1][0] " + m[1][0]
    println "m[1][1] " + m[1][1]
    println "m[1][2] " + m[1][2]
    
    knorv : Thanks for your reply. Could you give an example on when there could be multiple match objects (m[1])?
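
On when m[1] can exist: a matcher holds one match object for each place the pattern matches in the string, so the "barbaz" example above yields two of them. Here is a minimal sketch of walking over all matches and of the no-match case; the variable names `results` and `none` are made up for illustration, and it assumes a Groovy version with the usual `count` and index extensions on `java.util.regex.Matcher`.

```groovy
// Reuses the "barbaz" example from the answer above.
def m = "barbaz" =~ /(ba)([rz])/

// count is a Groovy extension on java.util.regex.Matcher.
// The pattern matches twice ("bar" and "baz"), so m[0] and m[1] both exist.
assert m.count == 2

// Collect the second capture group of every match.
def results = []
for (i in 0..<m.count) {
    results << m[i][2]
}
println results   // prints [r, z]

// With no match there is no m[0] at all, so test the matcher before indexing.
def none = "foo" =~ /(ba)([rz])/
assert !none
assert none.count == 0
```

This is roughly what Perl's /g flag gives you: the first index picks the match, the second picks the capture group within that match.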
