Monday, April 25, 2011

How can I find repeated characters with a regex in Java?

Can anyone give me a Java regex to identify repeated characters in a string? I am only looking for characters that are repeated immediately and they can be letters or digits.

Example:

abccde <- looking for this (immediately repeating c's)

abcdce <- not this (c's separated by another character)

From stackoverflow
  • Try "(\\w)\\1+"

    The \\w matches any word character (letter, digit, or underscore) and the \\1+ matches whatever was in the first set of parentheses, one or more times. So you wind up matching any occurrence of a word character, followed immediately by one or more of the same word character again.

    (Note that I gave the regex as a Java string, i.e. with the backslashes already doubled for you.)

    Cerebrus : Good one, David. But maybe it should be "((\\w)\\2+)+". That would match the repeating pair any number of times and would capture the entire set of repeating occurrences in Backref #1.
    JediPotPie : Since Java implicitly adds the "^" and "$" delimiters, this expression will match strings like "cc" and "cccc" but not "xcc" etc. That's where I get stuck. How can I make the regex match anywhere in the string?
    JediPotPie : I guess my problem was that I was using the "matches()" method to check for a match. My mistake. Thanks for the help.
    Alan Moore : @Cerebrus, I don't see the benefit of that outer set of parentheses. If the input were "aabbbcddd" your regex would match "aabbb" the first time you call find(), then match "ddd" the next time around. All you get is a trivial performance gain for performing fewer matches.
    Gennadiy : A more explicit regex is as follows: ".*([0-9A-Za-z])\\1+.*". This searches for repeats anywhere in the string and can be used with pattern.matcher(...).matches()
  • Regular Expressions are expensive. You would probably be better off just storing the last character and checking to see if the next one is the same. Something along the lines of:

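    String s = "abccde"; // sample input
    if (!s.isEmpty()) {
        char previous = s.charAt(0);
        for (int i = 1; i < s.length(); i++) {
            char current = s.charAt(i);
            if (current == previous) {
                // Same character twice in a row
                System.out.println("Duplicate character " + current);
            }
            previous = current;
        }
    }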
    
    Cerebrus : More expensive than manually iterating through a string's chars? I don't think so!
    JediPotPie : Yep, that's one way to do it, but that's not the way I need. I need a regular expression.
    Joachim Sauer : @John Terry: you think this is a worthwhile optimization and yet you program in Java? Strange. The regex-version is shorter and quicker to grok. I'd choose it any day.
    Alan Moore : And simply saying regexes are "expensive" is just FUD. It's true that a regex-based solution can never be as fast as a well-written solution based on the low-level String API, but Java's regexes are plenty fast enough for most applications.
  • String stringToMatch = "abccdef";
    Pattern p = Pattern.compile("(\\w)\\1+");
    Matcher m = p.matcher(stringToMatch);
    if (m.find())
    {
        System.out.println("Duplicate character " + m.group(1));
    }
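
Several of the comments above turn on the difference between Matcher.find(), which looks for the pattern anywhere in the input, and Matcher.matches(), which succeeds only if the whole input matches. The following is a minimal sketch of that difference and is not taken from any of the answers; the class name RepeatedChars and its method names are hypothetical, chosen only for illustration. It also includes a variant restricted to letters and digits, on the assumption that underscores should not count, since \w matches the underscore as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; names are illustrative, not from the thread.
final class RepeatedChars {
    // A word character followed by one or more copies of itself.
    private static final Pattern REPEAT = Pattern.compile("(\\w)\\1+");

    // Same idea, but limited to ASCII letters and digits (no underscore).
    private static final Pattern ALNUM_REPEAT = Pattern.compile("([0-9A-Za-z])\\1+");

    // find() searches for the pattern anywhere in the input.
    static boolean containsRepeat(String input) {
        return REPEAT.matcher(input).find();
    }

    // matches() succeeds only when the entire input is a single run.
    static boolean isSingleRun(String input) {
        return REPEAT.matcher(input).matches();
    }

    // Letters-and-digits variant of containsRepeat.
    static boolean containsAlnumRepeat(String input) {
        return ALNUM_REPEAT.matcher(input).find();
    }

    // Collects every run of immediately repeated characters, left to right.
    static List<String> findRuns(String input) {
        List<String> runs = new ArrayList<>();
        Matcher m = REPEAT.matcher(input);
        while (m.find()) {
            runs.add(m.group()); // group() is the whole run, group(1) the character
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(containsRepeat("abccde"));  // true
        System.out.println(containsRepeat("abcdce"));  // false
        System.out.println(isSingleRun("cccc"));       // true
        System.out.println(isSingleRun("xcc"));        // false
        System.out.println(findRuns("aabbbcddd"));     // [aa, bbb, ddd]
    }
}
```

With find(), there is no need to wrap the pattern in ".*" on both sides; that wrapping is only required when the check goes through matches().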
    
