Tuesday, March 1, 2011

I've got a problem with fine-tuning a regex

I've got a regex that seemed fine, but as it turned out, it doesn't work well in some situations.

Keep an eye on the message preview, because the message editor does some tricky things with "\".

[\[]?[\^%#\$\*@\-;].*?[\^%#\$\*@\-;][\]]

Its task is to find a pattern that, in general, looks like this:

[ABA]

  • A - char from set ^,%,#,$,*,@,-,;
  • B - some text
  • [ and ] are part of the pattern

It is expected to find all occurrences of this pattern in the test string:

Black fox [#sample1#] [%sample2%] - [#sample3#] eats blocks.

but instead of the expected list of matches

  • "[#sample1#]"
  • "[%sample2%]"
  • "[#sample3#]"

I get this:

  • "[#sample1#]"
  • "[%sample2%]"
  • "- [#sample3#]"

It seems that this problem will also occur with other chars from set "A". Could somebody suggest changes to my regex to make it work as I need?
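Here is why the third match starts with "- ". In the pattern above, [\[]? makes the opening bracket optional, and "-" is itself a member of set A. The engine tries start positions left to right, so it can begin a match at the "-". The lazy .*? then runs on until it reaches "#]". A minimal C# sketch can reproduce this. It assumes .NET's System.Text.RegularExpressions and C# 9 top-level statements, and the variable names are only illustrative. It compares the original pattern with a copy in which the opening bracket is mandatory:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Test string from the question.
string text = "Black fox [#sample1#] [%sample2%] - [#sample3#] eats blocks.";

// Original pattern: the leading [\[]? may match nothing, so a match can start at any char of set A.
string original = @"[\[]?[\^%#\$\*@\-;].*?[\^%#\$\*@\-;][\]]";

// Same pattern with the opening bracket required (\[ instead of [\[]?).
string bracketRequired = @"\[[\^%#\$\*@\-;].*?[\^%#\$\*@\-;]\]";

// Collect the matched substrings of each pattern.
string[] originalMatches = Regex.Matches(text, original).Cast<Match>().Select(m => m.Value).ToArray();
string[] fixedMatches = Regex.Matches(text, bracketRequired).Cast<Match>().Select(m => m.Value).ToArray();

Console.WriteLine(string.Join(" | ", originalMatches));
Console.WriteLine(string.Join(" | ", fixedMatches));
```

The first line should list "- [#sample3#]" as the third match, reproducing the problem. The second line should list only the three expected bracketed tokens.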

A less important question: how can I make my regex exclude patterns that look like this:

[ABC]

  • A - char from set ^,%,#,$,*,@,-,;
  • B - some text
  • C - char from set ^,%,#,$,*,@,-,; other than A
  • [ and ] are part of the pattern

For example:

[$sample1#] [%sample2@] [%sample3;]
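The second requirement can be made concrete with a small C# sketch, under the same assumptions (.NET regex, C# 9 top-level statements, illustrative names). A pattern that checks the opening and closing chars independently has no way to know that they differ, so it accepts all three ABC tokens. Capturing the opening char in a group and requiring the same char again through the backreference \1 rejects them while still finding the ABA tokens. Using [^\]]+? for B is an assumption of this sketch, chosen so that B cannot run past a closing bracket.

```csharp
using System;
using System.Text.RegularExpressions;

// ABC examples from the question, plus the original ABA test string.
string abcText = "[$sample1#] [%sample2@] [%sample3;]";
string abaText = "Black fox [#sample1#] [%sample2%] - [#sample3#] eats blocks.";

// Opening and closing chars are checked independently: nothing ties them together.
string independent = @"\[[\^%#\$\*@\-;].*?[\^%#\$\*@\-;]\]";

// Group 1 captures the opening char; \1 requires that same char right before "]".
// [^\]]+? keeps B from running past a closing bracket.
string sameChar = @"\[([\^%#\$\*@\-;])[^\]]+?\1\]";

int independentCount = Regex.Matches(abcText, independent).Count;
int sameCharCount = Regex.Matches(abcText, sameChar).Count;
int sameCharOnAba = Regex.Matches(abaText, sameChar).Count;

Console.WriteLine($"independent on ABC: {independentCount}");
Console.WriteLine($"same char on ABC: {sameCharCount}, on ABA: {sameCharOnAba}");
```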

Thanks in advance

MTH

From Stack Overflow:
  • Why the first "?" in "[\[]?"?

    \[[\^%#\$\*@\-;].*?[\^%#\$\*@\-;]\]
    

    would detect your different strings just fine

    To be more precise:

    \[([\^%#\$\*@\-;])([^\]]*?)(?=\1)([\^%#\$\*@\-;])\]
    

    would detect [ABA]

    \[([\^%#\$\*@\-;])([^\]]*?)(?!\1)([\^%#\$\*@\-;])\]
    

    would detect [ABC]

    MoreThanChaos: Well, it seems I was making so many changes that I missed this.
  • You have an optional matching of the opening square bracket:

    [\[]?

    For the second part of your question (and perhaps to simplify), try this:

    \[\%[^\%]+\%\]|\[\#[^\#]+\#\]|\[\$[^\$]+\$\]

    In this case there is a subpattern for each delimiter (only %, # and $ are written out here). The | character means "OR", so the regex matches if any of the three subexpressions matches.

    Each subexpression matches:

    • Opening bracket
    • Special char
    • One or more chars other than that special char (1)
    • The same special char
    • Closing bracket

    (1) You may need to add extra exclusions such as ']' or '[' so that it doesn't accidentally match across a large body of text like:

    [%MyVar#] blah blah [$OtherVar%]

    Rob

  • \[([%#$*@;^-]).+?\1\]
    

    applied to text:

    Black fox [#sample1#] [%sample2%] - [#sample3#] [%sample4;] eats blocks.
    

    matches

    • [#sample1#]
    • [%sample2%]
    • [#sample3#]
    • but not [%sample4;]

    EDIT

    This works for me (output as expected, and the regex is accepted by C#):

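    using System;
    using System.Text.RegularExpressions;

    // Group 1 captures the opening delimiter; \1 then requires the same char right before "]".
    Regex re = new Regex(@"\[([%#$*@;^-]).+?\1\]");
    string s = "Black fox [#sample1#] [%sample2%] - [#sample3#] [%sample4;] eats blocks.";

    MatchCollection mc = re.Matches(s);
    foreach (Match m in mc)
    {
      Console.WriteLine(m.Value);
    }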
    
    VonC: Yes, but how would you detect ABC without using a lookahead? [^\1] does not work...
    MoreThanChaos: Well, the C# Regex engine doesn't seem to like this expression; perhaps something was misinterpreted by the message editor on this page?
    Tomalak: Well, reading it again, *not* matching ABC is a requirement. My regex matches ABA exclusively. No lookahead needed.
    MoreThanChaos: Of course you're right. In my test app I had "ExplicitCapture" turned on for the regex, so the group numbering was simply wrong, and the diagnostic message didn't give me a clue. I reviewed my code and changed it, and now everything works just fine. Thanks for your help!
    Tomalak: So is that what you've been after, or is it just one more way to do it?
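A small sketch of the ExplicitCapture pitfall from the comments follows. It assumes the test app passed RegexOptions.ExplicitCapture to the Regex constructor. With that option, plain ( ... ) groups don't capture, so \1 points at a group that doesn't exist, and .NET rejects the pattern with an ArgumentException. If the option has to stay on, one possible workaround is a named group plus \k<name>. The group name "delim" is arbitrary.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string text = "Black fox [#sample1#] [%sample2%] - [#sample3#] [%sample4;] eats blocks.";

// Tomalak's pattern: an unnamed group ( ... ) referenced by the backreference \1.
string numbered = @"\[([%#$*@;^-]).+?\1\]";

// Under ExplicitCapture the unnamed group does not capture, so \1 is undefined
// and the constructor throws while parsing the pattern.
bool rejected = false;
try
{
    new Regex(numbered, RegexOptions.ExplicitCapture);
}
catch (ArgumentException ex)
{
    rejected = true;
    Console.WriteLine("Rejected: " + ex.Message);
}

// A named group still captures under ExplicitCapture; \k<delim> refers back to it.
string named = @"\[(?<delim>[%#$*@;^-]).+?\k<delim>\]";
var re = new Regex(named, RegexOptions.ExplicitCapture);
string[] found = re.Matches(text).Cast<Match>().Select(m => m.Value).ToArray();

Console.WriteLine(string.Join(" | ", found));
```

The named-group version should find the same three ABA tokens as the original answer and skip [%sample4;].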
