I want to match a list of words, which is easy enough when those words are truly words. For example, /\b (pop|push) \b/gsx
when run against the string
pop gave the door a push but it popped back
will match the words pop and push but not popped.
I need similar functionality for words that contain characters that would normally qualify as word boundaries. So I need /\b (reverse!|push) \b/gsx
when run against the string
push reverse! reverse!push
to match only reverse! and push, but not reverse!push. Obviously this regex isn't going to do that, so what do I need to use instead of \b to make my regex smart enough to handle these funky requirements?
-
Your first problem is that you need three (possibly four) cases in your alternation, not two.
/\breverse!(?:\s|$)/ reverse! by itself
/\bpush\b/ push by itself
/\breverse!push\b/ together
/\bpushreverse!(?:\s|$)/ this is the possible case
Your second problem is that a \b won't match after a "!" (unless a word character follows it) because "!" is not a \w. Here is what Perl 5 has to say about \b; you may want to consult your docs to see if they agree:
A word boundary ("\b") is a spot between two characters that has a "\w" on one side of it and a "\W" on the other side of it (in either order), counting the imaginary characters off the beginning and end of the string as matching a "\W". (Within character classes "\b" represents backspace rather than a word boundary, just as it normally does in any double-quoted string.)
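To make the quoted rule concrete, here is a small sketch in Python, used only as a stand-in engine on the assumption that its \b follows the same \w/\W definition. It records where "reverse!" is found with and without a trailing \b in the question's sample string.

```python
import re

text = "push reverse! reverse!push"

# A trailing \b after "!" only holds when a word character follows,
# so this finds "reverse!" only where it is glued to "push".
strict = [m.start() for m in re.finditer(r"\breverse!\b", text)]

# Without the trailing \b, both occurrences of "reverse!" are found.
loose = [m.start() for m in re.finditer(r"\breverse!", text)]

print(strict)  # [14]
print(loose)   # [5, 14]
```

The standalone "reverse!" at offset 5 is followed by a space, so there is a \W on both sides of that spot and no boundary exists there.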
So, the regex that you need is something like
/ \b ( reverse!push | reverse! | push ) (?: \s | \b | $ )+ /gx;
I left out the /s because there are no periods in this regex, so treat as single line makes no sense. If /s doesn't mean treat as a single line in your engine you should probably add it back. Also, you should read up on how your engine handles alternation. I know in Perl 5 to get the right behaviour you must arrange the items this way (otherwise reverse! would always win over reverse!push).
Alan Moore : Read the question again, Chas; the OP *doesn't* want to match "reverse!push".
-
At the end of a word, \b means "the previous character was a word character, and the next character (if there is a next character) is not a word character." You want to drop the first condition because there might be a non-word character at the end of the "word". That leaves you with a negative lookahead:
/\b (reverse!|push) (?!\w)/gx
I'm pretty sure AS3 regexes support lookahead.
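As a rough check of the lookahead version, the sketch below runs it in Python against the question's sample string. Python is an assumption here, standing in for AS3, and the spaces that /x would ignore are simply left out.

```python
import re

# Leading \b kept as in the answer; trailing \b replaced by (?!\w).
pattern = re.compile(r"\b(reverse!|push)(?!\w)")

text = "push reverse! reverse!push"
matches = [(m.start(), m.group(1)) for m in pattern.finditer(text)]

print(matches)  # [(0, 'push'), (5, 'reverse!'), (22, 'push')]
```

The standalone reverse! is now matched and the reverse! in reverse!push is rejected. The push at offset 22 is still found, though, because the leading \b holds between "!" and "p", so the front of the pattern needs a similar guard if that match is unwanted.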
DL Redden : In addition to using (?!\w) as the trailing \b replacement I also used (?
-
You can replace \b with something equivalent, but less strict:
/(?<=\s|^)(reverse!|push)(?=\s|$)/g
This way the limiting factor of the \b (that it can only match before or after an actual \w word character) is removed. Now white space or the start/end of the string function as valid separators, and the inner expression can be easily built at run-time, from a list of search terms for example.
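Building the inner expression from a list of terms might look like the sketch below. It is written in Python as an assumption, and build_term_pattern is a hypothetical helper name. Python's re module requires fixed-width lookbehind, so (?<=\s|^) and (?=\s|$) are swapped for the equivalent negative forms (?<!\S) and (?!\S).

```python
import re

def build_term_pattern(terms):
    """Compile a pattern matching any term delimited by whitespace or string ends."""
    # Escape each term so characters like "!" or "+" are taken literally,
    # and try longer terms first so a shorter prefix cannot win.
    escaped = sorted((re.escape(t) for t in terms), key=len, reverse=True)
    return re.compile(r"(?<!\S)(" + "|".join(escaped) + r")(?!\S)")

pattern = build_term_pattern(["reverse!", "push"])
found = pattern.findall("push reverse! reverse!push")

print(found)  # ['push', 'reverse!']
```

Here reverse!push yields no match at all: the reverse! part is followed by a non-space character, and the push part is preceded by one.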