Code Question: Take multiple matches with regex separated by defined marks

Hello. I have a text and I need to extract the content that follows a defined pattern: the content between MARK1 and MARK2, and the content after MARK2. However, those marks can repeat, and I need to capture all of their occurrences. In the example below:

text: "textA textB _MARK1_ textC _MARK2_ textD _MARK1_ textE textF _MARK2_ textG textH textI"

array(0): _MARK1_ textC _MARK2_ textD 
array(1): textC
array(2): textD
array(3): _MARK1_ textE textF _MARK2_ textG textH textI 
array(4): textE textF
array(5): textG textH textI

From Stack Overflow

I don't think you'll be able to achieve this with a single expression. Likely you'll need to break it down into an initial expression and then a loop to perform a 2nd expression match against each iteration of the first match.
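A minimal sketch of that two-step idea, written in Python only for illustration (the question names no language). The function name `two_pass` and the choice to strip surrounding whitespace are assumptions, not part of the original answer.

```python
import re

def two_pass(text):
    """Split on _MARK1_ first, then run a second expression on each chunk."""
    # Pass 1: every chunk after the first one starts right after a _MARK1_.
    chunks = re.split(r"_MARK1_", text)[1:]

    # Pass 2: inside a chunk, separate the text before _MARK2_ from the text after it.
    inner = re.compile(r"(.*?)_MARK2_(.*)", re.DOTALL)

    pairs = []
    for chunk in chunks:
        m = inner.fullmatch(chunk)
        if m:  # a chunk without _MARK2_ is skipped
            pairs.append((m.group(1).strip(), m.group(2).strip()))
    return pairs
```

Each tuple holds the text between the marks and the text after `_MARK2_`; neither expression needs a lookahead.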
Am I missing something or is this what you are looking for?
```
/(_MARK1_ (.*?) _MARK2_ (.*?))*/
```
I made some arbitrary assumptions about how you want to handle spaces, which I realize were probably only consistent to make your example case more readable.
That would be:
```
/(_MARK1_(.*?)_MARK2_((?:(?!_MARK1_).)*))/g
```
At least, it works on RegEx Coach on your test case.
Of course, you need to iterate on each match.
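Note it might not work on all flavors of regex, since it relies on a negative lookahead: POSIX regular expressions and RE2, for example, have no lookaround assertions. (JavaScript does support lookahead; it is lookbehind that older JavaScript engines lack.)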
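To make the "iterate on each match" step concrete, here is a sketch in Python (an arbitrary choice, since the question names no language). The helper name `extract` and the whitespace stripping of the two inner groups are assumptions made to reproduce the array layout from the question.

```python
import re

# Same pattern as in the answer, without the outer group (group 0 is the whole match).
# The last group takes one character at a time while no _MARK1_ starts at that position.
PATTERN = re.compile(r"_MARK1_(.*?)_MARK2_((?:(?!_MARK1_).)*)", re.DOTALL)

def extract(text):
    """Return a flat list: whole match, text between marks, text after _MARK2_, per occurrence."""
    result = []
    for m in PATTERN.finditer(text):
        result.extend([m.group(0), m.group(1).strip(), m.group(2).strip()])
    return result
```

Called on the sample text from the question, this yields six entries in the same order as `array(0)` through `array(5)`.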

Davi Kenji : Perfect. That's it.

Sparr : good catch, excluding _MARK2__MARK1_, I didn't cover that case in my solution
I'm not sure whether you actually need the separating marks in your array. That part seems superfluous unless you have a specific spec for it. This solution assumes you don't really need that. Since you didn't specify a language, how about Perl?
```
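use strict;
use warnings;
use Data::Dumper;
my $text = 'textA textB _MARK1_ textC _MARK2_ textD _MARK1_ textE textF _MARK2_ textG textH textI';
# Capture each run of text that follows a mark and ends at the next mark or at the end of the string.
my @results = $text =~ m/(?<=_MARK1_|_MARK2_)(.*?)(?=_MARK1_|_MARK2_|$)/g;
print Dumper(\@results);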
```
However, there's no reason to try the general case with regular expressions. Use a parser instead.
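One possible reading of "use a parser" is a small hand-written scanner: split the text into marks and plain text, then walk the tokens with an explicit state. The sketch below is in Python for illustration only; the function name `parse_marks`, its parameters, and the state names are all hypothetical.

```python
import re

def parse_marks(text, mark1="_MARK1_", mark2="_MARK2_"):
    """Walk the text token by token and collect (between, after) pairs."""
    # A capturing group in re.split keeps the marks in the token list,
    # so tokens alternate: text, mark, text, mark, ...
    tokens = re.split("(" + re.escape(mark1) + "|" + re.escape(mark2) + ")", text)
    pairs = []
    state = "outside"  # outside -> between (after mark1) -> after (after mark2)
    between = ""
    for tok in tokens:
        if tok == mark1:
            state, between = "between", ""
        elif tok == mark2 and state == "between":
            state = "after"
        elif state == "between":
            between = tok.strip()
        elif state == "after":
            pairs.append((between, tok.strip()))
            state = "outside"
    return pairs
```

Because the state is explicit, cases such as a `_MARK2_` immediately followed by `_MARK1_` produce an empty "after" part instead of needing a lookahead.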

Tuesday, April 5, 2011
