Hello. I have a text and I need to extract content that follows a defined pattern: the content between MARK1 and MARK2, and the content after MARK2. However, those marks can repeat, and I need to capture all of their occurrences, as in the example below:
text: "textA textB _MARK1_ textC _MARK2_ textD _MARK1_ textE textF _MARK2_ textG textH textI"
array(0): _MARK1_ textC _MARK2_ textD
array(1): textC
array(2): textD
array(3): _MARK1_ textE textF _MARK2_ textG textH textI
array(4): textE textF
array(5): textG textH textI
-
I don't think you'll be able to achieve this with a single expression. Likely you'll need to break it down into an initial expression and then a loop to perform a 2nd expression match against each iteration of the first match.
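A minimal sketch of that two-step idea, assuming Python (the question does not name a language). The function name `extract_two_pass` and the whitespace trimming are assumptions made for illustration, chosen so the output matches the array layout in the question.

```python
import re

def extract_two_pass(text):
    """Return [whole, between, after, whole, between, after, ...]."""
    results = []
    # First expression: cut the text at every _MARK1_. The first piece
    # precedes any mark, so it is dropped.
    chunks = re.split(r"_MARK1_", text)[1:]
    # Second expression, applied to each chunk in a loop: separate the
    # part before _MARK2_ from the part after it.
    for chunk in chunks:
        match = re.match(r"(.*?)_MARK2_(.*)", chunk, re.S)
        if match is None:
            continue  # a _MARK1_ with no following _MARK2_ is skipped
        between, after = match.groups()
        whole = "_MARK1_" + chunk
        # strip() is an assumption: the expected output shows no padding
        results.extend([whole.strip(), between.strip(), after.strip()])
    return results

text = ("textA textB _MARK1_ textC _MARK2_ textD "
        "_MARK1_ textE textF _MARK2_ textG textH textI")
for index, item in enumerate(extract_two_pass(text)):
    print(f"array({index}): {item}")
```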
-
Am I missing something or is this what you are looking for?
/(_MARK1_ (.*?) _MARK2_ (.*?))*/
I made some arbitrary assumptions about how you want to handle spaces, which I realize were probably only consistent to make your example case more readable.
-
That would be:
/(_MARK1_(.*?)_MARK2_((?:(?!_MARK1_).)*))/g
At least, it works on RegEx Coach on your test case.
Of course, you need to iterate on each match.
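As a sketch of that iteration, assuming Python's `re` module (which supports the negative lookahead used here). The function name `extract_blocks` and the whitespace trimming are assumptions, not part of the answer.

```python
import re

# Pattern from the answer above. Group 1 is the whole block, group 2 the
# text between the marks, group 3 the text after _MARK2_ up to the next
# _MARK1_ (or the end of the string).
PATTERN = re.compile(r"(_MARK1_(.*?)_MARK2_((?:(?!_MARK1_).)*))")

def extract_blocks(text):
    """Return [whole, between, after, whole, between, after, ...]."""
    results = []
    for match in PATTERN.finditer(text):
        # strip() is an assumption: the expected output shows no padding
        results.extend(part.strip() for part in match.groups())
    return results

text = ("textA textB _MARK1_ textC _MARK2_ textD "
        "_MARK1_ textE textF _MARK2_ textG textH textI")
for index, item in enumerate(extract_blocks(text)):
    print(f"array({index}): {item}")
```

Without the `re.S` flag the dot does not cross line breaks, so text spanning several lines would need that flag added.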
Note it might not work in every regex flavor, because it relies on a negative lookahead that some engines do not support. (JavaScript does support lookahead; it is lookbehind that it historically lacked.)
Davi Kenji : perfect. That's it
Sparr : good catch, excluding _MARK2__MARK1_, I didn't cover that case in my solution
-
I'm not sure whether you actually need the separating marks in your array. That part seems superfluous unless you have a specific spec for it. This solution assumes you don't really need that. Since you didn't specify a language, how about Perl?
use Data::Dumper;
my $text = 'textA textB _MARK1_ textC _MARK2_ textD _MARK1_ textE textF _MARK2_ textG textH textI';
# Capture each run of text that follows a mark and ends at the next mark or at the end of the string.
my @results = $text =~ m/(?<=_MARK1_|_MARK2_)(.*?)(?=_MARK1_|_MARK2_|$)/g;
print Data::Dumper::Dumper @results;
However, there's no reason to try the general case with regular expressions. Use a parser instead.
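One way to read "use a parser" is a small tokenizer plus a state machine. The sketch below assumes Python; the function name `parse_marks`, the state names, and the handling of stray marks are all illustrative choices, not something the answer specifies.

```python
import re

def parse_marks(text):
    """Return a list of (between, after) pairs, one per _MARK1_/_MARK2_ block."""
    # Tokenize: the capturing group makes re.split keep the marks as tokens.
    tokens = re.split(r"(_MARK1_|_MARK2_)", text)
    pairs = []
    state = "outside"  # outside -> between (after _MARK1_) -> after (after _MARK2_)
    between = after = ""
    for token in tokens:
        if token == "_MARK1_":
            if state == "after":
                # The previous block is complete, so emit it.
                pairs.append((between.strip(), after.strip()))
            state, between, after = "between", "", ""
        elif token == "_MARK2_" and state == "between":
            state = "after"
        elif state == "between":
            between += token
        elif state == "after":
            # Assumption: a repeated _MARK2_ is kept as literal text.
            after += token
        # Tokens seen before the first _MARK1_ are ignored.
    if state == "after":
        pairs.append((between.strip(), after.strip()))
    return pairs

text = ("textA textB _MARK1_ textC _MARK2_ textD "
        "_MARK1_ textE textF _MARK2_ textG textH textI")
for between, after in parse_marks(text):
    print(repr(between), repr(after))
```

Unlike a single expression, each edge case (a missing _MARK2_, a repeated mark) gets its own explicit branch, which makes the behavior easier to change later.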