Since I unsuccessfully tried to figure out balanced matching in regular expressions not so long ago, I'm going to keep a copy of Ryan Byington lesson
on the subject here.
of the questions that seems to come up a lot is that someone wants to
match balanced parenthesis. Something like the string “(aa (bbb) (bbb)
aa)” and they want to match from the beginning parenthesis to the
matching end parenthesis. Generally this
is not possible with regular expression, that language just is not
descriptive enough to handle this. For the longest time this is how I
answered these question when they came to me.
in .Net this is actually possible with something called Balancing Group
Definition. This construct generally looks like (?<name1-name2>).
The following is what MSDN has to say about this:
group definition. Deletes the definition of the previously defined
group name2 and stores in group name1 the interval between the
previously defined name2 group and the current group. If no group name2
is defined, the match backtracks. Because deleting the last definition
of name2 reveals the previous definition of name2, this construct
allows the stack of captures for group name2 to be used as a counter
for keeping track of nested constructs such as parentheses. In this
construct, name1 is optional. You can use single quotes instead of
angle brackets; for example, (?'name1-name2').
following expression matches all balanced opening and closing angle
brackets(<>). Angle brackets were used because they do no require
escaping like parenthesis and make the expression a little easier to
outer most group just matches an open angle bracket followed by
anything that is not a angle bracket followed by close angle bracket. I
will explain “(?(Open)(?!))” later.
inner group does all of the interesting angle bracket matching. The
Open group matches only the open angle bracket and the following part
of expression matches anything that is not an angle bracket. So the
first group will basically match anything up till the first close angle
is best to think of a Group as a Stack of captures. Where the top of
the stack is the last capture made. (?<Close-Open>\)) Matches to
“)” and pops a capture off of the Open group’s capture stack. This
match can only be successful if and only if the Open group’s capture
stack is not empty. This is a fancy way of saying that for every match
of this group there must be a match of the group Open.
now we know that for every closing angle bracket there must have been
an opening angle bracket. However we still have done nothing to assert
that for every opening angle bracket there is a matching closing angle
bracket. That is where the (?(Open)(?!)) part of the expression comes
into play. This expression tells Regex to match (?!) if the Open group
still contains a match(i.e. there were more open angle brackets then
close angle brackets). Trying to match (?!) will always cause the
expression to fail. Basically this is a way of making the expression
fail if the Open group still contains a capture.
[.NET Regular Expressions: Regex and Balanced Matching [Ryan Byington]]
posted @ Wednesday, March 16, 2005 9:35 PM