Python: Regex Test Function
More fun with Python and regular expressions. Following up on a previous post, I wanted to share a little regex test function I wrote in Python to help me as I work through the regular expression book.
I’m mostly working at the interactive prompt and had been running commands from Python re (the regex module) as I experimented with different regular expressions. This was good as I spent time in help(re) and built up some muscle memory for Python regex functions, but it was becoming repetitious to keep typing the commands for analyzing the results of a match. Once I started learning about writing functions in Python, I realized it was time to enhance my regex learning experience with a simple Python function.
I know there are sophisticated regex tools out there and probably simple functions that do more than this, but it was fun to cobble the function together and learn more about Python in order to learn more about Regex. So far it’s proven helpful in understanding how regular expressions work. I hope it may be of use to you also.
Function Definition
The function prints whether there is a match. For each match it prints the starting and ending positions, the matched part of the string, and any captured strings (groups). Finally, it does a global search and replace on the string and prints the result.
match(pattern, string[, repl])
- pattern: the regular expression
- string: the target string to search and to make replacements in
- repl: the optional string to use for replacements (default = _._)
(I would have preferred putting repl before string to match the re.sub parameter order, but I switched them to make it an optional last argument.) I put the function in a file named imisc.py (interactive miscellaneous) that I import into an interactive session to make regex experimentation more convenient.
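To make the parameter-order point concrete, here is a minimal sketch. The wrapper name sub_default is hypothetical and exists only for illustration; it is not part of imisc.py. It shows that re.sub takes repl second, while moving repl to the end lets it carry a default value.

```python
import re

# The standard library order is re.sub(pattern, repl, string).
replaced = re.sub(r'\d+', '_._', 'Go to 4782 West 70th St.')
print(replaced)

# Hypothetical wrapper: repl moves to the end so it can be optional.
def sub_default(pattern, string, repl='_._'):
    return re.sub(pattern, repl, string)

print(sub_default(r'\d+', 'Go to 4782 West 70th St.'))       # default repl
print(sub_default(r'\d+', 'Go to 4782 West 70th St.', '#'))  # explicit repl
```

Python requires parameters with defaults to come after those without, which is why an optional repl has to be last.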
Keep reading below the fold for examples and the actual function!
Examples
In this first example, capturing parentheses aren’t used in the regex so there are no captured groups displayed. The r in r'\d+' indicates a “raw” string which saves us from having to escape backslashes with more backslashes. The default _._ is used for replacements.
>>> imisc.match(r'\d+', 'Go to 4782 West 70th St.')
a match!
1) start: 6, end: 10, str: 4782
2) start: 16, end: 18, str: 70
global replace (_._):
Go to _._ West _._th St.
>>>
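As a quick illustration of the raw-string point, the sketch below uses only the standard re module. It shows that the raw and escaped spellings produce the same three-character pattern.

```python
import re

raw = r'\d+'       # raw string: the backslash is kept as-is
escaped = '\\d+'   # regular string: the backslash must be doubled

print(raw == escaped)  # True
print(len(raw))        # 3 (backslash, d, plus)

digits = re.findall(raw, 'Go to 4782 West 70th St.')
print(digits)          # ['4782', '70']
```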
Next we’ll use capturing parentheses to collect strings in \1 and \2. We can see these values displayed in the match groups, and we’ll use \2 in our global replace. ((?i) is a mode switch for a case-insensitive match.)
>>> imisc.match(r'(?i)The (\w+) (\w+)\.?',
... 'The quick brown fox jumps over the lazy dog.', r'\2')
a match!
1) start: 0, end: 15, str: The quick brown
groups: ('quick', 'brown')
2) start: 31, end: 44, str: the lazy dog.
groups: ('lazy', 'dog')
global replace (\2):
brown fox jumps over dog
>>>
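The same capture-and-replace behavior can be reproduced with the re module directly. In this sketch the re.IGNORECASE flag stands in for the inline (?i) switch, and the variable names are just for illustration.

```python
import re

text = 'The quick brown fox jumps over the lazy dog.'
pattern = r'The (\w+) (\w+)\.?'

# Collect the span and the captured groups of every match.
matches = [(m.span(), m.groups())
           for m in re.finditer(pattern, text, re.IGNORECASE)]
print(matches)

# \2 in the replacement refers to the second captured group.
result = re.sub(pattern, r'\2', text, flags=re.IGNORECASE)
print(result)  # brown fox jumps over dog
```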
Finally, some zero-width matches on “nothing”:
>>> imisc.match(r'z?', 'abc', '_')
a match!
1) start: 0, end: 0, str:
2) start: 1, end: 1, str:
3) start: 2, end: 2, str:
4) start: 3, end: 3, str:
global replace (_):
_a_b_c_
>>>
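The zero-width matches can also be seen with re.finditer, which handles the empty matches itself. This is a small sketch using only the standard library: z? matches the empty string at every position in 'abc', including the position after the last character.

```python
import re

# Each span has start == end, so the matched text is empty.
spans = [m.span() for m in re.finditer(r'z?', 'abc')]
print(spans)     # [(0, 0), (1, 1), (2, 2), (3, 3)]

# re.sub inserts the replacement at each of those empty matches.
replaced = re.sub(r'z?', '_', 'abc')
print(replaced)  # _a_b_c_
```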
The Match Function
I’ll place this humble bit of code into the public domain to make it painless to share and include in your own work. I hope if my function finds its way into a larger work that you’ll do the right thing and share it under a free software license. :-)
import re

def match(pattern, string, repl='_._'):
    r = re.compile(pattern)
    m = r.search(string)
    if m:
        print('a match!')
        i = 0
        while m:
            m_start = m.start()
            m_end = m.end()
            i += 1
            print( '%d) start: %d, end: %d, str: %s' %
                   (i, m_start, m_end, string[m_start:m_end]) )
            if m.groups():               # capturing groups
                print(' groups: ' + str(m.groups()))
            if m_end == len(string):     # infinite loop if
                break                    # m_start == m_end == len(string)
            elif m_start == m_end:       # zero-width match;
                m_end += 1               # keep things moving along
            m = r.search(string, m_end)
        print( 'global replace (%s):\n%s' %
               (repl, re.sub(pattern, repl, string)) )
    else:
        print('not a match')
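
For comparison, here is an alternative sketch built on re.finditer, which steps past zero-width matches on its own. It is not the function above: the name find_matches is hypothetical, and it returns the match data rather than printing it. Its results can also differ slightly from the manual loop, for example when an empty match is possible right after a match that ends at the end of the string.

```python
import re

def find_matches(pattern, string):
    """Return (start, end, text, groups) for each match, left to right."""
    return [(m.start(), m.end(), m.group(0), m.groups())
            for m in re.finditer(pattern, string)]

results = find_matches(r'\d+', 'Go to 4782 West 70th St.')
for i, (start, end, text, groups) in enumerate(results, 1):
    print('%d) start: %d, end: %d, str: %s' % (i, start, end, text))
```

Returning a list keeps the reporting separate from the matching, so the same data can be printed at the prompt or checked in a test.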
Comments
Hi Scott,
Thanks for sharing. For folks who use the interactive python prompt a lot, I recommend IPython. It is an improved python shell that is a lot more fun and convenient than the vanilla python interpreter.
Cheers,
Michael
Posted by Michael on 31 March 2008 at 6:47 am