Paul McGuire
APUG – May, 2016
Writing Parsers in Python
Using Pyparsing
Best practices:
… highlighted in the examples
integer = Word('0123456789')
phone_number = Optional('(' + integer + ')') + integer + '-' + integer
# re.compile(r'(\(\d+\))?\d+-\d+')
greet = Word(alphas) + "," + Word(alphas) + "!"
greet.parseString("Hello, World!")
Best practice:
Don’t include whitespace in
the parser definition
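A minimal sketch of why this works: pyparsing skips whitespace between tokens by default, so the grammar only has to name the tokens. The sample inputs below are made up for illustration.

```python
from pyparsing import Optional, Word, alphas

# Grammar from the slide: a word, a comma, a word, an exclamation mark.
greet = Word(alphas) + "," + Word(alphas) + "!"

# Whitespace between tokens is skipped by default, so these
# differently spaced inputs all produce the same token list.
for text in ["Hello, World!", "Hello,World!", "Hello  ,  World  !"]:
    print(greet.parseString(text).asList())

# The same holds for the phone number grammar.
integer = Word("0123456789")
phone_number = Optional("(" + integer + ")") + integer + "-" + integer
print(phone_number.parseString("(512) 555 - 1234").asList())
```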
geo:27.9878,86.9250,8850;crs=wgs84;u=100
geo:-26.416,27.428,-3900;u=100
geo:17.75,142.5,-11033;crs=wgs84;u=100
geo:36.246944,-116.816944,-85;u=50
geo:30.2644663,-97.7841169;a=100;href=http://www.allure-energy.com/
https://tools.ietf.org/html/rfc5870
geo-URI = geo-scheme ":" geo-path
geo-scheme = "geo"
geo-path = coordinates p
coordinates = num "," num [ "," num ]
p = [ crsp ] [ uncp ] [";" other]...
crsp = ";crs=" crslabel
crslabel = "wgs84" / labeltext
uncp = ";u=" uval
other = labeltext "=" val
val = uval / chartext
Best practice:
Start with a BNF
import re

# tests holds the sample geo: URIs shown earlier
patt = (r'geo:(-?\d+(?:\.\d*)?),(-?\d+(?:\.\d*)?)(?:,(-?\d+(?:\.\d*)?))?' +
        r'(?:;(crs=[^;]+))?(?:;(u=\d+(?:\.\d*)?))?')
print(re.compile(patt).match(tests[0]).groups())
('27.9878', '86.9250', '8850', 'crs=wgs84', 'u=100')
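For comparison, here is a sketch of how far the regex approach can be pushed with named groups. The names `NUM`, `GEO_RE` and `parse_geo` are made up for this illustration. Even with names, type conversion is still a manual step after matching.

```python
import re

# One number: optional sign, digits, optional fractional part.
NUM = r'-?\d+(?:\.\d*)?'

GEO_RE = re.compile(
    r'geo:(?P<lat>' + NUM + r'),(?P<lng>' + NUM + r')'
    + r'(?:,(?P<alt>' + NUM + r'))?'
    + r'(?:;crs=(?P<crs>[^;]+))?'
    + r'(?:;u=(?P<u>' + NUM + r'))?'
)

def parse_geo(uri):
    """Return a dict of fields, or None if the URI does not match."""
    m = GEO_RE.match(uri)
    if m is None:
        return None
    fields = m.groupdict()
    # Numeric fields come back as strings and must be converted by hand.
    for key in ('lat', 'lng', 'alt', 'u'):
        if fields[key] is not None:
            fields[key] = float(fields[key])
    return fields

print(parse_geo('geo:27.9878,86.9250,8850;crs=wgs84;u=100'))
print(parse_geo('geo:36.246944,-116.816944,-85;u=50'))
```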
ParseResult(scheme='geo', netloc='',
path='27.9878,86.9250,8850;crs=wgs84;u=100', params='',
query='', fragment='')
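The result above can be reproduced with the standard library. The sketch below shows that `urlparse` only separates the scheme from the path for a `geo:` URI, so the coordinates and parameters still have to be split by hand. The variable names are illustrative.

```python
from urllib.parse import urlparse

parts = urlparse('geo:27.9878,86.9250,8850;crs=wgs84;u=100')
print(parts.scheme)   # the scheme is recognized
print(parts.path)     # everything else lands in the path, unsplit

# Manual follow-up work: split coordinates from parameters.
coords, _, params = parts.path.partition(';')
lat, lng, *alt = (float(v) for v in coords.split(','))
args = dict(p.split('=', 1) for p in params.split(';') if p)
print(lat, lng, alt, args)
```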
from pyparsing import *
EQ,COMMA = map(Suppress, "=,")
number = Regex(r'-?\d+(\.\d*)?').addParseAction(lambda t: float(t[0]))
geo_coords = Group(number('lat') + COMMA + number('lng') +
                   Optional(COMMA + number('alt')))
crs_arg = Group('crs' + EQ + Word(alphanums))
u_arg = Group('u' + EQ + number)
url_args = Dict(delimitedList(crs_arg | u_arg, ';'))
geo_url = "geo:" + geo_coords('coords') + Optional(';' + url_args('args'))
Best practice:
Use parse actions for conversions
Best practice:
Use results names
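A reduced sketch of these two practices, using only the coordinate part of the grammar. The parse action converts each matched number to a `float` at parse time, and the results names give attribute-style access to the fields. The sample strings are illustrative.

```python
from pyparsing import Group, Optional, Regex, Suppress

COMMA = Suppress(",")

# Parse action: convert the matched text to float as soon as it is parsed.
number = Regex(r'-?\d+(\.\d*)?').addParseAction(lambda t: float(t[0]))

# Results names: number('lat') is a copy of number that labels its token.
geo_coords = Group(number('lat') + COMMA + number('lng') +
                   Optional(COMMA + number('alt')))

coords = geo_coords.parseString("27.9878,86.9250,8850")[0]
print(coords.lat, coords.lng, coords.alt)
print(type(coords.lat))

# The optional altitude is simply absent when not given.
no_alt = geo_coords.parseString("30.2644663,-97.7841169")[0]
print('alt' in no_alt)
```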
tests = """
geo:36.246944,-116.816944,-85;u=50
geo:30.2644663,-97.7841169;a=100;href=http://www.allure-energy.com/
"""
geo_url.runTests(tests)
assert geo_url.matches("geo:36.246944,-116.816944,-85;u=50")
assert not geo_url.matches("geo:36.246944;u=50")  # longitude is missing
Best practice:
runTests() is new in 2.0.4
Best practice:
Use matches() for incremental inline
validation of your parser elements
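One way to apply this is to assert on each sub-expression right after defining it, so a mistake in a small piece is caught before the full grammar is assembled. The positive and negative sample strings here are illustrative.

```python
from pyparsing import Group, Regex, Suppress, Word, alphanums

EQ = Suppress("=")

number = Regex(r'-?\d+(\.\d*)?').addParseAction(lambda t: float(t[0]))
# matches() requires the whole string to match and returns True or False.
assert number.matches("-116.816944")
assert number.matches("8850")
assert not number.matches("abc")
assert not number.matches(".5")      # the pattern requires leading digits

crs_arg = Group('crs' + EQ + Word(alphanums))
assert crs_arg.matches("crs=wgs84")
assert not crs_arg.matches("crs=")   # the value is required

u_arg = Group('u' + EQ + number)
assert u_arg.matches("u=100")
assert not u_arg.matches("u=wide")
```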
geo:36.246944,-116.816944,-85;u=50
['geo:', [36.246944, -116.816944, -85.0], ';', [['u', 50.0]]]
- args: [['u', 50.0]]
  - u: 50.0
- coords: [36.246944, -116.816944, -85.0]
  - alt: -85.0
  - lat: 36.246944
  - lng: -116.816944
geo:30.2644663,-97.7841169;a=100;href=http://www.allure-energy.com/
['geo:', [30.2644663, -97.7841169]]
- coords: [30.2644663, -97.7841169]
  - lat: 30.2644663
  - lng: -97.7841169
from pyparsing import *
EQ,COMMA = map(Suppress, "=,")
number = Regex(r'-?\d+(\.\d*)?').addParseAction(lambda t: float(t[0]))
geo_coords = Group(number('lat') + COMMA + number('lng') +
                   Optional(COMMA + number('alt')))
crs_arg = Group('crs' + EQ + Word(alphanums))
u_arg = Group('u' + EQ + number)
other = Group(Word(alphas) + EQ + CharsNotIn(';'))
url_args = Dict(delimitedList(crs_arg | u_arg | other, ';'))
geo_url = "geo:" + geo_coords('coords') + Optional(';' + url_args('args'))
geo:36.246944,-116.816944,-85;u=50
['geo:', [36.246944, -116.816944, -85.0], ';', [['u', 50.0]]]
- args: [['u', 50.0]]
  - u: 50.0
- coords: [36.246944, -116.816944, -85.0]
  - alt: -85.0
  - lat: 36.246944
  - lng: -116.816944
geo:30.2644663,-97.7841169;a=100;href=http://www.allure-energy.com/
['geo:', [30.2644663, -97.7841169], ';',
 [['a', '100'], ['href', 'http://www.allure-energy.com/']]]
- args: [['a', '100'], ['href', 'http://www.allure-energy.com/']]
  - a: 100
  - href: http://www.allure-energy.com/
- coords: [30.2644663, -97.7841169]
  - lat: 30.2644663
  - lng: -97.7841169
geo = geo_url.parseString('geo:27.9878,86.9250,8850;crs=wgs84;u=100')
print(geo.dump())
['geo:', [27.9878, 86.925, 8850.0], ';', [['crs', 'wgs84'], ['u', 100.0]]]
- args: [['crs', 'wgs84'], ['u', 100.0]]
  - crs: wgs84
  - u: 100.0
- coords: [27.9878, 86.925, 8850.0]
  - alt: 8850.0
  - lat: 27.9878
  - lng: 86.925
print(geo.coords.alt)
8850.0
print(geo.args.asDict())
{'crs': 'wgs84', 'u': 100.0}
Best practice:
dump() is very useful for seeing the
structure and names in the parsed
results
Best practice:
pprint() is useful for seeing the results
structure if no results names are
defined
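A small sketch of the `pprint()` case: the grammar below defines no results names, so the nesting created by `Group` is the only structure to inspect. The `point` and `points` expressions are made up for this illustration.

```python
from pyparsing import Group, Regex, Suppress, delimitedList

number = Regex(r'-?\d+(\.\d*)?').addParseAction(lambda t: float(t[0]))

# No results names here: each point is just a nested group of two floats.
point = Group(number + Suppress(",") + number)
points = delimitedList(point, ";")

result = points.parseString("27.9878,86.9250; 36.246944,-116.816944")

# pprint() pretty-prints the nested list structure of the results.
result.pprint()

# dump() works too, but has no names to list for this grammar.
print(result.dump())
```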
https://bitbucket.org/mchaput/whoosh/overview
TrafficLight = {
Red -> Green;
Green -> Yellow;
Yellow -> Red;
}
DocumentRevision = {
New -( create )-> Editing;
Editing -( cancel )-> Deleted;
Editing -( submit )-> PendingApproval;
PendingApproval -( reject )-> Editing;
PendingApproval -( approve )-> Approved;
Approved -( activate )-> Active;
Active -( deactivate )-> Approved;
Approved -( retire )-> Retired;
Retired -( purge )-> Deleted;
}
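As an illustration of how little grammar such a notation needs, here is a hypothetical pyparsing sketch that reads the two samples above. The grammar is inferred from the sample text only and is an assumption, not the parser the original project uses. All names in it are made up.

```python
from pyparsing import Group, OneOrMore, Suppress, Word, alphanums, alphas

ident = Word(alphas, alphanums + "_")
LBRACE, RBRACE, EQ, SEMI = map(Suppress, "{}=;")

# Two arrow forms: a plain "->" or a labeled "-( event )->".
plain_arrow = Suppress("->")
event_arrow = Suppress("-(") + ident("event") + Suppress(")->")

transition = Group(ident("source") + (event_arrow | plain_arrow) +
                   ident("target") + SEMI)
machine = (ident("name") + EQ + LBRACE +
           Group(OneOrMore(transition))("transitions") + RBRACE)

traffic = machine.parseString("""
TrafficLight = {
    Red -> Green;
    Green -> Yellow;
    Yellow -> Red;
}
""")
print(traffic.name)
print([(t.source, t.target) for t in traffic.transitions])

doc = machine.parseString("""
DocumentRevision = {
    New -( create )-> Editing;
    Editing -( submit )-> PendingApproval;
}
""")
print([(t.source, t.event, t.target) for t in doc.transitions])
```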
https://pypi.python.org/pypi/zhpy/1.7.4
http://zh-tw.enc.tfode.com/%E5%91%A8%E8%9F%92
https://iusb.edu/computerscience/faculty-and-staff/faculty/jwolfer/005.pdf
rinit # initialize communication
rsens Rd # read robot sensors
rspeed Rright,Rleft # set robot motor speeds
rspeed $immed,$immed
Crumble
http://redfernelectronics.co.uk
http://www.enodev.fr/
LC
TD 90 AV 60
TD 90 AV 80
BC
REPETE 4 [ TG 90 AV 20 ]
LC
AV 80
BC
REPETE 4 [ AV 20 TG 90 ]
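The nested `REPETE n [ ... ]` form is a natural fit for a recursive grammar. The sketch below is a hypothetical parser for just the commands that appear in the sample above, not the grammar of the actual product. It uses `Forward` so a repeat block can contain further commands, and the helper `expand` is made up to show one use of the parsed structure.

```python
from pyparsing import Forward, Group, Keyword, Suppress, Word, ZeroOrMore, nums

integer = Word(nums).addParseAction(lambda t: int(t[0]))
LBRACK, RBRACK = map(Suppress, "[]")

# Forward declares a placeholder so that repeat blocks can nest commands.
command = Forward()
pen = Group(Keyword("LC") | Keyword("BC"))                       # no argument
move = Group((Keyword("TD") | Keyword("TG") | Keyword("AV")) + integer)
repeat = Group(Keyword("REPETE") + integer +
               LBRACK + Group(ZeroOrMore(command)) + RBRACK)
command <<= pen | move | repeat

program = ZeroOrMore(command)

source = """
LC
TD 90 AV 60
BC
REPETE 4 [ TG 90 AV 20 ]
"""
parsed = program.parseString(source, parseAll=True).asList()
print(parsed)

def expand(commands):
    """Unroll REPETE blocks into a flat sequence of simple commands."""
    for cmd in commands:
        if cmd[0] == "REPETE":
            for _ in range(cmd[1]):
                yield from expand(cmd[2])
        else:
            yield tuple(cmd)

print(list(expand(parsed)))
```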
http://pyparsing.wikispaces.com
http://pythonhosted.org/pyparsing/
ptmcg@austin.rr.com