Writing Parsers in Python Using Pyparsing

Paul McGuire, APUG – May, 2016


Page 1: Writing Parsers in Python using Pyparsing

Writing Parsers in Python
Using Pyparsing

Paul McGuire
APUG – May, 2016

Page 2: Writing Parsers in Python using Pyparsing

Agenda
• Quick Intro / Demo
• Parsing 'geo:' URLs
• Other Examples of Using Pyparsing
• How to Get Started

Best practices: highlighted in the examples

Page 3: Writing Parsers in Python using Pyparsing

Paul McGuire
• Mechanical Engineering degree from Rensselaer Polytechnic Institute, Masters in Engineering from Univ of Texas; 30+ years developing planning and control software for electronics and semiconductor manufacturing (Pascal, PL/I, COBOL, Fortran, C/C++, Smalltalk, Java, C#, Python)
• A long-time interest in parser applications, plus work in O-O technologies in Smalltalk and Java, led to the object-based parser construction approach seen in Pyparsing, first released in 2003
• Several articles published in Python Magazine, and an e-book with O’Reilly, “Getting Started With PyParsing”, published in 2007

Page 4: Writing Parsers in Python using Pyparsing

Quick Intro / Demo
• Parsers are built up using Pyparsing classes
  • Word, Literal (or just string literals)
  • OneOrMore, ZeroOrMore
  • And, Or, MatchFirst, Each
  • with overloaded operators +, ^, |, and &
• Whitespace is implicitly skipped

    integer = Word('0123456789')
    phone_number = Optional('(' + integer + ')') + integer + '-' + integer

    # re.compile(r'(\(\d+\))?\d+-\d+')

    greet = Word(alphas) + "," + Word(alphas) + "!"
    greet.parseString("Hello, World!")

Best practice: Don’t include whitespace in the parser definition
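A minimal runnable sketch of the demo above, assuming pyparsing is installed. It uses the camelCase method names shown on the slides (newer pyparsing releases also offer snake_case equivalents such as parse_string). The phone number and the oddly spaced greeting are made-up inputs for illustration.

```python
from pyparsing import Optional, Word, alphas, nums

# Word(nums) matches one or more digits, same as Word('0123456789')
integer = Word(nums)
phone_number = Optional("(" + integer + ")") + integer + "-" + integer
greet = Word(alphas) + "," + Word(alphas) + "!"

# Whitespace between tokens is skipped implicitly, so both inputs
# produce the same list of tokens.
print(greet.parseString("Hello, World!").asList())
print(greet.parseString("Hello  ,   World !").asList())
# both print ['Hello', ',', 'World', '!']

print(phone_number.parseString("(512) 555-1212").asList())
# prints ['(', '512', ')', '555', '-', '1212']
```

Because whitespace is not part of the grammar, the same definition accepts tightly packed and loosely spaced input alike.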

Page 5: Writing Parsers in Python using Pyparsing

Parsing 'geo:' URLs
• URL for latitude / longitude / altitude values:
  • geo:<latitude>,<longitude>[,<altitude>][;options…]
• options:
  • crs (coordinate reference system) – default = ‘wgs84’
  • u (uncertainty) – value in meters
  • other – key = value

Page 6: Writing Parsers in Python using Pyparsing

Sample 'geo:' URLs
• Samples

    geo:27.9878,86.9250,8850;crs=wgs84;u=100
    geo:-26.416,27.428,-3900;u=100
    geo:17.75,142.5,-11033;crs=wgs84;u=100
    geo:36.246944,-116.816944,-85;u=50
    geo:30.2644663,-97.7841169;a=100;href=http://www.allure-energy.com/

Page 7: Writing Parsers in Python using Pyparsing

'geo:' URL specification
• IETF RFC 5870
• from https://tools.ietf.org/html/rfc5870

    geo-URI     = geo-scheme ":" geo-path
    geo-scheme  = "geo"
    geo-path    = coordinates p
    coordinates = num "," num [ "," num ]

    p        = [ crsp ] [ uncp ] [";" other]...
    crsp     = ";crs=" crslabel
    crslabel = "wgs84" / labeltext
    uncp     = ";u=" uval

    other = labeltext "=" val
    val   = uval / chartext

Best practice: Start with a BNF

Page 8: Writing Parsers in Python using Pyparsing

Parsers included in Python lib

    geo:27.9878,86.9250,8850;crs=wgs84;u=100

• urlparse

    ParseResult(scheme='geo', netloc='', path='27.9878,86.9250,8850;crs=wgs84;u=100', params='', query='', fragment='')

• re

    patt = (r'geo:(-?\d+(?:\.\d*)?),(-?\d+(?:\.\d*)?)(?:,(-?\d+(?:\.\d*)?))?' +
            r'(?:;(crs=[^;]+))?(?:;(u=\d+(?:\.\d*)?))?')

    print(re.compile(patt).match(tests[0]).groups())
    ('27.9878', '86.9250', '8850', 'crs=wgs84', 'u=100')
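For reference, a self-contained sketch of the two standard-library approaches, assuming Python 3 (where urlparse lives in urllib.parse). The names url, num, and fields are illustrative and not from the slides; the regular expression is the slide's pattern with the repeated number sub-pattern factored out.

```python
import re
from urllib.parse import urlparse

url = "geo:27.9878,86.9250,8850;crs=wgs84;u=100"

# urlparse only splits off the scheme; everything else stays in .path
parts = urlparse(url)
print(parts.scheme)   # prints geo
print(parts.path)     # prints 27.9878,86.9250,8850;crs=wgs84;u=100

# The regex pulls the fields apart, but every value is still a string
# and any parameter other than crs or u is not captured.
num = r"-?\d+(?:\.\d*)?"
patt = re.compile(
    rf"geo:({num}),({num})(?:,({num}))?"
    rf"(?:;(crs=[^;]+))?(?:;(u={num}))?"
)
fields = patt.match(url).groups()
print(fields)
# prints ('27.9878', '86.9250', '8850', 'crs=wgs84', 'u=100')
```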

Page 9: Writing Parsers in Python using Pyparsing

'geo' URL Parsing using Pyparsing

    from pyparsing import *

    EQ,COMMA = map(Suppress, "=,")
    number = Regex(r'-?\d+(\.\d*)?').addParseAction(lambda t: float(t[0]))

    geo_coords = Group(number('lat') + COMMA + number('lng') + Optional(COMMA + number('alt')))

    crs_arg = Group('crs' + EQ + Word(alphanums))
    u_arg = Group('u' + EQ + number)

    url_args = Dict(delimitedList(crs_arg | u_arg, ';'))

    geo_url = "geo:" + geo_coords('coords') + Optional(';' + url_args('args'))

Best practice: Use parse actions for conversions

Best practice: Use results names
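To see the two best practices in isolation, here is a small sketch that parses only the coordinate part. It assumes pyparsing is installed and reuses the slide's number and coordinate definitions; the variable names three_d and two_d are illustrative.

```python
from pyparsing import Group, Optional, Regex, Suppress

COMMA = Suppress(",")

# Parse action: convert the matched text to float at parse time
number = Regex(r"-?\d+(\.\d*)?").addParseAction(lambda t: float(t[0]))

# Results names: number("lat") is shorthand for attaching the name "lat"
coords = Group(number("lat") + COMMA + number("lng")
               + Optional(COMMA + number("alt")))

three_d = coords.parseString("27.9878,86.9250,8850")[0]
print(three_d.asList())          # prints [27.9878, 86.925, 8850.0]
print(three_d.lat, three_d.alt)  # prints 27.9878 8850.0

# When the optional altitude is absent, the name is simply not defined
two_d = coords.parseString("30.2644663,-97.7841169")[0]
print(two_d.get("alt"))          # prints None
```

The calling code never has to count list positions or call float() itself; the names and conversions travel with the grammar.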

Page 10: Writing Parsers in Python using Pyparsing

Parsing some samples

    tests = """\
        geo:36.246944,-116.816944,-85;u=50
        geo:30.2644663,-97.7841169;a=100;href=http://www.allure-energy.com/"""

    geo_url.runTests(tests)

    assert geo_url.matches("geo:36.246944,-116.816944,-85;u=50")
    # the longitude is missing here, so this URL should not match
    assert not geo_url.matches("geo:36.246944;u=50")

Best practice: runTests() is new in 2.0.4

Best practice: Use matches() for incremental inline validation of your parser elements
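A sketch of how matches() and runTests() might be used while a grammar is still being built up, assuming a pyparsing release recent enough that runTests() returns a (success, results) pair. Only the coordinate sub-expression is tested here; the sample inputs are taken from the slides.

```python
from pyparsing import Group, Optional, Regex, Suppress

COMMA = Suppress(",")
number = Regex(r"-?\d+(\.\d*)?").addParseAction(lambda t: float(t[0]))
coords = Group(number("lat") + COMMA + number("lng")
               + Optional(COMMA + number("alt")))

# Inline checks right next to each definition
assert number.matches("-116.816944")
assert not number.matches("abc")
assert coords.matches("36.246944,-116.816944,-85")
assert not coords.matches("36.246944")   # longitude is required

# runTests() parses each line, prints a dump of the results, and
# reports whether every test line parsed successfully.
success, results = coords.runTests("""\
    # latitude, longitude, altitude
    36.246944,-116.816944,-85
    # altitude omitted
    30.2644663,-97.7841169
    """)
print(success)
```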

Page 11: Writing Parsers in Python using Pyparsing

Parsing some samples - results

    geo:36.246944,-116.816944,-85;u=50
    ['geo:', [36.246944, -116.816944, -85.0], ';', [['u', 50.0]]]
    - args: [['u', 50.0]]
      - u: 50.0
    - coords: [36.246944, -116.816944, -85.0]
      - alt: -85.0
      - lat: 36.246944
      - lng: -116.816944

    geo:30.2644663,-97.7841169;a=100;href=http://www.allure-energy.com/
    ['geo:', [30.2644663, -97.7841169]]
    - coords: [30.2644663, -97.7841169]
      - lat: 30.2644663
      - lng: -97.7841169

Page 12: Writing Parsers in Python using Pyparsing

'geo' URL – add support for 'other'

    from pyparsing import *

    EQ,COMMA = map(Suppress, "=,")
    number = Regex(r'-?\d+(\.\d*)?').addParseAction(lambda t: float(t[0]))

    geo_coords = Group(number('lat') + COMMA + number('lng') + Optional(COMMA + number('alt')))

    crs_arg = Group('crs' + EQ + Word(alphanums))
    u_arg = Group('u' + EQ + number)
    other = Group(Word(alphas) + EQ + CharsNotIn(';'))

    url_args = Dict(delimitedList(crs_arg | u_arg | other, ';'))

    geo_url = "geo:" + geo_coords('coords') + Optional(';' + url_args('args'))
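The order of alternatives in crs_arg | u_arg | other matters, because | (MatchFirst) takes the first alternative that matches. The following sketch, assuming pyparsing is installed, contrasts the two orderings; the names specific_first and generic_first are illustrative and not from the slides.

```python
from pyparsing import CharsNotIn, Group, Regex, Suppress, Word, alphas

EQ = Suppress("=")
number = Regex(r"-?\d+(\.\d*)?").addParseAction(lambda t: float(t[0]))
u_arg = Group("u" + EQ + number)
other = Group(Word(alphas) + EQ + CharsNotIn(";"))

specific_first = u_arg | other   # ordering used on the slide
generic_first = other | u_arg    # catch-all listed first

# With the specific rule first, 'u' gets its float conversion
print(specific_first.parseString("u=100").asList())  # prints [['u', 100.0]]

# With the catch-all first, 'u' is swallowed as a plain string
print(generic_first.parseString("u=100").asList())   # prints [['u', '100']]

# Unknown keys fall through to the catch-all either way
print(specific_first.parseString("a=100").asList())  # prints [['a', '100']]
```

Listing the catch-all last keeps the typed handling of crs and u intact while still accepting arbitrary key=value pairs.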

Page 13: Writing Parsers in Python using Pyparsing

Parsing some samples (with 'other')

    geo:36.246944,-116.816944,-85;u=50
    ['geo:', [36.246944, -116.816944, -85.0], ';', [['u', 50.0]]]
    - args: [['u', 50.0]]
      - u: 50.0
    - coords: [36.246944, -116.816944, -85.0]
      - alt: -85.0
      - lat: 36.246944
      - lng: -116.816944

    geo:30.2644663,-97.7841169;a=100;href=http://www.allure-energy.com/
    ['geo:', [30.2644663, -97.7841169], ';', [['a', '100'], ['href', 'http://www.allure-energy.com/']]]
    - args: [['a', '100'], ['href', 'http://www.allure-energy.com/']]
      - a: 100
      - href: http://www.allure-energy.com/
    - coords: [30.2644663, -97.7841169]
      - lat: 30.2644663
      - lng: -97.7841169

Page 14: Writing Parsers in Python using Pyparsing

Using the Pyparsing 'geo:' parser

    geo = geo_url.parseString('geo:27.9878,86.9250,8850;crs=wgs84;u=100')

    print(geo.dump())
    ['geo:', [27.9878, 86.925, 8850.0], ';', [['crs', 'wgs84'], ['u', 100.0]]]
    - args: [['crs', 'wgs84'], ['u', 100.0]]
      - crs: wgs84
      - u: 100.0
    - coords: [27.9878, 86.925, 8850.0]
      - alt: 8850.0
      - lat: 27.9878
      - lng: 86.925

    print(geo.coords.alt)
    8850.0

    print(geo.args.asDict())
    {'crs': 'wgs84', 'u': 100.0}

Best practice: dump() is very useful for seeing the structure and names in the parsed results

Best practice: pprint() is useful for seeing the results structure if no results names are defined
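One way to package the parser for calling code is a small wrapper that returns plain Python values. This is only a sketch: parse_geo is a hypothetical helper name, not part of pyparsing or the slides, and the grammar is the 'other'-aware version from the earlier slide with explicit imports.

```python
from pyparsing import (CharsNotIn, Dict, Group, Optional, Regex, Suppress,
                       Word, alphanums, alphas, delimitedList)

EQ, COMMA = map(Suppress, "=,")
number = Regex(r"-?\d+(\.\d*)?").addParseAction(lambda t: float(t[0]))
geo_coords = Group(number("lat") + COMMA + number("lng")
                   + Optional(COMMA + number("alt")))
crs_arg = Group("crs" + EQ + Word(alphanums))
u_arg = Group("u" + EQ + number)
other = Group(Word(alphas) + EQ + CharsNotIn(";"))
url_args = Dict(delimitedList(crs_arg | u_arg | other, ";"))
geo_url = "geo:" + geo_coords("coords") + Optional(";" + url_args("args"))


def parse_geo(url):
    """Hypothetical helper: parse one geo: URL into a plain dict."""
    parsed = geo_url.parseString(url, parseAll=True)
    return {
        "lat": parsed.coords.lat,
        "lng": parsed.coords.lng,
        "alt": parsed.coords.get("alt"),   # None when no altitude given
        "args": parsed.args.asDict() if "args" in parsed else {},
    }


print(parse_geo("geo:27.9878,86.9250,8850;crs=wgs84;u=100"))
print(parse_geo("geo:36.246944,-116.816944"))
```

Passing parseAll=True makes the helper raise a ParseException on trailing text that the grammar does not cover, rather than silently ignoring it.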

Page 15: Writing Parsers in Python using Pyparsing

Other Examples of Using Pyparsing
• State model Python code

    TrafficLight = {
        Red -> Green;
        Green -> Yellow;
        Yellow -> Red;
    }

    DocumentRevision = {
        New -( create )-> Editing;
        Editing -( cancel )-> Deleted;
        Editing -( submit )-> PendingApproval;
        PendingApproval -( reject )-> Editing;
        PendingApproval -( approve )-> Approved;
        Approved -( activate )-> Active;
        Active -( deactivate )-> Approved;
        Approved -( retire )-> Retired;
        Retired -( purge )-> Deleted;
    }

• SQL SELECT statements
  • also a good starter for “FauxSQL”
• Lucene query
  • (see also https://bitbucket.org/mchaput/whoosh/overview)
• Elastic search query (plasticparser)
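As an illustration of how a notation like the TrafficLight model could be parsed, here is a small sketch. The grammar is an assumption written for this example, not the author's state-model parser, and it covers only the plain "A -> B;" form (not the "-( event )->" transitions of DocumentRevision).

```python
from pyparsing import Group, OneOrMore, Suppress, Word, alphanums, alphas

LBRACE, RBRACE, EQ, SEMI, ARROW = map(Suppress, ["{", "}", "=", ";", "->"])
identifier = Word(alphas, alphanums + "_")

# One transition: source state, arrow, target state, terminating semicolon
transition = Group(identifier("source") + ARROW + identifier("target") + SEMI)

state_machine = (identifier("name") + EQ + LBRACE
                 + Group(OneOrMore(transition))("transitions") + RBRACE)

parsed = state_machine.parseString(
    "TrafficLight = { Red -> Green; Green -> Yellow; Yellow -> Red; }")

print(parsed.name)   # prints TrafficLight
for t in parsed.transitions:
    print(t.source, "->", t.target)
```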

Page 16: Writing Parsers in Python using Pyparsing

More Pyparsing Usages
• zhpy – Python interpreter with Chinese keywords – Fred Lin
  • https://pypi.python.org/pypi/zhpy/1.7.4
  • http://zh-tw.enc.tfode.com/%E5%91%A8%E8%9F%92

Page 17: Writing Parsers in Python using Pyparsing

More Pyparsing Usages
• Robot command language
  • https://iusb.edu/computerscience/faculty-and-staff/faculty/jwolfer/005.pdf

    rinit                  # initialize communication
    rsens Rd               # read robot sensors
    rspeed Rright,Rleft    # set robot motor speeds
    rspeed $immed,$immed

Crumble
http://redfernelectronics.co.uk

Page 18: Writing Parsers in Python using Pyparsing

More Pyparsing Usages
• Logo interpreter (in French) - Christophe Vu-Brugier
  • http://www.enodev.fr/

    LC
    TD 90 AV 60
    TD 90 AV 80
    BC
    REPETE 4 [ TG 90 AV 20 ]
    LC
    AV 80
    BC
    REPETE 4 [ AV 20 TG 90 ]

Page 19: Writing Parsers in Python using Pyparsing

How to Get Started
• Install Pyparsing (if not already available)

    pip install -U pyparsing

• MIT license
• Go to the pyparsing wiki (http://pyparsing.wikispaces.com) and check out the examples page
• Online docs at http://pythonhosted.org/pyparsing/
• Post questions on StackOverflow (use pyparsing tag)
• Paul McGuire – [email protected]

Page 20: Writing Parsers in Python using Pyparsing

QUIZ!!!

Page 21: Writing Parsers in Python using Pyparsing

Best Practices Summary
• Start with a BNF
• Don’t use explicit whitespace in your parser
• Use parse actions for parse-time conversions
• Use results names to facilitate access to data after parsing
• parser.runTests() makes it easy to run through test cases
• parser.matches(test_string) is a simple test for unit testing
• results.dump() and results.pprint() are good for examining the parsing results

Page 22: Writing Parsers in Python using Pyparsing

Thank You!