layered combinator parsers with a unique state
DESCRIPTION
Layered Combinator Parsers with a Unique State. Pieter Koopman Rinus Plasmeijer Nijmegen, The Netherlands. Overview. conventional parser combinators requirements new combinators system-architecture new parser combinators separate scanner and parser error handling. parser combinators. - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: Layered Combinator Parsers with a Unique State](https://reader036.vdocument.in/reader036/viewer/2022062722/56813a21550346895da1fdac/html5/thumbnails/1.jpg)
Layered Combinator Layered Combinator Parsers with a Unique Parsers with a Unique
StateState
Pieter KoopmanRinus Plasmeijer
Nijmegen, The Netherlands
![Page 2: Layered Combinator Parsers with a Unique State](https://reader036.vdocument.in/reader036/viewer/2022062722/56813a21550346895da1fdac/html5/thumbnails/2.jpg)
Parser Combinators Pieter Koopman 2
Overview
conventional parser combinators requirements new combinators system-architecture new parser combinators separate scanner and parser error handling
![Page 3: Layered Combinator Parsers with a Unique State](https://reader036.vdocument.in/reader036/viewer/2022062722/56813a21550346895da1fdac/html5/thumbnails/3.jpg)
Parser Combinators Pieter Koopman 3
parser combinators
Non deterministic, list of results:: Parser s r :== [s] -> [ ParseResult s r ]:: ParseResult s r :== ([s],r)
fail & yieldfail = \ss = []yield r = \ss = [(ss,r)]
recognize symbolsatisfy :: (s->Bool) -> Parser s ssatisfy f = pwhere p [s:ss] | f s = [(ss,s)] p _ = []
symbol sym :== satisfy ((==) sym)
![Page 4: Layered Combinator Parsers with a Unique State](https://reader036.vdocument.in/reader036/viewer/2022062722/56813a21550346895da1fdac/html5/thumbnails/4.jpg)
Parser Combinators Pieter Koopman 4
parser combinators 2 sequence-combinators(<&>) infixr 6::(Parser s r)(r->Parser s t)->Parser s t(<&>) p1 p2 = \ss1 = [ tuple \\ (ss2,r1) <- p1 ss1 , tuple <- p2 r1 ss2 ] (<+>)infixl 6::(Parser s(r->t))(Parser s r)->Parser s t(<+>) p1 p2 = \ss1 = [ (ss3,f r) \\ (ss2,f) <- p1 ss1 , (ss3,r) <- p2 ss2 ]
choose-combinator(<||>) infixr 4::(Parser s r) (Parser s r)->Parser s r(<||>) p1 p2 = \ss = p1 ss ++ p2 ss
![Page 5: Layered Combinator Parsers with a Unique State](https://reader036.vdocument.in/reader036/viewer/2022062722/56813a21550346895da1fdac/html5/thumbnails/5.jpg)
Parser Combinators Pieter Koopman 5
parser combinators 3
some useful abbreviations(@>) infixr 7(@>) f p :== yield f <+> p
(<:>) infixl 6(<:>) p1 p2 :== (\h t=[h:t]) @> p1 <+> p2
![Page 6: Layered Combinator Parsers with a Unique State](https://reader036.vdocument.in/reader036/viewer/2022062722/56813a21550346895da1fdac/html5/thumbnails/6.jpg)
Parser Combinators Pieter Koopman 6
parser combinators 4
Kleene starstar p = p <:> star p
<||> yield []
plus p = p <:> star p
parsing an identifieridentifier :: Parser Char String
identifier = toString @> satisfy isAlpha
<:> star (satisfy isAlphanum)
![Page 7: Layered Combinator Parsers with a Unique State](https://reader036.vdocument.in/reader036/viewer/2022062722/56813a21550346895da1fdac/html5/thumbnails/7.jpg)
Parser Combinators Pieter Koopman 7
parser combinators 5
context sensitive parserstwice the same character
doubleChar = satisfy isAlpha <&> \c -> symbol c
arbitrary look aheadlookAhead
= symbol 'a' +> symbol 'b' <||> symbol 'a' +> symbol 'c'
![Page 8: Layered Combinator Parsers with a Unique State](https://reader036.vdocument.in/reader036/viewer/2022062722/56813a21550346895da1fdac/html5/thumbnails/8.jpg)
Parser Combinators Pieter Koopman 8
parser combinators 5
context sensitive parserstwice the same character
doubleChar = satisfy isAlpha <&> \c -> symbol c
arbitrary look aheadlookAhead
= symbol 'a' +> symbol 'b' <||> symbol 'a' +> symbol 'c' <||> star (satisfy isSpace) +> symbol 'a' <||> symbol 'x'
![Page 9: Layered Combinator Parsers with a Unique State](https://reader036.vdocument.in/reader036/viewer/2022062722/56813a21550346895da1fdac/html5/thumbnails/9.jpg)
Parser Combinators Pieter Koopman 9
properties of combinators
+ concise and clear parsers+ full power of fpl available+ context sensitive + arbitrary look-ahead+ can be efficient, continuations IFL '98
- no error handling (messages & recovery)
- no unique symbol tables- separate scanner yields problems
scan entire file before parser starts
![Page 10: Layered Combinator Parsers with a Unique State](https://reader036.vdocument.in/reader036/viewer/2022062722/56813a21550346895da1fdac/html5/thumbnails/10.jpg)
Parser Combinators Pieter Koopman 10
Requirements parse state with
error file notion of position user-defined extension e.g. symbol table
possibility to add separate scanner efficient implementation, continuations
for programming languages we want a single result (deterministic grammar)
![Page 11: Layered Combinator Parsers with a Unique State](https://reader036.vdocument.in/reader036/viewer/2022062722/56813a21550346895da1fdac/html5/thumbnails/11.jpg)
Parser Combinators Pieter Koopman 11
Uniqueness
files and windows that should be single-threaded are unique in Clean
fwritec :: Char *File -> *File
data-structures can be updated destructively when they are unique
only unique arrays can be changed
![Page 12: Layered Combinator Parsers with a Unique State](https://reader036.vdocument.in/reader036/viewer/2022062722/56813a21550346895da1fdac/html5/thumbnails/12.jpg)
Parser Combinators Pieter Koopman 12
System-architecture
replace the list of symbols by a structure containing actual input position error administration user defined part of the state
use a type constructor class to allow multiple levels
![Page 13: Layered Combinator Parsers with a Unique State](https://reader036.vdocument.in/reader036/viewer/2022062722/56813a21550346895da1fdac/html5/thumbnails/13.jpg)
Parser Combinators Pieter Koopman 13
Type constructor class
Reading a symbolclass PSread ps s st :: (*ps s *st)->(s, *ps s *st)
Copying the state is not allowed,use functions to manipulate the input
class PSsplit ps s st :: (s, *ps s *st)->(s, *ps s *st)
class PSback ps s st :: (s, *ps s *st)->(s, *ps s *st)
class PSclear ps s st :: (s, *ps s *st)->(s, *ps s *st)
Minimal parser state requires Clean 2.0class ParserState ps symbol state
| PSread, PSsplit, PSback, PSclear ps symbol state
![Page 14: Layered Combinator Parsers with a Unique State](https://reader036.vdocument.in/reader036/viewer/2022062722/56813a21550346895da1fdac/html5/thumbnails/14.jpg)
Parser Combinators Pieter Koopman 14
New parser combinators
Parsers have three arguments1. success-continuation
determines action upon successSuccCont :== Item failCont State -> (Result, State)
2. fail-continuation specifies what to do if parser failsFailCont :== State -> (Result, State)
3. current input stateState :== (Symbol, ParserState)
![Page 15: Layered Combinator Parsers with a Unique State](https://reader036.vdocument.in/reader036/viewer/2022062722/56813a21550346895da1fdac/html5/thumbnails/15.jpg)
Parser Combinators Pieter Koopman 15
New parser combinators 2 yield and fail, apply appropriate
continuationyield r = \succ fail tuple = succ r fail tuple
failComb = \succ fail tuple = fail tuple
sequence of parsers, change continuation<&> p1 p2 = \sc fc t -> p1 (\a _ -> p2 a sc fc) fc t
choice, change continuations(<|>) p1 p2
= \succ fail tuple = p1 (\r f t = succ r fail (PSclear t)) (\t2 = p2 succ fail (PSback t2)) (PSsplit tuple)
![Page 16: Layered Combinator Parsers with a Unique State](https://reader036.vdocument.in/reader036/viewer/2022062722/56813a21550346895da1fdac/html5/thumbnails/16.jpg)
Parser Combinators Pieter Koopman 16
string input a very simple instance of ParserState:: *StringInput symbol state = { si_string :: String // string holds input , si_pos :: Int // index of current char , si_hist :: [Int] // to remember old positions , si_state :: state // user-defined extension , si_error :: ErrorState }instance PSread StringInput Char statewhere PSread si=:{si_string,si_pos} = (si_string.[si_pos],{si & si_pos = si_pos+1})
instance PSsplit StringInput Char statewhere PSsplit (c,si=:{si_pos,si_hist}) = (c,{si & si_hist = [si_pos:si_hist]})
instance PSback StringInput Char statewhere PSback (_,si=:{si_string,si_hist=[h:t]}) = (si_string.[h-1],{si & si_pos = h, si_hist = t})
![Page 17: Layered Combinator Parsers with a Unique State](https://reader036.vdocument.in/reader036/viewer/2022062722/56813a21550346895da1fdac/html5/thumbnails/17.jpg)
Parser Combinators Pieter Koopman 17
Separate scanner and parser sometimes it is convenient to have a
separate scannere.g. to implement the offside rule
task of scanner and parser is similar.So, use the same combinators
due to the type constructor class we can nest parser states
![Page 18: Layered Combinator Parsers with a Unique State](https://reader036.vdocument.in/reader036/viewer/2022062722/56813a21550346895da1fdac/html5/thumbnails/18.jpg)
Parser Combinators Pieter Koopman 18
a simple scanner
use of combinators doesn’t change produces tokens (algebraic datatype)
scanner = skipSpace +> ( generateOffsideToken <|> satisfy isAlpha <:> star (satisfy isAlphanum) <@ testReserved o toString <|> plus (satisfy isDigit)<@ IntToken o to_number 0 <|> symbol '=' <@ K EqualToken <|> symbol '(' <@ K OpenToken <|> symbol ')' <@ K CloseToken )
![Page 19: Layered Combinator Parsers with a Unique State](https://reader036.vdocument.in/reader036/viewer/2022062722/56813a21550346895da1fdac/html5/thumbnails/19.jpg)
Parser Combinators Pieter Koopman 19
generating offside tokens use an ordinary parse functiongenerateOffsideToken
= pAcc getCol <&> \col -> // get current coloumn
pAcc getOffside <&> \os_col -> // get offside position
handleOS col os_col
where
handleOS col os_col
| EndGroupGenerated os_col
| col < os_col
= pApp popOffside (yield EndOfGroupToken)
= pApp ClearEndGroup failComb
| col <= os_col
= pApp SetEndGroup (yield EndOfDefToken)
= failComb
![Page 20: Layered Combinator Parsers with a Unique State](https://reader036.vdocument.in/reader036/viewer/2022062722/56813a21550346895da1fdac/html5/thumbnails/20.jpg)
Parser Combinators Pieter Koopman 20
Parser state for nesting parser state contains scanner and its state:: *NestedInput token state = E. .ps sym scanState: { ni_scanSt :: (ps sym scanState) , ni_scanner :: (ps sym scanState) -> *(token, ps sym scanState)) , ni_buffer :: [token] , ni_history :: [[token]] , ni_state :: state }
can be nested to any depth we can, but doesn’t have to, use this
![Page 21: Layered Combinator Parsers with a Unique State](https://reader036.vdocument.in/reader036/viewer/2022062722/56813a21550346895da1fdac/html5/thumbnails/21.jpg)
Parser Combinators Pieter Koopman 21
Parser state for nesting 2
NestedInput
*File
*ErrorState
*OffsideState
ScanState
scanner
*HashTable
![Page 22: Layered Combinator Parsers with a Unique State](https://reader036.vdocument.in/reader036/viewer/2022062722/56813a21550346895da1fdac/html5/thumbnails/22.jpg)
Parser Combinators Pieter Koopman 22
Parser state for nesting 3
apply scanner to read token
instance PSread NestedState token state
where
PSread ns=:{ns_scanner, ns_scanSt}
# (tok, state) = ns_scanner ns_scanSt
= (tok, {ns & ns_scanSt = state})
here, we ignored the buffer define instances for other functions in class ParserState
![Page 23: Layered Combinator Parsers with a Unique State](https://reader036.vdocument.in/reader036/viewer/2022062722/56813a21550346895da1fdac/html5/thumbnails/23.jpg)
Parser Combinators Pieter Koopman 23
error handling general error correction is difficult correct simple errors skip to new definition otherwise
Good error messages: location: position in file what are we parsing: stack of
contexts
Error [t.icl,20,[caseAlt,Expression]]: ) expected instead of =
![Page 24: Layered Combinator Parsers with a Unique State](https://reader036.vdocument.in/reader036/viewer/2022062722/56813a21550346895da1fdac/html5/thumbnails/24.jpg)
Parser Combinators Pieter Koopman 24
error handling 2 basic error generationparseError expected val = \succ fail (t,ps) = let msg = toString expected +++ " expected instead of " +++ toString t in succ val fail (PSerror msg (PSread ps))
useful primitiveswantSymbol sym = symbol sym <|> parseError sym sym
want p msg value = p <|> parseError msg value
skipToSymbol sym = symbol sym <|> parseError sym sym +> star (satisfy ((<>) sym)) +> symbol sym
![Page 25: Layered Combinator Parsers with a Unique State](https://reader036.vdocument.in/reader036/viewer/2022062722/56813a21550346895da1fdac/html5/thumbnails/25.jpg)
Parser Combinators Pieter Koopman 25
Parser
Parsing expressionspExpression = "Expression" ::> BV @> match mBasicValue <|> pIdentifier <|> symbol CaseToken +> pDeter Case @> pCompoundExpression <+ wantSymbol OfToken <+> star pCaseAlt <+ skipToSymbol EndOfGroupToken
<|> symbol OpenToken +> pCompoundExpression <+ wantSymbol CloseToken
![Page 26: Layered Combinator Parsers with a Unique State](https://reader036.vdocument.in/reader036/viewer/2022062722/56813a21550346895da1fdac/html5/thumbnails/26.jpg)
Parser Combinators Pieter Koopman 26
identifiers in hashtable
use a parse-function hashtable is user defined state in
ParserStatepIdentifier
= match mIdentToken
<&> \ident = pAccSt (putNameInHashTable ident)
<@ \name={app_symb=UnknownSymbol name, app_args=[]}
the function pAccSt applies a function to the user defined state
![Page 27: Layered Combinator Parsers with a Unique State](https://reader036.vdocument.in/reader036/viewer/2022062722/56813a21550346895da1fdac/html5/thumbnails/27.jpg)
Parser Combinators Pieter Koopman 27
limitations of this approach syntax specified by parse
functions grammar is not a datastructure no detection of left recursion
runtime error instead of nice message no automatic left-factoring
do it by hand, or runtime overhead
p1 = p <&> q1 <|> p <&> q2
p2 = p <&> (q1 <|> q2)
![Page 28: Layered Combinator Parsers with a Unique State](https://reader036.vdocument.in/reader036/viewer/2022062722/56813a21550346895da1fdac/html5/thumbnails/28.jpg)
Parser Combinators Pieter Koopman 28
discussiondiscussion old advantages
concise, fpl-power, arbitrary look ahead, context sensitve
new advantages unique and extendable parser state one or more layers decent error handling,
simple error correction can be added still efficient, overhead < 2 non-determinism only when needed