Designing and Implementing Web Data Services in Perl
Michael McClennen
[Diagram: Client, Server, and DataStore, connected by Request and Response]
What is "REST" ?
• REST is a set of architectural principles for the World Wide Web
• Developed by Roy Fielding, one of the Web's principal architects
• Stands for "REpresentational State Transfer"
• No consensus about exactly what it means in practice
REST: original principles
• Separation of client and server by a uniform interface
• Intermediate servers (i.e. proxies or caches) may be interposed arbitrarily
• All client-server interactions are stateless
• Data is composed of resources, each identified by a URI
• Server sends a representation of a resource
• Clients can manipulate the resource by means of the representation
• Representations are self-describing
• Client state transitions depend upon information embedded in representations (HATEOAS)
REST: in practice
1. One protocol layer, generally HTTP
   – no extra layers (such as SOAP) on top of it
   – headers and status codes are used as designed
2. Resources are identified by URIs
   – individual resources
   – all resources matching particular criteria
3. Client-server interactions are stateless
   – with the possible exception of authentication
[Diagram: the Client and the Web Data Service (API) on the Server exchange HTTP Requests and HTTP Responses; the service sends a Query or Operation to the DataStore and receives a Result]
Web Data Service (API)
• Parse HTTP requests
• Validate parameters
• Talk to the backend data store
• Assemble representations of data
• Serialize representations in JSON, XML, …
• Set HTTP response headers
• Generate appropriate error messages
• Provide documentation about itself
What makes a good Web Data Service,
from the point of view of the USER?
Well designed
Well documented
Flexible
Consistent
Responsive
Example: Wikipedia API
http://en.wikipedia.org/w/api.php?action=query&list=allpages&apfrom=Perl&aplimit=50&format=json
“List 50 pages whose title starts with ‘Perl’, in JSON format”
Example: Wikipedia API
http://en.wikipedia.org/w/api.php?    Base URL
action=query                          Specify type of operation
list=allpages                         Specify operation
apfrom=Perl                           Query parameter
aplimit=50                            Specify size of result set
format=json                           Specify result format
Example: Wikipedia API
http://en.wikipedia.org/w/api.php?    Base URL
action=query                          Specify type of operation
list=allpages                         Specify operation
apfrom=Perl                           Query parameter
aplimit=50                            Specify size of result set
format=xml                            Specify result format
Example: Wikipedia API
http://en.wikipedia.org/w/api.php?    Base URL
action=query                          Specify type of operation
list=allpages                         Specify operation
apfrom=Perl                           Query parameter
aplimit=5                             Specify size of result set
format=xml                            Specify result format
Example: Wikipedia API
http://en.wikipedia.org/w/api.php?    Base URL
action=query                          Specify type of operation
list=foobar                           Specify operation
apfrom=Perl                           Query parameter
aplimit=5                             Specify size of result set
format=xml                            Specify result format
Example: Wikipedia API
http://en.wikipedia.org/w/api.php?    Base URL
action=query                          Specify type of operation
list=foobar                           Specify operation
apfrom=Perl                           Query parameter
aplimit=5                             Specify size of result set
format=json                           Specify result format
Example: Wikipedia API
http://en.wikipedia.org/w/api.php?    Base URL
action=query                          Specify type of operation
list=allpages                         Specify operation
apfrom=Perl                           Query parameter
aplimit=5                             Specify size of result set
format=json                           Specify result format
foo=bar                               * bad parameter *
Example: Wikipedia API
http://en.wikipedia.org/w/api.php     Base URL only
Example: Google Feed API
https://ajax.googleapis.com/ajax/services/feed/find?v=1.0&q=Perl
“List all feeds whose title contains ‘Perl’”
Example: Google Feed API
https://ajax.googleapis.com/ajax/services/    Base URL
feed/find?                                    Specify operation
q=Perl                                        Query parameter
v=1.0                                         Protocol version
Example: Google Feed API
https://ajax.googleapis.com/ajax/services/feed/load?v=1.0&q=http://www.perl.com/pub/atom.xml&num=10
“Show the most recent 10 entries from the feed http://www.perl.com/pub/atom.xml”
Example: Google Feed API
https://ajax.googleapis.com/ajax/services/    Base URL
feed/load?                                    Specify operation
q=http://www.perl.com/pub/atom.xml            Query parameter
v=1.0                                         Protocol version
num=10                                        Size of result set
Example: Google Feed API
https://ajax.googleapis.com/ajax/services/    Base URL
feed/load?                                    Specify operation
q=http://www.perl.com/pub/atom.xml            Query parameter
v=1.0                                         Protocol version
num=NOMNOMNOM                                 * bad value *
Example: Google Feed API
https://ajax.googleapis.com/ajax/services/    Base URL
feed/load?                                    Specify operation
q=http://www.perl.com/pub/atom.xml            Query parameter
v=1.0                                         Protocol version
numm=10                                       * bad parameter *
Example: Google Feed API
https://ajax.googleapis.com/ajax/services/    Base URL
feed/load?                                    Specify operation
q=http://www.perl.com/pub/atom.xml            Query parameter
                                              * missing version *
Example: Google Feed API
https://ajax.googleapis.com/ajax/services/    Base URL only
Example: Google Feed API
Documentation is at:
http://developers.google.com/feed/v1/jsondevguide
What makes a good Web Data Service CODEBASE,
from the point of view of the programmer?
Easy to implement
Easy to document
Easy to maintain
Low overhead
Basic data service procedure
1. Parse URL
2. Determine operation and result format
3. Validate and clean the parameter values
4. Get data from the backend (using the parameter values)
5. Serialize the data in the selected format
6. Set HTTP response headers appropriately
7. If anything goes wrong, generate an error response
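The seven steps above can be sketched in plain Perl. This is only an illustration, not how Web::DataService does it: the `/pages.json?limit=N` URL layout, the `@PAGES` stand-in for the data store, and the `fail` and `handle_request` helpers are all assumptions made for the example.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical in-memory stand-in for the backend data store.
my @PAGES = map { +{ id => 0 + $_, title => "Perl page $_" } } 1 .. 100;

# Throw a structured error that is later turned into an HTTP response.
sub fail { my ($code, $msg) = @_; die +{ code => $code, msg => $msg } }

# Handle one request URL; return (status, content_type, body).
sub handle_request {
    my ($url) = @_;
    my @response = eval {
        # 1-2. Parse the URL; determine the operation and the result format.
        my ($op, $format, $query) = $url =~ m{^/(\w+)\.(\w+)(?:\?(.*))?$};
        fail(404, 'Not found') unless defined $op && $op eq 'pages';
        fail(415, "Unsupported format '$format'") unless $format =~ /^(?:json|txt)$/;

        # 3. Validate and clean the parameter values.
        my %param = map { my ($k, $v) = split /=/, $_, 2; ($k => $v // '') }
                    grep { length } split /&/, $query // '';
        my $limit = delete $param{limit} // 10;
        fail(400, "Bad value for 'limit'") unless $limit =~ /^[1-9]\d*$/;
        fail(400, "Unknown parameter '$_'") for sort keys %param;

        # 4. Get data from the backend, using the cleaned values.
        my $last    = $limit > @PAGES ? $#PAGES : $limit - 1;
        my @records = @PAGES[0 .. $last];

        # 5. Serialize the data in the selected format.
        my $body = $format eq 'json'
            ? JSON::PP->new->canonical->encode(\@records)
            : join '', map { $_->{id} . "\t" . $_->{title} . "\n" } @records;

        # 6. Choose the Content-Type header to match the format.
        my $type = $format eq 'json' ? 'application/json' : 'text/plain';
        (200, $type, $body);
    };
    my $err = $@;
    return @response if @response;

    # 7. If anything went wrong, generate an error response.
    return ref $err eq 'HASH'
        ? ($err->{code}, 'text/plain', "$err->{msg}\n")
        : (500, 'text/plain', "Server error\n");
}
```

Calling `handle_request('/pages.json?limit=2')` returns a 200 status with two records; a bad value, an unknown parameter, or an unknown format each produce a 4xx status instead.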
Introducing Web::DataService
• On CPAN as Web::DataService
• Built on top of Dancer
• You define operations, parameter rules, output blocks, and it handles the rest
• Complete enough for real use
• Documentation still incomplete
• Needs collaborators, testers, users
Important early decisions
1. Which framework to use
2. How to validate parameter values
3. How to organize your parameter space
4. How to handle output formats
5. How to implement the response procedure
6. How to handle versioning
7. How to report errors
8. How to handle documentation
Decisions that can wait
• Which HTTP server to use
• Which backend framework to use
• Strategies for caching and other performance enhancements
Plan for these from the start:
• Multiple output formats
• Multiple output vocabularies
• Multiple protocol versions
• Auto-generated documentation
Decision 1: which framework?
• Dancer 1
• Dancer 2
• Mojolicious
• Web::DataService
Decision 2: parameter values
• How will the parameter values be validated and cleaned?
• Recommendation: use HTTP::Validate
define_ruleset('1.1:taxa:specifier' =>
    { param => 'name', valid => \&TaxonData::validNameSpec, alias => 'taxon_name' },
        "Return information about the most fundamental taxonomic name",
        "matching this string. The C<%> and C<_> characters may be used",
        "as wildcards.",
    { param => 'id', valid => POS_VALUE, alias => 'taxon_id' },
        "Return information about the taxonomic name corresponding to this",
        "identifier.",
    { at_most_one => ['name', 'id'] },
        "You may not specify both C<name> and C<id> in the same query.");
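To show what such a ruleset has to do at request time, here is a minimal plain-Perl sketch of the same idea: one validator per parameter, plus an "at most one of" constraint. It does not use HTTP::Validate; the `%RULES` table, the `check_rules` function, and the name pattern are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical rule table: parameter name => validator that returns the
# cleaned value, or undef if the value is not acceptable.
my %RULES = (
    name => sub { my ($v) = @_; $v =~ /^[\w\s%.]+$/ ? $v     : undef },
    id   => sub { my ($v) = @_; $v =~ /^[1-9]\d*$/  ? $v + 0 : undef },
);

# Parameters of which at most one may appear in a single query.
my @AT_MOST_ONE = ('name', 'id');

# Check a hash of raw parameter values; return (\%clean, \@errors).
sub check_rules {
    my (%raw) = @_;
    my (%clean, @errors);
    for my $key (sort keys %raw) {
        my $rule = $RULES{$key};
        if (!$rule) {
            push @errors, "unknown parameter '$key'";
            next;
        }
        my $value = $rule->($raw{$key});
        if (defined $value) { $clean{$key} = $value }
        else                { push @errors, "bad value for '$key'" }
    }
    push @errors, "you may not specify both 'name' and 'id'"
        if (grep { exists $raw{$_} } @AT_MOST_ONE) > 1;
    return (\%clean, \@errors);
}
```

Keeping the rules in a table, rather than in per-operation code, is what makes it possible to generate parameter documentation from the same definitions.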
Decision 3: parameter space
• How will users specify which operation to do?
  – http://exmpl.com/service/some/thing ? …
  – http://exmpl.com/service ? op=something & …
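The two layouts differ only in where the operation name is read from. A small sketch, assuming a service mounted at a hypothetical `/service` prefix and an `op` parameter as in the URLs above (both function names are made up for the example):

```perl
use strict;
use warnings;

# Layout 1: the operation is the path that follows the service prefix.
sub operation_from_path {
    my ($url, $base) = @_;
    my ($path) = $url =~ m{^\Q$base\E/([^?]+)};
    return $path;
}

# Layout 2: the operation is the value of the "op" query parameter.
sub operation_from_param {
    my ($url) = @_;
    my ($query) = $url =~ /\?(.*)$/s;
    for my $pair (split /&/, $query // '') {
        my ($key, $value) = split /=/, $pair, 2;
        return $value if defined $key && $key eq 'op';
    }
    return undef;
}
```

Whichever layout is chosen, the rest of the service only needs the resulting operation name, so the choice can be isolated in one function like these.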
Decision 4: output formats
• How will users specify the output format?
  – http://exmpl.com/service/something.json ? …
  – http://exmpl.com/service ? … & format=json …
• Recommendation: separate the definition of output fields from output formats
$ds->define_block('1.1:taxa:basic' =>
    { output => 'taxon_no', dwc_name => 'taxonID', com_name => 'oid' },
        "A positive integer that uniquely identifies this taxonomic name",
    { output => 'record_type', com_name => 'typ', com_value => 'txn',
      dwc_value => 'Taxon', value => 'taxon' },
        "The type of this record. By vocabulary:", "=over",
        "=item pbdb", "taxon", "=item com", "txn", "=item dwc", "Taxon", "=back",
    { set => 'rank', if_vocab => 'pbdb,dwc', lookup => \%RANK_STRING },
    { output => 'rank', dwc_name => 'taxonRank', com_name => 'rnk' },
        "The rank of this taxon, ranging from subspecies up to kingdom",
    { output => 'taxon_name', dwc_name => 'scientificName', com_name => 'nam' },
        "The scientific name of this taxon",
    { output => 'common_name', dwc_name => 'vernacularName', com_name => 'nm2' },
        "The common (vernacular) name of this taxon, if any",
    { set => 'attribution', if_field => 'a_al1', from_record => 1,
      code => \&generateAttribution },
    … );
• Web::DataService provides:
  – Web::DataService::Plugin::JSON.pm
  – Web::DataService::Plugin::XML.pm
  – Web::DataService::Plugin::Text.pm
  – you can add your own
• Output is delegated to the appropriate module based on the selected format
Decision 5: procedure
• How will you handle the basic request-response procedure?
• Recommendation: specify a set of attributes for each operation, and use a single body of code to handle operation execution
$ds->define_path({ path => 'taxa',
                   class => 'TaxonData',
                   output => '1.1:taxa:basic',
                   doc_title => 'Taxonomic names' });

$ds->define_path({ path => 'taxa/single',
                   allow_format => 'json,csv,tsv,txt,xml',
                   allow_vocab => 'com,pbdb,dwc',
                   method => 'get',
                   doc_title => 'Single taxon' });

$ds->define_path({ path => 'taxa/list',
                   allow_format => 'json,csv,tsv,txt,xml',
                   allow_vocab => 'com,pbdb,dwc',
                   method => 'list',
                   doc_title => 'Lists of taxa' });
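A rough sketch of "attributes per operation, one body of code" follows. It assumes that a child path such as `taxa/single` inherits attributes it does not set from its parent `taxa`, which the definitions above suggest but do not state. The registry, `define_path_sketch`, `path_attr`, `execute`, and the `TaxonDataSketch` class are all hypothetical and are not the Web::DataService implementation.

```perl
use strict;
use warnings;

# Hypothetical operation class with one method per operation.
package TaxonDataSketch {
    sub get  { my ($class, %p) = @_; return "taxon $p{id}" }
    sub list { my ($class, %p) = @_; return "taxa matching $p{name}" }
}

my %PATH;    # hypothetical registry: path => attribute hash

sub define_path_sketch { my ($attrs) = @_; $PATH{ $attrs->{path} } = $attrs }

# Look up an attribute on a path, falling back to each parent path in turn.
sub path_attr {
    my ($path, $name) = @_;
    while (1) {
        my $attrs = $PATH{$path};
        return $attrs->{$name} if $attrs && defined $attrs->{$name};
        $path =~ s{/?[^/]+$}{} or return undef;
        return undef if $path eq '';
    }
}

# A single body of code executes every operation, driven only by attributes.
sub execute {
    my ($path, $format, %params) = @_;
    my $class  = path_attr($path, 'class');
    my $method = path_attr($path, 'method');
    return (404, 'no such operation')
        unless $class && $method && $class->can($method);
    my $allowed = path_attr($path, 'allow_format') // '';
    return (415, "format '$format' not allowed")
        unless grep { $_ eq $format } split /,/, $allowed;
    return (200, $class->$method(%params));
}

define_path_sketch({ path => 'taxa', class => 'TaxonDataSketch',
                     output => '1.1:taxa:basic', allow_format => 'json,txt' });
define_path_sketch({ path => 'taxa/single', method => 'get' });
define_path_sketch({ path => 'taxa/list',   method => 'list' });
```

Adding an operation then means adding one more attribute hash and one more method; `execute` itself does not change.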
Decision 6: versioning
• How will users specify which protocol version?
  – http://exmpl.com/service/some/thing ? … & v=1.0
  – http://exmpl.com/service1.0/some/thing ? …
• Recommendation: make your users specify a version from the very beginning
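Requiring a version from the start can be enforced at the very top of request handling. A minimal sketch for the second URL layout, assuming a hypothetical `/service` prefix and a made-up set of supported versions:

```perl
use strict;
use warnings;

# Hypothetical set of protocol versions this service still answers.
my %SUPPORTED = map { $_ => 1 } ('1.0', '1.1');

# Split "/service1.1/some/thing" into a version and an operation path.
# Returns (200, version, path) on success, or (status, message) on failure.
sub split_version {
    my ($path) = @_;
    my ($version, $rest) = $path =~ m{^/service(\d+\.\d+)/(.*)$}
        or return (400, 'You must specify a protocol version');
    return (404, "Unknown version $version") unless $SUPPORTED{$version};
    return (200, $version, $rest);
}
```

Because a request without a version is rejected rather than given a default, old clients keep working when a new version is added later.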
Decision 7: error reporting
• Recommendation: report errors in JSON if that format was selected
• Recommendation: use the HTTP status codes
  – 400 Bad request
  – 404 Not found
  – 415 Unsupported media type
  – 500 Server error
• Recommendation: if your code throws an exception, report a generic message
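These three recommendations fit in one small function. The sketch below assumes that deliberate errors are thrown as hash references with `code` and `msg` keys and that anything else is an unexpected exception. The `error_response` name and the shape of the JSON body are illustrative choices, not a standard.

```perl
use strict;
use warnings;
use JSON::PP;

my %STATUS_TEXT = (
    400 => 'Bad Request',
    404 => 'Not Found',
    415 => 'Unsupported Media Type',
    500 => 'Internal Server Error',
);

# Turn an error into (status, content_type, body) in the requested format.
sub error_response {
    my ($format, $err) = @_;

    # Deliberate errors carry their own code and message. Anything else is
    # an exception, so report a generic message and never leak its text.
    my ($code, $msg) = ref $err eq 'HASH'
        ? ($err->{code}, $err->{msg})
        : (500, 'A server error occurred');
    my $reason = $STATUS_TEXT{$code} // 'Error';

    # Report the error in JSON if that is the format the client selected.
    if ($format eq 'json') {
        my $body = JSON::PP->new->canonical->encode(
            { status => "$code $reason", errors => [$msg] });
        return ($code, 'application/json', $body);
    }
    return ($code, 'text/plain', "$code $reason\n$msg\n");
}
```

A real service would also write the original exception text to its own log before discarding it.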
Decision 8: documentation
• Recommendation: auto-generate documentation as much as possible
• Recommendation: a request using the base URL with no parameters should return the main documentation page
Other recommendations
• Recommendation: know the HTTP protocol
  – Status codes (400, 404, 500, 301, etc.)
  – CORS ("Access-Control-Allow-Origin")
  – Cache-Control
  – Content-Type
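As a small illustration of the headers listed above, this sketch builds the header set for a successful response. The media types, the wide-open CORS policy, and the `response_headers` function are assumptions chosen for the example; a real service would pick values to suit its data and clients.

```perl
use strict;
use warnings;

# Assumed mapping from output format to media type.
my %CONTENT_TYPE = (
    json => 'application/json; charset=utf-8',
    xml  => 'text/xml; charset=utf-8',
    txt  => 'text/plain; charset=utf-8',
);

# Return a list of [name, value] header pairs for a successful response.
sub response_headers {
    my ($format, $max_age) = @_;
    return (
        [ 'Content-Type'                => $CONTENT_TYPE{$format} ],
        [ 'Access-Control-Allow-Origin' => '*' ],    # allow any origin to read
        [ 'Cache-Control'               => "public, max-age=$max_age" ],
    );
}
```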
Final example
• The Paleobiology Database Navigator
  – http://paleobiodb.org/navigator
• Based on the Paleobiology Database API
  – http://paleobiodb.org/data1.1/
Call for collaboration
• Please let me know if you are interested in:
  – Using Web::DataService
  – Testing Web::DataService
  – Helping to further develop Web::DataService