dev8d 2011 - pipe2py
DESCRIPTION
An introduction to pipe2py, a Yahoo Pipes to Python compiler.
TRANSCRIPT
Introducing Pipe2Py: Converting Yahoo Pipes to Python Code
Original code: Greg Gaughan
Additional development: Tuukka Hastrup
Based on an original idea by: Tony Hirst, Dept of Communication and Systems, The Open University
pipes.yahoo.com
But what happens if Yahoo Pipes dies?
Pipe2Py
github.com/ggaughan/pipe2py
Yahoo pipelines are translated into pipelines of Python generators* to give a close match to the original data flow.
* based on ideas by David Beazley http://www.dabeaz.com/generators-uk
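A minimal sketch of the generator-pipeline idea in plain Python, assuming nothing from pipe2py itself. Each stage is a generator function that receives the upstream generator as `_INPUT`, in the same `(context, _INPUT, conf)` call shape that pipe2py's compiled pipes use, and stages are chained by passing one generator into the next. The stage names (`source_stage`, `filter_stage`, `output_stage`) and the in-memory items are hypothetical stand-ins for Yahoo modules such as a feed fetch, a filter and the pipe output.

```python
# Hypothetical stand-ins for Yahoo Pipes modules, each written as a generator
# that takes the upstream generator as _INPUT.

def source_stage(context, _INPUT, conf):
    """Emit a fixed list of items (stands in for fetching a feed)."""
    for item in conf["items"]:
        yield item


def filter_stage(context, _INPUT, conf):
    """Pass on only the items whose title contains conf['keyword']."""
    for item in _INPUT:
        if conf["keyword"] in item["title"]:
            yield item


def output_stage(context, _INPUT, conf):
    """Final stage: hand each item on unchanged."""
    for item in _INPUT:
        yield item


def build_pipeline(context=None):
    # Wiring the stages together runs no stage code; work happens on iteration.
    items = [{"title": "pipe2py at dev8d"}, {"title": "lunch"},
             {"title": "pipe2py demo"}]
    src = source_stage(context, None, conf={"items": items})
    flt = filter_stage(context, src, conf={"keyword": "pipe2py"})
    return output_stage(context, flt, conf={})


for item in build_pipeline():
    print(item["title"])  # prints "pipe2py at dev8d", then "pipe2py demo"
```

Because every stage only sees an iterator, any stage can be swapped or inserted without the others knowing, which is what makes a one-module-per-stage layout natural.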
Each Yahoo module is coded as a separate Python module.
So you can use Yahoo Pipes as a graphical rapid prototyping application, and then generate equivalent Python code that you can host yourself.
So what?
download code: http://github.com/ggaughan/pipe2py
to dev8d/pipes/pipe2py
set path: export PYTHONPATH=dev8d/pipes
installation
simplejson*: sudo easy_install simplejson
dependencies
* only needed for Python versions before 2.6
test directory: python testbasics.py
unit tests
python compile.py -p pipelineid
compilation - direct from Yahoo Pipes
generates pipe_pipelineid.py
python compile.py pipelinefile.json
compilation - from a file
generates pipelinefile.py
python pipe_pipelineid.py
command line execution
runs pipe_pipelineid.py
from pipe2py import Context
from pipe2py.modules import *

def pipe_404411a8d22104920f3fc1f428f33642(context, _INPUT, conf=None, **kwargs):
    "Pipeline"
    if conf is None:
        conf = {}

    forever = pipeforever.pipe_forever(context, None, conf=None)
    sw_502 = pipefetch.pipe_fetch(context, forever, conf={u'URL': {u'type': u'url', u'value': u'http://blog.ouseful.info/feed'}})
    _OUTPUT = pipeoutput.pipe_output(context, sw_502, conf={})
    return _OUTPUT
compiled code of the form...
Each call to the final generator ripples back through the pipeline, issuing .next() calls on each previous generator in turn; iteration continues until the source is exhausted.
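A small, self-contained illustration of this pull-based flow, using plain Python 3 generators with hypothetical stage names rather than pipe2py code (in Python 3 the `.next()` method is invoked through the built-in `next()`). A log records when each stage touches each item, showing that one pull on the last generator drives exactly one item through the chain.

```python
# Records the order in which each stage touches each item.
log = []


def numbers(n):
    """Source stage: yields 0, 1, ..., n-1, logging each one."""
    for i in range(n):
        log.append("source yields %d" % i)
        yield i


def double(_INPUT):
    """Downstream stage: doubles each item it pulls from _INPUT."""
    for x in _INPUT:
        log.append("double gets %d" % x)
        yield x * 2


pipeline = double(numbers(2))  # building the chain runs no stage code yet
assert log == []

first = next(pipeline)  # one pull ripples back to the source for one item
rest = list(pipeline)   # keep pulling until the source is exhausted
print(first, rest)      # 0 [2]
print(log)              # source/double entries alternate, item by item
```

The interleaved log (source 0, double 0, source 1, double 1) is the "one item at a time" behaviour: no stage ever holds the whole data set.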
Each item is typically passed through the whole pipeline one at a time, so:
- memory usage is kept to a minimum
- no module waits for an earlier module to finish processing the whole data set
- by adding queues between the modules, they could easily be made to run in parallel, each on a different CPU, for better scalability
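A minimal sketch of the queue idea, assuming standard-library threads and `queue.Queue` (nothing here is provided by pipe2py): each stage runs in its own thread, reads from an input queue and writes to an output queue, and a sentinel marks the end of the stream. In CPython, threads only overlap usefully for I/O-bound stages such as fetching feeds; actually spreading CPU-bound stages across several CPUs would need processes (for example `multiprocessing`) instead.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def feed(items, outq):
    """Source thread: put every item on the first queue, then the sentinel."""
    for item in items:
        outq.put(item)
    outq.put(_DONE)


def run_stage(func, inq, outq):
    """Apply func to every item from inq and put the results on outq."""
    while True:
        item = inq.get()
        if item is _DONE:
            outq.put(_DONE)  # propagate end-of-stream downstream
            return
        outq.put(func(item))


def run_parallel_pipeline(items, funcs, maxsize=10):
    # One bounded queue between each pair of adjacent stages, so a fast
    # stage cannot run arbitrarily far ahead of a slow one.
    queues = [queue.Queue(maxsize=maxsize) for _ in range(len(funcs) + 1)]
    workers = [threading.Thread(target=feed, args=(items, queues[0]), daemon=True)]
    workers += [
        threading.Thread(target=run_stage, args=(f, queues[i], queues[i + 1]), daemon=True)
        for i, f in enumerate(funcs)
    ]
    for w in workers:
        w.start()
    # Consume the last queue lazily, like the final generator of a pipe.
    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        yield item
    for w in workers:
        w.join()


results = list(run_parallel_pipeline(range(5), [lambda x: x + 1, lambda x: x * 10]))
print(results)  # [10, 20, 30, 40, 50]
```

Each stage is a single thread reading a FIFO queue, so item order is preserved; the price is one thread and one queue per stage.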
from pipe2py import Context
import pipe_9dc8014dcfd34c834a960321afde68d9 as p

C = Context()
r = p.pipe_9dc8014dcfd34c834a960321afde68d9(C, None)
for i in r:
    print i
    print i['title']
usage - compiled pipe
from pipe2py.compile import parse_and_build_pipe
from pipe2py import Context
pipe_def = """json representation of the pipe"""
p = parse_and_build_pipe(Context(), pipe_def)
for i in p: print i
usage - interpreted pipe
context = Context(describe_input=True)
p = pipe_ac45e9eb9b0174a4e53f23c4c9903c3f(context, None)
need_inputs = p
print need_inputs
>>> [(u'0', u'username', u'Twitter username', u'text', u''),... (u'1', u'statustitle', u'Status title [string] or [logo] means twitter icon', u'text', u'logo')]
''' That is, tuples of the form (position, name, prompt, type, default)'''
usage - user inputs #1: identifying console prompts
C = Context(inputs={'username':'greg', 'statustitle':'logo'}, console=False)
p = pipe_ac45e9eb9b0174a4e53f23c4c9903c3f(C, None)
for i in p:
    print i
usage - user inputs #2: avoiding console prompts
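The two steps above can be joined by a small helper, sketched here as a hypothetical function (`inputs_from_description` is not part of pipe2py). It takes the `(position, name, prompt, type, default)` tuples reported with `describe_input=True`, starts from each input's default, applies any overrides, and returns a dict of the kind passed as `inputs=` to `Context`. The sample description copies the one shown above.

```python
def inputs_from_description(described, overrides=None):
    """Build an inputs dict from (position, name, prompt, type, default) tuples.

    Hypothetical helper: each input starts at its described default, and any
    name present in `overrides` replaces that default.
    """
    overrides = overrides or {}
    inputs = {}
    for _position, name, _prompt, _type, default in described:
        inputs[name] = overrides.get(name, default)
    # Catch typos: an override for a name the pipe never asks for.
    unknown = set(overrides) - set(inputs)
    if unknown:
        raise ValueError("not inputs of this pipe: %s" % ", ".join(sorted(unknown)))
    return inputs


described = [
    (u'0', u'username', u'Twitter username', u'text', u''),
    (u'1', u'statustitle', u'Status title [string] or [logo] means twitter icon', u'text', u'logo'),
]
inputs = inputs_from_description(described, {'username': 'greg'})
print(inputs)  # {'username': 'greg', 'statustitle': 'logo'}
```

The resulting dict could then be handed to `Context(inputs=inputs, console=False)` so that the pipe runs without console prompts.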
Yahoo Pipes modules: Pipe2Py implementation progress
;-)
One more thing...
pipes-engine.appspot.com
pipe2py hosting on Google App Engine
- generate working test pipes of increasing complexity
- generate test pipes that don't work
- commit pipe2py patches for test pipes that don't work
How can you help?
- simplify installation (easy_install?)
- identify a good convention for integrating pipe2py compiled pipes in arbitrary code
- identify a good convention for inserting arbitrary Python functions into, or in between, compiled pipe2py pipelines
How else can you help?
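One possible convention for that last point, offered only as a hypothetical sketch: wrap an ordinary per-item Python function as a generator with the same `(context, _INPUT, conf)` call shape that compiled pipes use, so it can be spliced between two generated stages. `as_stage` and `add_length` are illustrative names, not pipe2py APIs.

```python
import functools


def as_stage(func):
    """Turn func(item, conf) -> item into a pipeline stage generator."""
    @functools.wraps(func)
    def stage(context, _INPUT, conf=None, **kwargs):
        conf = conf or {}
        for item in _INPUT:
            yield func(item, conf)
    return stage


@as_stage
def add_length(item, conf):
    # Example per-item function: copy the item and record its title length.
    item = dict(item)
    item[conf.get("field", "title_length")] = len(item["title"])
    return item


upstream = iter([{"title": "pipe2py"}, {"title": "dev8d"}])
for item in add_length(None, upstream, conf={}):
    print(item)  # {'title': 'pipe2py', 'title_length': 7}, then dev8d with 5
```

In the compiled example earlier, such a stage could take `sw_502` as its `_INPUT`, with its output passed on to `pipeoutput.pipe_output`.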
the next step: produce an open-source visual front-end editor?
WireIt? Pypes?
Anything else?
generate a ready-to-run Google App Engine configuration based around a compiled pipe?
Anything more?