if you have the content, then apache has the technology! a whistle-stop tour of the apache content...

57
If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

Upload: jeffery-miller

Post on 25-Dec-2015

217 views

Category:

Documents


5 download

TRANSCRIPT

Page 1: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

If you have the Content,then Apache has the

Technology!

A whistle-stop tour of the

Apache content related projects

Page 2: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

Nick Burch

Software EngineerAlfresco

Page 3: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

Apache Projects

• 79 Top Level Projects• 40 Incubating Projects

• 30 “Content Related” Main Projects• 7 “Content Related” Incubating

Projects

Page 4: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

37 Projects in 50 minutes

With time for questions...

This is not a comprehensive guide!

Page 5: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

Different Technologies

• Serving• Storing• Transforming• Generating• Hosting• Web Framework Rendering /

Templating / etc

Page 6: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

What can we get in 50 mins?

• A quick overview of each project• When talks on the project are

happening• When meetups on the project are

happening• Anything new/exciting about the

project?• What interests me in the project!

Page 7: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

Serving upyour Content

Page 8: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

Apache HTTPD Serverhttp://httpd.apache.org/

• Talks – All day WednesdayMeetup – Thursday evening

• Very wide range of features• (Fairly) easy to extend• Can host most programming

languages• Can front most content systems• Can proxy your content applications• Can host code and content

Page 9: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

Apache TrafficServerhttp://trafficserver.apache.org/

• High performance web proxy• Forward and reverse proxy• Ideally suited to sitting between your

content application and the internet• For proxy-only use cases, will probably

be better than httpd• Fewer other features though• Often used as a cloud-edge http router

Page 10: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

Apache Tomcathttp://tomcat.apache.org/

• Talks – All day Friday!• Java based, as many of the Apache

Content Technologies are• Java Servlet Container• And you probably all know the rest!

Page 11: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

Tomcat – What's Newhttp://tomcat.apache.org/

• Memory leak detection – for your applications, and for the JVM!

• Easier to embed – no need for large numbers of config files!

• Asynchronous request processing for things like Comet / Bayeux

• Servlet 3.0• Improved JMX configurability

Page 12: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

Storing allthat Content

Page 13: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

Apache Cassandrahttp://cassandra.apache.org/

• Talk - 11am WednesdayMeetup - Wednesday evening

• One of our many NoSQL Databases• Column-Family store• Eventually consistent• Distributed, replicating, no SPF• Can elastically add machines

Page 14: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

Apache CouchDBhttp://couchdb.apache.org/

• 12pm Wednesday• Relax!• Erlang• NoSQL• Document orientated distributed store• Eventually consistent if replicating• Map-Reduce queries

Page 15: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

Apache HBasehttp://hbase.apache.org/

• 2pm Wednesday• Recently graduated from Hadoop• Another NoSQL Database• Column-Family store, modelled on

Google's Big Table paper• Some transactions and locking• Fast range queries and sorting• Built on HDFS

Page 16: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

Which Apache NoSQL?

• Do you have tuples, documents, variable key/values or complex object?

• Must data always be consistent?• If you loose a chunk of machines

(partition), should read/write still work?• Query by id, range, arbitrary key/value

or map-reduce function?• How much human interaction is

required to add or remove nodes?

Page 17: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

Apache DB: Derbyhttp://db.apache.org/derby/

• Small, easy to embed SQL database• Can be embedded and accessed via

an embedded JDBC driver• Can be accessed over the network• Can be run entirely in-memory• Efficient on-disk format• Has a JavaME version – run it on basic

cell phones!

Page 18: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

Apache Directoryhttp://directory.apache.org/

• LDAP Directory• Optimised for many reads per write• Hierarchical, class/attribute based

storage• Triggers, stored procedures, queries

and views• Multi-master replication• Rich permissions model built in

Page 19: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

Apache JackRabbithttp://jackrabbit.apache.org/

• 1.30pm Thursday• JCR (Java Content Repository)• Hierarchical content store• Supports structured and unstructured

data• Transactional• Support versions• Full text search built in

Page 20: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

Apache Lucenehttp://lucene.apache.org/

• All day Friday + Meetup Tuesday night• Inverted index store• (Each term lists it documents, rather

than each document listing terms)• Searching is faster than adding• Normally stores text, but additional

data can be associated with it• Can hold indexed and un-indexed data

Page 21: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

Lucene – What's New?http://lucene.apache.org/

• Lucene and SOLR have merged• Near real-time support when indexing• Better storing of attributes and other

data in the token stream• Numeric fields improved – no need to

externally process numbers into range buckets yourself

• Fast vector highlighter for large docs

Page 22: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

Apache Subversionhttp://subversion.apache.org/

• Meetup Thursday evening• Versioning content store• Efficient at storing changes• Normally stores code, text and the odd

binary blob• If you have textual data and you want

a versioning store, it's a good fit!• Used by the new Apache CMS

Page 23: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

Apache Xindicehttp://xml.apache.org/xindice/

• Native XML Database• No need to map your complex XML

files to a different data structure• Ideally suited to problems where you

have large numbers of XML files, and little / no other content

• Schema independent model• XPath queries

Page 24: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

Transforming andReading Content

Page 25: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

Apache PDFBoxhttp://pdfbox.apache.org/

• 4pm Wednesday• Read, Write, Create and Edit PDFs• Create PDFs from text• Fill in PDF forms• Extract text and formatting (Lucene,

Tika etc)• Edit existing files, add images, add text

etc

Page 26: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

Apache POIhttp://poi.apache.org/

• 3pm Wednesday + FastFeatherTrack• File format reader and writer for

Microsoft office file formats• Support binary & ooxml formats• Strong read edit write for .xls & .xlsx• Read and basic edit for .doc & .docx• Read and basic edit for .ppt & .pptx• Read for Visio, Publisher, Outlook

Page 27: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

Apache Tikahttp://tika.apache.org/

• 9am Friday + Fast Feather Track• Java (+ command line) toolkit for

detecting and extracting content• Identifies what a blob of content is• Gives you consistent metadata back

for it• Parses the contents into plain text,

HTML, XHTML or sax events

Page 28: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

Tika – What's New?http://tika.apache.org/

• Lots of new parsers – text, office formats, publishing formats, images, audio, CAD, fonts etc

• Long standing parsers improved – better HTML from word for example

• Embedded resources and containers• Use expanding – used by many SOLR

users, Alfresco, lots of people crunching masses of data on Hadoop

Page 29: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

Apache Cocoonhttp://cocoon.apache.org/

• Component Pipeline framework• Plug together “Lego-Like” generators,

transformers and serialisers• Generate your content once in your

application, serve to different formats• Read in formats, translate and publish• Can power your own “Yahoo Pipes”• Modular, powerful and easy

Page 30: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

Apache Xalanhttp://xalan.apache.org/

• XSLT processor• XPath engine• Java and C++ flavours• Cross platform• Library and command line executables• Transform your XML• Fast and reliable XSLT transformation

engine

Page 31: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

Apache XML Graphics: Batikhttp://xmlgraphics.apache.org/#batik

• Java SVG toolkit + library• SVG Parser – read and process

existing SVG files• SVG Generator – Graphics2D

implementation that outputs SVG• SVG Dom – easy way to manipulate

your SVG files• SVG viewer program (Squiggle)• Command line SVG rasteriser

Page 32: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

Apache XML Graphics: FOPhttp://xmlgraphics.apache.org/#fop

• XSL-FO processor in Java• Reads W3C XSL-FO, applies the

formatting rules to your XML document, and renders it

• Output to Text, PS, PDF, SVG, RTF, Java Graphics2D etc

• Lets you leave your XML clean, and define semantically meaningful rich rendering rules for it

Page 33: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

Apache Commons: Codechttp://commons.apache.org/codec/

• Commons Track – Thursday Morning• Encode and decode a variety of

encoding formats• Base64, Hex, Phonetic and URLs• Handy when interchanging content

with external systems

Page 34: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

Apache Commons: Compresshttp://commons.apache.org/compress/

• Commons Track – Thursday Morning• Standard way to deal with archive

formats• Read and write support• zip, tar, gzip, bzip, cpio and ar• Wider range of capabilities than

java.util.Zip• Common API across all formats

Page 35: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

Apache Commons: Sanselanhttp://commons.apache.org/sanselan/

• Commons Track – Thursday Morning• Pure Java image reader and writer• Fast parsing of image metadata and

information (size, color space, icc etc)• Much easier to use than ImageIO• Slower though, as pure Java• Wider range of formats supported• PNG, GIF, TIFF, JPEG + Exif, BMP,

ICO, PNM, PPM, PSD, XMP

Page 36: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

GeneratingContent

Page 37: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

Apache Forresthttp://forrest.apache.org/

• Document rendering solution build on top of cocoon

• Reads in content in a variety of formats (xml, wiki etc), applies the appropriate formatting rules, then outputs to different formats

• Heavily used for documentation and websites

• eg read in a file, format as changelog and readme, output as html + pdf

Page 38: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

Apache Abderahttp://abdera.apache.org/

• Atom – syndication and publishing• High performance Java

implementation of RFC 4287 + 5023• Generate Atom feeds from Java or by

converting• Parse and process Atom feeds• Atompub server and clients• Supports Atom extensions like

GeoRSS, MediaRSS & OpenSearch

Page 39: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

Apache Droids (Incubating)http://incubator.apache.org/droids/

• Intelligent Robots!• Generic standalone crawler framework• Easy to extending existing common

crawlers• Easy to write custom ones• Queue requests for content, protocol

handler gets it, multi threaded• Uses Apache Tika for core of handling

fetched resources

Page 40: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

Apache JSPWiki (Incubating)http://incubator.apache.org/jspwiki/

• Feature-rich extensible wiki• Written in Java (Servlets + JSP)• Fairly easy to extend• Can be used as a wiki out of the box• Provides a good platform for new wiki

based application• Rich wiki markup and syntax• Attachments, security, templates etc

Page 41: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

Apache ManifoldCF (Incubating)http://incubator.apache.org/connectors/

• Name has changed a few times... (Lucene/Apache Connectors)

• Provides a standard way to get content out of other systems, ready for sending to Lucene etc

• Different goals to CMIS (Chemistry)• Uses many parsers and libraries to talk

to the different repositories / systems• Analogous to Tika but for repos

Page 42: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

Apache PhotArk (Incubating)http://incubator.apache.org/photark/

• 5pm Thursday• Open Source Photo Gallery application• Standalone or servlet modes• Can host photos locally• Can aggregate external photo albums

(Flickr, Picassa) for a unified view• SCA programming model – uses

Apache Tuscany to power it

Page 43: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

HostingContent

Page 44: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

Apache Chemistry (Incubating)http://incubator.apache.org/chemistry/

• 2pm Wednesday• Java, Python and PHP, Atom and WS*• OASIS CMIS (Content Management

Interoperability Services)• Client and Server bindings• “SQL for Content”• Consistent view on content across

different repositories• Read / Write / Manipulate content

Page 45: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

Chemistry vs ManifoldCFincubator /chemistry/ /connectors/

• ManifoldCF treats repo as nasty black box, and handles talking to the parsers

• Chemistry talks / exposes repo's contents through CMIS

• ManifoldCF supports a wider range of repositories

• Chemistry supports read and write• Chemistry delivers a richer model• ManifoldCF great for getting text out

Page 46: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

Apache Lenyahttp://lenya.apache.org/

• 9am Thursday• XML Content Management system• Powered by Apache Cocoon• WSIWYG editors onto Relax-NG XML• Rich workflow engine + staging• Clean URLs, CSS for styling• Sensible handling of metadata, assets,

internal links, users, permissions etc

Page 47: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

Apache Rollerhttp://roller.apache.org/

• Multi-user blog server• Used by the ASF internally• Scales to thousands of users & blogs• Should work with any JavaEE servlet

container and SQL database• Comment moderation and spam filters• Each author has full layout control• Indexes, feeds and Metaweblog API

support for 3rd party clients

Page 48: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

Apache Shindighttp://shindig.apache.org/

• Open Social Application Container• Hosts your open social widgets• Renders OpenSocial applications into

HTML + JavaScript• Stores the data for your application• Full client-side JavaScript libraries to

deliver gadget functionality• Reference implementation

Page 49: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

Apache Wookie (Incubating)http://incubator.apache.org/wookie/

• 5.30pm Wednesday• W3C Widgets server• Upload, Deploy and Host Widgets• Widgets can range from a badge,

through a small app to a full-blown collaborative system like chat

• Connector framework to make it easy to write widgets in many languages

Page 50: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

Web Frameworks

(those with a strong Content focus to them)

Page 51: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

Apache Slinghttp://sling.apache.org/

• 12pm Wednesday• “Fun” and easy web framework• REST based• Backed by Jackrabbit content repo• Powered by OSGi• Easy to script, supports multiple output

languages (JSP, server side javascript, scala etc)

• Stores both templates and content

Page 52: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

Apache Tapestryhttp://tapestry.apache.org/

• Object Orientated web applications• Build your application in terms of

objects, methods and properties• Tapestry handles URLs, query

parameters and state for you• Pages built with simple HTML• Concentrate on the content that backs

each part, and the business logic for it• Tapestry glues it together for you

Page 53: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

Apache Tileshttp://tiles.apache.org/

• Templating framework for Java• Works well with Struts and Shale• Lets you build your page from lots of

tiles (components), which can nest• Build tiles together to make templates• Clean separation between your

content, the business logic to select it, and the rendering rules

Page 54: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

Apache Velocityhttp://velocity.apache.org/

• Templating engine• MVC webapp or standalone• Can generate HTML, SQL, PostScript,

XML, Java Code or email from templates

• Anakia lets you make a xdoc file available to a velocity template, handy when generating HTML from xdoc

• Fairly rich templating language

Page 55: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

Apache Wickethttp://wicket.apache.org/

• Build your web applications in Java• Uses Java in preference to JavaScript,

CSS etc• Handy if you have a strong Java team

and you need to do some web stuff• Fits well with your Java components• But JS / CSS front end devs tend to be

cheaper than Java ones....

Page 56: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

Apache Clerezza (Incubating)http://incubator.apache.org/clerezza/

• OSGi based modular semantic web application framework

• Lets you build applications that fit into the Semantic Web

• Stores and easily manipulates RDF• Full control over REST and URIs• Build applications that both consume

semantic data (eg RDF files), and that expose content to others

Page 57: If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects

Any Questions?

Any cool projects thatI happened to miss?