Avro mapper Java
Post on 20-Dec-2015
Overview
• A data serialization system.
• An RPC framework.
• Used for both storage and communication.
• Purpose:
– Provide rich data structures.
– A compact and fast binary data format.
– Simple integration with dynamic languages.
Overview
• Avro uses JSON as its Interface Description Language (IDL).
– To specify data types.
– To specify protocols.
• Review: JavaScript Object Notation (JSON) is a lightweight, text-based standard for data interchange.
Why the need for Avro?
• Primarily used in Hadoop, where it provides a standard:
1. Serialization format for persistent data.
2. Wire format for communication:
– among Hadoop nodes.
– from client programs to Hadoop services.
Overview
• Avro relies on schemas.
– Schema stored with data.
– Each datum written with no per-value overheads.
• Thus serialization is fast and small.
• Avro in RPC:
– Schema exchange during client-server handshake.
– Correspondence in fields can be easily resolved.
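To make "no per-value overheads" concrete, here is a minimal sketch of how a record could be written in Avro's binary encoding, where field names and types live in the schema and only the values are emitted. It uses no Avro library; the `User` record with `name` and `age` fields and the helper names are hypothetical, chosen only for illustration.

```python
import json


def zigzag_varint(n: int) -> bytes:
    """Encode a signed integer as Avro encodes int/long values:
    zig-zag mapping, then a variable-length base-128 encoding."""
    z = (n << 1) ^ (n >> 63)  # zig-zag: small magnitudes map to small unsigned values
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_string(s: str) -> bytes:
    """A string is its UTF-8 byte length (as a long) followed by the bytes."""
    data = s.encode("utf-8")
    return zigzag_varint(len(data)) + data


def encode_user(user: dict) -> bytes:
    """Hypothetical record User {name: string, age: int}.
    Fields are written in schema order with no names or type tags."""
    return encode_string(user["name"]) + zigzag_varint(user["age"])


user = {"name": "Ada", "age": 36}
binary = encode_user(user)
as_json = json.dumps(user).encode("utf-8")

print(len(binary), "bytes in the binary sketch")
print(len(as_json), "bytes as JSON text")
```

The reader needs the schema to know that the first value is a string and the second an int, which is why Avro stores the schema with the data or exchanges it during the RPC handshake.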
Comparison with other systems
• Avro vs. Protobuf and Thrift.
• A quick note about Thrift:
– Initially developed at Facebook by a Google intern.
– Closer to Google’s protobuf.
Comparison with other systems
| | Avro | Google protobuf | Thrift |
| Implementation | Hmm.. | Cleaner | Hmm.. |
| Error handling | Complex | Simple | OK |
| Extensibility | Hmm.. | Richer | OK |
| Compatibility | Java, C, C++, C#, Python and Ruby | Those and many more, such as Adobe ActionScript, Microsoft Silverlight, etc. | About the same as protobuf |
Specification
• Schema represented in one of:
– JSON string, naming a defined type.
– JSON object of the form: {"type": "typeName" ...attributes...}
– JSON array, representing a union of types.
• Primitive types: null, boolean, int, long, float, double, bytes, string
– Example: {"type": "string"}
• Complex types: records, enums, arrays, maps, unions, fixed
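The three schema forms can be told apart once the JSON is parsed. The following is a rough sketch using only Python's standard `json` module, not the Avro library; the function name `describe` and its output strings are made up for this illustration.

```python
import json

PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}
COMPLEX = {"record", "enum", "array", "map", "fixed"}


def describe(schema) -> str:
    """Classify a parsed schema by its JSON form (string, array, or object)."""
    if isinstance(schema, str):
        # A JSON string names either a primitive or a previously defined type.
        return f"primitive {schema}" if schema in PRIMITIVES else f"named type {schema}"
    if isinstance(schema, list):
        # A JSON array is a union of the listed schemas.
        return "union of [" + ", ".join(describe(s) for s in schema) + "]"
    if isinstance(schema, dict):
        kind = schema["type"]
        if isinstance(kind, str) and kind in COMPLEX:
            name = schema.get("name")
            return f"{kind} {name}" if name else kind
        # e.g. {"type": "string"} is the object form of a primitive.
        return describe(kind)
    raise ValueError(f"not a schema: {schema!r}")


examples = [
    '"string"',
    '{"type": "string"}',
    '["null", "string"]',
    '{"type": "record", "name": "Greeting",'
    ' "fields": [{"name": "message", "type": "string"}]}',
]
for text in examples:
    print(text, "=>", describe(json.loads(text)))
```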
Specification, example protocol
{
"namespace": "com.acme",
"protocol": "HelloWorld",
"doc": "Protocol Greetings",
"types": [
{"name": "Greeting", "type": "record", "fields": [
{"name": "message", "type": "string"}]},
{"name": "Curse", "type": "error", "fields": [
{"name": "message", "type": "string"}]}
],
"messages": {
"hello": {
"doc": "Say hello.",
"request": [{"name": "greeting", "type": "Greeting" }],
"response": "Greeting",
"errors": ["Curse"]
}
}
}
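Because the protocol is plain JSON, it can be inspected with any JSON parser. The sketch below loads the HelloWorld protocol with Python's standard `json` module and prints one line per message; the `signature` helper and its output format are illustrative assumptions, not part of Avro.

```python
import json

PROTOCOL_JSON = """
{
  "namespace": "com.acme",
  "protocol": "HelloWorld",
  "doc": "Protocol Greetings",
  "types": [
    {"name": "Greeting", "type": "record", "fields": [
      {"name": "message", "type": "string"}]},
    {"name": "Curse", "type": "error", "fields": [
      {"name": "message", "type": "string"}]}
  ],
  "messages": {
    "hello": {
      "doc": "Say hello.",
      "request": [{"name": "greeting", "type": "Greeting"}],
      "response": "Greeting",
      "errors": ["Curse"]
    }
  }
}
"""

protocol = json.loads(PROTOCOL_JSON)

# Fully qualified protocol name: namespace plus protocol name.
full_name = f'{protocol["namespace"]}.{protocol["protocol"]}'

# Map each declared type name to its kind ("record" or "error").
declared = {t["name"]: t["type"] for t in protocol["types"]}


def signature(name: str, message: dict) -> str:
    """Render one message as a readable, method-like line."""
    params = ", ".join(f'{p["name"]}: {p["type"]}' for p in message["request"])
    errors = message.get("errors", [])
    throws = f' throws {", ".join(errors)}' if errors else ""
    return f'{name}({params}) -> {message["response"]}{throws}'


signatures = [signature(n, m) for n, m in protocol["messages"].items()]

print(full_name)
print(declared)
for line in signatures:
    print(line)
```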
SASL profile
• Simple Authentication and Security Layer.
• Provides a framework for
– Authentication.
– Security of network protocols.
SASL usage
• Each negotiation message for connection-oriented Avro RPC carries one of four status codes:
– 0: START. Used in a client's initial message.
– 1: CONTINUE. Used while negotiation is ongoing.
– 2: FAIL. Terminates negotiation unsuccessfully.
– 3: COMPLETE. Terminates negotiation successfully.
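The four status codes can be modeled as a small enumeration. This is a simplified sketch only: it ignores mechanism names, payloads, and which side sends each message, and the `negotiation_outcome` helper is a hypothetical name introduced for illustration.

```python
from enum import IntEnum


class SaslStatus(IntEnum):
    START = 0     # used in a client's initial message
    CONTINUE = 1  # negotiation is ongoing
    FAIL = 2      # terminates negotiation unsuccessfully
    COMPLETE = 3  # terminates negotiation successfully


def negotiation_outcome(codes):
    """Walk a sequence of status codes in the order they were exchanged.

    Returns True if negotiation completed, False if it failed,
    and None if it has not terminated yet.
    """
    for index, code in enumerate(codes):
        status = SaslStatus(code)  # raises ValueError for unknown codes
        if index == 0 and status is not SaslStatus.START:
            raise ValueError("negotiation must begin with START")
        if status is SaslStatus.COMPLETE:
            return True
        if status is SaslStatus.FAIL:
            return False
    return None


print(negotiation_outcome([0, 1, 1, 3]))
print(negotiation_outcome([0, 2]))
print(negotiation_outcome([0, 1]))
```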