parsing binaries and protocols with erlang

“Parsing binaries and protocols with erlang ?!” http://developers.hover.in Bhasker V Kode, co-founder & CTO at hover.in, at foss.in, December 4th, 2009

Upload: bhasker-kode

Post on 22-Jun-2015


DESCRIPTION

Delivered by Bhasker V Kode at foss.in/2009. Official talk page: http://foss.in/2009/schedules/talkdetailspub.php?talkid=17

Erlang's support for handling binaries and pattern matching makes it a great choice for parsing everything from IPv4 packets to payloads from the Memcached protocol, SWF files, or databases like Tokyo Cabinet. From a functional programming perspective, there are various ways of building these parsers, taking advantage of the concurrent and recursive nature that is inherent to the language. The talk also covers challenges gathered while validating the storage & retrieval options for our distributed crawler, and while submitting patches to projects like Medici & Tora (erlang-based Tokyo Cabinet clients). It will also touch upon Tokyo Cabinet's support for mapreduce with Lua, and notes from building your own custom formats & our internal mapreduce-esque and caching frameworks, used in building a multi-million impression platform utilizing under a gig of RAM per node.

Notes on:
- trends in disk/memory/bandwidth
- why erlang, RAM, binaries
- garbage collection in the erlang VM
- message passing
- use-cases

TRANSCRIPT

Page 1: Parsing binaries and protocols with erlang

   

“Parsing binaries and protocols with erlang ?!”

http://developers.hover.in

Bhasker V Kode, co-founder & CTO at hover.in

at foss.in, December 4th, 2009


Page 2: Parsing binaries and protocols with erlang

   

“WHY ... ?!”

foss.in/2009                                                                                        http://developers.hover.in

Page 3: Parsing binaries and protocols with erlang

   

“BUT I'm BUILDING webapps !?!”

Page 4: Parsing binaries and protocols with erlang

   

“Everything's quick enough :D”

Page 5: Parsing binaries and protocols with erlang

   

“doh!”

Page 6: Parsing binaries and protocols with erlang

   

“ha! of course i knew that... err... but people scale... that's what they do... that's our way out!!! scaling out... scaling up... auto scaling even...!!! :O”

Page 7: Parsing binaries and protocols with erlang

   

“scale UP ...! more RAM seems to stop those silly CPU-unit warnings my hosting provider gives... bring on those infinite loops & polling crons. RealTimeWeb FTW!”

Page 8: Parsing binaries and protocols with erlang

   

“scaling OUT, maybe with a distributed filesystem, and figure out a way for nodes to talk, and... replication... and location transparency during weekends... and commodity hardware which i can't pay for”

Page 9: Parsing binaries and protocols with erlang

   

More data is becoming archival, NOT by choice, but because it is forced to.

Not pushed to handle streams of data well (even hadoop!) #bigdata

If you're not compromising, you're not pushing enough. Disk's loss must be someone else's gain. Fixed-length examples at fb, twitter, google

Page 10: Parsing binaries and protocols with erlang

   

Erlang for RAM on the web is the new Embedded C

Page 11: Parsing binaries and protocols with erlang

   

“THE NEWS TODAY. Once popular retro format 'binary' continues to go unnoticed after brief sightings on wallpapers during the matrix trilogy ....”

pssst!
● in files of any mime/content type
● in db's that accept binary
● in RAM, via caching engines
● compact for n/w transfer & storage
● the answer to unicode

Page 12: Parsing binaries and protocols with erlang

   

“fine! Binaries are everywhere, disks are not keeping up, and i've got more cores on my nodes every year.”

Page 13: Parsing binaries and protocols with erlang

   

“But i'm still not going near a strict, dynamically typed functional programming language with support for concurrency, communication, and distribution, automatic memory management & support for multiple platforms !!!”

Page 14: Parsing binaries and protocols with erlang

   

Erlang!!!

over-rated ? OR under-appreciated ?

“ [ 87, 84, 70]  :O !”

Page 15: Parsing binaries and protocols with erlang

   

What happens when you start an erlang shell. SMP didn't exist before erlang release R11 ('06)

Page 16: Parsing binaries and protocols with erlang

   

“ahh... so processes are pseudo threads in the erlang VM that are light weight & the base of erlang programs, each having its own heap and message inbox, & are meant for passing messages using erlang primitives. Also the developer can configure how many cores are used based on # of schedulers, which run processes.”

Page 17: Parsing binaries and protocols with erlang

   

Max of 1024 schedulers can be set => your erlang src today should utilize boxes with up to 1024 cores
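
To see the scheduler settings on a running node, a minimal sketch could look like the following. The module name `sched_info` and the function `report/0` are made up for illustration; `erlang:system_info(schedulers)` and `erlang:system_info(schedulers_online)` are the standard BIFs.

```erlang
-module(sched_info).
-export([report/0]).

%% Hypothetical helper: reports how many scheduler threads the VM
%% started with and how many of them are currently online.
report() ->
    Total  = erlang:system_info(schedulers),
    Online = erlang:system_info(schedulers_online),
    [{schedulers, Total}, {schedulers_online, Online}].
```

The number of schedulers is normally chosen at startup, for example `erl +S 4`, and the number online can be lowered or raised at runtime with `erlang:system_flag(schedulers_online, N)`.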

Page 18: Parsing binaries and protocols with erlang

   

Let M = msgs to random users
Let N = 100,000 users
Route M msgs to the right N users!

typical one-node approach:
for i to M
  for j to N
    if match, add_update

actor approach:
N concurrent processes listening to all msgs
As a new msg arrives, msg pass to all N pids
in each concurrent process: if match, add_update
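
The actor approach might be sketched as below. This is only an illustration under assumptions: the module `actor_route`, its function names, and the message shapes `{msg, To, Body}` and `{get, From, Ref}` are all invented here, and "match" is taken to mean "the message is addressed to this user id".

```erlang
-module(actor_route).
-export([start_users/1, broadcast/2, inbox/1, demo/0]).

%% Spawn one process per user id; each keeps its own list of matched msgs.
start_users(UserIds) ->
    [{Id, spawn(fun() -> user_loop(Id, []) end)} || Id <- UserIds].

%% Pass every message to every user process ("msg pass to all N pids").
broadcast(Msgs, Users) ->
    [Pid ! {msg, To, Body} || {To, Body} <- Msgs, {_Id, Pid} <- Users],
    ok.

%% Ask one user process for the messages it has kept so far.
inbox(Pid) ->
    Ref = make_ref(),
    Pid ! {get, self(), Ref},
    receive {Ref, Kept} -> Kept after 1000 -> timeout end.

user_loop(Id, Kept) ->
    receive
        {msg, Id, Body} ->                 % Id is already bound: a match
            user_loop(Id, [Body | Kept]);  % "if match, add_update"
        {msg, _Other, _Body} ->            % addressed to someone else
            user_loop(Id, Kept);
        {get, From, Ref} ->
            From ! {Ref, lists:reverse(Kept)},
            user_loop(Id, Kept);
        stop ->
            ok
    end.

%% Small end-to-end run with three users and three messages.
demo() ->
    Users = start_users([1, 2, 3]),
    ok = broadcast([{1, <<"hi">>}, {3, <<"yo">>}, {1, <<"again">>}], Users),
    Result = [{Id, inbox(Pid)} || {Id, Pid} <- Users],
    [Pid ! stop || {_Id, Pid} <- Users],
    Result.
```

The nested loop of the one-node approach disappears: the matching work is spread across N mailboxes, and the schedulers decide which cores run it.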

Page 19: Parsing binaries and protocols with erlang

   

3 papers to rule them all & 1 garbage collection method to free them!

Page 20: Parsing binaries and protocols with erlang

   

3 papers to rule them all & 1 garbage collection method to free them!

Page 21: Parsing binaries and protocols with erlang

   

3 papers to rule them all & 1 garbage collection method to free them!

Page 22: Parsing binaries and protocols with erlang

   

EUREKA!!! we have a winner 

Page 23: Parsing binaries and protocols with erlang

   

“ahh... so this is what no shared memory in erlang means: light weight processes are garbage collected easily since they don't hold references to data in each other's process heap, & messages are copied or shared based on their size and likelihood of reuse, with optimizations for binaries. tellmemore!!”

Page 24: Parsing binaries and protocols with erlang

   

“How do you spawn a process?”
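
One possible answer, as a minimal sketch: the module `spawn_demo` and the function `hello/1` are invented for this example, while `spawn/1` and `spawn/3` are the standard BIFs.

```erlang
-module(spawn_demo).
-export([start/0, hello/1]).

%% Runs inside the new process; self() is the spawned pid.
hello(Name) ->
    io:format("hello from ~p, started for ~p~n", [self(), Name]).

start() ->
    %% spawn/1 takes a fun
    Pid1 = spawn(fun() -> hello(fun_style) end),
    %% spawn/3 takes a module, an exported function and an argument list
    Pid2 = spawn(?MODULE, hello, [mfa_style]),
    {Pid1, Pid2}.
```

Both calls return immediately with the pid of the new process; the caller does not wait for `hello/1` to finish.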

Page 25: Parsing binaries and protocols with erlang

   

“Where can you spawn a process?”
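
A process can be started on the local node or on any connected node. The sketch below assumes a hypothetical module `spawn_where`; `spawn/2` with a node name is the standard BIF, and passing `node()` simply spawns locally.

```erlang
-module(spawn_where).
-export([on_node/1]).

%% Spawn a process on Node and return the node name that process reports.
%% Returns 'timeout' if nothing comes back within 2 seconds, for example
%% when Node is not reachable.
on_node(Node) ->
    Parent = self(),
    Ref = make_ref(),
    spawn(Node, fun() -> Parent ! {Ref, node()} end),
    receive
        {Ref, RanOn} -> RanOn
    after 2000 ->
        timeout
    end.
```

For a remote node, the code of this module must also be loadable there, since the fun refers to it.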

Page 26: Parsing binaries and protocols with erlang

   

“Can a spawned process talk back to the caller?”
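
Yes, as long as the spawned process knows the pid to reply to. A minimal sketch, with the module `talk_back` and function `square/1` invented for illustration:

```erlang
-module(talk_back).
-export([square/1]).

%% Hand the work to a new process and wait for its reply.
square(N) ->
    Caller = self(),                 % captured by the fun below
    Ref = make_ref(),                % tags the reply so it cannot be confused
    spawn(fun() -> Caller ! {Ref, N * N} end),
    receive
        {Ref, Result} -> Result
    end.
```

The unique reference keeps the `receive` from picking up an unrelated message that happens to be in the caller's mailbox.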

Page 27: Parsing binaries and protocols with erlang

   

“Can a spawned process listen as long as i want it to?”

“Can a spawned process stop listening when I want it to?”

“Can a spawned process spawn more processes?”
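
All three can be shown with one tail-recursive receive loop. This is a sketch under assumptions: the module `listener` and the messages `{spawn_child, From}`, `{count, From}` and `stop` are made up for this example.

```erlang
-module(listener).
-export([start/0, loop/1]).

start() ->
    spawn(?MODULE, loop, [[]]).

%% Children is the list of pids this process has spawned so far.
loop(Children) ->
    receive
        {spawn_child, From} ->
            Child = spawn(?MODULE, loop, [[]]),   % a process spawning a process
            From ! {child, Child},
            loop([Child | Children]);             % tail call: keep listening
        {count, From} ->
            From ! {children, length(Children)},
            loop(Children);
        stop ->
            [C ! stop || C <- Children],          % pass the stop on
            ok                                    % no tail call: stop listening
    end.
```

The process listens for as long as each clause ends in a call to `loop/1`, and it stops the moment a clause returns instead.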

Page 28: Parsing binaries and protocols with erlang

   

“So though erlang gives a library called OTP & a db called mnesia for making life easier, you can parse or create binaries easily, make client-server programs, distributed rpc calls, tail-recursive servers, message/priority queues for flowcontrol, talk to ports and other lang's, or create any data structure explicitly (a) in-memory (b) on-disk of any connected node!”

Page 29: Parsing binaries and protocols with erlang

   

“show me the demo's”

● Process related

– Message queues, Client-server
– RPC, Timeouts

● Binary

– Binary pattern matching, Parse swf/mp3 for metadata
– Networking, comm. with C, Tokyocabinet client eg.

● Process + Binary!

– Building a production ready in-memory CDN consistently faster than Am4z0n cl0udfr0nt, in stages:
  open & gzip < concat js's < inmemory < streaming?
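
The demo code itself is not part of this transcript. As a stand-in for the process-related items (client-server, RPC, timeouts), here is a minimal sketch; the module `kv_rpc`, its request tuples and the `{sleep, Ms}` request used to simulate a slow server are all assumptions made for illustration.

```erlang
-module(kv_rpc).
-export([start/0, call/3]).

%% Start a tiny key-value server process with an empty store.
start() ->
    spawn(fun() -> server([]) end).

%% Client side of the RPC: send a tagged request, wait up to Timeout ms.
call(Server, Request, Timeout) ->
    Ref = make_ref(),
    Server ! {call, self(), Ref, Request},
    receive
        {Ref, Reply} -> {ok, Reply}
    after Timeout ->
        {error, timeout}
    end.

%% Server side: a tail-recursive loop carrying the store as its state.
server(Store) ->
    receive
        {call, From, Ref, {put, K, V}} ->
            From ! {Ref, ok},
            server(lists:keystore(K, 1, Store, {K, V}));
        {call, From, Ref, {get, K}} ->
            From ! {Ref, lists:keyfind(K, 1, Store)},
            server(Store);
        {call, _From, _Ref, {sleep, Ms}} ->
            timer:sleep(Ms),          % simulates a slow server; never replies
            server(Store)
    end.
```

The `after` clause is what turns a plain message exchange into an RPC with a timeout: the client gets `{error, timeout}` instead of blocking forever.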

Page 30: Parsing binaries and protocols with erlang

   

“Binary pattern matching ?”

<<Value:Size/Type-Signedness-Endianism-unit:Unit>>

<<1:32>> = <<0,0,0,1>>.
<<1:32/unsigned-little>> = <<1,0,0,0>>.
<<_:8,"mnesia">> = <<"Amnesia">>.

So <<Bin>> could be unicode characters (English, Hindi, Tamil) or JPG's or http headers or basically segments of binaries

NewBinary=<<Segment1,Segment2>>.
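
To make the bit syntax concrete, here is a sketch that parses the first bytes of an SWF file and an ID3v1 tag from an MP3, the two formats named in the demo list. The module `bin_parse` and its function names are invented here and are not the code shown in the talk; the layouts used are the published ones (SWF: 3-byte signature, 1-byte version, 32-bit little-endian length; ID3v1: the last 128 bytes of the file, starting with "TAG").

```erlang
-module(bin_parse).
-export([swf_header/1, id3v1/1, join/2]).

%% SWF: "FWS" (plain) or "CWS" (zlib-compressed), version, file length.
swf_header(<<Sig:3/binary, Version:8, Length:32/unsigned-little, _/binary>>)
  when Sig =:= <<"FWS">>; Sig =:= <<"CWS">> ->
    {ok, [{compressed, Sig =:= <<"CWS">>},
          {version, Version},
          {file_length, Length}]};
swf_header(_) ->
    {error, not_swf}.

%% ID3v1: skip everything except the trailing 128 bytes, then match the
%% fixed-width fields in a single pattern.
id3v1(File) when byte_size(File) >= 128 ->
    Skip = byte_size(File) - 128,
    case File of
        <<_:Skip/binary, "TAG", Title:30/binary, Artist:30/binary,
          Album:30/binary, Year:4/binary, _Comment:30/binary, _Genre:8>> ->
            {ok, [{title, strip(Title)}, {artist, strip(Artist)},
                  {album, strip(Album)}, {year, Year}]};
        _ ->
            {error, no_tag}
    end;
id3v1(_) ->
    {error, no_tag}.

%% Keep only the bytes before the first NUL; unused bytes of a field are
%% assumed to be NUL padding (space padding is left untouched).
strip(Field) ->
    hd(binary:split(Field, <<0>>)).

%% Building a new binary out of two binary segments.
join(A, B) ->
    <<A/binary, B/binary>>.
```

Note the `/binary` type on the segments in `join/2`: without a type, a segment defaults to an 8-bit integer.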

Page 31: Parsing binaries and protocols with erlang

   

summary of tech at hover.in

● LYME stack since ~dec 07, 4 (-1) nodes (64bit 4GB)
● python crawler + associated NLP parsers, indexes now in tokyo cabinet, inverted indexes in erlang's mnesia db with binaries of 5 diff indian languages + multiple content-types, cpu time-splicing algo's, priority queues for heat-seeking algo, flowcontrol, caching engines, cyclic queues, map-reduces with non-blocking gathers, headless-firefox for thumbnails, patches to tokyocabinet client 'medici'
● Beta in Jan 09, 1 million hovers/month in May '09
● 2-4 developers + several interns across ~2 years
