1/31/2014

Erlang, Big Data, and the Wild West

Chris Brown and Brett Cameron January 2014

AGENDA

• Introductions (about us and about the talk)

• Some history (how we became involved with Erlang)

• too BIG to IGNORE; The Business Case for Big Data

• Comments on Erlang and Big Data

• Summary/conclusions

• Questions

• Beer and pizza

About us – Chris Brown

Chris Brown currently works as a consultant for OCF (http://www.ocf.co.uk), developing new lines of business in Big Data and Analytics. Chris has spent over 30 years in the IT industry, including 12 years as a UK company director and 11 years working for DEC/Compaq/HP as a strategist and Director of Strategy for the DEC/Compaq/HP OpenVMS Software group. Chris started his working life (many years ago) as an applications programmer and although he has not written any code for many years Chris still likes to think of himself as a software engineer at heart, even though much of his recent activity has been centred more around hardware technologies than software. Much of Chris' time with DEC/Compaq/HP was spent travelling the world engaging with customers, acting as an advocate for the company, a customer champion, and as a communications channel between customers and internal DEC/Compaq/HP engineering groups. Coupled with this extensive customer interaction Chris has also had numerous dealings with some of the major global IT players, including IT companies such as Oracle, EMC, Red Hat, Sun, and Microsoft, and consultancy companies such as Accenture and HP Enterprise Services (formerly EDS). Chris has become a trusted advisor to many of the customers with whom he has interacted. The diversity of this work has helped Chris to develop a rather unique and well-balanced view of the IT industry and has also provided him with many interesting stories and fascinating encounters (some of which are even IT related). Chris is now enjoying applying his broad experience to the development of business opportunities and the provision of strategic guidance in emerging and often poorly defined areas such as Cloud Computing and Big Data. In his spare time Chris enjoys listening to music and watching rugby.

About us – Brett Cameron

Brett Cameron currently works as a senior software architect with HP’s corporate Cloud Services group, focusing on the design and implementation of message queuing and related integration services for customers and for internal use. Brett lives in Christchurch, New Zealand and has worked in the software industry for over 20 years. In that time he has gained experience in a wide range of technologies, many of which have long since been retired to the software scrapheap of dubious ideas. Brett has been involved in the research and development of low-latency and highly scalable messaging solutions for the Financial Services sector running on HP platforms and as a consequence of this work he has participated in several interesting Open Source projects involving Erlang and Erlang-based software products such as RabbitMQ. He is responsible (or should that be irresponsible) for porting various Open Source solutions (including Erlang, RabbitMQ, and most of Riak) to HP’s legacy OpenVMS operating system. Brett holds a doctorate in chemical physics from the University of Canterbury, and still maintains close links with the University, delivering guest lectures and acting as an advisor to the Computer Science and Electronic and Computer Engineering departments on course structure and content. In his spare time Brett enjoys listening to music, playing the guitar, and drinking beer.

About the talk…

“Big Data” seems to be an unavoidable business buzz phrase these days. According to Wikipedia, Big Data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. This definition seems reasonable; however, the term is badly overloaded and surrounded by considerable industry hype, so there continues to be considerable debate and confusion about what Big Data is and is not. For example, a recent survey of 154 companies found that nearly 70% used a volume-based definition for Big Data; 25% defined Big Data as 'massive growth of transaction data'; 24% thought Big Data referred to new technologies for managing massive data; and 19% defined it as the 'requirement to store and archive data for regulatory compliance'.

In this talk the speakers will share their opinions about what Big Data really means and will consider the evolving role that Erlang is playing in the Big Data space and in Cloud computing. The speakers will also briefly discuss how they came to be involved with Erlang by porting it to HP's legacy OpenVMS operating system, the story behind this work, and some of the technical challenges that were encountered along the way.

The Journey (November 4th 2008)

Our introduction to Erlang

• We wanted an AMQP implementation on OpenVMS

o AMQP is/was gaining considerable traction within the Financial Services industry

o OpenVMS is heavily used in Financial Services

o Also wanted alternatives to DEC MessageQ (now owned by Oracle) and MQSeries

• Heavily used on OpenVMS across multiple industry segments

• We had previously ported OpenAMQ but iMatix had ceased to support it and were focusing on ZeroMQ

• Chris basically dared me to port Erlang to OpenVMS

o This would allow us to use RabbitMQ... and other Erlang-based applications for that matter

• So how hard could it be?

Our introduction to Erlang

• OpenVMS is not at all UNIX-like

o Has a good and fairly standards-compliant C compiler

o Some deficiencies in the OpenVMS C RTL

• Erlang beam and associated components are written in C

o Appreciable codebase

o Approximately 400,000 lines of code (*.c, *.h) in total

o OTP libraries comprise about 2M lines of code

Working on the port whilst on holiday in Majorca

Our introduction to Erlang

• There is no UNIX-like fork() function on OpenVMS

o Luckily only a small number of fork() calls in the Erlang code

• Basically just fork()/exec() sequences that start other processes like inet_gethost and set up a pipe for communication with parent

• Could work around on OpenVMS using vfork()/exec() sequences

• You can’t use fcntl() on OpenVMS to toggle sockets blocking/non-blocking

o Simple case of replacing any such calls with appropriate ioctl() calls

• poll()/select() only work with sockets on OpenVMS

o Had to implement special versions to handle other types of file descriptor

o These wrappers probably have performance implications...
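
For readers who have not met the pattern, a fork()/exec() sequence that starts a helper process and reads from it over a pipe looks roughly like this on a UNIX system. This is a generic POSIX sketch, not code from Erlang or from the port; run_helper and its parameters are hypothetical names. The vfork()/exec() workaround used on OpenVMS is not shown, because the talk does not describe its details.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the Erlang sources): start the program
 * named by argv[0] and capture its standard output through a pipe.
 * Returns the number of bytes stored in buf (always NUL-terminated),
 * or -1 if the pipe or the child process could not be created.
 */
static ssize_t run_helper(char *const argv[], char *buf, size_t buflen)
{
    int fds[2];
    size_t total = 0;
    ssize_t n;
    pid_t pid;

    assert(argv != NULL && argv[0] != NULL && buf != NULL);
    if (buflen == 0 || pipe(fds) < 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: route stdout into the pipe, then become the helper. */
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);
        close(fds[1]);
        execvp(argv[0], argv);
        _exit(127); /* only reached if exec failed */
    }

    /* Parent: read until the child closes its end or the buffer is full. */
    close(fds[1]);
    while (total < buflen - 1 &&
           (n = read(fds[0], buf + total, buflen - 1 - total)) > 0)
        total += (size_t)n;
    buf[total] = '\0';
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return (ssize_t)total;
}
```

Calling run_helper with an argument vector such as {"/bin/echo", "hello", NULL} fills the buffer with the helper's output. The point of the sketch is that everything between fork() and exec() runs in a copy of the parent, which is the assumption that does not hold on OpenVMS.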

Something like 91 changes (via conditional compilation) required to C code (beam and other components)
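
A change of this kind might look something like the following: the UNIX code path is kept, and an OpenVMS branch is selected by conditional compilation. This is only a sketch. The function name set_socket_nonblocking is hypothetical, the use of __VMS as the guard macro and of the FIONBIO request are assumptions, and the actual changes made in the port are not shown in the talk.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: put a socket into non-blocking mode (enable != 0)
 * or back into blocking mode (enable == 0).
 * Returns 0 on success and -1 on failure.
 */
static int set_socket_nonblocking(int fd, int enable)
{
    assert(fd >= 0);
#ifdef __VMS
    /* Assumed OpenVMS branch: fcntl() cannot toggle the mode there,
       so ask the socket layer directly with ioctl(FIONBIO). */
    int flag = enable ? 1 : 0;
    return ioctl(fd, FIONBIO, &flag) < 0 ? -1 : 0;
#else
    /* UNIX branch: read the current flags, then set or clear O_NONBLOCK. */
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    flags = enable ? (flags | O_NONBLOCK) : (flags & ~O_NONBLOCK);
    return fcntl(fd, F_SETFL, flags) < 0 ? -1 : 0;
#endif
}
```

Keeping both branches behind one function means the callers stay identical on every platform, which is one plausible way to hold the number of source changes down.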

Our introduction to Erlang

• All up, the port took a few weeks' effort (working on and off)… and a lot of beer

o Porting new releases now just takes a few hours

• Latest port is 16A

o Now able to run RabbitMQ, most of Riak, Yaws, ...

• To avoid having to make excessive changes to any of the standard libraries and OTP code, I made OpenVMS look to Erlang/OTP like a variant of UNIX

o Wherever Erlang checks the operating system type, it thinks it is running on a variant of UNIX

o And it doesn’t much care (except in a small number of cases) what the specific variant is

• A few remaining things to do

o Need to properly sort out SMP support

o Use 64-bit pointers

o … but unfortunately HP is now retiring OpenVMS!

$ erl
Eshell V5.7 (abort with ^Z)
1> os:type().
{unix,openvms}
2>

Also done ports to AIX and HP-UX (Itanium). Need to see about resurrecting this work...

Example – some OTP changes

cmd(Cmd) ->
    validate(Cmd),
    case type() of
        %% OpenVMS reports itself as {unix, openvms}, so its clause must
        %% come before the generic UNIX clause below
        {unix, openvms} ->
            Command = lists:concat(["doit ", Cmd]),
            Port = open_port({spawn, Command}, [stream, in, eof, hide]),
            get_data(Port, []);
        {unix, _} ->
            unix_cmd(Cmd);
        {win32, Wtype} ->
            Command = case {os:getenv("COMSPEC"),Wtype} of
                          {false,windows} -> lists:concat(["command.com /c", Cmd]);
                          {false,_} -> lists:concat(["cmd /c", Cmd]);
                          {Cspec,_} -> lists:concat([Cspec," /c",Cmd])
                      end,
            Port = open_port({spawn, Command}, [stream, in, eof, hide]),
            get_data(Port, [])
    end.

%% For OpenVMS we use ^Z
io:fwrite(<<"Eshell V~s (abort with ^Z)\n">>,
          [erlang:system_info(version)])

Example – spinning up RabbitMQ on VMS

$ RABBITMQ_MNESIA_DIR = "/rabbitmq$root/mnesia"
$ RABBITMQ_LOGS = "/rabbitmq$root/log"
$ RABBITMQ_SASL_LOGS = "/rabbitmq$root/log"
$ RABBITMQ_NODENAME = "rabbit@ccin02"
$ RABBITMQ_NODE_PORT = 5672
$ RABBITMQ_NODE_IP_ADDRESS = "16.156.32.108"
$ RABBITMQ_ENABLED_PLUGINS_FILE = "/rabbitmq$root/sbin/enabled_plugins.dat"
$ RABBITMQ_PLUGINS_DIR = "/rabbitmq$root/plugins"
$ RABBITMQ_PLUGINS_EXPAND_DIR = "/rabbitmq$root/etc"
$
$ define decc$fd_locking 1
$
$ erl :== $erlang$root:[bin]erlexec.exe
$ erl -
"-pa" "/rabbitmq$root/ebin" -
"-pa" "/erlang$root/lib/mnesia/ebin" -
"-pa" "/erlang$root/lib/os_mon/ebin" -
"-pa" "/erlang$root/lib/sasl/ebin" -
"-pa" "/erlang$root/lib/kernel/ebin" -
"-pa" "/erlang$root/lib/ssl/ebin" -
"-pa" "/erlang$root/lib/stdlib/ebin" -
"-noinput" -
"-emu_args" -
"-boot" "start_sasl" -
"-s" "rabbit" -
"-config" "/rabbitmq$root/sbin/rabbitmq.config" -
"-sname" "''RABBITMQ_NODENAME'" -
"+W" "w" -
"+A30" -
"-rabbit" "enabled_plugins_file" """''RABBITMQ_ENABLED_PLUGINS_FILE'""" -
"-rabbit" "plugins_dir" """''RABBITMQ_PLUGINS_DIR'""" -
"-rabbit" "plugins_expand_dir" """''RABBITMQ_PLUGINS_EXPAND_DIR'""" -
"-rabbit" "tcp_listeners" "[{""''RABBITMQ_NODE_IP_ADDRESS'"",''RABBITMQ_NODE_PORT'}]" -
"-kernel" "error_logger" "{file,""''RABBITMQ_LOGS'/rabbit.log""}" -
"-sasl" "errlog_type" "error" -
"-sasl" "sasl_error_logger" "{file,""''RABBITMQ_SASL_LOGS'/sasl.log""}" -
"-mnesia" "dir" """''RABBITMQ_MNESIA_DIR'"""
$
$ exit

Slide callouts, in the order of the script sections they annotate:

• Assorted symbol definitions (basically the same as environment variables)

• You really don't want to know what this is for

• Normally you run something on OpenVMS by typing “run” followed by the name of the program. To make things work more like UNIX, we need to define a foreign command.

• And this is our command line to start the RabbitMQ broker on OpenVMS. Not all of the double quotes are strictly necessary, but we do need to be careful to preserve case.

Example – spinning up RabbitMQ on VMS

$ show system
OpenVMS V8.3-1H1 on node CCIN02 29-FEB-2012 13:20:01.44 Uptime 214 21:54:49
Pid      Process Name    State Pri       I/O  CPU           Page flts  Pages
00000401 SWAPPER         HIB    16         0  0 00:09:21.55         0      4
00000404 USB$UCM_SERVER  HIB     6       331  0 00:00:00.09       253    362
00000405 LANACP          HIB    14        86  0 00:00:00.00       168    214
00000407 FASTPATH_SERVER HIB    10         8  0 00:00:00.00       108    134
00000408 IPCACP          HIB    10         8  0 00:00:00.00        78    109
00000409 ERRFMT          HIB     8   1132537  0 00:00:39.30       167    203
0000040B OPCOM           HIB     8     26408  0 00:00:01.69      5305     93
0000040C AUDIT_SERVER    HIB    10       403  0 00:00:00.07       170    215
0000040D JOB_CONTROL     HIB     9   4309528  0 00:03:08.45       124    189
00000411 QUEUE_MANAGER   HIB     9     17591  0 00:00:02.72       201    275
00000412 SECURITY_SERVER HIB    10     74163  0 00:00:00.31       488    643
00000413 ACME_SERVER     HIB    10        76  0 00:00:01.56       411    550 M
00000415 DNS$ADVER       LEF     5   3824980  0 00:02:48.65       842    954
00000416 LES$ACP_V30     HIB     8       132  0 00:00:00.01       123    146
00000417 NET$ACP         HIB     6       978  0 00:00:00.08       235    275
00000418 REMACP          HIB    10        40  0 00:00:00.00        77     79
00000419 NET$EVD         HIB     6        44  0 00:00:03.02       262    517
0000041A TP_SERVER       HIB    10        13  0 00:00:00.00       151    182
00000421 TCPIP$INETACP   HIB    10     29160  0 00:05:04.19       460    439
00000422 TCPIP$FTP_1     LEF    10     32560  0 00:00:01.00      1045    743 N
00000423 SYMBIONT_1      HIB     6       104  0 00:00:00.10       763    138
00000424 SYMBIONT_2      HIB     6     43938  0 00:00:00.17      1025    209
00000428 RDMS_MONITOR72  LEF    15     32147  0 00:00:03.13     16704    212
0000042A ACMS_SWL        HIB     9       184  0 00:00:00.03       123    164
0000042B WSI$MANAGER     HIB     8  22602508  0 00:35:28.14     20751   7567 M
00000434 SMHANDLER       HIB     8        52  0 00:00:00.00       247    246
00018086 EPMD$SERVER     LEF     6   1262813  0 00:00:00.33      1125    375
001BE53C MEMCACHE$SERVER LEF     5   6735644  0 00:00:18.55      8496   6704
001C314F CAMERON         CUR 0   4      1861  0 00:00:00.23      1638    310
000D8175 RabbitMQ        HIB     6      2148  0 00:00:00.13      1039    381
000D2177 CAMERON_41642   HIB     4  98136883  0 00:23:58.72      3816   1649 MS
000CB17A CAMERON_53419   HIB     6       850  0 00:00:00.03       300    287 S
000D197B CAMERON_49049   LEF     6       515  0 00:00:00.02       440    421 S

The epmd processes had been started previously

RabbitMQ-related processes. The one with the big numbers is beam.exe.

Over to Chris...

Erlang and Big Data – some examples

The Disco project

• See http://discoproject.org/, http://discodb.readthedocs.org/en/latest/

• A lightweight, open-source framework for distributed computing based on the MapReduce paradigm

• Python, C (DiscoDB), Erlang (the Disco core is written in Erlang)

• Created by Nokia Research Center in 2008 to solve real challenges in handling massive amounts of data

o Now used by Nokia and many other companies for a variety of purposes

• Log analysis, probabilistic modeling, data mining, and full-text indexing

• See also https://erlangcentral.org/dancing-with-big-data-disco-inferno/

Erlang and Big Data – some examples

LDB in-memory database

• As discussed by Chris

• See

o https://www.bugsense.com/

o http://blog.bugsense.com/post/47189059216/erlang-powered-big-data-for-mobile-ldb-presentation

o http://highscalability.com/blog/2012/11/26/bigdata-using-erlang-c-and-lisp-to-fight-the-tsunami-of-mobi.html

• Error-reporting and quality metrics service that tracks thousands of apps every day

• When mobile apps crash, BugSense helps developers pinpoint and fix the problem

• LDB powers BugSense and analyses data coming from more than 200 million mobile devices, in real time

• Erlang, C, Lisp...

Erlang and Big Data – some examples

• Riak (of course)

o http://www.basho.com/

o Applicable to many big data problems

o Recent nice win: http://www.theregister.co.uk/2013/10/10/nhs_drops_oracle_for_riak/

• Solution also uses RabbitMQ

• Game Analytics (http://www.gameanalytics.com/)

o Developing a next-generation analytics platform for the gaming industry

o Real-time stream processing analytical engine implemented in Erlang

• VoltDB

o http://voltdb.com/

o http://voltdb.com/877000-tps-with-erlang-and-voltdb/

o Erlang VoltDB driver developed by Eonblast Corporation

• A games company

• See http://www.eonblast.com/

o Perhaps not so much Big Data; more OLTP, but interesting nonetheless

• Should mention IoT…

o A $19 trillion opportunity according to Cisco CEO

o Lots of opportunity for Erlang in this space

And a few others...

Summary

• Big Data is really about extracting full value from data

o The big ideas that data can provide or the big insights that data can bring that we weren't getting before

o Big Data is about using only your valuable data; not just managing crushing amounts of data

• Volume doesn't really provide value by itself

• Not every business requires the capability to stream petabytes of data in real-time

• Erlang is a powerful tool in the Big Data space and usage is increasing

o Distributed processing

o Concurrency

o ...

• Porting Erlang to exotic platforms when you don't know anything about Erlang makes for a steep learning curve!

• “It doesn’t matter how bad your day has been, good things can happen with the addition of a little alcohol and the company of a good friend”

Woof (thank you)!