Big Data Innovation, Issue 4


Upload: george-hill

Post on 09-Mar-2016


DESCRIPTION

Issue 4 of Big Data Innovation Magazine

TRANSCRIPT

Page 1: Big Data Innovation, Issue 4

BUILDING A DATA TEAM
David Barton talks to Pamela Peele, Chief Analytics Officer at UPMC

A DIFFERENT PERSPECTIVE
Sean Patrick Murphy gives us his unique views on Big Data

FINDING THE KEY DATA
Andrew Claster, from Obama for America, discusses the use of Big Data in the re-election campaign


Page 3: Big Data Innovation, Issue 4

Letter From The Editor

Welcome to this issue of Big Data Innovation. Whether you are reading the printed version or online, we hope you enjoy this edition as much as we have enjoyed putting it together.

With growing appreciation of the benefits of Big Data, its popularity is only going to increase, and we hope the skills gap that organizations suffer from will shrink and eventually close completely.

In this issue we have worked with the sharpest minds currently working in Big Data. We wanted to know their thoughts on the industry and how it will look in the future.

We speak to Sean Murphy, Pamela Peele, Gregory Piatetsky-Shapiro and Kirk Borne. Each has their own experiences to draw on and gives a unique and interesting perspective on how we can get round the gap currently affecting the industry.

In addition to this, we also talk to Andrew Claster about his experiences using big data in the Obama re-election campaign.

I hope you enjoy this issue. If you want to contribute to future issues, or feel that you have a unique perspective or response to anything you read in here, please email me.

George Hill
Managing Editor

[email protected]

Managing Editor: George Hill
President: Josie King
Art Directors: Gavin Bailey & Joanna Violaris
Assistant Editor: Chloe Thompson
Advertising: Hannah Sturgess
Contributors: Dan Miller, Chris Towers, David Barton, Heather James

All [email protected]

Page 4: Big Data Innovation, Issue 4

Contents

6 David Barton looks at how Pamela Peele has built a team around her big data needs

10 Chris Towers talks to Gregory Piatetsky-Shapiro about his take on current big data education and how it could be improved

16 Heather James talks data with Kirk Borne, one of the world’s leading big data professors, discussing current issues at college level and before

22 Sean Patrick Murphy, Senior Scientist at Johns Hopkins University, talks to George Hill about his unique perspectives on data and approaches to it in education

27 Daniel Miller talks to Andrew Claster, Deputy Chief Analytics Officer at Obama for America, about his use of analytics in the Obama re-election campaign


Page 6: Big Data Innovation, Issue 4

Building Teams in Big Data: An interview with Pamela Peele, Chief Analytics Officer at UPMC

David Barton

When we are looking at the big data skills gap and education, one of the most important aspects to look at is how it is affecting individual industries.

Pamela Peele is the Chief Analytics Officer at UPMC, and I have had discussions with her in the past about how she has managed to build effective big data teams and what is needed to create an effective partnership.

Page 7: Big Data Innovation, Issue 4

Pamela believes the most important aspect of her big data team is having an analytics leader who can create the data strategy and its implementation within the scope of the company, whilst also taking a leading role in the hands-on analysis.

Many companies tend to create teams revolving around technically minded people, which can often create business problems in the future. Having an analytics leader with not only leadership skills but also business and technical awareness allows the team to be effectively steered. Having a team made up only of technically minded people can mean that business problems are often looked at in the wrong ways, for instance through data-driven decisions that would not work in a particular business sector.

The investment in an analytics leader alone is not enough, however. The company must have the trust and the bravery to make significant investments in the team around the leader; otherwise the skills may be there, but the manpower would not be enough for the task.

Pamela is also interested in the big data skills gap, a current trend that has caused considerable issues in the healthcare industry.

Whilst other industries such as finance, insurance and retail are also feeling the pinch in terms of the numbers of qualified and experienced data scientists, healthcare has been hit even harder.

Page 8: Big Data Innovation, Issue 4

The reason for this, according to Pamela, is "in healthcare whilst it is somewhat transactional delivering services, the service isn't exactly the same because the consumption and action of service varies by patient so it’s much harder to deal with health data than transaction pieces which are claims or transaction data."

Of course, the only real way around this is through the ways in which we are educating graduates. Pamela believes that at PhD level, the graduates that come through the system are good; however, at bachelor degree level, there could be some improvement.

However, this may be a changing trend, as in the US especially we are seeing universities making investments in their big data, analytics and statistical courses. This will hopefully see an improvement in the quality of their statistical bachelor degree graduates.

One of the ways in which healthcare companies have tried in the past to make up for the dearth of healthcare-centric analytical talent is by attempting to adapt either technical thinkers to healthcare or healthcare thinkers to become more technical. The issue that this creates is a bias towards one side of a role that should be balanced.

However, a strength and unique aspect of Pamela's thinking is that she manages to utilise a plethora of roles within her big data team.

Pamela uses the example of a factory in order to explain why this is the case. "The way to think about this is that you are making knowledge...When you make knowledge it is no different from making widgets, it is a production. You would never staff a factory with everybody who is a die cutter or a machinist. You need to have a whole different diverse set of skills to run the factory, in the same way that you need diverse skills to run an analytics shop."

The data team at UPMC now includes biostatisticians, physicians, lawyers, policy makers and even journalists.

Each has an important role in creating, presenting and acting upon the data that is created and therefore making a successful team. This kind of integration of different skills is a novel and useful way of addressing the big data skills gap and can often overcome some of the limitations that you find in analytics graduates.


Page 10: Big Data Innovation, Issue 4

Gregory Piatetsky-Shapiro Talks Big Data Education

Chris Towers

One of the aspects of big data that many in the industry are currently concerned by is the perceived skills gap. The lack of qualified and experienced data scientists has meant that many companies find themselves adrift of where they want to be in the data world.

Page 11: Big Data Innovation, Issue 4

I thought I would talk to one of the most knowledgeable and influential big data leaders in the world, Gregory Piatetsky-Shapiro. After running the first ever Knowledge Discovery in Databases (KDD) workshop in 1989, he has stayed at the sharp end of analytics and big data for the past 25 years. His website and consultancy, KDnuggets, is one of the most widely read data information sources and he has worked with some of the largest companies in the world.

The first thing that I wanted to discuss with Gregory was his perception of the big data skills gap. Many have claimed that this could just be a flash in the pan and something that has been manipulated, rather than something that actually exists.

Gregory references the McKinsey report of May 2011, which states: “There will be a shortage of talent necessary for organizations to take advantage of big data. By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions.”

The report predicts that this kind of skills gap will exist by 2018, but Gregory believes that we are already seeing it. Whilst using Indeed.com to look at what expertise companies are looking for, Gregory found that both MongoDB and Hadoop appear in the top 10 job trends.

“Big Data is actually rising faster than any of them. This indicates that demand for Big Data skills exceeds the supply. My experience with KDnuggets jobs board confirms it - many companies are finding it hard to get enough candidates.”

There are people responding to this, however, with many universities and colleges recognising not only the shortages, but also the desire from people to learn.

Companies looking to expand their data teams are also looking at both internal and external training. For instance, companies such as EMC and IBM are training their data scientists internally. Not only does this mean that they know that they are getting a high quality of training, but also that the data scientists they are employing are being educated in ‘their ways’.

With companies finding it hard to employ qualified candidates, training programs like this mean that companies can look for great candidates and make sure they are sufficiently qualified afterwards.

Page 12: Big Data Innovation, Issue 4

The IBMs and EMCs of this world are few and far between. The money that needs to be invested in in-depth internal training is considerable, and so many companies would struggle with this proposition.

So what about those other companies? How can they avoid falling through the big data skills gap?

Gregory thinks that most companies have three options.

Do you need BIG data?

Most companies confuse big data with basic data analysis. At the moment, with the buzz around big data, many companies are over-investing in technology that realistically isn’t required.

A company with 10,000 customers, for instance, does not necessarily need a big data solution with multiple Hadoop clusters. Gregory makes the point that on his standard laptop he would be able to process data for a large software company with 1 million customers.

Companies need to ask if they really need the depth of data skills that they think they do.

What if you do need it?

For large companies who may need to manage larger data sets, the reality is that it is not necessary to employ a big data expert straight from university. Gregory makes the point that somebody who is trained in MongoDB can become trained as a data scientist relatively easily.

If an internal training programme is not a realistic target in this instance, then external training may become the best option. There are several companies who can offer this, such as Cloudera and many others, who can train data scientists to a relatively high standard.

Gregory also mentions that one way in which several companies are learning about big data and analytics is through attending conferences. There are now hundreds of conferences a year on Big Data and related topics, from leaders in the field such as Innovation Enterprise to other smaller conferences all around the world.

What if these are untenable?

Some Big Data and analytics work can be outsourced or given to consultants. This not only allows companies to free up their existing data team for specific tasks, but also means that they do not have to risk taking on a full-time employee who may not be sufficiently qualified.

Here, the leading companies include IBM, Deloitte, Accenture and also pure-play analytics outsourcing providers like Opera Solutions and Mu Sigma.

Page 13: Big Data Innovation, Issue 4

I have discussed the big data skills gap with several people who have worked in big data for years, and one of their main concerns is the fanfare affecting the long-term viability of the business function.

Gregory does not have this concern, but does make it clear that we need to make sure that the buzzword ‘big data’ is separated from the technological trend. He has written in Harvard Business Review about this belief that the ‘sexy’ big data is being overhyped. The majority of companies who have implemented big data have done so in order to predict human behaviour, but this is not something that can be done consistently with big data.

Therefore, Gregory believes that any disillusionment with big data will not come from an inability to find the right talent, but from the reality not living up to the build-up.

On the other hand, Gregory is quick to point out that the amount of data that we are producing will continue its rapid growth for the foreseeable future. This data will still need people to manage and analyze it, and so we are going to continue to see growth even if the initial hype dies down.

We are also seeing an increasing interest in countries outside the US, the current market leader. This global interest is likely to increase the big data talent pool and therefore allow for expansion.

Having used Google Trends, Gregory points out that the top 10 searches for ‘Big Data’ are:

Page 14: Big Data Innovation, Issue 4

Given the interest from elsewhere, we are going to see an increasingly globalized talent pool and potentially the migration of the big data hub from the US to Asia.

Gregory also points out that, given that the top five do not have English as a primary language (the trend analysis was purely for English-language searches), the likelihood is that this does not represent every search for big data in those countries.

This interest in the subject certainly shows that the appetite for big data education exists globally, and those working in the big data educational sphere are utilising technology to increase effectiveness.

Gregory points out that many companies are using analytics within their online education to make the experience more productive for both the students and teachers. Through the use of this technology, big data education is becoming more productive and also more accessible to a truly international audience.

One aspect of big data that is clear is that in order to succeed you need curiosity and passion. The other aspects of the role will always involve training, and the kind of options and platforms available for this will mean that in the coming years we will see this gap closing.

Gregory is a fine example of somebody who has managed not only to innovate within the industry for the past 25 years, but was also one of the first to try to share the practice across many people. If we can find even one person from those working in data with the same passion and curiosity as him, then the quality and breadth of education can continue to grow at the same speed as this exciting industry.


Page 16: Big Data Innovation, Issue 4

Big Data Superstars: An interview with Kirk Borne

Heather James

Big data has been described as the sexiest job in the world. Those working on it every day may disagree, but we have seen data scientists become superstars in the business world. The use of predictive analytics and effective analysis has seen an upturn in the successes of thousands of companies and spawned new and even more effective ways of communicating with customers.

Page 17: Big Data Innovation, Issue 4

When we think of big data today, for most people it is something that has been around for a few years and that has been hyped almost beyond perception by the media. The use of big data to do everything from predicting the pregnancy of a woman before she had even told her parents to winning the most powerful job in the world has dominated news stands and thrust data science into the forefront of business thinking.

One man, however, has done everything from briefing the White House on data mining to using systems for discoveries at NASA. He has spoken at TED and been voted the most influential big data influencer on Twitter and number 11 in the Big Data top 100.

If we talk about data scientists being superstars, Kirk Borne is the quarterback.

I wanted to speak to Kirk about his experiences in data, how he has seen the industry change and where he sees it going in the future.

Kirk is the Professor of Astrophysics and Computational Science at George Mason University as well as being on the board for multiple data organizations, so he is the perfect person with whom to discuss data education.

The first thing I wanted to know from Kirk was his thoughts on the current idea that there is a skills gap in the big data industry.

The answer is an unequivocal yes.

Kirk is now a man in demand, for his opinion, his connections and the people that he teaches. He says that “I now get two or three calls every day from companies trying to find data scientists”. This is at odds with what he was finding at the beginning of his career.

Kirk has described this as a flip in the equation from when he first graduated, when there were 100 graduates for every data job going.

Page 18: Big Data Innovation, Issue 4

Given that at this time this role was predominantly utilized by science and government based agencies, the number of positions available was considerably lower.

Today Kirk is finding that for every 100 jobs, there is one applicant.

The reason for this shift is that our society is becoming increasingly social and digitally focussed, with incredible amounts of data being created every day. One of the aspects of big data that Kirk also finds interesting is the way in which it is perceived due to this.

Many have said to him that ‘big data has always existed’, but Kirk believes that this is a misleading statement. It is almost incomprehensible how much information an individual makes now due to things like smartphones and social media, and dealing with thousands of people who are all creating this much data is not something that we have ever dealt with before.

In Kirk’s TED talk in April 2013, he discusses how up until 2002 the amount of data that we had creat-ed was 5 billion gigabytes. In 2003 alone this was created again. By 2011 this much data was made in 2 days. Today this is made in 10 minutes.
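The figures above can be turned into rough average creation rates with a little arithmetic. The following is a minimal sketch, not a calculation taken from the talk: it assumes a 365-day year for 2003, labels "today" as 2013 (the year of the talk), and uses hypothetical names such as `BASELINE_GB` and `speedup` purely for illustration.

```python
# Figure quoted in the article: all data created up to 2002.
BASELINE_GB = 5e9  # 5 billion gigabytes

# Time taken to create one more "baseline" of data, in days.
# Assumptions: 2003 is treated as a 365-day year; "today" is labelled 2013.
days_per_baseline = {
    "2003": 365.0,
    "2011": 2.0,
    "2013": 10.0 / (60 * 24),  # 10 minutes expressed in days
}


def gb_per_day(days: float) -> float:
    """Average creation rate implied by producing BASELINE_GB in `days` days."""
    return BASELINE_GB / days


rates = {period: gb_per_day(days) for period, days in days_per_baseline.items()}


def speedup(earlier: str, later: str) -> float:
    """How many times faster data was created in `later` than in `earlier`."""
    return rates[later] / rates[earlier]


if __name__ == "__main__":
    for period, rate in rates.items():
        print(f"{period}: about {rate:,.0f} GB per day")
    print(f"2003 -> 2011: about {speedup('2003', '2011'):.1f}x faster")
    print(f"2011 -> 2013: about {speedup('2011', '2013'):.0f}x faster")
```

On these assumptions the implied rate rises from roughly 14 million GB per day in 2003 to 2.5 billion GB per day in 2011, which is the kind of acceleration the interview goes on to discuss.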

This kind of growth in data, not only in the amounts created but in the speed at which it is created, means that although we have always had data, we now need not only to deal with what we have but also to adapt to ever-increasing amounts. Education in dealing with this kind of data therefore needs to be good.

Kirk’s view on this is that there are two perspectives that need to be looked at in order to effectively assess current big data education initiatives.

‘The phrase that I use with people is that it’s an education in data as well as data in education.’

Page 19: Big Data Innovation, Issue 4

Kirk believes that data should be included in education from a young age, as regardless of your future profession, it will be used in one way or another. For instance, it can even be done at kindergarten level: the ways in which toys are sorted by colour, type, size or shape are all forms of data siloing. Using this kind of technique early, where children can identify and explain why certain things are in certain areas, forms a strong foundation on which to add more complex ideas.

Education in Data: This initial education throughout earlier school opportunities will also allow the education in data aspect to be more thorough and successful. What many lecturers currently find is that people come into higher data education with a gap in understanding, with some teachers actually saying that students don’t know what ‘data’ is.

The need to teach people these aspects of data throughout their lives will be vital to improving education and closing the skills gap.

Many, when looking to data for business solutions, want to find an all-encompassing data scientist. Kirk believes that this is not always necessary.

A business team is like any other team: you have different people in it to do different jobs. Kirk believes that companies who are struggling to find the complete-package data scientist can get around this by looking at this concept. Sure, there are ‘all star data scientists’ around, the ones who know about the algorithms and know about the business, sales, strategy and finance and can run almost as a department in themselves, but they are like “all stars” everywhere else: rare.

Page 20: Big Data Innovation, Issue 4

The way in which companies are looking for data scientists at the moment could be transformed to make it more of a team effort, as opposed to just looking for an individual who can do it all. This collaborative approach (as discussed with Pamela Peele on page 6) can reap rewards and should be approached like a factory, where you have many different specialists working in their chosen areas. Why should big data be any different?

This approach, allowing organizations to utilize the skills needed in data (be this through one all star or a collaborative effort through a team), will drive the industry forward. A forward thrust and pragmatism is what is going to be needed in the coming years.

This is because Kirk predicts that, given the ways in which big data has grown, it is hard to see this growth stopping.

The growth so far has been exponential, growing year on year not only in scale but in speed. Kirk points out that an exponential growth curve is not only about exponential growth but also about exponential growth in the rate of growth. This means that we are going to be seeing considerably more data produced, considerably quicker.
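Kirk's remark about the rate of growth can be written out as a short worked equation. This is only a sketch under an assumed continuous model with a constant growth rate; the symbols D(t), D_0 and k are illustrative and do not come from the interview.

```latex
% Assumed model: D(t) is the total volume of data at time t,
% D_0 is the volume at t = 0, and k > 0 is a constant growth rate.
\begin{align}
D(t) &= D_0 e^{kt} \\
\frac{dD}{dt} &= k D_0 e^{kt} = k\,D(t) \\
\frac{d^2 D}{dt^2} &= k^2 D_0 e^{kt} = k\,\frac{dD}{dt}
\end{align}
```

Under this assumption the rate of growth is itself proportional to the amount of data, so it grows exponentially too: more data is produced, and it is produced ever more quickly.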

The only way that this could stop would be if companies stopped putting sensors in devices or people stopped using social media. As both of these seem unlikely, the amount of data created, and therefore the need for it to be managed, will continue to increase.

One thing that Kirk is sure of is that the hype that we are currently seeing around big data will not destroy the potential that it has.

He equated the hype around big data to the Titanic.

When the ship set off it was the largest, fastest, most luxurious ship in the world. When it was sinking it was still all of these things, but people weren’t concentrating on that; they were concentrating on swimming.

Big data is like this now. There is still all the hype about what it can do and the reasons for doing it, but ultimately, with the amount of data constantly increasing, we will need to start swimming to make the most of it.

One of the things that I was struck by with Kirk was his genuine excitement at what big data has become, and at the ways in which it can be used to make breakthrough discoveries and help organizations everywhere make the best decisions. His success has been down to decades of dedication to data and its uses. Through this he has achieved unprecedented success, and although he rightly says that there are few big data superstars, Kirk is undoubtedly one of them.


Page 22: Big Data Innovation, Issue 4

Looking At Data From A Different Perspective: An Interview With Sean Patrick Murphy

George Hill

his assimilation that ‘A data scientist is a data analyst from San Francisco’.

This aptly demonstrates the way that Sean looks at the uses of data across the board. It is something that is important, but the hype that surrounds the subject has warped its true use.

Page 23: Big Data Innovation, Issue 4

One of the clearest indications of Sean’s thinking is his following description of big data at the moment:

“While many have tried, the term ‘big data’ lacks a true consensus definition. At the moment the most popular definitions seem to coalesce around the idea that big data is one or more data sets so large and complex that they are challenging to process using traditional databases and tools. Often associated with this concept are the characteristic ‘three V’s’ of big data: the volume (amount of data), velocity (speed of data in and out) and variety (range of data types and sources). Some enterprising companies and consultants throw in a 4th ‘V’ for veracity or some other ‘V’ word.

Regardless, these definitions miss a key aspect of the term. To put it into hyperbolic language, ‘Big Data’ isn’t about the size of data at all. Instead, it is the simple yet seemingly revolutionary belief that data is valuable.

While ‘big data’ does often happen to be large in size (although this is always relative to the available tool set), I believe that ‘big’ actually means important (think big deal). Scientists have long known that data could create new knowledge but now the rest of the world, including government and management in particular, has realized that data can create value, principally financial but also environmental and social value. And, if data is valuable, more data is more valuable and who doesn’t want ‘big’ (i.e. large) value.”

Taking this view, and treating ‘big’ as more of an abstract term about the importance of data as opposed to its size, simplifies the idea whilst also making it clear that this data revolution is about a new perception as opposed to a new size.

Sean is also a believer that there is nothing that big data cannot improve. He sees data not as something that will be of use in itself, but as something that can improve and focus other areas. In order for anything to be improved there needs to be some kind of measurement, and this measurement needs to be collected and analysed.

This is the whole idea of what data is, and therefore as long as there are elements to anything that can be improved, there will be no limit to what data can achieve within the improvement process.

Page 24: Big Data Innovation, Issue 4

Sean also has a unique way of looking at one of the key areas that those working in big data are currently concentrating on: the big data skills gap.

Although he admits that there may well be a gap in the number of people who can actually collect and analyse the data, what is really missing is the knowledge needed throughout the rest of the company. Without management knowledge and willingness to act on the results of the collected data, it does not matter whether or not there are enough people to analyze the data, as it will make no difference anyway.

He believes that moving away from just using the HiPPO (Highest Paid Person’s Opinion) is the only way that we can make data really drive the future of companies, organizations and, even more importantly, governments.

So where does this leave the industry? I was curious about whether Sean saw this change occurring through the increased use of technology or a heavier involvement from people. Sean’s answer was, “People provide the creativity, the drive to explore, and the flashes of insight while the technology enables them to execute”. This shows the balanced approach that needs to be taken within the industry in order to drive it forwards.

I was once told that big data is like cooking a good meal. You can have a great stove, pans and knives, but without the correct chef to put it all together they are pointless. If you have the technology, but not the analytics skills to utilize it properly, then the technology is useless.

Sean has a unique opportunity within big data at the moment. He has worked on several different data projects across business, government and several other spaces. This breadth of usage gives him one of the most innovative and interesting perspectives on big data that I have come across.

Page 25: Big Data Innovation, Issue 4

His work at Johns Hopkins also means that he has the opportunity to use some of the latest technologies years before they are available to businesses. This is perhaps one of the main reasons why he has such high hopes for the future whilst making a concerted effort not to overhype the industry in its current state.

One thing is for sure: with people like Sean looking to bring through breakthrough data techniques, the chances of us seeing a more data-driven society are greatly increased.


Page 27: Big Data Innovation, Issue 4

Data In An Election: An Interview With Andrew Claster, Deputy Chief Analytics Officer, Obama for America

Daniel Miller

We were lucky enough to talk to Andrew Claster, Deputy Chief Analytics Officer for President Barack Obama’s 2012 re-election campaign, ahead of his presentation at the Big Data Innovation Summit in Boston, September 12 & 13 2013.

Andrew Claster, Deputy Chief Analytics Officer for President Barack Obama’s 2012 re-election campaign, helped create and lead the largest, most innovative and most successful political analytics operation ever developed.

Page 28: Big Data Innovation, Issue 4

Andrew previously developed microtargeting and communications strategies as Vice President at Penn, Schoen & Berland for clients including Hillary Rodham Clinton, Tony Blair, Gordon Brown, Ehud Barak, Leonel Fernandez, Verizon, Alcatel, Microsoft, BP, KPMG, TXU and the Washington Nationals baseball team. Andrew completed his undergraduate studies in political science at Yale University and his graduate training in economics at the London School of Economics.

What was the biggest chal-lenge for the data team dur-ing the Obama re-election campaign?

It is difficult to identify just one. Here are some of the most important:

- Data Integration: We have several major database platforms – the national vot-er file, our proprietary email list, campaign donation his-tory, volunteer list, field con-tact history, etc. How do we integrate these and use

a unified dataset to inform campaign decisions?

- Online/Offline: How do we encourage online activ-ists to take action online and vice-versa? How do we fa-cilitate and measure this ac-tivity?

- Models: How do we de-velop and validate our mod-els about what the electorate is going to look like in Novem-ber 2012?

- Communications: Our opponents and the press are continually discussing areas in which they say we are falling short. When is it in our interest to push back, when is it in our interest to let them believe their own spin, and what information are we willing to share if we do push back?

- Cost: How do we evaluate everything we do in terms of cost per vote, cost per volunteer hour or cost per staff hour?

- Prioritization: We don’t have enough resources to test everything, model everything and do everything. How do we efficiently allocate human and financial resources?

- Internal Communication, Sales and Marketing: How do we support every department within the campaign (communications, field, digital, finance, scheduling, advertising)? How do we demonstrate value? How do we build relationships? How do we ensure that data and analytics are used to inform decision-making across the campaign?

- Hiring and Training: Where and how do we recruit more than 50 highly qualified analysts, statistical modelers and engineers who are committed to re-electing Barack Obama and willing to move to Chicago for a job that ends on Election Day 2012, requires that they work more than 80 hours a week for months with no vacation in a crowded room with no windows (nicknamed ‘The Cave’), and pays less with fewer benefits than they would earn doing a similar job in the private sector?
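As an editorial illustration of the data integration and cost questions in the list above, here is a minimal Python sketch. It is not drawn from the campaign's actual systems: the `voter_id` key, the field names, the sample records and the functions `integrate` and `cost_per_unit` are all hypothetical, and it assumes every list already shares a clean common identifier, which real campaign data rarely does.

```python
# Hypothetical records; every field name and value here is invented for illustration.
voter_file = [
    {"voter_id": 1, "state": "OH"},
    {"voter_id": 2, "state": "OH"},
    {"voter_id": 3, "state": "FL"},
]
donations = [
    {"voter_id": 1, "amount": 25.0},
    {"voter_id": 1, "amount": 50.0},
    {"voter_id": 3, "amount": 10.0},
]
volunteer_shifts = [
    {"voter_id": 2, "hours": 4.0},
    {"voter_id": 3, "hours": 2.5},
]


def integrate(voter_file, donations, volunteer_shifts):
    """Merge separate lists into one record per person, keyed on voter_id."""
    unified = {
        row["voter_id"]: {**row, "total_donated": 0.0, "volunteer_hours": 0.0}
        for row in voter_file
    }
    for donation in donations:
        # Records that do not match the voter file are skipped in this sketch.
        if donation["voter_id"] in unified:
            unified[donation["voter_id"]]["total_donated"] += donation["amount"]
    for shift in volunteer_shifts:
        if shift["voter_id"] in unified:
            unified[shift["voter_id"]]["volunteer_hours"] += shift["hours"]
    return unified


def cost_per_unit(total_cost, units):
    """Cost per vote, volunteer hour or staff hour; None when no units were produced."""
    return total_cost / units if units else None


unified = integrate(voter_file, donations, volunteer_shifts)
total_hours = sum(person["volunteer_hours"] for person in unified.values())
# Assumed spend of 130.0 on a recruitment program, purely for illustration.
print(cost_per_unit(130.0, total_hours))
```

In practice the hard part is the matching step that this sketch assumes away: email lists, donor records and voter files usually have no shared key, so records must be linked on names, addresses and other imperfect fields before any unified view or cost metric can be trusted.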

Many working within political statistics and analytics say that the incumbent candidate always has a significant advantage with their data effectiveness. Do you think this is the case?

The incumbent has many advantages including the following:

- Incumbent has data, infrastructure and experience from the previous campaign.

- Incumbent is known in advance – no primary – and can start planning and implementing general election strategy earlier.

- Incumbent is known to voters – there is less uncertainty regarding underlying data and models.

However, the incumbent may also have certain disadvantages:

- Strategy is more likely to be known to the other side because it is likely to be similar to the previous campaign.

- With a similar strategy and many of the same strategists and vested interests as the previous campaign, it could be harder to innovate.

On balance, the incumbent has an opportunity to put herself or himself in a superior position regarding data, analytics and technology. However, it is not necessarily the case that s/he will do so – the incumbent must have the will and the ability to develop and invest in this potential advantage.

When there is no incumbent and there is a competitive primary, it is the role of the national party and other affiliated groups to invest in and develop this data, analytics and technology infrastructure.

How much effect do you think data had on the election result?

The most important determinants of the election result were:

- Having a candidate with a record of accomplishment and policy positions that are consistent with the preferences of the majority of the electorate.

- Building a national organization of supporters, volunteers and donors to register likely supporters to vote, persuade likely voters to support our candidate, turn out likely supporters and protect the ballot to ensure their vote is counted.

Data, technology and analytics made us more effective and more efficient with every one of these steps. They helped us target the right people with the right message delivered in the right medium at the right time.

We conducted several tests to measure the impact of our work on the election result, but we will not be sharing those results publicly.

As an example, however, I can point out that there were times during the campaign when the press and our opponent claimed that states such as Michigan and Minnesota were highly competitive, that we were losing in Ohio, Iowa, Colorado, Virginia and Wisconsin, and that Florida was firmly in our opponent’s camp. We had internal data (and there was plenty of public data, for those who are able to analyze it properly) demonstrating that these statements were inaccurate. If we didn’t have accurate internal data, our campaign might have made multi-million dollar mistakes that could have cost us several key states and the election.


Given the reaction of the public to the NSA and PRISM data gathering techniques, what kind of effect is this likely to have on the wider data gathering activities of others working within the data science community?

Consumers are becoming more aware of what data is available and to whom. It is increasingly important for those of us in the data science community to help educate consumers about what information is available, when and how they can opt out of sharing their information and how their information is being used.

Do you think that, after the success of the data teams in the previous two elections, data is no longer an advantage but a necessity for a successful campaign?

Campaigns have always used data to make decisions, but new techniques and technology have made more data accessible and allowed it to be used in innovative ways.

Campaigns that do not invest in data, technology or analytics are missing a huge opportunity that can help them make more intelligent decisions. Furthermore, their supporters, volunteers and donors want to know that the campaign is using their contributions of time and money as efficiently and effectively as possible, and that the campaign is making smart strategic decisions using the latest techniques.

Filip Fuxa / Shutterstock.com
