The Accidental Data Scientist: A New Role for Librarians and Information Professionals
Amy Affelt
SLA Virtual Conference Worldwide!
16 October 2014

“We ARE Big. It’s the data that got bigger!”

Info Pros are ready for their Close-Up:

Big Data + Info Pros = BETTER DATA

What is Big Data?

• McKinsey:

– The amount of data collected is projected to grow by 40% per year

– 15 out of 17 industry sectors in the US have more data stored per company than the Library of Congress
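The McKinsey growth figure is easier to grasp when compounded. The short Python sketch below assumes, for illustration only, that the 40% rate holds steady every year. At that rate the volume roughly doubles about every two years and grows almost 29-fold in a decade.

```python
import math

# Assumption: the 40% projection is treated as a constant annual compound rate.
ANNUAL_GROWTH = 0.40

def growth_factor(years: int, rate: float = ANNUAL_GROWTH) -> float:
    """How many times larger the data volume is after `years` of compound growth."""
    return (1 + rate) ** years

def doubling_time(rate: float = ANNUAL_GROWTH) -> float:
    """Years needed for the volume to double at a constant compound rate."""
    return math.log(2) / math.log(1 + rate)

for years in (1, 5, 10):
    print(f"after {years:2d} years: x{growth_factor(years):.1f}")
print(f"doubling time: {doubling_time():.2f} years")
```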

How Is It Different?

• Server Log Files
• Mobile Sensors
• Social Media Content
• Digital Images
• Smartphone geospatial location data
• Internet of Things
• Personal Security Data
• Video
• Data that “used to be dropped on the floor”

Gartner’s “Five V” Characteristics of Big Data

• Volume
• Velocity
• Variety
• Verification
• Value

• Challenging
• Risky
• Expensive

– The last two are the Info Pro opportunities!

• The data is Big, but the new uses for it and the insights gained from it are even Bigger.

Cool Big Data Apps

• Healthcare
– Microsoft Readmissions Manager
– Stanford Drug Pairings
– MyAchoo

• Transportation
– Street Bump
– Xerox ExpressLanes
– Fixed

• Entertainment
– MyMagic+
– RUWT
– Qcue

“If we are seen as the people who are willing to get into the ‘cage’ to tame the tiger, with or without a whip and a chair, that helps us become ‘untouchable.’ I think that one thing we’ve done wrong in the past is try to make things look too easy or minimize the effort we put out and the knowledge and skills we possess so other people think they can do what we do. Working with Big Data provides us with a major opportunity to change some perceptions in that regard.”

– Dr. Bill Fisher

Big Data Busts

• Google Flu Trends

• Crimson Tide v. Auburn

• Target “targeted” coupons

• Lego

• Boston Marathon Manhunt

Bad Big Data Advice

• Sketchy Citation Algorithms
– What if the citing article states that the citation is junk?
• Re-Use of Data
– How do you ensure that the recycled data is clean?
• Global Data Sharing
– “Garbage In, Garbage Out”
– How do you prevent “Garbage In”?
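The citation question can be made concrete. The Python sketch below uses invented citation records with a hypothetical `stance` label. Many citation indexes record no such field. The sketch shows how a count that treats every citation as an endorsement can rank a disputed work first, while a stance-aware count ranks it last.

```python
from collections import Counter

# Hypothetical records: (citing_article, cited_article, stance).
# The "stance" label is an assumption made for illustration.
citations = [
    ("A", "X", "supports"),
    ("B", "X", "disputes"),  # e.g. the citing article calls X's data junk
    ("C", "X", "disputes"),
    ("D", "Y", "supports"),
    ("E", "Y", "supports"),
]

def naive_counts(records):
    """Count every citation as an endorsement, regardless of what it says."""
    return Counter(cited for _, cited, _ in records)

def stance_aware_counts(records):
    """Count only citations that do not dispute the cited work."""
    return Counter(cited for _, cited, stance in records if stance != "disputes")

print("naive top:", naive_counts(citations).most_common(1))
print("stance-aware top:", stance_aware_counts(citations).most_common(1))
```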

Raw Data Quality Checklist

• Where did you get this data?
• How was the data compiled?
• Do errors and duplicates need to be scrubbed from the data?
• Is the data incomplete or sporadic?
• Is the data in a usable format that is compatible with other data being used?
• Which formulas were used to analyze the data?
• Did you consider alternative data sources?
– Were the alternatives disregarded because they might have revealed complicated or surprising results?
– Suggest alternatives
• What biases are inherent in the interpretation?
– Was the data selected because it was likely to provide the expected answer?
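Two of the checklist questions, duplicates and incomplete records, lend themselves to a quick automated first pass. The Python sketch below runs those two checks on a few hypothetical records. The field names and `REQUIRED_FIELDS` are illustrative assumptions. The remaining questions (provenance, compilation method, bias) still call for human judgment.

```python
from collections import Counter

# Hypothetical raw records; field names are illustrative only.
records = [
    {"id": 1, "country": "US", "value": 10.5},
    {"id": 2, "country": "UK", "value": None},
    {"id": 1, "country": "US", "value": 10.5},  # exact duplicate of the first row
    {"id": 3, "country": "", "value": 7.25},
]

REQUIRED_FIELDS = ("id", "country", "value")

def find_duplicates(rows):
    """Return each distinct row that appears more than once (compared on all fields)."""
    counts = Counter(tuple(sorted(r.items())) for r in rows)
    return [dict(key) for key, n in counts.items() if n > 1]

def find_incomplete(rows, required=REQUIRED_FIELDS):
    """Return rows where a required field is missing, None, or an empty string."""
    return [r for r in rows if any(r.get(f) in (None, "") for f in required)]

def quality_report(rows):
    """Summarize the automated checks; a starting point, not a verdict on the data."""
    return {
        "rows": len(rows),
        "duplicates": len(find_duplicates(rows)),
        "incomplete": len(find_incomplete(rows)),
    }

print(quality_report(records))
```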

We’ll Take It From Here

• Search

• Discover

• Analyze

• Communicate Impacts

• Create Illustrative Deliverables

Six Big Data Tools Anyone Can Use

http://gigaom.com/2013/01/31/data-for-dummies-5-data-analysis-tools-anyone-can-use

• BigML
– Source, Dataset, Model, Prediction
– Spreadsheet of S&P, Fitch, and Moody’s ratings by country
• Google Fusion Tables
– Interactive map of occurrences
• Infogram
– Enter data, produce a chart (bar, pie, line, pictorial, etc.)
• Many Eyes
– Enter text, produce a graphical representation
• Statwing
– Upload data, check variables of concern, plot relationship
• Tableau Public
– Create comparison charts between two uploaded datasets
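BigML’s Source, Dataset, Model, Prediction sequence can be imitated in a few lines of plain Python to show what each stage does. This sketch does not use BigML’s API or its modeling method. It swaps in a simple majority-vote lookup. The country ratings in `SOURCE` are invented for illustration and are not real agency ratings.

```python
import csv
import io
from collections import Counter, defaultdict

# Source: hypothetical CSV text standing in for an uploaded spreadsheet.
# Countries and ratings are made up for illustration.
SOURCE = """country,sp,moodys
Country A,AAA,Aaa
Country B,AAA,Aaa
Country C,AA,Aa2
Country D,AA,Aa3
Country E,AA,Aa2
Country F,BBB,Baa2
"""

def build_dataset(source_text):
    """Dataset: parse the raw source into structured rows."""
    return list(csv.DictReader(io.StringIO(source_text)))

def train_model(rows):
    """Model: for each S&P rating, remember the most common Moody's rating seen."""
    by_sp = defaultdict(Counter)
    for row in rows:
        by_sp[row["sp"]][row["moodys"]] += 1
    return {sp: counts.most_common(1)[0][0] for sp, counts in by_sp.items()}

def predict(model, sp_rating):
    """Prediction: look up the learned mapping (None if the rating was never seen)."""
    return model.get(sp_rating)

model = train_model(build_dataset(SOURCE))
print(predict(model, "AA"))
```

A lookup like this can only echo patterns already in the source. It also returns nothing for unseen inputs. Both limitations are worth explaining when presenting any tool’s predictions.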

What’s In It For Me?

• Search for “Big Data”
• Vexing Issues
– Stimulus package, Sequestration Effects, Hurricane Sandy
• What is our mission?
– Set the context to build connections between data points
• Patterns v. Predictions
– Tamiflu v. Flu Shot
• Coincidence v. Causation
– Swedish Milk Study
– Curly Fries/IQ/Arby’s
• Embed into IT and Big Data teams to provide point-of-need research
• Curiosity = High Quality
• Data Science v. Data Intelligence
• Not Big Data but Better Data
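Coincidence v. Causation can be shown with numbers. In the Python sketch below, two hypothetical series both rise over time, and their levels are highly correlated. Their year-over-year changes, however, are essentially uncorrelated. Comparing changes instead of levels is one common check for a shared trend, but on its own it neither proves nor rules out causation. The figures are invented and are not data from the Swedish milk or curly fries studies.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def year_over_year(series):
    """Change from each period to the next."""
    return [b - a for a, b in zip(series, series[1:])]

# Hypothetical yearly figures for two unrelated quantities that both trend upward.
series_a = [10, 13, 14, 17, 18, 21]
series_b = [5, 5.5, 6.5, 7.5, 8, 8.75]

print(f"levels:  r = {pearson(series_a, series_b):.2f}")
print(f"changes: r = {pearson(year_over_year(series_a), year_over_year(series_b)):.2f}")
```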

Big Data Communication Framework

• Understand the business problem
• Embed with project team if possible
• Determine impact measurements
• Review raw data checklist
• Discover available data
• Decide which data is most valuable
• Where did the data come from?
• Which data can be merged?
• Formulate hypotheses
• Prove and disprove
– Could a change in conditions affect assumptions?
• Communicate the business impact of the results
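“Which data can be merged?” is worth checking explicitly. Records that fail to match tend to disappear silently from a joined dataset. The Python sketch below assumes two hypothetical datasets keyed on a shared identifier. It joins them and reports what was left out on each side.

```python
# Hypothetical datasets sharing a key (e.g., a country code); values are illustrative only.
population_millions = {"A": 120, "B": 45, "C": 80}
sovereign_ratings = {"A": "AA+", "B": "AAA", "D": "BBB"}

def merge_on_key(left, right):
    """Inner-join two dicts on their keys and report keys that failed to match."""
    shared = left.keys() & right.keys()
    matched = {k: (left[k], right[k]) for k in sorted(shared)}
    only_left = sorted(left.keys() - right.keys())
    only_right = sorted(right.keys() - left.keys())
    return matched, only_left, only_right

merged, no_rating, no_population = merge_on_key(population_millions, sovereign_ratings)
print("merged:", merged)
print("no rating for:", no_rating)
print("no population for:", no_population)
```

Reporting the unmatched keys alongside the merged result makes the gaps part of the story, not a hidden loss.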

Story Time!

Data Scientists: How To Get Hired

http://gigaom.com/2013/04/16/how-to-hire-data-scientists-and-get-hired-as-one

• Core Competencies
– Embrace Online Learning: Computer Science 101 from Udacity; Coursera Machine Learning from Andrew Ng
• Learn To Tell A Story
• Exercise Creativity and Curiosity/Healthy Skepticism
• Show Up and Be Ready to Learn!
– “We need someone who is comfortable using Python to scrape data from websites with heavy javascript/AJAX usage.”
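Pages that depend heavily on JavaScript/AJAX often load their data as JSON from a separate URL. That URL is usually visible in the network tab of the browser’s developer tools. One common approach is to request that endpoint directly rather than parse the rendered HTML. When that is not possible, a headless browser such as Selenium or Playwright is the usual fallback. The Python sketch below uses only the standard library. The endpoint URL and the `results` key are hypothetical, so the offline example runs against a sample payload.

```python
import json
import urllib.request

def fetch_json(url, timeout=10):
    """Request a JSON endpoint directly and decode the response body."""
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))

def extract_rows(payload, fields):
    """Keep only the requested fields from each item in payload["results"] (an assumed key)."""
    return [{f: item.get(f) for f in fields} for item in payload.get("results", [])]

# Offline example: a payload shaped like what a hypothetical endpoint might return.
sample = json.loads(
    '{"results": [{"name": "Item 1", "price": 9.5, "sku": "a1"},'
    ' {"name": "Item 2", "price": 12}]}'
)
print(extract_rows(sample, ["name", "price"]))

# Live use, with a hypothetical URL found in the browser's network tab:
# rows = extract_rows(fetch_json("https://example.com/api/items?page=1"), ["name", "price"])
```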

New Big Data Roles for Info Pros

• Data Policy Expert

• Data Release Expert

• Exit Survey on Data Expert

Bibliography

• Affelt, Amy. “Big Data: The Opportunity Formerly Known as Information Overload.” Presentation given at Internet Librarian Conference 2012. 22 October 2012. Monterey, California.
• Affelt, Amy. “Big Data: The Opportunity Formerly Known as Information Overload.” FreePint. 20 November 2012. At: http://web.freepint.com/go/sub/article/69515
• Affelt, Amy. “Acting on Big Data: A Data Scientist Role for Info Pros.” Online Searcher. 1 September 2014. pp. 11-14.
• Barnes, Brooks. “At Disney Parks, a Bracelet Meant to Build Loyalty (and Sales).” The New York Times. 7 January 2013. At: http://www.nytimes.com/2013/01/07/business/media/at-disney
• Barnwell, Bill. “Thank You For Not Coaching Week Thirteen.” Grantland.com. 3 December 2013. Accessed 13 January 2014. At: http://www.grantland.com/blog/the-triangle/post/_/id/84347/thank-you-for-not-coaching-week-13
• Barton, Dominic, and David Court. “Making Advanced Analytics Work for You.” Harvard Business Review. October 2012.
• Bilton, Nick. “Disruptions: Data Without Context Tells a Misleading Story.” The New York Times. 23 February 2013. Accessed 10 December 2013. At: http://bits.blogs.nytimes.com/2013/02/24/disruptions-google-flu-trends-shows-problems-of-big-data-without-context/
• Bollier, David. “The Promise and Peril of Big Data.” The Aspen Institute. Washington, D.C.: 2010.
• CMP TechWeb. “When Wireless Sensors Meet Big Data.” CMPT. 22 August 2012.
• Davenport, Tom. “Data is Worthless if You Don’t Communicate It.” HBR Blog Network. 18 June 2013. At: http://blogs.hbr.org/cs/2013/06/data_is_worthless_if_you_dont.html?utm_source
• Davenport, Tom. “The Right Questions to Ask Your Data Analysts.” Harvard Business Review Management Tip of the Day. 16 July 2013. At: http://hbr.org/tip/2013/07/16/the-right-questions-to-ask-your-data-analysts
• DesLauriers, Rick, and Stephanie Douglas. “Manhunt: Inside the Boston Marathon Investigation.” Interview with Scott Pelley. 60 Minutes.
• Dumbill, Edd. “Big Data in 2012: Five Predictions.” Forbes. 15 December 2011. At: http://www.forbes.com/sites/oreillymedia/2011/12/15/big-data-in-2012-five-predictions/
• Eisenberg, Anne. “Avalanches of Words, Sifted and Sorted.” The New York Times. 24 March 2012. At: http://www.nytimes.com/2012/03/25/business/words-by-the-millions-sorted-by-software/
• Gordon-Murnane, Laura. “Big Data: A Big Opportunity for Librarians.” Online Magazine. 1 September 2012. p. 30.
• Harbert, Tam. “Can Computers Predict Trial Outcomes?” Fulton County Daily Report. 9 July 2012. p. 5.
• Harford, Tim. “Big Data: Are We Making A Big Mistake?” The Financial Times. 28 March 2014. Accessed 31 March 2014. At: http://www.ft.com/intl/cms/s/2/21a6e7d8-b479-11e3-a09a-00144feabdc0.html#axzz2xXtmUnBz
• Harlow, Tim. “The Drive: New App Can Help Fix a Parking Ticket.” Star Tribune. 24 February 2014. Accessed 4 March 2014. At: http://www.startribune.com/local/246807041.html
• Harris, Derrick. “Hey Los Angeles, Xerox thinks it can clear traffic on I-10.” GigaOM. 20 July 2012. At: http://gigaom.com/cloud/hey-los-angeles-xerox
• Harris, Derrick. “Five Ideas to Help Everyone Make the Most of Big Data.” GigaOM. 17 September 2012. At: http://gigaom.com/data/5-ideas-to-help-everyone-make-the-most-of-big-data
• Harris, Derrick. “Liking Curly Fries Might Not Mean You’re Smart.” GigaOM. 26 March 2013. At: http://gigaom.com/2013/03/25/liking-curly-fries
• Harris, Derrick. “How One Sports Geek Wants to Save Cable TV with Data.” GigaOM. 1 March 2013. At: http://gigaom.com/2012/03/01/how-one-sports-geek-wants-to-save-cable-tv-with-data
• Harris, Derrick. “How to Hire Data Scientists and Get Hired As One.” GigaOM. 16 April 2013. At: http://gigaom.com/2013/04/16/how-to-hire-data-scientists-and-get-hired-as-one
• Henschen, Doug. “Hadoop Spurs Big Data Revolution.” InformationWeek. 9 November 2011. At: http://www.informationweek.com/news/development/database/231902466
• Herther, Nancy. “Content Curation: Quality Judgment and the Future of Media and Web Search.” Searcher. 1 September 2012. p. 30.
• Higginbotham, Stacey. “How Aetna is Using Big Data to Improve Patient Health.” GigaOM. 20 November 2012. At: http://gigaom.com/2012/11/20/how-aetna-is-using-big-data-to-improve-patient-health
• Higginbotham, Stacey. “Data Science is Not Enough. We Need Data Intelligence Too.” GigaOM. 20 March 2013. At: http://gigaom.com/2013/03/20/data-science-is-not-enough-we-need-data-intelligence-too
• Jackson, Joab. “Five Things CIOs Should Know About Big Data.” CIO. 11 May 2012. p. 29.
• Jaret, Peter. “Mining Electronic Records for Revealing Health Data.” The New York Times. 14 January 2013. At: http://www.nytimes.com/2013/01/15/health/mining-electronic
• Jenkins, Holman W., Jr. “Can Data Mining Stop the Killing?” The Wall Street Journal. 24 July 2012. p. A13.
• Kalakota, Ravi. “What is a ‘Hadoop?’ Explaining Big Data to the C-Suite.” Business Analytics 3.0. 6 November 2011. At: http://practicalanalytics.wordpress.com/2011/11/06/explaining-hadoop-to-management/
• Kelly, Michael, and Meredith Schwartz. “OverDrive Offers First Glimpse at ‘Big Data.’” Library Journal. 15 May 2012. p. 16.
• Madsbjerg, Christian, and Mikkel B. Rasmussen. “The Power of Thick Data.” The Wall Street Journal. 21 March 2014. At: http://online.wsj.com/news/articles/SB10001424052702304256404579449254114659882#printMode
• Manjoo, Farhad. “Big Changes Are Ahead for the Health Care Industry, Courtesy of Big Data.” Fast Company. 4 June 2012. p. 3.
• Mayer-Schönberger, Viktor, and Kenneth Cukier. Big Data: A Revolution That Will Transform How We Live, Work, and Think. New York, New York: Houghton Mifflin Harcourt, 2013.
• McAfee, Andrew, and Erik Brynjolfsson. “Big Data: The Management Revolution.” Harvard Business Review. October 2012.
• McCarthy, Bede, and Robert Cookson. “Facebook Reveals Secrets You Haven’t Shared.” The Financial Times. 11 March 2013. At: http://www.ft.com/intl/cms/s/0/09c8172c-8a45-11e2-bf79-00144feabdc0.html
• McKinsey Global Institute. “Big Data: The Next Frontier for Innovation, Competition, and Productivity.” San Francisco, California. May 2011.
• Miller, Claire Cain. “In Google’s Inner Circle, a Falling Number of Women.” The New York Times. 23 August 2012. p. B1.
• Neff, Jack. “Kleenex Builds Flu-Prediction Tool to Warn You When You’ll Get Sick.” Advertising Age. 17 September 2013. Accessed 19 December 2013. At: http://adage.com/article/news/kleenex-builds-flu-prediction-tool/244176/
• Nikkei Report. “Big Data A Game Changer in Soccer, Other Sports.” 12 July 2012.
• Olavsrud, Thor. “Big Data Analytics Today Lets Businesses Play Moneyball.” CIO. 23 August 2012. At: http://www.cio.com/article/print/714559
• O’Reilly, Lara. “Google Wants To Make One App For All The Everyday Things In Your Life.” 3 October 2014. At: http://www.businessinsider.com/google-launches-physical-web-project-2014-10#ixzz3FTSlQS1M
• Owen, David. “Hands Across America.” The New Yorker. 4 March 2013. pp. 30-35.
• Pew Research Center. “Big Data.” Washington, D.C. 20 July 2012.
• PR Newswire. “INRIX Taps Big Data to Help Ohio Keep Traffic Moving on State Roads.” 23 July 2012.
• Price, Gary. “OverDrive Shares Some Network Usage Statistics with More to Come.” 12 April 2012. At: http://www.infodocket.com/2012/04/12/overdrive-shares-some-network-usage-statistics-with-more-to-come/
• Rudder, Christian. Dataclysm: Who We Are (When We Think No One’s Looking). New York, New York: Crown Publishing, 2014.
• Scibetti, Russell. “How Qcue Prices Tickets To Pack The Stands.” Business Insider. 19 May 2011. Accessed 10 December 2013. At: http://www.businessinsider.com/how-this-company-prices-tickets-to-pack-the-stands-2011-5
• Seitz, Patrick. “Automated Storyteller Now Targets Big Data.” Investor’s Business Daily. 22 August 2012. p. A04.
• Silver, Nate. The Signal and the Noise. New York, New York: The Penguin Press, 2012.
• Silver, Nate. “The Weatherman is Not a Moron.” The New York Times. 7 September 2012. At: http://www.nytimes.com/2012/09/09/magazine/the-weatherman-is-not-a-moron.html?pagewanted=all
• Slocum, Matt. “The Work of Data Journalism: Find, Clean, Analyze, Create…Repeat.” 15 September 2011. At: http://radar.oreilly.com/print/2011/09/data-journalism-process-guardian-html
• Swanson, S.A. “Use Data to Target Your Message.” Crain’s Chicago Business. 22 July 2013. At: http://www.chicagobusiness.com/article/20130720
• Warr, Philippa. “Liking Curly Fries on Facebook Reveals Your High IQ.” Wired. 12 March 2013. At: http://www.wired.co.uk/news/archive/2013-03/12/facebook-personality-predictions
• Weaver, Matt. “Where is the ‘Big Data’ for Libraries?” 24 May 2012. At: http://www.infodocket.com/2012/05/24/guest-post-where-is-the-big-data-for-libraries
• Winkler, Rolfe. “Splunk’s Data with Destiny.” The Wall Street Journal. 18 April 2012. p. B1.

Thank You SLA and Good Luck Out There!

Amy Affelt

Follow me on Twitter: aainfopro