Data Science and the NCDS: Putting North Carolina First in Data Through the National Consortium for Data Science. Stanley C. Ahalt, PhD, Director, RENCI; Professor of Computer Science, UNC-Chapel Hill. October 14, 2013. RENAISSANCE COMPUTING INSTITUTE


Page 1: Data Science and the NCDS · 2016. 3. 30. · Importance Driven by Technology Data Science and the NCDS 5 • The Internet made it easy to move, share, and find data: - “information

Data Science and the NCDS
Putting North Carolina First in Data Through the National Consortium for Data Science

Stanley C. Ahalt, PhD
Director, RENCI
Professor of Computer Science, UNC-Chapel Hill
October 14, 2013

RENAISSANCE COMPUTING INSTITUTE

Page 2

Outline

• Why Data Science?
• The Challenges and Opportunities of Data and Data Science
• Defining Data Science
• Why North Carolina?
• Possible Approaches: NCDS
• Conclusion

Page 3

Why Data Science? ABUNDANCE

Percentage of worldwide digital data created in the last two years? 90%

Since 2010 we have been creating as much data every two days as was previously created in all of history up to 2003.

Tipping Point: From Data Scarcity to Data Abundance! This is a challenge and a golden opportunity.

Page 4

From Compute-Centric to Data-Centric Research

Source: Wall Street Journal, Special Report on Big Data, March 11, 2013

Page 5

Importance Driven by Technology

• The Internet made it easy to move, share, and find data:
  - “Information wants to be free,” and it wants to be expensive
• Faster processors, more and cheaper storage capacity:
  - Creating, processing, and storing data is easier; clouds have accelerated this trend.
• Sensors and the explosion of real-time data:
  - More than 1 trillion sensors now connected to the Web
  - Example: Google I/O 2013 conference deployed hundreds of sensors to collect ambient data
• The Internet of Things = an explosion of data created by connected devices, not people.
• Biological data: sequencing/medicine could produce 50 EB of data/year.


Page 7

Big Data, Big Results

• Express Scripts:
  – 1 billion pharmacy insurance claims analyzed and used to drive patients to more cost-effective mail-order prescriptions
  – Predictive modeling of 400 factors to find patients at risk for non-adherence to prescriptions (a $317 billion/year problem).
• UPS:
  – Analyzing continuous streams of sensor data from thousands of delivery trucks eliminated 5.3M miles from routes, reduced engine idling time by 10M minutes, saved 650,000 gallons of fuel, and reduced carbon emissions by 6,500+ metric tons.
• Intel:
  – Analysis of massive data and application of predictive algorithms helped identify potential high-sale resellers (result: +$20M in potential new sales).
  – Manufacturing predictive analytics reduced microprocessor testing time (result: $3M saved during proof-of-concept period; $30M savings expected by 2014).

Source: CIO, July 15, 2013

Page 8

How big is the opportunity?

• $300B potential annual value to US healthcare, more than total annual healthcare spending in Spain.
  – McKinsey Global Institute, May 2011
• €250B potential annual value to Europe’s public sector administration.
  – McKinsey Global Institute, May 2011
• Energy savings of 1% in gas-powered plants: savings of $68B over 15 years.
  – Industrial Internet: Pushing the Boundaries of Minds and Machines, GE, Nov. 12, 2012
• Companies using data-directed decision making boost productivity by 5-6%.
  – Cukier, K., Data, data everywhere, The Economist, Feb. 25, 2010
• Jobs: demand for data-related administrators and software developers projected to grow by ~32% in US by 2020.
  – Occupational Outlook Handbook, 2012-2013, US Bureau of Labor Statistics

Page 9

Big Data Jobs: The Opportunity

• Globally:
  – Big Data and analytics jobs expected to exceed 4 million by 2015. (source: icrunchdata Big Data Jobs Index)
• Nationally:
  – Big data job postings up 63% on icrunchdata job site. (source: icrunchdata.com)
  – 1.9M new big data jobs by 2015, but only 1/3 will be filled due to lack of trained talent (source: Gartner, October 2012)
  – Each big data job will create 3 additional jobs. (source: Gartner, 2012)
  – Demand for data-related administrators and software developers projected to grow by ~32% in US by 2020 (source: Occupational Outlook Handbook, 2012-2013, US Bureau of Labor Statistics)
  – $300B potential annual value to US healthcare, more than total annual healthcare spending in Spain (source: McKinsey Global Institute, May 2011)

Page 10

NC Data Science Job Growth

[Bar chart: Net Change due to Growth, 2010-2020, by occupation; horizontal axis from 0 to 3,500. Occupations shown: Computer and Information Research Scientists; Computer Science Teachers, Postsecondary; Computer Occupations, All Other; Computer Programmers; Database Administrators; Librarians, Curators, and Archivists; Computer and Information Systems Managers; Software Developers, Systems Software; Network and Computer Systems Administrators; Information Security Analysts, Web Developers, and Computer Network; Computer Systems Analysts; Computer Support Specialists; Software Developers, Applications]

Source: North Carolina Department of Commerce, Labor and Economic Analysis Division

Page 11

NC Data Science Job Growth 2010-2020

• 18,130 new jobs predicted to be added in data science-related fields
• 4% of all new jobs in North Carolina will be in data science
• Represents a 10-year increase of 15.6%, compared to an average increase of 11.3% across all sectors
• Nearly all these jobs will require a bachelor’s degree or higher
• 3 subcategories projected to show more than 20% increase: database administrators (25.7%), network and computer systems administrators (24.0%), software applications developers (20.9%)

Source: North Carolina Department of Commerce, Labor and Economic Analysis Division
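The projection figures above can be cross-checked with simple arithmetic. The sketch below is illustrative only: it back-computes the 2010 employment base and the total number of new jobs implied by the quoted numbers. These derived values do not appear on the slide, and because the inputs (especially the 4% share) are rounded, the results are rough estimates. All variable and function names are hypothetical.

```python
# Figures quoted on the slide (NC Department of Commerce projections, 2010-2020).
NEW_DS_JOBS = 18_130            # new jobs in data science-related fields
DS_GROWTH_RATE = 0.156          # 10-year increase in those fields
ALL_SECTOR_GROWTH_RATE = 0.113  # average 10-year increase across all sectors
DS_SHARE_OF_NEW_JOBS = 0.04     # data science share of all new NC jobs (rounded)


def implied_base(new_jobs: float, growth_rate: float) -> float:
    """Starting employment implied by an absolute gain and a growth rate."""
    return new_jobs / growth_rate


def implied_total(part: float, share: float) -> float:
    """Total implied by a part and its share of the whole."""
    return part / share


# Derived estimates (assumptions, not slide content).
ds_base_2010 = implied_base(NEW_DS_JOBS, DS_GROWTH_RATE)
all_new_jobs = implied_total(NEW_DS_JOBS, DS_SHARE_OF_NEW_JOBS)
growth_ratio = DS_GROWTH_RATE / ALL_SECTOR_GROWTH_RATE

print(f"Implied 2010 data science employment: ~{ds_base_2010:,.0f}")
print(f"Implied total new NC jobs, 2010-2020: ~{all_new_jobs:,.0f}")
print(f"Data science growth vs. all sectors:  {growth_ratio:.2f}x")
```

Read this way, the slide implies a 2010 base of roughly 116,000 data science-related jobs, with the field growing about 1.4 times as fast as the state average.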

Page 12

Challenges: Big Data Talent Shortage

•  78 percent of 2012 survey respondents said there is a big data talent shortage (The Big Data London Group in Raywood, 2012)

•  70 percent of survey respondents noted a knowledge gap between data workers and managers/CIOs (The Big Data London Group in Raywood, 2012)

•  60 percent of survey respondents say it’s difficult to find big data professionals (NewVantage Partners 2012)

•  50 percent of survey respondents have difficulty finding and hiring business leaders and managers who understand how to apply big data (NewVantage Partners 2012)


Page 13

Big data experts need skills in:
• Advanced analytics and predictive analysis
• Complex event processing
• Rule management
• Business intelligence tools
• Data integration

Big data scientists need the skills of their IT predecessors, plus a solid computer science background (knowledge apps, modeling, statistics, analytics, math), business savvy, and the ability to communicate their findings.


Page 15

Defining “Big” Data

The Five Vs:
• Volume: The Large Hadron Collider discards 99.999% of its data because the data cannot be processed!
• Velocity: Retail transactions, communications, and industrial sensor data demand real-time analysis and action.
• Variety: Health data includes images, test results, medical histories, doctor’s notes.
• Veracity: Data quality is essential for discovery and informed decision making.
• Value: How important or rare is the data, and what do we keep and for how long?

Data use cases are heterogeneous
• Importance of each V varies, even within same data set

Data management and analytics hardware and expertise are expensive
• Can be barriers to entry, especially for small businesses and new researchers

Page 16

Defining Data Science

Data Science: Systematic study of organization and use of digital data for:
• research discoveries,
• decision-making, and
• the data-driven economy.

Page 17

What Is a Data Scientist?

“Good data scientists will not just address business problems, they will pick the right problems that have the most value to the organization.”
-IBM

Data scientists “must be able to take data sets, model them mathematically, and understand the math required to build those models. And they must be able to find insights and tell stories from that data. That means asking the right questions.”
-Hilary Mason, Wall Street Journal, in Rooney 2012


Page 19

NC has major competitive advantages in data-centric resources

•  Abundant data sets (at NC Universities, NC Hospitals, NC Federal Agencies, and NC Industries!)

•  Data management tools (e.g., iRODS, Secure Research Space)

•  Intellectual resources (Industrial and Universities)

•  Data centers: Physical infrastructure (abandoned textile mills and MCNC)

Proximity to Data Is a Huge Advantage!

Page 20

Major Data Centers in NC

Page 21

US Big Data Initiatives

• California (UC Berkeley, $25M)
• Illinois (University of Illinois, ~$20M)
• Ohio (Ohio State, $N/A)
• Massachusetts (MIT, $12.5M)
• New Jersey (Rutgers, $N/A)
• North Carolina (UNC, Duke, NCSU, NCDS)


Page 23

The National Consortium for Data Science

• Mission: Secure the US role as a leader in data science research & education; position US industry to use the power of data to drive economic growth
• Vision: Focused multi-sector, multidisciplinary data science community to solve big data challenges and drive the field forward
• Goals:
  • Engage broad communities of data experts
  • Coordinate data science research priorities that span disciplines and industries
  • Facilitate development of education & training programs
  • Support development of technical, ethical & policy standards
  • Apply NCDS expertise to data challenges in science, business and government

NCDS is a strategic approach to data science and big data opportunities

www.data2discovery.org

Page 24

NCDS Founding Members

Page 25

NCDS Components

• Data Observatory
  • Shared, distributed infrastructure housing large organized research data; platform for data science education
• Data Laboratory
  • R&D into critical tools and techniques for data science
• Data Fellows program
  • Seed grants for faculty and post-docs to work on consortium-approved projects; NCDS review panel will evaluate proposals
  • Industry internships for graduate students
  • Visiting industry data scientists at member universities
• Data Science Events
  • Leadership Summits (Spring)
  • Outreach events and speakers (Fall and Spring)

Page 26

NCDS Data Science Faculty Fellow Program

• Will foster private-public relationships, engage future data scientists, bridge gaps between research and practice, create NCDS-sponsored scholarship

Year-one Focus
• Seed grant approach to fund initial cadre of Fellows from NCDS academic member campuses
• Teaming with an NCDS member encouraged, but not required; potential for future collaboration part of review criteria
• Funds used for course buy-outs, summer salary, graduate student support, conference travel and modest infrastructure costs
• Target: 3-5 awards in year 1, $30K each

Timeline
• Mid September: RFP released
• November 1: Proposal due
• November 15: Notification of acceptance

Support provided by UNC General Administration to offer fellowships to all UNC System campuses
www.data2discovery.org/data-fellows

Page 27

First NCDS Leadership Summit

Data to Discovery: Genomes to Health, April 23 – 24, 2013

• Keynote address: Dr. Eric Green, Director, National Human Genome Research Institute
• First in annual Leadership Summits on big data issues in targeted domains.
• Purpose: Focused discussion by top data and domain scientists to elicit key data problems and opportunities
• Final Product: White Paper on data challenges and opportunities in genomic science. Summary version under review for publication by a major scientific journal.

Next Leadership Summit: Working Title: Sustainability in the 21st Century: “Big Data for Smaller Carbon Footprints,” April 2014, Chapel Hill, NC

Page 28

Shared Benefits

NCDS: A public-private partnership

• Cost reductions (access to shared data platform)
• Access to emerging academic tools
• Access to organizations with complementary agendas
• Glimpse into future trends, leads to competitive advantages
• Positive exposure and visibility
• Opportunities for joint educational/workforce materials
• NCDS helps to fill a “concierge” role facilitating such things as:
  • Identifying ideas for collaboration, revenue generation
  • Identifying opportunities for cross-marketing, public relations and communications

Industry

Benefits:
• Cost reduction
• Risk reduction
• Influence on key open data science tools
• Data science research on the horizon
• Potential future employees, lower-risk vetting/recruiting
• Opportunities for pre-competitive collaboration
• Place industry scientists in academe

Through:
• Shared curated data
• Shared protocols
• Hosting student interns
• Sponsoring research fellows
• Working directly with academic researchers on joint projects
• Preferred access to and/or customized training and education for industry staff

Academic

Benefits:
• Cost reduction
• Funding for faculty and students
• Opportunities to participate in collaborative research with NCDS partners
• Access to industry
• New curriculum, new programs
• Attract best students and faculty

Through:
• Shared curated data
• Faculty course ‘buy-outs’ to fund selected research projects
• Funding for graduate students to work in partnership with industry
• Access to industry resources such as reduced-cost software and hardware

Nonprofit and agency

Benefits:
• Access to leading-edge research
• Access to industry
• Applied problem solving
• Regional economic development
• Policy enhancements

Through:
• Hosting research fellows
• Working with industry and academe
• Increased understanding of issues and opportunities
• Coalitions to provide end-to-end solutions for business development

Page 29

Membership structure

Institution Type            Founding/Board Members    General Members
University                  $25,000                   $10,000
Industry                    $50,000                   $20,000
Non-profit organizations    $25,000                   $10,000
Government agency           $25,000                   $10,000

Additional categories under consideration:
• Affiliate Members: other consortia and like-minded groups/activities
• Associate Members: small businesses/startups

Page 30

NCDS Year 1 Goals

• Establish Data Fellows and Visiting Industry programs
• Organize Fall workshop and invited speaker
• Implement initial Data Observatory/Lab test bed
• Recruit Executive Director and start planning for staffing
• Recruit at least 3 additional members in all 3 categories (9-10 total)

Timeline
• Leadership Summit (Spring 2013)
• Data Fellows (Fall 2013)
• Data Lab and Observatory (2nd Pilot Fall 2013)
• Education/Workforce Development Program (Spring 2014)

Page 31

Five Year Goal: A National Center for Data Science


Page 33

Developing Data Science Will:

– Develop the next generation of data science experts and leaders

– Create strategies, practices, and scientific methods for understanding data

– Enable more collaborations among data and domain scientists, business, academia and government

–  Assist those who are struggling to collect, analyze, manage and use data

– Establish methodologies for measuring the value and impact of data


Page 34

Developing a National Center for Data Science Will:

• Aid in developing principles and theories that enable data discoveries and innovations to power economic activity.
• Accelerate technology transfer and creation of data-related businesses and products.
• Shape and create national curricula for data science education.
• Promote development of a national data science strategy.
• Engage stakeholders from all sectors to address grand challenge problems of data science.
• Develop technical, ethical and policy standards for using and sharing data.

Page 35

Extras

Page 36

US Big Data Clusters

Page 37

NCDS Foundations

• Shared, distributed infrastructure will be the foundation for the NCDS Data Observatory and a Data Laboratory, a virtual lab providing access to tools and infrastructure needed to test techniques for storing, sharing, analyzing, transforming, and visualizing data.

Year-one Focus
• Create initial sets of federated data collections.
• Document and integrate set of initial tools
• Pilot a data science education platform comprised of compute, storage and data management tools for classroom use
• Target data-intensive courses across multiple disciplines
• Offer 2-3 courses, expand in subsequent years
• Data sets and tools/software to be contributed by NCDS members
• Distributed hosting model

www.data2discovery.org/data-observatory

Page 38

NCDS Components

• Data Lab and Observatory
  • Shared, distributed infrastructure housing large organized research data; platform for data science education
  • R&D into critical tools and techniques for data science
• Data Fellows program
  • Seed grants for faculty and post-docs to work on consortium-approved projects; NCDS review panel will evaluate proposals
  • Industry internships for graduate students
  • Visiting industry data scientists at member universities
• Data Science Events
  • Leadership Summits (Spring)
  • Outreach events and speakers (Fall and Spring)

Page 39

Data Observatory/Laboratory

• Shared, distributed infrastructure will be the foundation for the NCDS Data Laboratory, a virtual lab providing access to tools and infrastructure needed to test techniques for storing, sharing, analyzing, transforming, and visualizing data.

Year-one Focus
• Pilot a data science education platform comprised of compute, storage and data management tools for classroom use
• Target data-intensive courses across multiple disciplines
• Offer 2-3 courses, expand in subsequent years
• Data sets and tools/software to be contributed by NCDS members
• Can be hosted centrally or locally at campus sites

www.data2discovery.org/data-observatory

Page 40

NCDS Data Science Faculty Fellow Program

• Will foster private-public relationships, engage future data scientists, bridge gaps between research and practice, create NCDS-sponsored scholarship

Year-one Focus
• Use seed grant approach to fund initial cadre of Data Science Faculty Fellows from NCDS academic member campuses
• Teaming with an NCDS member on a project encouraged, but not required; potential for future collaboration part of review criteria
• Funds used for course buy-outs, summer salary, graduate student support, conference travel and modest infrastructure costs
• Target: 3-5 awards in year 1, $30K each

Timeline
• Mid September: RFP released
• November 1: Proposal due
• November 15: Notification of acceptance

Support provided by UNC General Administration to offer fellowships to all UNC System campuses
www.data2discovery.org/data-fellows

Page 41

First NCDS Leadership Summit

Data to Discovery: Genomes to Health, April 23 – 24, 2013

• Keynote address: Dr. Eric Green, Director, National Human Genome Research Institute
• First in annual Leadership Summits on big data issues in targeted domains.
• Purpose: Focused discussion by top data and domain scientists to elicit key data problems and opportunities
• Final Product: White Paper on data challenges and opportunities in genomic science. Summary version under review for publication by a major scientific journal.

Next Leadership Summit: April 2014, Chapel Hill, NC