big data - a view


Upload: dansk-bibliotekscenter

Post on 20-Feb-2017


Big Data – a view

DBC

14 January 2016

Bjarne Kjær Ersbøll / [email protected]

2 DTU Compute, Technical University of Denmark

Acknowledgements

This slide deck is compiled from material from many of my colleagues and people I collaborate with at DTU. The following list is incomplete:

• Jakob Eg Larsen
• Mark Riis
• Mads Odgaard
• Knut Conradsen
• Tage Thyrsted
• Lone Falsig Hansen
• Elena Guarneri
• And many more…


So, what is Big Data anyway?


The 4 V’s


Data explosion


Crowds, Bluetooth and Rock n’ Roll: Understanding Music Festival Participant Behavior


BIG1

3 December 2013


BIG1 purpose

• Identify technological challenges associated with exploiting the potential of Big Data / data-driven business development, in order to improve animal health and achieve higher food quality and safety.


BIG1 participants

• DTU Compute

• DTU National Food Institute

• DTU Veterinary Institute

• DTU Management

• DTU Biosys

• DTU Administration


Big Data Value-chain

Data Origins: The Internet, sensors, machines, etc.

Data Collection: Web log, sensor data, images/audio, RFID and videos, etc.

Data Storage: Technologies supporting data storage

Analytics: Predictive analytics, patterns in data, decision making

Consumers: Business processes, humans, and applications

Sense, Think, Act


Stakeholders in BIG1 value-chain

Value chain: Feed/plants, Animals, Processing, Consumers

Actors: Feed producers, Plant producers, Equipment producers, Farmers, Abattoir, Dairy, Retail sector, Export

Data: E.g. feed quality; e.g. growth rate of animals; e.g. efficiency in the slaughtering process; consumer patterns and food quality

Big Data


BIG1: What can we do?

Domain / application areas: Cattle, Pigs, Nutritional composition, … and other applications

Generic Big Data problem topics:

• Collection of data, e.g. sensors on individuals (e.g. RFID or image analysis)

• Storage, manipulation, real-time data

• Establishing a dynamic Big Data cloud

• Structuring data, distributed data and data-sharing

• Merging and integration of databases

• Pattern recognition, machine learning, artificial intelligence, query algorithms

• Multivariate analysis, advanced statistics and data analysis

• Privacy/ethics regarding data

• Visualisation of data for decision support

• Optimising/speeding up algorithm functionality and lowering the cost of calculation

Platform project

Targeted projects


Sensors and data generation


Hardware and software


Big Data – 1991 – Economic Geology

20 18.01.2016


Data

• Landsat satellite (common reference) – 4 scenes – 8 tapes
  – Geometric rectification, mosaicking, ratios, factor scores

• Geological – geological maps, topographic maps
  – Structural information, lineaments converted to concentrations in 10 directions

• Geochemical – K, Rb, Sr, U, Nb, Y, Ga, Fe in stream sediments
  – Kriging to a 1 km grid, interpolation by bicubic spline to Landsat pixels

• Radiometric – helicopter-borne gamma-spectrometric measurements: U, Th, K, and total concentration
  – Max in 1 km grid interpolated by minimum curvature and further by bicubic spline

• Aeromagnetic data – 11 map sheets
  – Manually digitized and interpolated

• Resulting in 40 variables on a pixel level (50.8 m x 50.8 m)
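
Several of the layers above were brought from a coarse 1 km grid down to the Landsat pixel grid by interpolation. The sketch below only illustrates that resampling step. It uses plain bilinear interpolation rather than the bicubic spline named on the slide, and it assumes an integer refinement factor, whereas 1 km over 50.8 m pixels is roughly 19.7. The function name `bilinear_resample` and the sample values are hypothetical.

```python
def bilinear_resample(grid, factor):
    """Resample a 2-D grid (list of rows) onto a grid `factor` times finer.

    Bilinear interpolation is a stand-in here for the bicubic spline
    mentioned on the slide. Needs at least 2 rows and 2 columns.
    Every `factor`-th output node coincides with an input node.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    out_rows = (n_rows - 1) * factor + 1
    out_cols = (n_cols - 1) * factor + 1
    out = []
    for i in range(out_rows):
        y = i / factor                      # position in coarse-grid units
        r0 = min(int(y), n_rows - 2)        # upper row of the enclosing cell
        ty = y - r0                         # vertical weight in [0, 1]
        row = []
        for j in range(out_cols):
            x = j / factor
            c0 = min(int(x), n_cols - 2)    # left column of the enclosing cell
            tx = x - c0                     # horizontal weight in [0, 1]
            top = (1 - tx) * grid[r0][c0] + tx * grid[r0][c0 + 1]
            bottom = (1 - tx) * grid[r0 + 1][c0] + tx * grid[r0 + 1][c0 + 1]
            row.append((1 - ty) * top + ty * bottom)
        out.append(row)
    return out


# Hypothetical 2 x 2 coarse grid refined by a factor of 2 (gives 3 x 3).
coarse = [[0.0, 10.0], [20.0, 30.0]]
fine = bilinear_resample(coarse, 2)
for fine_row in fine:
    print(fine_row)
```

A spline would give a smoother surface than this piecewise-linear version, but the data flow is the same: coarse grid in, values on the finer pixel grid out.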


Data

• Converted to a 5 km x 5 km grid – trying to preserve information by taking (when relevant):
  – Min, max, 1%, 5%, median, 95%, 99%, mean, stddev, %land-cover
  – 240 variables in all in 1084 squares

• Training set of
  – 17 mineralized, central
  – 21 mineralized, marginal
  – 14 barren, central
  – 5 barren, marginal

• Discriminant analysis using stepwise selection
  – 1084 squares classified
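
The reduction from pixels to grid squares can be sketched as follows. With 50.8 m pixels, a 5 km square covers roughly 98 x 98 pixels, and each variable is replaced by a handful of summary statistics per square. This is a minimal sketch under stated assumptions, not the original 1991 processing: the names `percentile`, `summarize_square` and `aggregate` are hypothetical, percentiles use linear interpolation, the standard deviation is the population form, and %land-cover is left out because it needs a land mask.

```python
import statistics


def percentile(sorted_values, p):
    """Linear-interpolation percentile (p in 0..100) of an ascending list."""
    if len(sorted_values) == 1:
        return sorted_values[0]
    pos = (len(sorted_values) - 1) * p / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])


def summarize_square(pixels):
    """Summarise the pixel values of one variable inside one grid square."""
    values = sorted(pixels)
    return {
        "min": values[0],
        "p01": percentile(values, 1),
        "p05": percentile(values, 5),
        "median": percentile(values, 50),
        "p95": percentile(values, 95),
        "p99": percentile(values, 99),
        "max": values[-1],
        "mean": statistics.fmean(values),
        "stddev": statistics.pstdev(values),  # population form (assumption)
    }


def aggregate(image, block):
    """Split a 2-D image (list of rows) into block x block squares.

    Returns a dict keyed by (square_row, square_col). Squares on the
    lower and right edges may be smaller than block x block.
    """
    summaries = {}
    for r in range(0, len(image), block):
        for c in range(0, len(image[0]), block):
            pixels = [v for row in image[r:r + block] for v in row[c:c + block]]
            summaries[(r // block, c // block)] = summarize_square(pixels)
    return summaries


# Hypothetical 4 x 4 "image" reduced to 2 x 2 squares.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
squares = aggregate(image, 2)
print(squares[(0, 0)])
```

Keeping the low and high percentiles next to the mean is what lets a coarse square still signal a small anomaly inside it, which a plain average would smooth away.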


Big Data ?


Other Big Data cases

ELIXIR: Data describing human genetic variation. Development of personalised medical drugs which take variation between patients into account.

Global Microbial Identifier: Global system for genome-sequence data from micro-organisms to improve national clinical diagnostics and international surveillance of diseases.

CITIES: IT solutions for analysis, operation and development of integrated energy systems (electricity, gas, district heating and biomass) in cities to achieve higher flexibility in e.g. energy storage.


Data Science (Big Data) Profile at DTU Compute


Data Science – main elements

Ambitious – courses: 45 ECTS (4/6 core) + thesis: A further 30-35 ECTS

Pioneering – across the Big Data value chain and competences

Application oriented:

o Work with concrete data sets

o Collaboration with companies


Entry via all 3 DTU Compute programs

• Computer Science and Engineering

• Mathematical Modelling and Computation

• Digital Media Engineering

• …and now also: IT & Health (joint education between KU & DTU)

• Cross-educational skills


Big Data Value chain

data BIG data model analysis

Data Origins: The Internet, sensors, machines, etc.

Data Collection: Web log, sensor data, images/audio, RFID and videos, etc.

Data Storage: Technologies supporting data storage

Analytics: Predictive analytics, patterns in data, decision making

Consumers: Business processes, humans, and applications

Sense, Think, Act


Courses in Data Science specialization

Origin Collection Storage Analytics Consumers

01227 Graph theory (5) 1 3

01405 Error correcting codes 2 1 1

01617 Dynamical Systems 1 2

02170 Database systems (5) 4

02232 Applied Cryptography (5) 2 3 1 1

Core 02239 Data Security 1 4 1

02249 Computationally hard problems (7.5) 1 1 4

02266 User experience engineering 1 1 5

02281 Data Logic (5) 1 2 1 1

Core 02282 Algorithms for Massive Data Sets (7.5) 2 3 3

Core 02288 Missing a course on “Advanced databases/warehouses”? 2

02407 Stochastic Processes (5) 3

02409 Multivariate Statistics (5) 4

02417 Time Series Analysis (5) 4

02443 Stochastic Simulation (5) 4 1

02450 Introduction to Machine Learning and Data Modeling (5) 3 1

02457 Non-linear signal processing 1 1

02458 Cognitive Modelling (5) 3 2

02460 Advanced Machine Learning (5) 1 3 1

02506 Advanced Image Analysis 3

02515 Health technology 1 2

Core 02582 Computational data analysis 3

02586 Statistical Genetics (5) 2

Core 02806 Social data analysis and visualization (5) 2 3

Core 02819 Data Mining using Python (5) 1 3 1

30530 Geographical information systems 1 1 1

25303 Mathematical Biology 1 1 1 1

27411 Biological data analysis and chemometrics 1

27625 Algorithms in bioinformatics 1 1

42112 Mathematical Programming with Modelling Software 1 1


Big Data Hackathon

65 students 10 groups

48 hours

DTU's Skylab

Funding

1-2 start up companies


Big Data solutions for Lyngby-Taarbæk municipality

“Smart City app” to make it a better place to live


Projects!

Energy utilization in buildings

Optimization of Bus-routes

Smart Traffic-regulation

Smart Energy renovation

Personalized Care for elderly

Smart tests for the Schools

Flexible collection of Waste


Implementation of first recommendation:

Big Data•DTU