introduction - eth zürich · pdf file ... – smart grids, geolocation, traffic...
Post on 26-Mar-2018
Introduction
Evangelos Pournaras, Izabela Moise
Outline
1. Data Science
2. Course Description
Part 1 - Data Science
What is Data Science?
A collection of orchestrated methods from different scientific fields, e.g. statistics, computer science, etc., that provide understanding of domain data and result in data-based products and services.
Is Data Science about Big Data? I
Is Data Science about Big Data? II
It's more about using the right data and asking the right questions.
What about Techno-socio-economic Systems?
ICT & Techno-socio-economic Systems
• Embedded ICT systems in most societal domains
How?
• Internet of Things, pervasive/ubiquitous computing, advanced networking systems, inter-operability
Result:
• A new explosion of data sources
Opportunities:
• Understanding, improving, managing & sustaining our complex society
Threats:
• Privacy, discrimination, misinterpretations, over-fitting, etc.
Threats I
Threats II
Who is a Data Scientist?
• A statistician
• A computer programmer
• Both and more
Tip: Domain knowledge can be more valuable than machine learning, data mining, etc.
Real-world Profile I
Real-world Profile II
More about Data Scientists
https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century
More about Data Scientists
Data scientists' most basic, universal skill is the ability to write code. This may be less true in five years' time, when many more people will have the title data scientist on their business cards. More enduring will be the need for data scientists to communicate in language that all their stakeholders understand, and to demonstrate the special skills involved in storytelling with data, whether verbally, visually, or, ideally, both.
More about Data Scientists
But we would say the dominant trait among data scientists is an intense curiosity: a desire to go beneath the surface of a problem, find the questions at its heart, and distill them into a very clear set of hypotheses that can be tested.
More about Data Scientists
A quantitative analyst can be great at analyzing data but not at subduing a mass of unstructured data and getting it into a form in which it can be analyzed.
A data management expert might be great at generating and organizing data in structured form but not at turning unstructured data into structured data, and also not at actually analyzing the data.
And while people without strong social skills might thrive in traditional data professions, data scientists must have such skills to be effective.
Part 2 - Course Description
Course Objectives
Qualify you with knowledge & skills to tackle real-world problems using data:
1. Acquiring domain knowledge and understanding
2. Better understanding and interpretation of data
3. Awareness of the applicability of different data science methods
4. Development of technical skills, e.g. programming, use of different tools, etc.
5. Presenting scientific results, both in writing and orally
Course Prerequisites
Some programming skills are required, e.g. for the material of this course:
1. Java/C++/Python
2. UNIX
Didn't you have an opportunity to practice this earlier?
No problem, this is a golden opportunity!
Tip: Programming skills will make you a more flexible and efficient data scientist.
Assessment
• Seminar thesis
• 100% of the grade, no exams
• Detailed explanation in a later lecture
Tip: Start early! Give your project and your skills the opportunity to develop during the course.
Lectures
• Every Monday, 17:15-19:00, at LFW B 1
• Participation is not obligatory but highly recommended
• 60-minute lectures followed by 40-minute interactive discussions
• Opportunity to discuss your project
• Lectures at http://www.coss.ethz.ch/education/datascience.html
Subjects I
1. Computational Social Science Applications
– Smart Grids, geolocation, traffic systems, social sensing/mining, privacy
– Tools & platforms: Nervousnet, Twitter, GDELT
2. Data Science Fundamentals
– databases, data types, data collection, data pre-processing, plotting, visualization, etc.
– Tools: Java, AWK, MySQL, Gnuplot, Gephi, etc.
3. Data Mining and Machine Learning
– classification, clustering, prediction, neural networks, etc.
– Tools: Weka
4. Big Data Analytics
Subjects II
– MapReduce, parallel computing, data streaming, social media, etc.
– Tools: Hadoop, Spark, Mahout, Spark Streaming, Storm, etc.
5. Other
– Project presentations
Lectures Outline
Lecture 01 (20.02.16): Introduction & Coursework Outline
Lecture 02 (27.02.16): Data Mining, Machine Learning & Applications
Lecture 03 (06.03.16): Data Science Techniques & Applications
Lecture 04 (13.03.16): Data Mining, Machine Learning & Applications
Lecture 05 (20.03.16): Data Science Techniques & Applications
Lecture 06 (27.03.16): Big Data Analytics & Applications
Lecture 07 (03.04.16): Big Data Analytics & Applications
Lecture 08 (10.04.16): Data Science Techniques & Applications
Lecture 09 (08.05.16): Big Data Analytics & Applications
Lecture 10 (15.05.16): Data Science Techniques & Applications
Lecture 11 (22.05.16): Oral Presentations
Lecture 12 (29.05.16): Oral Presentations
How to contact us?
Communication
• Discussion session in the course
• E-mail with subject [DATA-SCIENCE-COURSE-2016]<otherinfo> to:
– Iza Moise: imoise@ethz.ch
– Evangelos Pournaras: epournaras@ethz.ch
Supervision - strictly for issues not addressed in the course
• Mondays 15:00-17:00, Clausiusstrasse 50 (CLU C 4), 8092 Zurich
Proposed Literature
B. Ellis.
Real-Time Analytics: Techniques to Analyze and Visualize Streaming Data.
Wiley Publishing, 1st edition, 2014.
J. Han.
Data Mining: Concepts and Techniques.
Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2005.
T. White.
Hadoop: The Definitive Guide.
O'Reilly Media, Inc., 2015.
I. H. Witten, E. Frank, and M. A. Hall.
Data Mining: Practical Machine Learning Tools and Techniques.
Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 3rd edition, 2011.
What is next?
• Seminar thesis
• Examples and applications
Outline
1 Data Science
2 Course Description
Evangelos Pournaras Izabela Moise 2
Part 1 - Data Science
Evangelos Pournaras Izabela Moise 3
What is Data Science
A collection of orchestrated methods from different scientific fieldseg statistics computer science etc that provide understanding ofdomain data and result in data-based products and services
Evangelos Pournaras Izabela Moise 4
Is Data Science about Big Data I
Evangelos Pournaras Izabela Moise 5
Is Data Science about Big Data II
Itrsquos more about using the right dataand asking the right questions
Evangelos Pournaras Izabela Moise 6
What about Techno-socio-economic Systems
Evangelos Pournaras Izabela Moise 7
ICT amp Techno-socio-economic Systems
bull Embedded ICT systems in most societal domains How
bull Internet of Things pervasiveubiquitous computing advancednetworking systems inter-operability Result
bull A new explosion of data sources Opportunities
bull Understanding improving managing amp sustaining our complexsociety Threats
bull Privacy discrimination misinterpretations over-fitting etc
Evangelos Pournaras Izabela Moise 8
Threats I
Evangelos Pournaras Izabela Moise 9
Threats II
Evangelos Pournaras Izabela Moise 10
Who is a Data Scientist
bull A statistician
bull A computer programmer
bull Both and More
TipDomain knowledge can be more valuable than machine learning datamining etc
Evangelos Pournaras Izabela Moise 11
Real-world Profile I
Evangelos Pournaras Izabela Moise 12
Real-world Profile II
Evangelos Pournaras Izabela Moise 13
More about Data Scientists
httpshbrorg201210data-scientist-the-sexiest-job-of-the-21st-century
Evangelos Pournaras Izabela Moise 14
More about Data Scientists
Data scientistsrsquo most basic universal skill is the ability to writecode This may be less true in five yearsrsquo time when many morepeople will have the title data scientist on their business cardsMore enduring will be the need for data scientists to communicate inlanguage that all their stakeholders understand-and to demonstratethe special skills involved in storytelling with data whetherverbally visually or - ideally both
Evangelos Pournaras Izabela Moise 15
More about Data Scientists
But we would say the dominant trait among data scientists is anintense curiosity-a desire to go beneath the surface of a problemfind the questions at its heart and distill them into a very clearset of hypotheses that can be tested
Evangelos Pournaras Izabela Moise 16
More about Data Scientists
A quantitative analyst can be great at analyzing data but not atsubduing a mass of unstructured data and getting it into a form inwhich it can be analyzed
A data management expert might be great at generating andorganizing data in structured form but not at turning unstructureddata into structured data-and also not at actually analyzing thedata
And while people without strong social skills might thrive intraditional data professions data scientists must have suchskills to be effective
Evangelos Pournaras Izabela Moise 17
Part 2 - Course Description
Evangelos Pournaras Izabela Moise 18
Course ObjectivesQualify you with knowledge amp skills to tackle real-world problemsusing data
1 Acquiring domain knowledge and understanding2 Better understanding and interpretation of data
3 Awareness about the applicability of different data sciencemethods
4 Development of technical skills eg programming use ofdifferent tools etc
5 Presenting scientific results both written and orally
Evangelos Pournaras Izabela Moise 19
Course Prerequisites
Some programming skills are required eg skills for the material ofthis course
1 JavaC++Python
2 UNIX
Didnrsquot you have an opportunity to practice this earlier
No problem this is a golden opportunity
TipProgramming skills will make you more flexible and efficient datascientist
Evangelos Pournaras Izabela Moise 20
Assessment
bull Seminar thesis
bull 100 of the grade no exams
bull Detailed illustration in a next lecture
TipStart early Give the opportunity for your project and your skills todevelop during the course
Evangelos Pournaras Izabela Moise 21
Lectures
bull Every Monday 1715-1900 at LFW B 1
bull Participation is not obligatory but highly recommended
bull 60 minutes lectures followed by 40 minutes interactivediscussions
bull Opportunity to discuss your projectbull Lectures at
httpwwwcossethzcheducationdatasciencehtml
Evangelos Pournaras Izabela Moise 22
Subjects I
1 Computational Social Science Applicationsndash Smart Grids geolocation traffic systems social sensingmining
privacyndash Tools amp platforms Nervousnet Twitter GDELT
2 Data Science Fundamentalsndash databases data types data collection data pre-processing
plotting visualization etcndash Tools Java AWK MySQL Gnuplot Gephi etc
3 Data Mining and Machine Learningndash classification clustering prediction neural networks etcndash Tools Weka
4 Big Data Analytics
Evangelos Pournaras Izabela Moise 23
Subjects II
ndash MapReduce parallel computing data streaming social mediaetc
ndash Tools Hadoop Spark Mahout Spark Streaming Storm etc
5 Otherndash Project presentations
Evangelos Pournaras Izabela Moise 24
Lectures OutlineLecture 01 (200216)Introduction amp Coursework OutlineLecture 02 (270216)Data Mining Machine Learning ampApplicationsLecture 03 (060316)Data Science Techniques ampApplicationsLecture 04 (130316)Data Mining Machine Learning ampApplicationsLecture 05 (200316)Data Science Techniques ampApplicationsLecture 06 (270316)Big Data Analytics amp Applications
Lecture 07 (030416)Big Data Analytics amp ApplicationsLecture 08 (100416)Data Science Techniques ampApplicationsLecture 09 (080516)Big Data Analytics amp ApplicationsLecture 10 (150516)Data Science Techniques ampApplicationsLecture 11 (220516)Oral PresentationsLecture 12 (290516)Oral Presentations
Evangelos Pournaras Izabela Moise 25
How to contact us
Communication
bull Discussion session in the course
bull E-mail with subject[DATA-SCIENCE-COURSE-2016]ltotherinfogtto
ndash Iza Moise imoiseethzchndash Evangelos Pournaras epournarasethzch
Supervision - strictly for issues not addressed in the course
bull Mondays 1500-1700Clausiusstrasse 50 (CLU C 4) 8092 Zurich
Evangelos Pournaras Izabela Moise 26
Proposed Literature
B Ellis
Real-Time Analytics Techniques to Analyze and Visualize Streaming Data
Wiley Publishing 1st edition 2014
J Han
Data Mining Concepts and Techniques
Morgan Kaufmann Publishers Inc San Francisco CA USA 2005
T White
Hadoop The Definitive Guide
OrsquoReilly Media Inc 2015
I H Witten E Frank and M A Hall
Data Mining Practical Machine Learning Tools and Techniques
Morgan Kaufmann Publishers Inc San Francisco CA USA 3rd edition2011
Evangelos Pournaras Izabela Moise 27
What is next
bull Seminar thesis
bull Examples and applications
Evangelos Pournaras Izabela Moise 28
Part 1 - Data Science
Evangelos Pournaras Izabela Moise 3
What is Data Science
A collection of orchestrated methods from different scientific fieldseg statistics computer science etc that provide understanding ofdomain data and result in data-based products and services
Evangelos Pournaras Izabela Moise 4
Is Data Science about Big Data I
Evangelos Pournaras Izabela Moise 5
Is Data Science about Big Data II
Itrsquos more about using the right dataand asking the right questions
Evangelos Pournaras Izabela Moise 6
What about Techno-socio-economic Systems
Evangelos Pournaras Izabela Moise 7
ICT amp Techno-socio-economic Systems
bull Embedded ICT systems in most societal domains How
bull Internet of Things pervasiveubiquitous computing advancednetworking systems inter-operability Result
bull A new explosion of data sources Opportunities
bull Understanding improving managing amp sustaining our complexsociety Threats
bull Privacy discrimination misinterpretations over-fitting etc
Evangelos Pournaras Izabela Moise 8
Threats I
Evangelos Pournaras Izabela Moise 9
Threats II
Evangelos Pournaras Izabela Moise 10
Who is a Data Scientist
bull A statistician
bull A computer programmer
bull Both and More
TipDomain knowledge can be more valuable than machine learning datamining etc
Evangelos Pournaras Izabela Moise 11
Real-world Profile I
Evangelos Pournaras Izabela Moise 12
Real-world Profile II
Evangelos Pournaras Izabela Moise 13
More about Data Scientists
httpshbrorg201210data-scientist-the-sexiest-job-of-the-21st-century
Evangelos Pournaras Izabela Moise 14
More about Data Scientists
Data scientistsrsquo most basic universal skill is the ability to writecode This may be less true in five yearsrsquo time when many morepeople will have the title data scientist on their business cardsMore enduring will be the need for data scientists to communicate inlanguage that all their stakeholders understand-and to demonstratethe special skills involved in storytelling with data whetherverbally visually or - ideally both
Evangelos Pournaras Izabela Moise 15
More about Data Scientists
But we would say the dominant trait among data scientists is anintense curiosity-a desire to go beneath the surface of a problemfind the questions at its heart and distill them into a very clearset of hypotheses that can be tested
Evangelos Pournaras Izabela Moise 16
More about Data Scientists
A quantitative analyst can be great at analyzing data but not atsubduing a mass of unstructured data and getting it into a form inwhich it can be analyzed
A data management expert might be great at generating andorganizing data in structured form but not at turning unstructureddata into structured data-and also not at actually analyzing thedata
And while people without strong social skills might thrive intraditional data professions data scientists must have suchskills to be effective
Evangelos Pournaras Izabela Moise 17
Part 2 - Course Description
Evangelos Pournaras Izabela Moise 18
Course ObjectivesQualify you with knowledge amp skills to tackle real-world problemsusing data
1 Acquiring domain knowledge and understanding2 Better understanding and interpretation of data
3 Awareness about the applicability of different data sciencemethods
4 Development of technical skills eg programming use ofdifferent tools etc
5 Presenting scientific results both written and orally
Evangelos Pournaras Izabela Moise 19
Course Prerequisites
Some programming skills are required eg skills for the material ofthis course
1 JavaC++Python
2 UNIX
Didnrsquot you have an opportunity to practice this earlier
No problem this is a golden opportunity
TipProgramming skills will make you more flexible and efficient datascientist
Evangelos Pournaras Izabela Moise 20
Assessment
bull Seminar thesis
bull 100 of the grade no exams
bull Detailed illustration in a next lecture
TipStart early Give the opportunity for your project and your skills todevelop during the course
Evangelos Pournaras Izabela Moise 21
Lectures
bull Every Monday 1715-1900 at LFW B 1
bull Participation is not obligatory but highly recommended
bull 60 minutes lectures followed by 40 minutes interactivediscussions
bull Opportunity to discuss your projectbull Lectures at
httpwwwcossethzcheducationdatasciencehtml
Evangelos Pournaras Izabela Moise 22
Subjects I
1 Computational Social Science Applicationsndash Smart Grids geolocation traffic systems social sensingmining
privacyndash Tools amp platforms Nervousnet Twitter GDELT
2 Data Science Fundamentalsndash databases data types data collection data pre-processing
plotting visualization etcndash Tools Java AWK MySQL Gnuplot Gephi etc
3 Data Mining and Machine Learningndash classification clustering prediction neural networks etcndash Tools Weka
4 Big Data Analytics
Evangelos Pournaras Izabela Moise 23
Subjects II
ndash MapReduce parallel computing data streaming social mediaetc
ndash Tools Hadoop Spark Mahout Spark Streaming Storm etc
5 Otherndash Project presentations
Evangelos Pournaras Izabela Moise 24
Lectures OutlineLecture 01 (200216)Introduction amp Coursework OutlineLecture 02 (270216)Data Mining Machine Learning ampApplicationsLecture 03 (060316)Data Science Techniques ampApplicationsLecture 04 (130316)Data Mining Machine Learning ampApplicationsLecture 05 (200316)Data Science Techniques ampApplicationsLecture 06 (270316)Big Data Analytics amp Applications
Lecture 07 (030416)Big Data Analytics amp ApplicationsLecture 08 (100416)Data Science Techniques ampApplicationsLecture 09 (080516)Big Data Analytics amp ApplicationsLecture 10 (150516)Data Science Techniques ampApplicationsLecture 11 (220516)Oral PresentationsLecture 12 (290516)Oral Presentations
Evangelos Pournaras Izabela Moise 25
How to contact us
Communication
bull Discussion session in the course
bull E-mail with subject[DATA-SCIENCE-COURSE-2016]ltotherinfogtto
ndash Iza Moise imoiseethzchndash Evangelos Pournaras epournarasethzch
Supervision - strictly for issues not addressed in the course
bull Mondays 1500-1700Clausiusstrasse 50 (CLU C 4) 8092 Zurich
Evangelos Pournaras Izabela Moise 26
Proposed Literature
B Ellis
Real-Time Analytics Techniques to Analyze and Visualize Streaming Data
Wiley Publishing 1st edition 2014
J Han
Data Mining Concepts and Techniques
Morgan Kaufmann Publishers Inc San Francisco CA USA 2005
T White
Hadoop The Definitive Guide
OrsquoReilly Media Inc 2015
I H Witten E Frank and M A Hall
Data Mining Practical Machine Learning Tools and Techniques
Morgan Kaufmann Publishers Inc San Francisco CA USA 3rd edition2011
Evangelos Pournaras Izabela Moise 27
What is next
bull Seminar thesis
bull Examples and applications
Evangelos Pournaras Izabela Moise 28
What is Data Science
A collection of orchestrated methods from different scientific fieldseg statistics computer science etc that provide understanding ofdomain data and result in data-based products and services
Evangelos Pournaras Izabela Moise 4
Is Data Science about Big Data I
Evangelos Pournaras Izabela Moise 5
Is Data Science about Big Data II
Itrsquos more about using the right dataand asking the right questions
Evangelos Pournaras Izabela Moise 6
What about Techno-socio-economic Systems
Evangelos Pournaras Izabela Moise 7
ICT amp Techno-socio-economic Systems
bull Embedded ICT systems in most societal domains How
bull Internet of Things pervasiveubiquitous computing advancednetworking systems inter-operability Result
bull A new explosion of data sources Opportunities
bull Understanding improving managing amp sustaining our complexsociety Threats
bull Privacy discrimination misinterpretations over-fitting etc
Evangelos Pournaras Izabela Moise 8
Threats I
Evangelos Pournaras Izabela Moise 9
Threats II
Evangelos Pournaras Izabela Moise 10
Who is a Data Scientist
bull A statistician
bull A computer programmer
bull Both and More
TipDomain knowledge can be more valuable than machine learning datamining etc
Evangelos Pournaras Izabela Moise 11
Real-world Profile I
Evangelos Pournaras Izabela Moise 12
Real-world Profile II
Evangelos Pournaras Izabela Moise 13
More about Data Scientists
httpshbrorg201210data-scientist-the-sexiest-job-of-the-21st-century
Evangelos Pournaras Izabela Moise 14
More about Data Scientists
Data scientistsrsquo most basic universal skill is the ability to writecode This may be less true in five yearsrsquo time when many morepeople will have the title data scientist on their business cardsMore enduring will be the need for data scientists to communicate inlanguage that all their stakeholders understand-and to demonstratethe special skills involved in storytelling with data whetherverbally visually or - ideally both
Evangelos Pournaras Izabela Moise 15
More about Data Scientists
But we would say the dominant trait among data scientists is anintense curiosity-a desire to go beneath the surface of a problemfind the questions at its heart and distill them into a very clearset of hypotheses that can be tested
Evangelos Pournaras Izabela Moise 16
More about Data Scientists
A quantitative analyst can be great at analyzing data but not atsubduing a mass of unstructured data and getting it into a form inwhich it can be analyzed
A data management expert might be great at generating andorganizing data in structured form but not at turning unstructureddata into structured data-and also not at actually analyzing thedata
And while people without strong social skills might thrive intraditional data professions data scientists must have suchskills to be effective
Evangelos Pournaras Izabela Moise 17
Part 2 - Course Description
Evangelos Pournaras Izabela Moise 18
Course ObjectivesQualify you with knowledge amp skills to tackle real-world problemsusing data
1 Acquiring domain knowledge and understanding2 Better understanding and interpretation of data
3 Awareness about the applicability of different data sciencemethods
4 Development of technical skills eg programming use ofdifferent tools etc
5 Presenting scientific results both written and orally
Evangelos Pournaras Izabela Moise 19
Course Prerequisites
Some programming skills are required eg skills for the material ofthis course
1 JavaC++Python
2 UNIX
Didnrsquot you have an opportunity to practice this earlier
No problem this is a golden opportunity
TipProgramming skills will make you more flexible and efficient datascientist
Evangelos Pournaras Izabela Moise 20
Assessment
bull Seminar thesis
bull 100 of the grade no exams
bull Detailed illustration in a next lecture
TipStart early Give the opportunity for your project and your skills todevelop during the course
Evangelos Pournaras Izabela Moise 21
Lectures
bull Every Monday 1715-1900 at LFW B 1
bull Participation is not obligatory but highly recommended
bull 60 minutes lectures followed by 40 minutes interactivediscussions
bull Opportunity to discuss your projectbull Lectures at
httpwwwcossethzcheducationdatasciencehtml
Evangelos Pournaras Izabela Moise 22
Subjects I
1 Computational Social Science Applicationsndash Smart Grids geolocation traffic systems social sensingmining
privacyndash Tools amp platforms Nervousnet Twitter GDELT
2 Data Science Fundamentalsndash databases data types data collection data pre-processing
plotting visualization etcndash Tools Java AWK MySQL Gnuplot Gephi etc
3 Data Mining and Machine Learningndash classification clustering prediction neural networks etcndash Tools Weka
4 Big Data Analytics
Evangelos Pournaras Izabela Moise 23
Subjects II
ndash MapReduce parallel computing data streaming social mediaetc
ndash Tools Hadoop Spark Mahout Spark Streaming Storm etc
5 Otherndash Project presentations
Evangelos Pournaras Izabela Moise 24
Lectures OutlineLecture 01 (200216)Introduction amp Coursework OutlineLecture 02 (270216)Data Mining Machine Learning ampApplicationsLecture 03 (060316)Data Science Techniques ampApplicationsLecture 04 (130316)Data Mining Machine Learning ampApplicationsLecture 05 (200316)Data Science Techniques ampApplicationsLecture 06 (270316)Big Data Analytics amp Applications
Lecture 07 (030416)Big Data Analytics amp ApplicationsLecture 08 (100416)Data Science Techniques ampApplicationsLecture 09 (080516)Big Data Analytics amp ApplicationsLecture 10 (150516)Data Science Techniques ampApplicationsLecture 11 (220516)Oral PresentationsLecture 12 (290516)Oral Presentations
Evangelos Pournaras Izabela Moise 25
How to contact us
Communication
bull Discussion session in the course
bull E-mail with subject[DATA-SCIENCE-COURSE-2016]ltotherinfogtto
ndash Iza Moise imoiseethzchndash Evangelos Pournaras epournarasethzch
Supervision - strictly for issues not addressed in the course
bull Mondays 1500-1700Clausiusstrasse 50 (CLU C 4) 8092 Zurich
Evangelos Pournaras Izabela Moise 26
Proposed Literature
B Ellis
Real-Time Analytics Techniques to Analyze and Visualize Streaming Data
Wiley Publishing 1st edition 2014
J Han
Data Mining Concepts and Techniques
Morgan Kaufmann Publishers Inc San Francisco CA USA 2005
T White
Hadoop The Definitive Guide
OrsquoReilly Media Inc 2015
I H Witten E Frank and M A Hall
Data Mining Practical Machine Learning Tools and Techniques
Morgan Kaufmann Publishers Inc San Francisco CA USA 3rd edition2011
Evangelos Pournaras Izabela Moise 27
What is next
bull Seminar thesis
bull Examples and applications
Evangelos Pournaras Izabela Moise 28
Is Data Science about Big Data I
Evangelos Pournaras Izabela Moise 5
Is Data Science about Big Data II
Itrsquos more about using the right dataand asking the right questions
Evangelos Pournaras Izabela Moise 6
What about Techno-socio-economic Systems
Evangelos Pournaras Izabela Moise 7
ICT amp Techno-socio-economic Systems
bull Embedded ICT systems in most societal domains How
bull Internet of Things pervasiveubiquitous computing advancednetworking systems inter-operability Result
bull A new explosion of data sources Opportunities
bull Understanding improving managing amp sustaining our complexsociety Threats
bull Privacy discrimination misinterpretations over-fitting etc
Evangelos Pournaras Izabela Moise 8
Threats I
Evangelos Pournaras Izabela Moise 9
Threats II
Evangelos Pournaras Izabela Moise 10
Who is a Data Scientist
bull A statistician
bull A computer programmer
bull Both and More
TipDomain knowledge can be more valuable than machine learning datamining etc
Evangelos Pournaras Izabela Moise 11
Real-world Profile I
Evangelos Pournaras Izabela Moise 12
Real-world Profile II
Evangelos Pournaras Izabela Moise 13
More about Data Scientists
httpshbrorg201210data-scientist-the-sexiest-job-of-the-21st-century
Evangelos Pournaras Izabela Moise 14
More about Data Scientists
Data scientistsrsquo most basic universal skill is the ability to writecode This may be less true in five yearsrsquo time when many morepeople will have the title data scientist on their business cardsMore enduring will be the need for data scientists to communicate inlanguage that all their stakeholders understand-and to demonstratethe special skills involved in storytelling with data whetherverbally visually or - ideally both
Evangelos Pournaras Izabela Moise 15
More about Data Scientists
But we would say the dominant trait among data scientists is anintense curiosity-a desire to go beneath the surface of a problemfind the questions at its heart and distill them into a very clearset of hypotheses that can be tested
Evangelos Pournaras Izabela Moise 16
More about Data Scientists
A quantitative analyst can be great at analyzing data but not atsubduing a mass of unstructured data and getting it into a form inwhich it can be analyzed
A data management expert might be great at generating andorganizing data in structured form but not at turning unstructureddata into structured data-and also not at actually analyzing thedata
And while people without strong social skills might thrive intraditional data professions data scientists must have suchskills to be effective
Evangelos Pournaras Izabela Moise 17
Part 2 - Course Description
Evangelos Pournaras Izabela Moise 18
Course ObjectivesQualify you with knowledge amp skills to tackle real-world problemsusing data
1 Acquiring domain knowledge and understanding2 Better understanding and interpretation of data
3 Awareness about the applicability of different data sciencemethods
4 Development of technical skills eg programming use ofdifferent tools etc
5 Presenting scientific results both written and orally
Evangelos Pournaras Izabela Moise 19
Course Prerequisites
Some programming skills are required eg skills for the material ofthis course
1 JavaC++Python
2 UNIX
Didnrsquot you have an opportunity to practice this earlier
No problem this is a golden opportunity
TipProgramming skills will make you more flexible and efficient datascientist
Evangelos Pournaras Izabela Moise 20
Assessment
bull Seminar thesis
bull 100 of the grade no exams
bull Detailed illustration in a next lecture
TipStart early Give the opportunity for your project and your skills todevelop during the course
Evangelos Pournaras Izabela Moise 21
Lectures
bull Every Monday 1715-1900 at LFW B 1
bull Participation is not obligatory but highly recommended
bull 60 minutes lectures followed by 40 minutes interactivediscussions
bull Opportunity to discuss your projectbull Lectures at
httpwwwcossethzcheducationdatasciencehtml
Evangelos Pournaras Izabela Moise 22
Subjects I
1 Computational Social Science Applicationsndash Smart Grids geolocation traffic systems social sensingmining
privacyndash Tools amp platforms Nervousnet Twitter GDELT
2 Data Science Fundamentalsndash databases data types data collection data pre-processing
plotting visualization etcndash Tools Java AWK MySQL Gnuplot Gephi etc
3 Data Mining and Machine Learningndash classification clustering prediction neural networks etcndash Tools Weka
4 Big Data Analytics
Evangelos Pournaras Izabela Moise 23
Subjects II
ndash MapReduce parallel computing data streaming social mediaetc
ndash Tools Hadoop Spark Mahout Spark Streaming Storm etc
5 Otherndash Project presentations
Evangelos Pournaras Izabela Moise 24
Lectures Outline
Lecture 01 (20.02.16): Introduction & Coursework Outline
Lecture 02 (27.02.16): Data Mining, Machine Learning & Applications
Lecture 03 (06.03.16): Data Science Techniques & Applications
Lecture 04 (13.03.16): Data Mining, Machine Learning & Applications
Lecture 05 (20.03.16): Data Science Techniques & Applications
Lecture 06 (27.03.16): Big Data Analytics & Applications
Lecture 07 (03.04.16): Big Data Analytics & Applications
Lecture 08 (10.04.16): Data Science Techniques & Applications
Lecture 09 (08.05.16): Big Data Analytics & Applications
Lecture 10 (15.05.16): Data Science Techniques & Applications
Lecture 11 (22.05.16): Oral Presentations
Lecture 12 (29.05.16): Oral Presentations
How to contact us
Communication
• Discussion session in the course
• E-mail with subject "[DATA-SCIENCE-COURSE-2016]<otherinfo>" to:
   – Iza Moise: imoise@ethz.ch
   – Evangelos Pournaras: epournaras@ethz.ch
Supervision - strictly for issues not addressed in the course
• Mondays, 15:00-17:00, Clausiusstrasse 50 (CLU C 4), 8092 Zurich
Proposed Literature
B. Ellis.
Real-Time Analytics: Techniques to Analyze and Visualize Streaming Data.
Wiley Publishing, 1st edition, 2014.
J. Han.
Data Mining: Concepts and Techniques.
Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2005.
T. White.
Hadoop: The Definitive Guide.
O'Reilly Media, Inc., 2015.
I. H. Witten, E. Frank, and M. A. Hall.
Data Mining: Practical Machine Learning Tools and Techniques.
Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 3rd edition, 2011.
What is next?
• Seminar thesis
• Examples and applications
More about Data Scientists
But we would say the dominant trait among data scientists is an intense curiosity - a desire to go beneath the surface of a problem, find the questions at its heart, and distill them into a very clear set of hypotheses that can be tested.
More about Data Scientists
A quantitative analyst can be great at analyzing data but not at subduing a mass of unstructured data and getting it into a form in which it can be analyzed.
A data management expert might be great at generating and organizing data in structured form but not at turning unstructured data into structured data - and also not at actually analyzing the data.
And while people without strong social skills might thrive in traditional data professions, data scientists must have such skills to be effective.
Part 2 - Course Description
Course Objectives
Qualify you with knowledge & skills to tackle real-world problems using data:
1. Acquiring domain knowledge and understanding
2. Better understanding and interpretation of data
3. Awareness of the applicability of different data science methods
4. Development of technical skills, e.g., programming, use of different tools, etc.
5. Presenting scientific results, both in writing and orally
Evangelos Pournaras Izabela Moise 19
Course Prerequisites
Some programming skills are required eg skills for the material ofthis course
1 JavaC++Python
2 UNIX
Didnrsquot you have an opportunity to practice this earlier
No problem this is a golden opportunity
TipProgramming skills will make you more flexible and efficient datascientist
Evangelos Pournaras Izabela Moise 20
Assessment
bull Seminar thesis
bull 100 of the grade no exams
bull Detailed illustration in a next lecture
TipStart early Give the opportunity for your project and your skills todevelop during the course
Evangelos Pournaras Izabela Moise 21
Lectures
bull Every Monday 1715-1900 at LFW B 1
bull Participation is not obligatory but highly recommended
bull 60 minutes lectures followed by 40 minutes interactivediscussions
bull Opportunity to discuss your projectbull Lectures at
httpwwwcossethzcheducationdatasciencehtml
Evangelos Pournaras Izabela Moise 22
Subjects I
1 Computational Social Science Applicationsndash Smart Grids geolocation traffic systems social sensingmining
privacyndash Tools amp platforms Nervousnet Twitter GDELT
2 Data Science Fundamentalsndash databases data types data collection data pre-processing
plotting visualization etcndash Tools Java AWK MySQL Gnuplot Gephi etc
3 Data Mining and Machine Learningndash classification clustering prediction neural networks etcndash Tools Weka
4 Big Data Analytics
Evangelos Pournaras Izabela Moise 23
Subjects II
ndash MapReduce parallel computing data streaming social mediaetc
ndash Tools Hadoop Spark Mahout Spark Streaming Storm etc
5 Otherndash Project presentations
Evangelos Pournaras Izabela Moise 24
Lectures OutlineLecture 01 (200216)Introduction amp Coursework OutlineLecture 02 (270216)Data Mining Machine Learning ampApplicationsLecture 03 (060316)Data Science Techniques ampApplicationsLecture 04 (130316)Data Mining Machine Learning ampApplicationsLecture 05 (200316)Data Science Techniques ampApplicationsLecture 06 (270316)Big Data Analytics amp Applications
Lecture 07 (030416)Big Data Analytics amp ApplicationsLecture 08 (100416)Data Science Techniques ampApplicationsLecture 09 (080516)Big Data Analytics amp ApplicationsLecture 10 (150516)Data Science Techniques ampApplicationsLecture 11 (220516)Oral PresentationsLecture 12 (290516)Oral Presentations
Evangelos Pournaras Izabela Moise 25
How to contact us
Communication
bull Discussion session in the course
bull E-mail with subject[DATA-SCIENCE-COURSE-2016]ltotherinfogtto
ndash Iza Moise imoiseethzchndash Evangelos Pournaras epournarasethzch
Supervision - strictly for issues not addressed in the course
bull Mondays 1500-1700Clausiusstrasse 50 (CLU C 4) 8092 Zurich
Evangelos Pournaras Izabela Moise 26
Proposed Literature
B Ellis
Real-Time Analytics Techniques to Analyze and Visualize Streaming Data
Wiley Publishing 1st edition 2014
J Han
Data Mining Concepts and Techniques
Morgan Kaufmann Publishers Inc San Francisco CA USA 2005
T White
Hadoop The Definitive Guide
OrsquoReilly Media Inc 2015
I H Witten E Frank and M A Hall
Data Mining Practical Machine Learning Tools and Techniques
Morgan Kaufmann Publishers Inc San Francisco CA USA 3rd edition2011
Evangelos Pournaras Izabela Moise 27
What is next
bull Seminar thesis
bull Examples and applications
Evangelos Pournaras Izabela Moise 28
What about Techno-socio-economic Systems
Evangelos Pournaras Izabela Moise 7
ICT amp Techno-socio-economic Systems
bull Embedded ICT systems in most societal domains How
bull Internet of Things pervasiveubiquitous computing advancednetworking systems inter-operability Result
bull A new explosion of data sources Opportunities
bull Understanding improving managing amp sustaining our complexsociety Threats
bull Privacy discrimination misinterpretations over-fitting etc
Evangelos Pournaras Izabela Moise 8
Threats I
Evangelos Pournaras Izabela Moise 9
Threats II
Evangelos Pournaras Izabela Moise 10
Who is a Data Scientist
bull A statistician
bull A computer programmer
bull Both and More
TipDomain knowledge can be more valuable than machine learning datamining etc
Evangelos Pournaras Izabela Moise 11
Real-world Profile I
Evangelos Pournaras Izabela Moise 12
Real-world Profile II
Evangelos Pournaras Izabela Moise 13
More about Data Scientists
httpshbrorg201210data-scientist-the-sexiest-job-of-the-21st-century
Evangelos Pournaras Izabela Moise 14
More about Data Scientists
Data scientistsrsquo most basic universal skill is the ability to writecode This may be less true in five yearsrsquo time when many morepeople will have the title data scientist on their business cardsMore enduring will be the need for data scientists to communicate inlanguage that all their stakeholders understand-and to demonstratethe special skills involved in storytelling with data whetherverbally visually or - ideally both
Evangelos Pournaras Izabela Moise 15
More about Data Scientists
But we would say the dominant trait among data scientists is anintense curiosity-a desire to go beneath the surface of a problemfind the questions at its heart and distill them into a very clearset of hypotheses that can be tested
Evangelos Pournaras Izabela Moise 16
More about Data Scientists
A quantitative analyst can be great at analyzing data but not atsubduing a mass of unstructured data and getting it into a form inwhich it can be analyzed
A data management expert might be great at generating andorganizing data in structured form but not at turning unstructureddata into structured data-and also not at actually analyzing thedata
And while people without strong social skills might thrive intraditional data professions data scientists must have suchskills to be effective
Evangelos Pournaras Izabela Moise 17
Part 2 - Course Description
Evangelos Pournaras Izabela Moise 18
Course ObjectivesQualify you with knowledge amp skills to tackle real-world problemsusing data
1 Acquiring domain knowledge and understanding2 Better understanding and interpretation of data
3 Awareness about the applicability of different data sciencemethods
4 Development of technical skills eg programming use ofdifferent tools etc
5 Presenting scientific results both written and orally
Evangelos Pournaras Izabela Moise 19
Course Prerequisites
Some programming skills are required eg skills for the material ofthis course
1 JavaC++Python
2 UNIX
Didnrsquot you have an opportunity to practice this earlier
No problem this is a golden opportunity
TipProgramming skills will make you more flexible and efficient datascientist
Evangelos Pournaras Izabela Moise 20
Assessment
bull Seminar thesis
bull 100 of the grade no exams
bull Detailed illustration in a next lecture
TipStart early Give the opportunity for your project and your skills todevelop during the course
Evangelos Pournaras Izabela Moise 21
Lectures
bull Every Monday 1715-1900 at LFW B 1
bull Participation is not obligatory but highly recommended
bull 60 minutes lectures followed by 40 minutes interactivediscussions
bull Opportunity to discuss your projectbull Lectures at
httpwwwcossethzcheducationdatasciencehtml
Evangelos Pournaras Izabela Moise 22
Subjects I
1 Computational Social Science Applicationsndash Smart Grids geolocation traffic systems social sensingmining
privacyndash Tools amp platforms Nervousnet Twitter GDELT
2 Data Science Fundamentalsndash databases data types data collection data pre-processing
plotting visualization etcndash Tools Java AWK MySQL Gnuplot Gephi etc
3 Data Mining and Machine Learningndash classification clustering prediction neural networks etcndash Tools Weka
4 Big Data Analytics
Evangelos Pournaras Izabela Moise 23
Subjects II
ndash MapReduce parallel computing data streaming social mediaetc
ndash Tools Hadoop Spark Mahout Spark Streaming Storm etc
5 Otherndash Project presentations
Evangelos Pournaras Izabela Moise 24
Lectures OutlineLecture 01 (200216)Introduction amp Coursework OutlineLecture 02 (270216)Data Mining Machine Learning ampApplicationsLecture 03 (060316)Data Science Techniques ampApplicationsLecture 04 (130316)Data Mining Machine Learning ampApplicationsLecture 05 (200316)Data Science Techniques ampApplicationsLecture 06 (270316)Big Data Analytics amp Applications
Lecture 07 (030416)Big Data Analytics amp ApplicationsLecture 08 (100416)Data Science Techniques ampApplicationsLecture 09 (080516)Big Data Analytics amp ApplicationsLecture 10 (150516)Data Science Techniques ampApplicationsLecture 11 (220516)Oral PresentationsLecture 12 (290516)Oral Presentations
Evangelos Pournaras Izabela Moise 25
How to contact us
Communication
bull Discussion session in the course
bull E-mail with subject[DATA-SCIENCE-COURSE-2016]ltotherinfogtto
ndash Iza Moise imoiseethzchndash Evangelos Pournaras epournarasethzch
Supervision - strictly for issues not addressed in the course
bull Mondays 1500-1700Clausiusstrasse 50 (CLU C 4) 8092 Zurich
Evangelos Pournaras Izabela Moise 26
Proposed Literature
B Ellis
Real-Time Analytics Techniques to Analyze and Visualize Streaming Data
Wiley Publishing 1st edition 2014
J Han
Data Mining Concepts and Techniques
Morgan Kaufmann Publishers Inc San Francisco CA USA 2005
T White
Hadoop The Definitive Guide
OrsquoReilly Media Inc 2015
I H Witten E Frank and M A Hall
Data Mining Practical Machine Learning Tools and Techniques
Morgan Kaufmann Publishers Inc San Francisco CA USA 3rd edition2011
Evangelos Pournaras Izabela Moise 27
What is next
bull Seminar thesis
bull Examples and applications
Evangelos Pournaras Izabela Moise 28
ICT amp Techno-socio-economic Systems
bull Embedded ICT systems in most societal domains How
bull Internet of Things pervasiveubiquitous computing advancednetworking systems inter-operability Result
bull A new explosion of data sources Opportunities
bull Understanding improving managing amp sustaining our complexsociety Threats
bull Privacy discrimination misinterpretations over-fitting etc
Evangelos Pournaras Izabela Moise 8
Threats I
Evangelos Pournaras Izabela Moise 9
Threats II
Evangelos Pournaras Izabela Moise 10
Who is a Data Scientist
bull A statistician
bull A computer programmer
bull Both and More
TipDomain knowledge can be more valuable than machine learning datamining etc
Evangelos Pournaras Izabela Moise 11
Real-world Profile I
Evangelos Pournaras Izabela Moise 12
Real-world Profile II
Evangelos Pournaras Izabela Moise 13
More about Data Scientists
httpshbrorg201210data-scientist-the-sexiest-job-of-the-21st-century
Evangelos Pournaras Izabela Moise 14
More about Data Scientists
Data scientistsrsquo most basic universal skill is the ability to writecode This may be less true in five yearsrsquo time when many morepeople will have the title data scientist on their business cardsMore enduring will be the need for data scientists to communicate inlanguage that all their stakeholders understand-and to demonstratethe special skills involved in storytelling with data whetherverbally visually or - ideally both
Evangelos Pournaras Izabela Moise 15
More about Data Scientists
But we would say the dominant trait among data scientists is anintense curiosity-a desire to go beneath the surface of a problemfind the questions at its heart and distill them into a very clearset of hypotheses that can be tested
Evangelos Pournaras Izabela Moise 16
More about Data Scientists
A quantitative analyst can be great at analyzing data but not atsubduing a mass of unstructured data and getting it into a form inwhich it can be analyzed
A data management expert might be great at generating andorganizing data in structured form but not at turning unstructureddata into structured data-and also not at actually analyzing thedata
And while people without strong social skills might thrive intraditional data professions data scientists must have suchskills to be effective
Evangelos Pournaras Izabela Moise 17
Part 2 - Course Description
Evangelos Pournaras Izabela Moise 18
Course ObjectivesQualify you with knowledge amp skills to tackle real-world problemsusing data
1 Acquiring domain knowledge and understanding2 Better understanding and interpretation of data
3 Awareness about the applicability of different data sciencemethods
4 Development of technical skills eg programming use ofdifferent tools etc
5 Presenting scientific results both written and orally
Evangelos Pournaras Izabela Moise 19
Course Prerequisites
Some programming skills are required eg skills for the material ofthis course
1 JavaC++Python
2 UNIX
Didnrsquot you have an opportunity to practice this earlier
No problem this is a golden opportunity
TipProgramming skills will make you more flexible and efficient datascientist
Evangelos Pournaras Izabela Moise 20
Assessment
bull Seminar thesis
bull 100 of the grade no exams
bull Detailed illustration in a next lecture
TipStart early Give the opportunity for your project and your skills todevelop during the course
Evangelos Pournaras Izabela Moise 21
Lectures
bull Every Monday 1715-1900 at LFW B 1
bull Participation is not obligatory but highly recommended
bull 60 minutes lectures followed by 40 minutes interactivediscussions
bull Opportunity to discuss your projectbull Lectures at
httpwwwcossethzcheducationdatasciencehtml
Evangelos Pournaras Izabela Moise 22
Subjects I
1 Computational Social Science Applicationsndash Smart Grids geolocation traffic systems social sensingmining
privacyndash Tools amp platforms Nervousnet Twitter GDELT
2 Data Science Fundamentalsndash databases data types data collection data pre-processing
plotting visualization etcndash Tools Java AWK MySQL Gnuplot Gephi etc
3 Data Mining and Machine Learningndash classification clustering prediction neural networks etcndash Tools Weka
4 Big Data Analytics
Evangelos Pournaras Izabela Moise 23
Subjects II
ndash MapReduce parallel computing data streaming social mediaetc
ndash Tools Hadoop Spark Mahout Spark Streaming Storm etc
5 Otherndash Project presentations
Evangelos Pournaras Izabela Moise 24
Lectures OutlineLecture 01 (200216)Introduction amp Coursework OutlineLecture 02 (270216)Data Mining Machine Learning ampApplicationsLecture 03 (060316)Data Science Techniques ampApplicationsLecture 04 (130316)Data Mining Machine Learning ampApplicationsLecture 05 (200316)Data Science Techniques ampApplicationsLecture 06 (270316)Big Data Analytics amp Applications
Lecture 07 (030416)Big Data Analytics amp ApplicationsLecture 08 (100416)Data Science Techniques ampApplicationsLecture 09 (080516)Big Data Analytics amp ApplicationsLecture 10 (150516)Data Science Techniques ampApplicationsLecture 11 (220516)Oral PresentationsLecture 12 (290516)Oral Presentations
Evangelos Pournaras Izabela Moise 25
How to contact us
Communication
bull Discussion session in the course
bull E-mail with subject[DATA-SCIENCE-COURSE-2016]ltotherinfogtto
ndash Iza Moise imoiseethzchndash Evangelos Pournaras epournarasethzch
Supervision - strictly for issues not addressed in the course
bull Mondays 1500-1700Clausiusstrasse 50 (CLU C 4) 8092 Zurich
Evangelos Pournaras Izabela Moise 26
Proposed Literature
B Ellis
Real-Time Analytics Techniques to Analyze and Visualize Streaming Data
Wiley Publishing 1st edition 2014
J Han
Data Mining Concepts and Techniques
Morgan Kaufmann Publishers Inc San Francisco CA USA 2005
T White
Hadoop The Definitive Guide
OrsquoReilly Media Inc 2015
I H Witten E Frank and M A Hall
Data Mining Practical Machine Learning Tools and Techniques
Morgan Kaufmann Publishers Inc San Francisco CA USA 3rd edition2011
Evangelos Pournaras Izabela Moise 27
What is next
bull Seminar thesis
bull Examples and applications
Evangelos Pournaras Izabela Moise 28
Threats I
Evangelos Pournaras Izabela Moise 9
Threats II
Evangelos Pournaras Izabela Moise 10
Who is a Data Scientist
bull A statistician
bull A computer programmer
bull Both and More
TipDomain knowledge can be more valuable than machine learning datamining etc
Evangelos Pournaras Izabela Moise 11
Real-world Profile I
Evangelos Pournaras Izabela Moise 12
Real-world Profile II
Evangelos Pournaras Izabela Moise 13
More about Data Scientists
httpshbrorg201210data-scientist-the-sexiest-job-of-the-21st-century
Evangelos Pournaras Izabela Moise 14
More about Data Scientists
Data scientistsrsquo most basic universal skill is the ability to writecode This may be less true in five yearsrsquo time when many morepeople will have the title data scientist on their business cardsMore enduring will be the need for data scientists to communicate inlanguage that all their stakeholders understand-and to demonstratethe special skills involved in storytelling with data whetherverbally visually or - ideally both
Evangelos Pournaras Izabela Moise 15
More about Data Scientists
But we would say the dominant trait among data scientists is anintense curiosity-a desire to go beneath the surface of a problemfind the questions at its heart and distill them into a very clearset of hypotheses that can be tested
Evangelos Pournaras Izabela Moise 16
More about Data Scientists
A quantitative analyst can be great at analyzing data but not atsubduing a mass of unstructured data and getting it into a form inwhich it can be analyzed
A data management expert might be great at generating andorganizing data in structured form but not at turning unstructureddata into structured data-and also not at actually analyzing thedata
And while people without strong social skills might thrive intraditional data professions data scientists must have suchskills to be effective
Evangelos Pournaras Izabela Moise 17
Part 2 - Course Description
Evangelos Pournaras Izabela Moise 18
Course ObjectivesQualify you with knowledge amp skills to tackle real-world problemsusing data
1 Acquiring domain knowledge and understanding2 Better understanding and interpretation of data
3 Awareness about the applicability of different data sciencemethods
4 Development of technical skills eg programming use ofdifferent tools etc
5 Presenting scientific results both written and orally
Evangelos Pournaras Izabela Moise 19
Course Prerequisites
Some programming skills are required eg skills for the material ofthis course
1 JavaC++Python
2 UNIX
Didnrsquot you have an opportunity to practice this earlier
No problem this is a golden opportunity
TipProgramming skills will make you more flexible and efficient datascientist
Evangelos Pournaras Izabela Moise 20
Assessment
bull Seminar thesis
bull 100 of the grade no exams
bull Detailed illustration in a next lecture
TipStart early Give the opportunity for your project and your skills todevelop during the course
Evangelos Pournaras Izabela Moise 21
Lectures
bull Every Monday 1715-1900 at LFW B 1
bull Participation is not obligatory but highly recommended
bull 60 minutes lectures followed by 40 minutes interactivediscussions
bull Opportunity to discuss your projectbull Lectures at
httpwwwcossethzcheducationdatasciencehtml
Evangelos Pournaras Izabela Moise 22
Subjects I
1 Computational Social Science Applicationsndash Smart Grids geolocation traffic systems social sensingmining
privacyndash Tools amp platforms Nervousnet Twitter GDELT
2 Data Science Fundamentalsndash databases data types data collection data pre-processing
plotting visualization etcndash Tools Java AWK MySQL Gnuplot Gephi etc
3 Data Mining and Machine Learningndash classification clustering prediction neural networks etcndash Tools Weka
4 Big Data Analytics
Evangelos Pournaras Izabela Moise 23
Subjects II
ndash MapReduce parallel computing data streaming social mediaetc
ndash Tools Hadoop Spark Mahout Spark Streaming Storm etc
5 Otherndash Project presentations
Evangelos Pournaras Izabela Moise 24
Lectures OutlineLecture 01 (200216)Introduction amp Coursework OutlineLecture 02 (270216)Data Mining Machine Learning ampApplicationsLecture 03 (060316)Data Science Techniques ampApplicationsLecture 04 (130316)Data Mining Machine Learning ampApplicationsLecture 05 (200316)Data Science Techniques ampApplicationsLecture 06 (270316)Big Data Analytics amp Applications
Lecture 07 (030416)Big Data Analytics amp ApplicationsLecture 08 (100416)Data Science Techniques ampApplicationsLecture 09 (080516)Big Data Analytics amp ApplicationsLecture 10 (150516)Data Science Techniques ampApplicationsLecture 11 (220516)Oral PresentationsLecture 12 (290516)Oral Presentations
Evangelos Pournaras Izabela Moise 25
How to contact us
Communication
bull Discussion session in the course
bull E-mail with subject[DATA-SCIENCE-COURSE-2016]ltotherinfogtto
ndash Iza Moise imoiseethzchndash Evangelos Pournaras epournarasethzch
Supervision - strictly for issues not addressed in the course
bull Mondays 1500-1700Clausiusstrasse 50 (CLU C 4) 8092 Zurich
Evangelos Pournaras Izabela Moise 26
Proposed Literature
B Ellis
Real-Time Analytics Techniques to Analyze and Visualize Streaming Data
Wiley Publishing 1st edition 2014
J Han
Data Mining Concepts and Techniques
Morgan Kaufmann Publishers Inc San Francisco CA USA 2005
T White
Hadoop The Definitive Guide
OrsquoReilly Media Inc 2015
I H Witten E Frank and M A Hall
Data Mining Practical Machine Learning Tools and Techniques
Morgan Kaufmann Publishers Inc San Francisco CA USA 3rd edition2011
Evangelos Pournaras Izabela Moise 27
What is next
bull Seminar thesis
bull Examples and applications
Evangelos Pournaras Izabela Moise 28
Threats II
Evangelos Pournaras Izabela Moise 10
Who is a Data Scientist
bull A statistician
bull A computer programmer
bull Both and More
TipDomain knowledge can be more valuable than machine learning datamining etc
Evangelos Pournaras Izabela Moise 11
Real-world Profile I
Evangelos Pournaras Izabela Moise 12
Real-world Profile II
Evangelos Pournaras Izabela Moise 13
More about Data Scientists
httpshbrorg201210data-scientist-the-sexiest-job-of-the-21st-century
Evangelos Pournaras Izabela Moise 14
More about Data Scientists
Data scientistsrsquo most basic universal skill is the ability to writecode This may be less true in five yearsrsquo time when many morepeople will have the title data scientist on their business cardsMore enduring will be the need for data scientists to communicate inlanguage that all their stakeholders understand-and to demonstratethe special skills involved in storytelling with data whetherverbally visually or - ideally both
Evangelos Pournaras Izabela Moise 15
More about Data Scientists
But we would say the dominant trait among data scientists is anintense curiosity-a desire to go beneath the surface of a problemfind the questions at its heart and distill them into a very clearset of hypotheses that can be tested
Evangelos Pournaras Izabela Moise 16
More about Data Scientists
A quantitative analyst can be great at analyzing data but not atsubduing a mass of unstructured data and getting it into a form inwhich it can be analyzed
A data management expert might be great at generating andorganizing data in structured form but not at turning unstructureddata into structured data-and also not at actually analyzing thedata
And while people without strong social skills might thrive intraditional data professions data scientists must have suchskills to be effective
Evangelos Pournaras Izabela Moise 17
Part 2 - Course Description
Evangelos Pournaras Izabela Moise 18
Course ObjectivesQualify you with knowledge amp skills to tackle real-world problemsusing data
1 Acquiring domain knowledge and understanding2 Better understanding and interpretation of data
3 Awareness about the applicability of different data sciencemethods
4 Development of technical skills eg programming use ofdifferent tools etc
5 Presenting scientific results both written and orally
Evangelos Pournaras Izabela Moise 19
Course Prerequisites
Some programming skills are required eg skills for the material ofthis course
1 JavaC++Python
2 UNIX
Didnrsquot you have an opportunity to practice this earlier
No problem this is a golden opportunity
TipProgramming skills will make you more flexible and efficient datascientist
Evangelos Pournaras Izabela Moise 20
Assessment
bull Seminar thesis
bull 100 of the grade no exams
bull Detailed illustration in a next lecture
TipStart early Give the opportunity for your project and your skills todevelop during the course
Evangelos Pournaras Izabela Moise 21
Lectures
bull Every Monday 1715-1900 at LFW B 1
bull Participation is not obligatory but highly recommended
bull 60 minutes lectures followed by 40 minutes interactivediscussions
bull Opportunity to discuss your projectbull Lectures at
httpwwwcossethzcheducationdatasciencehtml
Evangelos Pournaras Izabela Moise 22
Subjects I
1 Computational Social Science Applicationsndash Smart Grids geolocation traffic systems social sensingmining
privacyndash Tools amp platforms Nervousnet Twitter GDELT
2 Data Science Fundamentalsndash databases data types data collection data pre-processing
plotting visualization etcndash Tools Java AWK MySQL Gnuplot Gephi etc
3 Data Mining and Machine Learningndash classification clustering prediction neural networks etcndash Tools Weka
4 Big Data Analytics
Evangelos Pournaras Izabela Moise 23
Subjects II
ndash MapReduce parallel computing data streaming social mediaetc
ndash Tools Hadoop Spark Mahout Spark Streaming Storm etc
5 Otherndash Project presentations
Evangelos Pournaras Izabela Moise 24
Lectures OutlineLecture 01 (200216)Introduction amp Coursework OutlineLecture 02 (270216)Data Mining Machine Learning ampApplicationsLecture 03 (060316)Data Science Techniques ampApplicationsLecture 04 (130316)Data Mining Machine Learning ampApplicationsLecture 05 (200316)Data Science Techniques ampApplicationsLecture 06 (270316)Big Data Analytics amp Applications
Lecture 07 (030416)Big Data Analytics amp ApplicationsLecture 08 (100416)Data Science Techniques ampApplicationsLecture 09 (080516)Big Data Analytics amp ApplicationsLecture 10 (150516)Data Science Techniques ampApplicationsLecture 11 (220516)Oral PresentationsLecture 12 (290516)Oral Presentations
Evangelos Pournaras Izabela Moise 25
How to contact us
Communication
bull Discussion session in the course
bull E-mail with subject[DATA-SCIENCE-COURSE-2016]ltotherinfogtto
ndash Iza Moise imoiseethzchndash Evangelos Pournaras epournarasethzch
Supervision - strictly for issues not addressed in the course
bull Mondays 1500-1700Clausiusstrasse 50 (CLU C 4) 8092 Zurich
Evangelos Pournaras Izabela Moise 26
Proposed Literature
B Ellis
Real-Time Analytics Techniques to Analyze and Visualize Streaming Data
Wiley Publishing 1st edition 2014
J Han
Data Mining Concepts and Techniques
Morgan Kaufmann Publishers Inc San Francisco CA USA 2005
T White
Hadoop The Definitive Guide
OrsquoReilly Media Inc 2015
I H Witten E Frank and M A Hall
Data Mining Practical Machine Learning Tools and Techniques
Morgan Kaufmann Publishers Inc San Francisco CA USA 3rd edition2011
Evangelos Pournaras Izabela Moise 27
What is next
bull Seminar thesis
bull Examples and applications
Evangelos Pournaras Izabela Moise 28
More about Data Scientists
"But we would say the dominant trait among data scientists is an intense curiosity - a desire to go beneath the surface of a problem, find the questions at its heart, and distill them into a very clear set of hypotheses that can be tested."
More about Data Scientists
"A quantitative analyst can be great at analyzing data but not at subduing a mass of unstructured data and getting it into a form in which it can be analyzed.
A data management expert might be great at generating and organizing data in structured form but not at turning unstructured data into structured data - and also not at actually analyzing the data.
And while people without strong social skills might thrive in traditional data professions, data scientists must have such skills to be effective."
Part 2 - Course Description
Course Objectives
Qualify you with the knowledge & skills to tackle real-world problems using data:
1. Acquiring domain knowledge and understanding
2. Better understanding and interpretation of data
3. Awareness of the applicability of different data science methods
4. Development of technical skills, e.g. programming, use of different tools, etc.
5. Presenting scientific results, both in writing and orally
Course Prerequisites
Some programming skills are required, e.g. skills for the material of this course:
1. Java/C++/Python
2. UNIX
Haven't had an opportunity to practice these earlier?
No problem, this is a golden opportunity!
Tip: Programming skills will make you a more flexible and efficient data scientist.
Assessment
• Seminar thesis
• 100% of the grade, no exams
• Details will be presented in an upcoming lecture
Tip: Start early. Give your project and your skills the opportunity to develop during the course.
Lectures
• Every Monday, 17:15-19:00, at LFW B 1
• Participation is not obligatory but highly recommended
• 60-minute lectures followed by 40-minute interactive discussions
• Opportunity to discuss your project
• Lectures at http://www.coss.ethz.ch/education/datascience.html
Subjects
1. Computational Social Science Applications
   - Smart Grids, geolocation, traffic systems, social sensing/mining, privacy
   - Tools & platforms: Nervousnet, Twitter, GDELT
2. Data Science Fundamentals
   - databases, data types, data collection, data pre-processing, plotting, visualization, etc.
   - Tools: Java, AWK, MySQL, Gnuplot, Gephi, etc.
3. Data Mining and Machine Learning
   - classification, clustering, prediction, neural networks, etc.
   - Tools: Weka
4. Big Data Analytics
   - MapReduce, parallel computing, data streaming, social media, etc.
   - Tools: Hadoop, Spark, Mahout, Spark Streaming, Storm, etc.
5. Other
   - Project presentations
Lectures Outline
Lecture 01 (20.02.16): Introduction & Coursework Outline
Lecture 02 (27.02.16): Data Mining, Machine Learning & Applications
Lecture 03 (06.03.16): Data Science Techniques & Applications
Lecture 04 (13.03.16): Data Mining, Machine Learning & Applications
Lecture 05 (20.03.16): Data Science Techniques & Applications
Lecture 06 (27.03.16): Big Data Analytics & Applications
Lecture 07 (03.04.16): Big Data Analytics & Applications
Lecture 08 (10.04.16): Data Science Techniques & Applications
Lecture 09 (08.05.16): Big Data Analytics & Applications
Lecture 10 (15.05.16): Data Science Techniques & Applications
Lecture 11 (22.05.16): Oral Presentations
Lecture 12 (29.05.16): Oral Presentations
How to contact us
Communication
• Discussion session in the course
• E-mail with subject "[DATA-SCIENCE-COURSE-2016] <other info>" to:
   - Iza Moise: imoise@ethz.ch
   - Evangelos Pournaras: epournaras@ethz.ch
Supervision - strictly for issues not addressed in the course
• Mondays, 15:00-17:00, Clausiusstrasse 50 (CLU C 4), 8092 Zurich
Proposed Literature
• B. Ellis. Real-Time Analytics: Techniques to Analyze and Visualize Streaming Data. Wiley Publishing, 1st edition, 2014.
• J. Han. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2005.
• T. White. Hadoop: The Definitive Guide. O'Reilly Media, Inc., 2015.
• I. H. Witten, E. Frank, and M. A. Hall. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 3rd edition, 2011.
What is next?
• Seminar thesis
• Examples and applications
More about Data Scientists
A quantitative analyst can be great at analyzing data but not atsubduing a mass of unstructured data and getting it into a form inwhich it can be analyzed
A data management expert might be great at generating andorganizing data in structured form but not at turning unstructureddata into structured data-and also not at actually analyzing thedata
And while people without strong social skills might thrive intraditional data professions data scientists must have suchskills to be effective
Evangelos Pournaras Izabela Moise 17
Part 2 - Course Description
Evangelos Pournaras Izabela Moise 18
Course ObjectivesQualify you with knowledge amp skills to tackle real-world problemsusing data
1 Acquiring domain knowledge and understanding2 Better understanding and interpretation of data
3 Awareness about the applicability of different data sciencemethods
4 Development of technical skills eg programming use ofdifferent tools etc
5 Presenting scientific results both written and orally
Evangelos Pournaras Izabela Moise 19
Course Prerequisites
Some programming skills are required eg skills for the material ofthis course
1 JavaC++Python
2 UNIX
Didnrsquot you have an opportunity to practice this earlier
No problem this is a golden opportunity
TipProgramming skills will make you more flexible and efficient datascientist
Evangelos Pournaras Izabela Moise 20
Assessment
bull Seminar thesis
bull 100 of the grade no exams
bull Detailed illustration in a next lecture
TipStart early Give the opportunity for your project and your skills todevelop during the course
Evangelos Pournaras Izabela Moise 21
Lectures
• Every Monday, 17:15-19:00, at LFW B 1
• Participation is not obligatory but highly recommended
• 60-minute lectures followed by 40-minute interactive discussions
• Opportunity to discuss your project
• Lectures at http://www.coss.ethz.ch/education/datascience.html
Subjects
1. Computational Social Science Applications
– Smart Grids, geolocation, traffic systems, social sensing/mining, privacy
– Tools & platforms: Nervousnet, Twitter, GDELT
2. Data Science Fundamentals
– databases, data types, data collection, data pre-processing, plotting, visualization, etc.
– Tools: Java, AWK, MySQL, Gnuplot, Gephi, etc.
3. Data Mining and Machine Learning
– classification, clustering, prediction, neural networks, etc.
– Tools: Weka
4. Big Data Analytics
– MapReduce, parallel computing, data streaming, social media, etc.
– Tools: Hadoop, Spark, Mahout, Spark Streaming, Storm, etc.
5. Other
– Project presentations
Lectures Outline
Lecture 01 (20.02.16): Introduction & Coursework Outline
Lecture 02 (27.02.16): Data Mining, Machine Learning & Applications
Lecture 03 (06.03.16): Data Science Techniques & Applications
Lecture 04 (13.03.16): Data Mining, Machine Learning & Applications
Lecture 05 (20.03.16): Data Science Techniques & Applications
Lecture 06 (27.03.16): Big Data Analytics & Applications
Lecture 07 (03.04.16): Big Data Analytics & Applications
Lecture 08 (10.04.16): Data Science Techniques & Applications
Lecture 09 (08.05.16): Big Data Analytics & Applications
Lecture 10 (15.05.16): Data Science Techniques & Applications
Lecture 11 (22.05.16): Oral Presentations
Lecture 12 (29.05.16): Oral Presentations
How to contact us
Communication
• Discussion session in the course
• E-mail with subject [DATA-SCIENCE-COURSE-2016] <other info> to:
– Iza Moise: imoise@ethz.ch
– Evangelos Pournaras: epournaras@ethz.ch
Supervision - strictly for issues not addressed in the course
• Mondays, 15:00-17:00, Clausiusstrasse 50 (CLU C 4), 8092 Zurich
Proposed Literature
B. Ellis. Real-Time Analytics: Techniques to Analyze and Visualize Streaming Data. Wiley Publishing, 1st edition, 2014.
J. Han. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2005.
T. White. Hadoop: The Definitive Guide. O'Reilly Media, Inc., 2015.
I. H. Witten, E. Frank, and M. A. Hall. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 3rd edition, 2011.
What is next?
• Seminar thesis
• Examples and applications