curriculum handbook for m.tech data science · 2019-07-12 · nitte meenakshi institue of...

NITTE MEENAKSHI INSTITUTE OF TECHNOLOGY (A Unit of Nitte Education Trust (R), Mangalore)

An Autonomous Institution

Department of Information Science and Engineering

Curriculum Handbook for M.Tech – Data Science

SEMESTER II

Semester: II Year: 2019-2020

Department: Information Science and Engineering Course Type: Core

Course Title: Scalable Computing Course Code:19DS21

L-T-P:3-0-2 Credits: 04

Total Contact Hours:39 hrs Duration of SEE: 3 hrs

SEE Marks: 50 CIE Marks: 50

Pre-requisites:

Software systems, programming, data structures and algorithms.

Good programming skills (preferably in Java).

Operating Systems, Distributed Computing Systems, Introduction to Cloud Computing, Design and Analysis of Algorithms.

Course Outcomes:

Students will be able to

CO’s Course Learning Outcomes BL

CO1 Describe the basic concepts and technologies of distributed systems. L2

CO2 Illustrate the requirements and challenges when designing, building and managing distributed systems. L2

CO3 Analyze different scalable distributed system designs. L4

CO4 Analyze use cases for managing distributed file systems. L4

CO5 Implement scalable distributed databases and analyze them. L3

Teaching Methodology:

Blackboard teaching and PPT

Assignment

Assessment Methods

Open Book Test for 10 Marks.

Assignment evaluation for 10 Marks.

Three internal tests of 30 marks each will be conducted, and the average of the best two will be taken.

A final examination of 100 marks will be conducted and scaled down to 50 marks.
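As a worked illustration of the scheme above, the following Python sketch computes a student's marks. It assumes that the CIE total of 50 is the sum of the two 10-mark components and the average of the best two internals (out of 30), and that the 100-mark final examination is scaled linearly to 50. The handbook does not state a rounding rule, so none is applied. The function names and the sample scores are hypothetical.

```python
def cie_marks(internals, open_book_test, assignment):
    """Return CIE marks out of 50.

    Assumption: CIE = average of the best two of three internals (each out
    of 30) + open book test (out of 10) + assignment (out of 10).
    """
    if len(internals) != 3:
        raise ValueError("exactly three internal scores are expected")
    best_two = sorted(internals, reverse=True)[:2]
    return sum(best_two) / 2 + open_book_test + assignment


def see_marks(final_exam_score):
    """Return SEE marks out of 50: a 100-mark paper scaled linearly."""
    return final_exam_score * 50 / 100


# Hypothetical scores, for illustration only.
cie = cie_marks([24, 18, 27], open_book_test=8, assignment=9)
see = see_marks(70)
print(cie, see, cie + see)
```

Courses in this handbook that use a different pair of 10-mark components (or a single 20-mark case study) would fit the same calculation by passing those scores in place of the open book test and assignment.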

Course Outcome to Programme Outcome Mapping

PO1 PO2 PO3 PO4 PO5 PO6

CO1 1 2 1 1

CO2 2 2 1 2 2

CO3 2 2 2 1

CO4 3 3 1 2 2

CO5 3 2 3 1 2 2

19DS21 2 2 2 1 2 2

COURSE CONTENT

Unit – I 8 Hrs

Distributed System Models and Enabling Technologies: Scalable Computing over the Internet, The Age of Internet Computing, Scalable Computing Trends and New Paradigms, The Internet of Things and Cyber-Physical Systems, Technologies for Network-Based Systems, Multicore CPUs and Multithreading Technologies, GPU Computing to Exascale and Beyond, Memory, Storage, and Wide-Area Networking, Virtual Machines and Virtualization Middleware, Data Centre Virtualization for Cloud Computing, System Models for Distributed and Cloud Computing, Clusters of Cooperative Computers, Grid Computing Infrastructures, Peer-to-Peer Network Families, Cloud Computing over the Internet, Software Environments for Distributed Systems and Clouds, Service-Oriented Architecture (SOA), Trends toward Distributed Operating Systems, Parallel and Distributed Programming Models, Performance, Security, and Energy Efficiency, Performance Metrics and Scalability Analysis, Fault Tolerance and System Availability, Network Threats and Data Integrity, Energy Efficiency in Distributed Computing.

Unit – II 8 Hrs

Virtual Machines and Virtualization of Clusters and Data Centres: Implementation Levels of Virtualization, Levels of Virtualization Implementation, Design Requirements and Providers, Virtualization Support at the OS Level, Middleware Support for Virtualization, Virtualization Structures/Tools and Mechanisms, Hypervisor and Xen Architecture, Binary Translation with Full Virtualization, Para-Virtualization with Compiler Support, Virtualization of CPU, Memory, and I/O Devices, Hardware Support for Virtualization, CPU Virtualization, Memory Virtualization, I/O Virtualization, Virtualization in Multi-Core Processors, Virtual Clusters and Resource Management, Physical versus Virtual Clusters, Live VM Migration Steps and Performance Effects, Migration of Memory, Files, and Network Resources, Dynamic Deployment of Virtual Clusters.

Containers: Containers and Serverless, Kernel namespaces and cgroups, Use cases: Docker, Kubernetes.

Unit – III 8 Hrs

Cloud Platform Architecture over Virtualized Data Centers: Cloud Computing and Service Models, Public, Private, and Hybrid Clouds, Cloud Ecosystem and Enabling Technologies, Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS) and Software-as-a-Service (SaaS). Public Cloud Platforms: GAE, AWS, and Azure. Data Science in the cloud: AWS Machine Learning, Azure Machine Learning, IBM BlueMix, Sense.io, Domino DataLabs, DataJoy, PythonAnywhere.

Unit – IV 7 Hrs

MapReduce and the New Software Stack: Distributed File Systems, MapReduce, Algorithms Using MapReduce, Extensions to MapReduce, The Communication Cost Model, Complexity Theory for MapReduce.

Unit – V 8 Hrs

Analysing Big Data: The Challenges of Data Science, Introducing Apache Spark. Introduction to Data Analysis with Scala and Spark: Scala for Data Scientists, The Spark Programming Model, Record Linkage, Getting Started: The Spark Shell and SparkContext, Bringing Data from the Cluster to the Client, Shipping Code from the Client to the Cluster, Structuring Data with Tuples and Case Classes, Aggregations, Creating Histograms, Summary Statistics for Continuous Variables, Creating Reusable Code for Computing Summary Statistics, Simple Variable Selection and Scoring.

Text Books:

1. Kai Hwang, G. C. Fox, J. J. Dongarra, “Distributed & Cloud Computing”, Morgan Kaufmann Publishers.

2. Jure Leskovec, Anand Rajaraman, Jeffrey D. Ullman, “Mining of Massive Datasets”, 2nd Edition, Cambridge University Press. http://www.mmds.org/

3. Sandy Ryza, Uri Laserson, Josh Wills, Sean Owen, “Advanced Analytics with Spark”, 2nd Edition, O'Reilly Media, ISBN: 9781491972946.

Semester: II Year: 2019-2020

Department: Information Science and Engineering Course Type: Core

Course Title: Neural Network & Deep Learning Course Code:19DS22

L-T-P:3-0-2 Credits: 04

Total Contact Hours:39 hrs Duration of SEE: 3 hrs

SEE Marks: 50 CIE Marks: 50

Pre-requisites:

Machine learning-I, Data mining

Course Outcomes:

Students will be able to

CO’s Course Learning Outcomes BL

CO1 Understand the basic concepts of artificial neural networks. L2

CO2 Model neurons and neural networks, and analyze ANN learning and its applications. L4

CO3 Develop different single-layer and multi-layer perceptron learning algorithms. L3

CO4 Design another class of layered networks using deep learning principles. L3

Teaching Methodology:

Blackboard teaching and PPT

Executable Codes/ Live Demonstration

Programming Assignment

Assessment Methods

Online certification from Coursera, edX, etc. for 10 marks

Programming assignments evaluated using rubrics for 10 marks

Three internal tests of 30 marks each will be conducted, and the average of the best two will be taken.

A final examination of 100 marks will be conducted and scaled down to 50 marks.

Course Outcome to Programme Outcome Mapping

PO1 PO2 PO3 PO4 PO5 PO6

CO1 1 2 1 1

CO2 2 2 1 2 2

CO3 2 2 2 1

CO4 3 3 1 2 2

CO5 3 2 3 1 2 2

19DS21 2 2 2 1 2 2

COURSE CONTENT

Unit – I 8 Hrs

Introduction to Neural Networks: Neural Network, Human Brain, Models of Neuron, Neural networks viewed as directed graphs, Biological Neural Network, Artificial neuron, Artificial Neural Network architecture, ANN learning, analysis and applications, Historical notes.

Learning Processes: Introduction, Error correction learning, Memory-based learning, Hebbian learning, Competitive learning, Boltzmann learning, Credit assignment problem, Learning with and without teacher, Learning tasks, Memory and Adaptation.

Unit – II 8 Hrs

Single Layer Perceptron: Introduction, Pattern Recognition, Linear classifier, Simple perceptron, Perceptron learning algorithm, Modified perceptron learning algorithm, Adaptive linear combiner, Continuous perceptron, Learning in continuous perceptron, Limitations of the perceptron.

Unit – III 8 Hrs

Multi-Layer Perceptron Networks: Introduction, MLP with 2 hidden layers, Simple layer of an MLP, Delta learning rule of the output layer, Multilayer feed-forward neural network with continuous perceptrons, Generalized delta learning rule, Back propagation algorithm.

Unit – IV 7 Hrs

Introduction to Deep Learning: Neuro architectures as necessary building blocks for DL techniques, Deep Learning & Neocognitron, Deep Convolutional Neural Networks, Recurrent Neural Networks (RNN).

Unit – V 8 Hrs

Feature extraction, Deep Belief Networks, Restricted Boltzmann Machines, Autoencoders, Training of Deep Neural Networks, Applications and examples (Google, image/speech recognition), Deep Learning Tools: TensorFlow, Caffe, Theano, Torch.

Text Books:

1. Neural Networks: A Comprehensive Foundation, Simon Haykin, 2nd Edition, 1999, Pearson Prentice Hall, ISBN-13: 978-0-13-147139-9.

2. Ian Goodfellow, Yoshua Bengio and Aaron Courville, Deep Learning, MIT Press, 2016.

Reference Books:

1. Introduction to Artificial Neural Systems, Jacek M. Zurada, 1992, West Publishing Company, ISBN: 9780534954604

2. Learning & Soft Computing, Vojislav Kecman, 1st Edition, 2004, Pearson Education, ISBN: 0-262-11255-8

3. Neural Network Design, M. T. Hagan, H. B. Demuth, M. Beale, 2002, Thomson Learning, ISBN-10: 0-9717321-1-6 / ISBN-13: 978-0-9717321-1-7

Online Materials

1. Deep learning courses by Coursera: https://www.coursera.org/courses?query=deep%20learning

2. https://www.classcentral.com/course/coursera-neural-networks-and-deep-learning-9058

3. https://www.classcentral.com/course/coursera-introduction-to-deep-learning-9606

4. https://www.deeplearningbook.org/

Semester: II Year: 2019-2020

Department: Information Science and Engineering Course Type: Core

Course Title: Machine Learning-II Course Code:19DS23

L-T-P:3-0-2 Credits: 04

Total Contact Hours:39 hrs Duration of SEE: 3 hrs

SEE Marks: 50 CIE Marks: 50

Prerequisite:

Machine learning-I

Course Outcomes:

Students will be able to

CO’s Course Learning Outcomes BL

CO1 Understand key concepts, tools and approaches for pattern recognition on complex data sets. L2

CO2 Understand kernel methods for handling high-dimensional and non-linear patterns. L2

CO3 Apply state-of-the-art algorithms such as Support Vector Machines and Bayesian networks. L3

CO4 Solve real-world machine learning tasks from data to inference. L2

CO5 Demonstrate the theoretical concepts and the motivations behind different learning frameworks. L3

Teaching Methodology:

Black board teaching / Power Point presentations

Executable Codes/ Live Demonstration

Programming Assignment

Assessment Methods:

Online certification from NPTEL/Coursera for 10 marks

Programming assignments evaluated using rubrics for 10 marks

Three internal tests of 30 marks each will be conducted, and the average of the best two will be taken.

A final examination of 100 marks will be conducted and scaled down to 50 marks.

Course Outcome to Programme Outcome Mapping:

PO1 PO2 PO3 PO4 PO5 PO6

CO1 2 2 2

CO2 2 3 2

CO3 2 3 3 2 2

CO4 3 2 3 3 2

CO5 3 1 2 2 1 2

19DS23 2 2 3 2 1 2

COURSE CONTENT

Unit – I 8 Hrs

Instance-based learning and learning sets of rules: K-Nearest Neighbor Learning – Locally Weighted Regression – Radial Basis Functions – Case Based Reasoning – Sequential Covering Algorithms – Learning Rule Sets – Learning First Order Rules – Learning Sets of First Order Rules – Induction as Inverted Deduction – Inverting Resolution. (TextBook1)

Unit – II 8 Hrs

Support Vector Machine: Maximum margin hyperplanes: Rationale for Maximum Margin, Linear SVM: Separable Case: Linear Decision Boundary, Margin of a Linear Classifier, Learning a Linear SVM model, Linear SVM: Non-separable Case, Nonlinear SVM: Attribute Transformation, Learning a Nonlinear SVM, Kernel Trick, Characteristics of SVM. (Chapter 5.5 from TextBook-2)

Unit – III 8 Hrs

Transfer Learning: Introduction, Transfer in inductive learning: Inductive transfer, Bayesian transfer, Hierarchical transfer, Transfer with Missing Data or Class Labels.

Transfer in reinforcement learning: Starting-Point Methods, Imitation Methods, Hierarchical Methods, Alteration Methods, New RL Algorithms.

Avoiding negative transfer: Rejecting Bad Information, Choosing a Source Task, Modelling Task Similarity; Automatically mapping tasks: Equalizing Task Representations, Trying Multiple Mappings, Mapping by Analogy; The future of transfer learning.

Unit – IV 8 Hrs

Analytical learning: Introduction, Learning with Perfect domain theories, Remarks on Explanation based learning; Explanation based learning of search control knowledge.

Combining Inductive and Analytical learning: Motivation, Inductive-Analytical approaches to learning, Using prior knowledge to initialize the hypothesis, Using prior knowledge to alter the search objective, Using prior knowledge to augment search operators. (TextBook1)

Unit – V 7 Hrs

Reinforcement Learning: Introduction, The Learning Task, Q Learning, Nondeterministic Rewards and Actions, Temporal Difference learning, Generalization from Examples, Relationship to Dynamic programming. (TextBook1)

Text Books:

1. Tom M. Mitchell, “Machine Learning”, McGraw-Hill Education (INDIAN EDITION), 2013.

2. Amanda Casari, Alice Zheng, “Feature Engineering for Machine Learning”, O’Reilly, 2018.

Reference Books:

1. Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems

Online Materials

1. Machine Learning by Stanford University-Coursera

2. Machine Learning with TensorFlow on Google Cloud Platform Specialization-Coursera

3. Become a Machine Learning Engineer – Udacity.

Semester: II Year: 2019-2020

Department: Information Science and Engineering Course Type: Elective

Course Title: Data Security and Privacy Course Code:19DSE243

L-T-P: 4-0-0 Credits: 04

Total Contact Hours:52 hrs Duration of SEE: 3 hrs

SEE Marks: 50 CIE Marks: 50

Pre-requisites:

Knowledge of databases and how they are managed.

Fundamentals of algorithm design techniques.

Course Outcomes: Students will be able to

CO’s Course Learning Outcomes BL

CO1 Analyze the vulnerabilities in any computing system and hence be able to design a security solution. L4

CO2 Identify the security issues in the data network and resolve them. L2

CO3 Evaluate security mechanisms using rigorous approaches. L4

CO4 Understand privacy and anonymization. L2

Teaching Methodology:

Black Board Teaching / Power Point Presentation

Assessment Methods:

Seminar on data security for 10 marks

Assignment based on data security and access control problems for 10 marks

Three internal tests of 30 marks each will be conducted, and the average of the best two will be taken.

A final examination of 100 marks will be conducted and scaled down to 50 marks.

Course Outcome to Programme Outcome Mapping

PO1 PO2 PO3 PO4 PO5 PO6

CO1 2 2 1

CO2 2 2 1 1

CO3 3 2 3 2

CO4 2 2 2 2

19DSE243 2 2 2 1 1 1

COURSE CONTENT

UNIT – I : DATA SECURITY FUNDAMENTALS 10 hrs

Computer Security Concepts, Intrusion Detection, Firewalls: Characteristics, Types. Classical Encryption Techniques: Symmetric Cipher Model, Cryptography, Cryptanalysis and Brute-Force Attack, Substitution Techniques, Caesar Cipher, Monoalphabetic Cipher, Polyalphabetic Cipher, One-Time Pad. Block Ciphers and the Data Encryption Standard: Traditional block cipher structure, Stream ciphers and block ciphers, Motivation for the Feistel cipher structure, The Feistel cipher.

UNIT – II: Public-Key Cryptography 10 hrs

Principles of Public-Key Cryptosystems, Public-Key Cryptosystems, Applications for Public-Key Cryptosystems, Requirements for Public-Key Cryptosystems, Public-Key Cryptanalysis. The RSA Algorithm: Description of the Algorithm, Computational Aspects, The Security of RSA. Other Public-Key Cryptosystems: Diffie-Hellman Key Exchange, The Algorithm, Key exchange protocols, Man-in-the-Middle Attack, Simple secret key distribution, Secret key distribution with confidentiality and authentication, A hybrid scheme. Public key certificates, X.509 certificates. Public key infrastructure, PKIX Management Functions, PKIX Management Protocols.

UNIT – III : Authentication and Authorization 10 hrs

Authentication vs Authorization, Authentication Methods: Password authentication, Public Key Cryptography, Biometric authentication, Out of band. Authentication Protocols: SSL, Password Authentication Protocol (PAP), Kerberos, Email authentication (PGP), Database authentication. Message authentication; secure hash functions and Authorization Approaches to HMAC; public-key cryptography principles; public-key cryptography algorithms, digital signatures, key management. Kerberos, X.509 directory authentication service. Authorization Definition, Multilayer authorization.

UNIT – IV: DATA PRIVACY AND ANONYMIZATION 12 hrs

Understanding Privacy: Social Aspects of Privacy, Legal Aspects of Privacy and Privacy Regulations, Effect of Database and Data Mining technologies on privacy, Challenges raised by emerging technologies such as RFID, biometrics, etc., Privacy Models. Introduction to Anonymization, Anonymization models: k-anonymity, l-diversity, t-closeness, differential privacy, Database as a service.

UNIT – V : DATA PRIVACY FOR DATA SCIENCE 10 hrs

Using technology for preserving privacy: Statistical Database security, Inference Control, Secure Multi-party computation and Cryptography, Privacy-preserving Data mining, Hippocratic databases.

Emerging Applications: Social Network Privacy, Location Privacy, Query Log Privacy, Biomedical Privacy.

Text books:

1. Cryptography and Network Security: Principles and Practice, William Stallings, 6th Edition, Pearson Education.

2. The Algorithmic Foundations of Differential Privacy, Cynthia Dwork and Aaron Roth. DOI: 10.1561/0400000042.

Reference books:

1. https://s3.amazonaws.com/assets.datacamp.com/production/course_6412/slides/chapter1.pdf

2. Privacy-Preserving Data Mining: Models and Algorithms, Charu C. Aggarwal, Philip S. Yu, Springer.

3. Principles of Information Security, Information Security Professional - Michael E. Whitman and Herbert J. Mattord, 4th Edition, Thomson.

Semester: II Year: 2019-2020

Department: Information Science and Engineering Course Type: Elective

Course Title: Big Data Analytics Course Code:19DSE251

L-T-P: 4-0-0 Credits: 04

Total Contact Hours:52 hrs Duration of SEE: 3 hrs

SEE Marks: 50 CIE Marks: 50

Prerequisite:

Database Management Systems

Course Outcomes:

Students will be able to

CO’s Course Learning Outcomes BL

CO1 Describe Big Data and its importance with its applications L2

CO2 Differentiate various big data technologies like Hadoop MapReduce, Pig, Hive, HBase and NoSQL. L4

CO3 Apply tools and techniques to analyze Big Data. L3

CO4 Design a solution for a given problem using suitable Big Data Techniques L4

Teaching Methodology:

Black board teaching/ Power Point presentations

Executable Codes/ Live Demonstration

Programming Assignment

Assessment Methods:

Online certification for 10 marks

Programming assignments evaluated using rubrics for 10 marks

Three internal tests of 30 marks each will be conducted, and the average of the best two will be taken.

A final examination of 100 marks will be conducted and scaled down to 50 marks.

Course Outcome to Programme Outcome Mapping

PO1 PO2 PO3 PO4 PO5 PO6

CO1 3 3 2

CO2 3 2 3 1

CO3 3 2 3 1

CO4 3 2 3 1

19DSE152 3 2 3 1

COURSE CONTENT

Unit – I 10 Hrs

INTRODUCTION TO BIG DATA: Big Data and its Importance – Four V’s of Big Data – Drivers for Big Data – Introduction to Big Data Analytics – Big Data Analytics applications.

BIG DATA TECHNOLOGIES: Hadoop’s Parallel World – Data discovery – Open source technology for Big Data Analytics – Cloud and Big Data – Predictive Analytics – Mobile Business Intelligence and Big Data – Crowd Sourcing Analytics – Inter- and Trans-Firewall Analytics – Information Management.

Unit – II 10Hrs

PROCESSING BIG DATA: Integrating disparate data stores - Mapping data to the programming framework - Connecting and extracting data from storage - Transforming data for processing - Subdividing data in preparation for Hadoop MapReduce.

Unit – III 10 Hrs

HADOOP MAPREDUCE: Employing Hadoop MapReduce - Creating the components of Hadoop MapReduce jobs - Distributing data processing across server farms - Executing Hadoop MapReduce jobs - Monitoring the progress of job flows - The Building Blocks of Hadoop MapReduce - Distinguishing Hadoop daemons - Investigating the Hadoop Distributed File System - Selecting appropriate execution modes: local, pseudo-distributed, fully distributed.

Unit – IV 12 Hrs

BIG DATA TOOLS AND TECHNIQUES: Installing and Running Pig – Comparison with Databases – Pig Latin – User-Defined Functions – Data Processing Operators – Installing and Running Hive – HiveQL – Tables – Querying Data – User-Defined Functions – Oracle Big Data.

Unit – V 10 Hrs

ADVANCED ANALYTICS PLATFORM: Real-Time Architecture – Orchestration and Synthesis Using Analytics Engines – Discovery using Data at Rest – Implementation of Big Data Analytics – Big Data Convergence – Analytics Business Maturity Model.

Text Books:

1. Michael Minelli, Michele Chambers, Ambiga Dhiraj, “Big Data, Big Analytics: Emerging Business Intelligence and Analytic Trends for Today’s Businesses”, 1st Edition, Wiley CIO Series, 2013.

2. Arvind Sathi, “Big Data Analytics: Disruptive Technologies for Changing the Game”, 1st Edition, IBM Corporation, 2012.

3. Bill Franks, “Taming the Big Data Tidal Wave: Finding Opportunities in Huge Data Streams with Advanced Analytics”, 1st Edition, Wiley and SAS Business Series, 2012.

4. Tom White, “Hadoop: The Definitive Guide”, 3rd Edition, O’Reilly, 2012.

Additional Reference Book

1. Boris Lublinsky, Kevin T. Smith, Alexey Yakubovich, “Professional Hadoop Solutions”, Wiley, ISBN: 9788126551071, 2015.

2. Chris Eaton, Dirk deRoos et al., “Understanding Big Data”, McGraw Hill, 2012.

3. Vignesh Prajapati, “Big Data Analytics with R and Hadoop”, Packt Publishing, 2013.

4. Jay Liebowitz, “Big Data and Business Analytics”, CRC Press, 2013.

Online Materials

1. http://www.bigdatauniversity.com/

2. https://www.coursera.org/courses?query=big%20data%20analytics

Semester: II Year: 2019-2020

Department: Information Science and Engineering Course Type: Elective

Course Title: Business Analytics Course Code:19DS252

L-T-P: 4-0-0 Credits: 04

Total Contact Hours:52 hrs Duration of SEE: 3 hrs

SEE Marks: 50 CIE Marks: 50

Course Outcomes:

Students will be able to

CO’s Course Learning Outcomes BL

CO1 Describe the significance of a global platform for data retrieval/processing among different business cultures of the world. L2

CO2 Develop domain knowledge of various technologies and their application to facilitate managerial decisions / MIS. L2

CO3 Enable communication for data-driven decision making. L3

CO4 Implement cross-functional collaboration to enhance efficiency and productivity. L3

Teaching Methodology:

ICT enabled Classroom teaching

Case study

Practical / live assignment

Interactive class room discussions

Assessment Methods

Group Discussion for 10 Marks.

Assignment evaluation for 10 Marks.

Three internal tests of 30 marks each will be conducted, and the average of the best two will be taken.

A final examination of 100 marks will be conducted and scaled down to 50 marks.

Course Outcome to Programme Outcome Mapping

PO1 PO2 PO3 PO4 PO5 PO6

CO1 1 2 2 1

CO2 2 2 1 2 2

CO3 3 2 3 1

CO4 3 2 3 1 2 1

19DS241 3 2 2 1 2 1

COURSE CONTENT

Unit – I 12 Hrs

Introduction to Business Analytics: Why Analytics, Business Analytics: the Science of data-driven decision making, Descriptive Analytics, Predictive Analytics, Prescriptive Analytics, Big Data Analytics, Web and Social media Analytics, Machine Learning Algorithms, Framework for data-driven decision making, Analytics Capability Building, Roadmap, Challenges, Types (Descriptive, Predictive and Prescriptive), Business Intelligence versus Business Analytics, Transaction Processing v/s Analytic Processing, OLTP v/s OLAP, OLAP Operations, Data models for OLTP.

Unit – II 10Hrs

Descriptive Analytics: Introduction, Data Types and Scales, Types of Data Measurement Scale, Population and Sample.

Data Warehouse: Definition, characteristics, framework, Data lake, Business Reporting, Visual Analytics: Definition, concepts, Different types of charts and graphs, Emergence of data visualization and visual analytics.

Unit – III 10 Hrs

Data Mining: Concepts and applications, Data mining process, Text & Web Analytics, Text analytics and text mining overview, Text mining applications, Web mining overview, Sentiment analysis overview, Supply Chain and Operations Analytics, Customer Analytics, Project Management, Decision Analysis, Process Analytics, Market Intelligence.

Unit – IV 12 Hrs

Social Network Analysis: Overview of SNA, history and resources, Mathematical foundations, matrices and graph theory, Whole versus personal networks, One-mode versus two-mode network data, Collecting network data, Informant accuracy, Network visualizations, Cohesive subgroups, Bottom-up and top-down approaches, Block models, Egocentric SNA, design and applications.

Unit – V 8 Hrs

Business Performance Management: Business performance management cycle, KPI, Dashboard. Analytics in Business Support Functions: Sales & Marketing Analytics, HR Analytics, Financial Analytics, Production and operations analytics. Analytics in Industries: Telecom, Retail, Healthcare, Financial Services.

Text Books:

1. U. Dinesh Kumar, “Business Analytics – The Science of Data Driven Decision Making”, Wiley, 2017.

2. Ramesh Sharda, Dursun Delen, Efraim Turban, “Business Intelligence: A Managerial Perspective on Analytics”, Pearson, 3e.

3. Wasserman, S., & Faust, K. (1994). Social Network Analysis: Methods and Applications. A classic, essential textbook on SNA.

Reference Books:

1. Jesper Thorlund & Gert H.N. Laursen, “Business Analytics for Managers: Taking Business Intelligence Beyond”, Wiley

2. Sahil Raj, “Business Analytics”, Cengage

3. James R. Evans, “Business Analytics”, Pearson

4. https://www.bebr.ufl.edu/sites/default/files/ANG5420_Syllabus.pdf

List of Journals / Periodicals / Magazines / Newspapers / Web resources (Case Study):

International Journal of Business Analytics

International Journal of Business Analytics and intelligence

International Journal on Consumer and Business Analytics

Analytics India – Magazine

Semester: II Year: 2019-2020

Department: Information Science and Engineering Course Type: Elective

Course Title: Social Network Analysis Course Code:19DSE253

L-T-P: 4-0-0 Credits: 04

Total Contact Hours:52 hrs Duration of SEE: 3 hrs

SEE Marks: 50 CIE Marks: 50

Prerequisite:

Fundamentals of Networks, Data Mining, Graph Theory, Advanced Algorithms

Course Outcomes:

Students will be able to

CO’s Course Learning Outcomes BL

CO1 Understand the basics of Social Network Models and analysis. L2

CO2 Analyse social network models for community detection. L4

CO3 Implement link prediction and event detection L3

CO4 Analyse social influence and contributing factors. L4

Teaching Methodology:

Black board teaching

Power Point presentations

Assessment Methods:

Case study evaluated using rubrics for 20 marks

Three internal tests of 30 marks each will be conducted, and the average of the best two will be taken.

A final examination of 100 marks will be conducted and scaled down to 50 marks.

Course Outcome to Programme Outcome Mapping:

PO1 PO2 PO3 PO4 PO5 PO6

CO1 2 2 2

CO2 2 1

CO3 3 2 1 2

CO4 3 2 3

19DSE253 2 2 1 2

COURSE CONTENT

Unit – I 10 Hrs

Social Networks: An Introduction; Types of Networks: General Random Networks, Small World Networks, Scale-Free Networks; Examples of Information Networks; Network Centrality Measures; Strong and Weak ties; Homophily; Walks: Random walk-based proximity measures, Other graph-based proximity measures, Clustering with random-walk based measures.

Unit – II 12 Hrs

Community Detection: Algorithms for Community Detection: The Kernighan-Lin algorithm, Agglomerative/Divisive algorithms, Spectral Algorithms, Multi-level Graph partitioning, Markov Clustering; Community Discovery in Directed Networks, Community Discovery in Dynamic Networks, Community Discovery in Heterogeneous Networks, Evolution of Community.

Unit – III 12 Hrs

Link Prediction: Feature based Link Prediction, Bayesian Probabilistic Models, Probabilistic Relational Models, Linear Algebraic Methods: Network Evolution based Probabilistic Model, Hierarchical Probabilistic Model, Relational Bayesian Network, Relational Markov Network.

Unit – IV 10 Hrs

Event Detection: Classification of Text Streams, Event Detection and Tracking: Bag of Words, Temporal, location, ontology based algorithms, Evolution Analysis in Text Streams, Sentiment analysis.

Unit – V 8 Hrs

Social Influence Analysis: Influence measures, Social Similarity - Measuring Influence, Influencing actions and interactions, Influence maximization.

Text Books:

1. David Easley, Jon Kleinberg: Networks, Crowds and Markets: Reasoning about a highly connected world, Cambridge Univ Press, 2010

2. S. Wasserman, K. Faust: Social Network Analysis: Methods and Applications, Cambridge Univ Press, 1994

Semester: II Year: 2019-2020

Department: Information Science and Engineering Course Type: Core

Course Title: Natural Language & Text Mining Course Code: 19DSE254

L-T-P:4-0-0 Credits: 04

Total Contact Hours:52 hrs Duration of SEE: 3 hrs

SEE Marks: 50 CIE Marks: 50

Prerequisites:

Fundamentals of Language Processing.

Course outcomes:

Students will be able to:

CO’s Course Learning Outcomes BL

CO1 Describe the basics of Natural Language Processing. L2

CO2 Analyze syntactic and semantic parsing techniques. L2

CO3 Implement a rule-based system to tackle morphology/syntax of a Language L3

CO4 Describe the various issues of Natural Language Processing. L2

Teaching Methodology:

• Blackboard teaching

• PowerPoint presentations

Assessment Methods:

• Three internal tests of 30 marks each will be conducted, and the average of the best two will be taken.

• Case study evaluated using rubrics for 20 marks.

• A final examination of 100 marks will be conducted and scaled down to 50 marks.

Course Outcome to Programme Outcome Mapping:

PO 1 PO 2 PO 3 PO 4 PO 5 PO 6

CO1 3 2

CO2 3 2 1

CO3 3 2 2 1 2

CO4 3 3 2 2

19DSE254 3 2 2 2 2 2

COURSE CONTENT

Unit – I 11 Hrs

Classical Approaches to Natural Language Processing: Context, Classical Toolkit, Text Preprocessing, Lexical Analysis, Syntactic Parsing, Semantic Analysis, Natural Language Generation.

Text Preprocessing: Introduction, Challenges of Text Preprocessing, Character-Set Dependence, Language Dependence, Corpus Dependence, Application Dependence, Tokenization, Tokenization in Space-Delimited Languages, Tokenization in Unsegmented Languages, Sentence Segmentation, Sentence Boundary Punctuation, The Importance of Context, Traditional Rule-Based Approaches.

Lexical Analysis: Introduction, Finite State Morphonology, Closing Remarks on Finite State Morphonology, Finite State Morphology, Disjunctive Affixes, Inflectional Classes, and Exceptionality, Further Remarks on Finite State Lexical Analysis, “Difficult” Morphology and Lexical Analysis, Isomorphism Problems, Contiguity Problems, Paradigm-Based Lexical Analysis, Paradigmatic Relations and Generalization.

Unit – II 12 Hrs

Syntactic Parsing: Introduction, Background, Context-Free Grammars, Example Grammar, Syntax Trees, Other Grammar Formalisms, Basic Concepts in Parsing, Parsing as Deduction, Deduction Systems, The CKY Algorithm, Chart Parsing, Bottom-Up Left-Corner Parsing, Top-Down Earley-Style Parsing, Example Session.

Semantic Analysis: Basic Concepts and Issues in Natural Language Semantics, Theories and Approaches to Semantic Representation, Logical Approaches, Discourse Representation Theory, Pustejovsky’s Generative Lexicon, Natural Semantic Metalanguage, Object-Oriented Semantics, Relational Issues in Lexical Semantics, Sense Relations and Ontologies, Roles, Fine-Grained Lexical-Semantic Analysis: Three Case Studies, Emotional Meanings: “Sadness” and “Worry” in English, Ethnogeographical Categories: “Rivers” and “Creeks”, Functional Macro-Categories, Prospectus and “Hard Problems”.

Unit – III 08 Hrs

Natural Language Generation: Introduction, Generation Compared to Comprehension, The Components of a Generator, Components and Levels of Representation, Approaches to Text Planning, The Function of the Speaker, Desiderata for Text Planning, Pushing vs. Pulling, Planning by Progressive Refinement of the Speaker’s Message, Planning Using Rhetorical Operators, Text Schemas, The Linguistic Component, Surface Realization Components, Relationship to Linguistic Theory, Chunk Size, Assembling vs. Navigating, Systemic Grammars, Functional Unification Grammars, The Cutting Edge, Story Generation, Personality-Sensitive Generation, Conclusions.

Unit – IV 10 Hrs

Corpus Creation: Introduction, Corpus Size, Balance, Representativeness, and Sampling, Data Capture and Copyright, Corpus Markup and Annotation, Multilingual Corpora, Multimodal Corpora.

Part-of-Speech Tagging: Introduction, Parts of Speech, Part-of-Speech Problem, The General Framework, Part-of-Speech Tagging Approaches, Rule-Based Approaches, Markov Model Approaches, Maximum Entropy Approaches, Other Statistical and Machine Learning Approaches, Methods and Relevant Work, Combining Taggers.

Unit – V 8 Hrs

Information Retrieval: Introduction, Indexing, Indexing Dimensions, Indexing Process, IR Models, Classical Boolean Model, Vector-Space Models, Probabilistic Models, Query Expansion and Relevance Feedback, Advanced Models, Evaluation and Failure Analysis, Evaluation Campaigns, Evaluation Measures, Failure Analysis, Natural Language Processing and Information Retrieval, Morphology, Orthographic Variation and Spelling Errors, Syntax, Semantics, Related Applications.

Text Analytics: Text analytics systems, Named entity recognition, Disambiguation, Document clustering: identification of sets of similar text documents, Term frequency-inverse document frequency (TF-IDF), Analysis and Evaluation of Current Graph-Based Text Mining Researches, Coreference: Relationship, Case study on Biomedical text mining.

Text Books:

1. Nitin Indurkhya, Fred J. Damerau, “Handbook of Natural Language Processing”, Chapman & Hall/CRC Publications, 2nd Edition, 2010.

Reference Books:

1. Tanveer Siddiqui, U. S. Tiwary, “Natural Language Processing & Information Retrieval”, Oxford University Press, 2008.

2. Anne Kao & Stephen R. Poteet, “Natural Language Processing and Text Mining”, Springer-Verlag, 2007.