Department of COMPUTER SCIENCE 

Syllabus for

Semester 1 (2020 Batch)
Course Code  Course  Hours Per Week  Credits  Marks
MDS131  MATHEMATICAL FOUNDATION FOR DATA SCIENCE  I  4  4  100 
MDS132  PROBABILITY AND DISTRIBUTION THEORY  4  4  100 
MDS133  PRINCIPLES OF DATA SCIENCE  4  4  100 
MDS134  RESEARCH METHODOLOGY  2  2  50 
MDS161A  INTRODUCTION TO STATISTICS  2  2  50 
MDS161B  INTRODUCTION TO COMPUTERS AND PROGRAMMING  2  2  50 
MDS161C  LINUX ADMINISTRATION  2  2  50 
MDS171  DATA BASE TECHNOLOGIES  6  5  150 
MDS172  INFERENTIAL STATISTICS  6  5  150 
MDS173  PROGRAMMING FOR DATA SCIENCE IN PYTHON  6  4  100 
Semester 2 (2020 Batch)
Course Code  Course  Hours Per Week  Credits  Marks
MDS231  MATHEMATICAL FOUNDATION FOR DATA SCIENCE  II  4  4  100 
MDS232  REGRESSION ANALYSIS  4  4  100 
MDS241A  MULTIVARIATE ANALYSIS  4  4  100 
MDS241B  STOCHASTIC PROCESS  4  4  100 
MDS271  MACHINE LEARNING  6  5  150 
MDS272A  HADOOP  6  5  150 
MDS272B  IMAGE AND VIDEO ANALYTICS  6  5  150 
MDS272C  INTERNET OF THINGS  6  5  150 
MDS273  PROGRAMMING FOR DATA SCIENCE IN R  6  4  100 
Semester 3 (2019 Batch)
Course Code  Course  Hours Per Week  Credits  Marks
MDS331  NEURAL NETWORKS AND DEEP LEARNING  4  4  100 
MDS341A  TIME SERIES ANALYSIS AND FORECASTING TECHNIQUES  4  4  100 
MDS341B  BAYESIAN INFERENCE  4  4  100 
MDS341C  ECONOMETRICS  4  4  100 
MDS371  CLOUD ANALYTICS  6  5  150 
MDS372A  NATURAL LANGUAGE PROCESSING  6  5  150 
MDS372B  WEB ANALYTICS  6  5  150 
MDS372C  BIO INFORMATICS  6  5  150 
MDS372D  EVOLUTIONARY ALGORITHMS  6  5  150 
MDS381  SPECIALIZATION PROJECT  4  2  100 
MDS382  SEMINAR  2  1  50 
MDS383  RESEARCH MODELLING AND IMPLEMENTATION  4  2  50 
Semester 4 (2019 Batch)
Course Code  Course  Hours Per Week  Credits  Marks
MDS481  INDUSTRY PROJECT  2  10  300 
MDS482  RESEARCH PUBLICATION  4  2  100 
 
Assessment Pattern  
CIA: 50%, ESE: 50%  
Examination and Assessments  
CIA: 50%, ESE: 50%  
Department Overview:  
The Department of Computer Science of CHRIST (Deemed to be University) strives to shape outstanding computer professionals with ethical and human values to reshape the nation's destiny. The training imparted aims to prepare young minds for the challenging opportunities in the IT industry with a global awareness rooted in the Indian soil, nourished and supported by experts in the field.  
Mission Statement:  
Vision
The Department of Computer Science endeavours to imbibe the vision of the University, "Excellence and Service". The department is committed to this philosophy, which pervades every aspect and functioning of the department.
Mission
"To develop IT professionals with ethical and human values". To accomplish our mission, the department encourages students to apply their acquired knowledge and skills towards professional achievements in their career. The department also moulds the st  
Introduction to Program:  
Data Science is widely used across academia, business sectors, and research and development to make effective decisions in day-to-day activities. MSc in Data Science is a two-year programme with four semesters. This programme aims to give all candidates the opportunity to master the skill sets specific to data science with a research bent. The curriculum supports the students in obtaining adequate knowledge of the theory of data science with hands-on experience in relevant domains and tools. Candidates gain exposure to research models and industry-standard applications in data science through guest lectures, seminars, projects, internships, etc.  
Program Objective:  
Programme Objective
• To acquire an in-depth understanding of the theoretical concepts in statistics, data analysis, data mining, machine learning and other advanced data science techniques.
• To gain practical experience in programming tools for data sciences, database systems, machine learning and big data tools.
• To strengthen analytical and problem-solving skills through developing real-time applications.
• To empower students with tools and techniques for handling, managing, analyzing and interpreting data.
• To imbibe quality research and develop solutions to social issues.
Programme Specific Outcomes
PSO1: Abstract thinking: Ability to understand the abstract concepts that lead to various data science theories in Mathematics, Statistics and Computer science.
PSO2: Problem Analysis and Design: Ability to identify, analyze and design solutions for data science problems using fundamental principles of mathematics, statistics, computing sciences, and relevant domain disciplines.
PSO3: Modern software tool usage: Acquire the skills in handling data science programming tools towards problem solving and solution analysis for domain specific problems.
PSO4: Innovation And Entrepreneurship: Produce innovative IT solutions and services based on global needs and trends.
PSO5: Societal And Environmental Concern: Utilize the data science theories for societal and environmental concerns.
PSO6: Professional Ethics: Understand and commit to professional ethics and  
 
Assessment Pattern  
CIA: 50%, ESE: 50%  
Examination and Assessments  
CIA + ESE  
Department Overview:  
The Department of Data Science of Christ (Deemed to be University), Lavasa is established to shape students into outstanding data scientists and analytics professionals with ethical and human values. The department offers various undergraduate and postgraduate programmes, viz., Bachelor of Science in Data Science, Master of Science in Data Science, Bachelor of Science in Economics & Analytics, and Doctor of Philosophy in the areas of Computer Science and Engineering. The department has rich expertise in terms of faculty resources who are well trained in various fields like Data Science, Data Security, Data Analytics, Artificial Intelligence, Machine Learning, Networking, Data Mining, Big Data, Text Mining, Knowledge Representation, Soft Computing, and Cloud Computing. The department has a wide variety of labs, namely the Machine Learning Lab, Data Analytics Lab, Open Source Lab, etc., set up exclusively for the hands-on training of students in their lab-oriented courses and research.
The department periodically organizes hands-on workshops on recent technologies like Machine Learning, Cloud Computing, Hadoop, etc. to keep students industry-ready. The department equips students with a holistic education to be better citizens.  
Mission Statement:  
Vision
"Enrich Ethical Scientific Excellence"
Mission
"To develop Data Science professionals with ethical and social values."
"Divulge state-of-the-art knowledge in the area of Data Science and Analytics."
"Encourages research and innovation."
"Accustoms the students with current industry practices, teamwork, and entrepreneurship."  
Introduction to Program:  
Data Science is prevalent across academia, business sectors, and research and development to make effective decisions in day-to-day activities. MSc in Data Science is a two-year programme with four semesters. This programme aims to provide opportunities to all candidates to master the skill sets specific to data science with a research bent. The curriculum supports the students in obtaining adequate knowledge of the theory of data science with hands-on experience in relevant domains and tools. Candidates gain exposure to research models and industry-standard applications in data science through guest lectures, seminars, projects, internships, etc.  
Program Objective:  
Programme Objective
• To acquire an in-depth understanding of the theoretical concepts in statistics, data analysis, data mining, machine learning, and other advanced data science techniques.
• To gain practical experience in programming tools for data sciences, database systems, machine learning, and big data tools.
• To strengthen analytical and problem-solving skills through developing real-time applications.
• To empower students with tools and techniques for handling, managing, analyzing, and interpreting data.
Ethics and Human Values
1. Only proprietary or open-source software would be used for academic teaching and learning purposes.
2. Copying of programs from the internet, friends, or other sources is strictly discouraged since it impairs the development of programming skills.
3. Unique practical (domain-based) exercises are assigned to ensure that students do not engage in code plagiarism.
4. Projects undertaken by students during the course are done in teams to improve collaborative work and synergy between team members.
5. Projects involve modularization, which initiates students to take individual responsibility for common goals.
6. Passion for excellence is promoted among the students, be it in software development or project documentation.
7. Giving due credit to sources during the seminar and research assignment is promoted among the students.
8. The course and its design enforce the practice of proper referencing techniques to improve the sense of integri  
MDS131  MATHEMATICAL FOUNDATION FOR DATA SCIENCE  I (2020 Batch)  
Total Teaching Hours for Semester:60 
No of Lecture Hours/Week:4 
Max Marks:100 
Credits:4 
Course Objectives/Course Description 

Linear Algebra plays a fundamental role in the theory of Data Science. This course aims at introducing the basic notions of vector spaces, Linear Algebra and the use of Linear Algebra in applications to Data Science. 

Learning Outcome 

On successful completion of this course, a student will be able to:
CO1: Understand the properties of Vector spaces
CO2: Use the properties of Linear Maps in solving problems on Linear Algebra
CO3: Demonstrate proficiency on the topics Eigenvalues, Eigenvectors and Inner Product Spaces
CO4: Apply mathematics for some applications in Data Science 
Unit1 
Teaching Hours:15 
Introduction to Vector Spaces


Vector Spaces: R^{n} and C^{n}, lists, F^{n} and digression on Fields, Definition of Vector spaces, Subspaces, sums of Subspaces, Direct Sums, Span and Linear Independence, bases, dimension.  
Unit2 
Teaching Hours:20 
Linear Maps


Definition of Linear Maps - Algebraic Operations on L(V,W) - Null spaces and Injectivity - Range and Surjectivity - Fundamental Theorems of Linear Maps - Representing a Linear Map by a Matrix - Invertible Linear Maps - Isomorphic Vector spaces - Linear Map as Matrix Multiplication - Operators - Products of Vector Spaces - Product of Direct Sum - Quotients of Vector spaces.  
Unit3 
Teaching Hours:10 
Eigenvalues, Eigenvectors and Inner Product Spaces


Eigenvalues and Eigenvectors - Eigenvectors and Upper Triangular matrices - Eigenspaces and Diagonal Matrices - Inner Products and Norms - Linear functionals on Inner Product spaces.  
Unit4 
Teaching Hours:15 
Mathematics Applied to Data Science


Singular value decomposition - Handwritten digits and simple algorithm - Classification of handwritten digits using SVD bases - Tangent distance - Text Mining.  
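To illustrate the singular value decomposition listed in this unit, the following is a minimal sketch, not taken from the prescribed texts, that estimates the largest singular value of a small matrix by power iteration on A^T A. The function name `top_singular_value`, the starting vector, and the example matrix are assumptions chosen for illustration; a numerical library routine would normally be used in practice.

```python
import math

def top_singular_value(A, iterations=100):
    """Estimate the largest singular value of A (given as a list of rows)
    by power iteration on the symmetric matrix A^T A."""
    n_rows, n_cols = len(A), len(A[0])
    v = [1.0] + [0.0] * (n_cols - 1)  # arbitrary starting vector (assumption)
    for _ in range(iterations):
        # w = A v
        w = [sum(a * x for a, x in zip(row, v)) for row in A]
        # u = A^T w, which equals (A^T A) v
        u = [sum(A[i][j] * w[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in u))
        v = [x / norm for x in u]
    # v is now a unit vector close to the leading right singular vector,
    # so the length of A v approximates the largest singular value.
    Av = [sum(a * x for a, x in zip(row, v)) for row in A]
    return math.sqrt(sum(x * x for x in Av))

A = [[4.0, 0.0], [3.0, -5.0]]  # hypothetical example matrix
sigma = top_singular_value(A)
print(sigma)
```

For this matrix, A^T A has eigenvalues 40 and 10, so the estimate should approach the square root of 40.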
Text Books And Reference Books:
1. S. Axler, Linear algebra done right, Springer, 2017.
2. L. Eldén, Matrix methods in data mining and pattern recognition, Society for Industrial and Applied Mathematics, 2007.  
Essential Reading / Recommended Reading
1. E. Davis, Linear algebra and probability for computer science applications, CRC Press, 2012.
2. J. V. Kepner and J. R. Gilbert, Graph algorithms in the language of linear algebra, Society for Industrial and Applied Mathematics, 2011.
3. D. A. Simovici, Linear algebra tools for data mining, World Scientific Publishing, 2012.
4. P. N. Klein, Coding the matrix: linear algebra through applications to computer science, Newtonian Press, 2015.  
Evaluation Pattern CIA  50% ESE  50%  
MDS131L  MATHEMATICAL FOUNDATION FOR DATA SCIENCE  I (2020 Batch)  
Total Teaching Hours for Semester:60 
No of Lecture Hours/Week:4 
Max Marks:100 
Credits:4 
Course Objectives/Course Description 

Linear Algebra plays a fundamental role in the theory of Data Science. This course aims at introducing the basic notions of vector spaces, Linear Algebra and the use of Linear Algebra in applications to Data Science. 

Learning Outcome 

1. Understand the properties of Vector spaces
2. Use the properties of Linear Maps in solving problems on Linear Algebra
3. Demonstrate proficiency on the topics Eigenvalues, Eigenvectors and Inner Product Spaces
4. Apply mathematics for some applications in Data Science 
Unit1 
Teaching Hours:15 
INTRODUCTION TO VECTOR SPACES


Vector Spaces: R^{n} and C^{n}, lists, F^{n} and digression on Fields, Definition of Vector spaces, Subspaces, sums of Subspaces, Direct Sums, Span and Linear Independence, bases, dimension.  
Unit2 
Teaching Hours:20 
LINEAR MAPS


Definition of Linear Maps - Algebraic Operations on L(V,W) - Null spaces and Injectivity - Range and Surjectivity - Fundamental Theorems of Linear Maps - Representing a Linear Map by a Matrix - Invertible Linear Maps - Isomorphic Vector spaces - Linear Map as Matrix Multiplication - Operators - Products of Vector Spaces - Product of Direct Sum - Quotients of Vector spaces.  
Unit3 
Teaching Hours:10 
EIGENVALUES, EIGENVECTORS, AND INNER PRODUCT SPACES


Eigenvalues and Eigenvectors - Eigenvectors and Upper Triangular matrices - Eigenspaces and Diagonal Matrices - Inner Products and Norms - Linear functionals on Inner Product spaces.  
Unit4 
Teaching Hours:15 
MATHEMATICS APPLIED TO DATA SCIENCE


Singular value decomposition - Handwritten digits and simple algorithm - Classification of handwritten digits using SVD bases - Tangent distance - Text Mining.  
Text Books And Reference Books:
1. S. Axler, Linear algebra done right, 2nd ed., Springer, 2017.
2. L. Eldén, Matrix methods in data mining and pattern recognition, Society for Industrial and Applied Mathematics, 2007.  
Essential Reading / Recommended Reading
1. E. Davis, Linear algebra and probability for computer science applications, CRC Press, 2012.
2. S. H. Friedberg, A. J. Insel, and L. E. Spence, Linear algebra, 4th ed., Pearson, 2014.
3. K. Hoffman and R. A. Kunze, Linear algebra, 2nd ed., Pearson, 2015.
4. D. A. Simovici, Linear algebra tools for data mining, World Scientific Publishing, 2012.
5. J. V. Kepner and J. R. Gilbert, Graph algorithms in the language of linear algebra, Society for Industrial and Applied Mathematics, 2011.
6. P. N. Klein, Coding the matrix: linear algebra through applications to computer science, Newtonian Press, 2015.  
Evaluation Pattern CIA I : 10% CIA II : 25% CIA III : 10% Attendance : 5% ESE : 50%  
MDS132  PROBABILITY AND DISTRIBUTION THEORY (2020 Batch)  
Total Teaching Hours for Semester:60 
No of Lecture Hours/Week:4 
Max Marks:100 
Credits:4 
Course Objectives/Course Description 

To enable the students to understand the properties and applications of various probability functions. 

Learning Outcome 

CO1: Demonstrate random variables and their functions.
CO2: Infer the expectations for random variable functions and generating functions.
CO3: Demonstrate various discrete and continuous distributions and their usage. 
Unit1 
Teaching Hours:12 
ALGEBRA OF PROBABILITY


Algebra of sets – fields and sigma-fields – Inverse function – Measurable function – Probability measure on a sigma-field – simple properties – Probability space – Random variables and Random vectors – Induced Probability space – Distribution functions – Decomposition of distribution functions.  
Unit2 
Teaching Hours:12 
EXPECTATION AND MOMENTS OF RANDOM VARIABLES


Definitions and simple properties – Moment inequalities – Hölder and Jensen inequalities – Characteristic function – definition and properties – Inversion formula. Convergence of a sequence of random variables – convergence in distribution – convergence in probability – almost sure convergence and convergence in quadratic mean – Weak and Complete convergence of distribution functions – Helly-Bray theorem.  
Unit3 
Teaching Hours:12 
LAW OF LARGE NUMBERS


Khinchin's weak law of large numbers, Kolmogorov's strong law of large numbers (statement only) – Central Limit Theorem – Lindeberg-Lévy theorem, Lindeberg-Feller theorem (statement only), Lyapunov theorem – Relation between Lyapunov and Lindeberg-Feller forms – Radon-Nikodym theorem and derivative (without proof) – Conditional expectation – definition and simple properties.  
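As an informal illustration of the law of large numbers named in this unit, the sketch below simulates independent Uniform(0, 1) draws and shows the sample mean settling near the expected value 0.5 as the sample size grows. The function name `sample_mean`, the seed, and the chosen sample sizes are assumptions made for the example; a simulation only suggests the behaviour and does not prove the theorem.

```python
import random

def sample_mean(n, seed=0):
    """Mean of n independent Uniform(0, 1) draws from a seeded generator."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0.0
    for _ in range(n):
        total += rng.random()  # each draw has expected value 0.5
    return total / n

# The deviation from 0.5 tends to shrink as n increases.
for n in (10, 1_000, 100_000):
    print(n, sample_mean(n))
```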
Unit4 
Teaching Hours:12 
DISTRIBUTION THEORY


Distribution of functions of random variables – Laplace, Cauchy, Inverse Gaussian, Lognormal, Logarithmic series and Power series distributions – Multinomial distribution – Bivariate Binomial – Bivariate Poisson – Bivariate Normal – Bivariate Exponential of Marshall and Olkin – Compound, truncated and mixture of distributions, Concept of convolution – Multivariate normal distribution (Definition and Concept only) – Sampling distributions: Non-central chi-square, t and F distributions and their properties.  
Unit5 
Teaching Hours:12 
ORDER STATISTICS


Order statistics, their distributions and properties – Joint and marginal distributions of order statistics – Distribution of range and mid-range – Extreme values and their asymptotic distributions (concepts only) – Empirical distribution function and its properties – Kolmogorov-Smirnov distributions – Lifetime distributions – Exponential and Weibull distributions – Mills ratio – Distributions classified by hazard rate.  
Text Books And Reference Books: 1. Modern Probability Theory, B.R Bhat, New Age International, 4th Edition, 2014. 2. An Introduction to Probability and Statistics, V.K Rohatgi and Saleh, 3rd Edition, 2015  
Essential Reading / Recommended Reading 1. Introduction to the theory of statistics, A.M Mood, F.A Graybill and D.C Boes, Tata McGraw-Hill, 3rd Edition (Reprint), 2017. 2. Order Statistics, H.A David and H.N Nagaraja, John Wiley & Sons, 3rd Edition, 2003.  
Evaluation Pattern CIA: 50% ESE: 50%  
MDS132L  PROBABILITY AND DISTRIBUTION THEORY (2020 Batch)  
Total Teaching Hours for Semester:60 
No of Lecture Hours/Week:4 
Max Marks:100 
Credits:4 
Course Objectives/Course Description 

Course Objectives: To enable the students to understand the properties and applications of various probability functions. 

Learning Outcome 

CO1: Demonstrate random variables and their functions.
CO2: Infer the expectations for random variable functions and generating functions.
CO3: Demonstrate various discrete and continuous distributions and their usage. 
Unit1 
Teaching Hours:12 
ALGEBRA OF PROBABILITY


Algebra of sets – fields and sigma-fields – Inverse function – Measurable function – Probability measure on a sigma-field – simple properties – Probability space – Random variables and Random vectors – Induced Probability space – Distribution functions – Decomposition of distribution functions.  
Unit2 
Teaching Hours:12 
EXPECTATION AND MOMENTS OF RANDOM VARIABLES


Definitions and simple properties – Moment inequalities – Hölder and Jensen inequalities – Characteristic function – definition and properties – Inversion formula. Convergence of a sequence of random variables – convergence in distribution – convergence in probability – almost sure convergence and convergence in quadratic mean – Weak and Complete convergence of distribution functions – Helly-Bray theorem.  
Unit3 
Teaching Hours:12 
LAW OF LARGE NUMBERS


Khinchin's weak law of large numbers, Kolmogorov's strong law of large numbers (statement only) – Central Limit Theorem – Lindeberg-Lévy theorem, Lindeberg-Feller theorem (statement only), Lyapunov theorem – Relation between Lyapunov and Lindeberg-Feller forms – Radon-Nikodym theorem and derivative (without proof) – Conditional expectation – definition and simple properties.  
Unit4 
Teaching Hours:12 
DISTRIBUTION THEORY


Distribution of functions of random variables – Laplace, Cauchy, Inverse Gaussian, Lognormal, Logarithmic series and Power series distributions – Multinomial distribution – Bivariate Binomial – Bivariate Poisson – Bivariate Normal – Bivariate Exponential of Marshall and Olkin – Compound, truncated and mixture of distributions, Concept of convolution – Multivariate normal distribution (Definition and Concept only) – Sampling distributions: Non-central chi-square, t and F distributions and their properties.  
Unit5 
Teaching Hours:12 
ORDER STATISTICS


Order statistics, their distributions and properties – Joint and marginal distributions of order statistics – Distribution of range and mid-range – Extreme values and their asymptotic distributions (concepts only) – Empirical distribution function and its properties – Kolmogorov-Smirnov distributions – Lifetime distributions – Exponential and Weibull distributions – Mills ratio – Distributions classified by hazard rate.  
Text Books And Reference Books: 1. B.R Bhat, Modern Probability Theory, New Age International, 4th Edition, 2014. 2. V.K Rohatgi and Saleh, An Introduction to Probability and Statistics, 3rd Edition, 2015.  
Essential Reading / Recommended Reading 1. A.M Mood, F.A Graybill and D.C Boes, Introduction to the theory of statistics, Tata McGraw-Hill, 3rd Edition (Reprint), 2017. 2. H.A David and H.N Nagaraja, Order Statistics, John Wiley & Sons, 3rd Edition, 2003.  
Evaluation Pattern CIA  50% ESE  50%  
MDS133  PRINCIPLES OF DATA SCIENCE (2020 Batch)  
Total Teaching Hours for Semester:60 
No of Lecture Hours/Week:4 
Max Marks:100 
Credits:4 
Course Objectives/Course Description 

To provide a strong foundation in data science and its application areas related to information technology, and to understand the underlying core concepts and emerging technologies in data science. 

Learning Outcome 

CO1: Explore the fundamental concepts of data science
CO2: Understand data analysis techniques for applications handling large data
CO3: Understand various machine learning algorithms used in data science process
CO4: Visualize and present the inference using various tools
CO5: Learn to think through the ethics surrounding privacy, data sharing and algorithmic decision-making

Unit1 
Teaching Hours:10 
INTRODUCTION TO DATA SCIENCE


Definition – Big Data and Data Science Hype – Why data science – Getting Past the Hype – The Current Landscape – Who is a Data Scientist? – Data Science Process Overview – Defining goals – Retrieving data – Data preparation – Data exploration – Data modeling – Presentation.  
Unit2 
Teaching Hours:12 
BIG DATA


Problems when handling large data – General techniques for handling large data – Case study – Steps in big data – Distributing data storage and processing with Frameworks – Case study.  
Unit3 
Teaching Hours:12 
MACHINE LEARNING


Machine learning – Modeling Process – Training model – Validating model – Predicting new observations – Supervised learning algorithms – Unsupervised learning algorithms.  
Unit4 
Teaching Hours:12 
DEEP LEARNING


Introduction – Deep Feedforward Networks – Regularization – Optimization of Deep Learning – Convolutional Networks – Recurrent and Recursive Nets – Applications of Deep Learning.  
Unit5 
Teaching Hours:14 
DATA VISUALIZATION


Introduction to data visualization – Data visualization options – Filters – MapReduce – Dashboard development tools – Creating an interactive dashboard with dc.js – Summary.  
Unit5 
Teaching Hours:14 
ETHICS AND RECENT TRENDS


Data Science Ethics – Doing good data science – Owners of the data – Valuing different aspects of privacy – Getting informed consent – The Five Cs – Diversity – Inclusion – Future Trends.  
Text Books And Reference Books:
[1] Introducing Data Science, Davy Cielen, Arno D. B. Meysman, Mohamed Ali, Manning Publications Co., 1st edition, 2016
[2] An Introduction to Statistical Learning: with Applications in R, Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, Springer, 1st edition, 2013
[3] Deep Learning, Ian Goodfellow, Yoshua Bengio, Aaron Courville, MIT Press, 1st edition, 2016
[4] Ethics and Data Science, D J Patil, Hilary Mason, Mike Loukides, O’Reilly, 1st edition, 2018  
Essential Reading / Recommended Reading
[1] Data Science from Scratch: First Principles with Python, Joel Grus, O’Reilly, 1st edition, 2015
[2] Doing Data Science: Straight Talk from the Frontline, Cathy O'Neil, Rachel Schutt, O’Reilly, 1st edition, 2013
[3] Mining of Massive Datasets, Jure Leskovec, Anand Rajaraman, Jeffrey David Ullman, Cambridge University Press, 2nd edition, 2014  
Evaluation Pattern CIA : 50 % ESE : 50 %  
MDS133L  PRINCIPLES OF DATA SCIENCE (2020 Batch)  
Total Teaching Hours for Semester:60 
No of Lecture Hours/Week:4 
Max Marks:100 
Credits:4 
Course Objectives/Course Description 

Course Description: To provide a strong foundation in Data Science and related areas of application. The course covers the fundamentals of data science, different techniques for handling big data, and machine learning algorithms for supervised and unsupervised learning. The importance of handling data in an ethical manner and the ethical practices to be adopted while dealing with data are also part of the course.
Course Objectives: To enable students to understand the underlying core concepts and emerging technologies in Data Science.


Learning Outcome 

CO1: Explore the fundamental concepts of Data Science
CO2: Understand the data analysis techniques for applications handling large data
CO3: Understand and apply the various machine learning algorithms used in data science process
CO4: Visualize and present the inference using various tools
CO5: Learn to think through the ethics surrounding privacy, data sharing and algorithmic decision-making and follow ethical practices while dealing with data 
Unit1 
Teaching Hours:10 

INTRODUCTION TO DATA SCIENCE


Big Data and Data Science Hype – Why data science – Getting Past the Hype – The current Landscape – Data Science Process Overview  
Unit2 
Teaching Hours:12 

BIG DATA


Problems when handling large data – General techniques for handling large data – Case study  
Unit3 
Teaching Hours:12 

MACHINE LEARNING


Machine learning – Modelling Process – Training model – Validating model – Supervised learning algorithms – Unsupervised learning algorithms  
Unit4 
Teaching Hours:12 

DEEP LEARNING


Deep Feedforward Networks – Regularization – Optimization of Deep Learning – Convolutional Networks – Recurrent and Recursive Nets – Applications of Deep Learning  
Unit5 
Teaching Hours:14 

DATA VISUALIZATION


Data visualization options – Filters – MapReduce – Dashboard development tools – Creating an interactive dashboard with dc.js – Summary
ETHICS AND RECENT TRENDS
Data Science Ethics – Doing good data science – Owners of the data – The Five Cs – Diversity – Inclusion – Future Trends  
Text Books And Reference Books:
T1. Introducing Data Science, Davy Cielen, Arno D. B. Meysman, Mohamed Ali, Manning Publications Co., 1st Edition, 2016
T2. An Introduction to Statistical Learning: with Applications in R, Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, Springer, 1st Edition, 2013
T3. Deep Learning, Ian Goodfellow, Yoshua Bengio, Aaron Courville, MIT Press, 1st Edition, 2016
T4. Ethics and Data Science, D J Patil, Hilary Mason, Mike Loukides, O’Reilly, 1st Edition, 2018  
Essential Reading / Recommended Reading
R1. Data Science from Scratch: First Principles with Python, Joel Grus, O’Reilly, 1st Edition, 2015
R2. Doing Data Science: Straight Talk from the Frontline, Cathy O’Neil, Rachel Schutt, O’Reilly, 1st Edition, 2013
R3. Mining of Massive Datasets, Jure Leskovec, Anand Rajaraman, Jeffrey David Ullman, Cambridge University Press, 2nd Edition, 2014  
Evaluation Pattern
 
MDS134  RESEARCH METHODOLOGY (2020 Batch)  
Total Teaching Hours for Semester:30 
No of Lecture Hours/Week:2 

Max Marks:50 
Credits:2 

Course Objectives/Course Description 

This course is intended to assist students in planning and carrying out research work. The students are exposed to the basic principles, procedures and techniques of implementing a research project. The main objective is to introduce the research concept and the various research methodologies. The course focuses on identifying research gaps in the literature and encourages lateral, strategic and creative thinking. It also introduces the computer technology and basic statistics required for research, and the scientific reporting of research outcomes with an emphasis on research ethics.


Learning Outcome 

CO1: Understand the essence of research and the necessity of defining a research problem.
CO2: Apply research methods and methodology including research design, data collection, data analysis, and interpretation.
CO3: Create scientific reports according to specified standards.

Unit1 
Teaching Hours:8 
RESEARCH METHODOLOGY


Defining research problem: Selecting the problem, Necessity of defining the problem, Techniques involved in defining a problem, Ethics in Research.  
Unit2 
Teaching Hours:8 
RESEARCH DESIGN


Principles of experimental design, Working with Literature: Importance, Finding literature, Using your resources, Managing the literature, Keeping track of references, Using the literature, Literature review, Online Searching: Databases, SciFinder, Scopus, ScienceDirect, Searching research articles, Citation Index, Impact Factor, h-index.  
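The h-index listed above has a simple definition that a short example can make concrete: it is the largest number h such that at least h papers have h or more citations each. The following is a minimal sketch; the function name `h_index` and the citation counts are hypothetical and used only for illustration.

```python
def h_index(citations):
    """Return the largest h such that at least h papers
    have h or more citations each."""
    ranked = sorted(citations, reverse=True)  # most cited first
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank   # the top `rank` papers all have at least `rank` citations
        else:
            break      # counts only decrease from here on
    return h

# Hypothetical citation counts for five papers.
print(h_index([10, 8, 5, 4, 3]))
```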
Unit3 
Teaching Hours:7 
RESEARCH DATA


Measurement of Scaling: Quantitative, Qualitative, Classification of Measure scales, Data Collection, Data Preparation.  
Unit4 
Teaching Hours:7 
REPORT WRITING


Scientific Writing and Report Writing: Significance, Steps, Layout, Types, Mechanics and Precautions, LaTeX: Introduction, Text, Tables, Figures, Equations, Citations, Referencing, and Templates (IEEE style), Paper writing for international journals, Writing scientific report.  
Text Books And Reference Books: [1] C. R. Kothari, Research Methodology Methods and Techniques, 3rd. ed. New Delhi: New Age International Publishers, Reprint 2014. [2] Zina O’Leary, The Essential Guide to Doing Research, New Delhi: PHI, 2005.  
Essential Reading / Recommended Reading [1] J. W. Creswell, Research Design: Qualitative, Quantitative, and Mixed Methods Approaches, 4th ed. SAGE Publications, 2014. [2] Kumar, Research Methodology: A Step by Step Guide for Beginners, 3rd. ed. Indian: PE, 2010.  
Evaluation Pattern CIA  50% ESE  50%  
MDS134L  RESEARCH METHODOLOGY (2020 Batch)  
Total Teaching Hours for Semester:30 
No of Lecture Hours/Week:2 
Max Marks:50 
Credits:2 
Course Objectives/Course Description 

The research methodology module is intended to assist students in planning and carrying out research projects. The students are exposed to the principles, procedures and techniques of implementing a research project. The course starts with an introduction to research and carries through the various methodologies involved. It continues with searching the literature using computer technology and the basic statistics required for research, and ends with linear regression. 

Learning Outcome 

CO1: Define research and describe the research process and research methods
CO2: Understand and apply basic research methods including research design, data analysis, and interpretation 
Unit1 
Teaching Hours:8 
RESEARCH METHODOLOGY


Defining research problem - selecting the problem - necessity of defining the problem - techniques involved in defining a problem - Ethics in Research.  
Unit2 
Teaching Hours:8 
RESEARCH DESIGN


Principles of experimental design – Working with Literature: Importance, finding literature, using your resources, managing the literature, keeping track of references, using the literature, literature review. Online Searching: Databases – SciFinder – Scopus – ScienceDirect – Searching research articles – Citation Index – Impact Factor – h-index, etc.  
Unit3 
Teaching Hours:7 
RESEARCH DATA


Measurement of Scaling: Quantitative, Qualitative, Classification of Measure scales, Data Collection, Data Preparation.  
Unit4 
Teaching Hours:7 
REPORT WRITING


Scientific Writing and Report Writing: Significance, Steps, Layout, Types, Mechanics and Precautions, LaTeX: Introduction, text, tables, figures, equations, citations, referencing, and templates (IEEE style), paper writing for international journals, Writing scientific report.  
Text Books And Reference Books: [1] C. R. Kothari, Research Methodology Methods and Techniques, 3rd. ed. New Delhi: New Age International Publishers, Reprint 2014. [2] Zina O’Leary, The Essential Guide to Doing Research, New Delhi: PHI, 2005.  
Essential Reading / Recommended Reading [1] J. W. Creswell, Research Design: Qualitative, Quantitative, and Mixed Methods Approaches, 4th ed. SAGE Publications, 2014. [2] Kumar, Research Methodology: A Step by Step Guide for Beginners, 3rd. ed. Indian: PE, 2010.  
Evaluation Pattern
CIA1: Evaluated out of 20 marks, converted to 10
CIA2: Evaluated out of 50 marks, converted to 25
CIA3: Evaluated out of 20 marks, converted to 10
Total CIA marks after conversion = 45
Attendance marks = 5
Final marks = 50
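The conversion in this evaluation pattern is plain proportional scaling. A minimal sketch of the arithmetic follows; the function name `final_cia_marks` and the sample scores are hypothetical and not part of the syllabus.

```python
# Illustrative only: (maximum raw marks, converted maximum) for each component,
# taken from the evaluation pattern of this course.
CIA_SCALE = {"CIA1": (20, 10), "CIA2": (50, 25), "CIA3": (20, 10)}
ATTENDANCE_MAX = 5

def final_cia_marks(raw, attendance):
    """Return the final mark out of 50 for a dict of raw CIA scores."""
    converted = sum(raw[name] * conv / full
                    for name, (full, conv) in CIA_SCALE.items())
    return converted + min(attendance, ATTENDANCE_MAX)

# Hypothetical student: 16/20, 40/50, 18/20 and 4 attendance marks.
example = final_cia_marks({"CIA1": 16, "CIA2": 40, "CIA3": 18}, attendance=4)
print(example)  # 8 + 20 + 9 + 4 = 41.0
```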
 
MDS161A  INTRODUCTION TO STATISTICS (2020 Batch)  
Total Teaching Hours for Semester:30 
No of Lecture Hours/Week:2 
Max Marks:50 
Credits:2 
Course Objectives/Course Description 

To enable the students to understand the fundamentals of statistics to apply descriptive measures and probability for data analysis. 

Learning Outcome 

CO1: Demonstrate the history of statistics and present the data in various forms.
CO2: Infer the concept of correlation and regression for relating two or more related variables.
CO3: Demonstrate the probabilities for various events.
Unit1 
Teaching Hours:8 
ORGANIZATION AND PRESENTATION OF DATA


Origin and development of Statistics, Scope, limitation and misuse of statistics. Types of data: primary, secondary, quantitative and qualitative data. Types of Measurements: nominal, ordinal, discrete and continuous data. Presentation of data by tables: construction of frequency distributions for discrete and continuous data, graphical representation of a frequency distribution by histogram and frequency polygon, cumulative frequency distributions  
Unit2 
Teaching Hours:8 
DESCRIPTIVE STATISTICS


Measures of location or central tendency: Arithmetic mean, Median, Mode, Geometric mean, Harmonic mean. Partition values: Quartiles, Deciles and Percentiles. Measures of dispersion: Mean deviation, Quartile deviation, Standard deviation, Coefficient of variation. Moments: measures of Skewness, Kurtosis.
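As an illustration of the measures listed in this unit, the sketch below computes several of them with Python's standard `statistics` module. The sample data are hypothetical, and the quartiles use the module's default "exclusive" method, which is only one of several textbook conventions.

```python
import statistics as st

data = [2, 4, 4, 5, 7, 9, 11]  # hypothetical sample

mean = st.mean(data)                   # arithmetic mean
median = st.median(data)
mode = st.mode(data)
gmean = st.geometric_mean(data)
hmean = st.harmonic_mean(data)
q1, q2, q3 = st.quantiles(data, n=4)   # quartiles, default "exclusive" method
quartile_deviation = (q3 - q1) / 2
sd = st.pstdev(data)                   # population standard deviation
cv = 100 * sd / mean                   # coefficient of variation, in percent

print(mean, median, mode)  # 6 5 4
```

For positive data the three means always satisfy harmonic ≤ geometric ≤ arithmetic, which is a quick sanity check on any hand calculation.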
Unit3 
Teaching Hours:7 
CORRELATION AND REGRESSION


Correlation: Scatter plot, Karl Pearson's coefficient of correlation, Spearman's rank correlation coefficient, multiple and partial correlations (for 3 variates only). Regression: Concept of errors, Principle of Least Squares, Simple linear regression and its properties.
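The link between Pearson's coefficient and the least-squares line can be seen in a few lines of plain Python. This is a sketch on hypothetical data; the variable names are illustrative.

```python
from math import sqrt

x = [1, 2, 3, 4, 5]   # hypothetical predictor
y = [2, 4, 5, 4, 5]   # hypothetical response

n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # sum of cross-deviations
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)

r = sxy / sqrt(sxx * syy)      # Karl Pearson's coefficient of correlation
slope = sxy / sxx              # least-squares slope of y on x
intercept = my - slope * mx    # the fitted line passes through (mx, my)
slope_x_on_y = sxy / syy       # regression of x on y

print(round(r, 4), round(slope, 4), round(intercept, 4))  # 0.7746 0.6 2.2
```

One property of simple linear regression that falls out directly: the product of the two regression slopes equals r squared.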
Unit4 
Teaching Hours:7 
BASICS OF PROBABILITY


Random experiment, sample point and sample space, event, algebra of events. Definition of Probability: classical, empirical and axiomatic approaches to probability, properties of probability. Theorems on probability, conditional probability and independent events, Law of total probability, Bayes’ theorem and its applications.
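A short worked example of Bayes' theorem combined with the law of total probability is sketched below. The screening-test figures and the function name `posterior` are hypothetical.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(A | B) by Bayes' theorem; P(B) comes from the law of total probability."""
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical screening test: 1% prevalence, 95% sensitivity, 10% false positives.
p = posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.10)
print(round(p, 4))  # 0.0876
```

Even with a sensitive test, the posterior stays below 9% here because the event is rare, which is the usual motivation for teaching this theorem alongside conditional probability.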
Text Books And Reference Books: [1]. Rohatgi V.K and Saleh E, An Introduction to Probability and Statistics, 3rd edition, John Wiley & Sons Inc., New Jersey, 2015. [2]. Gupta S.C and Kapoor V.K, Fundamentals of Mathematical Statistics, 11th edition, Sultan Chand & Sons, New Delhi, 2014.  
Essential Reading / Recommended Reading [1]. Mukhopadhyay P, Mathematical Statistics, Books and Allied (P) Ltd, Kolkata, 2015. [2]. Walpole R.E, Myers R.H, and Myers S.L, Probability and Statistics for Engineers and Scientists, Pearson, New Delhi, 2017. [3]. Montgomery D.C and Runger G.C, Applied Statistics and Probability for Engineers, Wiley India, New Delhi, 2013. [4]. Mood A.M, Graybill F.A and Boes D.C, Introduction to the Theory of Statistics, McGraw Hill, New Delhi, 2008.  
Evaluation Pattern CIA  50% ESE  50%  
MDS161B  INTRODUCTION TO COMPUTERS AND PROGRAMMING (2020 Batch)  
Total Teaching Hours for Semester:30 
No of Lecture Hours/Week:2 
Max Marks:50 
Credits:2 
Course Objectives/Course Description 

To enable the students to understand the fundamental concepts of problem solving and programming structures. 

Learning Outcome 

CO1: Demonstrate the systematic approach for problem solving using computers.
CO2: Apply different programming structures with suitable logic for computational problems.
Unit1 
Teaching Hours:10 
COMPUTERS AND DIGITAL BASICS


Number Representation – Decimal, Binary, Octal, Hexadecimal and BCD numbers – Binary Arithmetic – Binary addition – Unsigned and Signed numbers – one’s and two’s complements of Binary numbers – Arithmetic operations with signed numbers – Number system conversions – Boolean Algebra – Logic gates – Design of Circuits – K-Map
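The number-system conversions and two's-complement representation in this unit can be checked with a small sketch. The helper names and the 8-bit width are assumptions chosen for illustration.

```python
def twos_complement(value, bits=8):
    """Return the two's-complement bit string of a signed integer."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError("value does not fit in the given width")
    # Masking with 2**bits - 1 maps negative values onto their unsigned pattern.
    return format(value & ((1 << bits) - 1), f"0{bits}b")

def from_twos_complement(bit_string):
    """Decode a two's-complement bit string back to a signed integer."""
    raw = int(bit_string, 2)
    return raw - (1 << len(bit_string)) if bit_string[0] == "1" else raw

n = 45
print(bin(n), oct(n), hex(n))   # 0b101101 0o55 0x2d
print(twos_complement(-45))     # 11010011
```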
Unit2 
Teaching Hours:5 
GENERAL PROBLEM SOLVING CONCEPT


Types of Problems – Problem solving with Computers – Difficulties with problem solving – problem solving concepts for the Computer – Constants and Variables – Rules for Naming and using variables – Data types – numeric data – character data – logical data – rules for data types – examples of data types – storing the data in computer – Functions – Operators – Expressions and Equations
Unit3 
Teaching Hours:5 
PLANNING FOR SOLUTION


Communicating with computer – organizing the solution – Analyzing the problem – developing the interactivity chart – developing the IPO chart – Writing the algorithms – drawing the flow charts – pseudocode – internal and external documentation – testing the solution – coding the solution – software development life cycle.  
Unit4 
Teaching Hours:10 
PROBLEM SOLVING


Introduction to programming structure – pointers for structuring a solution – modules and their functions – cohesion and coupling – problem solving with logic structure. Problem solving with decisions – the decision logic structure – straight through logic – positive logic – negative logic – logic conversion – decision tables – case logic structure – examples.
Text Books And Reference Books: [1] Thomas L. Floyd and R. P. Jain, “Digital Fundamentals”, 8th Edition, Pearson Education, 2007. [2] Peter Norton, “Introduction to Computers”, 6th Edition, Tata McGraw Hill, New Delhi, 2006.
[3] Maureen Sprankle and Jim Hubbard, Problem Solving and Programming Concepts, PHI, 9th Edition, 2012.
Essential Reading / Recommended Reading [1]. E Balagurusamy, Fundamentals of Computers, TMH, 2011
 
Evaluation Pattern CIA: 50% ESE: 50%  
MDS161BL  INTRODUCTION TO COMPUTERS AND PROGRAMMING (2020 Batch)  
Total Teaching Hours for Semester:30 
No of Lecture Hours/Week:2 
Max Marks:50 
Credits:2 
Course Objectives/Course Description 

To provide a foundation for the fundamental concepts of problem solving and programming. The course includes the fundamentals of programming, different types of problem-solving concepts and programming structures to build logic for suitable computational problems.

Learning Outcome 

CO1: Demonstrate the systematic approach for problem solving using computers.
CO2: Apply different programming structures with suitable logic for computational problems.
Unit1 
Teaching Hours:10 

COMPUTER AND DIGITAL BASICS


Number Representations - Hexadecimal, octal, binary, decimal - BCD Numbers - Binary Arithmetic - Binary Addition - Unsigned and Signed Numbers - one's and two's complements of Binary Numbers - Arithmetic operations with signed numbers - Number System conversions - Boolean Algebra - Logic Gates - Design of Circuits - K-Map
Unit2 
Teaching Hours:5 

GENERAL PROBLEM-SOLVING CONCEPTS


Types of Problems – Problem solving with Computers – Difficulties with problem solving – problem solving concepts for the Computer – Constants and Variables – Rules for Naming and using variables – Data types – numeric data – character data – logical data – rules for data types – examples of data types – storing the data in computer – Functions – Operators – Expressions and Equations
Unit3 
Teaching Hours:5 

PLANNING FOR SOLUTION


Communicating with computer – organizing the solution – Analyzing the problem – developing the interactivity chart – developing the IPO chart – Writing the algorithms – drawing the flow charts – pseudocode – internal and external documentation – testing the solution – coding the solution – software development life cycle.
Unit4 
Teaching Hours:10 

PROBLEM SOLVING


Introduction to programming structure – pointers for structuring a solution – modules and their functions – cohesion and coupling – problem solving with logic structure. Problem solving with decisions – the decision logic structure – straight through logic – positive logic – negative logic – logic conversion – decision tables – case logic structure – examples
Text Books And Reference Books:
1.Maureen Sprankle and Jim Hubbard, Problem solving and programming concepts, PHI, 9th Edition, 2012  
Essential Reading / Recommended Reading
 
Evaluation Pattern
 
MDS161C  LINUX ADMINISTRATION (2020 Batch)  
Total Teaching Hours for Semester:30 
No of Lecture Hours/Week:2 

Max Marks:50 
Credits:2 

Course Objectives/Course Description 

To enable the students to excel on the Linux platform.

Learning Outcome 

CO1: Demonstrate a systematic approach to configuring the Linux environment
CO2: Manage the Linux environment to work with open source data science tools
Unit1 
Teaching Hours:10 
Module1


RHEL 7.5, breaking the root password - Understand and use essential tools for handling files, directories, command-line environments, and documentation - Configure local storage using partitions and logical volumes
Unit2 
Teaching Hours:10 
Module2


Swapping, Extend LVM Partitions, LVM Snapshot - Manage users and groups, including use of a centralized directory for authentication
Unit3 
Teaching Hours:10 
Module3


Kernel updates, yum and nmcli configuration, Scheduling jobs with at and crontab - Configure firewall settings using firewall-config, firewall-cmd, or iptables, Configure key-based authentication for SSH, Set enforcing and permissive modes for SELinux, List and identify SELinux file and process context, Restore default file contexts
Text Books And Reference Books: 1. https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/
Essential Reading / Recommended Reading   
Evaluation Pattern CIA:50% ESE:50%  
MDS171  DATA BASE TECHNOLOGIES (2020 Batch)  
Total Teaching Hours for Semester:90 
No of Lecture Hours/Week:6 
Max Marks:150 
Credits:5 
Course Objectives/Course Description 

The main objective of this course is to provide fundamental knowledge of, and practical experience with, database concepts. It covers the concepts and terminology that facilitate constructing database tables and writing effective queries. It also introduces the data warehouse and its functions.

Learning Outcome 

CO1: Design conceptual models of a database using ER modeling
CO2: Create and populate an RDBMS for a real-life application, with constraints and keys, using SQL
CO3: Retrieve any type of information from a database by formulating complex queries in SQL
CO4: Demonstrate various databases
CO5: Distinguish database from data warehouse and examine ETL process
Unit1 
Teaching Hours:16 
INTRODUCTION


Concept and Overview of DBMS, Data Models, Database Languages, Database Administrator, Database Users, Three Schema architecture of DBMS. Basic concepts, Design Issues, Mapping Constraints, Keys, Entity-Relationship Diagram, Weak Entity Sets, Extended ER features. Lab Exercises: 1. Data Definition 2. Table Creation 3. Specification of Constraints
Unit2 
Teaching Hours:16 
RELATIONAL MODEL AND DATABASE DESIGN


SQL and Integrity Constraints, Concept of DDL, DML, DCL. Basic Structure, Set operations, Aggregate Functions, Null Values, Domain Constraints, Referential Integrity Constraints, assertions, views, Nested Subqueries, Functional Dependency, Different anomalies in designing a Database, Normalization: using functional dependencies, Boyce-Codd Normal Form, 4NF, 5NF. Lab Exercises: 1. Insert, Select, Update & Delete Commands 2. Nested Queries & Join Queries 3. Views
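Several topics of this unit (DDL, DML, constraints, referential integrity, a view, a join and a nested subquery) fit into one short sketch. The syllabus does not name a DBMS, so the use of SQLite through Python's `sqlite3` module is an assumption, and the `dept`/`emp` schema and its rows are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
con.executescript("""
    -- DDL with domain, key and referential integrity constraints
    CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE emp (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        salary REAL CHECK (salary > 0),
        dept_id INTEGER REFERENCES dept(id)
    );
    -- DML
    INSERT INTO dept VALUES (1, 'Analytics'), (2, 'Research');
    INSERT INTO emp VALUES (1, 'Asha', 50000, 1), (2, 'Ravi', 65000, 1),
                           (3, 'Meena', 70000, 2);
    -- View built from a join and an aggregate function
    CREATE VIEW dept_avg AS
        SELECT d.name AS dept, AVG(e.salary) AS avg_salary
        FROM emp e JOIN dept d ON d.id = e.dept_id
        GROUP BY d.name;
""")

# Nested subquery: employees paid above the overall average salary.
above_avg = [row[0] for row in con.execute(
    "SELECT name FROM emp "
    "WHERE salary > (SELECT AVG(salary) FROM emp) ORDER BY name")]

# Referential integrity: a row pointing at a missing department is rejected.
try:
    con.execute("INSERT INTO emp VALUES (4, 'Kiran', 40000, 99)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

print(above_avg, fk_rejected)  # ['Meena', 'Ravi'] True
```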
Unit3 
Teaching Hours:10 
INTELLIGENT DATABASES


Active databases, Deductive Databases, Knowledge bases, Multimedia Databases, Multidimensional Data Structures, Image Databases, Text/Document Databases, Video Databases, Audio Databases, Multimedia Database Design.
 
Unit4 
Teaching Hours:16 
DATA WAREHOUSE: THE BUILDING BLOCKS


Defining Features, Data Warehouses and Data Marts, Architectural Types, Overview of the Components, Metadata in the Data Warehouse, Data Design and Data Preparation: Principles of Dimensional Modeling, Dimensional Modeling Advanced Topics, From Requirements to Data Design, The Star Schema, Star Schema Keys, Advantages of the Star Schema, Star Schema: Examples, Dimensional Modeling: Advanced Topics, Updates to the Dimension Tables, Miscellaneous Dimensions, The Snowflake Schema, Aggregate Fact Tables, Families of Stars.
Unit5 
Teaching Hours:16 
REQUIREMENTS, REALITIES, ARCHITECTURE AND DATA FLOW


Requirements, ETL Data Structures, Extracting, Cleaning and Conforming, Delivering Dimension Tables, Delivering Fact Tables (CH:1,2,3,4,5,6) Lab Exercises: 1. Importing source data structures 2. Design Target Data Structures 3. Create target structure 4. Design and build the ETL mapping  
Unit6 
Teaching Hours:16 
IMPLEMENTATION, OPERATIONS AND ETL SYSTEMS


Development, Operations, Metadata, Real-Time ETL Systems. (CH:7,8,9,11) Lab Exercises: 1. Perform the ETL process and transform into data map 2. Create the cube and process it 3. Generating Reports 4. Creating the Pivot table and pivot chart using some existing data
Text Books And Reference Books: [1]. Henry F. Korth and Abraham Silberschatz, “Database System Concepts”, McGraw-Hill. [2]. Thomas Connolly and Carolyn Begg, “Database Systems, A Practical Approach to Design, Implementation and Management”, Third Edition, Pearson Education, 2007. [3]. The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling, 2nd ed., John Wiley & Sons, Inc. New York, USA, 2002
Essential Reading / Recommended Reading [1] Lior Rokach and Oded Maimon, Data Mining and Knowledge Discovery Handbook, Springer, 2nd edition, 2010.
Evaluation Pattern CIA: 50% ESE: 50%  
MDS171L  DATABASE TECHNOLOGY (2020 Batch)  
Total Teaching Hours for Semester:90 
No of Lecture Hours/Week:6 
Max Marks:150 
Credits:5 
Course Objectives/Course Description 

The main objective of this course is to provide fundamental knowledge of, and practical experience with, database concepts. It covers the concepts and terminology that facilitate constructing database tables and writing effective queries. It also introduces the data warehouse and its functions.

Learning Outcome 

CO1: Design conceptual models of a database using ER modelling
CO2: Create and populate an RDBMS for a real-life application, with constraints and keys, using SQL
CO3: Retrieve any type of information from a database by formulating complex queries in SQL
CO4: Demonstrate various databases
CO5: Distinguish database from data warehouse and examine ETL process
Unit1 
Teaching Hours:16 
Introduction


Concept & Overview of DBMS, Data Models, Database Languages, Database Administrator, Database Users, Three Schema architecture of DBMS. Basic concepts, Design Issues, Mapping Constraints, Keys, Entity-Relationship Diagram, Weak Entity Sets, Extended ER features
Unit1 
Teaching Hours:16 
Lab Exercises


1. Data Definition 2. Table Creation 3. Specification of Constraints
Unit2 
Teaching Hours:16 
RELATIONAL MODEL AND DATABASE DESIGN


SQL and Integrity Constraints, Concept of DDL, DML, DCL. Basic Structure, Set operations, Aggregate Functions, Null Values, Domain Constraints, Referential Integrity Constraints, assertions, views, Nested Subqueries, Functional Dependency, Different anomalies in designing a Database, Normalization: using functional dependencies, Boyce-Codd Normal Form, 4NF, 5NF
Unit2 
Teaching Hours:16 
Lab Exercises


1. Insert, Select, Update & Delete Commands 2. Nested Queries & Join Queries 3. Views  
Unit3 
Teaching Hours:10 
INTELLIGENT DATABASES


Active databases, Deductive Databases, Knowledge bases, Multimedia Databases, Multidimensional Data Structures, Image Databases, Text/Document Databases, Video Databases, Audio Databases, Multimedia Database Design.  
Unit4 
Teaching Hours:16 
DATA WAREHOUSE: THE BUILDING BLOCKS


Defining Features, Data Warehouses and Data Marts, Architectural Types, Overview of the Components, Metadata in the Data Warehouse, Data Design and Data Preparation: Principles of Dimensional Modeling, Dimensional Modeling Advanced Topics, From Requirements to Data Design, The Star Schema, Star Schema Keys, Advantages of the Star Schema, Star Schema: Examples, Dimensional Modeling: Advanced Topics, Updates to the Dimension Tables, Miscellaneous Dimensions, The Snowflake Schema, Aggregate Fact Tables, Families of Stars
Unit5 
Teaching Hours:16 
REQUIREMENTS, REALITIES, ARCHITECTURE AND DATA FLOW


Requirements, ETL Data Structures, Extracting, Cleaning and Conforming, Delivering Dimension Tables, Delivering Fact Tables (CH:1,2,3,4,5,6)
Lab Exercises: 1. Importing source data structures 2. Design Target Data Structures 3. Create a target structure 4. Design and build the ETL mapping  
Unit6 
Teaching Hours:16 
IMPLEMENTATION, OPERATIONS AND ETL SYSTEMS:


Development, Operations, Metadata, Real-Time ETL Systems. (CH:7,8,9,11) Lab Exercises: 1. Perform the ETL process and transform into data map 2. Create the cube and process it 3. Generating Reports 4. Creating the Pivot table and pivot chart using some existing data
Text Books And Reference Books: [1]. Henry F. Korth and Abraham Silberschatz, “Database System Concepts”, McGraw-Hill. [2]. Thomas Connolly and Carolyn Begg, “Database Systems, A Practical Approach to Design, Implementation and Management”, Third Edition, Pearson Education, 2007. [3]. The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling, 2nd ed., John Wiley & Sons, Inc. New York, USA, 2002
Essential Reading / Recommended Reading [1] Lior Rokach and Oded Maimon, Data Mining and Knowledge Discovery Handbook, Springer, 2nd edition, 2010.
Evaluation Pattern CIA  50% ESE  50%  
MDS172  INFERENTIAL STATISTICS (2020 Batch)  
Total Teaching Hours for Semester:90 
No of Lecture Hours/Week:6 
Max Marks:150 
Credits:5 
Course Objectives/Course Description 

This course is designed to introduce the concepts of theory of estimation and testing of hypothesis. This paper also deals with the concept of parametric tests for large and small samples. It also provides knowledge about nonparametric tests and its applications. 

Learning Outcome 

CO1: Demonstrate the concepts of point and interval estimation of unknown parameters and their significance using large and small samples.
CO2: Apply the idea of sampling distributions of different statistics in testing of hypotheses.
CO3: Infer the concept of nonparametric tests for single sample and two samples.
Unit1 
Teaching Hours:15 
SUFFICIENT STATISTICS


Neyman-Fisher Factorisation theorem - the existence and construction of minimal sufficient statistics - Minimal sufficient statistics and exponential family - sufficiency and completeness - sufficiency and invariance. Lab Exercise
 
Unit2 
Teaching Hours:15 
UNBIASED ESTIMATION


Minimum variance unbiased estimation - locally minimum variance unbiased estimators - Rao-Blackwell theorem - Completeness: Lehmann-Scheffé theorems - Necessary and sufficient condition for unbiased estimators - Cramér-Rao lower bound - Bhattacharyya system of lower bounds in the 1-parameter regular case - Chapman-Robbins inequality. Lab Exercise
 
Unit3 
Teaching Hours:15 
MAXIMUM LIKELIHOOD ESTIMATION


Computational routines - strong consistency of maximum likelihood estimators - Asymptotic Efficiency of maximum likelihood estimators - Best Asymptotically Normal estimators - Method of moments - Bayes’ and minimax estimation: The structure of Bayes’ rules - Bayes’ estimators for quadratic and convex loss functions - minimax estimation - interval estimation. Lab Exercise
 
Unit4 
Teaching Hours:15 
HYPOTHESIS TESTING


Uniformly most powerful tests - the Neyman-Pearson fundamental Lemma - Distributions with monotone likelihood ratio - Problems - Generalization of the fundamental lemma, two-sided hypotheses - testing the mean and variance of a normal distribution. Lab Exercise
 
Unit5 
Teaching Hours:15 
MEAN TESTS


Unbiasedness for hypotheses testing - similarity and completeness - UMP unbiased tests for multi-parameter exponential families - comparing two Poisson or Binomial populations - testing the parameters of a normal distribution (unbiased tests) - comparing the mean and variance of two normal distributions - Symmetry and invariance - maximal invariance - most powerful invariant tests. Lab Exercise
 
Unit6 
Teaching Hours:15 
SEQUENTIAL TESTS


SPRT procedures - likelihood ratio tests - locally most powerful tests - the concept of confidence sets - non-parametric tests. Lab Exercise
 
Text Books And Reference Books: [1]. Rajagopalan M and Dhanavanthan P, Statistical Inference, PHI Learning (P) Ltd, New Delhi, 2012. [2]. An Introduction to Probability and Statistics, V.K Rohatgi and Saleh, 3rd Edition, 2015.
Essential Reading / Recommended Reading [1]. Introduction to the Theory of Statistics, A.M Mood, F.A Graybill and D.C Boes, Tata McGraw-Hill, 3rd Edition (Reprint), 2017. [2]. Linear Statistical Inference and its Applications, Rao C.R, Wiley Publications, 2nd Edition, 2001.
Evaluation Pattern CIA  50% ESE  50%  
MDS172L  INFERENTIAL STATISTICAL LABORATORY (2020 Batch)  
Total Teaching Hours for Semester:90 
No of Lecture Hours/Week:6 
Max Marks:150 
Credits:5 
Course Objectives/Course Description 

This course is designed to introduce the concepts of theory of estimation and testing of hypothesis. This paper also deals with the concept of parametric tests for large and small samples. It also provides knowledge about nonparametric tests and its applications 

Learning Outcome 

CO1: Demonstrate the concepts of point and interval estimation of unknown parameters and their significance using large and small samples.
CO2: Apply the idea of sampling distributions of different statistics in testing of hypotheses.
CO3: Infer the concept of nonparametric tests for single sample and two samples.
Unit1 
Teaching Hours:15 
SUFFICIENT STATISTICS


Neyman-Fisher Factorisation theorem - the existence and construction of minimal sufficient statistics - Minimal sufficient statistics and exponential family - sufficiency and completeness - sufficiency and invariance. Lab Exercise: 1. Drawing random samples using random number tables. 2. Point estimation of parameters and obtaining estimates of standard errors.
 
Unit2 
Teaching Hours:15 
UNBIASED ESTIMATION


Minimum variance unbiased estimation - locally minimum variance unbiased estimators - Rao-Blackwell theorem - Completeness: Lehmann-Scheffé theorems - Necessary and sufficient condition for unbiased estimators - Cramér-Rao lower bound - Bhattacharyya system of lower bounds in the 1-parameter regular case - Chapman-Robbins inequality. Lab Exercise: 1. Comparison of estimators by plotting mean square error. 2. Computing maximum likelihood estimates - 1 3. Computing maximum likelihood estimates - 2 4. Computing moment estimates
Unit3 
Teaching Hours:15 
MAXIMUM LIKELIHOOD ESTIMATION


Computational routines - strong consistency of maximum likelihood estimators - Asymptotic Efficiency of maximum likelihood estimators - Best Asymptotically Normal estimators - Method of moments - Bayes’ and minimax estimation: The structure of Bayes’ rules - Bayes’ estimators for quadratic and convex loss functions - minimax estimation - interval estimation. Lab Exercise: 1. Constructing confidence intervals based on large samples. 2. Constructing confidence intervals based on small samples. 3. Generating random samples from discrete distributions. 4. Generating random samples from continuous distributions.
Unit4 
Teaching Hours:15 
HYPOTHESIS TESTING


Uniformly most powerful tests - the Neyman-Pearson fundamental Lemma - Distributions with monotone likelihood ratio - Problems - Generalization of the fundamental lemma, two-sided hypotheses - testing the mean and variance of a normal distribution. Lab Exercise: 1. Evaluation of probabilities of Type I and Type II errors and powers of tests. 2. MP test for parameters of binomial and Poisson distributions. 3. MP test for the mean of a normal distribution and power curve. 4. Tests for mean, equality of means when variance is (i) known, (ii) unknown under normality (small and large samples)
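The first lab exercise of this unit, evaluating error probabilities and power, can be sketched for the one-sided z-test of a normal mean with known variance. The function name `z_test_power` and the numbers used are hypothetical; only the standard library's `statistics.NormalDist` is assumed.

```python
from statistics import NormalDist

def z_test_power(mu0, mu1, sigma, n, alpha=0.05):
    """Power of the one-sided z-test of H0: mu = mu0 against H1: mu = mu1 > mu0."""
    std_err = sigma / n ** 0.5
    # Reject H0 when the sample mean exceeds this critical value,
    # so that P(Type I error) equals alpha.
    critical = mu0 + NormalDist().inv_cdf(1 - alpha) * std_err
    beta = NormalDist(mu1, std_err).cdf(critical)   # P(Type II error)
    return 1 - beta

power = z_test_power(mu0=100, mu1=105, sigma=15, n=36)
print(round(power, 2))  # 0.64
```

Evaluating the function over a grid of `mu1` values gives the power curve mentioned in the exercise; at `mu1 = mu0` the power reduces to `alpha`.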
Unit5 
Teaching Hours:15 
MEAN TESTS


Unbiasedness for hypotheses testing - similarity and completeness - UMP unbiased tests for multi-parameter exponential families - comparing two Poisson or Binomial populations - testing the parameters of a normal distribution (unbiased tests) - comparing the mean and variance of two normal distributions - Symmetry and invariance - maximal invariance - most powerful invariant tests. Lab Exercise: 1. Tests for single proportion and equality of two proportions. 2. Tests for variance and equality of two variances under normality. 3. Tests for correlation and regression coefficients.
Unit6 
Teaching Hours:15 
SEQUENTIAL TESTS


SPRT procedures - likelihood ratio tests - locally most powerful tests - the concept of confidence sets - non-parametric tests. Lab Exercise: 1. Tests for the independence of attributes, analysis of categorical data and tests for the goodness of fit (for uniform, binomial and Poisson distributions). 2. Nonparametric tests. 3. SPRT for binomial proportion and mean of a normal distribution.
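For the SPRT exercise on a binomial proportion, a minimal sketch of Wald's sequential test on 0/1 observations is given here. It uses Wald's approximate stopping bounds; the function name `sprt_bernoulli` and the parameter values are hypothetical.

```python
from math import log

def sprt_bernoulli(observations, p0, p1, alpha=0.05, beta=0.05):
    """Wald's SPRT for H0: p = p0 against H1: p = p1 on 0/1 data.

    Returns (decision, number of observations used)."""
    upper = log((1 - beta) / alpha)   # crossing above favours H1
    lower = log(beta / (1 - alpha))   # crossing below favours H0
    llr = 0.0                         # cumulative log-likelihood ratio
    for k, x in enumerate(observations, start=1):
        llr += log(p1 / p0) if x else log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "accept H1", k
        if llr <= lower:
            return "accept H0", k
    return "continue sampling", len(observations)

print(sprt_bernoulli([1] * 10, p0=0.3, p1=0.7))  # ('accept H1', 4)
```

Unlike a fixed-sample test, the procedure stops as soon as the evidence crosses either bound, so the sample size is itself random.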
Text Books And Reference Books: [1]. Rajagopalan M and Dhanavanthan P, Statistical Inference, PHI Learning (P) Ltd, New Delhi, 2012. [2]. An Introduction to Probability and Statistics, V.K Rohatgi and Saleh, 3rd Edition, 2015.
Essential Reading / Recommended Reading [1]. Introduction to the Theory of Statistics, A.M Mood, F.A Graybill and D.C Boes, Tata McGraw-Hill, 3rd Edition (Reprint), 2017. [2]. Linear Statistical Inference and its Applications, Rao C.R, Wiley Publications, 2nd Edition, 2001.
Evaluation Pattern CIA  50% ESE  50%  
MDS173  PROGRAMMING FOR DATA SCIENCE IN PYTHON (2020 Batch)  
Total Teaching Hours for Semester:90 
No of Lecture Hours/Week:6 
Max Marks:100 
Credits:4 
Course Objectives/Course Description 

The objective of this course is to provide comprehensive knowledge of python programming paradigms required for Data Science. 

Learning Outcome 

CO1: Demonstrate the use of built-in objects of Python
CO2: Demonstrate significant experience with the Python program development environment
CO3: Implement numerical programming, data handling and visualization through NumPy, Pandas and Matplotlib modules.
Unit1 
Teaching Hours:17 
INTRODUCTION TO PYTHON


Structure of Python Program - Underlying mechanism of Module Execution - Branching and Looping - Problem Solving Using Branches and Loops - Functions - Lists and Mutability - Problem Solving Using Lists and Functions
Lab Exercises: 1. Demonstrate usage of branching and looping statements 2. Demonstrate recursive functions 3. Demonstrate lists
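The interaction between lists, mutability and functions is easiest to see in a small example. The sketch below is illustrative only; the function names are hypothetical.

```python
def append_in_place(items, value):
    """Mutates the caller's list: the function receives a reference to the same object."""
    items.append(value)

def appended_copy(items, value):
    """Leaves the caller's list untouched by building a new list."""
    return items + [value]

def factorial(n):
    """Recursive function with an explicit base case."""
    return 1 if n <= 1 else n * factorial(n - 1)

original = [1, 2, 3]
alias = original                    # a second name for the same list object
append_in_place(alias, 4)           # the change is visible through both names
copy = appended_copy(original, 5)   # original is not modified here

print(original)      # [1, 2, 3, 4]
print(copy)          # [1, 2, 3, 4, 5]
print(factorial(5))  # 120
```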
Unit2 
Teaching Hours:17 
SEQUENCE DATATYPES AND OBJECT-ORIENTED PROGRAMMING


Sequences, Mapping and Sets - Dictionaries - Classes: Classes and Instances - Inheritance - Exception Handling - Introduction to Regular Expressions using “re” module. Lab Exercises: 1. Demonstrate Tuples and Sets 2. Demonstrate Dictionaries 3. Demonstrate inheritance and exception handling 4. Demonstrate use of “re”
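Inheritance, exception handling and the `re` module can be combined in one short sketch. The class name `ValidationError`, the function `parse_course_code` and the simplified course-code pattern are assumptions made for illustration.

```python
import re

class ValidationError(ValueError):
    """Custom exception derived from a built-in one (inheritance)."""

# Simplified, assumed pattern for codes such as "MDS173" or "MDS161A".
COURSE_CODE = re.compile(r"^(?P<prefix>[A-Z]{3})(?P<number>\d{3})(?P<suffix>[A-Z]*)$")

def parse_course_code(text):
    """Return the named groups of a course code as a dictionary."""
    match = COURSE_CODE.match(text)
    if match is None:
        raise ValidationError(f"not a course code: {text!r}")
    return match.groupdict()

try:
    parse_course_code("python-101")
except ValidationError as exc:      # also catchable as ValueError
    message = str(exc)

parsed = parse_course_code("MDS161A")
print(parsed)  # {'prefix': 'MDS', 'number': '161', 'suffix': 'A'}
```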
Unit3 
Teaching Hours:13 
USING NUMPY


Basics of NumPy - Computation on NumPy - Aggregations - Computation on Arrays - Comparisons, Masks and Boolean Arrays - Fancy Indexing - Sorting Arrays - Structured Data: NumPy’s Structured Array. Lab Exercises: 1. Demonstrate Aggregation 2. Demonstrate Indexing and Sorting
Unit4 
Teaching Hours:13 
DATA MANIPULATION WITH PANDAS I


Introduction to Pandas Objects - Data Indexing and Selection - Operating on Data in Pandas - Handling Missing Data - Hierarchical Indexing - Combining Data Sets. Lab Exercises: 1. Demonstrate handling of missing data 2. Demonstrate hierarchical indexing
Unit5 
Teaching Hours:17 
DATA MANIPULATION WITH PANDAS II


Aggregation and Grouping - Pivot Tables - Vectorized String Operations - Working with Time Series - High Performance Pandas: eval() and query(). Lab Exercises: 1. Demonstrate usage of Pivot table 2. Demonstrate use of eval() and query()
Unit6 
Teaching Hours:13 
VISUALIZATION AND MATPLOTLIB


Basic functions of matplotlib - Simple Line Plot, Scatter Plot - Density and Contour Plots - Histograms, Binnings and Density - Customizing Plot Legends, Colour Bars - Three-Dimensional Plotting in Matplotlib. Lab Exercises: 1. Demonstrate Scatter Plot 2. Demonstrate 3D plotting
Text Books And Reference Books: [1]. Jake VanderPlas, Python Data Science Handbook - Essential Tools for Working with Data, O’Reilly Media, Inc, 2016 [2]. Zhang.Y, An Introduction to Python and Computer Programming, Springer Publications, 2016
Essential Reading / Recommended Reading [1]. Joel Grus, Data Science from Scratch - First Principles with Python, O’Reilly Media, 2016 [2]. T.R.Padmanabhan, Programming with Python, Springer Publications, 2016
Evaluation Pattern CIA 50% ESE 50%
MDS173L  PROGRAMMING FOR DATA SCIENCE IN PYTHON (2020 Batch)
Total Teaching Hours for Semester:90 
No of Lecture Hours/Week:6 
Max Marks:100 
Credits:4 
Course Objectives/Course Description 

The objective of this course is to provide knowledge of python programming paradigms required for Data Science. 

Learning Outcome 

CO1: Understand and demonstrate the usage of built-in objects in Python
CO2: Analyze the significance of the Python program development environment and apply it to solve real-world applications
CO3: Implement numerical programming, data handling and visualization through NumPy, Pandas and Matplotlib modules.
Unit1 
Teaching Hours:17 

INTRODUCTION TO PYTHON


Structure of Python Program - Underlying mechanism of Module Execution - Branching and Looping - Problem Solving Using Branches and Loops - Functions - Lists and Mutability - Problem Solving Using Lists and Functions
Unit2 
Teaching Hours:17 

SEQUENCE DATATYPES AND OBJECT-ORIENTED PROGRAMMING


Sequences, Mapping and Sets - Dictionaries - Classes: Classes and Instances - Inheritance - Exception Handling - Introduction to Regular Expressions using “re” module.
Unit3 
Teaching Hours:13 

USING NUMPY


Basics of NumPy - Computation on NumPy - Aggregations - Computation on Arrays - Comparisons, Masks and Boolean Arrays - Fancy Indexing - Sorting Arrays - Structured Data: NumPy’s Structured Array.
Unit4 
Teaching Hours:13 

DATA MANIPULATION WITH PANDAS I


Introduction to Pandas Objects - Data Indexing and Selection - Operating on Data in Pandas - Handling Missing Data - Hierarchical Indexing - Combining Data Sets
Unit5 
Teaching Hours:17 

DATA MANIPULATION WITH PANDAS II


Aggregation and Grouping - Pivot Tables - Vectorized String Operations - Working with Time Series - High Performance Pandas: eval() and query()
Unit6 
Teaching Hours:13 

VISUALIZATION AND MATPLOTLIB


Basic functions of matplotlib - Simple Line Plot, Scatter Plot - Density and Contour Plots - Histograms, Binnings and Density - Customizing Plot Legends, Colour Bars - Three-Dimensional Plotting in Matplotlib
Text Books And Reference Books:
1. Jake VanderPlas, Python Data Science Handbook - Essential Tools for Working with Data, O’Reilly Media, Inc, 2016 2. Zhang.Y, An Introduction to Python and Computer Programming, Springer Publications, 2016
Essential Reading / Recommended Reading
 
Evaluation Pattern
 
MDS231  MATHEMATICAL FOUNDATION FOR DATA SCIENCE  II (2020 Batch)  
Total Teaching Hours for Semester:60 
No of Lecture Hours/Week:4 

Max Marks:100 
Credits:04 

Course Objectives/Course Description 

This course aims at introducing data science related essential mathematics concepts such as fundamentals of topics on Calculus of several variables, Orthogonality, Convex optimization and Graph Theory. 

Learning Outcome 

CO1: Demonstrate the properties of multivariate calculus
CO2: Use the idea of orthogonality and projections effectively
CO3: Have a clear understanding of Convex Optimization
CO4: Know the basic terminologies and properties in Graph Theory
Unit1 
Teaching Hours:18 
Calculus of Several Variables


Functions of Several Variables: Functions of two, three variables - Limits and continuity in Higher Dimensions: Limits for functions of two variables, Functions of more than two variables - Partial Derivatives: partial derivative of functions of two variables, partial derivatives of functions of more than two variables, partial derivatives and continuity, second order partial derivatives - The Chain Rule: chain rule on functions of two, three variables, chain rule on functions defined on surfaces - Directional Derivative and Gradient vectors: Directional derivatives in a plane, Interpretation of directional derivative, calculation and gradients, Gradients and tangents to level curves.
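The relation between the gradient and the directional derivative, D_u f = ∇f · u for a unit vector u, can be checked numerically. The sketch below uses central differences; the example function and the helper names are hypothetical.

```python
from math import sqrt

def gradient(f, point, h=1e-6):
    """Central-difference approximation of the gradient of f at point."""
    grad = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += h
        backward[i] -= h
        grad.append((f(*forward) - f(*backward)) / (2 * h))
    return grad

def directional_derivative(f, point, direction):
    """D_u f = grad f . u, where u is the unit vector along direction."""
    norm = sqrt(sum(d * d for d in direction))
    return sum(g * d / norm for g, d in zip(gradient(f, point), direction))

def f(x, y):
    return x ** 2 + 3 * x * y   # exact gradient: (2x + 3y, 3x)

grad = gradient(f, (1.0, 2.0))                            # close to (8, 3)
rate = directional_derivative(f, (1.0, 2.0), (3.0, 4.0))  # close to 8*0.6 + 3*0.8 = 7.2
```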
Unit2 
Teaching Hours:10 
Orthogonality


Perpendicular vectors and Orthogonality. Inner Products and Projections onto lines. Projections of Rank one. Projections and Least Squares Approximations. Projection Matrices. Orthogonal Bases, Orthogonal Matrices and Gram-Schmidt orthogonalization.
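As an illustration of the last topic in this unit, the following is a minimal sketch of Gram-Schmidt orthogonalization in plain Python. It is not taken from the prescribed textbook: the function name `gram_schmidt` and the tolerance `tol` are assumptions made for this example, and the loop uses the "modified" variant, which subtracts each projection from the running remainder.

```python
def gram_schmidt(vectors, tol=1e-12):
    """Return an orthonormal list of vectors spanning the same space as `vectors`.

    Illustrative sketch: `tol` (hypothetical parameter) is the norm below which
    a remainder is treated as zero, i.e. the input vector was linearly dependent.
    """
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            # Projection coefficient of the current remainder onto the unit vector q.
            coeff = sum(wi * qi for wi, qi in zip(w, q))
            # Remove the component along q.
            w = [wi - coeff * qi for wi, qi in zip(w, q)]
        norm = sum(wi * wi for wi in w) ** 0.5
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis


if __name__ == "__main__":
    for q in gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]):
        print(q)
```

Each returned vector has unit length and is perpendicular to the others; linearly dependent inputs are silently dropped, which is a design choice of this sketch.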
Unit3 
Teaching Hours:12 
Introduction to Convex Optimization


Affine and Convex Sets: Lines and Line segments, affine sets, affine dimension and relative interior, convex sets, cones. Hyperplanes and halfspaces. Euclidean balls and ellipsoids. Norm balls and Norm cones. Polyhedra. Simplexes, Convex hull description of polyhedra. The positive semidefinite cone.
 
Unit4 
Teaching Hours:20 
Basic Graph Theory


Graph Classes: Definition of a Graph and Graph terminology, isomorphism of graphs, Complete graphs, bipartite graphs, complete bipartite graphs. Vertex degree: adjacency and incidence, regular graphs. Subgraphs, spanning subgraphs, induced subgraphs, removing or adding edges of a graph, removing vertices from graphs. Graph Operations: Graph Union, intersection, complement, self complement, Paths and Cycles, Connected graphs, Matrix Representation of Graphs, Adjacency matrices, Incidence Matrices, Trees and their properties, Bridges (cut edges), spanning trees, weighted Graphs, minimal spanning tree problems, Shortest path problems, cut vertices, cuts, vertex and edge connectivity, Eulerian and Hamiltonian Graphs.
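Several of the topics in this unit (adjacency matrices, vertex degree, connected graphs, Eulerian graphs) can be tied together in a short sketch. The helper names below are hypothetical, and the code assumes a simple undirected graph with at least one vertex, vertices labelled 0 to n-1, and no loops. It relies on Euler's theorem: a connected graph is Eulerian exactly when every vertex has even degree.

```python
def adjacency_matrix(n, edges):
    """Build the n x n adjacency matrix of an undirected graph from an edge list."""
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        a[u][v] += 1
        a[v][u] += 1
    return a


def degrees(a):
    """Vertex degrees are the row sums of the adjacency matrix (no loops assumed)."""
    return [sum(row) for row in a]


def is_connected(a):
    """Depth-first search from vertex 0; connected if every vertex is reached."""
    n = len(a)
    seen = {0}
    stack = [0]
    while stack:
        u = stack.pop()
        for v in range(n):
            if a[u][v] and v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == n


def is_eulerian(a):
    """Connected and every degree even (Euler's theorem)."""
    return is_connected(a) and all(d % 2 == 0 for d in degrees(a))


if __name__ == "__main__":
    cycle = adjacency_matrix(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
    print(degrees(cycle), is_eulerian(cycle))
```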
 
Text Books And Reference Books: 1. M. D. Weir, J. Hass, and G. B. Thomas, Thomas' Calculus. Pearson, 2016. (Unit 1) 2. G. Strang, Linear Algebra and its Applications, 4th ed., Cengage, 2006. (Unit 2) 3. S. P. Boyd and L. Vandenberghe, Convex Optimization. Cambridge Univ. Pr., 2011. (Unit 3) 4. J. Clark, D. A. Holton, A First Look at Graph Theory, Allied Publishers India, 1995. (Unit 4)
 
Essential Reading / Recommended Reading 1. J. Patterson and A. Gibson, Deep learning: a practitioner's approach. O'Reilly Media, 2017. 2. S. Sra, S. Nowozin, and S. J. Wright, Optimization for machine learning. MIT Press, 2012. 3. D. Jungnickel, Graphs, networks and algorithms. Springer, 2014. 4. D. Simovici, Mathematical Analysis for Machine Learning and Data Mining, World Scientific Publishing Co. Pte. Ltd, 2018. 5. P. N. Klein, Coding the matrix: linear algebra through applications to computer science. Newtonian Press, 2015. 6. K. H. Rosen, Discrete Mathematics and its applications, 7th ed., McGraw Hill, 2016.
Evaluation Pattern CIA:50% ESE :50%  
MDS231L  MATHEMATICAL FOUNDATION FOR DATA SCIENCE  II (2020 Batch)  
Total Teaching Hours for Semester:60 
No of Lecture Hours/Week:4 
Max Marks:100 
Credits:4 
Course Objectives/Course Description 

This course introduces essential mathematical concepts for data science: the fundamentals of calculus of several variables, orthogonality, convex optimization and graph theory.

Learning Outcome 


Unit1 
Teaching Hours:18 

Calculus of Several Variables


Functions of Several Variables: Functions of two, three variables. Limits and continuity in Higher Dimensions: Limits for functions of two variables, Functions of more than two variables. Partial Derivatives: partial derivative of functions of two variables, partial derivatives of functions of more than two variables, partial derivatives and continuity, second order partial derivatives. The Chain Rule: chain rule on functions of two, three variables, chain rule on functions defined on surfaces. Directional Derivative and Gradient vectors: Directional derivatives in a plane, Interpretation of directional derivative, calculation and gradients, Gradients and tangents to level curves.
Unit2 
Teaching Hours:10 

Orthogonality


Perpendicular vectors and Orthogonality. Inner Products and Projections onto lines. Projections of Rank one. Projections and Least Squares Approximations. Projection Matrices. Orthogonal Bases, Orthogonal Matrices and Gram-Schmidt orthogonalization.
Unit3 
Teaching Hours:12 

Introduction to Convex Optimization


 
Unit4 
Teaching Hours:20 

Basic Graph Theory


Graph Classes: Definition of a Graph and Graph terminology, isomorphism of graphs, Complete graphs, bipartite graphs, complete bipartite graphs. Vertex degree: adjacency and incidence, regular graphs. Subgraphs, spanning subgraphs, induced subgraphs, removing or adding edges of a graph, removing vertices from graphs. Graph Operations: Graph Union, intersection, complement, self complement, Paths and Cycles, Connected graphs, Matrix Representation of Graphs, Adjacency matrices, Incidence Matrices, Trees and their properties, Bridges (cut edges), spanning trees, weighted Graphs, minimal spanning tree problems, Shortest path problems, cut vertices, cuts, vertex and edge connectivity, Eulerian and Hamiltonian Graphs.
Text Books And Reference Books: 1. M. D. Weir, J. Hass, and G. B. Thomas, Thomas' Calculus. Pearson, 2016. 2. G. Strang, Linear Algebra and its Applications, 4th ed., Cengage, 2006. 3. S. P. Boyd and L. Vandenberghe, Convex Optimization. Cambridge Univ. Pr., 2011. 4. J. Clark, D. A. Holton, A First Look at Graph Theory, Allied Publishers India, 1995.
Essential Reading / Recommended Reading 1. J. Patterson and A. Gibson, Deep learning: a practitioner's approach. O'Reilly Media, 2017. 2. S. Sra, S. Nowozin, and S. J. Wright, Optimization for machine learning. MIT Press, 2012. 3. D. Jungnickel, Graphs, networks and algorithms. Springer, 2014. 4. D. Simovici, Mathematical Analysis for Machine Learning and Data Mining, World Scientific Publishing Co. Pte. Ltd, 2018. 5. P. N. Klein, Coding the matrix: linear algebra through applications to computer science. Newtonian Press, 2015. 6. K. H. Rosen, Discrete Mathematics and its applications, 7th ed., McGraw Hill, 2016.
Evaluation Pattern CIA I : 10% CIA II : 25% CIA III : 10% ATTENDANCE : 5% ESE : 50%
 
MDS232  REGRESSION ANALYSIS (2020 Batch)  
Total Teaching Hours for Semester:60 
No of Lecture Hours/Week:4 

Max Marks:100 
Credits:4 

Course Objectives/Course Description 

This course provides a grounding in building simple and multiple regression models.

Learning Outcome 

CO1: Demonstrate deeper understanding of the linear regression model. CO2: Evaluate R-square criteria for model selection CO3: Understand the forward, backward and stepwise methods for selecting the variables CO4: Understand the importance of multicollinearity in regression modelling CO5: Ability to use and understand generalizations of the linear model to binary and count data
Unit1 
Teaching Hours:15 
SIMPLE LINEAR REGRESSION


Introduction to regression analysis: Modelling a response, overview and applications of regression analysis, major steps in regression analysis. Simple linear regression (Two variables): assumptions, estimation and properties of regression coefficients, significance and confidence intervals of regression coefficients, measuring the quality of the fit.  
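The estimation of the regression coefficients and the quality-of-fit measure named in this unit can be sketched with the standard closed-form least-squares formulas, slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x). The function name and the toy data below are assumptions for illustration only; the sketch also assumes the x values are not all equal and the y values are not all equal, so neither denominator is zero.

```python
def fit_simple_linear_regression(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope, r2)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Coefficient of determination: 1 - residual SS / total SS.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    return intercept, slope, r2


if __name__ == "__main__":
    # Hypothetical data lying exactly on the line y = 1 + 2x.
    print(fit_simple_linear_regression([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]))
```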
Unit2 
Teaching Hours:15 
MULTIPLE LINEAR REGRESSION


Multiple linear regression model: assumptions, ordinary least square estimation of regression coefficients, interpretation and properties of regression coefficient, significance and confidence intervals of regression coefficients.  
Unit3 
Teaching Hours:10 
CRITERIA FOR MODEL SELECTION


Mean Square error criteria, R² and criteria for model selection; Need of the transformation of variables; Box-Cox transformation; Forward, Backward and Stepwise procedures.
Unit4 
Teaching Hours:10 
RESIDUAL ANALYSIS


Residual analysis, Departures from underlying assumptions, Effect of outliers, Collinearity, Nonconstant variance and serial correlation, Departures from normality, Diagnostics and remedies.  
Unit5 
Teaching Hours:10 
NON LINEAR REGRESSION


Introduction to nonlinear regression, Least squares in the nonlinear case and estimation of parameters, Models for binary response variables, estimation and diagnosis methods for logistic and Poisson regressions. Prediction and residual analysis.  
Text Books And Reference Books: [1]. D. C. Montgomery, E. A. Peck and G. G. Vining, Introduction to Linear Regression Analysis, John Wiley and Sons, Inc., NY, 2003. [2]. S. Chatterjee and A. Hadi, Regression Analysis by Example, 4th Ed., John Wiley and Sons, Inc., 2006. [3]. Seber, G.A.F. and Lee, A.J. (2003) Linear Regression Analysis, John Wiley, Relevant sections from chapters 3, 4, 5, 6, 7, 9, 10.
Essential Reading / Recommended Reading [1]. Iain Pardoe, Applied Regression Modeling, John Wiley and Sons, Inc, 2012. [2]. P. McCullagh, J.A. Nelder, Generalized Linear Models, Chapman & Hall, 1989.  
Evaluation Pattern CIA  50% ESE  50%  
MDS232L  REGRESSION ANALYSIS (2020 Batch)  
Total Teaching Hours for Semester:60 
No of Lecture Hours/Week:4 
Max Marks:100 
Credits:4 
Course Objectives/Course Description 

This course provides a grounding in building simple and multiple regression models. The Course enables Students to


Learning Outcome 

CO1: Demonstrate deeper understanding of the linear regression model. CO2: Evaluate R-square criteria for model selection CO3: Understand the forward, backward and stepwise methods for selecting the variables CO4: Understand the importance of multicollinearity in regression modelling
CO5: Ability to use and understand generalizations of the linear model to binary and count data 
Unit1 
Teaching Hours:15 
SIMPLE LINEAR REGRESSION


Introduction to regression analysis: Modelling a response, overview and applications of regression analysis, major steps in regression analysis. Simple linear regression (Two variables): assumptions, estimation and properties of regression coefficients, significance and confidence intervals of regression coefficients, measuring the quality of the fit.  
Unit2 
Teaching Hours:15 
MULTIPLE LINEAR REGRESSION


Unit3 
Teaching Hours:10 
CRITERIA FOR MODEL SELECTION


Mean Square error criteria, R² and criteria for model selection; Need of the transformation of variables; Box-Cox transformation; Forward, Backward and Stepwise procedures.
Unit4 
Teaching Hours:10 
RESIDUAL ANALYSIS


Residual analysis, Departures from underlying assumptions, Effect of outliers, Collinearity, Nonconstant variance and serial correlation, Departures from normality, Diagnostics and remedies.  
Unit5 
Teaching Hours:10 
NON LINEAR REGRESSION


Introduction to nonlinear regression, Least squares in the nonlinear case and estimation of parameters, Models for binary response variables, estimation and diagnosis methods for logistic and Poisson regressions. Prediction and residual analysis.  
Text Books And Reference Books:
1. G. S. Maddala, Introduction to Econometrics, Wiley. 2. C. Brooks, Introductory Econometrics for Finance, 4th Ed., Cambridge University Press, 2019. 3. G. G. Vining, Introduction to Linear Regression Analysis, John Wiley and Sons, Inc., NY, 2003.
 
Essential Reading / Recommended Reading
1. J. M. Wooldridge, Introductory Econometrics: A Modern Approach, 5th Ed., South-Western, Cengage Learning, 2013. 2. G. G. Vining, Introduction to Linear Regression Analysis, John Wiley and Sons, Inc., NY, 2003. 3. S. Chatterjee and A. Hadi, Regression Analysis by Example, 4th Ed., John Wiley and Sons, Inc., 2006. 4. Iain Pardoe, Applied Regression Modeling, John Wiley and Sons, Inc., 2012.
Evaluation Pattern CIA I: 10% CIA II: 25% CIA III: 10% Attendance: 5% ESE: 50%  
MDS241A  MULTIVARIATE ANALYSIS (2020 Batch)  
Total Teaching Hours for Semester:60 
No of Lecture Hours/Week:4 
Max Marks:100 
Credits:4 
Course Objectives/Course Description 

This course lays the foundation of Multivariate data analysis. The exposure provided to multivariate data structure, multinomial and multivariate normal distribution, estimation and testing of parameters, various data reduction methods would help the students in having a better understanding of research data, its presentation and analysis. 

Learning Outcome 

CO1: Understand multivariate data structure, multinomial and multivariate normal distribution CO2: Apply Multivariate analysis of variance (MANOVA) of one and two-way classified data.
Unit1 
Teaching Hours:12 

INTRODUCTION


Basic concepts of multivariate variables. Multivariate normal distribution, Marginal and conditional distribution, Concept of random vector: its expectation and Variance-Covariance matrix. Marginal and joint distributions. Conditional distributions and Independence of random vectors. Multinomial distribution. Sample mean vector and its distribution.
Unit2 
Teaching Hours:12 

DISTRIBUTION


Sample mean vector and its distribution. Likelihood ratio tests: Tests of hypotheses about the mean vectors and covariance matrices for multivariate normal populations. Independence of sub vectors and sphericity test.  
Unit3 
Teaching Hours:12 

MULTIVARIATE ANALYSIS


Multivariate analysis of variance (MANOVA) of one and two-way classified data. Multivariate analysis of covariance. Wishart distribution, Hotelling’s T² and Mahalanobis’ D² statistics, Null distribution of Hotelling’s T². Rao’s U statistics and its distribution.
Unit4 
Teaching Hours:12 

CLASSIFICATION AND DISCRIMINANT PROCEDURES


Bayes, minimax, and Fisher’s criteria for discrimination between two multivariate normal populations. Sample discriminant function. Tests associated with discriminant functions. Probabilities of misclassification and their estimation. Discrimination for several multivariate normal populations  
Unit5 
Teaching Hours:12 

PRINCIPAL COMPONENT and FACTOR ANALYSIS


Principal components, sample principal components and their asymptotic properties. Canonical variables and canonical correlations: definition, estimation, computations. Test for significance of canonical correlations. Factor analysis: Orthogonal factor model, factor loadings, estimation of factor loadings, factor scores. Applications.
Text Books And Reference Books: [1]. Anderson, T.W. 2009. An Introduction to Multivariate Statistical Analysis, 3rd Edition, John Wiley. [2]. Everitt B, Hothorn T, 2011. An Introduction to Applied Multivariate Analysis with R, Springer. [3]. Barry J. Babin, Hair, Rolph E. Anderson, and William C. Black, 2013, Multivariate Data Analysis, Pearson New International Edition.
Essential Reading / Recommended Reading [1] Giri, N.C. 1977. Multivariate Statistical Inference. Academic Press. [2] Chatfield, C. and Collins, A.J. 1982. Introduction to Multivariate analysis. Prentice Hall [3] Srivastava, M.S. and Khatri, C.G. 1979. An Introduction to Multivariate Statistics. North Holland  
Evaluation Pattern CIA  50% ESE  50%  
MDS241AL  MULTIVARIATE ANALYSIS (2020 Batch)  
Total Teaching Hours for Semester:60 
No of Lecture Hours/Week:4 

Max Marks:100 
Credits:4 

Course Objectives/Course Description 

This course lays the foundation of Multivariate data analysis. The exposure provided to multivariate data structure, multinomial and multivariate normal distribution, estimation and testing of parameters, various data reduction methods would help the students in having a better understanding of research data, its presentation and analysis. 

Learning Outcome 

CO1: Understand multivariate data structure, multinomial and multivariate normal distribution CO2: Apply Multivariate analysis of variance (MANOVA) of one and two-way classified data.
Unit1 
Teaching Hours:12 

Introduction


Basic concepts of multivariate variables. Multivariate normal distribution, Marginal and conditional distribution, Concept of random vector: its expectation and Variance-Covariance matrix. Marginal and joint distributions. Conditional distributions and Independence of random vectors. Multinomial distribution. Sample mean vector and its distribution.
Unit2 
Teaching Hours:12 

DISTRIBUTION


Sample mean vector and its distribution. Likelihood ratio tests: Tests of hypotheses about the mean vectors and covariance matrices for multivariate normal populations. Independence of sub vectors and sphericity test  
Unit3 
Teaching Hours:12 

Multivariate Analysis


Multivariate analysis of variance (MANOVA) of one and two-way classified data. Multivariate analysis of covariance. Wishart distribution, Hotelling’s T² and Mahalanobis’ D² statistics, Null distribution of Hotelling’s T². Rao’s U statistics and its distribution.
Unit4 
Teaching Hours:12 

Classification and Discriminant Procedures


Bayes, minimax, and Fisher’s criteria for discrimination between two multivariate normal populations. Sample discriminant function. Tests associated with discriminant functions. Probabilities of misclassification and their estimation. Discrimination for several multivariate normal populations  
Unit5 
Teaching Hours:12 

Principal Component and Factor Analysis


Principal components, sample principal components and their asymptotic properties. Canonical variables and canonical correlations: definition, estimation, computations. Test for significance of canonical correlations. Factor analysis: Orthogonal factor model, factor loadings, estimation of factor loadings, factor scores. Applications.
Text Books And Reference Books: [1]. Anderson, T.W. 2009. An Introduction to Multivariate Statistical Analysis, 3rd Edition, John Wiley. [2]. Everitt B, Hothorn T, 2011. An Introduction to Applied Multivariate Analysis with R, Springer. [3]. Barry J. Babin, Hair, Rolph E. Anderson, and William C. Black, 2013, Multivariate Data Analysis, Pearson New International Edition.
Essential Reading / Recommended Reading
 
Evaluation Pattern CIA 50% ESE 50%  
MDS241B  STOCHASTIC PROCESS (2020 Batch)  
Total Teaching Hours for Semester:60 
No of Lecture Hours/Week:4 

Max Marks:100 
Credits:4 

Course Objectives/Course Description 

This course is designed to introduce the concepts of theory of estimation and testing of hypothesis. This paper also deals with the concept of parametric tests for large and small samples. It also provides knowledge about nonparametric tests and their applications.

Learning Outcome 

CO1: Demonstrate the concepts of point and interval estimation of unknown parameters and their significance using large and small samples. CO2: Apply the idea of sampling distributions of difference statistics in testing of hypotheses. CO3: Infer the concept of nonparametric tests for single sample and two samples. 
Unit1 
Teaching Hours:12 
INTRODUCTION TO STOCHASTIC PROCESSES


Classification of Stochastic Processes, Markov Processes – Markov Chain – Countable State Markov Chain. Transition Probabilities, Transition Probability Matrix. Chapman-Kolmogorov Equations, Calculation of n-step Transition Probability and its limit.
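For a time-homogeneous Markov chain with a finite state space, the Chapman-Kolmogorov equations imply that the n-step transition matrix is the n-th power of the one-step matrix P. The sketch below illustrates this with plain Python lists; the function names and the two-state matrix are hypothetical choices for this example, not material from the prescribed texts.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def n_step_transition(p, n):
    """Return P^n, the n-step transition matrix, by repeated multiplication."""
    size = len(p)
    # Start from the identity matrix, which is the 0-step transition matrix.
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, p)
    return result


if __name__ == "__main__":
    # Hypothetical two-state chain.
    p = [[0.9, 0.1], [0.5, 0.5]]
    print(n_step_transition(p, 2))   # two-step probabilities
    print(n_step_transition(p, 50))  # rows approach the limiting distribution
```

For this particular chain the rows of P^n approach the stationary distribution (5/6, 1/6), which can be checked by solving pi = pi P by hand.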
Unit2 
Teaching Hours:12 
POISSON PROCESS


Classification of States, Recurrent and Transient States – Transient Markov Chain, Random Walk and Gambler's Ruin Problem. Continuous-Time Markov Process: Poisson Processes, Birth and Death Processes, Kolmogorov’s Differential Equations, Applications.
Unit3 
Teaching Hours:12 
BRANCHING PROCESS


Branching Processes – Galton-Watson Branching Process – Properties of Generating Functions – Extinction Probabilities – Distribution of Total Number of Progeny. Concept of Wiener Process.
Unit4 
Teaching Hours:12 
RENEWAL PROCESS


Renewal Processes – Renewal Process in Discrete and Continuous Time – Renewal Interval – Renewal Function and Renewal Density – Renewal Equation – Renewal theorems: Elementary Renewal Theorem. Probability Generating Function of Renewal Processes.  
Unit5 
Teaching Hours:12 
STATIONARY PROCESS


Stationary Processes: Discrete Parameter Stochastic Process – Application to Time Series. Autocovariance and Autocorrelation functions and their properties. Moving Average, Autoregressive, Autoregressive Moving Average, Autoregressive Integrated Moving Average Processes. Basic ideas of residual analysis, diagnostic checking, forecasting.  
Text Books And Reference Books: [1]. Stochastic Processes, R.G Gallager, Cambridge University Press, 2013. [2]. Stochastic Processes, S.M Ross, Wiley India Pvt. Ltd, 2008.  
Essential Reading / Recommended Reading [1]. Stochastic Processes from Applications to Theory, P.D Moral and S. Penev, CRC Press, 2016. [2]. Introduction to Probability and Stochastic Processes with Applications, B. C. Liliana, A. Viswanathan, S. Dharmaraja, Wiley Pvt. Ltd, 2012.
Evaluation Pattern CIA  50% ESE  50%  
MDS241BL  STOCHASTIC PROCESS (2020 Batch)  
Total Teaching Hours for Semester:60 
No of Lecture Hours/Week:4 
Max Marks:100 
Credits:4 
Course Objectives/Course Description 

This course is designed to introduce the concepts of theory of estimation and testing of hypothesis. This paper also deals with the concept of parametric tests for large and small samples. It also provides knowledge about nonparametric tests and their applications.

Learning Outcome 

CO1: Demonstrate the concepts of point and interval estimation of unknown parameters and their significance using large and small samples. CO2: Apply the idea of sampling distributions of the difference statistics in the testing of hypotheses. CO3: Infer the concept of nonparametric tests for single sample and two samples. 
Unit1 
Teaching Hours:12 

INTRODUCTION TO STOCHASTIC PROCESSES


Classification of Stochastic Processes, Markov Processes – Markov Chain – Countable State Markov Chain. Transition Probabilities, Transition Probability Matrix. Chapman-Kolmogorov Equations, Calculation of n-step Transition Probability and its limit.
Unit2 
Teaching Hours:12 

POISSON PROCESS


Classification of States, Recurrent and Transient States – Transient Markov Chain, Random Walk and Gambler's Ruin Problem. Continuous-Time Markov Process: Poisson Processes, Birth and Death Processes, Kolmogorov’s Differential Equations, Applications.
Unit3 
Teaching Hours:12 

BRANCHING PROCESS


Branching Processes – Galton-Watson Branching Process – Properties of Generating Functions – Extinction Probabilities – Distribution of Total Number of Progeny. Concept of Wiener Process.
Unit4 
Teaching Hours:12 

RENEWAL PROCESS


Renewal Processes – Renewal Process in Discrete and Continuous Time – Renewal Interval – Renewal Function and Renewal Density – Renewal Equation – Renewal theorems: Elementary Renewal Theorem. Probability Generating Function of Renewal Processes.  
Unit5 
Teaching Hours:12 

STATIONARY PROCESS


Stationary Processes: Discrete Parameter Stochastic Process – Application to Time Series. Autocovariance and Autocorrelation functions and their properties. Moving Average, Autoregressive, Autoregressive Moving Average, Autoregressive Integrated Moving Average Processes. Basic ideas of residual analysis, diagnostic checking, forecasting.  
Text Books And Reference Books: [1]. Stochastic Processes, R.G Gallager, Cambridge University Press, 2013. [2]. Stochastic Processes, S.M Ross, Wiley India Pvt. Ltd, 2008.  
Essential Reading / Recommended Reading
 
Evaluation Pattern
 
MDS271  MACHINE LEARNING (2020 Batch)  
Total Teaching Hours for Semester:90 
No of Lecture Hours/Week:6 

Max Marks:150 
Credits:5 

Course Objectives/Course Description 

The objective of this course is to provide an introduction to the principles and design of machine learning algorithms. The course is aimed at providing foundations for conceptual aspects of machine learning algorithms along with their applications to solve real world problems.

Learning Outcome 

CO1: Understand the basic principles of machine learning techniques. CO2: Understand how machine learning problems are formulated and solved. CO3: Apply machine learning algorithms to solve real world problems.
Unit1 
Teaching Hours:18 
INTRODUCTION


Machine Learning, Examples of Machine Learning Applications: Learning Associations, Classification, Regression, Unsupervised Learning, Reinforcement Learning. Supervised Learning: Learning a class from examples, Probably Approximately Correct (PAC) Learning, Noise, Learning Multiple classes, Regression, Model Selection and Generalization. Introduction to Parametric methods, Maximum Likelihood Estimation: Bernoulli Density, Multinomial Density, Gaussian Density. Nonparametric Density Estimation: Histogram Estimator, Kernel Estimator, K-Nearest Neighbour Estimator.
Lab Exercise: 1. Data Exploration using parametric Methods 2. Data Exploration using nonparametric Methods 3. Regression analysis
Unit2 
Teaching Hours:18 
DIMENSIONALITY REDUCTION


Dimensionality Reduction: Introduction, Subset Selection, Principal Component Analysis, Feature Embedding, Factor Analysis, Singular Value Decomposition, Multidimensional Scaling, Linear Discriminant Analysis, Bayesian Decision Theory. Lab Exercise: 1. Data reduction using Principal Component Analysis 2. Data reduction using multidimensional scaling
Unit3 
Teaching Hours:18 
SUPERVISED LEARNING  I


Linear Discrimination: Introduction, Generalizing the Linear Model, Geometry of the Linear Discriminant, Pairwise Separation, Gradient Descent, Logistic Discrimination. Kernel Machines: Introduction, optimal separating hyperplane, ν-SVM, kernel trick, vectorial kernels, defining kernels, multiclass kernel machines, one-class kernel machines. Lab Exercise: 1. Linear discrimination 2. Logistic discrimination 3. Classification using kernel machines
 
Unit4 
Teaching Hours:18 
SUPERVISED LEARNING  II


Multilayer perceptron: Introduction, training a perceptron, learning Boolean functions, multilayer perceptron, backpropagation algorithm, training procedures. Combining Multiple Learners: Rationale, Generating diverse learners, Model combination schemes, voting, Bagging, Boosting, fine tuning an Ensemble. Lab Exercise: 1. Classification using MLP 2. Ensemble Learning
 
Unit5 
Teaching Hours:18 
UNSUPERVISED LEARNING


Clustering: Introduction, Mixture Densities, K-Means Clustering, Expectation-Maximization algorithm, Mixtures of Latent Variable Models, Supervised Learning after Clustering, Spectral Clustering, Hierarchical Clustering, Choosing the number of Clusters. Lab Exercise: 1. K-means clustering 2. Hierarchical clustering
Text Books And Reference Books: [1]. E. Alpaydin, Introduction to Machine Learning, 3rd Edition, MIT Press, 2014.
Essential Reading / Recommended Reading 1. C. M. Bishop, Pattern Recognition and Machine Learning, Springer, 2016. 2. T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference and Prediction, Springer, 2nd Edition, 2009. 3. K. P. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012.
 
Evaluation Pattern CIA 50% ESE 50%
MDS271L  MACHINE LEARNING (2020 Batch)  
Total Teaching Hours for Semester:90 
No of Lecture Hours/Week:6 
Max Marks:150 
Credits:5 
Course Objectives/Course Description 

The objective of this course is to provide an introduction to the principles and design of machine learning algorithms. The course is aimed at providing foundations for conceptual aspects of machine learning algorithms along with their applications to solve real world problems.


Learning Outcome 

CO1: Understand the basic principles of machine learning techniques. CO2: Understand how machine learning problems are formulated and solved. CO3: Apply machine learning algorithms to solve real world problems.
Unit1 
Teaching Hours:18 
Introduction


Machine Learning, Examples of Machine Learning Applications: Learning Associations, Classification, Regression, Unsupervised Learning, Reinforcement Learning. Supervised Learning: Learning a class from examples, Probably Approximately Correct (PAC) Learning, Noise, Learning Multiple classes, Regression, Model Selection and Generalization. Introduction to Parametric methods, Maximum Likelihood Estimation: Bernoulli Density, Multinomial Density, Gaussian Density. Nonparametric Density Estimation: Histogram Estimator, Kernel Estimator, K-Nearest Neighbour Estimator.
Unit1 
Teaching Hours:18 
Lab Exercises:


 
Unit2 
Teaching Hours:18 
DIMENSIONALITY REDUCTION


Introduction, Subset Selection, Principal Component Analysis, Feature Embedding, Factor Analysis, Singular Value Decomposition, Multidimensional Scaling, Linear Discriminant Analysis, Bayesian Decision Theory.
Unit2 
Teaching Hours:18 
Lab Exercise:


 
Unit3 
Teaching Hours:18 
SUPERVISED LEARNING  I


Linear Discrimination: Introduction, Generalizing the Linear Model, Geometry of the Linear Discriminant, Pairwise Separation, Gradient Descent, Logistic Discrimination.
Unit3 
Teaching Hours:18 
Lab Exercises


 
Unit3 
Teaching Hours:18 
Kernel Machines


Introduction, optimal separating hyperplane, ν-SVM, kernel trick, vectorial kernels, defining kernels, multiclass kernel machines, one-class kernel machines.
Unit4 
Teaching Hours:18 
SUPERVISED LEARNING  II


Multilayer perceptron: Introduction, training a perceptron, learning Boolean functions, multilayer perceptron, backpropagation algorithm, training procedures. Combining Multiple Learners: Rationale, Generating diverse learners, Model combination schemes, voting, Bagging, Boosting, fine tuning an Ensemble.
Unit4 
Teaching Hours:18 
Lab Exercises


 
Unit5 
Teaching Hours:18 
Lab exercises


 
Unit5 
Teaching Hours:18 
UNSUPERVISED LEARNING


Clustering: Introduction, Mixture Densities, K-Means Clustering, Expectation-Maximization algorithm, Mixtures of Latent Variable Models, Supervised Learning after Clustering, Spectral Clustering, Hierarchical Clustering, Choosing the number of Clusters.
Text Books And Reference Books:
 
Essential Reading / Recommended Reading
 
Evaluation Pattern CIA 50% ESE 50%  
MDS272A  HADOOP (2020 Batch)  
Total Teaching Hours for Semester:90 
No of Lecture Hours/Week:6 
Max Marks:150 
Credits:5 
Course Objectives/Course Description 

This course introduces Big Data as it arises in real-time applications and shows how it is processed with emerging technologies. It breaks down the complexity of processing Big Data by taking a practical approach to developing Java applications on top of the Hadoop platform. It describes the Hadoop architecture and how to work with the Hadoop Distributed File System (HDFS) and HBase on the Ubuntu platform.

Learning Outcome 

CO1: Understand Big Data concepts in real-time scenarios CO2: Understand the big data systems and identify the main sources of Big Data in the real world. CO3: Demonstrate an ability to use Hadoop framework for processing Big Data for Analytics. CO4: Evaluate the MapReduce approach for different domain problems.

Unit1 
Teaching Hours:15 
INTRODUCTION


Distributed file system – Big Data and its importance, Four Vs, Drivers for Big data, Big data analytics, Big data applications, Algorithms using MapReduce, Matrix-Vector Multiplication by MapReduce. Apache Hadoop – Moving Data in and out of Hadoop – Understanding inputs and outputs of MapReduce – Data Serialization, Problems with traditional large-scale systems – Requirements for a new approach – Hadoop – Scaling – Distributed Framework – Hadoop v/s RDBMS – Brief history of Hadoop.
Lab Exercise
1. Installing and Configuring Hadoop  
Unit2 
Teaching Hours:15 
CONFIGURATIONS OF HADOOP


Hadoop Processes (NN, SNN, JT, DN, TT) – Temporary directory – UI – Common errors when running Hadoop cluster, solutions. Setting up Hadoop on a local Ubuntu host: Prerequisites, downloading Hadoop, setting up SSH, configuring the pseudo-distributed mode, HDFS directory, NameNode, Examples of MapReduce, Using Elastic MapReduce, Comparison of local versus EMR Hadoop. Understanding MapReduce: Key/value pairs, The Hadoop Java API for MapReduce, Writing MapReduce programs, Hadoop-specific data types, Input/output. Developing MapReduce Programs: Using languages other than Java with Hadoop, Analysing a large dataset. Lab Exercise: 1. Word count application in Hadoop. 2. Sorting the data using MapReduce. 3. Finding max and min value in Hadoop.
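The key/value model behind the word count exercise can be previewed without a cluster. The sketch below is a local, in-memory simulation of the map, shuffle and reduce phases in Python; it does not use the Hadoop Java API or any Hadoop library, and all function names are hypothetical. On a real cluster the framework performs the shuffle and distributes the work across nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every whitespace-separated word."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an iterable of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Hadoop data data"]))
```

The same mapper and reducer logic could be adapted to read from standard input and write to standard output when a language other than Java is used with Hadoop, though that wiring is not shown here.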
Unit3 
Teaching Hours:15 
ADVANCED MAPREDUCE TECHNIQUES


Simple, advanced, and in-between Joins, Graph algorithms, using language-independent data structures. Hadoop configuration properties – Setting up a cluster, Cluster access control, managing the NameNode, Managing HDFS, MapReduce management, Scaling. Lab Exercise: 1. Implementation of decision tree algorithms using MapReduce. 2. Implementation of K-means Clustering using MapReduce. 3. Generation of Frequent Itemset using MapReduce.
Unit4 
Teaching Hours:15 
HADOOP STREAMING


Hadoop Streaming – Streaming command options – Specifying a Java class as the mapper/reducer – Packaging files with job submissions – Specifying other plugins for jobs.
Lab Exercise
1. Count the number of missing and invalid values by joining two large given datasets.
2. Using Hadoop's MapReduce, evaluate the number of products sold in each country on an online shopping portal (dataset is given).
3. Analyze the sentiment of product reviews using the MapReduce technique provided by Apache Hadoop.
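Hadoop Streaming runs any executable as mapper or reducer: each reads lines from standard input and writes tab-separated key/value lines to standard output, and the reducer receives its input sorted by key. The sketch below shows a mapper and reducer in that style for the products-per-country exercise. The input layout `product,country,quantity` is an assumption, since the lab dataset is not described here, and the function names are hypothetical.

```python
import sys
from itertools import groupby

def map_lines(lines):
    # Assumed input layout: product,country,quantity (one sale per line).
    for line in lines:
        fields = line.strip().split(",")
        if len(fields) != 3:
            continue  # skip malformed lines
        _product, country, quantity = fields
        if quantity.isdigit():
            yield f"{country}\t{quantity}"

def reduce_lines(sorted_lines):
    # Input lines arrive sorted by key, so equal countries are adjacent.
    parsed = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for country, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(qty) for _, qty in group)
        yield f"{country}\t{total}"

if __name__ == "__main__":
    # Run as "script.py map" or "script.py reduce" with data on stdin.
    step = sys.argv[1] if len(sys.argv) > 1 else ""
    if step in ("map", "reduce"):
        func = map_lines if step == "map" else reduce_lines
        for out in func(sys.stdin):
            print(out)
```

On a cluster, such a script would be passed to the streaming job through the `-mapper` and `-reducer` options, alongside `-input` and `-output`. The location of the streaming jar depends on the installation.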
Unit5 
Teaching Hours:15 
HIVE & PIG


Architecture, Installation, Configuration, Hive vs RDBMS, Tables, DDL & DML, Partitioning & Bucketing, Hive Web Interface, Pig, Use case of Pig, Pig Components, Data Model, Pig Latin.
Lab Exercise
1. Trend analysis based on access patterns over web logs using Hadoop.
2. Service rating prediction by exploring social mobile users' geographical locations.
Unit6 
Teaching Hours:15 
HBASE


RDBMS vs NoSQL, HBasics, Installation, Building an online query application – Schema design, Loading data, Online queries, Successful service. Hands On: Single-node Hadoop cluster setup on any cloud service provider: how to create an instance, how to connect to that instance using PuTTY, installing the Hadoop framework on this instance, and running the sample programs that come with the Hadoop framework.
Lab Exercise
1. Big Data analytics framework based simulated performance and operational efficiencies through billions of patient records in a hospital system.
Text Books And Reference Books:
[1] Boris Lublinsky, Kevin T. Smith, Alexey Yakubovich, Professional Hadoop Solutions, Wiley, 2015.
[2] Tom White, Hadoop: The Definitive Guide, O’Reilly Media Inc., 2015.
[3] Garry Turkington, Hadoop Beginner's Guide, Packt Publishing, 2013.
Essential Reading / Recommended Reading
[1] Pethuru Raj, Anupama Raman, Dhivya Nagaraj and Siddhartha Duggirala, High-Performance Big-Data Analytics: Computing Systems and Approaches, Springer, 2015.
[2] Jonathan R. Owens, Jon Lentz and Brian Femiano, Hadoop Real-World Solutions Cookbook, Packt Publishing, 2013.
[3] Tom White, Hadoop: The Definitive Guide, O’Reilly, 2012.
Evaluation Pattern CIA  50% ESE  50%  
MDS272AL  HADOOP (2020 Batch)  
Total Teaching Hours for Semester:90 
No of Lecture Hours/Week:6 
Max Marks:150 
Credits:5 
Course Objectives/Course Description 

This course introduces Big Data as it arises in real-time applications and shows how it is manipulated using emerging technologies. It breaks down the walls of complexity in processing Big Data by providing a practical approach to developing Java applications on top of the Hadoop platform. It describes the Hadoop architecture and how to work with the Hadoop Distributed File System (HDFS) and HBase on the Ubuntu platform.

Learning Outcome 


Unit1 
Teaching Hours:15 

INTRODUCTION


 
Unit2 
Teaching Hours:15 

CONFIGURATIONS OF HADOOP


 
Unit3 
Teaching Hours:15 

ADVANCED MAPREDUCE TECHNIQUES


 
Unit4 
Teaching Hours:15 

HADOOP STREAMING


 
Unit5 
Teaching Hours:15 

HIVE & PIG


 
Unit6 
Teaching Hours:15 

HBASE


