# Syllabus for Master of Science (Data Science) Academic Year (2021)

1 Semester - 2021 - Batch

| Course Code | Course | Course Type | Hours Per Week | Credits | Marks |
|---|---|---|---|---|---|
| MDS131 | MATHEMATICAL FOUNDATION FOR DATA SCIENCE - I | Core Courses | 4 | 4 | 100 |
| MDS132 | PROBABILITY AND DISTRIBUTION THEORY | Core Courses | 4 | 4 | 100 |
| MDS133 | PRINCIPLES OF DATA SCIENCE | Core Courses | 4 | 4 | 100 |
| MDS134 | RESEARCH METHODOLOGY | Core Courses | 2 | 2 | 50 |
| MDS161A | INTRODUCTION TO STATISTICS | Generic Elective | 2 | 2 | 50 |
| MDS161B | INTRODUCTION TO COMPUTERS AND PROGRAMMING | Generic Elective | 2 | 2 | 50 |
| MDS161C | LINUX ADMINISTRATION | Generic Elective | 2 | 2 | 50 |
| MDS171 | DATA BASE TECHNOLOGIES | Core Courses | 6 | 5 | 150 |
| MDS172 | INFERENTIAL STATISTICS | Core Courses | 6 | 5 | 150 |
| MDS173 | PROGRAMMING FOR DATA SCIENCE IN PYTHON | Core Courses | 6 | 4 | 100 |

2 Semester - 2021 - Batch

| Course Code | Course | Course Type | Hours Per Week | Credits | Marks |
|---|---|---|---|---|---|
| MDS231 | MATHEMATICAL FOUNDATION FOR DATA SCIENCE - II | Core Courses | 4 | 4 | 100 |
| MDS232 | REGRESSION ANALYSIS | Core Courses | 4 | 4 | 100 |
| MDS241A | MULTIVARIATE ANALYSIS | Discipline Specific Elective | 4 | 4 | 100 |
| MDS241B | STOCHASTIC PROCESS | Discipline Specific Elective | 4 | 4 | 100 |
| MDS241C | CATEGORICAL DATA ANALYSIS | Discipline Specific Elective | 4 | 4 | 100 |
| MDS271 | MACHINE LEARNING | Core Courses | 6 | 5 | 150 |
| MDS272A | HADOOP | Discipline Specific Elective | 6 | 5 | 150 |
| MDS272B | IMAGE AND VIDEO ANALYTICS | Discipline Specific Elective | 6 | 5 | 150 |
| MDS272C | INTERNET OF THINGS | Discipline Specific Elective | 6 | 5 | 150 |
| MDS273 | PROGRAMMING FOR DATA SCIENCE IN R | Core Courses | 6 | 4 | 100 |

3 Semester - 2020 - Batch

| Course Code | Course | Course Type | Hours Per Week | Credits | Marks |
|---|---|---|---|---|---|
| MDS331 | NEURAL NETWORKS AND DEEP LEARNING | Core Courses | 4 | 4 | 100 |
| MDS341A | TIME SERIES ANALYSIS AND FORECASTING TECHNIQUES | Discipline Specific Elective | 4 | 4 | 100 |
| MDS341B | BAYESIAN INFERENCE | Discipline Specific Elective | 4 | 4 | 100 |
| MDS341C | ECONOMETRICS | Discipline Specific Elective | 4 | 4 | 100 |
| MDS341D | BIO-STATISTICS | Discipline Specific Elective | 4 | 4 | 100 |
| MDS371 | CLOUD ANALYTICS | Core Courses | 6 | 5 | 150 |
| MDS372A | NATURAL LANGUAGE PROCESSING | Discipline Specific Elective | 6 | 5 | 150 |
| MDS372B | WEB ANALYTICS | Discipline Specific Elective | 6 | 5 | 150 |
| MDS372C | BIO INFORMATICS | Discipline Specific Elective | 6 | 5 | 150 |
| MDS372D | EVOLUTIONARY ALGORITHMS | Discipline Specific Elective | 6 | 5 | 150 |
| MDS372E | OPTIMIZATION TECHNIQUE | Discipline Specific Elective | 6 | 5 | 150 |
| MDS381 | SPECIALIZATION PROJECT | Core Courses | 4 | 2 | 100 |
| MDS382 | SEMINAR | Skill Enhancement Course | 2 | 1 | 50 |

4 Semester - 2020 - Batch

| Course Code | Course | Course Type | Hours Per Week | Credits | Marks |
|---|---|---|---|---|---|
| MDS481 | INDUSTRY PROJECT | Core Courses | 2 | 12 | 300 |

MDS131 - MATHEMATICAL FOUNDATION FOR DATA SCIENCE - I (2021 Batch)

Total Teaching Hours for Semester:60
No of Lecture Hours/Week:4
Max Marks:100
Credits:4

Course Objectives/Course Description

Linear Algebra plays a fundamental role in the theory of Data Science. This course introduces the basic notions of vector spaces and Linear Algebra, and the use of Linear Algebra in applications to Data Science.

Course Outcome

CO1: Understand the properties of vector spaces.

CO2: Use the properties of linear maps in solving problems in Linear Algebra.

CO3: Demonstrate proficiency in eigenvalues, eigenvectors and inner product spaces.

CO4: Apply mathematics to selected applications in Data Science.
Unit-1
Teaching Hours:12
INTRODUCTION TO VECTOR SPACES

Vector Spaces: Rⁿ and Cⁿ, lists, Fⁿ and digression on fields, definition of vector spaces, subspaces, sums of subspaces, direct sums, span and linear independence, bases, dimension.

Unit-2
Teaching Hours:12
LINEAR MAPS

Definition of Linear Maps - Algebraic Operations on L(V,W) - Null Spaces and Injectivity - Range and Surjectivity - Fundamental Theorem of Linear Maps - Representing a Linear Map by a Matrix - Invertible Linear Maps - Isomorphic Vector Spaces - Linear Map as Matrix Multiplication - Operators - Products of Vector Spaces - Products and Direct Sums - Quotients of Vector Spaces.

Unit-3
Teaching Hours:12
EIGENVALUES, EIGENVECTORS, AND INNER PRODUCT SPACES

Eigenvalues and Eigenvectors - Eigenvectors and Upper-Triangular Matrices - Eigenspaces and Diagonal Matrices - Inner Products and Norms - Linear Functionals on Inner Product Spaces.

Unit-4
Teaching Hours:12
BASIC MATRIX METHODS FOR APPLICATIONS

Matrix Norms - Least Squares Problem - Singular Value Decomposition - Householder Transformation and QR Decomposition - Non-negative Matrix Factorization - Bidiagonalization.

Unit-5
Teaching Hours:12
MATHEMATICS APPLIED TO DATA SCIENCE

Handwritten digit recognition using a simple algorithm - Classification of handwritten digits using SVD bases and tangent distance - Text mining using Latent Semantic Indexing, clustering, Non-negative Matrix Factorization and LGK bidiagonalization.

Text Books And Reference Books:

1. S. Axler, Linear Algebra Done Right, Springer, 2017.

2. L. Eldén, Matrix Methods in Data Mining and Pattern Recognition, Society for Industrial and Applied Mathematics, 2007.

Essential Reading / Recommended Reading

1. E. Davis, Linear Algebra and Probability for Computer Science Applications, CRC Press, 2012.

2. J. V. Kepner and J. R. Gilbert, Graph Algorithms in the Language of Linear Algebra, Society for Industrial and Applied Mathematics, 2011.

3. D. A. Simovici, Linear Algebra Tools for Data Mining, World Scientific Publishing, 2012.

4. P. N. Klein, Coding the Matrix: Linear Algebra through Applications to Computer Science, Newtonian Press, 2015.

Evaluation Pattern

CIA : 50%

ESE : 50%

MDS131L - MATHEMATICAL FOUNDATION FOR DATA SCIENCE I (2021 Batch)

Total Teaching Hours for Semester:60
No of Lecture Hours/Week:4
Max Marks:100
Credits:4

Course Objectives/Course Description

Linear Algebra plays a fundamental role in the theory of Data Science. This course introduces the basic notions of vector spaces and Linear Algebra, and the use of Linear Algebra in applications to Data Science.

Course Outcome

CO1: Understand the properties of vector spaces.

CO2: Use the properties of linear maps in solving problems in Linear Algebra.

CO3: Demonstrate proficiency in eigenvalues, eigenvectors and inner product spaces.

CO4: Apply mathematics to selected applications in Data Science.
Unit-1
Teaching Hours:12
INTRODUCTION TO VECTOR SPACES

Vector Spaces: Rⁿ and Cⁿ, lists, Fⁿ and digression on fields, definition of vector spaces, subspaces, sums of subspaces, direct sums, span and linear independence, bases, dimension.

Unit-2
Teaching Hours:12
LINEAR MAPS

Definition of Linear Maps - Algebraic Operations on L(V,W) - Null Spaces and Injectivity - Range and Surjectivity - Fundamental Theorem of Linear Maps - Representing a Linear Map by a Matrix - Invertible Linear Maps - Isomorphic Vector Spaces - Linear Map as Matrix Multiplication - Operators - Products of Vector Spaces - Products and Direct Sums - Quotients of Vector Spaces.

Unit-3
Teaching Hours:12
EIGENVALUES, EIGENVECTORS, AND INNER PRODUCT SPACES

Eigenvalues and Eigenvectors - Eigenvectors and Upper-Triangular Matrices - Eigenspaces and Diagonal Matrices - Inner Products and Norms - Linear Functionals on Inner Product Spaces.

Unit-4
Teaching Hours:12
BASIC MATRIX METHODS FOR APPLICATIONS

Matrix Norms - Least Squares Problem - Singular Value Decomposition - Householder Transformation and QR Decomposition - Non-negative Matrix Factorization - Bidiagonalization.

Unit-5
Teaching Hours:12
MATHEMATICS APPLIED TO DATA SCIENCE

Handwritten digit recognition using a simple algorithm - Classification of handwritten digits using SVD bases and tangent distance - Text mining using Latent Semantic Indexing, clustering, Non-negative Matrix Factorization and LGK bidiagonalization.

Text Books And Reference Books:

1. S. Axler, Linear Algebra Done Right, Springer, 2017.

2. L. Eldén, Matrix Methods in Data Mining and Pattern Recognition, Society for Industrial and Applied Mathematics, 2007.

Essential Reading / Recommended Reading

1. E. Davis, Linear Algebra and Probability for Computer Science Applications, CRC Press, 2012.

2. J. V. Kepner and J. R. Gilbert, Graph Algorithms in the Language of Linear Algebra, Society for Industrial and Applied Mathematics, 2011.

3. D. A. Simovici, Linear Algebra Tools for Data Mining, World Scientific Publishing, 2012.

4. P. N. Klein, Coding the Matrix: Linear Algebra through Applications to Computer Science, Newtonian Press, 2015.

Evaluation Pattern

| CIA I | CIA II | CIA III | Attendance | ESE |
|---|---|---|---|---|
| 10% | 25% | 10% | 5% | 50% |

MDS132 - PROBABILITY AND DISTRIBUTION THEORY (2021 Batch)

Total Teaching Hours for Semester:60
No of Lecture Hours/Week:4
Max Marks:100
Credits:4

Course Objectives/Course Description

Probability and probability distributions play an essential role in modeling data from real-world phenomena. This course equips students with a thorough knowledge of probability and various probability distributions, and enables them to model real-life data sets with an appropriate probability distribution.

Course Outcome

CO1: Describe random events and the probability of events.

CO2: Identify various discrete and continuous distributions and their usage.

CO3: Evaluate conditional probabilities and conditional expectations.

CO4: Apply Chebyshev's inequality to verify the convergence of sequences in probability.
Unit-1
Teaching Hours:12
DESCRIPTIVE STATISTICS AND PROBABILITY

Data – types of variables: numeric vs categorical – measures of central tendency – measures of dispersion – random experiment – sample space and random events – probability – probability axioms – finite sample space with equally likely outcomes – conditional probability – independent events – Bayes' theorem.

Unit-2
Teaching Hours:12
PROBABILITY DISTRIBUTIONS FOR DISCRETE DATA

Random variable – data as observed values of a random variable – expectation – moments and moment generating function – mean and variance in terms of moments – discrete sample space and discrete random variable – Bernoulli experiment and binary variable: Bernoulli and binomial distributions – count data: Poisson distribution – overdispersion in count data: negative binomial distribution – dependent Bernoulli trials: hypergeometric distribution.

Unit-3
Teaching Hours:12
PROBABILITY DISTRIBUTIONS FOR CONTINUOUS DATA

Continuous sample space – interval data – continuous random variable – uniform distribution – normal distribution (Gaussian distribution) – modeling lifetime data: exponential distribution, gamma distribution, Weibull distribution.

Unit-4
Teaching Hours:12
JOINTLY DISTRIBUTED RANDOM VARIABLES

Joint distribution of vector random variables – joint moments – covariance – correlation – independent random variables – conditional distribution – conditional expectation – sampling distributions: chi-square, t, F (central).

Unit-5
Teaching Hours:12
LIMIT THEOREMS

Chebyshev's inequality – weak law of large numbers (iid): examples – strong law of large numbers (statement only) – central limit theorem (iid case): examples.

Text Books And Reference Books:

1. S. Ross, A First Course in Probability, 10th Edition, Pearson, 2019.

2. V. K. Rohatgi and A. K. Md. E. Saleh, An Introduction to Probability and Statistics, 3rd Edition, Wiley, 2015.

Essential Reading / Recommended Reading

1. A. M. Mood, F. A. Graybill and D. C. Boes, Introduction to the Theory of Statistics, 3rd Edition (Reprint), Tata McGraw-Hill, 2017.

2. S. M. Ross, Introduction to Probability Models, 12th Edition, Academic Press, 2019.

Evaluation Pattern

CIA : 50%

ESE : 50%

MDS132L - PROBABILITY AND DISTRIBUTION THEORY (2021 Batch)

Total Teaching Hours for Semester:60
No of Lecture Hours/Week:4
Max Marks:100
Credits:4

Course Objectives/Course Description

To enable the students to understand the properties and applications of various probability functions.

Course Outcome

CO1: Demonstrate an understanding of random variables and their functions.

CO2: Infer the expectations of functions of random variables and their generating functions.

CO3: Demonstrate various discrete and continuous distributions and their usage.
Unit-1
Teaching Hours:12
ALGEBRA OF PROBABILITY

Algebra of sets – fields and sigma-fields – inverse function – measurable function – probability measure on a sigma-field – simple properties – probability space – random variables and random vectors – induced probability space – distribution functions – decomposition of distribution functions.

Unit-2
Teaching Hours:12
EXPECTATION AND MOMENTS OF RANDOM VARIABLES

Definitions and simple properties – moment inequalities: Hölder and Jensen inequalities – characteristic function: definition and properties – inversion formula. Convergence of a sequence of random variables: convergence in distribution, convergence in probability, almost sure convergence and convergence in quadratic mean – weak and complete convergence of distribution functions – Helly-Bray theorem.

Unit-3
Teaching Hours:12
LAW OF LARGE NUMBERS

Khintchin's weak law of large numbers, Kolmogorov's strong law of large numbers (statement only) – Central Limit Theorem: Lindeberg-Lévy theorem, Lindeberg-Feller theorem (statement only), Lyapunov theorem – relation between the Lyapunov and Lindeberg-Feller forms – Radon-Nikodym theorem and derivative (without proof) – conditional expectation: definition and simple properties.

Unit-4
Teaching Hours:12
DISTRIBUTION THEORY

Distribution of functions of random variables – Laplace, Cauchy, inverse Gaussian, lognormal, logarithmic series and power series distributions – multinomial distribution – bivariate binomial – bivariate Poisson – bivariate normal – bivariate exponential of Marshall and Olkin – compound, truncated and mixture distributions, concept of convolution – multivariate normal distribution (definition and concept only) – sampling distributions: non-central chi-square, t and F distributions and their properties.
Unit-5
Teaching Hours:12
ORDER STATISTICS

Order statistics, their distributions and properties – joint and marginal distributions of order statistics – distribution of range and mid-range – extreme values and their asymptotic distributions (concepts only) – empirical distribution function and its properties – Kolmogorov-Smirnov distributions – lifetime distributions: exponential and Weibull distributions – Mills' ratio – distributions classified by hazard rate.

Text Books And Reference Books:

1. B. R. Bhat, Modern Probability Theory, 4th Edition, New Age International, 2014.

2. V. K. Rohatgi and A. K. Md. E. Saleh, An Introduction to Probability and Statistics, 3rd Edition, Wiley, 2015.

Essential Reading / Recommended Reading

1. A. M. Mood, F. A. Graybill and D. C. Boes, Introduction to the Theory of Statistics, 3rd Edition (Reprint), Tata McGraw-Hill, 2017.

2. H. A. David and H. N. Nagaraja, Order Statistics, 3rd Edition, John Wiley & Sons, 2003.

Evaluation Pattern

CIA : 50%

ESE : 50%

MDS133 - PRINCIPLES OF DATA SCIENCE (2021 Batch)

Total Teaching Hours for Semester:60
No of Lecture Hours/Week:4
Max Marks:100
Credits:4

Course Objectives/Course Description

To provide a strong foundation in data science and its application areas related to information technology, and to understand the underlying core concepts and emerging technologies in data science.

Course Outcome

CO1: Explore the fundamental concepts of data science.

CO2: Understand data analysis techniques for applications handling large data.

CO3: Understand various machine learning algorithms used in the data science process.

CO4: Visualize and present inferences using various tools.

CO5: Learn to think through the ethics surrounding privacy, data sharing and algorithmic decision-making.
Unit-1
Teaching Hours:10
INTRODUCTION TO DATA SCIENCE

Definition – Big Data and Data Science Hype – Why Data Science – Getting Past the Hype – The Current Landscape – Who is a Data Scientist? – Data Science Process Overview – Defining goals – Retrieving data – Data preparation – Data exploration – Data modeling – Presentation.

Unit-2
Teaching Hours:12
BIG DATA

Problems when handling large data – General techniques for handling large data – Case study – Steps in big data – Distributing data storage and processing with Frameworks – Case study.

Unit-3
Teaching Hours:12
MACHINE LEARNING

Machine learning – Modeling process – Training the model – Validating the model – Predicting new observations – Supervised learning algorithms – Unsupervised learning algorithms.

Unit-4
Teaching Hours:12
DEEP LEARNING

Introduction – Deep Feedforward Networks – Regularization – Optimization of Deep Learning – Convolutional Networks – Recurrent and Recursive Nets – Applications of Deep Learning.

Unit-5
Teaching Hours:14
DATA VISUALIZATION

Introduction to data visualization – Data visualization options – Filters – MapReduce – Dashboard development tools – Creating an interactive dashboard with dc.js – Summary.

ETHICS AND RECENT TRENDS

Data Science Ethics – Doing good data science – Owners of the data - Valuing different aspects of privacy - Getting informed consent - The Five Cs – Diversity – Inclusion – Future Trends.

Text Books And Reference Books:

T1. Introducing Data Science, Davy Cielen, Arno D. B. Meysman, Mohamed Ali, Manning Publications Co., 1st edition, 2016

T2. An Introduction to Statistical Learning: with Applications in R, Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, Springer, 1st edition, 2013

T3. Deep Learning, Ian Goodfellow, Yoshua Bengio, Aaron Courville, MIT Press, 1st edition, 2016

T4. Ethics and Data Science, D J Patil, Hilary Mason, Mike Loukides, O'Reilly, 1st edition, 2018

R1. Data Science from Scratch: First Principles with Python, Joel Grus, O'Reilly, 1st edition, 2015

R2. Doing Data Science: Straight Talk from the Frontline, Cathy O'Neil, Rachel Schutt, O'Reilly, 1st edition, 2013

R3. Mining of Massive Datasets, Jure Leskovec, Anand Rajaraman, Jeffrey David Ullman, Cambridge University Press, 2nd edition, 2014

Evaluation Pattern

CIA : 50 %

ESE : 50 %

MDS133L - PRINCIPLES OF DATA SCIENCE (2021 Batch)

Total Teaching Hours for Semester:60
No of Lecture Hours/Week:4
Max Marks:100
Credits:4

Course Objectives/Course Description

Course Description:

To provide a strong foundation for Data Science and related areas of application. The course covers the fundamentals of data science, different techniques for handling big data, and machine learning algorithms for supervised and unsupervised learning. The importance of handling data ethically, and the ethical practices to be adopted while dealing with data, are also part of the course.

Course Objectives:

To provide a strong foundation in data science and its application areas related to information technology, and to understand the underlying core concepts and emerging technologies in data science.

Course Outcome

CO1: Explore the fundamental concepts of data science.

CO2: Understand data analysis techniques for applications handling large data.

CO3: Understand various machine learning algorithms used in the data science process.

CO4: Visualize and present inferences using various tools.

CO5: Learn to think through the ethics surrounding privacy, data sharing and algorithmic decision-making.

Unit-1
Teaching Hours:10
INTRODUCTION TO DATA SCIENCE

Definition – Big Data and Data Science Hype – Why Data Science – Getting Past the Hype – The Current Landscape – Who is a Data Scientist? – Data Science Process Overview – Defining goals – Retrieving data – Data preparation – Data exploration – Data modeling – Presentation.

Unit-2
Teaching Hours:12
BIG DATA

Problems when handling large data – General techniques for handling large data – Case study – Steps in big data – Distributing data storage and processing with Frameworks – Case study.

Unit-3
Teaching Hours:12
MACHINE LEARNING

Machine learning – Modeling process – Training the model – Validating the model – Predicting new observations – Supervised learning algorithms – Unsupervised learning algorithms.

Unit-4
Teaching Hours:12
DEEP LEARNING

Introduction – Deep Feedforward Networks – Regularization – Optimization of Deep Learning – Convolutional Networks – Recurrent and Recursive Nets – Applications of Deep Learning.

Unit-5
Teaching Hours:14
DATA VISUALIZATION

Introduction to data visualization – Data visualization options – Filters – MapReduce – Dashboard development tools – Creating an interactive dashboard with dc.js – Summary.

ETHICS AND RECENT TRENDS

Data Science Ethics – Doing good data science – Owners of the data – Valuing different aspects of privacy – Getting informed consent – The Five Cs – Diversity – Inclusion – Future Trends.

Text Books And Reference Books:

T1. Introducing Data Science, Davy Cielen, Arno D. B. Meysman, Mohamed Ali, Manning Publications Co., 1st Edition, 2016

T2. An Introduction to Statistical Learning: with Applications in R, Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, Springer, 1st Edition, 2013

T3. Deep Learning, Ian Goodfellow, Yoshua Bengio, Aaron Courville, MIT Press, 1st Edition, 2016

T4. Ethics and Data Science, D J Patil, Hilary Mason, Mike Loukides, O'Reilly, 1st Edition, 2018

R1. Data Science from Scratch: First Principles with Python, Joel Grus, O'Reilly, 1st Edition, 2015

R2. Doing Data Science: Straight Talk from the Frontline, Cathy O'Neil, Rachel Schutt, O'Reilly, 1st Edition, 2013

R3. Mining of Massive Datasets, Jure Leskovec, Anand Rajaraman, Jeffrey David Ullman, Cambridge University Press, 2nd Edition, 2014

Evaluation Pattern

| CIA I | CIA II | CIA III | Attendance | ESE |
|---|---|---|---|---|
| 10% | 25% | 10% | 5% | 50% |

MDS134 - RESEARCH METHODOLOGY (2021 Batch)

Total Teaching Hours for Semester:30
No of Lecture Hours/Week:2
Max Marks:50
Credits:2

Course Objectives/Course Description

This course is intended to assist students in planning and carrying out research work. The students are exposed to the basic principles, procedures and techniques of implementing a research project.

The main objective is to introduce the concept of research and the various research methodologies. The course focuses on identifying research gaps in the literature and encourages lateral, strategic and creative thinking. It also introduces the computer technology and basic statistics required for research, and for reporting research outcomes scientifically, with an emphasis on research ethics.

Course Outcome

CO1: Understand the essence of research and the necessity of defining a research problem.

CO2: Apply research methods and methodology, including research design, data collection, data analysis and interpretation.

CO3: Create scientific reports according to specified standards.

Unit-1
Teaching Hours:8
RESEARCH METHODOLOGY

Defining the research problem: selecting the problem, necessity of defining the problem, techniques involved in defining a problem – Ethics in research.

Unit-2
Teaching Hours:8
RESEARCH DESIGN

Principles of experimental design. Working with literature: importance, finding literature, using your resources, managing the literature, keeping track of references, using the literature, literature review. Online searching: databases, SciFinder, Scopus, ScienceDirect, searching research articles, citation index, impact factor, h-index.

Unit-3
Teaching Hours:7
RESEARCH DATA

Measurement and Scaling: quantitative, qualitative, classification of measurement scales, data collection, data preparation.

Unit-4
Teaching Hours:7
REPORT WRITING

Scientific writing and report writing: significance, steps, layout, types, mechanics and precautions. LaTeX: introduction, text, tables, figures, equations, citations, referencing, and templates (IEEE style). Paper writing for international journals, writing scientific reports.

Text Books And Reference Books:

1. C. R. Kothari, Research Methodology: Methods and Techniques, 3rd ed., New Delhi: New Age International Publishers, Reprint 2014.

2. Z. O'Leary, The Essential Guide to Doing Research, New Delhi: PHI, 2005.

Essential Reading / Recommended Reading

1. J. W. Creswell, Research Design: Qualitative, Quantitative, and Mixed Methods Approaches, 4th ed., SAGE Publications, 2014.

2. R. Kumar, Research Methodology: A Step by Step Guide for Beginners, 3rd ed., Indian: PE, 2010.

Evaluation Pattern

CIA : 50%

ESE : 50%

MDS134L - RESEARCH METHODOLOGY (2021 Batch)

Total Teaching Hours for Semester:30
No of Lecture Hours/Week:2
Max Marks:50
Credits:2

Course Objectives/Course Description

This course is intended to assist students in planning and carrying out research work. The students are exposed to the basic principles, procedures and techniques of implementing a research project.

The main objective is to introduce the concept of research and the various research methodologies. The course focuses on identifying research gaps in the literature and encourages lateral, strategic and creative thinking. It also introduces the computer technology and basic statistics required for research, and for reporting research outcomes scientifically, with an emphasis on research ethics.

Course Outcome

CO1: Understand the essence of research and the necessity of defining a research problem.

CO2: Apply research methods and methodology, including research design, data collection, data analysis and interpretation.

CO3: Create scientific reports according to specified standards.
Unit-1
Teaching Hours:8
RESEARCH METHODOLOGY

Defining the research problem: selecting the problem, necessity of defining the problem, techniques involved in defining a problem – Ethics in research.

Unit-2
Teaching Hours:8
RESEARCH DESIGN

Principles of experimental design. Working with literature: importance, finding literature, using your resources, managing the literature, keeping track of references, using the literature, literature review. Online searching: databases, SciFinder, Scopus, ScienceDirect, searching research articles, citation index, impact factor, h-index.

Unit-3
Teaching Hours:7
RESEARCH DATA

Measurement and Scaling: quantitative, qualitative, classification of measurement scales, data collection, data preparation.

Unit-4
Teaching Hours:7
REPORT WRITING

Scientific writing and report writing: significance, steps, layout, types, mechanics and precautions. LaTeX: introduction, text, tables, figures, equations, citations, referencing, and templates (IEEE style). Paper writing for international journals, writing scientific reports.

Text Books And Reference Books:

1. C. R. Kothari, Research Methodology: Methods and Techniques, 3rd ed., New Delhi: New Age International Publishers, Reprint 2014.

2. Z. O'Leary, The Essential Guide to Doing Research, New Delhi: PHI, 2005.

Essential Reading / Recommended Reading

1. J. W. Creswell, Research Design: Qualitative, Quantitative, and Mixed Methods Approaches, 4th ed., SAGE Publications, 2014.

2. R. Kumar, Research Methodology: A Step by Step Guide for Beginners, 3rd ed., Indian: PE, 2010.

Evaluation Pattern

CIA : 50%

ESE : 50%

MDS161A - INTRODUCTION TO STATISTICS (2021 Batch)

Total Teaching Hours for Semester:30
No of Lecture Hours/Week:2
Max Marks:50
Credits:2

Course Objectives/Course Description

To enable the students to understand the fundamentals of statistics and to apply descriptive measures and probability for data analysis.

Course Outcome

CO1: Describe the history of statistics and present data in various forms.

CO2: Infer relationships between two or more variables using correlation and regression.

CO3: Demonstrate the computation of probabilities for various events.
Unit-1
Teaching Hours:8
ORGANIZATION AND PRESENTATION OF DATA

Origin and development of statistics; scope, limitations and misuse of statistics. Types of data: primary, secondary, quantitative and qualitative data. Types of measurements: nominal, ordinal, discrete and continuous data. Presentation of data by tables: construction of frequency distributions for discrete and continuous data; graphical representation of a frequency distribution by histogram and frequency polygon; cumulative frequency distributions.

Unit-2
Teaching Hours:8
DESCRIPTIVE STATISTICS

Measures of location or central tendency: arithmetic mean, median, mode, geometric mean, harmonic mean. Partition values: quartiles, deciles and percentiles. Measures of dispersion: mean deviation, quartile deviation, standard deviation, coefficient of variation. Moments: measures of skewness and kurtosis.

Unit-3
Teaching Hours:7
CORRELATION AND REGRESSION

Correlation: scatter plot, Karl Pearson's coefficient of correlation, Spearman's rank correlation coefficient, multiple and partial correlations (for 3 variates only). Regression: concept of errors, principle of least squares, simple linear regression and its properties.

Unit-4
Teaching Hours:7
BASICS OF PROBABILITY

Random experiment, sample point and sample space, event, algebra of events. Definition of probability: classical, empirical and axiomatic approaches to probability, properties of probability. Theorems on probability, conditional probability and independent events, law of total probability, Bayes' theorem and its applications.

Text Books And Reference Books:

1. Rohatgi V. K. and Saleh E., An Introduction to Probability and Statistics, 3rd edition, John Wiley & Sons Inc., New Jersey, 2015.

2. Gupta S. C. and Kapoor V. K., Fundamentals of Mathematical Statistics, 11th edition, Sultan Chand & Sons, New Delhi, 2014.

Essential Reading / Recommended Reading

1. Mukhopadhyay P., Mathematical Statistics, Books and Allied (P) Ltd, Kolkata, 2015.

2. Walpole R. E., Myers R. H. and Myers S. L., Probability and Statistics for Engineers and Scientists, Pearson, New Delhi, 2017.

3. Montgomery D. C. and Runger G. C., Applied Statistics and Probability for Engineers, Wiley India, New Delhi, 2013.

4. Mood A. M., Graybill F. A. and Boes D. C., Introduction to the Theory of Statistics, McGraw Hill, New Delhi, 2008.

Evaluation Pattern

CIA : 50%

ESE : 50%

MDS161B - INTRODUCTION TO COMPUTERS AND PROGRAMMING (2021 Batch)

Total Teaching Hours for Semester:30
No of Lecture Hours/Week:2
Max Marks:50
Credits:2

Course Objectives/Course Description

To enable the students to understand the fundamental concepts of problem solving and programming structures.

Course Outcome

CO1: Demonstrate a systematic approach to problem solving using computers.

CO2: Apply different programming structures with suitable logic to computational problems.
Unit-1
Teaching Hours:10
COMPUTERS AND DIGITAL BASICS

Number representation – Decimal, Binary, Octal, Hexadecimal and BCD numbers – Binary arithmetic – Binary addition – Unsigned and signed numbers – One's and two's complements of binary numbers – Arithmetic operations with signed numbers – Number system conversions – Boolean algebra – Logic gates – Design of circuits – K-Map.

Unit-2
Teaching Hours:5
GENERAL PROBLEM SOLVING CONCEPT

Types of problems – Problem solving with computers – Difficulties with problem solving – Problem solving concepts for the computer – Constants and variables – Rules for naming and using variables – Data types: numeric data, character data, logical data – Rules for data types – Examples of data types – Storing the data in the computer – Functions – Operators – Expressions and equations.

Unit-3
Teaching Hours:5
PLANNING FOR SOLUTION

Communicating with the computer – Organizing the solution – Analyzing the problem – Developing the interactivity chart – Developing the IPO chart – Writing the algorithms – Drawing the flowcharts – Pseudocode – Internal and external documentation – Testing the solution – Coding the solution – Software development life cycle.

Unit-4
Teaching Hours:10
PROBLEM SOLVING

Introduction to programming structure – Pointers for structuring a solution – Modules and their functions – Cohesion and coupling – Problem solving with logic structure. Problem solving with decisions – The decision logic structure – Straight-through logic – Positive logic – Negative logic – Logic conversion – Decision tables – Case logic structure – Examples.

Text Books And Reference Books:

1. T. L. Floyd and R. P. Jain, Digital Fundamentals, 8th Edition, Pearson Education, 2007.

2. P. Norton, Introduction to Computers, 6th Edition, Tata McGraw-Hill, New Delhi, 2006.

3. M. Sprankle and J. Hubbard, Problem Solving and Programming Concepts, 9th Edition, PHI, 2012.

Essential Reading / Recommended Reading

1. E. Balagurusamy, Fundamentals of Computers, TMH, 2011.

Evaluation Pattern

CIA : 50%

ESE : 50%

MDS161BL - INTRODUCTION TO COMPUTERS AND PROGRAMMING (2021 Batch)

Total Teaching Hours for Semester:30
No of Lecture Hours/Week:2
Max Marks:50
Credits:2

Course Objectives/Course Description

To enable the students to understand the fundamental concepts of problem solving and programming structures.

Course Outcome

CO1: Demonstrate a systematic approach to problem solving using computers. (EM)

CO2: Apply different programming structures with suitable logic to computational problems. (EM+S)
Unit-1
Teaching Hours:10
COMPUTERS AND DIGITAL BASICS

Number Representation – Decimal, Binary, Octal, Hexadecimal and BCD numbers – Binary Arithmetic – Binary addition – Unsigned and Signed numbers – One’s and two’s complements of Binary numbers – Arithmetic operations with signed numbers – Number system conversions – Boolean Algebra – Logic gates – Design of Circuits – K-Map

Unit-2
Teaching Hours:5
GENERAL PROBLEM SOLVING CONCEPT

Types of Problems – Problem solving with Computers – Difficulties with problem solving – Problem solving concepts for the Computer – Constants and Variables – Rules for Naming and using variables – Data types – numeric data – character data – logical data – rules for data types – examples of data types – storing the data in computer – Functions – Operators – Expressions and Equations

Unit-3
Teaching Hours:5
PLANNING FOR SOLUTION

Communicating with computer – organizing the solution – Analyzing the problem – developing the interactivity chart – developing the IPO chart – Writing the algorithms – drawing the flow charts – pseudocode – internal and external documentation – testing the solution – coding the solution – software development life cycle.

Unit-4
Teaching Hours:10
PROBLEM SOLVING

Introduction to programming structure – pointers for structuring a solution – modules and their functions – cohesion and coupling – problem solving with logic structure. Problem solving with decisions – the decision logic structure – straight through logic – positive logic – negative logic – logic conversion – decision tables – case logic structure – examples.

Text Books And Reference Books:

1. Thomas L. Floyd and R.P. Jain, “Digital Fundamentals”, 8th Edition, Pearson Education, 2007.
2. Peter Norton, “Introduction to Computers”, 6th Edition, Tata McGraw Hill, New Delhi, 2006.
3. Maureen Sprankle and Jim Hubbard, Problem Solving and Programming Concepts, PHI, 9th Edition, 2012.

Essential Reading / Recommended Reading

1. E. Balagurusamy, Fundamentals of Computers, TMH, 2011.

Evaluation Pattern

CIA: 50%

ESE: 50%

MDS161C - LINUX ADMINISTRATION (2021 Batch)

Total Teaching Hours for Semester:30
No of Lecture Hours/Week:2
Max Marks:50
Credits:2

Course Objectives/Course Description

To enable the students to excel in the Linux platform.

Course Outcome

CO1: Demonstrate a systematic approach to configuring the Linux environment

CO2: Manage the Linux environment to work with open source data science tools
Unit-1
Teaching Hours:10
Module-1

RHEL 7.5, breaking (resetting) the root password, understanding and using essential tools for handling files, directories, command-line environments and documentation - Configuring local storage using partitions and logical volumes

Unit-2
Teaching Hours:10
Module-2

Swapping, extending LVM partitions, LVM snapshots - Managing users and groups, including use of a centralized directory for authentication

Unit-3
Teaching Hours:10
Module-3

Kernel updates, yum and nmcli configuration, scheduling jobs with at and crontab - Configuring firewall settings using firewall-config, firewall-cmd or iptables; configuring key-based authentication for SSH; setting enforcing and permissive modes for SELinux; listing and identifying SELinux file and process contexts; restoring default file contexts

Text Books And Reference Books:

1.    https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/

Evaluation Pattern

CIA:50%

ESE:50%

MDS161LA - INTRODUCTION TO STATISTICS (2021 Batch)

Total Teaching Hours for Semester:30
No of Lecture Hours/Week:2
Max Marks:50
Credits:2

Course Objectives/Course Description

To enable the students to understand the fundamentals of statistics and to apply descriptive measures and probability in data analysis.

Course Outcome

CO1: Demonstrate the history of statistics and present data in various forms.

CO2: Interpret the concepts of correlation and regression for relating two or more variables.

CO3: Determine the probabilities of various events.

Unit-1
Teaching Hours:8
ORGANIZATION AND PRESENTATION OF DATA

Origin and development of statistics; scope, limitations and misuse of statistics. Types of data: primary, secondary, quantitative and qualitative data. Types of measurement: nominal, ordinal, discrete and continuous data. Presentation of data by tables: construction of frequency distributions for discrete and continuous data; graphical representation of a frequency distribution by histogram and frequency polygon; cumulative frequency distributions

Unit-2
Teaching Hours:8
DESCRIPTIVE STATISTICS

Measures of location or central tendency: Arithmetic mean, Median, Mode, Geometric mean, Harmonic mean. Partition values: Quartiles, Deciles and Percentiles. Measures of dispersion: Mean deviation, Quartile deviation, Standard deviation, Coefficient of variation. Moments, measures of skewness and kurtosis
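The Unit-2 measures can be tried out with Python's standard `statistics` module. The following is a minimal sketch on a small hypothetical data set; `geometric_mean` and `quantiles` need Python 3.8 or later, and the quartile convention (`method="inclusive"`) is an assumption, since textbooks differ on how partition values are interpolated. Skewness and kurtosis are taken from the central moments as gamma_1 = m3 / m2^(3/2) and beta_2 = m4 / m2^2.

```python
import statistics

# Hypothetical sample, chosen so the results are easy to check by hand
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)

# Measures of location
mean = statistics.mean(data)               # arithmetic mean
median = statistics.median(data)           # average of the two middle values when n is even
mode = statistics.mode(data)               # most frequent value
gm = statistics.geometric_mean(data)       # n-th root of the product (Python 3.8+)
hm = statistics.harmonic_mean(data)        # n / sum(1/x)

# Partition values: quartiles ("inclusive" is one of several interpolation conventions)
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

# Measures of dispersion
mean_dev = sum(abs(x - mean) for x in data) / n    # mean deviation about the mean
quartile_dev = (q3 - q1) / 2                       # semi-interquartile range
sd = statistics.pstdev(data)                       # population standard deviation
cv = 100 * sd / mean                               # coefficient of variation (%)

# Central moments, then moment-based skewness and kurtosis
m2 = sum((x - mean) ** 2 for x in data) / n
m3 = sum((x - mean) ** 3 for x in data) / n
m4 = sum((x - mean) ** 4 for x in data) / n
skewness = m3 / m2 ** 1.5      # gamma_1
kurtosis = m4 / m2 ** 2        # beta_2 (equals 3 for a normal distribution)
```

For this data the mean is 5, the median 4.5, the mode 4 and the population standard deviation 2, so the coefficient of variation is 40%; the positive skewness (about 0.66) reflects the longer right tail created by 7 and 9.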

Unit-3
Teaching Hours:7
CORRELATION AND REGRESSION

Correlation: Scatter plot, Karl Pearson coefficient of correlation, Spearman's rank correlation coefficient, multiple and partial correlations (for 3 variates only). Regression: Concept of errors, Principle of Least Squares, Simple linear regression and its properties
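A minimal sketch of Karl Pearson's coefficient and the least-squares line, written directly from the textbook formulas rather than a library so that each step is visible; the paired data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Karl Pearson's coefficient: r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def least_squares_line(x, y):
    """Simple linear regression y = a + b*x fitted by the principle of least squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx        # the fitted line passes through the point of means
    return a, b

# Hypothetical paired observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
a, b = least_squares_line(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
```

Here r is about 0.775 and the fitted line is y = 2.2 + 0.6x. Two properties of least squares show up directly: the residuals sum to (numerically) zero, and the slope equals r multiplied by s_y / s_x.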

Unit-4
Teaching Hours:7
BASICS OF PROBABILITY

Random experiment, sample point and sample space, event, algebra of events. Definition of Probability: classical, empirical and axiomatic approaches to probability, properties of probability. Theorems on probability, conditional probability and independent events, Law of total probability, Bayes’ theorem and its applications
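The law of total probability and Bayes’ theorem fit together naturally: the denominator of Bayes’ formula is exactly the total probability of the observed event. A minimal sketch using exact fractions; the three-machine example is hypothetical.

```python
from fractions import Fraction

def bayes_posterior(priors, likelihoods):
    """Return P(A_i | B) for a partition A_1..A_k of the sample space.

    priors[i]      = P(A_i)
    likelihoods[i] = P(B | A_i)
    """
    # Law of total probability: P(B) = sum_i P(B | A_i) P(A_i)
    p_b = sum(p * l for p, l in zip(priors, likelihoods))
    # Bayes' theorem: P(A_i | B) = P(B | A_i) P(A_i) / P(B)
    return [p * l / p_b for p, l in zip(priors, likelihoods)]

# Hypothetical: three machines make 50%, 30% and 20% of the output,
# with defect rates of 1%, 2% and 3%. Which machine made a defective item?
priors = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
defect_rates = [Fraction(1, 100), Fraction(2, 100), Fraction(3, 100)]
posterior = bayes_posterior(priors, defect_rates)
```

Here P(defective) = 17/1000 and the posterior probabilities are 5/17, 6/17 and 6/17: the first machine produces half the output but accounts for only 5/17 of the defective items.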

Text Books And Reference Books:
1. Rohatgi V.K and Saleh E, An Introduction to Probability and Statistics, 3rd edition, John Wiley & Sons Inc., New Jersey, 2015.

2. Gupta S.C and Kapoor V.K, Fundamentals of Mathematical Statistics, 11th edition, Sultan Chand & Sons, New Delhi, 2014.

3. Mukhopadhyay P, Mathematical Statistics, Books and Allied (P) Ltd, Kolkata, 2015.

4. Walpole R.E, Myers R.H and Myers S.L, Probability and Statistics for Engineers and Scientists, Pearson, New Delhi, 2017.

5. Montgomery D.C and Runger G.C, Applied Statistics and Probability for Engineers, Wiley India, New Delhi, 2013.

6. Mood A.M, Graybill F.A and Boes D.C, Introduction to the Theory of Statistics, McGraw Hill, New Delhi, 2008.

Evaluation Pattern

CIA - 50%

ESE - 50%

MDS171 - DATA BASE TECHNOLOGIES (2021 Batch)

Total Teaching Hours for Semester:90
No of Lecture Hours/Week:6
Max Marks:150
Credits:5

Course Objectives/Course Description

The main objective of this course is to provide fundamental knowledge of, and practical experience with, database concepts. It covers the concepts and terminology that facilitate the construction of relational databases, the writing of effective queries, and an understanding of data warehouses and NoSQL databases and their types.

Course Outcome

CO1: Demonstrate various databases and Compose effective queries

CO2: Understand the process of OLAP system construction

CO3: Develop applications using Relational and NoSQL databases.

Unit-1
Teaching Hours:18
INTRODUCTION

Concept & Overview of DBMS, Data Models, Database Languages, Database Administrator, Database Users, Three Schema architecture of DBMS. Basic concepts, Design Issues, Mapping Constraints, Keys, Entity-Relationship Diagram, Weak Entity Sets, Extended E-R features

Lab Exercises

1. Data Definition

2. Table Creation

3. Constraints

Unit-2
Teaching Hours:18
RELATIONAL MODEL AND DATABASE DESIGN

SQL and Integrity Constraints, Concept of DDL, DML, DCL. Basic Structure, Set operations, Aggregate Functions, Null Values, Domain Constraints, Referential Integrity Constraints, assertions, views, Nested Subqueries, Functional Dependency, Different anomalies in designing a Database, Normalization: using functional dependencies, Boyce-Codd Normal Form, 4NF

Lab Exercises

1. Insert, Select, Update & Delete Commands

2. Nested Queries & Join Queries

3. Views
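The Unit-1 and Unit-2 lab exercises (DDL, constraints, DML, nested queries, joins and views) can be rehearsed without a database server using Python's built-in `sqlite3` module. This is a sketch under that assumption: SQLite's dialect differs in places from the DBMS a lab may use (it does not support `CREATE ASSERTION`, for example), and all table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite checks FOREIGN KEY constraints only when enabled

# DDL: key, NOT NULL, UNIQUE, CHECK (domain) and FOREIGN KEY (referential) constraints
conn.executescript("""
CREATE TABLE department (
    dept_id   INTEGER PRIMARY KEY,
    dept_name TEXT NOT NULL UNIQUE
);
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    salary  REAL CHECK (salary > 0),
    dept_id INTEGER REFERENCES department(dept_id)
);
""")

# DML: insert, update and delete
conn.executemany("INSERT INTO department VALUES (?, ?)", [(1, "Sales"), (2, "Research")])
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)",
                 [(1, "Asha", 50000, 1), (2, "Ravi", 65000, 2),
                  (3, "Meera", 72000, 2), (4, "Kiran", 30000, 1)])
conn.execute("UPDATE employee SET salary = salary * 1.10 WHERE emp_id = 1")
conn.execute("DELETE FROM employee WHERE emp_id = 4")

# Nested subquery: employees paid more than the average salary
above_avg = conn.execute(
    "SELECT name FROM employee "
    "WHERE salary > (SELECT AVG(salary) FROM employee) ORDER BY name"
).fetchall()

# Join + aggregation stored as a view
conn.execute("""
CREATE VIEW dept_summary AS
SELECT d.dept_name, COUNT(e.emp_id) AS headcount, AVG(e.salary) AS avg_salary
FROM department AS d LEFT JOIN employee AS e ON e.dept_id = d.dept_id
GROUP BY d.dept_name
""")
summary = conn.execute("SELECT dept_name, headcount FROM dept_summary ORDER BY dept_name").fetchall()

# Referential integrity: an employee in a non-existent department is rejected
try:
    conn.execute("INSERT INTO employee VALUES (5, 'Vikram', 40000, 99)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
```

After the update and delete, the average salary is 64000, so the subquery returns Meera and Ravi; the view reports two employees in Research and one in Sales.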

Unit-3
Teaching Hours:18
DATA WAREHOUSE: THE BUILDING BLOCKS

Defining Features, Data Warehouses and Data Marts, Architectural Types, Overview of the Components, Metadata in the Data Warehouse, Data Design and Data Preparation: Principles of Dimensional Modeling, From Requirements to Data Design, The Star Schema, Star Schema Keys, Advantages of the Star Schema, Star Schema: Examples, Dimensional Modeling: Advanced Topics, Updates to the Dimension Tables, Miscellaneous Dimensions, The Snowflake Schema, Aggregate Fact Tables, Families of Stars

Lab Exercises:

1. Importing source data structures

2. Design Target Data Structures

3. Create target multidimensional cube
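A star schema can be sketched the same way: one fact table holding foreign keys and numeric measures, surrounded by dimension tables. The schema and values below are hypothetical, and `sqlite3` merely stands in for a real OLAP tool; the last statement materialises a small aggregate fact table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes keyed by a surrogate key
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
-- Fact table: one foreign key per dimension plus numeric measures
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    units_sold  INTEGER,
    revenue     REAL
);
INSERT INTO dim_date VALUES (1, 2021, 'Q1'), (2, 2021, 'Q2');
INSERT INTO dim_product VALUES (10, 'Laptop', 'Electronics'), (20, 'Desk', 'Furniture');
INSERT INTO fact_sales VALUES (1, 10, 5, 250000), (1, 20, 3, 30000),
                              (2, 10, 2, 100000), (2, 20, 4, 40000);
""")

# Star join: fact table joined to its dimensions, then aggregated by dimension attributes
rollup = conn.execute("""
SELECT d.quarter, p.category, SUM(f.revenue) AS total_revenue
FROM fact_sales AS f
JOIN dim_date AS d    ON f.date_key = d.date_key
JOIN dim_product AS p ON f.product_key = p.product_key
GROUP BY d.quarter, p.category
ORDER BY d.quarter, p.category
""").fetchall()

# Aggregate fact table: pre-summarised data at the coarser "category" grain
conn.execute("""
CREATE TABLE agg_sales_by_category AS
SELECT p.category, SUM(f.units_sold) AS units, SUM(f.revenue) AS revenue
FROM fact_sales AS f JOIN dim_product AS p ON f.product_key = p.product_key
GROUP BY p.category
""")
agg = conn.execute(
    "SELECT category, units, revenue FROM agg_sales_by_category ORDER BY category"
).fetchall()
```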

Unit-4
Teaching Hours:18
DATA INTEGRATION and DATA FLOW (ETL)

Requirements, ETL Data Structures, Extracting, Cleaning and Conforming, Delivering Dimension Tables, Delivering Fact Tables, Real-Time ETL Systems

Lab Exercises:

1. Perform the ETL process and transform into data map

2. Create the cube and process it

3. Generating Reports

4. Creating the Pivot table and pivot chart using some existing data

Unit-5
Teaching Hours:18
NOSQL Databases

Introduction to NOSQL Systems, The CAP Theorem, Document-Based NOSQL Systems and MongoDB, NOSQL Key-Value Stores, Column-Based or Wide Column NOSQL Systems, Graph databases, Multimedia databases.

Lab Exercises:

1. MongoDB Exercise - 1

2. MongoDB Exercise - 2

Text Books And Reference Books:

1. Henry F. Korth and Abraham Silberschatz, “Database System Concepts”, McGraw Hill.

2. Thomas Connolly and Carolyn Begg, “Database Systems: A Practical Approach to Design, Implementation and Management”, Third Edition, Pearson Education, 2007.

3. “The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling”, 2nd Edition, John Wiley & Sons, Inc., New York, USA, 2002.

4. Lior Rokach and Oded Maimon, Data Mining and Knowledge Discovery Handbook, Springer, 2nd edition, 2010.

Evaluation Pattern

CIA: 50%

ESE: 50%

MDS171L - DATABASE TECHNOLOGIES (2021 Batch)

Total Teaching Hours for Semester:90
No of Lecture Hours/Week:6
Max Marks:150
Credits:5

Course Objectives/Course Description

The main objective of this course is to provide fundamental knowledge of, and practical experience with, database concepts. It covers the concepts and terminology that facilitate the construction of relational databases, the writing of effective queries, and an understanding of data warehouses and NoSQL databases and their types.

Course Outcome

CO1: Demonstrate various databases and Compose effective queries

CO2: Understand the process of OLAP system construction

CO3: Develop applications using Relational and NoSQL databases.

Unit-1
Teaching Hours:18
INTRODUCTION

Concept & Overview of DBMS, Data Models, Database Languages, Database Administrator, Database Users, Three Schema architecture of DBMS. Basic concepts, Design Issues, Mapping Constraints, Keys, Entity-Relationship Diagram, Weak Entity Sets, Extended E-R features

Lab Exercises

1. Data Definition

2. Table Creation

3. Constraints

Unit-2
Teaching Hours:18
RELATIONAL MODEL AND DATABASE DESIGN

SQL and Integrity Constraints, Concept of DDL, DML, DCL. Basic Structure, Set operations, Aggregate Functions, Null Values, Domain Constraints, Referential Integrity Constraints, assertions, views, Nested Subqueries, Functional Dependency, Different anomalies in designing a Database, Normalization: using functional dependencies, Boyce-Codd Normal Form, 4NF

Lab Exercises

1. Insert, Select, Update & Delete Commands

2. Nested Queries & Join Queries

3. Views

Unit-3
Teaching Hours:18
DATA WAREHOUSE: THE BUILDING BLOCKS

Defining Features, Data Warehouses and Data Marts, Architectural Types, Overview of the Components, Metadata in the Data Warehouse, Data Design and Data Preparation: Principles of Dimensional Modeling, From Requirements to Data Design, The Star Schema, Star Schema Keys, Advantages of the Star Schema, Star Schema: Examples, Dimensional Modeling: Advanced Topics, Updates to the Dimension Tables, Miscellaneous Dimensions, The Snowflake Schema, Aggregate Fact Tables, Families of Stars

Lab Exercises:

1. Importing source data structures

2. Design Target Data Structures

3. Create target multidimensional cube

Unit-4
Teaching Hours:18
DATA INTEGRATION and DATA FLOW (ETL)

Requirements, ETL Data Structures, Extracting, Cleaning and Conforming, Delivering Dimension Tables, Delivering Fact Tables, Real-Time ETL Systems

Lab Exercises:

1. Perform the ETL process and transform into data map

2. Create the cube and process it

3. Generating Reports

4. Creating the Pivot table and pivot chart using some existing data

Unit-5
Teaching Hours:18
NOSQL DATABASES

Introduction to NOSQL Systems, The CAP Theorem, Document-Based NOSQL Systems and MongoDB, NOSQL Key-Value Stores, Column-Based or Wide Column NOSQL Systems, Graph databases, Multimedia databases.

Lab Exercises:

1. MongoDB Exercise - 1

2. MongoDB Exercise - 2

Text Books And Reference Books:
1. Henry F. Korth and Abraham Silberschatz, “Database System Concepts”, McGraw Hill.

2. Thomas Connolly and Carolyn Begg, “Database Systems: A Practical Approach to Design, Implementation and Management”, Third Edition, Pearson Education, 2007.

3. “The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling”, 2nd Edition, John Wiley & Sons, Inc., New York, USA, 2002.

4. Lior Rokach and Oded Maimon, Data Mining and Knowledge Discovery Handbook, Springer, 2nd edition, 2010.

Evaluation Pattern
CIA: 50%

ESE: 50%

MDS172 - INFERENTIAL STATISTICS (2021 Batch)

Total Teaching Hours for Semester:90
No of Lecture Hours/Week:6
Max Marks:150
Credits:5

Course Objectives/Course Description

Statistical inference plays an important role in modeling data and in making decisions about real-world phenomena. This course is designed to impart knowledge of hypothesis testing and parameter estimation for real-life data sets.

Course Outcome

CO1: Demonstrate the concepts of population and samples.

CO2: Apply the idea of sampling distribution of different statistics in testing of hypothesis

CO3: Test the hypothesis using nonparametric tests for real world problems.

CO4: Estimate the unknown population parameters using the concepts of point and interval estimations.

Unit-1
Teaching Hours:18
INTRODUCTION

Population and Statistics – Finite and Infinite population – Parameter and Statistics – Types of sampling – Sampling Distribution – Sampling Error – Standard Error – Test of significance – Concept of hypothesis – Types of hypothesis – Errors in hypothesis testing – Critical region – Level of significance – Power of the test – p-value.

Lab Exercises:

1. Calculation of sampling error and standard error
2. Calculation of probability of critical region using standard distributions
3. Calculation of power of the test using standard distributions

Unit-2
Teaching Hours:18
TESTING OF HYPOTHESIS I

Concept of large and small samples – Tests concerning a single population mean for known σ – Equality of two means for known σ – Test for single variance – Test for equality of two variances for normal populations – Tests for single proportion – Tests of equality of two proportions for normal populations.

Lab Exercises:

4. Test of the single sample mean for known σ
5. Test of equality of two means for known σ
6. Tests of single variance and equality of variances for large samples
7. Tests for single proportion and equality of two proportions for large samples

Unit-3
Teaching Hours:18
TESTING OF HYPOTHESIS II

Student's t-distribution and its properties (without proofs) – Single sample mean test – Independent sample mean test – Paired sample mean test – Tests of proportion (based on t-distribution) – F-distribution and its properties (without proofs) – Tests of equality of two variances using F-test – Chi-square distribution and its properties (without proofs) – Chi-square test for independence of attributes – Chi-square test for goodness of fit.

Lab Exercises:

8. Single sample mean test
9. Independent and paired sample mean tests
10. Tests of proportion of one and two samples based on t-distribution
11. Test of equality of two variances
12. Chi-square test for independence of attributes and goodness of fit
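The Unit-1 and Unit-2 ideas (standard error, critical region, p-value and power) fit in a few lines for the one-sample z-test with known σ. A minimal sketch using `statistics.NormalDist` (Python 3.8+); the function names and the sample figures are hypothetical.

```python
import math
from statistics import NormalDist

def z_test_mean(sample_mean, mu0, sigma, n, alternative="two-sided"):
    """One-sample z-test for a population mean when sigma is known."""
    se = sigma / math.sqrt(n)            # standard error of the sample mean
    z = (sample_mean - mu0) / se         # test statistic
    std_normal = NormalDist()            # N(0, 1)
    if alternative == "two-sided":
        p = 2 * (1 - std_normal.cdf(abs(z)))
    elif alternative == "greater":
        p = 1 - std_normal.cdf(z)
    elif alternative == "less":
        p = std_normal.cdf(z)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z, p

def power_two_sided(mu0, mu1, sigma, n, alpha=0.05):
    """Power of the two-sided z-test when the true mean is mu1."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)        # boundary of the critical region
    shift = (mu1 - mu0) / (sigma / math.sqrt(n))      # mean of the statistic under mu1
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)

# Hypothetical data: n = 36, sample mean 52, H0: mu = 50, known sigma = 6
z, p = z_test_mean(52, 50, 6, 36)
```

With these figures the standard error is 1, z = 2 and the two-sided p-value is about 0.0455, so H0 is rejected at the 5% level. When the true mean equals the hypothesised one, the power function returns α itself, which is a quick sanity check.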
Unit-4
Teaching Hours:18
ANALYSIS OF VARIANCE

Meaning and assumptions – Fixed, random and mixed effect models – Analysis of variance of one-way and two-way classified data with and without interaction effects – Multiple comparison tests: Tukey’s method – Critical difference.

Lab Exercises:

13. Construction of one-way ANOVA
14. Construction of two-way ANOVA with interaction
15. Construction of two-way ANOVA without interaction
16. Multiple comparison test using Tukey’s method and critical difference method

Unit-5
Teaching Hours:18
NONPARAMETRIC TESTS

Concept of nonparametric tests – Run test for randomness – Sign test and Wilcoxon signed rank test for one and paired samples – Run test – Median test and Mann-Whitney-Wilcoxon tests for two samples.

Lab Exercises:

17. Test of one sample using run and sign tests
18. Test of paired samples using Wilcoxon signed rank test
19. Test of two samples using run test and median test
20. Test for two samples using Mann-Whitney-Wilcoxon tests

Text Books And Reference Books:

1. Gupta S.C and Kapoor V.K, Fundamentals of Mathematical Statistics, 12th edition, Sultan Chand & Sons, New Delhi, 2020.
2. Brian Caffo, Statistical Inference for Data Science, Leanpub, 2016.

Essential Reading / Recommended Reading

1. Walpole R.E, Myers R.H and Myers S.L, Probability and Statistics for Engineers and Scientists, 9th edition, Pearson, New Delhi, 2017.
2. Verzani J, Using R for Introductory Statistics, 2nd edition, CRC Press, Boca Raton, 2014.
3. Rajagopalan M and Dhanavanthan P, Statistical Inference, PHI Learning (P) Ltd, New Delhi, 2012.
4. Rohatgi V.K and Saleh E, An Introduction to Probability and Statistics, 3rd edition, John Wiley & Sons Inc, New Jersey, 2015.
Evaluation Pattern

CIA: 50%

ESE: 50%

MDS172L - INFERENTIAL STATISTICS (2021 Batch)

Total Teaching Hours for Semester:90
No of Lecture Hours/Week:6
Max Marks:150
Credits:5

Course Objectives/Course Description

This course is designed to introduce the concepts of the theory of estimation and testing of hypotheses. It also deals with parametric tests for large and small samples, and provides knowledge of nonparametric tests and their applications.

Course Outcome

CO1: Demonstrate the concepts of point and interval estimation of unknown parameters and their significance using large and small samples.

CO2: Apply the idea of sampling distributions of different statistics in testing of hypotheses.

CO3: Infer the concept of nonparametric tests for single sample and two samples.
Unit-1
Teaching Hours:15
SUFFICIENT STATISTICS

Neyman-Fisher Factorisation theorem - the existence and construction of minimal sufficient statistics - Minimal sufficient statistics and exponential family - sufficiency and completeness - sufficiency and invariance.

Lab Exercises:

1. Drawing random samples using random number tables.
2. Point estimation of parameters and obtaining estimates of standard errors.

Unit-2
Teaching Hours:15
UNBIASED ESTIMATION

Minimum variance unbiased estimation - locally minimum variance unbiased estimators - Rao-Blackwell theorem - Completeness: Lehmann-Scheffe theorems - Necessary and sufficient condition for unbiased estimators - Cramer-Rao lower bound - Bhattacharyya system of lower bounds in the 1-parameter regular case - Chapman-Robbins inequality

Lab Exercises:

1. Comparison of estimators by plotting mean square error.
2. Computing maximum likelihood estimates - 1
3. Computing maximum likelihood estimates - 2
4. Computing moment estimates

Unit-3
Teaching Hours:15
MAXIMUM LIKELIHOOD ESTIMATION

Computational routines - strong consistency of maximum likelihood estimators - Asymptotic efficiency of maximum likelihood estimators - Best Asymptotically Normal estimators - Method of moments - Bayes’ and minimax estimation: The structure of Bayes’ rules - Bayes’ estimators for quadratic and convex loss functions - minimax estimation - interval estimation.

Lab Exercises:

1. Constructing confidence intervals based on large samples.
2. Constructing confidence intervals based on small samples.
3. Generating random samples from discrete distributions.
4. Generating random samples from continuous distributions.

Unit-4
Teaching Hours:15
HYPOTHESIS TESTING

Uniformly most powerful tests - the Neyman-Pearson fundamental Lemma - Distributions with monotone likelihood ratio - Problems - Generalization of the fundamental lemma, two sided hypotheses - testing the mean and variance of a normal distribution.

Lab Exercises:

1. Evaluation of probabilities of Type-I and Type-II errors and powers of tests.
2. MP test for parameters of binomial and Poisson distributions.
3. MP test for the mean of a normal distribution and power curve.
4. Tests for mean, equality of means when variance is (i) known, (ii) unknown under normality (small and large samples)

Unit-5
Teaching Hours:15
MEAN TESTS

Unbiasedness for hypothesis testing - similarity and completeness - UMP unbiased tests for multi-parameter exponential families - comparing two Poisson or Binomial populations - testing the parameters of a normal distribution (unbiased tests) - comparing the mean and variance of two normal distributions - Symmetry and invariance - maximal invariance - most powerful invariant tests.

Lab Exercises:

1. Tests for single proportion and equality of two proportions.
2. Tests for variance and equality of two variances under normality.
3. Tests for correlation and regression coefficients.

Unit-6
Teaching Hours:15
SEQUENTIAL TESTS

SPRT procedures - likelihood ratio tests - locally most powerful tests - the concept of confidence sets - nonparametric tests.

Lab Exercises:

1. Tests for the independence of attributes, analysis of categorical data and tests for the goodness of fit (for uniform, binomial and Poisson distributions).
2. Nonparametric tests.
3. SPRT for binomial proportion and mean of a normal distribution.

Text Books And Reference Books:

1. Rajagopalan M and Dhanavanthan P, Statistical Inference, PHI Learning (P) Ltd, New Delhi, 2012.
2. Rohatgi V.K and Saleh E, An Introduction to Probability and Statistics, 3rd Edition, 2015.

Essential Reading / Recommended Reading

1. Mood A.M, Graybill F.A and Boes D.C, Introduction to the Theory of Statistics, Tata McGraw-Hill, 3rd Edition (Reprint), 2017.
2. Rao C.R, Linear Statistical Inference and its Applications, Wiley Publications, 2nd Edition, 2001.
Evaluation Pattern

CIA - 50%

ESE - 50%

MDS173 - PROGRAMMING FOR DATA SCIENCE IN PYTHON (2021 Batch)

Total Teaching Hours for Semester:90
No of Lecture Hours/Week:6
Max Marks:100
Credits:4

Course Objectives/Course Description

The objective of this course is to provide comprehensive knowledge of the Python programming paradigms required for Data Science.

Course Outcome

CO1: Demonstrate the use of built-in objects of Python

CO2: Demonstrate significant experience with the Python program development environment

CO3: Implement numerical programming, data handling and visualization through the NumPy, Pandas and Matplotlib modules.
Unit-1
Teaching Hours:17
INTRODUCTION TO PYTHON

Structure of Python Program-Underlying mechanism of Module Execution-Branching and Looping-Problem Solving Using Branches and Loops-Functions - Lists and Mutability- Problem Solving Using Lists and Functions

## Lab Exercises

1. Demonstrate usage of branching and looping statements

2. Demonstrate recursive functions

3. Demonstrate lists
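A short sketch of the Unit-1 lab topics: a recursive function, branching inside a loop, and the aliasing behaviour that follows from list mutability. The function names are illustrative only.

```python
def factorial(n):
    """Recursive factorial: n! = n * (n-1)!, with 0! = 1 as the base case."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:                 # base case stops the recursion
        return 1
    return n * factorial(n - 1)

def classify(numbers):
    """Branching inside a loop: split integers into evens and odds."""
    evens, odds = [], []
    for x in numbers:
        if x % 2 == 0:
            evens.append(x)
        else:
            odds.append(x)
    return evens, odds

# Lists are mutable: two names bound to the same list see the same changes
a = [1, 2, 3]
b = a            # no copy is made; b refers to the same list object
b.append(4)      # a now also ends with 4
c = a.copy()     # a shallow copy is a separate list object
c.append(5)      # a and b are unaffected
```

Python does not optimise tail recursion and limits recursion depth (1000 frames by default), so very large inputs are better handled with a loop or `math.factorial`.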

Unit-2
Teaching Hours:17
SEQUENCE DATATYPES AND OBJECT-ORIENTED PROGRAMMING

Sequences, Mappings and Sets-Dictionaries-Classes: Classes and Instances-Inheritance-Exception Handling-Introduction to Regular Expressions using the “re” module.

## Lab Exercises

1. Demonstrate Tuples and Sets

2. Demonstrate Dictionaries

3. Demonstrate inheritance and exception handling

4. Demonstrate use of “re”
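The Unit-2 lab topics (inheritance, exception handling and the `re` module) can be combined in one small example. The classes are hypothetical, and the e-mail pattern is deliberately simplified rather than a full validator.

```python
import re

class Account:
    """Base class: a simple bank account."""
    def __init__(self, owner, balance=0.0):
        self.owner = owner
        self.balance = balance

    def withdraw(self, amount):
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount
        return self.balance

class SavingsAccount(Account):
    """Derived class: inherits withdraw() and adds interest."""
    def __init__(self, owner, balance=0.0, rate=0.04):
        super().__init__(owner, balance)   # reuse the base-class initialiser
        self.rate = rate

    def add_interest(self):
        self.balance += self.balance * self.rate
        return self.balance

def safe_withdraw(account, amount):
    """Exception handling: report the error instead of crashing."""
    try:
        return account.withdraw(amount)
    except ValueError as err:
        return f"error: {err}"

# Regular expressions with the re module: extract e-mail addresses (simplified pattern)
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
emails = EMAIL.findall("Contact asha@example.com or ravi.k@dept.example.org")

acct = SavingsAccount("Asha", 1000, rate=0.05)
acct.add_interest()
```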

Unit-3
Teaching Hours:13
USING NUMPY

Basics of NumPy-Computation on NumPy-Aggregations-Computation on Arrays- Comparisons, Masks and Boolean Arrays-Fancy Indexing-Sorting Arrays-Structured Data: NumPy’s Structured Array.

## Lab Exercises

1. Demonstrate Aggregation

2. Demonstrate Indexing and Sorting
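A sketch of the Unit-3 NumPy topics on a randomly generated, hypothetical marks table. It assumes NumPy 1.17 or later for `np.random.default_rng`; the structured array at the end uses illustrative field names.

```python
import numpy as np

rng = np.random.default_rng(seed=0)            # seeded, so every run gives the same array
scores = rng.integers(0, 101, size=(4, 5))     # hypothetical marks: 4 students x 5 tests

# Aggregations along an axis
test_means = scores.mean(axis=0)               # one mean per test (column)
best_per_student = scores.max(axis=1)          # one maximum per student (row)

# Comparisons give Boolean arrays that work as masks
passed = scores >= 40
n_passed = int(passed.sum())                   # True counts as 1
passing_marks = scores[passed]                 # 1-D array of the selected elements

# Fancy indexing: pick rows with a list of indices
first_and_last = scores[[0, -1]]

# Sorting: np.sort returns a sorted copy; np.argsort returns the ordering indices
sorted_rows = np.sort(scores, axis=1)
order = np.argsort(scores[:, 0])               # students ranked by their first test

# Structured array: named, typed fields in one array
students = np.array([("Asha", 21, 78.5), ("Ravi", 22, 65.0)],
                    dtype=[("name", "U10"), ("age", "i4"), ("score", "f8")])
older = students[students["age"] > 21]["name"]
```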

Unit-4
Teaching Hours:13
DATA MANIPULATION WITH PANDAS -I

Introduction to Pandas Objects-Data indexing and Selection-Operating on Data in Pandas- Handling Missing Data-Hierarchical Indexing - Combining Data Sets

## Lab Exercises

1. Demonstrate handling of missing data

2. Demonstrate hierarchical indexing

Unit-5
Teaching Hours:17
DATA MANIPULATION WITH PANDAS -II

Aggregation and Grouping-Pivot Tables-Vectorized String Operations-Working with Time Series-High Performance Pandas: eval() and query()

## Lab Exercises

1. Demonstrate usage of Pivot table

2. Demonstrate use of eval() and query()
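For Units 4 and 5, a sketch on a small, hypothetical DataFrame showing missing-data handling, `DataFrame.eval()`, `DataFrame.query()` and `DataFrame.pivot_table()`. Filling NaN with 0 is only one strategy; dropping rows or imputing a mean may suit real data better.

```python
import numpy as np
import pandas as pd

# Hypothetical sales records with one missing value
df = pd.DataFrame({
    "region":  ["North", "North", "South", "South", "South"],
    "quarter": ["Q1", "Q2", "Q1", "Q2", "Q2"],
    "units":   [10, 15, 8, np.nan, 12],
    "price":   [2.0, 2.0, 2.5, 2.5, 2.5],
})

# Handling missing data: replace NaN with 0 (one of several possible strategies)
df["units"] = df["units"].fillna(0)

# eval(): column arithmetic written as an expression string
df["revenue"] = df.eval("units * price")

# query(): filter rows with a Boolean expression string
big = df.query("revenue > 20")

# pivot_table(): total revenue by region (rows) and quarter (columns)
table = df.pivot_table(values="revenue", index="region", columns="quarter", aggfunc="sum")
```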

Unit-6
Teaching Hours:13
VISUALIZATION AND MATPLOTLIB

Basic functions of matplotlib-Simple Line Plot, Scatter Plot-Density and Contour Plots-Histograms, Binnings and Density-Customizing Plot Legends, Colour Bars-Three-Dimensional Plotting in Matplotlib.

## Lab Exercises

1. Demonstrate Scatter Plot

2. Demonstrate 3D plotting

Text Books And Reference Books:

1. Jake VanderPlas, Python Data Science Handbook: Essential Tools for Working with Data, O’Reilly Media, Inc., 2016

2. Zhang Y., An Introduction to Python and Computer Programming, Springer Publications, 2016

3. Joel Grus, Data Science from Scratch: First Principles with Python, O’Reilly Media, 2016

Evaluation Pattern

CIA: 50%

ESE: 50%

MDS173L - PROGRAMMING FOR DATA SCIENCE IN PYTHON (2021 Batch)

Total Teaching Hours for Semester:90
No of Lecture Hours/Week:6
Max Marks:100
Credits:4

Course Objectives/Course Description

This course aims at laying down the foundational concepts of python programming. Starting with the fundamental programming using python, it escalates to the advanced programming concepts required for Data Science. It enables the students to organize, process and visualize data using the packages available in Python.

The objective of this course is to provide knowledge of python programming paradigms required for Data Science.

Course Outcome

CO1: Understand and demonstrate the usage of built-in objects in Python

CO2: Analyze the significance of the Python program development environment and apply it to solve real-world applications

CO3: Implement numerical programming, data handling and visualization through NumPy, Pandas and Matplotlib modules.

Unit-1
Teaching Hours:17
INTRODUCTION TO PYTHON

Structure of Python Program-Underlying mechanism of Module Execution-Branching and Looping-Problem Solving Using Branches and Loops-Functions - Lists and Mutability- Problem Solving Using Lists and Functions

Unit-2
Teaching Hours:17
SEQUENCE DATATYPES AND OBJECT-ORIENTED PROGRAMMING

Sequences, Mappings and Sets-Dictionaries-Classes: Classes and Instances-Inheritance-Exception Handling-Introduction to Regular Expressions using the “re” module.

Unit-3
Teaching Hours:13
USING NUMPY

Basics of NumPy-Computation on NumPy-Aggregations-Computation on Arrays- Comparisons, Masks and Boolean Arrays-Fancy Indexing-Sorting Arrays-Structured Data: NumPy’s Structured Array.

Unit-4
Teaching Hours:13
DATA MANIPULATION WITH PANDAS -I

Introduction to Pandas Objects-Data indexing and Selection-Operating on Data in Pandas- Handling Missing Data-Hierarchical Indexing - Combining Data Sets

Unit-5
Teaching Hours:17
DATA MANIPULATION WITH PANDAS -II

Aggregation and Grouping-Pivot Tables-Vectorized String Operations-Working with Time Series-High Performance Pandas: eval() and query()

Unit-6
Teaching Hours:13
VISUALIZATION AND MATPLOTLIB

Basic functions of matplotlib-Simple Line Plot, Scatter Plot-Density and Contour Plots-Histograms, Binnings and Density-Customizing Plot Legends, Colour Bars-Three-Dimensional Plotting in Matplotlib

Text Books And Reference Books:

1. Jake VanderPlas, Python Data Science Handbook: Essential Tools for Working with Data, O’Reilly Media, Inc., 2016

2. Zhang Y., An Introduction to Python and Computer Programming, Springer Publications, 2016

Essential Reading / Recommended Reading

1. Joel Grus, Data Science from Scratch: First Principles with Python, O’Reilly Media, 2016.
2. T.R. Padmanabhan, Programming with Python, Springer Publications, 2016.
3. "CS41 - The Python Programming Language", Stanfordpython.com, 2019. [Online]. Available: https://stanfordpython.com/#overview. [Accessed: 20-Jun-2019].
4. "Python for Data Science", Cognitive Class, 2019. [Online]. Available: https://cognitiveclass.ai/courses/python-for-data-science/. [Accessed: 20-Jun-2019].

Evaluation Pattern

CIA I: 10%

CIA II: 25%

CIA III: 10%

Attendance: 5%

ESE: 50%

MDS231 - MATHEMATICAL FOUNDATION FOR DATA SCIENCE - II (2021 Batch)

Total Teaching Hours for Semester:60
No of Lecture Hours/Week:4
Max Marks:100
Credits:4

Course Objectives/Course Description

This course introduces essential mathematical concepts for data science: the fundamentals of calculus of several variables, orthogonality, convex optimization and graph theory.

Course Outcome

CO1: Demonstrate the properties of multivariate calculus

CO2: Use the idea of orthogonality and projections effectively

CO3: Have a clear understanding of Convex Optimization

CO4: Know the basic terminology and properties of Graph Theory

Unit-1
Teaching Hours:14
Calculus of Several Variables

Functions of Several Variables: Functions of two, three variables - Limits and Continuity in Higher Dimensions: Limits for functions of two variables, Functions of more than two variables - Partial Derivatives: partial derivatives of functions of two variables, partial derivatives of functions of more than two variables, partial derivatives and continuity, second order partial derivatives - The Chain Rule: chain rule on functions of two, three variables, chain rule on functions defined on surfaces - Directional Derivative and Gradient vectors: Directional derivatives in a plane, Interpretation of directional derivative, calculation and gradients, Gradients and tangents to level curves.

Unit-2
Teaching Hours:10
Orthogonality

Perpendicular vectors and Orthogonality - Inner Products and Projections onto lines - Projections of Rank one - Projections and Least Squares Approximations - Projection Matrices - Orthogonal Bases, Orthogonal Matrices and Gram-Schmidt orthogonalization

Unit-3
Teaching Hours:12
Introduction to Convex Optimization

Affine and Convex Sets: Lines and Line segments, affine sets, affine dimension and relative interior, convex sets, cones - Hyperplanes and half-spaces - Euclidean balls and ellipsoids - Norm balls and Norm cones - polyhedra - simplexes, Convex hull description of polyhedra - The positive semidefinite cone.

Unit-4
Teaching Hours:12
Graph Theory - Basics

Graph Classes: Definition of a Graph and Graph terminology, isomorphism of graphs, Complete graphs, bipartite graphs, complete bipartite graphs - Vertex degree: adjacency and incidence, regular graphs - subgraphs, spanning subgraphs, induced subgraphs, removing or adding edges of a graph, removing vertices from graphs - Graph Operations: Graph Union, intersection, complement, self complement, Paths and Cycles, Connected graphs, Eulerian and Hamiltonian graphs.
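Several Unit-4 notions (vertex degree, connectedness, Eulerian graphs) can be checked mechanically on an adjacency-set representation. A minimal sketch; the example graphs C4 and K4 are illustrations, and the connectivity test treats isolated vertices as making a graph disconnected, which is stricter than some textbook definitions of an Eulerian graph.

```python
from collections import deque

def degrees(adj):
    """Vertex degree = number of incident edges (simple undirected graph)."""
    return {v: len(nbrs) for v, nbrs in adj.items()}

def is_connected(adj):
    """Breadth-first search from any vertex must reach every vertex."""
    if not adj:
        return True
    start = next(iter(adj))
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == len(adj)

def is_eulerian(adj):
    """Euler's theorem: a connected graph has an Eulerian circuit iff every degree is even."""
    return is_connected(adj) and all(d % 2 == 0 for d in degrees(adj).values())

# Example graphs: the 4-cycle C4 and the complete graph K4
c4 = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {3, 1}}
k4 = {v: {w for w in range(1, 5) if w != v} for v in range(1, 5)}
```

C4 is connected with every degree equal to 2, so it is Eulerian; K4 has degree 3 at every vertex and is not. The degree sum of K4 is 12 = 2 × 6, as the handshake lemma requires.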
Unit-5
Teaching Hours:12
Graph Theory - More concepts

Matrix Representation of Graphs, Adjacency matrices, Incidence Matrices, Trees and their properties, Bridges (cut-edges), spanning trees, weighted Graphs, minimal spanning tree problems, Shortest path problems, cut vertices, cuts, vertex and edge connectivity, Graph Algorithms - Applications of Graph Theory

Text Books And Reference Books:

1. M. D. Weir, J. Hass and G. B. Thomas, Thomas' Calculus, Pearson, 2016. (Unit 1)
2. G. Strang, Linear Algebra and its Applications, 4th ed., Cengage, 2006. (Unit 2)
3. S. P. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, 2011. (Unit 3)
4. J. Clark and D. A. Holton, A First Look at Graph Theory, Allied Publishers India, 1995. (Unit 4)

Essential Reading / Recommended Reading

1. J. Patterson and A. Gibson, Deep Learning: A Practitioner's Approach, O'Reilly Media, 2017.
2. S. Sra, S. Nowozin and S. J. Wright, Optimization for Machine Learning, MIT Press, 2012.
3. D. Jungnickel, Graphs, Networks and Algorithms, Springer, 2014.
4. D. Simovici, Mathematical Analysis for Machine Learning and Data Mining, World Scientific Publishing Co. Pte. Ltd, 2018.
5. P. N. Klein, Coding the Matrix: Linear Algebra through Applications to Computer Science, Newtonian Press, 2015.
6. K. H. Rosen, Discrete Mathematics and its Applications, 7th ed., McGraw Hill, 2016.

Evaluation Pattern

CIA: 50%

ESE: 50%

MDS231L - MATHEMATICAL FOUNDATION FOR DATA SCIENCE II (2021 Batch)

Total Teaching Hours for Semester:60
No of Lecture Hours/Week:4
Max Marks:100
Credits:4

Course Objectives/Course Description

This course introduces essential mathematical concepts for data science: the fundamentals of calculus of several variables, orthogonality, convex optimization and graph theory.
Course Outcome Demonstrate the properties of multivariate calculus   Use the idea of orthogonality and projections effectively   Have a clear understanding of Convex Optimization   Know the about the basic terminologies and properties in Graph Theory
Unit-1
Teaching Hours:14
Calculus of Several Variables

Functions of Several Variables: Functions of two, three variables - Limits and Continuity in Higher Dimensions: Limits for functions of two variables, Functions of more than two variables - Partial Derivatives: partial derivatives of functions of two variables, partial derivatives of functions of more than two variables, partial derivatives and continuity, second order partial derivatives - The Chain Rule: chain rule on functions of two, three variables, chain rule on functions defined on surfaces - Directional Derivative and Gradient vectors: Directional derivatives in a plane, Interpretation of directional derivative, calculation and gradients, Gradients and tangents to level curves.

Unit-2
Teaching Hours:10
Orthogonality

Perpendicular vectors and Orthogonality - Inner Products and Projections onto lines - Projections of Rank one - Projections and Least Squares Approximations - Projection Matrices - Orthogonal Bases, Orthogonal Matrices and Gram-Schmidt orthogonalization.

Unit-3
Teaching Hours:12
Introduction to Convex Optimization

Affine and Convex Sets: Lines and Line segments, affine sets, affine dimension and relative interior, convex sets, cones - Hyperplanes and half-spaces - Euclidean balls and ellipsoids - Norm balls and Norm cones - polyhedra - simplexes, Convex hull description of polyhedra - The positive semidefinite cone.

Unit-4
Teaching Hours:12
Graph Theory - Basics

Graph Classes: Definition of a Graph and Graph terminology, isomorphism of graphs, Complete graphs, bipartite graphs, complete bipartite graphs - Vertex degree: adjacency and incidence, regular graphs - subgraphs, spanning subgraphs, induced subgraphs, removing or adding edges of a graph, removing vertices from graphs - Graph Operations: Graph Union, intersection, complement, self complement, Paths and Cycles, Connected graphs, Eulerian and Hamiltonian graphs.
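Gram-Schmidt orthogonalization, listed in Unit-2, can be sketched in a few lines. This is one possible implementation (the modified variant, which projects the running remainder rather than the original vector and is numerically more stable); the names `dot`, `gram_schmidt` and the tolerance `tol` are illustrative choices, not prescribed by the course.

```python
import math


def dot(u, v):
    """Standard inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(vectors, tol=1e-12):
    """Return an orthonormal basis for the span of `vectors` (modified Gram-Schmidt).

    Each incoming vector has its projection onto every basis vector found so
    far subtracted from the running remainder; the remainder is then normalised.
    Vectors that are (numerically) dependent on earlier ones leave a remainder
    of norm <= tol and are skipped.
    """
    basis = []
    for v in vectors:
        w = [float(x) for x in v]
        for q in basis:
            coeff = dot(w, q)  # length of the projection of w onto unit vector q
            w = [wi - coeff * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis
```

Stacking the returned vectors as columns gives the orthogonal matrix Q of the QR factorization discussed alongside least squares in Strang's text.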
Unit-5
Teaching Hours:12
Graph Theory - More concepts

Matrix Representation of Graphs, Adjacency matrices, Incidence Matrices, Trees and their properties, Bridges (cut-edges), spanning trees, weighted Graphs, minimal spanning tree problems, Shortest path problems, cut vertices, cuts, vertex and edge connectivity, Graph Algorithms - Applications of Graph Theory

Text Books And Reference Books:

1. M. D. Weir, J. Hass, and G. B. Thomas, Thomas' Calculus. Pearson, 2016.

2. G. Strang, Linear Algebra and its Applications, 4th ed., Cengage, 2006.

3. S. P. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2011.

4. J. Clark, D. A. Holton, A First Look at Graph Theory, Allied Publishers India, 1995.

Essential Reading / Recommended Reading

1. J. Patterson and A. Gibson, Deep Learning: A Practitioner's Approach. O'Reilly Media, 2017.

2. S. Sra, S. Nowozin, and S. J. Wright, Optimization for Machine Learning. MIT Press, 2012.

3. D. Jungnickel, Graphs, Networks and Algorithms. Springer, 2014.

4. D. Simovici, Mathematical Analysis for Machine Learning and Data Mining, World Scientific Publishing Co. Pte. Ltd, 2018.

5. P. N. Klein, Coding the Matrix: Linear Algebra through Applications to Computer Science. Newtonian Press, 2015.

6. K. H. Rosen, Discrete Mathematics and its Applications, 7th ed., McGraw Hill, 2016.

Evaluation Pattern

CIA I: 10%

CIA II: 25%

CIA III: 10%

Attendance: 5%

ESE: 50%

MDS232 - REGRESSION ANALYSIS (2021 Batch)

Total Teaching Hours for Semester:60
No of Lecture Hours/Week:4
Max Marks:100
Credits:4

Course Objectives/Course Description

This course aims to provide a grounding in building simple and multiple regression models.

Course Outcome

CO1: Demonstrate a deeper understanding of the linear regression model.

CO2: Evaluate R-square criteria for model selection.

CO3: Understand the forward, backward and stepwise methods for selecting variables.

CO4: Understand the importance of multicollinearity in regression modelling.

CO5: Use and understand generalizations of the linear model to binary and count data.
Unit-1
Teaching Hours:13
SIMPLE LINEAR REGRESSION

Introduction to regression analysis: Modelling a response, overview and applications of regression analysis, major steps in regression analysis. Simple linear regression (Two variables): assumptions, estimation and properties of regression coefficients, significance and confidence intervals of regression coefficients, measuring the quality of the fit.

Unit-2
Teaching Hours:13
MULTIPLE LINEAR REGRESSION

Multiple linear regression model: assumptions, ordinary least squares estimation of regression coefficients, interpretation and properties of regression coefficients, significance and confidence intervals of regression coefficients.

Unit-3
Teaching Hours:12
CRITERIA FOR MODEL SELECTION

Mean Square error criteria, R2 and  criteria for model selection; Need for transformation of variables; Box-Cox transformation; Forward, Backward and Stepwise procedures.

Unit-4
Teaching Hours:12
RESIDUAL ANALYSIS

Residual analysis, Departures from underlying assumptions, Effect of outliers, Collinearity, Non-constant variance and serial correlation, Departures from normality, Diagnostics and remedies.

Unit-5
Teaching Hours:10
NONLINEAR REGRESSION

Introduction to nonlinear regression, Least squares in the nonlinear case and estimation of parameters, Models for binary response variables, estimation and diagnosis methods for logistic and Poisson regressions. Prediction and residual analysis.

Text Books And Reference Books:

1. D. C. Montgomery, E. A. Peck and G. G. Vining, Introduction to Linear Regression Analysis, John Wiley and Sons, Inc., NY, 2003.

2. S. Chatterjee and A. S. Hadi, Regression Analysis by Example, 4th Ed., John Wiley and Sons, Inc., 2006.

3. G. A. F. Seber and A. J. Lee, Linear Regression Analysis, John Wiley, 2003. Relevant sections from chapters 3, 4, 5, 6, 7, 9, 10.

Essential Reading / Recommended Reading

1. Iain Pardoe, Applied Regression Modeling, John Wiley and Sons, Inc., 2012.

2. P. McCullagh, J. A. Nelder, Generalized Linear Models, Chapman & Hall, 1989.

Evaluation Pattern

CIA - 50%

ESE - 50%

MDS232L - REGRESSION ANALYSIS (2021 Batch)

Total Teaching Hours for Semester:60
No of Lecture Hours/Week:4
Max Marks:100
Credits:4

Course Objectives/Course Description

Course Description: This course aims to provide a grounding in building simple and multiple regression models.

Course Objectives:

• To build a foundation in the basic tools of regression analysis.
• To apply econometric modelling to different types of data.
• To learn how to assess the goodness of fit of some basic econometric models.
• To diagnose common problems in linear regression modelling.

Course Outcome

CO1: Demonstrate a deeper understanding of the linear regression model.

CO2: Evaluate R-square criteria for model selection.

CO3: Understand the forward, backward and stepwise methods for selecting variables.

CO4: Understand the importance of multicollinearity in regression modelling.

CO5: Use and understand generalizations of the linear model to binary and count data.
Unit-1
Teaching Hours:15
SIMPLE LINEAR REGRESSION

Introduction to regression analysis: Modelling a response, overview and applications of regression analysis, major steps in regression analysis. Simple linear regression (Two variables): assumptions, estimation and properties of regression coefficients, significance and confidence intervals of regression coefficients, measuring the quality of the fit.
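The Unit-1 quantities for the two-variable model can be computed directly from their textbook formulas: b1 = Sxy / Sxx, b0 = ȳ - b1·x̄, R² = 1 - SSE/SST, and se(b1) = sqrt(σ̂² / Sxx) with σ̂² = SSE / (n - 2). The sketch below is illustrative only; the function name and the returned dictionary keys are assumptions.

```python
import math


def simple_linear_regression(x, y):
    """Least squares fit of y = b0 + b1 * x with fit-quality summaries."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # slope estimate
    b0 = y_bar - b1 * x_bar             # intercept estimate
    fitted = [b0 + b1 * xi for xi in x]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # residual sum of squares
    sst = sum((yi - y_bar) ** 2 for yi in y)                 # total sum of squares
    sigma2 = sse / (n - 2)              # unbiased estimate of the error variance
    return {
        "intercept": b0,
        "slope": b1,
        "r_squared": 1 - sse / sst,
        "se_slope": math.sqrt(sigma2 / sxx),  # t = slope / se_slope tests H0: b1 = 0
    }
```

For x = 1..5 and y = 2.1, 3.9, 6.2, 7.8, 10.0 this gives a slope of 1.97 and an intercept of 0.09.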

Unit-2
Teaching Hours:15
MULTIPLE LINEAR REGRESSION

Multiple linear regression model: assumptions, ordinary least squares estimation of regression coefficients, interpretation and properties of regression coefficients, significance and confidence intervals of regression coefficients.
Unit-3
Teaching Hours:10
CRITERIA FOR MODEL SELECTION

Mean Square error criteria, R2 and  criteria for model selection; Need for transformation of variables; Box-Cox transformation; Forward, Backward and Stepwise procedures.
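The Box-Cox transformation replaces a positive response y by (y^λ - 1)/λ for λ ≠ 0 and by log y for λ = 0. A minimal sketch, assuming a mean-only normal model for the transformed data, chooses λ from a grid by maximising the profile log-likelihood -(n/2)·log σ̂²(λ) + (λ - 1)·Σ log y_i; in a regression setting σ̂²(λ) would instead come from the residuals of the fitted regression. The function names and the grid are illustrative assumptions.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform: (y**lam - 1) / lam for lam != 0, log(y) for lam == 0."""
    if y <= 0:
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam


def box_cox_profile_loglik(data, lam):
    """Profile log-likelihood of lam (constants dropped) for a mean-only normal
    model on the transformed data; the last term is the Jacobian of the transform.
    Assumes the data are not all equal (otherwise the variance estimate is zero)."""
    n = len(data)
    z = [box_cox(y, lam) for y in data]
    mean = sum(z) / n
    sigma2 = sum((zi - mean) ** 2 for zi in z) / n  # MLE of the error variance
    return -0.5 * n * math.log(sigma2) + (lam - 1) * sum(math.log(y) for y in data)


def choose_lambda(data, grid):
    """Return the grid value of lam with the largest profile log-likelihood."""
    return max(grid, key=lambda lam: box_cox_profile_loglik(data, lam))
```

A typical grid is λ from -2 to 2 in steps of 0.1; the chosen value is often rounded to an interpretable one such as 0 (log), 0.5 (square root) or 1 (no transformation).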

Unit-4
Teaching Hours:10
RESIDUAL ANALYSIS

Residual analysis, Departures from underlying assumptions, Effect of outliers, Collinearity, Non-constant variance and serial correlation, Departures from normality, Diagnostics and remedies.

Unit-5
Teaching Hours:10
NONLINEAR REGRESSION

Introduction to nonlinear regression, Least squares in the nonlinear case and estimation of parameters, Models for binary response variables, estimation and diagnosis methods for logistic and Poisson regressions. Prediction and residual analysis.
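For the logistic model P(y = 1 | x) = 1 / (1 + exp(-(b0 + b1·x))), the maximum likelihood estimates have no closed form and are usually found iteratively. The sketch below uses plain Newton-Raphson (equivalent to iteratively reweighted least squares) for a single predictor; it assumes the classes are not perfectly separable (otherwise the MLE does not exist) and omits step-halving. The name `fit_logistic` and the fixed iteration count are illustrative assumptions.

```python
import math


def fit_logistic(x, y, iterations=25):
    """Newton-Raphson MLE for a one-predictor logistic regression."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0           # score vector (gradient of the log-likelihood)
        h00 = h01 = h11 = 0.0   # information matrix X'WX, with W = diag(p(1 - p))
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            r = yi - p
            w = p * (1.0 - p)
            g0 += r
            g1 += r * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + (X'WX)^{-1} * score, using the 2x2 inverse
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    return b0, b1
```

At convergence the score equations hold, so the fitted probabilities sum to the number of observed successes; this gives a quick diagnostic check of the fit.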

Text Books And Reference Books:

a. Wooldridge, J. M. (2015). Introductory econometrics: A modern approach. Cengage Learning.

b. Gujarati, D. N., Porter, D. C., & Gunasekar, S. (2012). Basic econometrics. Tata McGraw-Hill Education.

c. Studenmund, A. H. (2014). Using econometrics: A practical guide. Pearson.

1. Iain Pardoe, Applied Regression Modeling, John Wiley and Sons, Inc., 2012.

2. P. McCullagh, J. A. Nelder, Generalized Linear Models, Chapman & Hall, 1989.

3. D. C. Montgomery, E. A. Peck and G. G. Vining, Introduction to Linear Regression Analysis, John Wiley and Sons, Inc., NY, 2003.

4. S. Chatterjee and A. S. Hadi, Regression Analysis by Example, 4th Ed., John Wiley and Sons, Inc., 2006.

5. G. A. F. Seber and A. J. Lee, Linear Regression Analysis, John Wiley, 2003. Relevant sections from chapters 3, 4, 5, 6, 7, 9, 10.

Evaluation Pattern

CIA I: 10%

CIA II: 25%

CIA III: 10%

Attendance: 5%

ESE: 50%

MDS241A - MULTIVARIATE ANALYSIS (2021 Batch)

Total Teaching Hours for Semester:60
No of Lecture Hours/Week:4
Max Marks:100
Credits:4

Course Objectives/Course Description

This course lays the foundation of multivariate data analysis. Exposure to multivariate data structures, the multinomial and multivariate normal distributions, estimation and testing of parameters, and various data reduction methods will help students better understand research data, its presentation and its analysis.

Course Outcome

CO1: Understand multivariate data structure, multinomial and multivariate normal distribution

CO2: Apply Multivariate analysis of variance (MANOVA) of one and two-way classified data.

Unit-1
Teaching Hours:12
INTRODUCTION

Basic concepts on multivariate variable. Multivariate normal distribution, Marginal and conditional distribution, Concept of random vector: Its expectation and Variance-Covariance matrix. Marginal and joint distributions. Conditional distributions and Independence of random vectors. Multinomial distribution. Sample mean vector and its distribution.

Unit-2
Teaching Hours:12
DISTRIBUTION

Sample mean vector and its distribution. Likelihood ratio tests: Tests of hypotheses about the mean vectors and covariance matrices for multivariate normal populations. Independence of sub vectors and sphericity test.

Unit-3
Teaching Hours:12
MULTIVARIATE ANALYSIS

Multivariate analysis of variance (MANOVA) of one and two-way classified data. Multivariate analysis of covariance. Wishart distribution, Hotelling’s T² and Mahalanobis’ D² statistics, Null distribution of Hotelling’s T². Rao’s U statistics and its distribution.

Unit-4
Teaching Hours:12
CLASSIFICATION AND DISCRIMINANT PROCEDURES

Bayes, minimax, and Fisher’s criteria for discrimination between two multivariate normal populations. Sample discriminant function. Tests associated with discriminant functions. Probabilities of misclassification and their estimation. Discrimination for several multivariate normal populations.

Unit-5
Teaching Hours:12
PRINCIPAL COMPONENT and FACTOR ANALYSIS

Principal components, sample principal components, asymptotic properties. Canonical variables and canonical correlations: definition, estimation, computations. Test for significance of canonical correlations. Factor analysis: Orthogonal factor model, factor loadings, estimation of factor loadings, factor scores. Applications.

Text Books And Reference Books:

1. Anderson, T. W. 2009. An Introduction to Multivariate Statistical Analysis, 3rd Edition, John Wiley.

2. Everitt, B. and Hothorn, T. 2011. An Introduction to Applied Multivariate Analysis with R, Springer.

3. Hair, J. F., Black, W. C., Babin, B. J. and Anderson, R. E. 2013. Multivariate Data Analysis, Pearson New International Edition.

Essential Reading / Recommended Reading

1. Giri, N. C. 1977. Multivariate Statistical Inference. Academic Press.

2. Chatfield, C. and Collins, A. J. 1982. Introduction to Multivariate Analysis. Prentice Hall.

3. Srivastava, M. S. and Khatri, C. G. 1979. An Introduction to Multivariate Statistics. North Holland.

Evaluation Pattern

CIA - 50%

ESE - 50%

MDS241B - STOCHASTIC PROCESS (2021 Batch)

Total Teaching Hours for Semester:60
No of Lecture Hours/Week:4
Max Marks:100
Credits:4

Course Objectives/Course Description

This course is designed to introduce the concepts of the theory of estimation and testing of hypotheses. It also deals with parametric tests for large and small samples and provides knowledge about non-parametric tests and their applications.

Course Outcome

CO1: Demonstrate the concepts of point and interval estimation of unknown parameters and their significance using large and small samples.

CO2: Apply the idea of sampling distributions of difference statistics in testing of hypotheses.

CO3: Infer the concept of nonparametric tests for single sample and two samples.
Unit-1
Teaching Hours:12
INTRODUCTION TO STOCHASTIC PROCESSES

Classification of Stochastic Processes, Markov Processes – Markov Chain - Countable State Markov Chain. Transition Probabilities, Transition Probability Matrix. Chapman-Kolmogorov Equations, Calculation of n-step Transition Probability and its limit.

Unit-2
Teaching Hours:12
POISSON PROCESS

Classification of States, Recurrent and Transient States - Transient Markov Chain, Random Walk and Gambler's Ruin Problem. Continuous-Time Markov Process: Poisson Processes, Birth and Death Processes, Kolmogorov’s Differential Equations, Applications.

Unit-3
Teaching Hours:12
BRANCHING PROCESS

Branching Processes – Galton-Watson Branching Process - Properties of Generating Functions – Extinction Probabilities – Distribution of Total Number of Progeny. Concept of Wiener Process.

Unit-4
Teaching Hours:12
RENEWAL PROCESS

Renewal Processes – Renewal Process in Discrete and Continuous Time – Renewal Interval – Renewal Function and Renewal Density – Renewal Equation – Renewal theorems: Elementary Renewal Theorem. Probability Generating Function of Renewal Processes.

Unit-5
Teaching Hours:12
STATIONARY PROCESS

Stationary Processes: Discrete Parameter Stochastic Process – Application to Time Series. Auto-covariance and Auto-correlation functions and their properties. Moving Average, Autoregressive, Autoregressive Moving Average, Autoregressive Integrated Moving Average Processes. Basic ideas of residual analysis, diagnostic checking, forecasting.

Text Books And Reference Books:

1. Stochastic Processes, R. G. Gallager, Cambridge University Press, 2013.

2. Stochastic Processes, S. M. Ross, Wiley India Pvt. Ltd, 2008.

Essential Reading / Recommended Reading

1. Stochastic Processes: From Applications to Theory, P. Del Moral and S. Penev, CRC Press, 2016.

2. Introduction to Probability and Stochastic Processes with Applications, L. Blanco Castañeda, V. Arunachalam, S. Dharmaraja, Wiley Pvt. Ltd, 2012.
Evaluation Pattern

CIA - 50%

ESE - 50%

MDS241C - CATEGORICAL DATA ANALYSIS (2021 Batch)

Total Teaching Hours for Semester:60
No of Lecture Hours/Week:4
Max Marks:100
Credits:4

Course Objectives/Course Description

Categorical data analysis deals with the study of information captured through expressions or verbal forms. This course equips students with the theory and methods to analyse categorical responses.

Course Outcome

CO1: Describe categorical responses.

CO2: Identify tests for contingency tables.

CO3: Apply regression models for categorical response variables.

CO4: Analyse contingency tables using log-linear models.
Unit-1
Teaching Hours:12
INTRODUCTION

Categorical response data - Probability distributions for categorical data - Statistical inference for discrete data

Unit-2
Teaching Hours:12
CONTINGENCY TABLES

Probability structure for contingency tables - Comparing proportions with 2x2 tables - The odds ratio - Tests for independence - Exact inference - Extension to three-way and larger tables

Unit-3
Teaching Hours:12
GENERALIZED LINEAR MODELS

Components of a generalized linear model - GLM for binary and count data - Statistical inference and model checking - Fitting GLMs

Unit-4
Teaching Hours:12
LOGISTIC REGRESSION

Interpreting the logistic regression model - Inference for logistic regression - Logistic regression with categorical predictors - Multiple logistic regression - Summarising effects - Building and applying logistic regression models - Multicategory logit models

Unit-5
Teaching Hours:12
LOGLINEAR MODELS FOR CONTINGENCY TABLES

Loglinear models for two-way and three-way tables - Inference for loglinear models - The loglinear-logistic connection - Independence graphs and collapsibility - Models for matched pairs: Comparing dependent proportions, Logistic regression for matched pairs - Comparing margins of square contingency tables - Symmetry issues

Text Books And Reference Books:

1. Agresti, A. (2012). Categorical Data Analysis, 3rd Edition. New York: Wiley.

Essential Reading / Recommended Reading

1. Le, C. T. (2009). Applied Categorical Data Analysis and Translational Research, 2nd Ed., John Wiley and Sons.

2. Agresti, A. (2010). Analysis of Ordinal Categorical Data. John Wiley & Sons.

3. Stokes, M. E., Davis, C. S., & Koch, G. G. (2012). Categorical data analysis using SAS. SAS Institute.

4. Agresti, A. (2018). An introduction to categorical data analysis. John Wiley & Sons.

5. Bilder, C. R., & Loughin, T. M. (2014). Analysis of categorical data with R. Chapman and Hall/CRC.
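Two of the Unit-2 tools for contingency tables can be illustrated in a few lines: the sample odds ratio θ̂ = ad/(bc) for a 2x2 table [[a, b], [c, d]] with a Wald interval exp(log θ̂ ± z·sqrt(1/a + 1/b + 1/c + 1/d)), and Pearson's X² = Σ (O - E)²/E with E = row total × column total / n. This is a sketch assuming no zero cells; the function names are illustrative.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Sample odds ratio of [[a, b], [c, d]] with a Wald CI built on the log scale."""
    theta = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # asymptotic SE of log(theta)
    lower = math.exp(math.log(theta) - z * se_log)
    upper = math.exp(math.log(theta) + z * se_log)
    return theta, (lower, upper)


def pearson_chi_square(table):
    """Pearson X^2 statistic for independence in an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # fitted count under independence
            x2 += (observed - expected) ** 2 / expected
    return x2
```

For the table [[10, 20], [5, 40]] the odds ratio is 4.0 and X² = 400/72 ≈ 5.56, which would be compared with a chi-square distribution on (2 - 1)(2 - 1) = 1 degree of freedom.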
Evaluation Pattern

CIA: 50%

ESE: 50%

MDS241LA - MULTIVARIATE ANALYSIS (2021 Batch)

Total Teaching Hours for Semester:60
No of Lecture Hours/Week:4
Max Marks:100
Credits:4

Course Objectives/Course Description

This course lays the foundation of multivariate data analysis. Exposure to multivariate data structures, the multinomial and multivariate normal distributions, estimation and testing of parameters, and various data reduction methods will help students better understand research data, its presentation and its analysis.

Course Outcome

CO1: Understand multivariate data structure, multinomial and multivariate normal distribution

CO2: Apply Multivariate analysis of variance (MANOVA) of one and two-way classified data.
Unit-1
Teaching Hours:12
INTRODUCTION

Basic concepts on multivariate variable. Multivariate normal distribution, Marginal and conditional distribution, Concept of random vector: Its expectation and Variance-Covariance matrix. Marginal and joint distributions. Conditional distributions and Independence of random vectors. Multinomial distribution. Sample mean vector and its distribution.

Unit-2
Teaching Hours:12
DISTRIBUTION

Sample mean vector and its distribution. Likelihood ratio tests: Tests of hypotheses about the mean vectors and covariance matrices for multivariate normal populations. Independence of sub vectors and sphericity test.

Unit-3
Teaching Hours:12
MULTIVARIATE ANALYSIS

Multivariate analysis of variance (MANOVA) of one and two-way classified data. Multivariate analysis of covariance. Wishart distribution, Hotelling’s T² and Mahalanobis’ D² statistics, Null distribution of Hotelling’s T². Rao’s U statistics and its distribution.
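Mahalanobis' D² and Hotelling's T² share one quadratic form. For a single sample of size n in p dimensions, T² = n(x̄ - μ0)′S⁻¹(x̄ - μ0), and under H0 with multivariate normal data, (n - p)/(p(n - 1))·T² follows an F distribution with (p, n - p) degrees of freedom. The sketch below is restricted to p = 2 so the covariance inverse can be written out explicitly; the function names are illustrative assumptions.

```python
def mahalanobis_sq_2d(x, mu, cov):
    """Squared Mahalanobis distance (x - mu)' S^{-1} (x - mu) for 2-dimensional vectors."""
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21
    inv = [[s22 / det, -s12 / det], [-s21 / det, s11 / det]]  # explicit 2x2 inverse
    d = [x[0] - mu[0], x[1] - mu[1]]
    return sum(d[i] * inv[i][j] * d[j] for i in range(2) for j in range(2))


def hotelling_t2_one_sample(sample, mu0):
    """One-sample Hotelling T^2 using the unbiased sample covariance matrix S."""
    n = len(sample)
    xbar = [sum(point[k] for point in sample) / n for k in range(2)]
    s = [
        [sum((pt[i] - xbar[i]) * (pt[j] - xbar[j]) for pt in sample) / (n - 1) for j in range(2)]
        for i in range(2)
    ]
    return n * mahalanobis_sq_2d(xbar, mu0, s)


def t2_to_f(t2, n, p=2):
    """Scale T^2 to the F(p, n - p) reference distribution."""
    return (n - p) / (p * (n - 1)) * t2
```

With an identity covariance matrix, D² reduces to the squared Euclidean distance, which is a convenient sanity check.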

Unit-4
Teaching Hours:12
CLASSIFICATION AND DISCRIMINANT PROCEDURES

Bayes, minimax, and Fisher’s criteria for discrimination between two multivariate normal populations. Sample discriminant function. Tests associated with discriminant functions. Probabilities of misclassification and their estimation. Discrimination for several multivariate normal populations

Unit-5
Teaching Hours:12
PRINCIPAL COMPONENT and FACTOR ANALYSIS

Principal components, sample principal components, asymptotic properties. Canonical variables and canonical correlations: definition, estimation, computations. Test for significance of canonical correlations. Factor analysis: Orthogonal factor model, factor loadings, estimation of factor loadings, factor scores. Applications.

Text Books And Reference Books:

1. Anderson, T. W. 2009. An Introduction to Multivariate Statistical Analysis, 3rd Edition, John Wiley.

2. Everitt, B. and Hothorn, T. 2011. An Introduction to Applied Multivariate Analysis with R, Springer.

3. Hair, J. F., Black, W. C., Babin, B. J. and Anderson, R. E. 2013. Multivariate Data Analysis, Pearson New International Edition.

Essential Reading / Recommended Reading

1. Giri, N. C. 1977. Multivariate Statistical Inference. Academic Press.

2. Chatfield, C. and Collins, A. J. 1982. Introduction to Multivariate Analysis. Prentice Hall.

3. Srivastava, M. S. and Khatri, C. G. 1979. An Introduction to Multivariate Statistics. North Holland.

Evaluation Pattern

CIA - 50%

ESE - 50%

CIA I: 10%

CIA II: 25%

CIA III: 10%

Attendance: 5%

ESE: 50%

MDS241LB - STOCHASTIC PROCESS (2021 Batch)

Total Teaching Hours for Semester:60
No of Lecture Hours/Week:4
Max Marks:100
Credits:4

Course Objectives/Course Description

This course is designed to introduce the concepts of the theory of estimation and testing of hypotheses. It also deals with parametric tests for large and small samples and provides knowledge about non-parametric tests and their applications.

Course Outcome

CO1: Demonstrate the concepts of point and interval estimation of unknown parameters and their significance using large and small samples.

CO2: Apply the idea of sampling distributions of the difference statistics in the testing of hypotheses.

CO3: Infer the concept of nonparametric tests for single sample and two samples.

Unit-1
Teaching Hours:12
INTRODUCTION TO STOCHASTIC PROCESSES

Classification of Stochastic Processes, Markov Processes – Markov Chain - Countable State Markov Chain. Transition Probabilities, Transition Probability Matrix. Chapman-Kolmogorov Equations, Calculation of n-step Transition Probability and its limit.
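For a time-homogeneous chain, the Chapman-Kolmogorov equations P^(m+n) = P^(m)·P^(n) imply that the n-step transition matrix is simply the n-th power of P. The sketch below computes it by repeated multiplication (adequate for teaching-sized matrices) and shows the rows approaching the stationary distribution for an ergodic chain. The helper names and the example matrix are illustrative assumptions.

```python
def mat_mul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def n_step(p, n):
    """n-step transition matrix P^(n) = P^n, starting from the identity P^(0)."""
    size = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, p)  # P^(k+1) = P^(k) P, one Chapman-Kolmogorov step
    return result


# Two-state chain with stationary distribution (5/6, 1/6), since pi_0 * 0.1 = pi_1 * 0.5.
P = [[0.9, 0.1], [0.5, 0.5]]
```

Because the second eigenvalue of this P is 0.4, every row of P^n approaches (5/6, 1/6) geometrically fast, which illustrates the "limit" of the n-step probabilities.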

Unit-2
Teaching Hours:12
POISSON PROCESS

Classification of States, Recurrent and Transient States - Transient Markov Chain, Random Walk and Gambler's Ruin Problem. Continuous-Time Markov Process: Poisson Processes, Birth and Death Processes, Kolmogorov’s Differential Equations, Applications.

Unit-3
Teaching Hours:12
BRANCHING PROCESS

Branching Processes – Galton-Watson Branching Process - Properties of Generating Functions – Extinction Probabilities – Distribution of Total Number of Progeny. Concept of Wiener Process.
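In a Galton-Watson process with offspring probability generating function G(s), the probability q_k that the line is extinct by generation k satisfies q_k = G(q_(k-1)) with q_0 = 0, and q_k increases to the extinction probability, the smallest root of s = G(s) in [0, 1]. The sketch below iterates that recursion; the function name and stopping tolerance are assumptions. In the critical case (mean offspring exactly 1) convergence is very slow, so a tolerance-based stop can return a value noticeably below the true answer of 1.

```python
def extinction_probability(offspring_pmf, tol=1e-12, max_iter=100_000):
    """Smallest fixed point of s = G(s) on [0, 1], via q_k = G(q_{k-1}), q_0 = 0.

    offspring_pmf[k] is the probability that an individual has k children.
    """
    def pgf(s):
        return sum(p * s ** k for k, p in enumerate(offspring_pmf))

    q = 0.0
    for _ in range(max_iter):
        q_next = pgf(q)  # P(extinct by the next generation)
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    return q
```

For offspring probabilities (1/4, 1/4, 1/2) the mean is 1.25 > 1 and s = G(s) reduces to 2s² - 3s + 1 = 0, with roots 1/2 and 1, so the extinction probability is 1/2.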

Unit-4
Teaching Hours:12
RENEWAL PROCESS

Renewal Processes – Renewal Process in Discrete and Continuous Time – Renewal Interval – Renewal Function and Renewal Density – Renewal Equation – Renewal theorems: Elementary Renewal Theorem. Probability Generating Function of Renewal Processes.

Unit-5
Teaching Hours:12
STATIONARY PROCESS

Stationary Processes: Discrete Parameter Stochastic Process – Application to Time Series. Auto-covariance and Auto-correlation functions and their properties. Moving Average, Autoregressive, Autoregressive Moving Average, Autoregressive Integrated Moving Average Processes. Basic ideas of residual analysis, diagnostic checking, forecasting.
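The sample autocorrelation function used for identifying MA, AR and ARMA models is r_k = c_k / c_0, where c_k = (1/n)·Σ_{t=1}^{n-k} (x_t - x̄)(x_(t+k) - x̄) is the usual (biased) autocovariance estimator. A minimal sketch, with an illustrative function name:

```python
def sample_acf(series, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag of a numeric series."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev) / n  # lag-0 autocovariance (sample variance, divisor n)
    acf = []
    for k in range(max_lag + 1):
        ck = sum(dev[t] * dev[t + k] for t in range(n - k)) / n
        acf.append(ck / c0)
    return acf
```

Dividing by n rather than n - k keeps the estimated autocovariance sequence positive semi-definite, which is why it is the conventional choice in Box-Jenkins identification.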

Text Books And Reference Books:

1. Stochastic Processes, R. G. Gallager, Cambridge University Press, 2013.

2. Stochastic Processes, S. M. Ross, Wiley India Pvt. Ltd, 2008.

Essential Reading / Recommended Reading

1. Stochastic Processes: From Applications to Theory, P. Del Moral and S. Penev, CRC Press, 2016.

2. Introduction to Probability and Stochastic Processes with Applications, L. Blanco Castañeda, V. Arunachalam, S. Dharmaraja, Wiley Pvt. Ltd, 2012.
Evaluation Pattern

CIA I: 10%

CIA II: 25%

CIA III: 10%

Attendance: 5%

ESE: 50%

MDS271 - MACHINE LEARNING (2021 Batch)

Total Teaching Hours for Semester:90
No of Lecture Hours/Week:6
Max Marks:150
Credits:5

Course Objectives/Course Description

The objective of this course is to provide an introduction to the principles and design of machine learning algorithms. The course aims to provide foundations for the conceptual aspects of machine learning algorithms along with their applications to real world problems.

Course Outcome

CO1: Understand the basic principles of machine learning techniques.

CO2: Understand how machine learning problems are formulated and solved.

CO3: Apply machine learning algorithms to solve real world problems.

Unit-1
Teaching Hours:18
INTRODUCTION

Machine Learning - Examples of Machine Learning Applications - Learning Associations - Classification - Regression - Unsupervised Learning - Reinforcement Learning. Supervised Learning: Learning class from examples - Probably Approximately Correct (PAC) Learning - Noise - Learning Multiple classes. Regression - Model Selection and Generalization.

Introduction to Parametric methods - Maximum Likelihood Estimation: Bernoulli Density - Multinomial Density - Gaussian Density, Nonparametric Density Estimation: Histogram Estimator - Kernel Estimator - K-Nearest Neighbour Estimator.
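The two nonparametric density estimators named above can be sketched for one-dimensional data. The kernel estimator is p̂(x) = (1/(N·h))·Σ_t K((x - x_t)/h), here with a Gaussian kernel K, and one common form of the k-nearest neighbour estimator is p̂(x) = k / (2·N·d_k(x)), where d_k(x) is the distance from x to its k-th nearest sample. The function names and the choice of a Gaussian kernel are illustrative assumptions; the bandwidth h and k must be chosen by the user.

```python
import math


def gaussian_kde(data, h):
    """Return the kernel density estimate p_hat(x) with Gaussian kernel and bandwidth h."""
    n = len(data)

    def density(x):
        total = 0.0
        for xt in data:
            u = (x - xt) / h
            total += math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)  # K(u)
        return total / (n * h)

    return density


def knn_density(data, k):
    """Return the 1-D k-nearest-neighbour density estimate k / (2 N d_k(x)).

    Undefined (division by zero) where x coincides with k or more samples.
    """
    n = len(data)

    def density(x):
        d_k = sorted(abs(x - xt) for xt in data)[k - 1]  # distance to k-th nearest sample
        return k / (2 * n * d_k)

    return density
```

The kernel estimate is smooth and integrates to 1, while the k-NN estimate adapts its window width to the local data density but is not itself a proper density.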

Lab Exercise:

1. Data Exploration using parametric methods

2. Data Exploration using non-parametric methods

3. Regression analysis

Unit-2
Teaching Hours:18
DIMENSIONALITY REDUCTION

Dimensionality Reduction: Introduction- Subset Selection-Principal Component Analysis, Feature Embedding-Factor Analysis-Singular Value Decomposition-Multidimensional Scaling-Linear Discriminant Analysis- Bayesian Decision Theory.

Lab Exercise:

1. Data reduction using Principal Component Analysis

2. Data reduction using multi-dimensional scaling
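For the PCA lab, the first principal component is the leading eigenvector of the sample covariance matrix, and its eigenvalue is the variance explained along that direction. As a dependency-free sketch, the code below finds it by power iteration. It assumes the starting vector of ones is not orthogonal to the leading eigenvector and that the leading eigenvalue is strictly largest; further components would need deflation or a full eigendecomposition. Function names are illustrative.

```python
import math


def covariance_matrix(rows):
    """Unbiased sample covariance matrix of a list of equal-length observations."""
    n = len(rows)
    p = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def first_principal_component(rows, iterations=500):
    """Leading eigenvector (unit length) and eigenvalue of the covariance matrix."""
    s = covariance_matrix(rows)
    p = len(s)
    v = [1.0] * p
    for _ in range(iterations):
        w = [sum(s[i][j] * v[j] for j in range(p)) for i in range(p)]  # w = S v
        norm = math.sqrt(sum(wi * wi for wi in w))
        v = [wi / norm for wi in w]
    # Rayleigh quotient v'Sv gives the variance explained by this component.
    eigenvalue = sum(v[i] * sum(s[i][j] * v[j] for j in range(p)) for i in range(p))
    return v, eigenvalue
```

For points lying on the line y = x, the component comes out as (1/√2, 1/√2), and its eigenvalue equals the total variance.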

Unit-3
Teaching Hours:18
SUPERVISED LEARNING - I

Linear Discrimination: Introduction- Generalizing the Linear Model-Geometry of the Linear Discriminant- Pairwise Separation-Gradient Descent-Logistic Discrimination.

Kernel Machines: Introduction - optimal separating hyperplane - ν-SVM, kernel tricks - vectorial kernels - defining kernels - multiclass kernel machines - one-class kernel machines.
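The kernel trick means that a kernel value equals an inner product in some feature space without building that space explicitly. For the quadratic kernel K(x, y) = (x·y + 1)² on two-dimensional inputs, the matching feature map is φ(x) = (1, √2·x1, √2·x2, x1², x2², √2·x1·x2). The sketch below checks this identity numerically; the function names are illustrative assumptions.

```python
import math


def poly_kernel(x, y, c=1.0, degree=2):
    """Polynomial kernel (x . y + c) ** degree."""
    return (sum(a * b for a, b in zip(x, y)) + c) ** degree


def phi_quadratic(x):
    """Explicit 6-dimensional feature map whose inner product equals (x . y + 1)^2 in 2-D."""
    x1, x2 = x
    r2 = math.sqrt(2)
    return [1.0, r2 * x1, r2 * x2, x1 * x1, x2 * x2, r2 * x1 * x2]


x, y = (1.0, 2.0), (3.0, -1.0)
implicit = poly_kernel(x, y)  # computed in the 2-D input space
explicit = sum(a * b for a, b in zip(phi_quadratic(x), phi_quadratic(y)))  # 6-D feature space
```

An SVM only ever needs such inner products, so it can work in the higher-dimensional space at the cost of evaluating K in the original one.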

Lab Exercise

1. Linear discrimination

2. Logistic discrimination

3. Classification using kernel machines

Unit-4
Teaching Hours:18
SUPERVISED LEARNING - II

Multilayer Perceptron

Introduction, training a perceptron - learning Boolean functions - multilayer perceptron - backpropagation algorithm - training procedures.
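Training a single perceptron to learn a Boolean function can be sketched with the classical update rule w ← w + η·(target - output)·x, using a threshold unit and a bias input fixed at 1. Because AND is linearly separable, the perceptron convergence theorem guarantees that the updates stop after finitely many mistakes. XOR is not linearly separable, which is what motivates the multilayer perceptron. The names, learning rate and epoch count are illustrative assumptions.

```python
def predict(w, x):
    """Threshold unit: output 1 if bias + w1*x1 + w2*x2 > 0, else 0."""
    return 1 if w[0] + w[1] * x[0] + w[2] * x[1] > 0 else 0


def train_perceptron(samples, epochs=20, eta=1.0):
    """Perceptron learning rule on ((x1, x2), target) pairs; w = [bias, w1, w2]."""
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(w, x)  # 0 when correct, +1 or -1 when wrong
            inputs = [1.0, x[0], x[1]]      # leading 1 is the bias input
            w = [wi + eta * error * xi for wi, xi in zip(w, inputs)]
    return w


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights = train_perceptron(AND)
```

Running the same code on the XOR truth table never reaches zero errors, whatever the number of epochs.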

Combining Multiple Learners

Rationale - Generating diverse learners - Model combination schemes - voting, Bagging - Boosting - fine tuning an Ensemble.

Lab Exercise

1.  Classification using MLP

2.  Ensemble Learning

Unit-5
Teaching Hours:18
UNSUPERVISED LEARNING

Clustering

Introduction - Mixture Densities, K-Means Clustering - Expectation-Maximization algorithm - Mixtures of Latent Variable Models - Supervised Learning after Clustering - Spectral Clustering - Hierarchical Clustering - Choosing the number of Clusters.
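K-means clustering can be sketched as Lloyd's algorithm: assign each point to its nearest centroid, recompute each centroid as the mean of its points, and stop when the centroids no longer change. Initial centroids are passed in explicitly here to keep the example deterministic; in practice they are often chosen at random or with k-means++. The function name and the handling of empty clusters (keep the old centroid) are illustrative assumptions. `math.dist` requires Python 3.8 or later.

```python
import math


def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's algorithm; returns (centroids, clusters) from the final assignment."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append(tuple(sum(coord) / len(cluster) for coord in zip(*cluster)))
            else:
                new_centroids.append(old)  # empty cluster: leave its centroid in place
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters
```

Each iteration cannot increase the within-cluster sum of squares, so the algorithm always terminates, but only at a local optimum that depends on the starting centroids.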

Lab Exercise

1.  K means clustering

2.  Hierarchical clustering

Text Books And Reference Books:

1. E. Alpaydin, Introduction to Machine Learning, 3rd Edition, MIT Press, 2014.

Essential Reading / Recommended Reading

1. C. M. Bishop, Pattern Recognition and Machine Learning, Springer, 2016.

2. T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference and Prediction, 2nd Edition, Springer, 2009.

3. K. P. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012.

Evaluation Pattern

CIA: 50%

ESE: 50%

MDS271L - MACHINE LEARNING (2021 Batch)

Total Teaching Hours for Semester:90
No of Lecture Hours/Week:6
Max Marks:150
Credits:5

Course Objectives/Course Description

The objective of this course is to provide an introduction to the principles and design of machine learning algorithms. The course aims to provide foundations for the conceptual aspects of machine learning algorithms along with their applications to real world problems.

Course Outcome

This course enables students to -

• Understand the basic principles of machine learning techniques.
• Understand how machine learning problems are formulated and solved.
• Apply machine learning algorithms to solve real world problems.

Unit-1
Teaching Hours:18
INTRODUCTION

Machine Learning - Examples of Machine Learning Applications - Learning Associations - Classification - Regression - Unsupervised Learning - Reinforcement Learning. Supervised Learning: Learning class from examples - Probably Approximately Correct (PAC) Learning - Noise - Learning Multiple classes. Regression - Model Selection and Generalization. Introduction to Parametric methods - Maximum Likelihood Estimation: Bernoulli Density - Multinomial Density - Gaussian Density, Nonparametric Density Estimation: Histogram Estimator - Kernel Estimator - K-Nearest Neighbour Estimator.

Unit-1
Teaching Hours:18
Lab Exercises

1. Data Exploration using Parametric Methods

2. Data Exploration using Non-Parametric Methods

3. Regression Analysis

Unit-2
Teaching Hours:18
DIMENSIONALITY REDUCTION

Dimensionality Reduction: Introduction - Subset Selection - Principal Component Analysis, Feature Embedding - Factor Analysis - Singular Value Decomposition - Multidimensional Scaling - Linear Discriminant Analysis - Bayesian Decision Theory.

Unit-2
Teaching Hours:18
Lab Exercise

1. Data reduction using Principal Component Analysis

2. Data reduction using Multi-Dimensional Scaling

Unit-3
Teaching Hours:18
KERNEL METHODS

Introduction - optimal separating hyperplane - ν-SVM, kernel tricks - vectorial kernels - defining kernels - multiclass kernel machines - one-class kernel machines.

Unit-3
Teaching Hours:18
SUPERVISED LEARNING

Linear Discrimination: Introduction - Generalizing the Linear Model - Geometry of the Linear Discriminant - Pairwise Separation - Gradient Descent - Logistic Discrimination

Unit-3
Teaching Hours:18
Lab Exercises

1. Linear Discrimination

2. Logistic Discrimination

3. Classification using Kernel Machines

Unit-4
Teaching Hours:18
MULTILAYER PERCEPTRON

Introduction, training a perceptron - learning Boolean functions - multilayer perceptron - backpropagation algorithm - training procedures

Unit-4
Teaching Hours:18
Lab Exercise

1. Classification using MLP

2. Ensemble Learning

Unit-4
Teaching Hours:18
COMBINING MULTIPLE LEARNERS

Rationale - Generating diverse learners - Model combination schemes - voting, Bagging - Boosting - fine tuning an Ensemble.

Unit-5
Teaching Hours:18
UNSUPERVISED LEARNING

Clustering - Introduction - Mixture Densities, K-Means Clustering - Expectation-Maximization algorithm - Mixtures of Latent Variable Models - Supervised Learning after Clustering - Spectral Clustering - Hierarchical Clustering - Choosing the number of Clusters

Unit-5
Teaching Hours:18
Lab Exercises

1. K Means Clustering

2. Hierarchical Clustering

Text Books And Reference Books:

1. E. Alpaydin, Introduction to Machine Learning, 3rd Edition, MIT Press, 2014.

Essential Reading / Recommended Reading

1. C. M. Bishop, Pattern Recognition and Machine Learning, Springer, 2016.

2. T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference and Prediction, 2nd Edition, Springer, 2009.

3. K. P. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012.

Evaluation Pattern

CIA: 50%

ESE: 50%

MDS272A - HADOOP (2021 Batch)

Total Teaching Hours for Semester:90
No of Lecture Hours/Week:6
Max Marks:150
Credits:5

Course Objectives/Course Description

The subject is intended to give knowledge of Big Data as it arises in real-time applications and of how it is manipulated using emerging technologies. This course breaks down the walls of complexity in processing Big Data by providing a practical approach to developing Java applications on top of the Hadoop platform. It describes the Hadoop architecture and how to work with the Hadoop Distributed File System (HDFS) and HBase on the Ubuntu platform.

Course Outcome

CO1: Understand Big Data concepts in real-time scenarios.

CO2: Understand big data systems and identify the main sources of Big Data in the real world.

CO3: Demonstrate an ability to use the Hadoop framework for processing Big Data for analytics.

CO4: Evaluate the MapReduce approach for problems in different domains.
Unit-1
Teaching Hours:15
INTRODUCTION

Distributed file system - Big Data and its importance, the four Vs, drivers for Big Data, Big Data analytics, Big Data applications, algorithms using MapReduce, matrix-vector multiplication by MapReduce.
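As an illustration of matrix-vector multiplication by MapReduce, the sketch below computes x_i = sum over j of m_ij * v_j by simulating the map, shuffle and reduce phases in a single Python process. It assumes the simple case in which the vector v fits in the memory of every mapper, and it stores the matrix sparsely as (row, column, value) triples; the function names are illustrative only, not part of any Hadoop API.

```python
from collections import defaultdict


def map_phase(matrix_entries, vector):
    """Mapper: for each stored entry (i, j, m_ij), emit the pair (i, m_ij * v_j)."""
    for i, j, m_ij in matrix_entries:
        yield i, m_ij * vector[j]


def shuffle(pairs):
    """Group intermediate values by key, as Hadoop's shuffle-and-sort step does."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups):
    """Reducer: sum the partial products for row i to obtain x_i."""
    return {i: sum(values) for i, values in groups.items()}


def matvec_mapreduce(matrix_entries, vector):
    return reduce_phase(shuffle(map_phase(matrix_entries, vector)))


if __name__ == "__main__":
    # The 2x3 matrix [[1, 0, 2], [0, 3, 4]] stored as (row, col, value) triples
    M = [(0, 0, 1), (0, 2, 2), (1, 1, 3), (1, 2, 4)]
    v = [1, 2, 3]
    print(matvec_mapreduce(M, v))  # {0: 7, 1: 18}
```

Rows with no stored entries produce no output, which corresponds to x_i = 0. When v is too large for memory, the usual refinement is to split both the matrix and the vector into vertical stripes so that each mapper only needs one stripe of v.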

Apache Hadoop - Moving data in and out of Hadoop - Understanding inputs and outputs of MapReduce - Data serialization - Problems with traditional large-scale systems - Requirements for a new approach - Hadoop - Scaling - Distributed framework - Hadoop vs RDBMS - Brief history of Hadoop.

Lab Exercise

Unit-2
Teaching Hours:15

Hadoop processes (NameNode (NN), Secondary NameNode (SNN), JobTracker (JT), DataNode (DN), TaskTracker (TT)) - Temporary directory - UI - Common errors when running a Hadoop cluster and their solutions.

Understanding MapReduce: Key/value pairs, The Hadoop Java API for MapReduce, Writing MapReduce programs, Hadoop-specific data types, Input/output.

Developing MapReduce Programs: Using languages other than Java with Hadoop, Analysing a large dataset.

Lab Exercise

1. Word count application in Hadoop.

2. Sorting the data using MapReduce.

3. Finding the maximum and minimum values in Hadoop.
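The three lab exercises above follow the same map, shuffle-and-sort, reduce pattern. The minimal in-process sketch below shows that pattern in Python; `run_mapreduce` is a hypothetical helper standing in for a Hadoop job, whereas the labs themselves would use the Hadoop Java API. Sorting comes almost for free, because Hadoop delivers intermediate keys to each reducer in sorted order, which the helper mimics with `sorted()`.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Hypothetical in-process stand-in for a Hadoop job: map, shuffle, sort, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):  # Hadoop presents keys to a reducer in sorted order
        output.extend(reducer(key, groups[key]))
    return output


# Lab 1: word count
def wc_mapper(line):
    for word in line.lower().split():
        yield word, 1


def wc_reducer(word, counts):
    yield word, sum(counts)


# Lab 2: sorting; each value becomes a key and the shuffle sorts it
def sort_mapper(line):
    for token in line.split():
        yield int(token), None


def sort_reducer(key, values):
    for _ in values:  # emit the key once per occurrence so duplicates survive
        yield key


# Lab 3: max and min; every value goes to one key so a single reducer sees them all
def minmax_mapper(line):
    for token in line.split():
        yield "stats", float(token)


def minmax_reducer(key, values):
    yield "min", min(values)
    yield "max", max(values)


if __name__ == "__main__":
    print(run_mapreduce(["the cat", "The dog"], wc_mapper, wc_reducer))
    # [('cat', 1), ('dog', 1), ('the', 2)]
    print(run_mapreduce(["5 1 3", "2 5"], sort_mapper, sort_reducer))
    # [1, 2, 3, 5, 5]
    print(run_mapreduce(["3 9 -2", "7"], minmax_mapper, minmax_reducer))
    # [('min', -2.0), ('max', 9.0)]
```

On a real cluster with several reducers, keys are sorted only within each reducer's partition; a globally sorted result needs either a single reducer or a total-order partitioner such as Hadoop's `TotalOrderPartitioner`.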

Unit-3
Teaching Hours:15

Simple, advanced, and in-between Joins, Graph algorithms, using language-independent data structures.
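A common way to implement the simple joins listed above is a reduce-side join: each mapper tags its records with the name of the source dataset, the shuffle brings records with the same join key together, and the reducer pairs them up. The sketch below simulates this in one Python process for an inner join; the `customers` and `orders` tables and their fields are hypothetical examples.

```python
from collections import defaultdict


def join_mapper(source, record):
    """Emit (join_key, (source_tag, remaining_fields)) so the reducer knows each record's origin."""
    yield record[0], (source, record[1:])


def join_reducer(key, tagged_values):
    """Inner join: pair every customer record with every order record sharing the key."""
    customers = [fields for src, fields in tagged_values if src == "customers"]
    orders = [fields for src, fields in tagged_values if src == "orders"]
    for c in customers:
        for o in orders:
            yield (key,) + c + o


def reduce_side_join(customers, orders):
    groups = defaultdict(list)
    for source, table in (("customers", customers), ("orders", orders)):
        for record in table:
            for key, value in join_mapper(source, record):
                groups[key].append(value)
    result = []
    for key in sorted(groups):
        result.extend(join_reducer(key, groups[key]))
    return result


if __name__ == "__main__":
    customers = [(1, "Asha"), (2, "Ravi")]  # (customer_id, name)
    orders = [(1, 250), (1, 100), (3, 75)]  # (customer_id, amount)
    print(reduce_side_join(customers, orders))
    # [(1, 'Asha', 250), (1, 'Asha', 100)]
```

Keys that appear in only one dataset (customer 2, order key 3) produce no output, which is the inner-join behaviour; an outer join would instead emit those records padded with empty fields.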

Hadoop configuration properties - Setting up a cluster, Cluster access control, managing the NameNode, Managing HDFS, MapReduce management, Scaling.

Lab Exercise:

1. Implementation of decision tree algorithms using MapReduce.

2. Implementation of K-means Clustering using MapReduce.

3. Generation of frequent itemsets using MapReduce.
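For the K-means exercise, one iteration of the algorithm maps naturally onto one MapReduce job: the mapper assigns each point to its nearest centroid, and the reducer averages the points in each cluster to produce the new centroid. The Python sketch below simulates that job in-process and wraps it in a driver loop that reruns it until the centroids stop changing; it is an illustration under these assumptions, not a Hadoop implementation, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def kmeans_mapper(point, centroids):
    """Map: emit (index of nearest centroid, point)."""
    nearest = min(range(len(centroids)), key=lambda k: math.dist(point, centroids[k]))
    yield nearest, point


def kmeans_reducer(cluster_id, points):
    """Reduce: the new centroid is the coordinate-wise mean of the assigned points."""
    n = len(points)
    yield cluster_id, tuple(sum(coords) / n for coords in zip(*points))


def kmeans_iteration(points, centroids):
    """One simulated MapReduce job: assign points, then recompute centroids."""
    groups = defaultdict(list)
    for p in points:
        for cid, value in kmeans_mapper(p, centroids):
            groups[cid].append(value)
    new_centroids = list(centroids)  # an empty cluster keeps its previous centroid
    for cid, members in groups.items():
        for k, centroid in kmeans_reducer(cid, members):
            new_centroids[k] = centroid
    return new_centroids


def kmeans(points, centroids, max_iter=20):
    """Driver loop: rerun the job until the centroids stop changing."""
    for _ in range(max_iter):
        updated = kmeans_iteration(points, centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
    print(kmeans(pts, [(0, 0), (10, 10)]))  # [(0.0, 0.5), (10.0, 10.5)]
```

On Hadoop the current centroids would typically be made available to every mapper (for example through the distributed cache), and the driver program would launch a new job per iteration.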

Unit-4
Teaching Hours:15

Hadoop Streaming - Streaming Command Options - Specifying a Java Class as the Mapper/Reducer - Packaging Files With Job Submissions - Specifying Other Plug-ins for Jobs.
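Hadoop Streaming runs any executable as the mapper or reducer: the mapper reads input lines on stdin and writes tab-separated key/value lines on stdout, and the reducer receives those lines sorted by key. The minimal word-count sketch below follows that contract in Python; the script name `wc_streaming.py` and its `map`/`reduce` argument are hypothetical conventions chosen for this example.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Streaming mapper: emit one 'word<TAB>1' line per word read from stdin."""
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"


def reducer(lines):
    """Streaming reducer: Hadoop sorts mapper output by key, so equal keys arrive adjacent."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical invocation: "python wc_streaming.py map" or "python wc_streaming.py reduce"
    stage = mapper if sys.argv[1] == "map" else reducer
    for out in stage(sys.stdin):
        print(out)
```

The script can be checked locally with `cat input.txt | python wc_streaming.py map | sort | python wc_streaming.py reduce`, where `sort` plays the role of the shuffle. On a cluster the same commands would be passed to the streaming jar through its `-mapper` and `-reducer` options, with the script shipped to the nodes using `-files`.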

Lab Exercise:

1. Count the number of missing and invalid values by joining two given large datasets.

2. Using Hadoop MapReduce, evaluate the number of products sold in each country on an online shopping portal (dataset provided).

3. Analyze the sentiment of product reviews using the MapReduce technique provided by Apache Hadoop.

Unit-5
Teaching Hours:15
HIVE & PIG

Architecture, Installation, Configuration, Hive vs RDBMS, Tables, DDL & DML, Partitioning & Bucketing, Hive Web Interface, Pig, Use case of Pig, Pig Components, Data Model, Pig Latin.

Lab Exercise

1. Trend Analysis based on Access Pattern over Web Logs using Hadoop.

2. Service Rating Prediction by Exploring Social Mobile Users Geographical Locations.

Unit-6
Teaching Hours:15
HBASE

RDBMS vs NoSQL, HBasics, Installation, Building an online query application - Schema design, Loading data, Online queries, Successful service.

Hands On: Setting up a single-node Hadoop cluster on any cloud service provider: creating an instance, connecting to the instance using PuTTY, installing the Hadoop framework on the instance, and running the sample programs that come with the Hadoop framework.

Lab Exercise:

1. Big Data Analytics Framework Based Simulated Performance and Operational Efficiencies Through Billions of Patient Records in Hospital System.

Text Books And Reference Books:

 Boris Lublinsky, Kevin T. Smith, Alexey Yakubovich, Professional Hadoop Solutions, Wiley, 2015.

 Tom White, Hadoop: The Definitive Guide, O’Reilly Media Inc., 2015.

 Garry Turkington, Hadoop Beginner's Guide, Packt Publishing, 2013.

 Pethuru Raj, Anupama Raman, Dhivya Nagaraj and Siddhartha Duggirala, High-Performance Big-Data Analytics: Computing Systems and Approaches, Springer, 2015.

 Jonathan R. Owens, Jon Lentz and Brian Femiano, Hadoop Real-World Solutions Cookbook, Packt Publishing, 2013.

 Tom White, Hadoop: The Definitive Guide, O’Reilly Media Inc., 2012.

Evaluation Pattern

CIA - 50%

ESE - 50%

MDS272B - IMAGE AND VIDEO ANALYTICS (2021 Batch)

Total Teaching Hours for Semester:90
No of Lecture Hours/Week:6
Max Marks:150
Credits:5

Course Objectives/Course Description

This course provides a basic foundation in digital image processing and video analysis. It also gives a brief introduction to various object detection, recognition, segmentation and compression methods, which will help students demonstrate real-time image and video analytics applications.

Course Outcome

CO2: Apply image and video analysis approaches to solve real-world problems

Unit-1
Teaching Hours:18
INTRODUCTION TO DIGITAL IMAGE AND VIDEO PROCESSING

2. Program to implement contrast stretching.
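Contrast stretching, in its simplest min-max form, maps the darkest input intensity r_min to out_min and the brightest r_max to out_max, using s = (r - r_min) * (out_max - out_min) / (r_max - r_min) + out_min for every pixel r. The sketch below applies that formula to a grayscale image held as a plain Python list of rows, an assumption made to keep it library-free; percentile-clipped variants, which ignore a few extreme pixels, are also common.

```python
def contrast_stretch(image, out_min=0, out_max=255):
    """Min-max contrast stretching of a grayscale image given as a list of rows."""
    pixels = [p for row in image for p in row]
    in_min, in_max = min(pixels), max(pixels)
    if in_max == in_min:
        # A constant image has no range to stretch; map everything to out_min
        return [[out_min for _ in row] for row in image]
    scale = (out_max - out_min) / (in_max - in_min)
    return [[round((p - in_min) * scale + out_min) for p in row] for row in image]


if __name__ == "__main__":
    low_contrast = [[50, 100], [150, 200]]
    print(contrast_stretch(low_contrast))  # [[0, 85], [170, 255]]
```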

Unit-2
Teaching Hours:18
IMAGE AND VIDEO ENHANCEMENT AND RESTORATION

4. Program to implement non-linear spatial filtering using built-in and user-defined functions.
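The median filter is the standard example of a non-linear spatial filter: each pixel is replaced by the median of its neighbourhood, which removes isolated "salt-and-pepper" outliers without averaging them into nearby pixels. The user-defined sketch below works on a grayscale image stored as a list of rows and handles borders by replicating edge pixels, an assumption that may differ from library defaults. A built-in counterpart exists in SciPy as `scipy.ndimage.median_filter`.

```python
from statistics import median


def median_filter(image, size=3):
    """Replace each pixel by the median of its size x size neighbourhood.

    Border pixels use edge replication: out-of-range indices are clamped.
    """
    h, w = len(image), len(image[0])
    r = size // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                image[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in range(-r, r + 1)
                for dx in range(-r, r + 1)
            ]
            row.append(median(window))
        out.append(row)
    return out


if __name__ == "__main__":
    noisy = [[10, 10, 10], [10, 255, 10], [10, 10, 10]]  # one "salt" pixel
    print(median_filter(noisy))  # [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
```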

Unit-3
Teaching Hours:18
IMAGE AND VIDEO ANALYSIS

6. Extraction of frames from videos and analysis of the frames.

Unit-4
Teaching Hours:18
FEATURE DETECTION AND DESCRIPTION

8. Implement image compression using wavelets.
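The idea behind wavelet compression can be previewed with the Haar wavelet, the simplest case: repeated pairwise averaging and differencing concentrates most of a signal's energy in a few coefficients, so small detail coefficients can be set to zero with little visible loss. The sketch below works on a 1-D signal whose length is a power of two and uses the unnormalised averaging/differencing form rather than the orthonormal one; both are assumptions for brevity. For an image, the same transform would be applied to the rows and then the columns, and the zeroed coefficients would then be encoded compactly.

```python
def haar_forward(signal):
    """Full multi-level Haar decomposition (averaging/differencing form)."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    coeffs = [float(x) for x in signal]
    while n > 1:
        half = n // 2
        avgs = [(coeffs[2 * i] + coeffs[2 * i + 1]) / 2 for i in range(half)]
        diffs = [(coeffs[2 * i] - coeffs[2 * i + 1]) / 2 for i in range(half)]
        coeffs[:n] = avgs + diffs  # averages first, then detail coefficients
        n = half
    return coeffs


def haar_inverse(coeffs):
    """Undo haar_forward: each (average, detail) pair becomes (a + d, a - d)."""
    out = list(coeffs)
    n = 1
    while n < len(out):
        merged = []
        for a, d in zip(out[:n], out[n:2 * n]):
            merged += [a + d, a - d]
        out[:2 * n] = merged
        n *= 2
    return out


def compress(signal, threshold):
    """Keep the overall average; zero detail coefficients with |c| <= threshold."""
    coeffs = haar_forward(signal)
    return [coeffs[0]] + [c if abs(c) > threshold else 0.0 for c in coeffs[1:]]


if __name__ == "__main__":
    print(haar_forward([9, 7, 3, 5]))                # [6.0, 2.0, 1.0, -1.0]
    print(haar_inverse(compress([9, 7, 3, 5], 1.0)))  # [8.0, 8.0, 4.0, 4.0]
```

In this example, half of the four coefficients become zero, and the reconstruction keeps the coarse shape of the signal while losing the fine detail.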

Unit-5
Teaching Hours:18
OBJECT DETECTION AND RECOGNITION

10. Implement image classification using relevant extracted features.

Text Books And Reference Books:

 Rafael C. Gonzalez and Richard E. Woods, Digital Image Processing, 4th Edition, Pearson Education, 2018.

 Alan Bovik, Handbook of Image and Video Processing, Second Edition, Academic Press, 2005.

 Anil K Jain, Fundamentals of Digital Image Processing, PHI, 2011.

 Richard Szeliski, Computer Vision: Algorithms and Applications, Springer, 2011.

 Oge Marques, Practical Image and Video Processing Using MatLab, Wiley, 2011.

 John W. Woods, Multidimensional Signal, Image, Video Processing and Coding, Academic Press, 2006.

Evaluation Pattern

CIA: 50%

ESE: 50%

MDS272C - INTERNET OF THINGS (2021 Batch)

Total Teaching Hours for Semester:90
No of Lecture Hours/Week:6
Max Marks:150
Credits:5

Course Objectives/Course Description

The explosive growth of the “Internet of Things” is changing our world, and the rapid growth of IoT components is allowing people to innovate new designs and products at home. Wireless sensor networks form the basis of the Internet of Things. To keep pace with recent applications in the field of IoT, this course provides a deeper understanding of the underlying concepts of IoT and wireless sensor networks.

Course Outcome

CO1: Understand the concepts of IoT and IoT enabling technologies

CO2: Gain knowledge of IoT programming and be able to develop IoT applications

CO3: Identify different issues in wireless ad hoc and sensor networks

CO4: Develop an understanding of sensor network architectures from a design and performance perspective

CO5: Understand the layered approach in sensor networks and WSN protocols