Department of Computer Science






Syllabus for
Master of Science (Data Science)
Academic Year  (2019)

 
1 Semester - 2019 - Batch
Paper Code | Paper | Hours Per Week | Credits | Marks
MDS131 | MATHEMATICAL FOUNDATION FOR DATA SCIENCE - I | 4 | 4 | 100
MDS131L | MATHEMATICAL FOUNDATION FOR DATA SCIENCE - I | 4 | 4 | 100
MDS132 | PROBABILITY AND DISTRIBUTION THEORY | 4 | 4 | 100
MDS132L | PROBABILITY AND DISTRIBUTION THEORY | 4 | 4 | 100
MDS133 | PRINCIPLES OF DATA SCIENCE | 4 | 4 | 100
MDS133L | PRINCIPLE OF DATA SCIENCE | 4 | 4 | 100
MDS134 | RESEARCH METHODOLOGY | 2 | 2 | 50
MDS134L | RESEARCH METHODOLOGY | 2 | 2 | 50
MDS161A | INTRODUCTION TO STATISTICS | 2 | 2 | 50
MDS161B | INTRODUCTION TO COMPUTERS AND PROGRAMMING | 2 | 2 | 50
MDS161C | LINUX ADMINISTRATION | 2 | 2 | 50
MDS161L | PROBLEM SOLVING AND PROGRAMMING CONCEPTS | 2 | 2 | 50
MDS171 | DATA BASE TECHNOLOGIES | 6 | 5 | 150
MDS171L | DATABASE TECHNOLOGY LABORATORY | 6 | 5 | 50
MDS172 | INFERENTIAL STATISTICS | 6 | 5 | 150
MDS172L | INFERENTIAL STATISTICAL LABORATORY | 6 | 5 | 150
MDS173 | PROGRAMMING FOR DATA SCIENCE IN PYTHON | 6 | 4 | 100
MDS173L | PROGRAMMING FOR DATA SCIENCE IN PYTHON | 6 | 4 | 100
2 Semester - 2019 - Batch
Paper Code | Paper | Hours Per Week | Credits | Marks
MDS231 | MATHEMATICAL FOUNDATION FOR DATA SCIENCE - II | 4 | 4 | 100
MDS231L | MATHEMATICAL FOUNDATION FOR DATA SCIENCE - II | 4 | 4 | 100
MDS232 | REGRESSION ANALYSIS | 4 | 4 | 100
MDS232L | REGRESSION ANALYSIS | 4 | 4 | 100
MDS233 | DESIGN AND ANALYSIS OF ALGORITHMS | 4 | 4 | 100
MDS233L | DESIGN AND ANALYSIS OF ALGORITHMS | 4 | 4 | 100
MDS234 | MACHINE LEARNING | 4 | 4 | 100
MDS234L | MACHINE LEARNING | 60 | 4 | 100
MDS241A | MULTIVARIATE ANALYSIS | 4 | 4 | 100
MDS241AL | MULTIVARIATE ANALYSIS | 4 | 4 | 100
MDS241B | STOCHASTIC PROCESS | 4 | 4 | 100
MDS251L | PROGRAMMING FOR DATA SCIENCE IN R | 6 | 4 | 100
MDS252AL | HADOOP | 6 | 5 | 150
MDS271 | PROGRAMMING FOR DATA SCIENCE IN R | 6 | 4 | 100
MDS272A | HADOOP | 6 | 5 | 150
MDS272B | IMAGE AND VIDEO ANALYTICS | 6 | 5 | 150
MDS272C | INTERNET OF THINGS | 6 | 5 | 150
MDS281 | RESEARCH PROBLEM IDENTIFICATION AND DATA COLLECTION | 1 | 0 | 0
        

          

  

Assessment Pattern

CIA - 50%

ESE - 50%

Examination And Assessments

CIA - 50%

ESE - 50%

Department Overview:
The Department of Computer Science of CHRIST (Deemed to be University) strives to shape outstanding computer professionals with ethical and human values to reshape the nation's destiny. The training imparted aims to prepare young minds for the challenging opportunities in the IT industry with a global awareness rooted in the Indian soil, nourished and supported by experts in the field.
Mission Statement:
Vision: The Department of Computer Science endeavours to imbibe the vision of the University, "Excellence and Service". The department is committed to this philosophy, which pervades every aspect and functioning of the department.
Mission: "To develop IT professionals with ethical and human values". To accomplish our mission, the department encourages students to apply their acquired knowledge and skills towards professional achievements in their career. The department also moulds the st
Introduction to Program:
Data Science is widely used across academia, business sectors, and research and development to make effective decisions in day-to-day activities. The MSc in Data Science is a two-year programme with four semesters. The programme aims to give candidates the opportunity to master the skill sets specific to data science with a research bent. The curriculum helps students obtain adequate knowledge of the theory of data science together with hands-on experience in relevant domains and tools. Candidates gain exposure to research models and industry-standard applications in data science through guest lectures, seminars, projects, internships, etc.
Program Objective:
Programme Objective
- To acquire in-depth understanding of the theoretical concepts in statistics, data analysis, data mining, machine learning and other advanced data science techniques.
- To gain practical experience in programming tools for data sciences, database systems, machine learning and big data tools.
- To strengthen analytical and problem-solving skills by developing real-time applications.
- To empower students with tools and techniques for handling, managing, analyzing and interpreting data.
- To imbibe quality research and develop solutions to social issues.
Programme Specific Outcomes
PSO1: Abstract Thinking: Ability to understand the abstract concepts that lead to various data science theories in Mathematics, Statistics and Computer Science.
PSO2: Problem Analysis and Design: Ability to identify, analyze and design solutions for data science problems using fundamental principles of mathematics, statistics, computing sciences, and relevant domain disciplines.
PSO3: Modern Software Tool Usage: Acquire the skills in handling data science programming tools towards problem solving and solution analysis for domain-specific problems.
PSO4: Innovation and Entrepreneurship: Produce innovative IT solutions and services based on global needs and trends.
PSO5: Societal and Environmental Concern: Utilize the data science theories for societal and environmental concerns.
PSO6: Professional Ethics: Understand and commit to professional ethics and

Assessment Pattern

60-40

Examination And Assessments

CIA-1

CIA-2

CIA-3

& OR MSE

Department Overview:
The Department of Data Science of Christ (Deemed to be University), Lavasa was started to shape outstanding Data Science and Analytics professionals with ethical and human values. The department offers the degrees of Bachelor of Science, Master of Science in Data Science, and Doctor of Philosophy in the areas of Computer Science and Engineering. The department has rich expertise in terms of faculty resources, with faculty well trained in fields such as Data Science, Data Security, Data Analytics, Artificial Intelligence, Machine Learning, Computer Vision, Algorithm Design, Computer Networking, Data Mining, Big Data, Text Mining, Knowledge Representation, Soft Computing, Cloud Computing, etc. The department has a wide variety of labs, namely the Machine Learning Lab, Data Analytics Lab, Open Source Lab, etc., dedicated to the hands-on training of students for their lab curriculum and research. The department periodically organizes hands-on workshops on recent technologies such as Machine Learning, Cloud Computing, Hadoop, etc. to keep students industry ready. The department equips students with a holistic education to be better citizens.
Mission Statement:
Vision: Enrich Ethical Scientific Excellence
Mission:
1. To develop Data Science professionals with ethical and social values.
2. To impart state-of-the-art knowledge in the area of Data Science and Analytics.
3. To encourage research and innovation.
4. To acquaint students with current industry practices, teamwork and entrepreneurship.
Introduction to Program:
Data science is an interdisciplinary response to this demand, and in our BSc degree programme students follow a carefully selected curriculum from Computer Science, Mathematics and Statistics. There are three general steps to becoming a data scientist: earn a bachelor's degree in IT, computer science, math, physics, or another related field; earn a master's degree in data science or a related field; and gain experience in the field you intend to work in (e.g., healthcare, physics, business). The best path to becoming a data scientist depends on an individual's background. Many people currently working in data science come from backgrounds in math, statistics, or computer science.
MSc Data Science: The MSc Data Science will provide students with the technical and practical skills to analyse the big data that is key to success in future business, digital media and science. The MSc Data Science provides training in data science methods, emphasising statistical perspectives. By the end of the programme, students will have received a thorough grounding in theory, as well as the technical and practical skills of data science. Students' theoretical learning will be at a high mathematical level, while the technical and practical skills they gain will enable them to apply advanced methods of data science and statistics to investigate real-world questions.
Program Objective:
Programme Educational Objectives (PEO)
PEO1: Ability to understand, analyze and design solutions with professional competency for real-world problems.
PEO2: Ability to develop software solutions for the requirements, based on critical analysis and research.
PEO3: Ability to function effectively in a team and as an individual in a multidisciplinary / multicultural environment.
PEO4: To provide a learning environment that fosters scientific excellence and promotes lifelong learning with understanding of professional responsibilities and obligations to clients and the public.
Programme Specific Outcomes
PSO1: Abstract Thinking: Ability to understand the abstract concepts that lead to various data science theories in Mathematics, Statistics and Computer Science.
PSO2: Problem Analysis and Design: Ability to identify, analyze and design solutions for data science problems using fundamental principles of mathematics, statistics, computing sciences, and relevant domain disciplines.
PSO3: Modern Software Tool Usage: Acquire the skills in handling data science programming tools towards problem solving and solution analysis for domain-specific problems.
PSO4: Innovation and Entrepreneurship: Produce innovative IT solutions and services based on global needs and trends.
PSO5: Societal and Environmental Concern: Utilize the data science theories for societal and environmental concerns.
PSO6: Professional Ethics: Understand and commit to professional ethics and cyber regulatio

MDS131 - MATHEMATICAL FOUNDATION FOR DATA SCIENCE - I (2019 Batch)

Total Teaching Hours for Semester:60
No of Lecture Hours/Week:4
Max Marks:100
Credits:4

Course Objectives/Course Description

 

Linear Algebra plays a fundamental role in the theory of Data Science. This course aims at introducing the basic notions of vector spaces, Linear Algebra and the use of Linear Algebra in applications to Data Science.

Learning Outcome

CO1: Understand the properties of Vector spaces

CO2: Use the properties of Linear Maps in solving problems on Linear Algebra

CO3: Demonstrate proficiency on the topics Eigenvalues, Eigenvectors and Inner Product Spaces

CO4: Apply mathematics for some applications in Data Science.

Unit-1
Teaching Hours:15
INTRODUCTION TO VECTOR SPACES
 

Vector Spaces: R^n and C^n, lists, F^n and digression on Fields, Definition of Vector spaces, Subspaces, Sums of Subspaces, Direct Sums, Span and Linear Independence, Bases, Dimension.

Unit-2
Teaching Hours:20
LINEAR MAPS
 

Definition of Linear Maps - Algebraic Operations on L(V, W) - Null spaces and Injectivity - Range and Surjectivity - Fundamental Theorems of Linear Maps - Representing a Linear Map by a Matrix - Invertible Linear Maps - Isomorphic Vector spaces - Linear Map as Matrix Multiplication - Operators - Products of Vector Spaces - Product of Direct Sum - Quotients of Vector spaces.

Unit-3
Teaching Hours:10
EIGENVALUES, EIGENVECTORS, AND INNER PRODUCT SPACES
 

Eigenvalues and Eigenvectors - Eigenvectors and Upper Triangular matrices - Eigenspaces and Diagonal Matrices - Inner Products and Norms - Linear functionals on Inner Product spaces.

Unit-4
Teaching Hours:15
MATHEMATICS APPLIED TO DATA SCIENCE
 

Singular value decomposition - Handwritten digits and simple algorithm - Classification of handwritten digits using SVD bases - Tangent distance - Text Mining.
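As an illustration of the "classification using SVD bases" topic in this unit, the following is a minimal sketch in Python with NumPy. It assumes the usual formulation: for each class, take the first k left singular vectors of that class's training matrix as a basis, then assign a new vector to the class whose basis leaves the smallest residual. The function names, the value of k, and the synthetic data standing in for digit images are all assumptions made for this example, not part of the syllabus.

```python
import numpy as np

def fit_svd_bases(train_by_class, k):
    """For each class, keep the first k left singular vectors of its training matrix.

    train_by_class maps a label to an array of shape (n_features, n_samples),
    with one training example per column.
    """
    bases = {}
    for label, A in train_by_class.items():
        U, _, _ = np.linalg.svd(A, full_matrices=False)
        bases[label] = U[:, :k]
    return bases

def classify(x, bases):
    """Return the label whose basis gives the smallest residual ||x - U U^T x||."""
    residuals = {
        label: np.linalg.norm(x - U @ (U.T @ x))
        for label, U in bases.items()
    }
    return min(residuals, key=residuals.get)

# Synthetic stand-in for digit images (hypothetical data, for illustration only):
# each class lies close to its own random 2-dimensional subspace of R^16.
rng = np.random.default_rng(0)
n_features, n_samples = 16, 30
directions = {label: rng.normal(size=(n_features, 2)) for label in (0, 1)}
train = {
    label: D @ rng.normal(size=(2, n_samples))
    + 0.01 * rng.normal(size=(n_features, n_samples))
    for label, D in directions.items()
}

bases = fit_svd_bases(train, k=2)

# A new point built from class 1's directions should be assigned to class 1.
test_point = directions[1] @ np.array([1.0, -2.0])
predicted = classify(test_point, bases)
print("predicted class:", predicted)
```

With real digit data, each column would be a flattened image and k would be chosen by validation; the sketch only shows the projection-and-residual step.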

Text Books And Reference Books:

[1] S. Axler, Linear algebra done right, Springer, 2017.

[2] L. Eldén, Matrix methods in data mining and pattern recognition, Society for Industrial and Applied Mathematics, 2007.

Essential Reading / Recommended Reading

[1] E. Davis, Linear algebra and probability for computer science applications, CRC Press, 2012.

[2] J. V. Kepner and J. R. Gilbert, Graph algorithms in the language of linear algebra, Society for Industrial and Applied Mathematics, 2011.

[3] D. A. Simovici, Linear algebra tools for data mining, World Scientific Publishing, 2012.

[4] P. N. Klein, Coding the matrix: linear algebra through applications to computer science, Newtonian Press, 2015.

Evaluation Pattern

CIA - 50%

ESE - 50%

MDS131L - MATHEMATICAL FOUNDATION FOR DATA SCIENCE - I (2019 Batch)

Total Teaching Hours for Semester:60
No of Lecture Hours/Week:4
Max Marks:100
Credits:4

Course Objectives/Course Description

 

Course Description

The course provides comprehensive understanding of vector spaces and the use of linear algebra for Data Science applications.

 

Course Objectives

Linear Algebra plays a fundamental role in the theory of Data Science. This course aims at introducing the basic notions of vector spaces, Linear Algebra and the use of Linear Algebra in applications to Data Science.

 

Learning Outcome

CO1: Understand the properties of Vector spaces

CO2: Use the properties of Linear Maps in solving problems on Linear Algebra

CO3: Demonstrate proficiency on the topics Eigenvalues, Eigenvectors and Inner Product Spaces

CO4: Apply mathematics for some applications in Data Science

Unit-1
Teaching Hours:15
INTRODUCTION TO VECTOR SPACES
 

Vector Spaces: R^n and C^n, lists, F^n and digression on Fields, Definition of Vector spaces, Subspaces, Sums of Subspaces, Direct Sums, Span and Linear Independence, Bases, Dimension.

 

Unit-2
Teaching Hours:20
LINEAR MAPS
 

Definition of Linear Maps - Algebraic Operations on L(V, W) - Null spaces and Injectivity - Range and Surjectivity - Fundamental Theorems of Linear Maps - Representing a Linear Map by a Matrix - Invertible Linear Maps - Isomorphic Vector spaces - Linear Map as Matrix Multiplication - Operators - Products of Vector Spaces - Product of Direct Sum - Quotients of Vector spaces.

Unit-3
Teaching Hours:10
EIGENVALUES, EIGENVECTORS, AND INNER PRODUCT SPACES
 

Eigenvalues and Eigenvectors - Eigenvectors and Upper Triangular matrices - Eigenspaces and Diagonal Matrices - Inner Products and Norms - Linear functionals on Inner Product spaces.

Unit-4
Teaching Hours:15
MATHEMATICS APPLIED TO DATA SCIENCE
 

Singular value decomposition - Handwritten digits and simple algorithm - Classification of handwritten digits using SVD bases - Tangent distance - Text Mining.

Text Books And Reference Books:

[1] S. Axler, Linear algebra done right, Springer, 2017.

[2] L. Eldén, Matrix methods in data mining and pattern recognition, Society for Industrial and Applied Mathematics, 2007.

Essential Reading / Recommended Reading

[1] E. Davis, Linear algebra and probability for computer science applications, CRC Press, 2012.

[2] J. V. Kepner and J. R. Gilbert, Graph algorithms in the language of linear algebra, Society for Industrial and Applied Mathematics, 2011.

[3] D. A. Simovici, Linear algebra tools for data mining, World Scientific Publishing, 2012.

[4] P. N. Klein, Coding the matrix: linear algebra through applications to computer science, Newtonian Press, 2015.

Evaluation Pattern

Component | Weightage
CIA I - A | 5%
CIA I - B | 5%
CIA II | 25%
CIA III | 10%
Attendance | 5%
ESE | 50%

MDS132 - PROBABILITY AND DISTRIBUTION THEORY (2019 Batch)

Total Teaching Hours for Semester:60
No of Lecture Hours/Week:4
Max Marks:100
Credits:4

Course Objectives/Course Description

 

To enable the students to understand the properties and applications of various probability functions.

Learning Outcome

CO1: Demonstrate random variables and their functions

CO2: Infer the expectations for random variable functions and generating functions.

CO3: Demonstrate various discrete and continuous distributions and their usage

Unit-1
Teaching Hours:10
ALGEBRA OF PROBABILITY
 

Algebra of sets - fields and sigma-fields, Inverse function - Measurable function – Probability measure on a sigma-field – simple properties - Probability space - Random variables and Random vectors – Induced Probability space – Distribution functions – Decomposition of distribution functions.

Unit-2
Teaching Hours:10
EXPECTATION AND MOMENTS OF RANDOM VARIABLES
 

Definitions and simple properties - Moment inequalities – Hölder and Jensen inequalities – Characteristic function – definition and properties – Inversion formula. Convergence of a sequence of random variables - convergence in distribution - convergence in probability, almost sure convergence and convergence in quadratic mean - Weak and Complete convergence of distribution functions – Helly-Bray theorem.

Unit-3
Teaching Hours:10
LAW OF LARGE NUMBERS
 

Khinchin's weak law of large numbers, Kolmogorov's strong law of large numbers (statement only) – Central Limit Theorem – Lindeberg-Lévy theorem, Lindeberg-Feller theorem (statement only), Liapounov theorem – Relation between Liapounov and Lindeberg-Feller forms – Radon-Nikodym theorem and derivative (without proof) – Conditional expectation – definition and simple properties.

Unit-4
Teaching Hours:10
DISTRIBUTION THEORY
 

Distribution of functions of random variables – Laplace, Cauchy, Inverse Gaussian, Lognormal, Logarithmic series and Power series distributions - Multinomial distribution - Bivariate Binomial – Bivariate Poisson – Bivariate Normal - Bivariate Exponential of Marshall and Olkin - Compound, truncated and mixture of distributions, Concept of convolution - Multivariate normal distribution (Definition and Concept only)

Unit-5
Teaching Hours:10
SAMPLING DISTRIBUTION
 

Sampling distributions: Non-central chi-square, t and F distributions and their properties - Distributions of quadratic forms under normality - Independence of a quadratic form and a linear form - Cochran’s theorem.

Unit-6
Teaching Hours:10
ORDER STATISTICS
 

Order statistics, their distributions and properties - Joint and marginal distributions of order statistics - Distribution of range and mid-range - Extreme values and their asymptotic distributions (concepts only) - Empirical distribution function and its properties – Kolmogorov-Smirnov distributions – Lifetime distributions - Exponential and Weibull distributions - Mills ratio – Distributions classified by hazard rate.

Text Books And Reference Books:

[1]. Modern Probability Theory, B.R Bhat, New Age International, 4th Edition, 2014.  

[2]. An Introduction to Probability and Statistics, V.K Rohatgi and Saleh, 3rd Edition, 2015.

Essential Reading / Recommended Reading

[1]. Introduction to the theory of statistics, A.M Mood, F.A Graybill and D.C Boes, Tata McGraw-Hill, 3rd Edition (Reprint), 2017.

[2]. Order Statistics, H.A David and H.N Nagaraja, John Wiley & Sons, 3rd Edition, 2003.

Evaluation Pattern

CIA - 50%

ESE - 50%

MDS132L - PROBABILITY AND DISTRIBUTION THEORY (2019 Batch)

Total Teaching Hours for Semester:60
No of Lecture Hours/Week:4
Max Marks:100
Credits:4

Course Objectives/Course Description

 

To enable the students to understand the properties and applications of various probability functions.

Learning Outcome

CO1: Demonstrate random variables and their functions

CO2: Infer the expectations for random variable functions and generating functions.

CO3: Demonstrate various discrete and continuous distributions and their usage

Unit-1
Teaching Hours:10
ALGEBRA OF PROBABILITY
 

Algebra of sets - fields and sigma-fields, Inverse function - Measurable function – Probability measure on a sigma-field – simple properties - Probability space - Random variables and Random vectors – Induced Probability space – Distribution functions – Decomposition of distribution functions.

Unit-2
Teaching Hours:10
EXPECTATION AND MOMENTS OF RANDOM VARIABLES
 

Definitions and simple properties - Moment inequalities – Hölder and Jensen inequalities – Characteristic function – definition and properties – Inversion formula. Convergence of a sequence of random variables - convergence in distribution - convergence in probability, almost sure convergence and convergence in quadratic mean - Weak and Complete convergence of distribution functions – Helly-Bray theorem.
 

Unit-3
Teaching Hours:10
LAW OF LARGE NUMBERS
 

Khinchin's weak law of large numbers, Kolmogorov's strong law of large numbers (statement only) – Central Limit Theorem – Lindeberg-Lévy theorem, Lindeberg-Feller theorem (statement only), Liapounov theorem – Relation between Liapounov and Lindeberg-Feller forms – Radon-Nikodym theorem and derivative (without proof) – Conditional expectation – definition and simple properties.

Unit-4
Teaching Hours:10
DISTRIBUTION THEORY
 

Distribution of functions of random variables – Laplace, Cauchy, Inverse Gaussian, Lognormal, Logarithmic series and Power series distributions - Multinomial distribution - Bivariate Binomial – Bivariate Poisson – Bivariate Normal - Bivariate Exponential of Marshall and Olkin - Compound, truncated and mixture of distributions, Concept of convolution - Multivariate normal distribution (Definition and Concept only)
 

Unit-5
Teaching Hours:10
SAMPLING DISTRIBUTION
 

Sampling distributions: Non-central chi-square, t and F distributions and their properties - Distributions of quadratic forms under normality - Independence of a quadratic form and a linear form - Cochran’s theorem.

Unit-6
Teaching Hours:10
ORDER STATISTICS
 

Order statistics, their distributions and properties - Joint and marginal distributions of order statistics - Distribution of range and mid-range - Extreme values and their asymptotic distributions (concepts only) - Empirical distribution function and its properties – Kolmogorov-Smirnov distributions – Lifetime distributions - Exponential and Weibull distributions - Mills ratio – Distributions classified by hazard rate.

Text Books And Reference Books:

[1]. Modern Probability Theory, B.R Bhat, New Age International, 4th Edition, 2014.

[2]. An Introduction to Probability and Statistics, V.K Rohatgi and Saleh, 3rd Edition, 2015.
 

Essential Reading / Recommended Reading

[1]. Introduction to the theory of statistics, A.M Mood, F.A Graybill and D.C Boes, Tata McGraw-Hill, 3rd Edition (Reprint), 2017.

[2]. Order Statistics, H.A David and H.N Nagaraja, John Wiley & Sons, 3rd Edition, 2003.
 
 

Evaluation Pattern

CIA-1

MSE

CIA-2

ESE

MDS133 - PRINCIPLES OF DATA SCIENCE (2019 Batch)

Total Teaching Hours for Semester:60
No of Lecture Hours/Week:4
Max Marks:100
Credits:4

Course Objectives/Course Description

 

To provide a strong foundation in data science and its related application areas, and to understand the underlying core concepts and emerging technologies in data science.

Learning Outcome

CO1: Understand the fundamental concepts of data science

CO2: Evaluate the data analysis techniques for applications handling large data

CO3: Demonstrate the various machine learning algorithms used in data science process

CO4: Understand the ethical practices of data science 

CO4: Visualize and present the inference using various tools

CO5: Learn to think through the ethics surrounding privacy, data sharing and algorithmic decision-making

Unit-1
Teaching Hours:10
INTRODUCTION TO DATA SCIENCE
 

Definition – Big Data and Data Science Hype – Why data science – Getting Past the Hype – The Current Landscape – Data Scientist - Data Science Process Overview – Defining goals – Retrieving data – Data preparation – Data exploration – Data modeling – Presentation.

Unit-2
Teaching Hours:10
BIG DATA
 

Problems when handling large data – General techniques for handling large data – Case study – Steps in big data – Distributing data storage and processing with Frameworks – Case study.

Unit-3
Teaching Hours:10
MACHINE LEARNING
 

Machine learning – Modeling Process – Training model – Validating model – Predicting new observations –Supervised learning algorithms – Unsupervised learning algorithms.

Unit-4
Teaching Hours:10
DEEP LEARNING
 

Introduction – Deep Feedforward Networks – Regularization – Optimization of Deep Learning – Convolutional Networks – Recurrent and Recursive Nets – Applications of Deep Learning.

Unit-5
Teaching Hours:10
DATA VISUALIZATION
 

Introduction to data visualization – Data visualization options – Filters – MapReduce – Dashboard development tools – Creating an interactive dashboard with dc.js – Summary.

Unit-6
Teaching Hours:10
ETHICS AND RECENT TRENDS
 

Data Science Ethics – Doing good data science – Owners of the data - Valuing different aspects of privacy - Getting informed consent - The Five Cs – Diversity – Inclusion – Future Trends.

Text Books And Reference Books:

[1]. Introducing Data Science, Davy Cielen, Arno D. B. Meysman, Mohamed Ali, Manning Publications Co., 1st edition, 2016

[2]. An Introduction to Statistical Learning: with Applications in R, Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, Springer, 1st edition, 2013

[3]. Deep Learning, Ian Goodfellow, Yoshua Bengio, Aaron Courville, MIT Press, 1st edition, 2016

[4]. Ethics and Data Science, D J Patil, Hilary Mason, Mike Loukides, O’Reilly, 1st edition, 2018

Essential Reading / Recommended Reading

[1]. Data Science from Scratch: First Principles with Python, Joel Grus, O’Reilly, 1st edition, 2015

[2]. Doing Data Science, Straight Talk from the Frontline, Cathy O'Neil, Rachel Schutt, O’Reilly, 1st edition, 2013

[3]. Mining of Massive Datasets, Jure Leskovec, Anand Rajaraman, Jeffrey David Ullman, Cambridge University Press, 2nd edition, 2014

Evaluation Pattern

CIA - 50%

ESE - 50%

MDS133L - PRINCIPLE OF DATA SCIENCE (2019 Batch)

Total Teaching Hours for Semester:60
No of Lecture Hours/Week:4
Max Marks:100
Credits:4

Course Objectives/Course Description

 

To provide a strong foundation in data science and its related application areas, and to understand the underlying core concepts and emerging technologies in data science.
 

Learning Outcome

CO1: Explore the fundamental concepts of data science

CO2: Understand data analysis techniques for applications handling large data

CO3: Understand various machine learning algorithms used in data science process

CO4: Visualize and present the inference using various tools

CO5: Learn to think through the ethics surrounding privacy, data sharing and algorithmic decision-making

Unit-1
Teaching Hours:10
INTRODUCTION TO DATA SCIENCE
 

Definition – Big Data and Data Science Hype – Why data science – Getting Past the Hype – The Current Landscape – Who is a Data Scientist? - Data Science Process Overview – Defining goals – Retrieving data – Data preparation – Data exploration – Data modeling – Presentation.

Unit-2
Teaching Hours:10
BIG DATA
 

Problems when handling large data – General techniques for handling large data – Case study – Steps in big data – Distributing data storage and processing with Frameworks – Case study.

Unit-3
Teaching Hours:10
MACHINE LEARNING
 

Machine learning – Modeling Process – Training model – Validating model – Predicting new observations –Supervised learning algorithms – Unsupervised learning algorithms.

Unit-4
Teaching Hours:10
DEEP LEARNING
 

Introduction – Deep Feedforward Networks – Regularization – Optimization of Deep Learning – Convolutional Networks – Recurrent and Recursive Nets – Applications of Deep Learning.

Unit-5
Teaching Hours:10
DATA VISUALIZATION
 

Introduction to data visualization – Data visualization options – Filters – MapReduce – Dashboard development tools – Creating an interactive dashboard with dc.js – Summary.

Unit-6
Teaching Hours:10
ETHICS AND RECENT TRENDS
 

Data Science Ethics – Doing good data science – Owners of the data - Valuing different aspects of privacy - Getting informed consent - The Five Cs – Diversity – Inclusion – Future Trends.

Text Books And Reference Books:

[1]. Introducing Data Science, Davy Cielen, Arno D. B. Meysman, Mohamed Ali, Manning Publications Co., 1st edition, 2016 

[2]. An Introduction to Statistical Learning: with Applications in R, Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, Springer, 1st edition, 2013

[3]. Deep Learning, Ian Goodfellow, Yoshua Bengio, Aaron Courville, MIT Press, 1st edition, 2016

[4]. Ethics and Data Science, D J Patil, Hilary Mason, Mike Loukides, O’Reilly, 1st edition, 2018

Essential Reading / Recommended Reading

[1]. Data Science from Scratch: First Principles with Python, Joel Grus, O’Reilly, 1st edition, 2015

[2]. Doing Data Science, Straight Talk from the Frontline, Cathy O'Neil, Rachel Schutt, O’Reilly, 1st edition, 2013

[3]. Mining of Massive Datasets, Jure Leskovec, Anand Rajaraman, Jeffrey David Ullman, Cambridge University Press, 2nd edition, 2014
 
 

Evaluation Pattern

60-40

MDS134 - RESEARCH METHODOLOGY (2019 Batch)

Total Teaching Hours for Semester:30
No of Lecture Hours/Week:2
Max Marks:50
Credits:2

Course Objectives/Course Description

 

This course is intended to assist students in planning and carrying out research. The students are exposed to the principles, procedures and techniques of implementing a research project. The course starts with an introduction to research and leads through the various methodologies involved in the research process. It focuses on finding the research gap from the literature using computer technology, introduces the basic statistics required for research, and covers reporting research outcomes scientifically with emphasis on research ethics.

Learning Outcome

CO1: Understand the essence of research and the necessity of defining a research problem.

CO2: Apply research methods and methodology including research design, data analysis, and interpretation.

CO3: Create scientific reports according to specified standards.

Unit-1
Teaching Hours:8
RESEARCH METHODOLOGY
 

Defining research problem: Selecting the problem, Necessity of defining the problem, Techniques involved in defining a problem - Ethics in Research.

Unit-2
Teaching Hours:8
RESEARCH DESIGN
 

Principles of experimental design, Working with Literature: Importance, Finding literature, Using your resources, Managing the literature, Keeping track of references, Using the literature, Literature review, On-line Searching: Database, SCIFinder, Scopus, Science Direct, Searching research articles, Citation Index, Impact Factor, H-index.

Unit-3
Teaching Hours:7
RESEARCH DATA
 

Measurement of Scaling: Quantitative, Qualitative, Classification of Measure scales, Data Collection, Data Preparation. 

Unit-4
Teaching Hours:7
REPORT WRITING
 

Scientific Writing and Report Writing: Significance, Steps, Layout, Types, Mechanics and Precautions, LaTeX: Introduction, Text, Tables, Figures, Equations, Citations, Referencing, and Templates (IEEE style), Paper writing for international journals, Writing scientific report.

Text Books And Reference Books:

[1] C. R. Kothari, Research Methodology Methods and Techniques, 3rd. ed. New Delhi: New Age International Publishers, Reprint 2014.

[2] Zina O’Leary, The Essential Guide to Doing Research, New Delhi: PHI, 2005.

Essential Reading / Recommended Reading

[1] J. W. Creswell, Research Design: Qualitative, Quantitative, and Mixed Methods Approaches, 4th ed. SAGE Publications, 2014.

[2] Kumar, Research Methodology: A Step by Step Guide for Beginners, 3rd. ed. Indian: PE, 2010. 

Evaluation Pattern

CIA - 50%

ESE - 50%

MDS134L - RESEARCH METHODOLOGY (2019 Batch)

Total Teaching Hours for Semester:30
No of Lecture Hours/Week:2
Max Marks:50
Credits:2

Course Objectives/Course Description

 

The research methodology module is intended to assist students in planning and carrying out research projects.

The students are exposed to the principles, procedures and techniques of implementing a research project.

The course starts with an introduction to research and carries through the various methodologies involved.

It continues with finding out the literature using computer technology, basic statistics required for research and ends with linear regression.

Learning Outcome

CO1: Define research and describe the research process and research methods

CO2: Understand and apply basic research methods including research design, data analysis, and interpretation

Unit-1
Teaching Hours:8
RESEARCH METHODOLOGY
 

Defining research problem

- selecting the problem

- necessity of defining the problem

- techniques involved in defining a problem

- Ethics in Research.

Unit-2
Teaching Hours:8
RESEARCH DESIGN
 

Principles of experimental design

Working with Literature: Importance, finding literature, using your resources, managing the literature, keep track of references, using the literature, literature review.

On-line Searching: Database – SCIFinder – Scopus - Science Direct - Searching research articles - Citation Index - Impact Factor - H-index etc.

Unit-3
Teaching Hours:7
RESEARCH DATA
 

Measurement of Scaling: Quantitative, Qualitative, Classification of Measure scales, Data Collection, Data Preparation.

Unit-4
Teaching Hours:7
REPORT WRITING
 

Scientific Writing and Report Writing: Significance, Steps, Layout, Types, Mechanics and Precautions. LaTeX: Introduction, text, tables, figures, equations, citations, referencing, and templates (IEEE style). Paper writing for international journals, writing scientific reports.

Text Books And Reference Books:

[1] C. R. Kothari, Research Methodology Methods and Techniques, 3rd. ed. New Delhi: New Age International Publishers, Reprint 2014.

[2] Zina O’Leary, The Essential Guide to Doing Research, New Delhi: PHI, 2005.

Essential Reading / Recommended Reading

[1] J. W. Creswell, Research Design: Qualitative, Quantitative, and Mixed Methods Approaches, 4th ed. SAGE Publications, 2014.

[2] Kumar, Research Methodology: A Step by Step Guide for Beginners, 3rd ed. India: PE, 2010.

Evaluation Pattern

CIA-1: Evaluated out of 20, marks converted to 10

CIA-2: Evaluated out of 50, marks converted to 25

CIA-3: Evaluated out of 20, marks converted to 10

Total CIA marks after conversion = 45

Attendance Marks = 5

ESE final Marks = 50
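
The conversion arithmetic in this evaluation pattern can be sketched as follows. This is only an illustration: the raw scores are invented, the helper name `convert` is hypothetical, and the ESE score is assumed to be already on the 50-mark scale.

```python
# Hypothetical helper: scale a raw score to its converted weight.
def convert(raw, out_of, converted_to):
    return raw * converted_to / out_of

# (raw score, evaluated out of, converted to); the raw scores are made up.
cia_components = [(16, 20, 10), (40, 50, 25), (18, 20, 10)]

cia_total = sum(convert(*c) for c in cia_components)   # maximum 45
attendance = 5    # maximum 5
ese = 38          # assumed to be on the 50-mark scale already
final_total = cia_total + attendance + ese

print(cia_total, final_total)
```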

 

 

 

MDS161A - INTRODUCTION TO STATISTICS (2019 Batch)

Total Teaching Hours for Semester:30
No of Lecture Hours/Week:2
Max Marks:50
Credits:2

Course Objectives/Course Description

 

To enable the students to understand the fundamentals of statistics to apply descriptive measures and probability for data analysis.

Learning Outcome

CO1: Demonstrate the history of statistics and present the data in various forms.

CO2: Infer the concept of correlation and regression for relating two or more related variables.

CO3: Demonstrate the probabilities for various events.

Unit-1
Teaching Hours:8
ORGANIZATION AND PRESENTATION OF DATA
 

Origin and development of Statistics, Scope, limitations and misuse of statistics. Types of data: primary, secondary, quantitative and qualitative data. Types of Measurements: nominal, ordinal, discrete and continuous data. Presentation of data by tables: construction of frequency distributions for discrete and continuous data, graphical representation of a frequency distribution by histogram and frequency polygon, cumulative frequency distributions.

Unit-2
Teaching Hours:8
DESCRIPTIVE STATISTICS
 

Measures of location or central tendency: Arithmetic mean, Median, Mode, Geometric mean, Harmonic mean. Partition values: Quartiles, Deciles and Percentiles. Measures of dispersion: Mean deviation, Quartile deviation, Standard deviation, Coefficient of variation. Moments: measures of Skewness, Kurtosis.
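
A minimal sketch of the measures listed in this unit, using Python's standard `statistics` module (Python 3.8 or later is assumed). The data values are invented for illustration, and the skewness and kurtosis shown are the moment-based coefficients m3/m2^1.5 and m4/m2^2, which is one of several conventions.

```python
import statistics as st

data = [2, 4, 4, 5, 7, 9, 11]   # made-up sample

# Measures of central tendency
mean = st.mean(data)
median = st.median(data)
mode = st.mode(data)
gmean = st.geometric_mean(data)
hmean = st.harmonic_mean(data)

# Partition values and measures of dispersion
q1, q2, q3 = st.quantiles(data, n=4)        # quartiles
quartile_deviation = (q3 - q1) / 2
mean_deviation = sum(abs(x - mean) for x in data) / len(data)
sd = st.pstdev(data)                        # population standard deviation
cv = sd / mean * 100                        # coefficient of variation, in %

# Central moments, then moment-based skewness and kurtosis
def central_moment(values, k):
    m = st.mean(values)
    return sum((x - m) ** k for x in values) / len(values)

m2, m3, m4 = (central_moment(data, k) for k in (2, 3, 4))
skewness = m3 / m2 ** 1.5
kurtosis = m4 / m2 ** 2

print(mean, median, mode, round(gmean, 3), round(hmean, 3))
print(q1, q3, quartile_deviation, round(sd, 3), round(cv, 1))
print(round(skewness, 3), round(kurtosis, 3))
```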

Unit-3
Teaching Hours:7
CORRELATION AND REGRESSION
 

Correlation: Scatter plot, Karl Pearson's coefficient of correlation, Spearman's rank correlation coefficient, multiple and partial correlations (for 3 variates only). Regression: Concept of errors, Principle of Least Squares, Simple linear regression and its properties.
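
The correlation coefficients and the least-squares line named above can be computed directly from their definitions. The sketch below uses invented data and hypothetical function names, and its ranking helper assumes there are no tied values.

```python
from math import sqrt

x = [1, 2, 3, 4, 5]                 # made-up paired observations
y = [2.1, 3.9, 6.2, 7.8, 10.1]

def mean(values):
    return sum(values) / len(values)

def sums_of_products(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy, sxx, syy

def pearson_r(x, y):
    sxy, sxx, syy = sums_of_products(x, y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    # Rank 1 = smallest value; assumes no ties.
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_rho(x, y):
    # Spearman's coefficient is Pearson's r applied to the ranks.
    return pearson_r(ranks(x), ranks(y))

def least_squares_line(x, y):
    # Slope and intercept minimising the sum of squared errors.
    sxy, sxx, _ = sums_of_products(x, y)
    slope = sxy / sxx
    intercept = mean(y) - slope * mean(x)
    return slope, intercept

r = pearson_r(x, y)
rho = spearman_rho(x, y)
slope, intercept = least_squares_line(x, y)
print(round(r, 4), rho, round(slope, 3), round(intercept, 3))
```

One property of simple linear regression that the sketch makes visible: the fitted line always passes through the point of means (x̄, ȳ).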

Unit-4
Teaching Hours:7
BASICS OF PROBABILITY
 

Random experiment, sample point and sample space, event, algebra of events. Definition of Probability: classical, empirical and axiomatic approaches to probability, properties of probability. Theorems on probability, conditional probability and independent events, Law of total probability, Bayes' theorem and its applications.
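
As a small worked illustration of the law of total probability and Bayes' theorem, consider a hypothetical factory with three machines. The production shares and defect rates below are invented; the function names are likewise only illustrative.

```python
def total_probability(priors, likelihoods):
    # P(B) = sum over i of P(A_i) * P(B | A_i)
    return sum(p * l for p, l in zip(priors, likelihoods))

def bayes_posteriors(priors, likelihoods):
    # P(A_i | B) = P(A_i) * P(B | A_i) / P(B)
    evidence = total_probability(priors, likelihoods)
    return [p * l / evidence for p, l in zip(priors, likelihoods)]

# Made-up example: share of output per machine and its defect rate.
priors = [0.5, 0.3, 0.2]            # P(machine i)
defect_rates = [0.01, 0.02, 0.03]   # P(defective | machine i)

p_defective = total_probability(priors, defect_rates)
posteriors = bayes_posteriors(priors, defect_rates)

print(round(p_defective, 4))
print([round(p, 4) for p in posteriors])
```

Although the first machine produces half of all items, it accounts for a smaller share of the defective ones than either of the other two, because its defect rate is the lowest.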

Text Books And Reference Books:

[1]. Rohatgi V.K and Saleh E, An Introduction to Probability and Statistics, 3rd edition, John Wiley & Sons Inc., New Jersey, 2015.

[2]. Gupta S.C and Kapoor V.K, Fundamentals of Mathematical Statistics, 11th edition, Sultan Chand & Sons, New Delhi, 2014.

Essential Reading / Recommended Reading

[1]. Mukhopadhyay P, Mathematical Statistics, Books and Allied (P) Ltd, Kolkata, 2015.

[2]. Walpole R.E, Myers R.H, and Myers S.L, Probability and Statistics for Engineers and Scientists, Pearson, New Delhi, 2017.

[3]. Montgomery D.C and Runger G.C, Applied Statistics and Probability for Engineers, Wiley India, New Delhi, 2013.

[4]. Mood A.M, Graybill F.A and Boes D.C, Introduction to the Theory of Statistics, McGraw Hill, New Delhi, 2008.

Evaluation Pattern

CIA - 50%

ESE - 50%

MDS161B - INTRODUCTION TO COMPUTERS AND PROGRAMMING (2019 Batch)

Total Teaching Hours for Semester:30
No of Lecture Hours/Week:2
Max Marks:50
Credits:2

Course Objectives/Course Description

 

To enable the students to understand the fundamental concepts of problem solving and programming structures.  

Learning Outcome

CO1: Demonstrate the systematic approach for problem solving using computers.

CO2: Apply different programming structure with suitable logic for computational problems.  

Unit-1
Teaching Hours:10
COMPUTERS AND DIGITAL BASICS
 

Number Representation – Decimal, Binary, Octal, Hexadecimal and BCD numbers – Binary Arithmetic – Binary addition – Unsigned and Signed numbers – one’s and two’s complements of Binary numbers – Arithmetic operations with signed numbers - Number system conversions – Boolean Algebra – Logic gates – Design of Circuits – K - Map
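
A short sketch of the number-system topics in this unit: base conversion, BCD, and one's and two's complement for a fixed word size. All function names are hypothetical and an 8-bit word is assumed in the examples.

```python
DIGITS = "0123456789ABCDEF"

def to_base(n, base):
    # Repeated division: remainders give the digits, least significant first.
    if n == 0:
        return "0"
    out = []
    while n > 0:
        n, remainder = divmod(n, base)
        out.append(DIGITS[remainder])
    return "".join(reversed(out))

def to_bcd(n):
    # Binary-coded decimal: each decimal digit becomes its own 4-bit group.
    return " ".join(format(int(d), "04b") for d in str(n))

def ones_complement(pattern, bits):
    # Flip every bit of the word.
    return pattern ^ ((1 << bits) - 1)

def encode_signed(value, bits):
    # Two's complement bit pattern of a signed integer.
    return value & ((1 << bits) - 1)

def decode_signed(pattern, bits):
    # A set sign bit means the pattern represents pattern - 2**bits.
    if pattern & (1 << (bits - 1)):
        return pattern - (1 << bits)
    return pattern

print(to_base(45, 2), to_base(45, 8), to_base(45, 16), to_bcd(45))
print(format(encode_signed(-5, 8), "08b"))

# Signed addition: add the patterns and discard the carry out of the word.
total = (encode_signed(7, 8) + encode_signed(-5, 8)) & 0xFF
print(decode_signed(total, 8))
```

The last lines show why two's complement is convenient: subtraction reduces to ordinary binary addition followed by dropping the carry.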

Unit-2
Teaching Hours:5
GENERAL PROBLEM SOLVING CONCEPTS
 

Types of Problems – Problem solving with Computers – Difficulties with problem solving – problem solving concepts for the Computer – Constants and Variables – Rules for Naming and using variables – Data types – numeric data – character data – logical data – rules for data types – examples of data types – storing the data in computer - Functions – Operators – Expressions and Equations

Unit-3
Teaching Hours:5
PLANNING FOR SOLUTION
 

Communicating with computer – organizing the solution – Analyzing the problem – developing the interactivity chart – developing the IPO chart – Writing the algorithms – drawing the flow charts – pseudocode – internal and external documentation – testing the solution – coding the solution – software development life cycle.

Unit-4
Teaching Hours:10
PROBLEM SOLVING
 

Introduction to programming structure – pointers for structuring a solution – modules and their functions – cohesion and coupling – problem solving with logic structure. Problem solving with decisions – the decision logic structure – straight through logic – positive logic – negative logic – logic conversion – decision tables – case logic structure -  examples.
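
To make the decision-structure topics concrete, the sketch below codes one invented discount rule twice: once as straight-through logic (every condition tested in sequence, with no else branches) and once as a decision table. The rule, the thresholds and the function names are assumptions chosen only for illustration.

```python
# Decision table: each combination of conditions maps to one action.
# Key = (is_member, order_over_100); value = discount in percent.
DECISION_TABLE = {
    (True, True): 15,
    (True, False): 10,
    (False, True): 5,
    (False, False): 0,
}

def discount_from_table(is_member, order_total):
    return DECISION_TABLE[(is_member, order_total > 100)]

def discount_straight_through(is_member, order_total):
    # Straight-through logic: all decisions are evaluated one after another.
    discount = 0
    if not is_member and order_total > 100:
        discount = 5
    if is_member and order_total <= 100:
        discount = 10
    if is_member and order_total > 100:
        discount = 15
    return discount

for member in (True, False):
    for total in (80, 150):
        print(member, total,
              discount_straight_through(member, total),
              discount_from_table(member, total))
```

Writing the table first and then checking the coded logic against every row is a simple way to confirm that no combination of conditions has been missed.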

Text Books And Reference Books:

[1] Thomas L. Floyd and R. P. Jain, “Digital Fundamentals”, 8th Edition, Pearson Education, 2007.

[2] Peter Norton, “Introduction to Computers”, 6th Edition, Tata McGraw Hill, New Delhi, 2006.

[3] Maureen Sprankle and Jim Hubbard, Problem Solving and Programming Concepts, 9th Edition, PHI, 2012.

Essential Reading / Recommended Reading

[1].  E Balagurusamy, Fundamentals of Computers, TMH, 2011

Evaluation Pattern

CIA - 50%

ESE - 50%

MDS161C - LINUX ADMINISTRATION (2019 Batch)

Total Teaching Hours for Semester:30
No of Lecture Hours/Week:2
Max Marks:50
Credits:2

Course Objectives/Course Description

 

To enable the students to excel on the Linux platform.

Learning Outcome

CO1: Demonstrate a systematic approach to configuring the Linux environment

CO2: Manage the Linux environment to work with open source data science tools

Unit-1
Teaching Hours:10
UNIT I
 

RHEL 7.5, breaking (resetting) the root password - Understand and use essential tools for handling files, directories, command-line environments, and documentation - Configure local storage using partitions and logical volumes

Unit-2
Teaching Hours:10
UNIT II
 

Swapping, extending LVM partitions, LVM snapshots - Manage users and groups, including use of a centralized directory for authentication

Unit-3
Teaching Hours:10
UNIT III
 

Kernel updates, yum and nmcli configuration, scheduling jobs with at and crontab - Configure firewall settings using firewall-config, firewall-cmd, or iptables, configure key-based authentication for SSH, set enforcing and permissive modes for SELinux, list and identify SELinux file and process contexts, restore default file contexts

Text Books And Reference Books:

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/

Essential Reading / Recommended Reading

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/

Evaluation Pattern

CIA - 50%

ESE - 50%

MDS161L - PROBLEM SOLVING AND PROGRAMMING CONCEPTS (2019 Batch)

Total Teaching Hours for Semester:30
No of Lecture Hours/Week:2
Max Marks:50
Credits:2

Course Objectives/Course Description

 

To enable the students to understand the fundamental concepts of problem solving and programming structures.  

Learning Outcome

CO1: Demonstrate the systematic approach for problem solving using computers.

CO2: Apply different programming structure with suitable logic for computational problems.
 

Unit-1
Teaching Hours:8
GENERAL PROBLEM SOLVING CONCEPTS
 

Types of Problems – Problem solving with Computers – Difficulties with problem solving – problem solving concepts for the Computer – Constants and Variables – Rules for Naming and using variables – Data types – numeric data – character data – logical data – rules for data types – examples of data types – storing the data in computer - Functions – Operators – Expressions and Equations

Unit-2
Teaching Hours:8
PLANNING FOR SOLUTION
 

Communicating with computer – organizing the solution – Analyzing the problem – developing the interactivity chart – developing the IPO chart – Writing the algorithms – drawing the flow charts – pseudocode – internal and external documentation – testing the solution – coding the solution – software development life cycle.

Unit-3
Teaching Hours:7
PROBLEM SOLVING - I
 

Introduction to programming structure – pointers for structuring a solution – modules and their functions – cohesion and coupling – problem solving with logic structure.
 

Unit-4
Teaching Hours:7
PROBLEM SOLVING - II
 


 
Problem solving with decisions – the decision logic structure – straight through logic – positive logic – negative logic – logic conversion – decision tables – case logic structure -  examples.  

Text Books And Reference Books:

[1]. Maureen Sprankle and Jim Hubbard, Problem solving and programming concepts, PHI, 9th Edition, 2012 
 

Essential Reading / Recommended Reading

[1]. E Balagurusamy, Fundamentals of Computers, TMH, 2011

Evaluation Pattern

CIA-1

CIA-2

CIA-3

MDS171 - DATA BASE TECHNOLOGIES (2019 Batch)

Total Teaching Hours for Semester:90
No of Lecture Hours/Week:6
Max Marks:150
Credits:5

Course Objectives/Course Description

 

The main objective of this course is to provide fundamental knowledge of, and practical experience with, database concepts. It covers the concepts and terminology needed to construct database tables and write effective queries. It also introduces the data warehouse and its functions.

Learning Outcome

CO1: Design conceptual models of a database using ER modeling

CO2: Create and populate a RDBMS for a real life application, with constraints and keys, using SQL

CO3: Retrieve any type of information from a data base by formulating complex queries in SQL

CO4: Demonstrate various databases

CO5: Distinguish database from data warehouse and examine ETL process

Unit-1
Teaching Hours:16
INTRODUCTION
 

Concept & Overview of DBMS, Data Models, Database Languages, Database Administrator, Database Users, Three Schema architecture of DBMS. Basic concepts, Design Issues, Mapping Constraints, Keys, Entity-Relationship Diagram, Weak Entity Sets, Extended E-R features

Lab Exercises

1.      Data Definition,

2.      Table Creation

3.      Specification of Constraints
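
The three lab exercises above (data definition, table creation, specification of constraints) can be tried without installing a database server by using SQLite through Python's built-in `sqlite3` module. This is only a sketch under that assumption: the syllabus does not name a DBMS, and the table and column names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when enabled

# Data definition: two tables with key, NOT NULL, UNIQUE, CHECK and foreign-key constraints.
conn.executescript("""
CREATE TABLE department (
    dept_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    age        INTEGER CHECK (age >= 17),
    dept_id    INTEGER NOT NULL REFERENCES department(dept_id)
);
""")

conn.execute("INSERT INTO department VALUES (1, 'Data Science')")
conn.execute("INSERT INTO student VALUES (1, 'Asha', 22, 1)")

def try_insert(sql):
    # Report whether the DBMS accepted the row or a constraint rejected it.
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

check_result = try_insert("INSERT INTO student VALUES (2, 'Ravi', 15, 1)")   # violates CHECK
fk_result = try_insert("INSERT INTO student VALUES (3, 'Mira', 23, 99)")     # no such department
print(check_result)
print(fk_result)
```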

Unit-2
Teaching Hours:16
RELATIONAL MODEL AND DATABASE DESIGN
 

SQL and Integrity Constraints, Concept of DDL, DML, DCL. Basic Structure, Set operations, Aggregate Functions, Null Values, Domain Constraints, Referential Integrity Constraints, assertions, views, Nested Subqueries, Functional Dependency, Different anomalies in designing a Database, Normalization : using functional dependencies, Boyce-Codd Normal Form, 4NF, 5NF

Lab Exercises

1.   Insert, Select, Update & Delete Commands

2.   Nested Queries & Join Queries

3.  Views
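
A compact sketch of the commands in these lab exercises, again assuming SQLite via Python's `sqlite3` module and invented sample data: it runs INSERT, UPDATE and DELETE, then a nested query, a join and a view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    name       TEXT,
    marks      INTEGER,
    dept_id    INTEGER REFERENCES department(dept_id)
);
INSERT INTO department VALUES (1, 'Data Science'), (2, 'Statistics');
INSERT INTO student VALUES
    (1, 'Asha', 82, 1), (2, 'Ravi', 67, 1), (3, 'Mira', 91, 2), (4, 'John', 58, 2);
""")

# DML: update one row, delete another.
conn.execute("UPDATE student SET marks = marks + 5 WHERE student_id = 4")
conn.execute("DELETE FROM student WHERE student_id = 2")

# Nested query: students scoring above the overall average.
above_average = conn.execute("""
    SELECT name FROM student
    WHERE marks > (SELECT AVG(marks) FROM student)
    ORDER BY name
""").fetchall()

# Join query: each student with the name of the department.
joined = conn.execute("""
    SELECT s.name, d.name
    FROM student AS s JOIN department AS d ON s.dept_id = d.dept_id
    ORDER BY s.student_id
""").fetchall()

# View: a stored query that can be selected from like a table.
conn.execute("""
    CREATE VIEW dept_summary AS
    SELECT d.name AS dept, COUNT(*) AS students, AVG(s.marks) AS avg_marks
    FROM student AS s JOIN department AS d ON s.dept_id = d.dept_id
    GROUP BY d.name
""")
summary = conn.execute("SELECT * FROM dept_summary ORDER BY dept").fetchall()

print(above_average)
print(joined)
print(summary)
```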

Unit-3
Teaching Hours:10
INTELLIGENT DATABASES
 

Active databases, Deductive Databases, Knowledge bases, Multimedia Databases, Multidimensional Data Structures, Image Databases, Text/Document Databases, Video Databases, Audio Databases, Multimedia Database Design.

Unit-4
Teaching Hours:16
DATA WAREHOUSE: THE BUILDING BLOCKS
 

Defining Features, Data Warehouses and Data Marts, Architectural Types, Overview of the Components, Metadata in the Data Warehouse, Data Design and Data Preparation: Principles of Dimensional Modeling, From Requirements to Data Design, The Star Schema, Star Schema Keys, Advantages of the Star Schema, Star Schema: Examples, Dimensional Modeling: Advanced Topics, Updates to the Dimension Tables, Miscellaneous Dimensions, The Snowflake Schema, Aggregate Fact Tables, Families of Stars
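
A star schema places one fact table of numeric measures at the centre, with foreign keys pointing to surrounding dimension tables. The following sketch builds a very small one in SQLite and runs a typical aggregate query; the table names, keys and sales figures are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes, one surrogate key each.
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT);

-- Fact table: measures plus one foreign key per dimension.
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key    INTEGER REFERENCES dim_date(date_key),
    units       INTEGER,
    revenue     REAL
);

INSERT INTO dim_product VALUES
    (1, 'Pen', 'Stationery'), (2, 'Notebook', 'Stationery'), (3, 'Mouse', 'Electronics');
INSERT INTO dim_date VALUES
    (20190801, '2019-08-01', '2019-08'), (20190902, '2019-09-02', '2019-09');
INSERT INTO fact_sales VALUES
    (1, 20190801, 10, 50.0), (2, 20190801, 4, 80.0), (3, 20190801, 1, 300.0),
    (1, 20190902, 6, 30.0),  (3, 20190902, 2, 600.0);
""")

# Typical star join: aggregate the facts, grouped by dimension attributes.
revenue_by_category_month = conn.execute("""
    SELECT p.category, d.month, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON f.product_key = p.product_key
    JOIN dim_date    AS d ON f.date_key = d.date_key
    GROUP BY p.category, d.month
    ORDER BY p.category, d.month
""").fetchall()

for row in revenue_by_category_month:
    print(row)
```

Storing the per-category, per-month totals in their own table would give a simple aggregate fact table of the kind this unit mentions.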

Unit-5
Teaching Hours:16
REQUIREMENTS, REALITIES, ARCHITECTURE AND DATA FLOW
 

Requirements, ETL Data Structures, Extracting, Cleaning and Conforming, Delivering Dimension Tables, Delivering Fact Tables (CH:1,2,3,4,5,6)

Lab Exercises:

1.      Importing source data structures

2.      Design Target Data Structures

3.      Create target structure

4.      Design and build the ETL mapping
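
The four lab steps above can be sketched end to end with the standard library: read a source structure, define the target structures, and map one onto the other with an extract, clean and load sequence. The CSV content, the cleaning rules (trim and title-case names, drop rows with a missing amount) and all table names are assumptions made for this illustration, not part of the syllabus.

```python
import csv
import io
import sqlite3

# Hypothetical source extract with typical quality problems.
SOURCE_CSV = """order_id,customer,city,amount
1, asha ,bengaluru,120.50
2,Ravi,Bengaluru,
3,mira,PUNE,75
"""

def extract(text):
    return list(csv.DictReader(io.StringIO(text)))

def clean(rows):
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue   # reject rows whose measure is missing
        cleaned.append({
            "order_id": int(row["order_id"]),
            "customer": row["customer"].strip().title(),   # conform name formatting
            "city": row["city"].strip().title(),
            "amount": float(amount),
        })
    return cleaned

def load(conn, rows):
    # Target structures: one dimension table and one fact table.
    conn.executescript("""
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY AUTOINCREMENT,
        customer TEXT, city TEXT,
        UNIQUE (customer, city)
    );
    CREATE TABLE fact_order (
        order_id INTEGER PRIMARY KEY,
        customer_key INTEGER REFERENCES dim_customer(customer_key),
        amount REAL
    );
    """)
    for r in rows:
        conn.execute("INSERT OR IGNORE INTO dim_customer (customer, city) VALUES (?, ?)",
                     (r["customer"], r["city"]))
        key = conn.execute(
            "SELECT customer_key FROM dim_customer WHERE customer = ? AND city = ?",
            (r["customer"], r["city"])).fetchone()[0]
        conn.execute("INSERT INTO fact_order VALUES (?, ?, ?)",
                     (r["order_id"], key, r["amount"]))

conn = sqlite3.connect(":memory:")
rows = clean(extract(SOURCE_CSV))
load(conn, rows)

customers = conn.execute(
    "SELECT customer, city FROM dim_customer ORDER BY customer_key").fetchall()
total_amount = conn.execute("SELECT SUM(amount) FROM fact_order").fetchone()[0]
print(customers, total_amount)
```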

Unit-6
Teaching Hours:16
IMPLEMENTATION, OPERATIONS AND ETL SYSTEMS:
 

Development, Operations, Metadata, Real-Time ETL Systems. (CH:7,8,9,11)

 

Lab Exercises:

1.      Perform the ETL process and transform into data map

2.      Create the cube and process it

3.      Generating Reports

4.      Creating the Pivot table and pivot chart using some existing data
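
A pivot table summarises records by placing one attribute on the rows, another on the columns, and an aggregate in each cell. The minimal sketch below does this with plain dictionaries on invented sales records; a lab would more likely use a spreadsheet or a BI tool, so this only illustrates the idea.

```python
from collections import defaultdict

# Made-up records: (region, quarter, sales).
records = [
    ("North", "Q1", 100), ("North", "Q2", 150),
    ("South", "Q1", 80),  ("South", "Q2", 120),
    ("North", "Q1", 50),
]

def pivot(rows):
    # Rows keyed by the first field, columns by the second, cells hold the sum.
    table = defaultdict(lambda: defaultdict(int))
    for row_key, col_key, value in rows:
        table[row_key][col_key] += value
    return {r: dict(cols) for r, cols in table.items()}

sales = pivot(records)
columns = sorted({col for cols in sales.values() for col in cols})

print("Region", *columns, "Total")
for region in sorted(sales):
    cells = [sales[region].get(col, 0) for col in columns]
    print(region, *cells, sum(cells))
```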

Text Books And Reference Books:

[1]. Henry F. Korth and Abraham Silberschatz, “Database System Concepts”, McGraw Hill.

[2]. Thomas Connolly and Carolyn Begg, “Database Systems: A Practical Approach to Design, Implementation and Management”, Third Edition, Pearson Education, 2007.

[3]. Ralph Kimball and Margy Ross, The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling, 2nd Edition, John Wiley & Sons, Inc., New York, USA, 2002.

Essential Reading / Recommended Reading

[1] Lior Rokach and Oded Maimon, Data Mining and Knowledge Discovery Handbook, Springer, 2nd edition, 2010.

Evaluation Pattern

CIA - 50%

ESE - 50%

MDS171L - DATABASE TECHNOLOGY LABORATORY (2019 Batch)

Total Teaching Hours for Semester:90
No of Lecture Hours/Week:6
Max Marks:50
Credits:5

Course Objectives/Course Description

 

 

The main objective of this course is to provide fundamental knowledge of, and practical experience with, database concepts. It covers the concepts and terminology needed to construct database tables and write effective queries. It also introduces the data warehouse and its functions.

Learning Outcome

 

CO1: Design conceptual models of a database using ER modeling

CO2: Create and populate a RDBMS for a real life application, with constraints and keys, using SQL

CO3: Retrieve any type of information from a data base by formulating complex queries in SQL

CO4: Demonstrate various databases

CO5: Distinguish database from data warehouse and examine ETL process

Unit-1
Teaching Hours:14
INTRODUCTION
 

Concept & Overview of DBMS, Data Models, Database Languages, Database Administrator, Database Users, Three Schema architecture of DBMS. Basic concepts, Design Issues, Mapping Constraints, Keys, Entity-Relationship Diagram, Weak Entity Sets, Extended E-R features

Unit-1
Teaching Hours:14
LAB EXERCISES
 

1.      Data Definition 

2.      Table Creation 

3.      Constraints

Unit-2
Teaching Hours:16
RELATIONAL MODEL AND DATABASE DESIGN
 

SQL and Integrity Constraints, Concept of DDL, DML, DCL. Basic Structure, Set operations, Aggregate Functions, Null Values, Domain Constraints, Referential Integrity Constraints, assertions, views, Nested Subqueries, Functional Dependency, Different anomalies in designing a Database, Normalization : using functional dependencies, Boyce-Codd Normal Form, 4NF, 5NF

Unit-2
Teaching Hours:16
LAB EXERCISES
 

1.      Insert, Select, Update & Delete Commands 

2.      Nested Queries & Join Queries 

3.      Views

 

Unit-3
Teaching Hours:14
INTELLIGENT DATABASES
 

Active databases, Deductive Databases, Knowledge bases, Multimedia Databases, Multidimensional Data Structures, Image Databases, Text/Document Databases, Video Databases, Audio Databases, Multimedia Database Design.

Unit-4
Teaching Hours:16
DATA WAREHOUSE: THE BUILDING BLOCKS
 

Defining Features, Data Warehouses and Data Marts, Architectural Types, Overview of the Components, Metadata in the Data Warehouse, Data Design and Data Preparation: Principles of Dimensional Modeling, From Requirements to Data Design, The Star Schema, Star Schema Keys, Advantages of the Star Schema, Star Schema: Examples, Dimensional Modeling: Advanced Topics, Updates to the Dimension Tables, Miscellaneous Dimensions, The Snowflake Schema, Aggregate Fact Tables, Families of Stars

Unit-5
Teaching Hours:14
REQUIREMENTS, REALITIES, ARCHITECTURE AND DATA FLOW
 

ETL Data Structures, Extracting, Cleaning and Conforming, Delivering Dimension Tables, Delivering Fact Tables (CH:1,2,3,4,5,6)

Unit-5
Teaching Hours:14
LAB EXERCISES
 

1.      Importing source data structures 

2.      Design Target Data Structures 

3.      Create target structure 

4.      Design and build the ETL mapping

Unit-6
Teaching Hours:16
IMPLEMENTATION, OPERATIONS AND ETL SYSTEMS
 

Development, Operations, Metadata, Real-Time ETL Systems. (CH:7,8,9,11)

Unit-6
Teaching Hours:16
LAB EXERCISES
 

1.      Perform the ETL process and transform into data map 

2.      Create the cube and process it 

3.      Generating Reports 

4.      Creating the Pivot table and pivot chart using some existing data

Text Books And Reference Books:

[1]. Henry F. Korth and Abraham Silberschatz, “Database System Concepts”, McGraw Hill.

[2]. Thomas Connolly and Carolyn Begg, “Database Systems: A Practical Approach to Design, Implementation and Management”, Third Edition, Pearson Education, 2007.

[3]. Ralph Kimball and Margy Ross, The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling, 2nd Edition, John Wiley & Sons, Inc., New York, USA, 2002.

Essential Reading / Recommended Reading

[1]   Lior Rokach and Oded Maimon, Data Mining and Knowledge Discovery Handbook, Springer, 2nd edition, 2010.

Evaluation Pattern

CIA - 50%

ESE - 50%

 

MDS172 - INFERENTIAL STATISTICS (2019 Batch)

Total Teaching Hours for Semester:90
No of Lecture Hours/Week:6
Max Marks:150
Credits:5

Course Objectives/Course Description

 

This course is designed to introduce the concepts of theory of estimation and testing of hypothesis. This paper also deals with the concept of parametric tests for large and small samples. It also provides knowledge about non-parametric tests and its applications.

Learning Outcome

CO1: Demonstrate the concepts of point and interval estimation of unknown parameters and their significance using large and small samples.

CO2: Apply the idea of sampling distributions of difference statistics in testing of hypotheses.

CO3: Infer the concept of nonparametric tests for single sample and two samples.

Unit-1
Teaching Hours:15
SUFFICIENT STATISTICS
 

 

Neyman - Fisher Factorisation theorem - the existence and construction of minimal sufficient statistics - Minimal sufficient statistics and exponential family - sufficiency and completeness - sufficiency and invariance.

Lab Exercise

  1. Drawing random samples using random number tables.
  2. Point estimation of parameters and obtaining estimates of standard errors.
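
Both exercises can be sketched in a few lines. Note the substitution: the code draws the sample with Python's pseudo-random generator rather than a printed random number table, and the population (normal with mean 50 and standard deviation 10) is invented.

```python
import random
import statistics as st
from math import sqrt

random.seed(2019)   # fixed seed so the run is repeatable

# Hypothetical population of 10,000 values.
population = [random.gauss(50, 10) for _ in range(10_000)]

# Simple random sample without replacement.
sample = random.sample(population, 100)

mean_hat = st.mean(sample)               # point estimate of the population mean
sd_hat = st.stdev(sample)                # sample standard deviation (n - 1 divisor)
se_mean = sd_hat / sqrt(len(sample))     # estimated standard error of the mean

print(round(mean_hat, 2), round(sd_hat, 2), round(se_mean, 2))
```

The standard error shrinks in proportion to 1/sqrt(n), so quadrupling the sample size halves it.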

 

Unit-2
Teaching Hours:15
UNBIASED ESTIMATION
 

Minimum variance unbiased estimation - locally minimum variance unbiased estimators - Rao-Blackwell theorem - Completeness: Lehmann-Scheffé theorems - Necessary and sufficient condition for unbiased estimators - Cramér-Rao lower bound - Bhattacharyya system of lower bounds in the 1-parameter regular case - Chapman-Robbins inequality

Lab Exercise

  1. Comparison of estimators by plotting mean square error.
  2. Computing maximum likelihood estimates - 1
  3. Computing maximum likelihood estimates - 2
  4. Computing moment estimates
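
One way to combine these exercises is to compare a maximum likelihood estimate with a moment estimate by simulated mean square error. The sketch assumes a Uniform(0, θ) model, chosen here only because both estimators have closed forms: the MLE is the sample maximum and the moment estimate is twice the sample mean. The values of θ, the sample size and the number of replications are arbitrary.

```python
import random

random.seed(1)
THETA, N, REPS = 10.0, 20, 2000   # true parameter, sample size, replications

def mle(sample):
    # The likelihood theta**(-n) on theta >= max(x) is maximised at max(x).
    return max(sample)

def moment_estimate(sample):
    # E[X] = theta / 2, so equate the sample mean to theta / 2.
    return 2 * sum(sample) / len(sample)

def simulated_mse(estimator):
    squared_errors = []
    for _ in range(REPS):
        sample = [random.uniform(0, THETA) for _ in range(N)]
        squared_errors.append((estimator(sample) - THETA) ** 2)
    return sum(squared_errors) / REPS

mse_mle = simulated_mse(mle)
mse_mom = simulated_mse(moment_estimate)
print(round(mse_mle, 3), round(mse_mom, 3))
```

Repeating the run over a grid of sample sizes and plotting the two curves gives the MSE comparison plot the first exercise asks for. The MLE is biased downward here, yet its MSE is the smaller of the two, which is a useful reminder that unbiasedness alone does not make an estimator better.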
Unit-3
Teaching Hours:15
MAXIMUM LIKELIHOOD ESTIMATION
 

Computational routines - strong consistency of maximum likelihood estimators - Asymptotic Efficiency of maximum likelihood estimators - Best Asymptotically Normal estimators - Method of moments - Bayes’ and minimax estimation: The structure of Bayes’ rules - Bayes’ estimators for quadratic and convex loss functions - minimax estimation - interval estimation.

Lab Exercise

  1. Constructing confidence intervals based on large samples.
  2. Constructing confidence intervals based on small samples.
  3. Generating random samples from discrete distributions.
  4. Generating random samples from continuous distributions.
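
The sketch below covers three of these exercises under stated assumptions: continuous samples are generated by the inverse transform method (an exponential distribution with rate 0.5 is assumed), discrete samples by scanning cumulative probabilities, and a large-sample 95% confidence interval for the mean uses the normal quantile from `statistics.NormalDist`. The small-sample interval needs a Student t quantile, which the standard library does not provide, so it is left out.

```python
import random
from math import log, sqrt
from statistics import NormalDist, mean, stdev

random.seed(7)

def draw_exponential(rate):
    # Inverse transform: if U ~ Uniform(0, 1), then -ln(1 - U) / rate ~ Exp(rate).
    return -log(1.0 - random.random()) / rate

def draw_discrete(values, probs):
    # Return the first value whose cumulative probability exceeds U.
    u = random.random()
    cumulative = 0.0
    for value, p in zip(values, probs):
        cumulative += p
        if u < cumulative:
            return value
    return values[-1]   # guard against rounding in the cumulative sum

sample = [draw_exponential(0.5) for _ in range(400)]          # true mean is 2
faces = [draw_discrete([1, 2, 3], [0.2, 0.5, 0.3]) for _ in range(10)]

# Large-sample interval: mean +/- z * s / sqrt(n)
z = NormalDist().inv_cdf(0.975)
half_width = z * stdev(sample) / sqrt(len(sample))
ci = (mean(sample) - half_width, mean(sample) + half_width)

print(faces)
print(round(ci[0], 3), round(ci[1], 3))
```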
Unit-4
Teaching Hours:15
HYPOTHESIS TESTING
 

Uniformly most powerful tests - the Neyman-Pearson fundamental Lemma - Distributions with monotone likelihood ratio - Problems - Generalization of the fundamental lemma, two sided hypotheses - testing the mean and variance of a normal distribution.

Lab Exercise

  1. Evaluation of probabilities of Type-I and Type-II errors and powers of tests.
  2. MP test for parameters of binomial and Poisson distributions.
  3. MP test for the mean of a normal distribution and power curve.
  4. Tests for mean, equality of means when variance is (i) known, (ii) unknown under normality (small and large samples)
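
For a normal mean with known variance, the most powerful test of H0: μ = μ0 against a simple alternative μ1 > μ0 rejects when the sample mean is large. Its error probabilities and power curve can be computed in closed form; the sketch below does so with invented values (μ0 = 50, σ = 10, n = 25) and a hypothetical function name.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def power_one_sided_z(mu0, mu1, sigma, n, alpha=0.05):
    # Reject H0 when the sample mean exceeds mu0 + z_alpha * sigma / sqrt(n).
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    shift = (mu1 - mu0) * sqrt(n) / sigma
    # Probability of rejecting H0 when the true mean is mu1.
    return 1 - STD_NORMAL.cdf(z_alpha - shift)

mu0, sigma, n = 50, 10, 25

type_1_error = power_one_sided_z(mu0, mu0, sigma, n)   # rejection probability under H0
power_at_55 = power_one_sided_z(mu0, 55, sigma, n)
type_2_error = 1 - power_at_55                         # failing to reject when mu = 55

# Power curve: rejection probability over a grid of true means.
power_curve = [(mu, power_one_sided_z(mu0, mu, sigma, n)) for mu in range(50, 61, 2)]

print(round(type_1_error, 3), round(type_2_error, 3), round(power_at_55, 3))
for mu, power in power_curve:
    print(mu, round(power, 3))
```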
Unit-5
Teaching Hours:15
MEAN TESTS
 

Unbiasedness for hypotheses testing - similarity and completeness - UMP unbiased tests for multi parameter exponential families - comparing two Poisson or Binomial populations - testing the parameters of a normal distribution (unbiased tests) - comparing the mean and variance of two normal distributions - Symmetry and invariance - maximal invariance - most powerful invariant tests.

Lab Exercise

  1. Tests for single proportion and equality of two proportions.
  2. Tests for variance and equality of two variances under normality
  3. Tests for correlation and regression coefficients.
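
As an example of the first exercise, here is a minimal sketch of the large-sample test for equality of two proportions, using the pooled estimate under H0 and a two-sided normal p-value. The counts are invented and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    # H0: p1 = p2, tested with the pooled proportion in the standard error.
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Made-up data: 60 successes out of 100 versus 40 out of 100.
z, p_value = two_proportion_z_test(60, 100, 40, 100)
print(round(z, 3), round(p_value, 4))
```

The normal approximation behind this test is reasonable only when both samples are large enough for the expected counts of successes and failures not to be small.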
Unit-6
Teaching Hours:15
SEQUENTIAL TESTS
 

SPRT procedures - likelihood ratio tests - locally most powerful tests - the concept of confidence sets - non parametric tests.

Lab Exercise

  1. Tests for the independence of attributes, analysis of categorical data and tests for the goodness of fit (for uniform, binomial and Poisson distributions).
  2. Nonparametric tests.
  3. SPRT for binomial proportion and mean of a normal distribution.
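
Wald's sequential probability ratio test updates a log likelihood ratio after each observation and stops as soon as it crosses one of two boundaries. The sketch below applies it to a binomial proportion, using Wald's approximate boundaries log((1 - β)/α) and log(β/(1 - α)). The hypotheses p0 = 0.3 and p1 = 0.7 and the observation sequences are invented for illustration.

```python
from math import log

def sprt_bernoulli(observations, p0, p1, alpha=0.05, beta=0.05):
    # Wald's approximate stopping boundaries on the log likelihood ratio.
    upper = log((1 - beta) / alpha)
    lower = log(beta / (1 - alpha))
    llr = 0.0
    for n, x in enumerate(observations, start=1):
        # Each observation adds log f1(x) / f0(x) to the running total.
        llr += log(p1 / p0) if x else log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "accept H1", n
        if llr <= lower:
            return "accept H0", n
    return "continue sampling", len(observations)

print(sprt_bernoulli([1] * 10, 0.3, 0.7))
print(sprt_bernoulli([0] * 10, 0.3, 0.7))
print(sprt_bernoulli([1, 0, 1, 0], 0.3, 0.7))
```

Unlike a fixed-sample test, the number of observations used here is itself random: clear-cut data stop the test early, while ambiguous data keep it running.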
Text Books And Reference Books:

[1]. Rajagopalan M and Dhanavanthan P, Statistical Inference, PHI Learning (P) Ltd, New Delhi, 2012.

[2]. An Introduction to Probability and Statistics, V.K Rohatgi and Saleh, 3rd Edition, 2015.

Essential Reading / Recommended Reading

[1]. Introduction to the theory of statistics, A.M Mood, F.A Graybill and D.C Boes, Tata McGraw-Hill, 3rd Edition (Reprint), 2017.

[2]. Linear Statistical Inference and its Applications, Rao C.R, Wiley Publications, 2nd Edition, 2001.

Evaluation Pattern

CIA - 50%

ESE - 50%

MDS172L - INFERENTIAL STATISTICAL LABORATORY (2019 Batch)

Total Teaching Hours for Semester:90
No of Lecture Hours/Week:6
Max Marks:150
Credits:5

Course Objectives/Course Description

 

This course is designed to introduce the concepts of theory of estimation and testing of hypothesis. This paper also deals with the concept of parametric tests for large and small samples. It also provides knowledge about non-parametric tests and its applications.

 

Learning Outcome

CO1: Demonstrate the concepts of point and interval estimation of unknown parameters and their significance using large and small samples.

CO2: Apply the idea of sampling distributions of difference statistics in testing of hypotheses.

CO3: Infer the concept of nonparametric tests for single sample and two samples.

Unit-1
Teaching Hours:15
SUFFICIENT STATISTICS
 

Neyman - Fisher Factorisation theorem - the existence and construction of minimal sufficient statistics - Minimal sufficient statistics and exponential family - sufficiency and completeness - sufficiency and invariance.

Unit-2
Teaching Hours:15
UNBIASED ESTIMATION
 

Minimum variance unbiased estimation - locally minimum variance unbiased estimators - Rao-Blackwell theorem - Completeness: Lehmann-Scheffé theorems - Necessary and sufficient condition for unbiased estimators - Cramér-Rao lower bound - Bhattacharyya system of lower bounds in the 1-parameter regular case - Chapman-Robbins inequality

Unit-3
Teaching Hours:15
MAXIMUM LIKELIHOOD ESTIMATION
 

Computational routines - strong consistency of maximum likelihood estimators - Asymptotic Efficiency of maximum likelihood estimators - Best Asymptotically Normal estimators - Method of moments - Bayes’ and minimax estimation: The structure of Bayes’ rules - Bayes’ estimators for quadratic and convex loss functions - minimax estimation - interval estimation.

Unit-4
Teaching Hours:15
HYPOTHESIS TESTING
 

Uniformly most powerful tests - the Neyman-Pearson fundamental Lemma - Distributions with monotone likelihood ratio - Problems - Generalization of the fundamental lemma, two sided hypotheses - testing the mean and variance of a normal distribution.

Unit-5
Teaching Hours:15
MEAN TESTS
 

Unbiasedness for hypotheses testing - similarity and completeness - UMP unbiased tests for multi parameter exponential families - comparing two Poisson or Binomial populations - testing the parameters of a normal distribution (unbiased tests) - comparing the mean and variance of two normal distributions - Symmetry and invariance - maximal invariance - most powerful invariant tests.

Unit-6
Teaching Hours:15
SEQUENTIAL TESTS
 

SPRT procedures - likelihood ratio tests - locally most powerful tests - the concept of confidence sets - non parametric tests.

Text Books And Reference Books:

[1]. Rajagopalan M and Dhanavanthan P, Statistical Inference, PHI Learning (P) Ltd, New Delhi, 2012.

[2]. An Introduction to Probability and Statistics, V.K Rohatgi and Saleh, 3rd Edition, 2015.

Essential Reading / Recommended Reading

[1]. Introduction to the theory of statistics, A.M Mood, F.A Graybill and D.C Boes, Tata McGraw-Hill, 3rd Edition (Reprint), 2017.

[2]. Linear Statistical Inference and its Applications, Rao C.R, Wiley Publications, 2nd Edition, 2001.

Evaluation Pattern

1) CIA COMPONENTS – EVALUATION RUBRICS

CIA 1: Component 1

Assignment Title: Multiple Choice question test for basics of interval estimation and point estimation

Assignment type: Individual

No of Test - 3

Assignment details:

1. Each learner will be given 20 questions in the classroom

2. Quiz will be taken on Moodle.

3. Each learner will be able to take up the test only once. A re-test will be conducted only for absentees, who have a genuine reason to justify.

4.  Maximum time limit for answering the questions is 30 minutes.

 

Tentative Date: 3rd week of August 2019

Venue: Classroom

Submission Format: To be handwritten in A4 size answer sheets.

Assignment Learning Objectives:

1. To enable the learner to understand the concepts of point and interval estimation.

2. To enable the learner to critically examine sampling distributions.

3. To enable the learner to understand how to obtain estimates of standard errors.

Assessment Strategies aligned to LO:

1. The correct answers will be evaluated and marks will be given to each question.

2.  Evaluation of the various scenarios included in problem solving.

 

CIA 1: Component 2

Assignment Title: Theory assignment on unbiased estimation and Maximum likelihood estimation.

No of Assignment - 6

Assignment type: Individual

Assignment details:

1. Each learner will be given 10 questions in the classroom on medium problem solving.

2. The learners are expected to answer in an A4 size sheet and hand it over to the course instructor before the timeline.

 

Tentative Date: Last week of September 2019

Submission Format: To be handwritten in A4 size sheets.

Assignment Learning Objectives:

1. To enable the learners to critically examine a real time problem and depict it in the form of sampling. 

2. To analyze the interpretation capability of the problem and its explanation capability by discussing all the scenarios.

 

Assessment Strategies aligned to LO:

1. Learner will be evaluated as per the following criteria.

2. Understanding and explanation of each concept is analyzed.

 

 

 

Rubrics

Area of Evaluation (Criteria) | 4 points | 3 points | 2 points | 1 point
Assignment Completeness | All questions are attempted | 75% of the work completed | 50% of the work completed | < 50%
Accuracy | All correct | 75% correct | 50% correct | < 50%
Knowledge | Shows complete understanding of the concept | Shows substantial understanding of the concept | Response shows some understanding of the problem | Completely lacking
Submissions | On or before the due date | Submits, but a day or two late | Needs constant reminders | Doesn’t submit
Legibility | Legible handwriting, neat diagrams | Marginally legible | Writing is not legible in places | Not legible
Total

 

 

 

CIA 3: Component 1

Assignment Title: Multiple Choice question test for basics of interval estimation and point estimation

Assignment type: Individual

No of Test - 3

Assignment details:

1. Each learner will be given 20 questions in the classroom

2. Quiz will be taken on Moodle.

3. Each learner will be able to take up the test only once. A re-test will be conducted only for absentees, who have a genuine reason to justify.

4.  Maximum time limit for answering the questions is 30 minutes.

 

Tentative Date: 3rd week of August 2019

Venue: Classroom

Submission Format: To be handwritten in A4 size answer sheets.

Assignment Learning Objectives:

1. To enable the learner to understand the concepts of point and interval estimation.

2. To enable the learner to critically examine sampling distributions.

3. To enable the learner to understand how to obtain estimates of standard errors.

Assessment Strategies aligned to LO:

1. The correct answers will be evaluated and marks will be given to each question.

2.  Evaluation of the various scenarios included in problem solving.

 

CIA 3: Component 2

Assignment Title: Theory assignment on unbiased estimation and Maximum likelihood estimation.

No of Assignment - 6

Assignment type: Individual

Assignment details:

1. Each learner will be given 10 questions in the classroom on medium problem solving.

2. The learners are expected to answer in an A4 size sheet and hand it over to the course instructor before the timeline.

 

Tentative Date: Last week of September 2019

Submission Format: To be handwritten in A4 size sheets.

Assignment Learning Objectives:

1. To enable the learners to critically examine a real time problem and depict it in the form of sampling. 

2. To analyze the interpretation capability of the problem and its explanation capability by discussing all the scenarios.

 

Assessment Strategies aligned to LO:

1. Learner will be evaluated as per the following criteria.

2. Understanding and explanation of each concept is analyzed.

 

 

 

Rubrics

Area of Evaluation (Criteria) | 4 points | 3 points | 2 points | 1 point
Assignment Completeness | All questions are attempted | 75% of the work completed | 50% of the work completed | < 50%
Accuracy | All correct | 75% correct | 50% correct | < 50%
Knowledge | Shows complete understanding of the concept | Shows substantial understanding of the concept | Response shows some understanding of the problem | Completely lacking
Submissions | On or before the due date | Submits, but a day or two late | Needs constant reminders | Doesn’t submit
Legibility | Legible handwriting, neat diagrams | Marginally legible | Writing is not legible in places | Not legible
Total

 

 

 

Laboratory Practices (coding): 2 hrs/week

 

1. Drawing random samples using random number tables.

2. Point estimation of parameters and obtaining estimates of standard errors.

3. Comparison of estimators by plotting mean square error.

4. Computing maximum likelihood estimates - 1

5. Computing maximum likelihood estimates - 2

6. Computing moment estimates

7. Constructing confidence intervals based on large samples.

8. Constructing confidence intervals based on small samples.

9. Generating random samples from discrete distributions.

10. Generating random samples from continuous distributions.

11. Evaluation of probabilities of Type-I and Type-II errors and powers of tests.

12. MP test for parameters of binomial and Poisson distributions.

13. MP test for the mean of a normal distribution and power curve.

14. Tests for mean, equality of means when variance is (i) known, (ii) unknown under normality (small and large samples).

15. Tests for single proportion and equality of two proportions.

16. Tests for variance and equality of two variances under normality

17. Tests for correlation and regression coefficients.

18. Tests for the independence of attributes, analysis of categorical data and tests for the goodness of fit (for uniform, binomial and Poisson distributions).

19. Nonparametric tests.

20. SPRT for binomial proportion and mean of a normal distribution.
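Two of the practices above, computing a maximum likelihood estimate and constructing a large-sample confidence interval, can be sketched as follows. This is only a minimal illustration under stated assumptions: the lab rubric refers to R functions, so the use of Python here is an assumption made for consistency with the Python course in this syllabus, and the data and function names (`poisson_mle`, `large_sample_ci`) are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def poisson_mle(sample):
    """For a Poisson sample, the MLE of the rate parameter is the sample mean."""
    return mean(sample)


def large_sample_ci(sample, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(sample)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    half_width = z * stdev(sample) / math.sqrt(n)    # z times the estimated standard error
    centre = mean(sample)
    return centre - half_width, centre + half_width


# Hypothetical count data (n = 40), invented for illustration only.
counts = [2, 3, 1, 4, 2, 3, 5, 2, 1, 3] * 4

rate_hat = poisson_mle(counts)
low, high = large_sample_ci(counts)
print(f"MLE of rate: {rate_hat:.2f}")
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```

The interval is symmetric about the sample mean because it relies on the normal approximation; for small samples a t-based interval would be the usual alternative.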

 

Tentative Date:

Venue: Classroom and Laboratory

Laboratory: coding (2 hrs/week)

Submission Format: Program to be executed.

Laboratory: Through observation and record

Assignment Learning Objectives:

1. To recognize and apply the sampling distribution in the given sample.

2. To understand and implement a problem logically.

3. To critically examine a problem and draw the correct inference.

Assessment Strategies aligned to LO:

1. The use of concepts to evaluate and solve a specific problem is assessed. The inference drawn from the results must be stated accurately. The accuracy and relevance of the results obtained are assessed.

Technology and Tools used: LMS to upload the screenshot of the result

Evaluation Rubrics

OBSERVATION: 25 marks

 

Submission deadlines

Evaluated out of 25 if executed within the lab hours

Evaluated out of 20 if shown on the same day after the lab

Evaluated out of 15 if shown before the next lab; after that, 0 marks will be awarded.

 

 

 


Evaluation Matrix

Parameters | 4 – 5 | 3 | 1 – 2
Sampling Distribution | Exemplary | Competent | Needs improvement
R Functions | Exemplary | Competent | Needs improvement
Conceptual Clarity | Exemplary | Competent | Needs improvement
Inference | Exemplary | Competent | Needs improvement

MDS173 - PROGRAMMING FOR DATA SCIENCE IN PYTHON (2019 Batch)

Total Teaching Hours for Semester:90
No of Lecture Hours/Week:6
Max Marks:100
Credits:4

Course Objectives/Course Description

 

The objective of this course is to provide comprehensive knowledge of the Python programming paradigms required for Data Science.

Learning Outcome

CO1: Demonstrate the usage of built-in objects in Python

CO2: Analyze the significance of the Python program development environment by working on real-world examples

CO3: Implement numerical programming, data handling and visualization through the NumPy, Pandas and Matplotlib modules.

Unit-1
Teaching Hours:17
INTRODUCTION TO PYTHON
 

Structure of Python Program-Underlying mechanism of Module Execution-Branching and Looping-Problem Solving Using Branches and Loops-Functions - Lists and Mutability- Problem Solving Using Lists and Functions

Lab Exercises

1.      Demonstrate usage of branching and looping statements

2.      Demonstrate Recursive functions

3.      Demonstrate Lists
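
The three exercises above could be approached along the following lines. This is a minimal sketch, not a prescribed solution; the function names (`classify`, `factorial`, `append_square`) are hypothetical and chosen only for illustration.

```python
def classify(n):
    """Branching: label an integer as negative, zero, or positive."""
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    return "positive"


def factorial(n):
    """Recursion: n! is defined in terms of (n - 1)!."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1 if n <= 1 else n * factorial(n - 1)


def append_square(values, x):
    """Lists are mutable: this modifies the caller's list in place."""
    values.append(x * x)


# Looping over a tuple with a list comprehension.
labels = [classify(n) for n in (-2, 0, 7)]

# Looping with range(); the same list object is mutated on every call.
squares = []
for i in range(1, 4):
    append_square(squares, i)

print(labels)        # ['negative', 'zero', 'positive']
print(factorial(5))  # 120
print(squares)       # [1, 4, 9]
```

Because `append_square` receives a reference to the list rather than a copy, the changes are visible to the caller, which is the behaviour the "Lists and Mutability" topic refers to.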

Unit-2
Teaching Hours:17
SEQUENCE DATATYPES AND OBJECT-ORIENTED PROGRAMMING
 

Sequences, Mapping and Sets-Dictionaries-Classes: Classes and Instances-Inheritance-Exception Handling-Introduction to Regular Expressions using the “re” module.

Lab Exercises

1.      Demonstrate Tuples and Sets

2.      Demonstrate Dictionaries

3.      Demonstrate inheritance and exception handling

4.      Demonstrate use of the “re” module
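
A compact sketch touching each of the exercises above is given here. The class names (`Shape`, `Rectangle`) and the helper `safe_area` are hypothetical; the course codes in the sample string are taken from this syllabus, and the regular expression is only one possible pattern for them.

```python
import re


class Shape:
    """Base class; subclasses are expected to override area()."""

    def __init__(self, name):
        self.name = name

    def area(self):
        raise NotImplementedError("subclasses must implement area()")


class Rectangle(Shape):
    """Inheritance: Rectangle reuses Shape's initialiser through super()."""

    def __init__(self, width, height):
        super().__init__("rectangle")
        self.width, self.height = width, height

    def area(self):
        return self.width * self.height


def safe_area(shape):
    """Exception handling: return None when a shape has no area() defined."""
    try:
        return shape.area()
    except NotImplementedError:
        return None


point = (3, 4)            # tuple: an immutable sequence
unique = {1, 2, 2, 3}     # set: duplicates are discarded

# Dictionary used as a word counter.
word_counts = {}
for word in "to be or not to be".split():
    word_counts[word] = word_counts.get(word, 0) + 1

# "MDS", three digits, then an optional capital letter.
codes = re.findall(r"MDS\d{3}[A-Z]?", "MDS131, MDS161A and MDS173L")

print(Rectangle(2, 3).area(), safe_area(Shape("generic")))
print(unique, word_counts, codes)
```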

Unit-3
Teaching Hours:13
USING NUMPY
 

Basics of NumPy-Computation on NumPy-Aggregations-Computation on Arrays-Comparisons, Masks and Boolean Arrays-Fancy Indexing-Sorting Arrays-Structured Data: NumPy’s Structured Array.

Lab Exercises

1.      Demonstrate Aggregation

2.      Demonstrate Indexing and Sorting
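
The aggregation, indexing and sorting exercises might look like the sketch below. It assumes NumPy is installed; the `marks` array is hypothetical data (rows as learners, columns as tests) invented for illustration.

```python
import numpy as np

# Hypothetical marks: one row per learner, one column per test.
marks = np.array([[72, 85, 90],
                  [60, 58, 77],
                  [88, 92, 95]])

# Aggregations along an axis.
col_means = marks.mean(axis=0)   # mean of each test (down the rows)
row_max = marks.max(axis=1)      # best mark of each learner

# Boolean mask: keeps the elements greater than 80, in row-major order.
above_80 = marks[marks > 80]

# Fancy indexing: picks the elements at positions (0, 1) and (2, 0).
picked = marks[[0, 2], [1, 0]]

# Sorting: argsort gives the row order that sorts learners by the first test.
order = np.argsort(marks[:, 0])
sorted_rows = marks[order]

print(col_means)
print(row_max, above_80, picked)
print(sorted_rows)
```

Using `argsort` rather than `np.sort` keeps each learner's row intact while reordering, which is usually what is wanted for tabular data.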

Unit-4
Teaching Hours:13
DATA MANIPULATION WITH PANDAS -I
 

Introduction to Pandas Objects-Data indexing and Selection-Operating on Data in Pandas-Handling Missing Data-Hierarchical Indexing - Combining Data Sets

Lab Exercises

1.      Demonstrate handling of missing data

2.      Demonstrate hierarchical indexing
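
Both exercises above can be illustrated briefly as follows. This sketch assumes pandas and NumPy are installed; the values and the index labels ("sem1", "CIA" and so on) are hypothetical.

```python
import numpy as np
import pandas as pd

# Handling missing data: NaN marks the missing scores.
scores = pd.Series([70.0, np.nan, 85.0, np.nan])
n_missing = int(scores.isnull().sum())   # count of missing entries
filled = scores.fillna(scores.mean())    # replace NaN with the mean of observed values
dropped = scores.dropna()                # or discard the missing entries

# Hierarchical indexing: a Series indexed by (semester, component).
index = pd.MultiIndex.from_product(
    [["sem1", "sem2"], ["CIA", "ESE"]],
    names=["semester", "component"],
)
marks = pd.Series([40, 45, 38, 42], index=index)

sem1 = marks.loc["sem1"]                   # select on the outer level
ese = marks.xs("ESE", level="component")   # cross-section on the inner level
wide = marks.unstack()                     # inner level becomes the columns

print(n_missing, filled.tolist(), len(dropped))
print(sem1, ese, wide, sep="\n\n")
```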

Unit-5
Teaching Hours:17
DATA MANIPULATION WITH PANDAS -II
 

Aggregation and Grouping-Pivot Tables-Vectorized String Operations-Working with Time Series-High-Performance Pandas: eval() and query()

Lab Exercises

1.      Demonstrate usage of Pivot table

2.      Demonstrate use of eval() and query()
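
A small sketch of a pivot table together with `DataFrame.eval()` and `DataFrame.query()` is given here. It assumes pandas is installed; the DataFrame contents and column names are hypothetical.

```python
import pandas as pd

# Hypothetical long-format marks table.
df = pd.DataFrame({
    "semester": ["sem1", "sem1", "sem2", "sem2"],
    "component": ["CIA", "ESE", "CIA", "ESE"],
    "marks": [40, 45, 38, 42],
    "max_marks": [50, 50, 50, 50],
})

# Pivot table: one row per semester, one column per component.
pivot = df.pivot_table(values="marks", index="semester",
                       columns="component", aggfunc="mean")

# eval(): compute a new column from a string expression over column names.
df["percent"] = df.eval("100 * marks / max_marks")

# query(): filter rows with a string expression.
high = df.query("percent >= 80 and component == 'ESE'")

print(pivot)
print(high)
```

On a table this small the string-expression forms offer no speed benefit; they are shown only for their syntax, which can be more readable than chained boolean masks.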

Unit-6
Teaching Hours:13
VISUALIZATION AND MATPLOTLIB
 

Basic functions of matplotlib-Simple Line Plot, Scatter Plot-Density and Contour Plots-Histograms, Binnings and Density-Customizing Plot Legends, Colour Bars-Three-Dimensional Plotting in Matplotlib.

Lab Exercises

1.      Demonstrate Scatter Plot

2.      Demonstrate 3D plotting
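
The scatter plot and 3D plotting exercises could be sketched as below. This assumes Matplotlib and NumPy are installed; the data are randomly generated for illustration, and the non-interactive "Agg" backend is chosen here only so that the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401  (registers "3d" on older Matplotlib)

# Hypothetical data: y depends linearly on x, plus noise.
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 2 * x + rng.normal(scale=0.5, size=50)

fig = plt.figure(figsize=(10, 4))

# Scatter plot, with points coloured by their y value.
ax2d = fig.add_subplot(1, 2, 1)
ax2d.scatter(x, y, c=y, cmap="viridis")
ax2d.set_xlabel("x")
ax2d.set_ylabel("y")

# 3D surface plot of z = sin(sqrt(u^2 + v^2)) on a grid.
ax3d = fig.add_subplot(1, 2, 2, projection="3d")
u = np.linspace(-3, 3, 40)
U, V = np.meshgrid(u, u)
ax3d.plot_surface(U, V, np.sin(np.sqrt(U**2 + V**2)), cmap="viridis")

# fig.savefig("unit6_demo.png")  # hypothetical file name; uncomment to write the figure
```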

Text Books And Reference Books:

[1]. Jake VanderPlas, Python Data Science Handbook - Essential Tools for Working with Data, O’Reilly Media, Inc., 2016

[2]. Y. Zhang, An Introduction to Python and Computer Programming, Springer Publications, 2016

Essential Reading / Recommended Reading

[1]. Joel Grus, Data Science from Scratch: First Principles with Python, O’Reilly Media, 2016

[2]. T. R. Padmanabhan, Programming with Python, Springer Publications, 2016

Evaluation Pattern

CIA - 100%

MDS173L - PROGRAMMING FOR DATA SCIENCE IN PYTHON (2019 Batch)

Total Teaching Hours for Semester:90
No of Lecture Hours/Week:6
Max Marks:100
Credits:4

Course Objectives/Course Description

 

The objective of this course is to provide knowledge of the Python programming paradigms required for Data Science.

Learning Outcome

CO1: Understand and demonstrate the usage of built-in objects in Python

CO2: Analyze the significance of the Python program development environment and apply it to solve real-world applications

CO3: Implement numerical programming, data handling and visualization through the NumPy, Pandas and Matplotlib modules.

Unit-1
Teaching Hours:17
INTRODUCTION TO PYTHON
 

Structure of Python Program-Underlying mechanism of Module Execution-Branching and Looping-Problem Solving Using Branches and Loops-Functions - Lists and Mutability- Problem Solving Using Lists and Functions

Unit-2
Teaching Hours:17
SEQUENCE DATATYPES AND OBJECT-ORIENTED PROGRAMMING
 

Sequences, Mapping and Sets-Dictionaries-Classes: Classes and Instances-Inheritance-Exception Handling-Introduction to Regular Expressions using the “re” module.

Unit-3
Teaching Hours:13
USING NUMPY
 

Basics of NumPy-Computation on NumPy-Aggregations-Computation on Arrays- Comparisons, Masks and Boolean Arrays-Fancy Indexing-Sorting Arrays-Structured Data: NumPy’s Structured Array.

Unit-4
Teaching Hours:13
DATA MANIPULATION WITH PANDAS -I