1 Semester - 2019 - Batch |
Course Code | Course | Type | Hours Per Week | Credits | Marks |
MDS131 | MATHEMATICAL FOUNDATION FOR DATA SCIENCE - I | - | 4 | 4 | 100 |
MDS132 | PROBABILITY AND DISTRIBUTION THEORY | - | 4 | 4 | 100 |
MDS133 | PRINCIPLES OF DATA SCIENCE | - | 4 | 4 | 100 |
MDS134 | RESEARCH METHODOLOGY | - | 2 | 2 | 50 |
MDS161A | INTRODUCTION TO STATISTICS | - | 2 | 2 | 50 |
MDS161B | INTRODUCTION TO COMPUTERS AND PROGRAMMING | - | 2 | 2 | 50 |
MDS161C | LINUX ADMINISTRATION | - | 2 | 2 | 50 |
MDS171 | DATA BASE TECHNOLOGIES | - | 6 | 5 | 150 |
MDS172 | INFERENTIAL STATISTICS | - | 6 | 5 | 150 |
MDS173 | PROGRAMMING FOR DATA SCIENCE IN PYTHON | - | 6 | 4 | 100 |
2 Semester - 2019 - Batch |
Course Code | Course | Type | Hours Per Week | Credits | Marks |
MDS231 | MATHEMATICAL FOUNDATION FOR DATA SCIENCE - II | - | 4 | 4 | 100 |
MDS232 | REGRESSION ANALYSIS | - | 4 | 4 | 100 |
MDS233 | DESIGN AND ANALYSIS OF ALGORITHMS | - | 4 | 4 | 100 |
MDS234 | MACHINE LEARNING | - | 4 | 4 | 100 |
MDS241A | MULTIVARIATE ANALYSIS | - | 4 | 4 | 100 |
MDS241B | STOCHASTIC PROCESS | - | 4 | 4 | 100 |
MDS271 | PROGRAMMING FOR DATA SCIENCE IN R | - | 6 | 4 | 100 |
MDS272A | HADOOP | - | 6 | 5 | 150 |
MDS272B | IMAGE AND VIDEO ANALYTICS | - | 6 | 5 | 150 |
MDS272C | INTERNET OF THINGS | - | 6 | 5 | 150 |
MDS281 | RESEARCH PROBLEM IDENTIFICATION AND DATA COLLECTION | - | 1 | 0 | 0 |
| |
Introduction to Program: | |
Data Science is widely used across academia, business sectors, and research and development to make effective decisions in day-to-day activities. MSc in Data Science is a two-year programme with four semesters. This programme aims to give all candidates the opportunity to master the skill sets specific to data science with a research bent. The curriculum supports students in obtaining adequate knowledge of the theory of data science, with hands-on experience in relevant domains and tools. Candidates gain exposure to research models and industry-standard applications in data science through guest lectures, seminars, projects, internships, etc. | |
Assessment Pattern | |
CIA - 50% ESE - 50% | |
Examination And Assessments | |
CIA - 50% ESE - 50% |
MDS131 - MATHEMATICAL FOUNDATION FOR DATA SCIENCE - I (2019 Batch) | |
Total Teaching Hours for Semester:60 |
No of Lecture Hours/Week:4 |
Max Marks:100 |
Credits:4 |
Course Objectives/Course Description |
|
Linear Algebra plays a fundamental role in the theory of Data Science. This course aims at introducing the basic notions of vector spaces, Linear Algebra and the use of Linear Algebra in applications to Data Science. |
|
Course Outcome |
|
CO1: Understand the properties of Vector spaces CO2: Use the properties of Linear Maps in solving problems on Linear Algebra CO3: Demonstrate proficiency on the topics Eigenvalues, Eigenvectors and Inner Product Spaces CO4: Apply mathematics for some applications in Data Science. |
Unit-1 |
Teaching Hours:15 |
INTRODUCTION TO VECTOR SPACES
|
|
Vector Spaces: R^n and C^n, lists, F^n and digression on Fields, Definition of Vector spaces, Subspaces, sums of Subspaces, Direct Sums, Span and Linear Independence, bases, dimension. | |
Unit-2 |
Teaching Hours:20 |
LINEAR MAPS
|
|
Definition of Linear Maps - Algebraic Operations on L(V, W) - Null spaces and Injectivity - Range and Surjectivity - Fundamental Theorems of Linear Maps - Representing a Linear Map by a Matrix - Invertible Linear Maps - Isomorphic Vector spaces - Linear Map as Matrix Multiplication - Operators - Products of Vector Spaces - Product of Direct Sum - Quotients of Vector spaces. | |
Unit-3 |
Teaching Hours:10 |
EIGENVALUES, EIGENVECTORS, AND INNER PRODUCT SPACES
|
|
Eigenvalues and Eigenvectors - Eigenvectors and Upper Triangular matrices - Eigenspaces and Diagonal Matrices - Inner Products and Norms - Linear functionals on Inner Product spaces. | |
Unit-4 |
Teaching Hours:15 |
MATHEMATICS APPLIED TO DATA SCIENCE
|
|
Singular value decomposition - Handwritten digits and simple algorithm - Classification of handwritten digits using SVD bases - Tangent distance - Text Mining. | |
Text Books And Reference Books: [1] S. Axler, Linear algebra done right, Springer, 2017. [2] Eldén Lars, Matrix methods in data mining and pattern recognition, Society for Industrial and Applied Mathematics, 2007. | |
Essential Reading / Recommended Reading [1] E. Davis, Linear algebra and probability for computer science applications, CRC Press, 2012. [2] J. V. Kepner and J. R. Gilbert, Graph algorithms in the language of linear algebra, Society for Industrial and Applied Mathematics, 2011. [3] D. A. Simovici, Linear algebra tools for data mining, World Scientific Publishing, 2012. [4] P. N. Klein, Coding the matrix: linear algebra through applications to computer science, Newtonian Press, 2015. | |
Evaluation Pattern CIA - 50% ESE - 50% | |
MDS132 - PROBABILITY AND DISTRIBUTION THEORY (2019 Batch) | |
Total Teaching Hours for Semester:60 |
No of Lecture Hours/Week:4 |
Max Marks:100 |
Credits:4 |
Course Objectives/Course Description |
|
To enable the students to understand the properties and applications of various probability functions. |
|
Course Outcome |
|
CO1: Demonstrate random variables and their functions CO2: Infer the expectations for random variable functions and generating functions. CO3: Demonstrate various discrete and continuous distributions and their usage |
Unit-1 |
Teaching Hours:10 |
ALGEBRA OF PROBABILITY
|
|
Algebra of sets - fields and sigma-fields, Inverse function - Measurable function – Probability measure on a sigma-field – simple properties - Probability space - Random variables and Random vectors – Induced Probability space – Distribution functions – Decomposition of distribution functions. | |
Unit-2 |
Teaching Hours:10 |
EXPECTATION AND MOMENTS OF RANDOM VARIABLES
|
|
Definitions and simple properties - Moment inequalities – Hölder and Jensen inequalities – Characteristic function – definition and properties – Inversion formula. Convergence of a sequence of random variables - convergence in distribution - convergence in probability - almost sure convergence and convergence in quadratic mean - Weak and Complete convergence of distribution functions – Helly-Bray theorem | |
Unit-3 |
Teaching Hours:10 |
LAW OF LARGE NUMBERS
|
|
Khinchin's weak law of large numbers, Kolmogorov strong law of large numbers (statement only) – Central Limit Theorem – Lindeberg-Levy theorem, Lindeberg-Feller theorem (statement only), Liapounov theorem – Relation between Liapounov and Lindeberg-Feller forms – Radon-Nikodym theorem and derivative (without proof) – Conditional expectation – definition and simple properties. | |
Unit-4 |
Teaching Hours:10 |
DISTRIBUTION THEORY
|
|
Distribution of functions of random variables – Laplace, Cauchy, Inverse Gaussian, Lognormal, Logarithmic series and Power series distributions - Multinomial distribution - Bivariate Binomial – Bivariate Poisson – Bivariate Normal - Bivariate Exponential of Marshall and Olkin - Compound, truncated and mixture of distributions, Concept of convolution - Multivariate normal distribution (Definition and Concept only) | |
Unit-5 |
Teaching Hours:10 |
SAMPLING DISTRIBUTION
|
|
Sampling distributions: Non-central chi-square, t and F distributions and their properties - Distributions of quadratic forms under normality - independence of quadratic form and a linear form - Cochran’s theorem. | |
Unit-6 |
Teaching Hours:10 |
ORDER STATISTICS
|
|
Order statistics, their distributions and properties - Joint and marginal distributions of order statistics - Distribution of range and mid-range - Extreme values and their asymptotic distributions (concepts only) - Empirical distribution function and its properties – Kolmogorov-Smirnov distributions – Lifetime distributions - Exponential and Weibull distributions - Mills ratio – Distributions classified by hazard rate | |
Text Books And Reference Books: [1]. Modern Probability Theory, B.R Bhat, New Age International, 4th Edition, 2014. [2]. An Introduction to Probability and Statistics, V.K Rohatgi and Saleh, 3rd Edition, 2015. | |
Essential Reading / Recommended Reading [1]. Introduction to the theory of statistics, A.M Mood, F.A Graybill and D.C Boes, Tata McGraw-Hill, 3rd Edition (Reprint), 2017. [2]. Order Statistics, H.A David and H.N Nagaraja, John Wiley & Sons, 3rd Edition, 2003. | |
Evaluation Pattern CIA - 50% ESE - 50% | |
MDS133 - PRINCIPLES OF DATA SCIENCE (2019 Batch) | |
Total Teaching Hours for Semester:60 |
No of Lecture Hours/Week:4 |
Max Marks:100 |
Credits:4 |
Course Objectives/Course Description |
|
To provide a strong foundation in data science and its application areas, and to understand the underlying core concepts and emerging technologies in data science. |
|
Course Outcome |
|
CO1: Understand the fundamental concepts of data science CO2: Evaluate the data analysis techniques for applications handling large data CO3: Demonstrate the various machine learning algorithms used in data science process CO4: Understand the ethical practices of data science CO5: Visualize and present the inference using various tools CO6: Learn to think through the ethics surrounding privacy, data sharing and algorithmic decision-making |
Unit-1 |
Teaching Hours:10 |
INTRODUCTION TO DATA SCIENCE
|
|
Definition – Big Data and Data Science Hype – Why data science – Getting Past the Hype – The Current Landscape – Data Scientist - Data Science Process Overview – Defining goals – Retrieving data – Data preparation – Data exploration – Data modeling – Presentation. | |
Unit-2 |
Teaching Hours:10 |
BIG DATA
|
|
Problems when handling large data – General techniques for handling large data – Case study – Steps in big data – Distributing data storage and processing with Frameworks – Case study. | |
Unit-3 |
Teaching Hours:10 |
MACHINE LEARNING
|
|
Machine learning – Modeling Process – Training model – Validating model – Predicting new observations –Supervised learning algorithms – Unsupervised learning algorithms. | |
Unit-4 |
Teaching Hours:10 |
DEEP LEARNING
|
|
Introduction – Deep Feedforward Networks – Regularization – Optimization of Deep Learning – Convolutional Networks – Recurrent and Recursive Nets – Applications of Deep Learning. | |
Unit-5 |
Teaching Hours:10 |
DATA VISUALIZATION
|
|
Introduction to data visualization – Data visualization options – Filters – MapReduce – Dashboard development tools – Creating an interactive dashboard with dc.js-summary. | |
Unit-6 |
Teaching Hours:10 |
ETHICS AND RECENT TRENDS
|
|
Data Science Ethics – Doing good data science – Owners of the data - Valuing different aspects of privacy - Getting informed consent - The Five Cs – Diversity – Inclusion – Future Trends. | |
Text Books And Reference Books: [1]. Introducing Data Science, Davy Cielen, Arno D. B. Meysman, Mohamed Ali, Manning Publications Co., 1st edition, 2016 [2]. An Introduction to Statistical Learning: with Applications in R, Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, Springer, 1st edition, 2013 [3]. Deep Learning, Ian Goodfellow, Yoshua Bengio, Aaron Courville, MIT Press, 1st edition, 2016 [4]. Ethics and Data Science, D J Patil, Hilary Mason, Mike Loukides, O’Reilly, 1st edition, 2018 | |
Essential Reading / Recommended Reading [1]. Data Science from Scratch: First Principles with Python, Joel Grus, O’Reilly, 1st edition, 2015 [2]. Doing Data Science, Straight Talk from the Frontline, Cathy O'Neil, Rachel Schutt, O’Reilly, 1st edition, 2013 [3]. Mining of Massive Datasets, Jure Leskovec, Anand Rajaraman, Jeffrey David Ullman, Cambridge University Press, 2nd edition, 2014 | |
Evaluation Pattern CIA - 50% ESE - 50% | |
MDS134 - RESEARCH METHODOLOGY (2019 Batch) | |
Total Teaching Hours for Semester:30 |
No of Lecture Hours/Week:2 |
Max Marks:50 |
Credits:2 |
Course Objectives/Course Description |
|
This course is intended to assist students in planning and carrying out research. The students are exposed to the principles, procedures and techniques of implementing a research project. The course starts with an introduction to research and leads through the various methodologies involved in the research process. It focuses on finding the research gap from the literature using computer technology, introduces the basic statistics required for research, and covers reporting research outcomes scientifically with emphasis on research ethics. |
|
Course Outcome |
|
CO1: Understand the essence of research and the necessity of defining a research problem. CO2: Apply research methods and methodology including research design, data analysis, and interpretation. CO3: Create scientific reports according to specified standards. |
Unit-1 |
Teaching Hours:8 |
RESEARCH METHODOLOGY
|
|
Defining research problem: Selecting the problem, Necessity of defining the problem, Techniques involved in defining a problem - Ethics in Research. | |
Unit-2 |
Teaching Hours:8 |
RESEARCH DESIGN
|
|
Principles of experimental design, Working with Literature: Importance, finding literature, Using your resources, Managing the literature, Keep track of references, Using the literature, Literature review, On-line Searching: Database, SciFinder, Scopus, Science Direct, Searching research articles, Citation Index, Impact Factor, H-index. | |
Unit-3 |
Teaching Hours:7 |
RESEARCH DATA
|
|
Measurement of Scaling: Quantitative, Qualitative, Classification of Measure scales, Data Collection, Data Preparation. | |
Unit-4 |
Teaching Hours:7 |
REPORT WRITING
|
|
Scientific Writing and Report Writing: Significance, Steps, Layout, Types, Mechanics and Precautions, Latex: Introduction, Text, Tables, Figures, Equations, Citations, Referencing, and Templates (IEEE style), Paper writing for international journals, Writing scientific report. | |
Text Books And Reference Books: [1] C. R. Kothari, Research Methodology Methods and Techniques, 3rd. ed. New Delhi: New Age International Publishers, Reprint 2014. [2] Zina O’Leary, The Essential Guide to Doing Research, New Delhi: PHI, 2005. | |
Essential Reading / Recommended Reading [1] J. W. Creswell, Research Design: Qualitative, Quantitative, and Mixed Methods Approaches, 4th ed. SAGE Publications, 2014. [2] Kumar, Research Methodology: A Step by Step Guide for Beginners, 3rd. ed. Indian: PE, 2010. | |
Evaluation Pattern CIA - 50% ESE - 50% | |
MDS161A - INTRODUCTION TO STATISTICS (2019 Batch) | |
Total Teaching Hours for Semester:30 |
No of Lecture Hours/Week:2 |
Max Marks:50 |
Credits:2 |
Course Objectives/Course Description |
|
To enable the students to understand the fundamentals of statistics to apply descriptive measures and probability for data analysis. |
|
Course Outcome |
|
CO1: Demonstrate the history of statistics and present the data in various forms. CO2: Infer the concept of correlation and regression for relating two or more related variables. CO3: Demonstrate the probabilities for various events. |
Unit-1 |
Teaching Hours:8 |
ORGANIZATION AND PRESENTATION OF DATA
|
|
Origin and development of Statistics, Scope, limitation and misuse of statistics. Types of data: primary, secondary, quantitative and qualitative data. Types of Measurements: nominal, ordinal, discrete and continuous data. Presentation of data by tables: construction of frequency distributions for discrete and continuous data, graphical representation of a frequency distribution by histogram and frequency polygon, cumulative frequency distributions | |
Unit-2 |
Teaching Hours:8 |
DESCRIPTIVE STATISTICS
|
|
Measures of location or central tendency: Arithmetic mean, Median, Mode, Geometric mean, Harmonic mean. Partition values: Quartiles, Deciles and percentiles. Measures of dispersion: Mean deviation, Quartile deviation, Standard deviation, Coefficient of variation. Moments: measures of skewness, Kurtosis. | |
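As a minimal sketch of the measures listed in this unit, the following uses Python's standard `statistics` module. The sample values are hypothetical and chosen only for illustration; the syllabus itself does not prescribe any particular tool for this course.

```python
import statistics

# Hypothetical sample, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Measures of location
mean = statistics.mean(data)              # arithmetic mean
median = statistics.median(data)          # middle value of the sorted data
mode = statistics.mode(data)              # most frequent value
gmean = statistics.geometric_mean(data)   # geometric mean
hmean = statistics.harmonic_mean(data)    # harmonic mean

# Partition values: the three quartile cut points Q1, Q2, Q3
q1, q2, q3 = statistics.quantiles(data, n=4)

# Measures of dispersion
pstdev = statistics.pstdev(data)          # population standard deviation
quartile_deviation = (q3 - q1) / 2        # semi-interquartile range
cv = pstdev / mean                        # coefficient of variation

print(mean, median, mode, pstdev, cv)
```

For positive data the three means always satisfy harmonic ≤ geometric ≤ arithmetic, which is a quick sanity check on any hand calculation.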
Unit-3 |
Teaching Hours:7 |
CORRELATION AND REGRESSION
|
|
Correlation: Scatter plot, Karl Pearson coefficient of correlation, Spearman's rank correlation coefficient, multiple and partial correlations (for 3 variates only). Regression: Concept of errors, Principles of Least Square, Simple linear regression and its properties. | |
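The Karl Pearson coefficient and the least-squares line for simple linear regression can be sketched directly from their defining sums. The paired observations below are hypothetical, and the variable names (`sxx`, `sxy`, and so on) are illustrative choices rather than notation taken from the course.

```python
from math import sqrt

# Hypothetical paired observations, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squares and cross-products about the means.
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Karl Pearson coefficient of correlation.
r = sxy / sqrt(sxx * syy)

# Principle of least squares: regression line of y on x.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Errors (residuals) of the fitted line; they sum to zero by construction.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

print(f"r = {r:.4f}, y = {intercept:.2f} + {slope:.2f} x")
```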
Unit-4 |
Teaching Hours:7 |
BASICS OF PROBABILITY
|
|
Random experiment, sample point and sample space, event, algebra of events. Definition of Probability: classical, empirical and axiomatic approaches to probability, properties of probability. Theorems on probability, conditional probability and independent events, Laws of total probability, Bayes’ theorem and its applications | |
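The law of total probability and the theorem of Bayes listed in this unit can be illustrated with a small worked example. The scenario (two machines with different defect rates) and all of its numbers are hypothetical.

```python
# Hypothetical scenario: two machines produce items with different defect rates.
prior = {"A": 0.6, "B": 0.4}             # P(machine)
defect_given = {"A": 0.02, "B": 0.05}    # P(defective | machine)

# Law of total probability: P(defective) summed over the partition {A, B}.
p_defect = sum(prior[m] * defect_given[m] for m in prior)

# Posterior P(machine | defective) = P(defective | machine) P(machine) / P(defective).
posterior = {m: prior[m] * defect_given[m] / p_defect for m in prior}

print(p_defect, posterior)
```

Although machine A produces most of the items, a defective item is more likely to have come from machine B, because B's higher defect rate outweighs its smaller share of production.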
Text Books And Reference Books: [1]. Rohatgi V.K and Saleh E, An Introduction to Probability and Statistics, 3rd edition, John Wiley & Sons Inc., New Jersey, 2015. [2]. Gupta S.C and Kapoor V.K, Fundamentals of Mathematical Statistics, 11th edition, Sultan Chand & Sons, New Delhi, 2014. | |
Essential Reading / Recommended Reading [1]. Mukhopadhyay P, Mathematical Statistics, Books and Allied (P) Ltd, Kolkata, 2015. [2]. Walpole R.E, Myers R.H, and Myers S.L, Probability and Statistics for Engineers and Scientists, Pearson, New Delhi, 2017. [3]. Montgomery D.C and Runger G.C, Applied Statistics and Probability for Engineers, Wiley India, New Delhi, 2013. [4]. Mood A.M, Graybill F.A and Boes D.C, Introduction to the Theory of Statistics, McGraw Hill, New Delhi, 2008. | |
Evaluation Pattern CIA - 50% ESE - 50% | |
MDS161B - INTRODUCTION TO COMPUTERS AND PROGRAMMING (2019 Batch) | |
Total Teaching Hours for Semester:30 |
No of Lecture Hours/Week:2 |
Max Marks:50 |
Credits:2 |
Course Objectives/Course Description |
|
To enable the students to understand the fundamental concepts of problem solving and programming structures. |
|
Course Outcome |
|
CO1: Demonstrate the systematic approach for problem solving using computers. CO2: Apply different programming structure with suitable logic for computational problems. |
Unit-1 |
Teaching Hours:10 |
COMPUTERS AND DIGITAL BASICS
|
|
Number Representation – Decimal, Binary, Octal, Hexadecimal and BCD numbers – Binary Arithmetic – Binary addition – Unsigned and Signed numbers – one’s and two’s complements of Binary numbers – Arithmetic operations with signed numbers - Number system conversions – Boolean Algebra – Logic gates – Design of Circuits – K - Map | |
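A short sketch of the number-system conversions and complement representations named in this unit, using only Python built-ins. The helper names `twos_complement` and `ones_complement` and the 8-bit default width are assumptions made for illustration.

```python
def twos_complement(value: int, bits: int = 8) -> str:
    """Return the two's-complement bit pattern of a signed integer."""
    low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not low <= value <= high:
        raise ValueError(f"{value} does not fit in {bits} bits")
    # Masking maps negative values onto their unsigned bit pattern.
    return format(value & ((1 << bits) - 1), f"0{bits}b")


def ones_complement(pattern: str) -> str:
    """Flip every bit of a binary string."""
    return "".join("1" if bit == "0" else "0" for bit in pattern)


# Decimal to binary, octal and hexadecimal.
n = 45
binary, octal, hexa = bin(n), oct(n), hex(n)

# Two's complement of -5 equals one's complement of +5, plus one.
plus_five = twos_complement(5)
flipped = ones_complement(plus_five)
minus_five = twos_complement(-5)

print(binary, octal, hexa, plus_five, flipped, minus_five)
```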
Unit-2 |
Teaching Hours:5 |
GENERAL PROBLEM SOLVING CONCEPTS
|
|
Types of Problems – Problem solving with Computers – Difficulties with problem solving – problem solving concepts for the Computer – Constants and Variables – Rules for Naming and using variables – Data types – numeric data – character data – logical data – rules for data types – examples of data types – storing the data in computer - Functions – Operators – Expressions and Equations | |
Unit-3 |
Teaching Hours:5 |
PLANNING FOR SOLUTION
|
|
Communicating with computer – organizing the solution – Analyzing the problem – developing the interactivity chart – developing the IPO chart – Writing the algorithms – drawing the flow charts – pseudocode – internal and external documentation – testing the solution – coding the solution – software development life cycle. | |
Unit-4 |
Teaching Hours:10 |
PROBLEM SOLVING
|
|
Introduction to programming structure – pointers for structuring a solution – modules and their functions – cohesion and coupling – problem solving with logic structure. Problem solving with decisions – the decision logic structure – straight through logic – positive logic – negative logic – logic conversion – decision tables – case logic structure - examples. | |
Text Books And Reference Books: [1] Thomas L. Floyd and R. P. Jain, “Digital Fundamentals”, 8th Edition, Pearson Education, 2007. [2] Peter Norton, “Introduction to Computers”, 6th Edition, Tata McGraw Hill, New Delhi, 2006. [3] Maureen Sprankle and Jim Hubbard, Problem solving and programming concepts, PHI, 9th Edition, 2012 | |
Essential Reading / Recommended Reading [1]. E Balagurusamy, Fundamentals of Computers, TMH, 2011 | |
Evaluation Pattern CIA - 50% ESE - 50% | |
MDS161C - LINUX ADMINISTRATION (2019 Batch) | |
Total Teaching Hours for Semester:30 |
No of Lecture Hours/Week:2 |
Max Marks:50 |
Credits:2 |
Course Objectives/Course Description |
|
To enable the students to excel on the Linux platform. |
|
Course Outcome |
|
CO1: Demonstrate a systematic approach to configuring the Linux environment CO2: Manage the Linux environment to work with open source data science tools |
Unit-1 |
Teaching Hours:10 |
UNIT I
|
|
RHEL 7.5, breaking root password, Understand and use essential tools for handling files, directories, command-line environments, and documentation - Configure local storage using partitions and logical volumes | |
Unit-2 |
Teaching Hours:10 |
UNIT II
|
|
Unit-3 |
Teaching Hours:10 |
UNIT III
|
|
Kernel updates, yum and nmcli configuration, Scheduling jobs, at, crontab - Configure firewall settings using firewall-config, firewall-cmd, or iptables, Configure key-based authentication for SSH, Set enforcing and permissive modes for SELinux, List and identify SELinux file and process context, Restore default file contexts | |
Text Books And Reference Books: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/ | |
Essential Reading / Recommended Reading https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/ | |
Evaluation Pattern CIA - 50% ESE - 50% | |
MDS171 - DATA BASE TECHNOLOGIES (2019 Batch) | |
Total Teaching Hours for Semester:90 |
No of Lecture Hours/Week:6 |
Max Marks:150 |
Credits:5 |
Course Objectives/Course Description |
|
The main objective of this course is to provide fundamental knowledge of, and practical experience with, database concepts. It covers the concepts and terminology that facilitate constructing database tables and writing effective queries. It also introduces the data warehouse and its functions. |
|
Course Outcome |
|
CO1: Design conceptual models of a database using ER modeling CO2: Create and populate a RDBMS for a real life application, with constraints and keys, using SQL CO3: Retrieve any type of information from a data base by formulating complex queries in SQL CO4: Demonstrate various databases CO5: Distinguish database from data warehouse and examine ETL process |
Unit-1 |
Teaching Hours:16 |
INTRODUCTION
|
|
Concept & Overview of DBMS, Data Models, Database Languages, Database Administrator, Database Users, Three Schema architecture of DBMS. Basic concepts, Design Issues, Mapping Constraints, Keys, Entity-Relationship Diagram, Weak Entity Sets, Extended E-R features Lab Exercises 1. Data Definition, 2. Table Creation 3. Specification of Constraints | |
Unit-2 |
Teaching Hours:16 |
RELATIONAL MODEL AND DATABASE DESIGN
|
|
SQL and Integrity Constraints, Concept of DDL, DML, DCL. Basic Structure, Set operations, Aggregate Functions, Null Values, Domain Constraints, Referential Integrity Constraints, assertions, views, Nested Subqueries, Functional Dependency, Different anomalies in designing a Database, Normalization : using functional dependencies, Boyce-Codd Normal Form, 4NF, 5NF Lab Exercises 1. Insert, Select, Update & Delete Commands 2. Nested Queries & Join Queries 3. Views | |
Unit-3 |
Teaching Hours:10 |
INTELLIGENT DATABASES
|
|
Active databases, Deductive Databases, Knowledge bases, Multimedia Databases, Multidimensional Data Structures, Image Databases, Text/Document Databases, Video Databases, Audio Databases, Multimedia Database Design. | |
Unit-4 |
Teaching Hours:16 |
DATA WAREHOUSE: THE BUILDING BLOCKS
|
|
Defining Features, Data Warehouses and Data Marts, Architectural Types, Overview of the Components, Metadata in the Data warehouse, Data Design and Data Preparation: Principles of Dimensional Modeling, Dimensional Modeling Advanced Topics From Requirements To Data Design, The Star Schema, Star Schema Keys, Advantages of the Star Schema, Star Schema: Examples, Dimensional Modeling: Advanced Topics, Updates to the Dimension Tables, Miscellaneous Dimensions, The Snowflake Schema, Aggregate Fact Tables, Families of Stars | |
Unit-5 |
Teaching Hours:16 |
REQUIREMENTS, REALITIES, ARCHITECTURE AND DATA FLOW
|
|
Requirements, ETL Data Structures, Extracting, Cleaning and Conforming, Delivering Dimension Tables, Delivering Fact Tables (CH:1,2,3,4,5,6) Lab Exercises: 1. Importing source data structures 2. Design Target Data Structures 3. Create target structure 4. Design and build the ETL mapping | |
Unit-6 |
Teaching Hours:16 |
IMPLEMENTATION, OPERATIONS AND ETL SYSTEMS:
|
|
Development, Operations, Metadata, Real-Time ETL Systems. (CH:7,8,9,11)
Lab Exercises: 1. Perform the ETL process and transform into data map 2. Create the cube and process it 3. Generating Reports 4. Creating the Pivot table and pivot chart using some existing data | |
Text Books And Reference Books: [1]. Henry F. Korth and Silberschatz Abraham, “Database System Concepts”, McGraw Hill. [2]. Thomas Connolly and Carolyn Begg, “Database Systems, A Practical Approach to Design, Implementation and Management”, Third Edition, Pearson Education, 2007. [3]. The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling, 2nd edition, John Wiley & Sons, Inc. New York, USA, 2002 | |
Essential Reading / Recommended Reading [1] Lior Rokach and Oded Maimon, Data Mining and Knowledge Discovery Handbook, Springer, 2nd edition, 2010. | |
Evaluation Pattern CIA - 50% ESE - 50% | |
MDS172 - INFERENTIAL STATISTICS (2019 Batch) | |
Total Teaching Hours for Semester:90 |
No of Lecture Hours/Week:6 |
Max Marks:150 |
Credits:5 |
Course Objectives/Course Description |
|
This course is designed to introduce the concepts of theory of estimation and testing of hypothesis. This paper also deals with the concept of parametric tests for large and small samples. It also provides knowledge about non-parametric tests and its applications. |
|
Course Outcome |
|
CO1: Demonstrate the concepts of point and interval estimation of unknown parameters and their significance using large and small samples. CO2: Apply the idea of sampling distributions of difference statistics in testing of hypotheses. CO3: Infer the concept of nonparametric tests for single sample and two samples. |
Unit-1 |
Teaching Hours:15 |
SUFFICIENT STATISTICS
|
|
Neyman - Fisher Factorisation theorem - the existence and construction of minimal sufficient statistics - Minimal sufficient statistics and exponential family - sufficiency and completeness - sufficiency and invariance. Lab Exercise
| |
Unit-2 |
Teaching Hours:15 |
UNBIASED ESTIMATION
|
|
Minimum variance unbiased estimation - locally minimum variance unbiased estimators - Rao-Blackwell theorem – Completeness: Lehmann-Scheffé theorems - Necessary and sufficient condition for unbiased estimators - Cramér-Rao lower bound - Bhattacharyya system of lower bounds in the 1-parameter regular case - Chapman-Robbins inequality Lab Exercise
| |
Unit-3 |
Teaching Hours:15 |
MAXIMUM LIKELIHOOD ESTIMATION
|
|
Computational routines - strong consistency of maximum likelihood estimators - Asymptotic Efficiency of maximum likelihood estimators - Best Asymptotically Normal estimators - Method of moments - Bayes’ and minimax estimation: The structure of Bayes’ rules - Bayes’ estimators for quadratic and convex loss functions - minimax estimation - interval estimation. Lab Exercise
| |
Unit-4 |
Teaching Hours:15 |
HYPOTHESIS TESTING
|
|
Uniformly most powerful tests - the Neyman-Pearson fundamental Lemma - Distributions with monotone likelihood ratio - Problems - Generalization of the fundamental lemma, two sided hypotheses - testing the mean and variance of a normal distribution. Lab Exercise
| |
Unit-5 |
Teaching Hours:15 |
MEAN TESTS
|
|
Unbiasedness for hypotheses testing - similarity and completeness - UMP unbiased tests for multi parameter exponential families - comparing two Poisson or Binomial populations - testing the parameters of a normal distribution (unbiased tests) - comparing the mean and variance of two normal distributions - Symmetry and invariance - maximal invariance - most powerful invariant tests. Lab Exercise
| |
Unit-6 |
Teaching Hours:15 |
SEQUENTIAL TESTS
|
|
SPRT procedures - likelihood ratio tests - locally most powerful tests - the concept of confidence sets - non parametric tests. Lab Exercise
| |
Text Books And Reference Books: [1]. Rajagopalan M and Dhanavanthan P, Statistical Inference, PHI Learning (P) Ltd, New Delhi, 2012. [2]. An Introduction to Probability and Statistics, V.K Rohatgi and Saleh, 3rd Edition, 2015. | |
Essential Reading / Recommended Reading [1]. Introduction to the theory of statistics, A.M Mood, F.A Graybill and D.C Boes, Tata McGraw-Hill, 3rd Edition (Reprint), 2017. [2]. Linear Statistical Inference and its Applications, Rao C.R, Wiley Publications, 2nd Edition, 2001. | |
Evaluation Pattern CIA - 50% ESE - 50% | |
MDS173 - PROGRAMMING FOR DATA SCIENCE IN PYTHON (2019 Batch) | |
Total Teaching Hours for Semester:90 |
No of Lecture Hours/Week:6 |
Max Marks:100 |
Credits:4 |
Course Objectives/Course Description |
|
The objective of this course is to provide comprehensive knowledge of python programming paradigms required for Data Science. |
|
Course Outcome |
|
CO1: Demonstrate the usage of built-in objects in Python CO2: Analyze the significance of the Python program development environment by working on real world examples CO3: Implement numerical programming, data handling and visualization through NumPy, Pandas and Matplotlib modules. |
Unit-1 |
Teaching Hours:17 |
INTRODUCTION TO PYTHON
|
|
Structure of Python Program-Underlying mechanism of Module Execution-Branching and Looping-Problem Solving Using Branches and Loops-Functions - Lists and Mutability- Problem Solving Using Lists and Functions Lab Exercises 1. Demonstrate usage of branching and looping statements 2. Demonstrate Recursive functions 3. Demonstrate Lists | |
Unit-2 |
Teaching Hours:17 |
SEQUENCE DATATYPES AND OBJECT-ORIENTED PROGRAMMING
|
|
Sequences, Mapping and Sets - Dictionaries - Classes: Classes and Instances - Inheritance - Exception Handling - Introduction to Regular Expressions using “re” module. Lab Exercises 1. Demonstrate Tuples and Sets 2. Demonstrate Dictionaries 3. Demonstrate inheritance and exception handling 4. Demonstrate use of “re”. | |
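One possible shape for the inheritance, exception-handling and “re” lab exercises is sketched below. The classes `Shape` and `Rectangle`, the helper `parse_rectangle` and the `WIDTHxHEIGHT` input format are all hypothetical, invented only to combine the three topics in one small program.

```python
import re


class Shape:
    """Base class: stores a name and defines the interface."""

    def __init__(self, name):
        self.name = name

    def area(self):
        raise NotImplementedError("subclasses must implement area()")


class Rectangle(Shape):
    """Subclass that inherits from Shape and overrides area()."""

    def __init__(self, width, height):
        super().__init__("rectangle")
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


def parse_rectangle(text):
    """Parse a string such as '3x4' into a Rectangle using a regular expression."""
    match = re.fullmatch(r"(\d+)x(\d+)", text)
    if match is None:
        raise ValueError(f"not a WIDTHxHEIGHT string: {text!r}")
    return Rectangle(int(match.group(1)), int(match.group(2)))


# Exception handling: malformed input raises ValueError, which is caught here.
try:
    parse_rectangle("three by four")
    error_message = None
except ValueError as exc:
    error_message = str(exc)

rect = parse_rectangle("3x4")
print(rect.name, rect.area(), error_message)
```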
Unit-3 |
Teaching Hours:13 |
USING NUMPY
|
|
Basics of NumPy-Computation on NumPy-Aggregations-Computation on Arrays-Comparisons, Masks and Boolean Arrays-Fancy Indexing-Sorting Arrays-Structured Data: NumPy’s Structured Array. Lab Exercises 1. Demonstrate Aggregation 2. Demonstrate Indexing and Sorting | |
Unit-4 |
Teaching Hours:13 |
DATA MANIPULATION WITH PANDAS -I
|
|
Introduction to Pandas Objects-Data indexing and Selection-Operating on Data in Pandas-Handling Missing Data-Hierarchical Indexing - Combining Data Sets Lab Exercises 1. Demonstrate handling of missing data 2. Demonstrate hierarchical indexing | |
Unit-5 |
Teaching Hours:17 |
DATA MANIPULATION WITH PANDAS -II
|
|
Aggregation and Grouping-Pivot Tables-Vectorized String Operations -Working with Time Series-High Performance Pandas: eval() and query() Lab Exercises 1. Demonstrate usage of Pivot table 2. Demonstrate use of eval() and query() | |
Unit-6 |
Teaching Hours:13 |
VISUALIZATION AND MATPLOTLIB
|
|
Basic functions of matplotlib-Simple Line Plot, Scatter Plot-Density and Contour Plots-Histograms, Binnings and Density-Customizing Plot Legends, Colour Bars-Three-Dimensional Plotting in Matplotlib. Lab Exercises 1. Demonstrate Scatter Plot 2. Demonstrate 3D plotting | |
Text Books And Reference Books: [1]. Jake VanderPlas, Python Data Science Handbook - Essential Tools for Working with Data, O’Reilly Media, Inc., 2016 [2]. Zhang. Y, An Introduction to Python and Computer Programming, Springer Publications, 2016 | |
Essential Reading / Recommended Reading [1]. Joel Grus, Data Science from Scratch: First Principles with Python, O’Reilly Media, 2016 [2]. T. R. Padmanabhan, Programming with Python, Springer Publications, 2016 | |
Evaluation Pattern CIA - 100% | |
MDS231 - MATHEMATICAL FOUNDATION FOR DATA SCIENCE - II (2019 Batch) | |
Total Teaching Hours for Semester:60 |
No of Lecture Hours/Week:4 |
Max Marks:100 |
Credits:4 |
Course Objectives/Course Description |
|
This course aims at introducing the basic notions of multivariable calculus and graph theory in applications to Data Science. |
|
Course Outcome |
|
CO1: Understand the properties of multivariable calculus CO2: Understand the properties of graphs CO3: Apply mathematics for some applications in Data Science |
Unit-1 |
Teaching Hours:15 |
CALCULUS OF SEVERAL VARIABLES
|
|
Functions of Several Variables - Limits and Continuity in Higher Dimensions - Partial Derivatives - The Chain Rule - Directional Derivative and Gradient vectors - Tangent Planes and Differentials - Extreme Values and Saddle Points - Lagrange Multipliers. | |
Unit-2 |
Teaching Hours:10 |
INTRODUCTION TO CONVEX OPTIMIZATION
|
|
Affine and Convex Sets - Hyperplanes and half-spaces - Euclidean balls and ellipsoids - Norm balls and Norm cones - polyhedra - simplexes - The positive definite cone - separating and supporting hyperplanes. | |
Unit-3 |
Teaching Hours:10 |
NORMS AND INNER PRODUCT SPACES
|
|
Introduction - Inequalities on Linear Spaces - Norms on Linear Spaces - Inner products - Orthogonality - Unitary and Orthogonal Matrices - norms for matrices | |
Unit-4 |
Teaching Hours:13 |
BASIC GRAPH THEORY
|
|
Graphs - subgraphs - factors - Paths - cycles - connectedness - trees - Euler tours - Hamiltonian cycles - Planar Graphs - Digraphs. | |
Unit-5 |
Teaching Hours:12 |
ALGORITHMS AND COMPLEXITY
|
|
Algorithms - Representing Graphs - The algorithm of Hierholzer - Writing algorithms - Complexity of Algorithms. | |
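The algorithm of Hierholzer listed in this unit builds an Euler tour by following unused edges until it gets stuck and then backtracking. The sketch below is one common iterative formulation, not necessarily the textbook's presentation. It assumes an undirected, connected graph given as adjacency lists in which every vertex has even degree. The `graph` example is hypothetical.

```python
def hierholzer(adjacency, start):
    """Return an Euler tour as a list of vertices.

    Assumes an undirected, connected graph where every vertex has even degree.
    """
    # Copy the adjacency lists so the caller's graph is not modified.
    remaining = {v: list(neighbours) for v, neighbours in adjacency.items()}
    stack, tour = [start], []
    while stack:
        v = stack[-1]
        if remaining[v]:
            # Follow an unused edge v-w and mark it used in both directions.
            w = remaining[v].pop()
            remaining[w].remove(v)
            stack.append(w)
        else:
            # No unused edge left at v: v is finished, add it to the tour.
            tour.append(stack.pop())
    return tour[::-1]


# Hypothetical example: two triangles sharing vertex 2.
graph = {
    0: [1, 2],
    1: [0, 2],
    2: [0, 1, 3, 4],
    3: [2, 4],
    4: [2, 3],
}
tour = hierholzer(graph, 0)
print(tour)
```

Each edge is pushed and popped once, so the running time is linear in the number of edges apart from the list `remove` call, which a production version would replace with an edge-marking scheme.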
Text Books And Reference Books: [1]. M. D. Weir, J. Hass, and G. B. Thomas, Thomas' calculus. Pearson, 2016. [2]. S. P. Boyd and L. Vandenberghe, Convex optimization. Cambridge Univ. Pr., 2011. [3]. D. Jungnickel, Graphs, networks and algorithms. Springer, 2014. | |
Essential Reading / Recommended Reading [1]. J. Patterson and A. Gibson, Deep learning: a practitioner's approach. O'Reilly Media, 2017. [2]. S. Sra, S. Nowozin, and S. J. Wright, Optimization for machine learning. MIT Press, 2012. | |
Evaluation Pattern CIA - 50% ESE - 50% | |
MDS232 - REGRESSION ANALYSIS (2019 Batch) | |
Total Teaching Hours for Semester:60 |
No of Lecture Hours/Week:4 |
Max Marks:100 |
Credits:4 |
Course Objectives/Course Description |
|
This course aims to provide the grounding knowledge about the regression model building of simple and multiple regression. |
|
Course Outcome |
|
CO1: Demonstrate deeper understanding of the linear regression model. CO2: Evaluate R-square criteria for model selection CO3: Understand the forward, backward and stepwise methods for selecting the variables CO4: Understand the importance of multicollinearity in regression modelling CO5: Ability to use and understand generalizations of the linear model to binary and count data |
Unit-1 |
Teaching Hours:15 |
SIMPLE LINEAR REGRESSION
|
|
Introduction to regression analysis: Modelling a response, overview and applications of regression analysis, major steps in regression analysis. Simple linear regression (Two variables): assumptions, estimation and properties of regression coefficients, significance and confidence intervals of regression coefficients, measuring the quality of the fit. | |
Unit-2 |
Teaching Hours:15 |
MULTIPLE LINEAR REGRESSION
|
|
Multiple linear regression model: assumptions, ordinary least square estimation of regression coefficients, interpretation and properties of regression coefficient, significance and confidence intervals of regression coefficients. | |
Unit-3 |
Teaching Hours:10 |
CRITERIA FOR MODEL SELECTION
|
|
Mean Square error criteria, R² and criteria for model selection; Need of the transformation of variables; Box-Cox transformation; Forward, Backward and Stepwise procedures. | |
Unit-4 |
Teaching Hours:10 |
RESIDUAL ANALYSIS
|
|
Residual analysis, Departures from underlying assumptions, Effect of outliers, Collinearity, Non-constant variance and serial correlation, Departures from normality, Diagnostics and remedies. | |
Unit-5 |
Teaching Hours:10 |
NON LINEAR REGRESSION
|
|
Introduction to nonlinear regression, Least squares in the nonlinear case and estimation of parameters, Models for binary response variables, estimation and diagnosis methods for logistic and Poisson regressions. Prediction and residual analysis. | |
Text Books And Reference Books: [1]. D.C. Montgomery, E.A. Peck and G.G. Vining, Introduction to Linear Regression Analysis, John Wiley and Sons, Inc., NY, 2003. [2]. S. Chatterjee and A. Hadi, Regression Analysis by Example, 4th Ed., John Wiley and Sons, Inc, 2006. [3]. Seber, G.A.F. and Lee, A.J. (2003) Linear Regression Analysis, John Wiley, Relevant sections from chapters 3, 4, 5, 6, 7, 9, 10. | |
Essential Reading / Recommended Reading [1]. Iain Pardoe, Applied Regression Modeling, John Wiley and Sons, Inc, 2012. [2]. P. McCullagh, J.A. Nelder, Generalized Linear Models, Chapman & Hall, 1989. | |
Evaluation Pattern CIA - 50% ESE - 50% | |
MDS233 - DESIGN AND ANALYSIS OF ALGORITHMS (2019 Batch) | |
Total Teaching Hours for Semester:60 |
No of Lecture Hours/Week:4 |
Max Marks:100 |
Credits:4 |
Course Objectives/Course Description |
|
This course aims to introduce the methods to analyze and evaluate the performance of an algorithm. It introduces the different design techniques for designing efficient algorithms. |
|
Course Outcome |
|
CO1: Demonstrate their ability to apply appropriate Data Structures to solve problems. CO2: Design and develop algorithms using various design techniques. CO3: Evaluate the efficiency of Algorithms by analyzing the running time of algorithms for problems in various domain |
Unit-1 |
Teaching Hours:10 |
INTRODUCTION
|
|
Algorithm Specification – Analysis of Insertion sort – Performance Analysis - Space complexity – Time Complexity – Asymptotic notations – Amortized Analysis | |
Unit-2 |
Teaching Hours:10 |
DIVIDE & CONQUER and GREEDY APPROACH
|
|
Divide and Conquer – Binary search – Quick sort – Strassen’s Matrix Multiplication. Greedy Approach – Knapsack problem – Minimum cost spanning tree – Prim’s and Kruskal’s Algorithm – single source shortest path. | |
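Binary search is the simplest divide-and-conquer technique in this unit: each comparison discards half of the remaining sorted range. A minimal iterative sketch, with a hypothetical input list, is shown below.

```python
def binary_search(items, target):
    """Return the index of target in the sorted list items, or -1 if absent."""
    low, high = 0, len(items) - 1
    while low <= high:
        mid = (low + high) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            low = mid + 1   # target can only be in the right half
        else:
            high = mid - 1  # target can only be in the left half
    return -1


values = [2, 5, 8, 12, 16, 23]  # hypothetical sorted input
print(binary_search(values, 12))
print(binary_search(values, 7))
```

Because the range halves on every iteration, the loop runs O(log n) times for a list of n items.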
Unit-3 |
Teaching Hours:10 |
DYNAMIC PROGRAMMING AND BACKTRACKING
|
|
Dynamic Programming – All pairs shortest path – Longest common subsequence – Backtracking: The general method – 8 Queens problem – Sum of subsets | |
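The longest common subsequence problem from this unit is a standard illustration of dynamic programming: the answer for two prefixes is built from the answers for shorter prefixes. The sketch below computes only the length. The function name and test strings are chosen for illustration.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # table[i][j] holds the LCS length of a[:i] and b[:j].
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            if a[i - 1] == b[j - 1]:
                # Matching last symbols extend the LCS of the shorter prefixes.
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                # Otherwise drop the last symbol of one of the two prefixes.
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[-1][-1]


print(lcs_length("ABCBDAB", "BDCABA"))
```

The table has (m + 1)(n + 1) entries and each is filled in constant time, giving O(mn) time and space.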
Unit-4 |
Teaching Hours:10 |
BRANCH AND BOUND TECHNIQUES
|
|
Branch and Bound – 0/1 knapsack problem – Travelling salesperson problem | |
Unit-5 |
Teaching Hours:10 |
NP HARD and NP COMPLETE PROBLEMS
|
|
Basic Concepts – NP hard Graph problems - NP Hard Scheduling problems – NP hard code generation problems | |
Unit-6 |
Teaching Hours:10 |
ADVANCED TECHNIQUES
|
|
Approximation Algorithms – Polynomial time approximation schemes – PRAM Algorithms – Computational model – merge sort – Mesh Algorithms – Computational model – odd–even merge in a mesh – Hypercube Algorithms – Computational model – merge sort | |
Text Books And Reference Books: [1]. Horowitz, Sahni, Rajasekaran, Fundamentals of Computer Algorithms, Universities Press Pvt Ltd, Second Edition, 2010. [2]. Cormen T H, Leiserson C E, Rivest R L and Stein, Clifford, Introduction to Algorithms, PHI, Third Edition, 2010. | |
Essential Reading / Recommended Reading [1]. Donald E. Knuth, The Art of Computer Programming Volume 3, Sorting and Searching, 2nd Edition, Pearson Education, Addison-Wesley, 1997. [2]. GAV PAI, Data structures and Algorithms, Tata McGraw Hill, Jan 2008 | |
Evaluation Pattern CIA - 50% ESE - 50% | |
MDS234 - MACHINE LEARNING (2019 Batch) | |
Total Teaching Hours for Semester:60 |
No of Lecture Hours/Week:4 |
Max Marks:100 |
Credits:4 |
Course Objectives/Course Description |
|
This course aims to provide a sound foundation in the fundamental concepts of machine learning and its applications, and to prepare students for advanced research and real-time problem solving in machine learning and related fields. |
|
Course Outcome |
|
CO1: Understand a wide variety of learning algorithms. CO2: Apply a variety of learning algorithms to different domain data. CO3: Analyse and perform evaluation of learning algorithms and model selection. |
Unit-1 |
Teaching Hours:8 |
INTRODUCTION
|
|
Machine Learning-Examples of Machine Learning Applications-Learning Associations-Classification-Regression-Unsupervised Learning-Reinforcement Learning. Supervised Learning: Learning a class from examples-Vapnik-Chervonenkis Dimension-Probably Approximately Correct (PAC) Learning-Noise-Learning Multiple classes. Regression-Model Selection and Generalization. | |
Unit-2 |
Teaching Hours:8 |
PARAMETRIC METHODS
|
|
Introduction to Parametric methods-Maximum Likelihood Estimation: Bernoulli Density-Multinomial Density-Gaussian Density. Evaluating an Estimator: Bias and Variance-The Bayes Estimator-Parametric Classification. | |
Unit-3 |
Teaching Hours:8 |
NONPARAMETRIC METHODS
|
|
Introduction-Nonparametric Density Estimation: Histogram Estimator-Kernel Estimator-K-Nearest Neighbour Estimator-Generalization to Multivariate Data-Nonparametric Classification-Distance Based Classification-Outlier Detection. | |
Unit-4 |
Teaching Hours:12 |
MULTIVARIATE METHODS & DIMENSIONALITY REDUCTION
|
|
Multivariate Data-Parameter Estimation-Estimation of Missing Values-Multivariate Normal Distribution- Multivariate Classification-Tuning Complexity-Discrete Features. Dimensionality Reduction: Introduction- Subset Selection-Principal Component Analysis, Feature Embedding-Factor Analysis-Singular Value Decomposition-Multidimensional Scaling-Linear Discriminant Analysis-Canonical Correlation Analysis-Laplacian Eigenmaps | |
Unit-5 |
Teaching Hours:12 |
SUPERVISED LEARNING
|
|
Linear Discrimination: Introduction-Generalizing the Linear Model-Geometry of the Linear Discriminant-Pairwise Separation-Gradient Descent-Logistic Discrimination. Bayesian Estimation: Introduction-Estimating the Parameter of a Discrete Distribution-Bayesian Estimation of the Parameters of a Gaussian Distribution-Bayesian Estimation of the Parameters of a Function-Bayesian Classification-Bayesian Model Comparison. | |
Unit-6 |
Teaching Hours:12 |
UNSUPERVISED LEARNING
|
|
Clustering: Introduction-Mixture Densities, K-Means Clustering-Expectation-Maximization algorithm-Mixtures of Latent Variable Models-Supervised Learning after Clustering-Spectral Clustering-Hierarchical Clustering-Choosing the number of Clusters. | |
Text Books And Reference Books: [1]. E. Alpaydin, Introduction to Machine Learning, 3rd Edition, MIT Press, 2014. | |
Essential Reading / Recommended Reading [1]. C.M. Bishop, Pattern Recognition and Machine Learning, Springer, 2016. [2]. T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference and Prediction, Springer, 2nd Edition, 2009 [3]. K. P. Murphy, Machine Learning:A Probabilistic Perspective, MIT Press, 2012. | |
Evaluation Pattern CIA - 50% ESE - 50% | |
MDS241A - MULTIVARIATE ANALYSIS (2019 Batch) | |
Total Teaching Hours for Semester:60 |
No of Lecture Hours/Week:4 |
Max Marks:100 |
Credits:4 |
Course Objectives/Course Description |
|
This course lays the foundation of Multivariate data analysis. The exposure provided to multivariate data structure, multinomial and multivariate normal distribution, estimation and testing of parameters, various data reduction methods would help the students in having a better understanding of research data, its presentation and analysis. |
|
Course Outcome |
|
CO1: Understand multivariate data structure, multinomial and multivariate normal distribution CO2: Apply Multivariate analysis of variance (MANOVA) of one and two-way classified data. |
Unit-1 |
Teaching Hours:12 |
INTRODUCTION
|
|
Basic concepts on multivariate variable. Multivariate normal distribution, Marginal and conditional distribution, Concept of random vector: Its expectation and Variance-Covariance matrix. Marginal and joint distributions. Conditional distributions and Independence of random vectors. Multinomial distribution. Sample mean vector and its distribution. | |
Unit-2 |
Teaching Hours:12 |
DISTRIBUTION
|
|
Sample mean vector and its distribution. Likelihood ratio tests: Tests of hypotheses about the mean vectors and covariance matrices for multivariate normal populations. Independence of sub vectors and sphericity test. | |
Unit-3 |
Teaching Hours:12 |
MULTIVARIATE ANALYSIS
|
|
Multivariate analysis of variance (MANOVA) of one and two-way classified data. Multivariate analysis of covariance. Wishart distribution, Hotelling’s T² and Mahalanobis’ D² statistics, Null distribution of Hotelling’s T². Rao’s U statistics and its distribution. | |
Unit-4 |
Teaching Hours:12 |
CLASSIFICATION AND DISCRIMINANT PROCEDURES
|
|
Bayes, minimax, and Fisher’s criteria for discrimination between two multivariate normal populations. Sample discriminant function. Tests associated with discriminant functions. Probabilities of misclassification and their estimation. Discrimination for several multivariate normal populations | |
Unit-5 |
Teaching Hours:12 |
PRINCIPAL COMPONENT and FACTOR ANALYSIS
|
|
Principal components, sample principal components asymptotic properties. Canonical variables and canonical correlations: definition, estimation, computations. Test for significance of canonical correlations. Factor analysis: Orthogonal factor model, factor loadings, estimation of factor loadings, factor scores. Applications | |
Text Books And Reference Books: [1]. Anderson, T.W. 2009. An Introduction to Multivariate Statistical Analysis, 3rd Edition, John Wiley. [2]. Everitt B, Hothorn T, 2011. An Introduction to Applied Multivariate Analysis with R, Springer. [3]. Barry J. Babin, Hair, Rolph E Anderson, and William C. Black, 2013, Multivariate Data Analysis, Pearson New International Edition. | |
Essential Reading / Recommended Reading [1] Giri, N.C. 1977. Multivariate Statistical Inference. Academic Press. [2] Chatfield, C. and Collins, A.J. 1982. Introduction to Multivariate analysis. Prentice Hall [3] Srivastava, M.S. and Khatri, C.G. 1979. An Introduction to Multivariate Statistics. North Holland | |
Evaluation Pattern CIA - 50% ESE - 50% | |
MDS241B - STOCHASTIC PROCESS (2019 Batch) | |
Total Teaching Hours for Semester:60 |
No of Lecture Hours/Week:4 |
Max Marks:100 |
Credits:4 |
Course Objectives/Course Description |
|
This course is designed to introduce the concepts of theory of estimation and testing of hypothesis. This paper also deals with the concept of parametric tests for large and small samples. It also provides knowledge about non-parametric tests and its applications. |
|
Course Outcome |
|
CO1: Demonstrate the concepts of point and interval estimation of unknown parameters and their significance using large and small samples. CO2: Apply the idea of sampling distributions of difference statistics in testing of hypotheses. CO3: Infer the concept of nonparametric tests for single sample and two samples. |
Unit-1 |
Teaching Hours:12 |
INTRODUCTION TO STOCHASTIC PROCESSES
|
|
Classification of Stochastic Processes, Markov Processes – Markov Chain - Countable State Markov Chain. Transition Probabilities, Transition Probability Matrix. Chapman - Kolmogorov's Equations, Calculation of n - step Transition Probability and its limit. | |
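The Chapman-Kolmogorov equations imply that the n-step transition probabilities of a homogeneous Markov chain are the entries of the n-th power of the one-step transition matrix. The sketch below illustrates this with a hypothetical two-state chain in plain Python. The matrix `P` and the helper names are assumptions made for illustration.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def n_step(p, n):
    """n-step transition matrix P^n, computed by repeated multiplication."""
    size = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(size)]
              for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, p)
    return result


# Hypothetical two-state chain: row i holds the probabilities of leaving state i.
P = [[0.9, 0.1],
     [0.5, 0.5]]

two_step = n_step(P, 2)
long_run = n_step(P, 50)
print(two_step)
print(long_run)
```

For this chain the rows of the n-step matrix approach the stationary distribution (5/6, 1/6) as n grows, which is the limit referred to in the unit description.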
Unit-2 |
Teaching Hours:12 |
POISSON PROCESS
|
|
Classification of States, Recurrent and Transient States - Transient Markov Chain, Random Walk and Gambler's Ruin Problem. Continuous Time Markov Process: Poisson Processes, Birth and Death Processes, Kolmogorov’s Differential Equations, Applications. | |
Unit-3 |
Teaching Hours:12 |
BRANCHING PROCESS
|
|
Branching Processes – Galton-Watson Branching Process – Properties of Generating Functions – Extinction Probabilities – Distribution of Total Number of Progeny. Concept of Wiener Process. | |
Unit-4 |
Teaching Hours:12 |
RENEWAL PROCESS
|
|
Renewal Processes – Renewal Process in Discrete and Continuous Time – Renewal Interval – Renewal Function and Renewal Density – Renewal Equation – Renewal theorems: Elementary Renewal Theorem. Probability Generating Function of Renewal Processes. | |
Unit-5 |
Teaching Hours:12 |
STATIONARY PROCESS
|
|
Stationary Processes: Discrete Parameter Stochastic Process – Application to Time Series. Auto-covariance and Auto-correlation functions and their properties. Moving Average, Autoregressive, Autoregressive Moving Average, Autoregressive Integrated Moving Average Processes. Basic ideas of residual analysis, diagnostic checking, forecasting. | |
Text Books And Reference Books: [1]. Stochastic Processes, R.G Gallager, Cambridge University Press, 2013. [2]. Stochastic Processes, S.M Ross, Wiley India Pvt. Ltd, 2008. | |
Essential Reading / Recommended Reading [1]. Stochastic Processes from Applications to Theory, P.D Moral and S. Penev, CRC Press, 2016 [2]. Introduction to Probability and Stochastic Processes with Applications, B.C. Liliana, A Viswanathan, S. Dharmaraja, Wiley Pvt. Ltd, 2012. | |
Evaluation Pattern CIA - 50% ESE - 50% | |
MDS271 - PROGRAMMING FOR DATA SCIENCE IN R (2019 Batch) | |
Total Teaching Hours for Semester:90 |
No of Lecture Hours/Week:6 |
Max Marks:100 |
Credits:4 |
Course Objectives/Course Description |
|
This lab is designed to introduce implementation of practical machine learning algorithms using R programming language. The lab will extensively use datasets from real life situations. |
|
Course Outcome |
|
CO1: Demonstrate the use of R on any OS (Windows / Mac / Linux). CO2: Analyse the use of basic functions of R Package. CO3: Demonstrate exploratory data analysis (EDA) for a given data set. CO4: Create and edit visualizations with R CO5: Implement and assess relevance and effectiveness of machine learning algorithms for a given dataset. |
Unit-1 |
Teaching Hours:18 |
R INSTALLATION, SETUP AND LINEAR REGRESSION
|
|
Download and install R – R IDE environments – Why R – Getting started with R – Vectors and Data Frames – Loading Data Frames – Data analysis with summary statistics and scatter plots – Summary tables – Working with Script Files. Linear Regression – Introduction – Regression model for one variable regression – Selecting best model – Error measures SSE, SST, RMSE, R² – Interpreting R² – Multiple linear regression – Lasso and ridge regression – Correlation – Recitation – A minimum of 3 data sets for practice | |
Unit-2 |
Teaching Hours:18 |
LOGISTIC REGRESSION
|
|
Logistic Regression – The Logit – Confusion matrix – sensitivity, specificity – ROC curve – Threshold selection with ROC curve – Making predictions – Area under the ROC curve (AUC) - Recitation – A minimum of 3 data sets for practice | |
Unit-3 |
Teaching Hours:18 |
DECISION TREES
|
|
Approaches to missing data – Data imputation – Multiple imputation – Classification and Regression Trees (CART) – CART with Cross Validation – Predictions from CART – ROC curve for CART – Random Forests – Building many trees – Parameter selection – K-fold Cross Validation – Recitation – A minimum of 3 data sets for practice | |
Unit-4 |
Teaching Hours:18 |
TEXT ANALYTICS AND NLP
|
|
Using text as data – Text analytics – Natural language processing – Bag of words – Stemming – word clouds – Recitation – min 3 data sets for practice – Time series analysis – Clustering – k-means clustering – Random forest with clustering – Understanding cluster patterns – Impact of clustering – Heatmaps – Recitation – min 3 data sets for practice | |
Unit-5 |
Teaching Hours:18 |
ENSEMBLE MODELING
|
|
Support Vector Machines – Gradient Boosting – Naive Bayes - Bayesian GLM – GLMNET - Ensemble modeling – Experimenting with all of the above approaches (Units 1-5) with and without data imputation and assessing predictive accuracy – Recitation – min 3 data sets for practice PROJECT – A concluding project work carried out individually for a common data set | |
Text Books And Reference Books: [1]. Hands-on programming with R, Garrett Grolemund, O’Reilly, 1st Edition, 2014 [2]. R for everyone, Jared Lander, Pearson, 1st Edition, 2014 | |
Essential Reading / Recommended Reading [1]. Statistics : An Introduction Using R, Michael J. Crawley, WILEY, Second Edition, 2015. | |
Evaluation Pattern CIA - 50% ESE - 50% | |
MDS272A - HADOOP (2019 Batch) | |
Total Teaching Hours for Semester:90 |
No of Lecture Hours/Week:6 |
Max Marks:150 |
Credits:5 |
Course Objectives/Course Description |
|
This course is intended to give students knowledge of how Big Data arises in real-time applications and how it is manipulated using emerging technologies. It breaks down the walls of complexity in processing Big Data by providing a practical approach to developing Java applications on top of the Hadoop platform. It describes the Hadoop architecture and how to work with the Hadoop Distributed File System (HDFS) and HBase on the Ubuntu platform. |
|
Course Outcome |
|
CO1: Understand the Big Data concepts in real time scenario CO2: Understand the big data systems and identify the main sources of Big Data in the real world. CO3: Demonstrate an ability to use Hadoop framework for processing Big Data for Analytics. CO4: Evaluate the Map reduce approach for different domain problems. |
Unit-1 |
Teaching Hours:15 |
INTRODUCTION
|
|
Distributed file system – Big Data and its importance, Four Vs, Drivers for Big data, Big data analytics, Big data applications, Algorithms using map reduce, Matrix-Vector Multiplication by Map Reduce. Apache Hadoop – Moving Data in and out of Hadoop – Understanding inputs and outputs of MapReduce - Data Serialization, Problems with traditional large-scale systems-Requirements for a new approach-Hadoop – Scaling-Distributed Framework-Hadoop v/s RDBMS-Brief history of Hadoop.
Lab Exercise
1. Installing and Configuring Hadoop | |
Unit-2 |
Teaching Hours:15 |
CONFIGURATIONS OF HADOOP
|
|
Hadoop Processes (NN, SNN, JT, DN, TT)-Temporary directory – UI-Common errors when running Hadoop cluster, solutions. Setting up Hadoop on a local Ubuntu host: Prerequisites, downloading Hadoop, setting up SSH, configuring the pseudo-distributed mode, HDFS directory, NameNode, Examples of MapReduce, Using Elastic MapReduce, Comparison of local versus EMR Hadoop. Understanding MapReduce: Key/value pairs, The Hadoop Java API for MapReduce, Writing MapReduce programs, Hadoop-specific data types, Input/output. Developing MapReduce Programs: Using languages other than Java with Hadoop, Analysing a large dataset. Lab Exercise: 1. Word count application in Hadoop. 2. Sorting the data using MapReduce. 3. Finding max and min value in Hadoop. | |
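The key/value flow behind the word count lab exercise can be sketched without a cluster. The following is a plain-Python simulation of the map, shuffle and reduce phases. It is not Hadoop code and does not use the Hadoop Java API; the function names and the input lines are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


# Hypothetical input split into lines.
lines = ["big data big ideas", "data beats ideas"]

pairs = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(key, values)
              for key, values in shuffle(pairs).items())
print(counts)
```

In a real Hadoop job the mapper and reducer run on different nodes and the shuffle is performed by the framework; only the two user-defined functions correspond to code a student would write.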
Unit-3 |
Teaching Hours:15 |
ADVANCED MAPREDUCE TECHNIQUES
|
|
Simple, advanced, and in-between Joins, Graph algorithms, using language-independent data structures. Hadoop configuration properties - Setting up a cluster, Cluster access control, managing the NameNode, Managing HDFS, MapReduce management, Scaling. Lab Exercise: 1. Implementation of decision tree algorithms using MapReduce. 2. Implementation of K-means Clustering using MapReduce. 3. Generation of Frequent Itemset using MapReduce. | |
Unit-4 |
Teaching Hours:15 |
HADOOP STREAMING
|
|
Hadoop Streaming - Streaming Command Options - Specifying a Java Class as the Mapper/Reducer - Packaging Files With Job Submissions - Specifying Other Plug-ins for Jobs. Lab Exercise: 1. Count the number of missing and invalid values by joining two large given datasets. 2. Using Hadoop’s MapReduce, evaluate the number of products sold in each country on an online shopping portal. The dataset is given. 3. Analyze the sentiment of product reviews using the MapReduce technique provided by Apache Hadoop. | |
Unit-5 |
Teaching Hours:15 |
HIVE & PIG
|
|
Architecture, Installation, Configuration, Hive vs RDBMS, Tables, DDL & DML, Partitioning & Bucketing, Hive Web Interface, Pig, Use case of Pig, Pig Components, Data Model, Pig Latin. Lab Exercise 1. Trend Analysis based on Access Pattern over Web Logs using Hadoop. 2. Service Rating Prediction by Exploring Social Mobile Users Geographical Locations. | |
Unit-6 |
Teaching Hours:15 |
Hbase
|
|
RDBMS vs NoSQL, HBasics, Installation, Building an online query application – Schema design, Loading Data, Online Queries, Successful service. Hands On: Single-node Hadoop cluster setup on any cloud service provider - How to create an instance. How to connect to that instance using PuTTY. Installing the Hadoop framework on this instance. Run sample programs which come with the Hadoop framework. Lab Exercise: 1. Big Data Analytics Framework Based Simulated Performance and Operational Efficiencies Through Billions of Patient Records in Hospital System. | |
Text Books And Reference Books: [1] Boris Lublinsky, Kevin T. Smith, Alexey Yakubovich, Professional Hadoop Solutions, Wiley, 2015. [2] Tom White, Hadoop: The Definitive Guide, O’Reilly Media Inc., 2015. [3] Garry Turkington, Hadoop Beginner's Guide, Packt Publishing, 2013. | |
Essential Reading / Recommended Reading [1] Pethuru Raj, Anupama Raman, Dhivya Nagaraj and Siddhartha Duggirala, High-Performance Big-Data Analytics: Computing Systems and Approaches, Springer, 2015. [2] Jonathan R. Owens, Jon Lentz and Brian Femiano, Hadoop Real-World Solutions Cookbook, Packt Publishing, 2013. [3] Tom White, Hadoop: The Definitive Guide, O’Reilly, 2012. | |
Evaluation Pattern CIA - 50% ESE - 50% | |
MDS272B - IMAGE AND VIDEO ANALYTICS (2019 Batch) | |
Total Teaching Hours for Semester:90 |
No of Lecture Hours/Week:6 |
Max Marks:150 |
Credits:5 |
Course Objectives/Course Description |
|
This course will provide a basic foundation towards digital image processing and video analysis. This course will also provide brief introduction about various Object Detection, Recognition, Segmentation and Compression methods which will help the students to demonstrate real-time image and video analytics applications. |
|
Course Outcome |
|
CO1: Understand the fundamental principles of image and video analysis CO2: Apply the image and video analysis approaches to solve real world problems |
Unit-1 |
Teaching Hours:15 |
INTRODUCTION TO DIGITAL IMAGE AND VIDEO PROCESSING
|
|
Digital image representation, Sampling and Quantization, Types of Images, Basic Relations between Pixels - Neighbors, Connectivity, Distance Measures between pixels, Linear and Non Linear Operations, Introduction to Digital Video, Sampled Video, Video Transmission. Gray-Level Processing: Image Histogram, Linear and Non-linear point operations on Images, Arithmetic Operations between Images, Geometric Image Operations. Binary Image Processing: Image Thresholding, Region labeling, Binary Image Morphology. Lab Exercise: 1. Implement basic gray-scale and binary processing - image histogram, image labeling, image thresholding 2. Implementing Image Database Analysis | |
Unit-2 |
Teaching Hours:15 |
IMAGE AND VIDEO ENHANCEMENT AND RESTORATION
|
|
Spatial domain - Linear and Non-linear Filtering, Morphological filtering, Frequency domain – Homomorphic Filtering, Blotch Detection and Removal - Blotch Detection, Motion Vector Repair and Interpolating Corrupted Intensities, Intensity Flicker Correction - Flicker Parameter Estimation, Brief introduction towards Wavelets, Wavelet based image denoising, Basic methods for image restoration using deconvolution filters. Lab Exercise: 1. Extraction of frames from videos and analyzing frames 2. Implement spatial domain - linear and non-linear filtering | |
Unit-3 |
Teaching Hours:15 |
IMAGE ANALYSIS
|
|
Image Compression: Huffman coding, Run length coding, LZW coding, Lossless Coding, Wavelets based image compression. Lab Exercise: 1. Frequency domain – homomorphic filtering on gray scale and color images 2. Implement image restoration methods on images | |
Unit-4 |
Teaching Hours:15 |
VIDEO ANALYSIS
|
|
Video Compression: Basic Concepts and Techniques of Video Coding and the H.264 Standard, MPEG-1 and MPEG-2 Video Standards. Lab Exercise: 1. Implement flicker correction on video datasets 2. Implement multi-resolution image decomposition and reconstruction using wavelets | |
Unit-5 |
Teaching Hours:15 |
FEATURE DETECTION AND DESCRIPTION
|
|
Introduction to feature detectors, descriptors, matching and tracking, Basic edge detectors – canny, sobel, prewitt etc., Image Segmentation - Region Based Segmentation – Region Growing and Region Splitting and Merging, Thresholding – Basic global thresholding, optimum global thresholding using Otsu’s Method. Lab Exercise: 1. Implement image compression using wavelets 2. Implement image segmentation using thresholding | |
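Otsu’s method, named in this unit, chooses the global threshold that maximizes the between-class variance of the two pixel groups it creates. The sketch below works on a flat list of integer gray levels in plain Python. The function name, the 256-level default and the sample pixels are assumptions for illustration; a lab solution would normally use an image library.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t maximizing between-class variance.

    Pixels with value <= t form the background class, the rest the foreground.
    """
    histogram = [0] * levels
    for value in pixels:
        histogram[value] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for t in range(levels):
        weight_bg += histogram[t]
        sum_bg += t * histogram[t]
        weight_fg = total - weight_bg
        if weight_bg == 0:
            continue  # no background pixels yet
        if weight_fg == 0:
            break     # all pixels are background from here on
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance, up to the constant factor 1 / total**2.
        between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between > best_variance:
            best_variance, best_threshold = between, t
    return best_threshold


# Hypothetical image with a dark and a bright cluster of pixels.
pixels = [10, 12, 11, 200, 198, 202]
threshold = otsu_threshold(pixels)
binary = [1 if p > threshold else 0 for p in pixels]
print(threshold, binary)
```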
Unit-6 |
Teaching Hours:15 |
OBJECT DETECTION AND RECOGNITION
|
|
Object detection and recognition in image and video, basic texture descriptors –GLCM, LBP and its applications in image and video analysis, object tracking in videos. Lab Exercise 1. Implement Local Binary Pattern texture descriptor | |
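The Local Binary Pattern descriptor from the lab exercise encodes each pixel by comparing it with its eight neighbours. The sketch below computes the basic 3x3 LBP code for a single pixel. The clockwise bit order starting at the top-left neighbour is an assumption (other orderings are also in use), and the sample patch is hypothetical.

```python
def lbp_code(patch):
    """Basic 3x3 LBP code of the centre pixel of a 3x3 patch (list of rows)."""
    center = patch[1][1]
    # Neighbours in clockwise order starting at the top-left corner
    # (assumed convention; it only needs to be applied consistently).
    neighbours = [
        patch[0][0], patch[0][1], patch[0][2],
        patch[1][2],
        patch[2][2], patch[2][1], patch[2][0],
        patch[1][0],
    ]
    code = 0
    for bit, value in enumerate(neighbours):
        if value >= center:
            code |= 1 << bit  # neighbour at least as bright: set this bit
    return code


# Hypothetical gray-level patch.
patch = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
print(lbp_code(patch))
```

A texture descriptor for a whole image is then obtained by computing this code for every interior pixel and taking the histogram of the codes.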
Text Books And Reference Books: [1] Alan Bovik, Handbook of Image and Video Processing, Second Edition, Academic Press, 2005. [2] Rafael C. Gonzalez and Richard E. Woods, Digital Image Processing, Third Edition, Pearson Education, 2008. [3] Richard Szeliski, Computer Vision – Algorithms and Applications, Springer, 2011. | |
Essential Reading / Recommended Reading [1] Anil K Jain, Fundamentals of Digital Image Processing, PHI, 2011. [2] Oge Marques, Practical Image and Video Processing Using MatLab, Wiley, 2011. [3] John W. Woods, Multidimensional Signal, Image, Video Processing and Coding, Academic Press, 2006. | |
Evaluation Pattern CIA - 50% ESE - 50% | |
MDS272C - INTERNET OF THINGS (2019 Batch) | |
Total Teaching Hours for Semester:90 |
No of Lecture Hours/Week:6 |
Max Marks:150 |
Credits:5 |
Course Objectives/Course Description |
|
The explosive growth of the “Internet of Things” is changing our world and the rapid growth of IoT components is allowing people to innovate new designs and products at home. Wireless Sensor Networks form the basis of the Internet of Things. To latch on to the applications in the field of IoT of the recent times, this course provides a deeper understanding of the underlying concepts of IoT and Wireless Sensor Networks. |
|
Course Outcome |
|
CO1: Understand the concepts of IoT and IoT enabling technologies CO2: Apply knowledge on IoT programming to develop IoT applications CO3: Identify different issues in wireless ad hoc and sensor networks CO4: Analyse the different sensor network architectures from a design and performance perspective CO5: Understand the layered approach in sensor networks and WSN protocols |
Unit-1 |
Teaching Hours:18 |
INTRODUCTION TO IoT
|
|
Introduction to IoT - Definition and Characteristics, Physical Design Things- Protocols, Logical Design- Functional Blocks, Communication Models- Communication APIs- Introduction to measure the physical quantities, IoT Enabling Technologies - Wireless Sensor Networks, Cloud Computing, Big Data Analytics, Communication Protocols- Embedded System- IoT Levels and Deployment Templates. Lab Exercise: 1. Introduction to ICs and Sensors. A basic program can be shown which makes use of logic gate ICs for understanding the basics of sensor nodes. Different sensors which find application in IoT projects can be shown and their working explained. 2. Introduction to Arduino/Raspberry Pi. Sample sketches or code can be selected from the Arduino software and executed, making use of different sensors. | |
Unit-2 |
Teaching Hours:18 |
IoT PROGRAMMING
|
|
Introduction to Smart Systems using IoT - IoT Design Methodology- IoT Boards (Raspberry Pi, Arduino) and IDE - Case Study: Weather Monitoring- Logical Design using Python, Data types & Data Structures- Control Flow, Functions- Modules- Packages, File Handling - Date/Time Operations, Classes- Python Packages of Interest for IoT. Lab Exercise: 1. Use sensors to detect the temperature/humidity in a room and perform appropriate actions, such as changing the LED color and turning the speaker on as an alarm, using the serial monitor to see these values. 2. A basic parking system making use of multiple IR sensors, Ultrasonic Sensors, LED bulbs, Speakers etc, to identify if a slot is empty or full and using the LED and speakers to alert the user about the availability. | |
Unit-3 |
Teaching Hours:18 |
IoT APPLICATIONS
|
|
Home Automation – Smart Cities – Environment, Energy – Retail, Logistics – Agriculture, Industry – Health and Lifestyle – IoT and M2M.
Lab Exercise:
1. An agricultural system (greenhouse system) that uses sensors such as humidity and temperature sensors to assess the current state of the agricultural area and takes the necessary measures, such as activating the water-spraying motor or the alarm system (to indicate excess heat).
2. Create a basic sound system using knobs, speakers, LED bulbs, etc. to mimic the sound produced by a race car, an ambulance, a siren, etc. | |
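The greenhouse exercise amounts to a small control loop. The sketch below is one possible way to structure it, under stated assumptions: the class name, the threshold values, and the use of two humidity thresholds (so the sprayer does not switch rapidly on and off near a single cut-off) are illustrative choices that the syllabus does not prescribe.

```python
class GreenhouseController:
    """Hardware-free sketch of the greenhouse logic (all thresholds are assumed)."""

    def __init__(self, humidity_on=40.0, humidity_off=60.0, heat_alarm_c=38.0):
        self.humidity_on = humidity_on      # start spraying below this humidity (%)
        self.humidity_off = humidity_off    # stop spraying above this humidity (%)
        self.heat_alarm_c = heat_alarm_c    # raise the heat alarm above this temperature
        self.sprayer_on = False

    def update(self, humidity_pct, temp_c):
        """Process one reading; return (sprayer_on, alarm_on)."""
        if humidity_pct < self.humidity_on:
            self.sprayer_on = True
        elif humidity_pct > self.humidity_off:
            self.sprayer_on = False
        # Between the two thresholds the sprayer keeps its previous state.
        alarm_on = temp_c > self.heat_alarm_c
        return self.sprayer_on, alarm_on


if __name__ == "__main__":
    controller = GreenhouseController()
    for reading in [(35.0, 30.0), (50.0, 30.0), (65.0, 40.0)]:
        print(reading, controller.update(*reading))
```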
Unit-4 |
Teaching Hours:18 |
NETWORK OF WIRELESS SENSOR NODES
|
|
Sensing and Sensors - Wireless Sensor Networks, Challenges and Constraints - Applications: Structural Health Monitoring, Traffic Control, Health Care - Node Architecture - Operating System.
Lab Exercise:
1. A basic obstacle-avoiding robot built with ultrasonic sensors, DC motors, and a robotic car chassis kit.
2. Use GSM for communication in the obstacle-avoiding robot. Use sensors such as flame sensors, PIR human motion sensors, IR sensors, LED bulbs, etc. for better inputs about the environment.
3. A garbage level indicator that uses IR proximity sensors, WiFi modules, etc. to detect the rising amount of garbage, send the data to a server, and channel that data to the owner of the module. This can be introduced as an application of IoT. If needed, the introduction to IoT can be done much earlier and data sharing demonstrated, so that later projects work better. | |
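For the garbage level indicator, the data sent to the server can be prototyped without any hardware. The sketch below assumes the sensor yields a distance from the lid to the top of the garbage; the bin depth, alert threshold, field names, and function names are all hypothetical, and the actual network upload is left out.

```python
import json


def fill_percent(distance_cm, bin_depth_cm):
    """Convert a lid-to-garbage distance into a fill level between 0 and 100."""
    clamped = min(max(distance_cm, 0.0), bin_depth_cm)
    return round(100.0 * (1.0 - clamped / bin_depth_cm), 1)


def build_report(bin_id, distance_cm, bin_depth_cm=100.0, alert_at=80.0):
    """Build the JSON payload that a WiFi module could send to a server."""
    level = fill_percent(distance_cm, bin_depth_cm)
    payload = {"bin_id": bin_id, "fill_percent": level, "alert": level >= alert_at}
    return json.dumps(payload, sort_keys=True)


if __name__ == "__main__":
    # Simulated distance readings as the bin fills up.
    for distance_cm in (90.0, 50.0, 10.0):
        print(build_report("bin-1", distance_cm))
```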
Unit-5 |
Teaching Hours:18 |
MAC, ROUTING AND TRANSPORT CONTROL IN WSN
|
|
Introduction – Fundamentals of MAC Protocols – MAC Protocols for WSN – Sensor MAC Case Study – Routing Challenges and Design Issues – Routing Strategies – Transport Control Protocols – Transport Protocol Design Issues – Performance of Transport Protocols.
Lab Exercise:
1. Elderly care: Monitor very senior citizens to detect a sudden fall. If a very senior citizen falls suddenly while walking, due to a stroke, slippery ground, etc., a notification should be sent out so that he/she can get immediate medical attention.
2. Smart street lights: The street lights should increase or decrease their intensity based on the amount of light actually needed at that time of day. This will save a lot of energy for the municipal corporation.
3. Implement a 3-bit binary counter using a 3-LED module. For example:
i. 000 = 0 (all LEDs should be RED)
ii. 001 = 1 (two LEDs should be RED and one LED should be GREEN)
iii. If the button is pressed at any point, reset the counter and restart from 0.
4. Theft prevention system for night: When the room is dark and the board is moved or tilted (say, by around 90 degrees), it should raise an alarm. | |
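The 3-bit binary counter exercise can be checked in software first. In this sketch a 0 bit maps to RED and a 1 bit to GREEN, as in the examples above; wrapping from 7 back to 0 and the names `led_colors` and `BinaryCounter` are assumptions made for illustration.

```python
def led_colors(count):
    """Return the three LED colors (most significant bit first) for a count."""
    bits = format(count % 8, "03b")  # e.g. 5 -> "101"
    return ["GREEN" if bit == "1" else "RED" for bit in bits]


class BinaryCounter:
    """3-bit counter that resets to 0 whenever the button is pressed."""

    def __init__(self):
        self.value = 0

    def tick(self, button_pressed=False):
        """Advance by one step (or reset) and return the new LED colors."""
        if button_pressed:
            self.value = 0
        else:
            self.value = (self.value + 1) % 8  # assumed wrap-around after 111
        return led_colors(self.value)


if __name__ == "__main__":
    counter = BinaryCounter()
    print(led_colors(counter.value))
    for pressed in (False, False, False, True, False):
        print(counter.tick(button_pressed=pressed))
```

On a board, each call to `tick` would correspond to one timer step, with the returned colors written to the three LED outputs.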
Text Books And Reference Books:
[1] Arshdeep Bahga and Vijay Madisetti, Internet of Things: A Hands-on Approach, Hyderabad University Press, 2015.
[2] Kazem Sohraby, Daniel Minoli and Taieb Znati, Wireless Sensor Networks: Technology, Protocols, and Applications, Wiley Publications, 2010.
[3] Waltenegus Dargie and Christian Poellabauer, Fundamentals of Wireless Sensor Networks: Theory and Practice, John Wiley and Sons Ltd., 2010. | |
Essential Reading / Recommended Reading
[1] Edgar Callaway, Wireless Sensor Networks: Architecture and Protocols, Auerbach Publications, 2003.
[2] Michael Miller, The Internet of Things, Pearson Education, 2015.
[3] Holger Karl and Andreas Willig, Protocols and Architectures for Wireless Sensor Networks, John Wiley & Sons Inc., 2005.
[4] Erdal Çayırcı and Chunming Rong, Security in Wireless Ad Hoc and Sensor Networks, John Wiley and Sons, 2009.
[5] Carlos De Morais Cordeiro and Dharma Prakash Agrawal, Ad Hoc and Sensor Networks: Theory and Applications, World Scientific Publishing, 2011.
[6] Waltenegus Dargie and Christian Poellabauer, Fundamentals of Wireless Sensor Networks: Theory and Practice, John Wiley and Sons, 2010.
[7] Adrian Perrig and J. D. Tygar, Secure Broadcast Communication: In Wired and Wireless Networks, Springer, 2006. | |
Evaluation Pattern CIA - 50% ESE - 50% | |
MDS281 - RESEARCH PROBLEM IDENTIFICATION AND DATA COLLECTION (2019 Batch) | |
Total Teaching Hours for Semester:15 |
No of Lecture Hours/Week:1 |
Max Marks:0 |
Credits:0 |
Course Objectives/Course Description |
|
This research-inclusive curriculum is designed with two main objectives:
1. Inculcating a research culture among postgraduate students.
2. Enhancing the employability skills of students by providing the necessary research foundation. |
|
Course Outcome |
|
CO1: Carry out research work with data collection and result validation.
CO2: Understand the basics of research data collection and research paper writing. |
Unit-1 |
Teaching Hours:15 |
Research-Problem Identification
|
|
There is only CIA for this course. Students should do a thorough literature review in their research area. They should give a presentation and submit a document containing the following:
Introduction to topic, existing scenario and applications (5 marks)
Literature review (minimum 25 references) (15 marks)
Existing model and methodology (5 marks)
Concrete problem statement definition (10 marks) | |
Text Books And Reference Books: - | |
Essential Reading / Recommended Reading - | |
Evaluation Pattern CIA - 50% ESE - 50% |