CHRIST (Deemed to be University), Bangalore
DEPARTMENT OF STATISTICS AND DATA SCIENCE
School of Business and Management

Syllabus for

1 Semester  2023  Batch  
Course Code  Course  Type  Hours Per Week  Credits  Marks
MDS131  RESEARCH METHODS IN DATA SCIENCE  Core Courses  5  4  100 
MDS132  PROBABILITY AND DISTRIBUTION THEORY  Core Courses  5  4  100 
MDS133  MATHEMATICAL FOUNDATIONS FOR DATA SCIENCE - I  Core Courses  4  3  100
MDS151  APPLIED EXCEL  Discipline Specific Elective Courses  3  1  50 
MDS161A  PRINCIPLES OF PROGRAMMING  Discipline Specific Elective Courses  3  2  50 
MDS161B  INTRODUCTION TO PROBABILITY AND STATISTICS  Discipline Specific Elective Courses  3  2  50 
MDS161C  LINUX ESSENTIALS  Discipline Specific Elective Courses  3  2  50 
MDS171  PROGRAMMING USING PYTHON  Core Courses  8  5  150 
2 Semester  2023  Batch  
Course Code  Course  Type  Hours Per Week  Credits  Marks
MDS231  DESIGN AND ANALYSIS OF ALGORITHMS  Core Courses  4  3  100 
MDS232  MATHEMATICAL FOUNDATIONS FOR DATA SCIENCE - II  Core Courses  3  3  100
MDS271  DATABASE TECHNOLOGIES  Core Courses  4  4  100 
MDS272  INFERENTIAL STATISTICS USING R  Core Courses  7  4  100 
MDS273  FULL STACK WEB DEVELOPMENT  Core Courses  7  4  100 
3 Semester  2022  Batch  
Course Code  Course  Type  Hours Per Week  Credits  Marks
MDS311  PROGRAMMING FOR DATA SCIENCE IN R  Core Courses  2  2  50 
MDS331  NEURAL NETWORKS AND DEEP LEARNING  Core Courses  4  4  100 
MDS341A  TIME SERIES ANALYSIS AND FORECASTING TECHNIQUES  Discipline Specific Elective Courses  4  4  100 
MDS341B  BAYESIAN INFERENCE  Discipline Specific Elective Courses  4  4  100 
MDS341C  ECONOMETRICS  Discipline Specific Elective Courses  4  4  100 
MDS341D  BIOSTATISTICS  Discipline Specific Elective Courses  4  4  100 
MDS371  CLOUD ANALYTICS  Core Courses  6  5  150 
MDS372  BUSINESS INTELLIGENCE  Core Courses  5  4  4 
MDS373A  NATURAL LANGUAGE PROCESSING  Discipline Specific Elective Courses  6  5  150 
MDS373B  HADOOP  Discipline Specific Elective Courses  6  5  150 
MDS373C  BIO INFORMATICS  Discipline Specific Elective Courses  6  5  150 
MDS373D  EVOLUTIONARY ALGORITHMS  Discipline Specific Elective Courses  6  5  150 
MDS373E  OPTIMIZATION TECHNIQUE  Discipline Specific Elective Courses  6  5  150 
MDS381  SPECIALIZATION PROJECT  Core Courses  4  2  100 
4 Semester  2022  Batch  
Course Code  Course  Type  Hours Per Week  Credits  Marks
MDS481  INDUSTRY PROJECT  Core Courses  2  12  300 
 
Introduction to Program:  
Data Science is widely used across academia, business sectors, and research and development to make effective decisions in day-to-day activities. MSc in Data Science is a two-year programme with six trimesters. This programme aims to provide an opportunity for all candidates to master the skill sets specific to data science with a research bent. The curriculum supports students in obtaining adequate knowledge of the theory of data science, with hands-on experience in relevant domains and tools. Candidates gain exposure to research models and industry-standard applications in data science through guest lectures, seminars, projects, internships, etc.
Programme Outcome/Programme Learning Goals/Programme Learning Outcome:
PO1: Problem Analysis and Design: Ability to identify, analyze and design solutions for data science problems using fundamental principles of mathematics, statistics, computing sciences, and relevant domain disciplines.
PO2: Enhance disciplinary competency and employability: Acquire the skills in handling data science programming tools towards problem solving and solution analysis for domain-specific problems.
PO3: Societal and Environmental Concern: Utilize data science theories for societal and environmental concerns.
PO4: Professional Ethics: Understand and commit to professional ethics and professional computing practices to enhance research culture and uphold scientific integrity and objectivity.
PO5: Individual and Team work: Function effectively as an individual and as a member or leader in diverse teams and in multidisciplinary environments.
PO6: Engage in continuous reflective learning in the context of technology advancement: Understand the evolving data and analysis paradigms and apply them to solve real-life problems in the fields of data science.
Assessment Pattern
CIA  50% ESE  50%  
Examination And Assessments
Evaluation pattern for full CIA courses:
The “Theory and Practical” Type of courses offered in all UG/PG programmes will be considered as Full CIA courses.
For this type of course, there is no exclusive Mid Semester Examination or End Semester Examination; instead, there shall be continuous evaluation during the semester through the following components:
CAC – Continuous Assessment Component: assessment components such as hard copy / soft copy assignment, quiz, presentation, video making, MOOC, project, demonstration, service learning, etc.
CAT – Continuous Assessment Test: a written / lab test conducted on any working day.
The total marks for full CIA courses vary with the number of hours allocated per week for the respective course. Out of the maximum marks allotted to the course, 50% will be considered as CIA and the remaining 50% as ESE, based on the combination of the evaluation components (CAC and CAT).
MDS131  RESEARCH METHODS IN DATA SCIENCE (2023 Batch)  
Total Teaching Hours for Semester:60 
No of Lecture Hours/Week:5 
Max Marks:100 
Credits:4 
Course Objectives/Course Description 

To assist students in planning and carrying out research work in the field of data science. The students are exposed to the basic principles, procedures and techniques of implementing a research project. The course provides a strong foundation for data science and the application area related to it. Students are trained to understand the underlying core concepts and the importance of ethics while handling data and problems in data science. 

Course Outcome 

CO1: Understand the essence of research and the importance of research methods and methodology
CO2: Explore the fundamental concepts of data science
CO3: Understand various machine learning algorithms used in the data science process
CO4: Learn to think through the ethics surrounding privacy, data sharing and algorithmic decision making
CO5: Create scientific reports according to specified standards
Unit1 
Teaching Hours:12 
Research Methodology


Introduction: Objectives of Research, Types of Research, Research Approaches, Significance of Research, Research Methods versus Methodology. Defining research problem: Selecting the problem, Necessity of defining the problem, Techniques involved in defining a problem. Research Design: Different Research Designs, Basic Principles of Experimental Designs, Developing a Research Plan.
Unit2 
Teaching Hours:12 
Sampling, Measurement and Scaling Techniques


Sampling: Steps in Sampling Design, Different Types of Sample Designs, Measurement and Scaling: Measurement in Research, Measurement Scales, Technique of Developing Measurement Tools, Scaling, Important Scaling Techniques  
Unit2 
Teaching Hours:12 
Introduction to Data Science


Definition – Big Data and Data Science Hype – Why data science – Getting Past the Hype – The Current Landscape – Who is a Data Scientist? – Data Science Process Overview – Defining goals – Retrieving data – Data preparation – Data exploration – Data modeling – Presentation.
Unit3 
Teaching Hours:12 
Machine Learning


Machine learning – Modeling Process – Training model – Validating model – Predicting new observations – Supervised learning algorithms – Unsupervised learning algorithms.
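The modeling process listed in this unit (training, validating, predicting) can be sketched with a least-squares line fit. This is a minimal illustration in plain Python; the data, the 70/30 split, and the helper names `fit_line` and `mse` are assumptions made here, not part of the course material.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y ~ a*x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def mse(model, xs, ys):
    """Mean squared error of the fitted line on the given points."""
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data generated from y = 2x + 1 (no noise).
xs = list(range(10))
ys = [2 * x + 1 for x in xs]

# Hold out the last 30% of the points for validation.
train_x, val_x = xs[:7], xs[7:]
train_y, val_y = ys[:7], ys[7:]

model = fit_line(train_x, train_y)       # training
val_error = mse(model, val_x, val_y)     # validation on held-out data
prediction = model[0] * 12 + model[1]    # predicting a new observation (x = 12)
print(model, val_error, prediction)
```

Real data would be noisy, so the validation error would not be zero; comparing it with the training error is what indicates over- or under-fitting.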
Unit4 
Teaching Hours:12 
Report Writing


Working with Literature: Importance, Finding literature, Using the resources, Managing the literature, Keeping track of references, Literature review. Scientific Writing and Report Writing: Significance, Steps, Layout, Types, Mechanics and Precautions. LaTeX: Introduction, Text, Tables, Figures, Equations, Citations, Referencing, and Templates (IEEE style), Paper writing for international journals, Writing scientific reports.
Unit5 
Teaching Hours:12 
Ethics in Research and Data Science


Research ethics, Data Science ethics – Doing good data science – Owners of the data – Valuing different aspects of privacy – Getting informed consent – The Five Cs – Diversity – Inclusion.
Text Books And Reference Books:
 
Essential Reading / Recommended Reading
 
Evaluation Pattern CIA  50% ESE  50%  
MDS132  PROBABILITY AND DISTRIBUTION THEORY (2023 Batch)  
Total Teaching Hours for Semester:60 
No of Lecture Hours/Week:5 
Max Marks:100 
Credits:4 
Course Objectives/Course Description 

Probability and probability distributions play an essential role in modeling data from real-world phenomena. This course will equip students with thorough knowledge of probability and various probability distributions, and with the ability to model real-life data sets with an appropriate probability distribution.

Course Outcome 

CO1: Describe random events and the probability of events.
CO2: Identify various discrete and continuous distributions and their usage.
CO3: Evaluate conditional probabilities and conditional expectations.
CO4: Apply Chebychev's inequality to verify the convergence of a sequence in probability.
Unit1 
Teaching Hours:12 
Descriptive Statistics and Probability


Data – types of variables: numeric vs categorical – measures of central tendency – measures of dispersion – random experiment – sample space and random events – probability – probability axioms – finite sample space with equally likely outcomes – conditional probability – independent events – Bayes' theorem.
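A minimal sketch of Bayes' theorem from this unit, assuming a hypothetical screening test. The function name `posterior` and all numbers are invented for illustration.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(A | B) for a binary event A and evidence B, via Bayes' theorem."""
    # Law of total probability: P(B) = P(B|A) P(A) + P(B|not A) P(not A)
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: 1% prevalence, 95% sensitivity, 5% false-positive rate.
p = posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(round(p, 3))
```

With a low prior, the posterior stays well below the test's sensitivity, which is the usual point of this exercise.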
Unit2 
Teaching Hours:12 
Probability Distributions for Discrete Data


Random variable – data as observed values of a random variable – expectation – moments & moment generating function – mean and variance in terms of moments – discrete sample space and discrete random variable – Bernoulli experiment and binary variable: Bernoulli and binomial distributions – count data: Poisson distribution – overdispersion in count data: negative binomial distribution – dependent Bernoulli trials: hypergeometric distribution (mean and variances in terms of mgf).
Unit3 
Teaching Hours:12 
Probability Distributions For Continuous Data


Continuous sample space – interval data – continuous random variable – uniform distribution – normal distribution (Gaussian distribution) – modeling lifetime data: exponential distribution, gamma distribution, Weibull distribution (applications in data science).
Unit4 
Teaching Hours:12 
Jointly Distributed Random Variables


Joint distribution of vector random variables – joint moments – covariance – correlation – independent random variables – conditional distribution – conditional expectation – sampling distributions: chi-square, t, F (pdfs & properties).
Unit5 
Teaching Hours:12 
Limit Theorems


Chebychev's inequality – weak law of large numbers (iid): examples – strong law of large numbers (statement only) – central limit theorems (iid case): examples.
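For reference, the first two topics of this unit are linked as follows, assuming iid variables $X_1, \dots, X_n$ with mean $\mu$ and finite variance $\sigma^2$ (the notation is chosen here, not taken from the prescribed texts). Applying Chebychev's inequality to the sample mean $\bar{X}_n$, which has mean $\mu$ and variance $\sigma^2/n$, gives the weak law of large numbers:

```latex
P\left(|X - \mu| \ge \varepsilon\right) \le \frac{\sigma^{2}}{\varepsilon^{2}}
\quad\Longrightarrow\quad
P\left(|\bar{X}_n - \mu| \ge \varepsilon\right) \le \frac{\sigma^{2}}{n\,\varepsilon^{2}}
\xrightarrow[n \to \infty]{} 0
\qquad \text{for every } \varepsilon > 0 .
```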
Text Books And Reference Books: [1] Introduction to the theory of statistics. A.M. Mood, F.A. Graybill and D.C. Boes, Tata McGraw-Hill, 3rd Edition (Reprint), 2017. [2] Introduction to probability models. Ross, Sheldon M. 12th Edition, Academic Press, 2019. [3] Fundamentals of Applied Mathematics, S.C. Gupta and V.K. Kapoor (New Edition).
 
Essential Reading / Recommended Reading [1] A first course in probability. Ross, Sheldon, 10th Edition. Pearson, 2019. [2] An Introduction to Probability and Statistics. V.K. Rohatgi and Saleh, 3rd Edition, 2015.
Evaluation Pattern CIA  50% ESE  50%
MDS133  MATHEMATICAL FOUNDATIONS FOR DATA SCIENCE - I (2023 Batch)
Total Teaching Hours for Semester:45 
No of Lecture Hours/Week:4 
Max Marks:100 
Credits:3 
Course Objectives/Course Description 

Linear Algebra plays a fundamental role in the theory of Data Science. This course aims at introducing the basic notions of vector spaces, their spans and orthogonalization, and linear transformations, together with the use of their matrix bijections in applications to Data Science.

Course Outcome 

CO1: Understand the properties of Vector spaces
CO2: Use the properties of Linear Maps in solving problems on Linear Algebra
CO3: Demonstrate proficiency on the topics Eigenvalues, Eigenvectors and Inner Product Spaces
CO4: Apply mathematics for some applications in Data Science
Unit1 
Teaching Hours:9 
INTRODUCTION TO VECTOR SPACES


Vector Spaces: Definition and properties, Subspaces, Sums of Subspaces, Null space, Column space, Direct Sums, Span and Linear Independence, Bases, dimension, rank.
Unit2 
Teaching Hours:9 
LINEAR TRANSFORMATIONS


Algebra of Linear Transformations, Null spaces and Injectivity, Range and Surjectivity, Fundamental Theorems of Linear Maps – Cayley-Hamilton theorem – Orthonormal basis.
Unit3 
Teaching Hours:9 
EIGENVALUES AND EIGENVECTORS


Invariant Subspaces, Polynomials applied to Operators – Upper-Triangular matrices, Diagonal matrices, Invariant Subspaces on Real Vector Spaces – Eigenvalues and Eigenvectors – Characteristic equation – Diagonalization
Unit4 
Teaching Hours:9 
INNER PRODUCT SPACES


Inner Products and Norms – Orthogonality – Orthogonal Bases – Orthogonal Projections – Gram-Schmidt process – Least squares problems – Applications to Linear models
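A minimal sketch of the Gram-Schmidt process named in this unit, written in plain Python for vectors stored as lists of floats. The function name `gram_schmidt`, the tolerance, and the sample vectors are illustrative assumptions. The sketch uses the "modified" variant, which projects the running remainder rather than the original vector.

```python
import math


def gram_schmidt(vectors, tol=1e-12):
    """Return an orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        # Remove the component of w along each basis vector found so far.
        for q in basis:
            coeff = sum(wi * qi for wi, qi in zip(w, q))
            w = [wi - coeff * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > tol:  # skip vectors that are (numerically) dependent
            basis.append([wi / norm for wi in w])
    return basis


q1, q2 = gram_schmidt([[3.0, 4.0], [1.0, 0.0]])
print(q1, q2)
```

Each returned vector has unit length and is orthogonal to the others; linearly dependent inputs are dropped, so the result can be shorter than the input list.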
Unit5 
Teaching Hours:9 
BASIC MATRIX METHODS FOR APPLICATIONS


Matrix Norms – Singular value decomposition – Householder Transformation and QR decomposition – Non-Negative Matrix Factorization – Bidiagonalization
Text Books And Reference Books: 1. David C. Lay, Steven R. Lay, Judi J. McDonald (2016) Linear algebra and its applications. Pearson. 2. S. Axler, Linear algebra done right, Springer, 2017. 3. Strang, G. (2006) Linear Algebra and its Applications: Thomson Brooks/Cole, Belmont, CA, USA.
Essential Reading / Recommended Reading 1. E. Davis, Linear algebra and probability for computer science applications, CRC Press, 2012. 2. J. V. Kepner and J. R. Gilbert, Graph algorithms in the language of linear algebra, Society for Industrial and Applied Mathematics, 2011. 3. D. A. Simovici, Linear algebra tools for data mining, World Scientific Publishing, 2012. 4. P. N. Klein, Coding the matrix: linear algebra through applications to computer science, Newtonian Press, 2015.  
Evaluation Pattern CIA  50% ESE  50%  
MDS151  APPLIED EXCEL (2023 Batch)  
Total Teaching Hours for Semester:30 
No of Lecture Hours/Week:3 
Max Marks:50 
Credits:1 
Course Objectives/Course Description 

This course is designed to build logical thinking ability and to provide hands-on experience in solving statistical models using MS Excel with problem-based learning. It also covers exploring and visualizing data using Excel formulas and data analysis tools.

Course Outcome 

CO1: Demonstrate data management using Excel features.
CO2: Analyze the given problem and solve it using Excel.
CO3: Infer the building blocks of Excel, Excel shortcuts, and sample data creation.
Unit1 
Teaching Hours:10 
Layout and Properties


File types – Spreadsheet structure – Menu bar – Quick access toolbar – Mini toolbar – Excel options – Formatting: Format painter – Font – Alignment – Number – Styles – Cells, Clear – Page layout Properties Symbols – Equation – Editing – Link – Filter – Charts – Formula Auditing – Overview of Excel tables and properties – Collecting sample data and arranging it in a definite format in Excel tables. Lab: 1. Excel Formulas 2. Excel Tables and Properties
Unit2 
Teaching Hours:10 
Files and Databases


Files: Importing data from different sources – Exporting data in different formats. Database: Creating database with the imported data – Data tools: text to column – identifying and removing duplicates – using format cell options. (CO1, CO2) Lab: 5. Import data 6. Export data 7. Creating database 8. Data tools
Unit3 
Teaching Hours:10 
Functions


Application of functions – Concatenate – Upper – Lower – Trim – Repeat – Proper – Clean – Substitute – Convert – Left – Right – Mid – Len – Find – Exact – Replace – Text join – Value – Fixed, etc. (CO2, CO3) Lab: 9. Excel functions.
Text Books And Reference Books: [1] Alexander M, Kusleika R and Walkenbach J, Microsoft Excel 2019 Bible, Wiley India Pvt Ltd, New Delhi, 2018.
 
Essential Reading / Recommended Reading
[1] Paul M, Microsoft Excel 2019 formulas and functions, Pearson Education, 2019
Evaluation Pattern CIA  50% ESE  50%
MDS161A  PRINCIPLES OF PROGRAMMING (2023 Batch)  
Total Teaching Hours for Semester:30 
No of Lecture Hours/Week:3 
Max Marks:50 
Credits:2 
Course Objectives/Course Description 

The students shall be able to understand the main principles of programming. The course also aims to instil the practice of implementing these programming principles.

Course Outcome 

CO1: Understand the fundamentals of programming languages.
CO2: Understand the design paradigms of programming languages.
CO3: Examine expressions, subprograms and their parameters.
Unit1 
Teaching Hours:10 
Introduction to Syntax and Grammar


Introduction, Programming Languages, Syntax, Grammar, Ambiguity, Syntax and Semantics, Data Types (Primitive/Ordinal/Composite data types, Enumeration and subrange types, Arrays and slices, Records, Unions, Pointers and pointer problems).  
Unit2 
Teaching Hours:10 
Constructing Expressions


Expressions, Type conversion, Implicit/Explicit conversion, Type systems, Expression evaluation, Control Structures, Binding and Types of Binding, Lifetime, Referencing Environment (Visibility, Local/Nonlocal/Global variables), Scope (Scope rules, Referencing operations, Static/Dynamic scoping).
Unit3 
Teaching Hours:10 
Subprograms and Parameters


Subprograms, Signature, Types of Parameters, Formal/Actual parameters, Subprogram overloading, Parameter Passing Mechanisms (Aliasing, Eager/Normal-order/Lazy evaluation), Subprogram Implementation (Activation record, Static/Dynamic chain, Static-chain method, Deep/Shallow access, Subprograms as parameters, Labels as parameters, Generic subprograms, Separate/Independent compilation).
Text Books And Reference Books: 1. Allen B. Tucker, Robert Noonan, Programming Languages: Principles and Paradigms, Tata McGraw Hill Education, 2006. 2. Bruce J. MacLennan, “Principles of Programming Languages: Design, Evaluation, and Implementation”, Third Edition, Oxford University Press (New York), 1999.  
Essential Reading / Recommended Reading 1. T. W. Pratt, M. V. Zelkowitz, Programming Languages, Design and Implementation, Prentice Hall, Fourth Edition, 2001. 2. Robert Harper, Practical Foundations for Programming Languages, Second Edition, Cambridge University Press, 2016.  
Evaluation Pattern CIA  50% ESE  50%  
MDS161B  INTRODUCTION TO PROBABILITY AND STATISTICS (2023 Batch)  
Total Teaching Hours for Semester:30 
No of Lecture Hours/Week:3 
Max Marks:50 
Credits:2 
Course Objectives/Course Description 

This course is designed to introduce the historical development of statistics, presentation of data, descriptive measures and cultivate statistical thinking among students. This course also introduces the concept of probability. 

Course Outcome 

CO1: Demonstrate, present and visualize data in various forms, statistically.
CO2: Understand and apply descriptive statistics.
CO3: Evaluate probabilities for various kinds of random events.
Unit1 
Teaching Hours:8 
ORGANIZATION AND PRESENTATION OF DATA


Origin and development of Statistics – Scope – limitation and misuse of statistics – types of data: primary, secondary, quantitative and qualitative data – Types of Measurements: nominal, ordinal, ratio and scale – discrete and continuous data – Presentation of data by tables – graphical representation of a frequency distribution by histogram and frequency polygon – cumulative frequency distributions (inclusive and exclusive methods).
Unit2 
Teaching Hours:6 
DESCRIPTIVE STATISTICS I


Measures of location or central tendency: Arithmetic mean – Median – Mode – Geometric mean – Harmonic mean.
Unit3 
Teaching Hours:6 
DESCRIPTIVE STATISTICS II


Partition values: Quartiles – Deciles and Percentiles – Measures of dispersion: Mean deviation – Quartile deviation – Standard deviation – Coefficient of variation – Moments: measures of skewness – kurtosis
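Several of the measures in this unit can be computed with Python's standard `statistics` module. The sketch below is only an illustration on a hypothetical sample. It uses the population standard deviation and the module's default ("exclusive") quartile method, which are choices made here, and other textbooks' quartile conventions may give slightly different values.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

mean = statistics.mean(data)
sd = statistics.pstdev(data)                   # population standard deviation
cv = sd / mean                                 # coefficient of variation
q1, q2, q3 = statistics.quantiles(data, n=4)   # quartiles; q2 is the median
quartile_deviation = (q3 - q1) / 2             # semi-interquartile range
mean_deviation = sum(abs(x - mean) for x in data) / len(data)

print(mean, sd, cv, (q1, q2, q3), quartile_deviation, mean_deviation)
```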
Unit4 
Teaching Hours:10 
BASICS OF PROBABILITY


Random experiment – sample point and sample space – event – algebra of events – Definition of Probability: classical, empirical and axiomatic approaches to probability – properties of probability – Theorems on probability – conditional probability and independent events – Laws of total probability – Bayes' theorem and its applications.
Text Books And Reference Books: 1. David C. Lay, Steven R. Lay, Judi J. McDonald (2016) Linear algebra and its applications. Pearson. 2. S. Axler, Linear algebra done right, Springer, 2017. 3. Strang, G. (2006) Linear Algebra and its Applications: Thomson Brooks/Cole, Belmont, CA, USA.
Essential Reading / Recommended Reading 1. E. Davis, Linear algebra and probability for computer science applications, CRC Press, 2012. 2. J. V. Kepner and J. R. Gilbert, Graph algorithms in the language of linear algebra, Society for Industrial and Applied Mathematics, 2011. 3. D. A. Simovici, Linear algebra tools for data mining, World Scientific Publishing, 2012. 4. P. N. Klein, Coding the matrix: linear algebra through applications to computer science, Newtonian Press, 2015.  
Evaluation Pattern CIA  50% ESE  50%  
MDS161C  LINUX ESSENTIALS (2023 Batch)  
Total Teaching Hours for Semester:30 
No of Lecture Hours/Week:3 
Max Marks:50 
Credits:2 
Course Objectives/Course Description 

This course is designed to introduce Linux working environment to students. This course will enable students to understand the Linux system architecture, File and directory commands and foundations of shell scripting. 

Course Outcome 

CO1: Demonstrate the basic file and directory commands
CO2: Understand the Unix system environment
CO3: Apply shell programming concepts to solve a given problem
Unit1 
Teaching Hours:10 
Introduction


Introduction, Salient features, Unix system architecture, Unix Commands, Directory Related Commands, File Related Commands, Disk Related Commands, General utilities, Unix File System, Boot, inode, super and data block, in-core structure, Directories, conversion of path name to inode, inode to new file, Disk block allocation
Unit2 
Teaching Hours:10 
Process Management


Process state and data structures of a process, Context of a process, Background processes, User versus Kernel mode, Process scheduling commands, Process terminating and examining commands, Secondary Storage Management: Formatting, making file system, checking disk space, mountable file system, disk partitioning
Unit3 
Teaching Hours:10 
Shell Programming


Vi Editor, Shell types, Shell command line processing, Shell script & its features, System and user defined variables, Executing shell scripts, expr command, Shell Screen Interface, read and echo statements, Shell script arguments, Conditional Control Structures – if statement, case statement, Looping Control Structures – while, for, Jumping Control Structures – break, continue, exit.
 
Text Books And Reference Books: [1] Linux: The Complete Reference, sixth edition, Richard Petersen, 2017  
Essential Reading / Recommended Reading [1] Linux Pocket Guide, Daniel J. Barrett, 3rd edition, O’Reilly
Evaluation Pattern
CIA 50% ESE 50%  
MDS171  PROGRAMMING USING PYTHON (2023 Batch)  
Total Teaching Hours for Semester:90 
No of Lecture Hours/Week:8 
Max Marks:150 
Credits:5 
Course Objectives/Course Description 

The objective of this course is to provide comprehensive knowledge of the Python programming paradigms required for Data Science.

Course Outcome 

CO1: Demonstrate the use of built-in objects of Python.
CO2: Demonstrate significant experience with the Python program development environment.
CO3: Implement numerical programming, data handling and visualization through the NumPy, Pandas and Matplotlib modules.
Unit1 
Teaching Hours:18 
INTRODUCTION TO PYTHON


Python and Computer Programming – Using Python as a calculator – Python memory management – Structure of Python Program – Branching and Looping – Problem Solving Using Branches and Loops – Lists and Mutability – Functions – Problem Solving Using Lists and Functions. Lab Exercises: 1. Demonstrate usage of branching and looping statements 2. Demonstrate recursive functions 3. Demonstrate lists
Unit2 
Teaching Hours:18 
SEQUENCE DATATYPES AND OBJECT ORIENTED PROGRAMMING


Sequences, Mapping and Sets – Dictionaries – Classes: Classes and Instances – Inheritance – Exception Handling – Module: Built-in modules & user-defined modules – Introduction to Regular Expressions using the “re” module. Lab Exercises: 1. Demonstrate Tuples, Sets and Dictionaries 2. Demonstrate inheritance and exception handling 3. Demonstrate use of “re”
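A minimal sketch combining three topics of this unit: inheritance (a custom exception derived from `ValueError`), exception handling, and the `re` module. The pattern assumes course codes of the form seen in this syllabus (three letters, three digits, optional section letter). The names `InvalidCourseCode` and `parse_course_code` are hypothetical.

```python
import re


class InvalidCourseCode(ValueError):
    """Raised when a string does not match the assumed course-code form."""


# Assumed form: three capital letters, three digits, optional section letter.
CODE_RE = re.compile(r"^([A-Z]{3})(\d{3})([A-Z]?)$")


def parse_course_code(text):
    """Split a code such as 'MDS161A' into (prefix, number, section)."""
    match = CODE_RE.match(text.strip())
    if match is None:
        raise InvalidCourseCode(f"not a course code: {text!r}")
    prefix, number, section = match.groups()
    return prefix, int(number), section or None


print(parse_course_code("MDS161A"))
try:
    parse_course_code("Python")
except InvalidCourseCode as err:
    print("rejected:", err)
```

Because `InvalidCourseCode` inherits from `ValueError`, callers that already catch `ValueError` handle it without changes.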
Unit3 
Teaching Hours:18 
USING NUMPY


Basics of NumPy – Computation on NumPy – Aggregations – Computation on Arrays – Comparisons, Masks and Boolean Arrays – Fancy Indexing – Sorting Arrays – Structured Data: NumPy’s Structured Array. Lab Exercises: 1. Demonstrate Aggregation 2. Demonstrate Indexing and Sorting 3. Demonstrate handling of missing data 4. Demonstrate hierarchical indexing
Unit4 
Teaching Hours:18 
DATA MANIPULATION WITH PANDAS


Introduction to Pandas Objects – Data indexing and Selection – Operating on Data in Pandas – Handling Missing Data – Hierarchical Indexing – Aggregation and Grouping – Pivot Tables – Vectorized String Operations – High-Performance Pandas: eval() and query(). Lab Exercises: 1. Demonstrate usage of Pivot table 2. Demonstrate use of eval() and query()
Unit5 
Teaching Hours:18 
VISUALIZATION WITH MATPLOTLIB


Basics of Matplotlib – Simple Line Plot and Scatter Plot – Density and Contour Plots – Histograms, Binnings and Density – Customizing Plot Legends – Multiple subplots – Three-Dimensional Plotting in Matplotlib. Lab Exercises: 1. Demonstrate Line plot and Scatter plot 2. Demonstrate 3D plotting
Text Books And Reference Books: [1] Jake VanderPlas, Python Data Science Handbook: Essential Tools for Working with Data, O’Reilly Media, Inc., 2016 [2] Zhang. Y, An Introduction to Python and Computer Programming, Springer Publications, 2016
Essential Reading / Recommended Reading [1] Joel Grus, Data Science from Scratch: First Principles with Python, O’Reilly Media, 2016 [2] T.R. Padmanabhan, Programming with Python, Springer Publications, 2016. [3] M. Rajagopalan and P. Dhanavanthan, Statistical Inference, 1st ed., PHI Learning (P) Ltd., New Delhi, 2012. [4] V. K. Rohatgi and E. Saleh, An Introduction to Probability and Statistics, 3rd ed., John Wiley & Sons Inc, New Jersey, 2015.
Evaluation Pattern CIA 50% ESE 50%  
MDS231  DESIGN AND ANALYSIS OF ALGORITHMS (2023 Batch)  
Total Teaching Hours for Semester:45 
No of Lecture Hours/Week:4 
Max Marks:100 
Credits:3 
Course Objectives/Course Description 

The course introduces techniques for designing and analyzing algorithms and data structures. It concentrates on techniques for evaluating the performance of algorithms. The objective is to understand different designing approaches like greedy, divide and conquer, dynamic programming etc. for solving different kinds of problems. 

Course Outcome 

CO1: Understand basic techniques for designing algorithms, including the techniques of recursion, divide-and-conquer, greedy algorithm etc.
CO2: Understand the mathematical criterion for deciding whether an algorithm is efficient and know many practically important problems that do not admit any efficient algorithms.
CO3: Apply classical sorting, searching, optimization and graph algorithms.
CO4: Design new algorithms and analyze their asymptotic and absolute runtime and memory demands.
Unit1 
Teaching Hours:9 
Introduction


Algorithms, Analyzing algorithms, Complexity of algorithms, Growth of functions, Performance measurements, Sorting and order statistics – Shell sort, Heap sort, Sorting in linear time.
Unit2 
Teaching Hours:9 
Advanced Data Structures


Red-Black trees, B-trees, Binomial Heaps, Fibonacci Heaps, Tries, Skip lists.
Unit3 
Teaching Hours:9 
Divide and Conquer


Quick sort, Merge sort, Finding maximum and minimum, Matrix Multiplication, Searching.
Greedy methods with examples such as Optimal Reliability Allocation, Knapsack, Minimum Spanning trees – Prim’s and Kruskal’s algorithms, Single source shortest paths – Dijkstra’s and Bellman-Ford algorithms, Optimal merge patterns.
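A minimal sketch of Dijkstra's single-source shortest-path algorithm listed above, assuming non-negative edge weights and a graph stored as a dictionary of adjacency dictionaries. The representation and the sample graph are illustrative choices, not taken from the prescribed texts.

```python
import heapq


def dijkstra(graph, source):
    """Shortest distances from `source`; graph maps node -> {neighbour: weight >= 0}."""
    dist = {source: 0}
    heap = [(0, source)]  # priority queue of (distance so far, node)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale entry: a shorter path to u was already found
        for v, w in graph.get(u, {}).items():
            candidate = d + w
            if candidate < dist.get(v, float("inf")):
                dist[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return dist


graph = {
    "A": {"B": 4, "C": 1},
    "B": {"D": 1},
    "C": {"B": 2, "D": 5},
    "D": {},
}
print(dijkstra(graph, "A"))
```

The greedy step is always settling the closest unsettled node next. This is only valid without negative weights, which is the case Bellman-Ford handles.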
Unit4 
Teaching Hours:9 
Dynamic Programming


Dynamic programming with examples such as Knapsack, All pair shortest paths – Warshall’s and Floyd’s algorithms, Resource allocation problem. Backtracking, Branch and Bound with examples such as Travelling Salesman Problem, Graph Coloring, n-Queen Problem, Hamiltonian Cycles and Sum of subsets.
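A minimal sketch of the dynamic-programming solution to the 0/1 knapsack problem mentioned in this unit, assuming non-negative integer weights. The function name `knapsack` and the sample items are illustrative.

```python
def knapsack(values, weights, capacity):
    """Maximum total value of items that fit in `capacity` (each item used at most once)."""
    # best[c] holds the best value achievable with total weight at most c.
    best = [0] * (capacity + 1)
    for value, weight in zip(values, weights):
        # Go through capacities downwards so the current item is not reused.
        for c in range(capacity, weight - 1, -1):
            best[c] = max(best[c], best[c - weight] + value)
    return best[capacity]


print(knapsack(values=[60, 100, 120], weights=[10, 20, 30], capacity=50))
```

The table has `capacity + 1` entries and each item updates it once, so the running time is proportional to (number of items) x (capacity).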
Unit5 
Teaching Hours:9 
Unit V


Algebraic Computation, Fast Fourier Transform, String Matching, Theory of NP-completeness, Approximation algorithms and Randomized algorithms.
Text Books And Reference Books: [1] Cormen, Rivest, Leiserson, “Introduction to Algorithms”, PHI, 2001
[2] Horowitz & Sahni, “Fundamentals of Computer Algorithms”, Galgotia Publications, 2nd Edition.
Essential Reading / Recommended Reading [1] Aho, Hopcroft, Ullman, “The Design and Analysis of Computer Algorithms”, Pearson Education, 2008.
 
Evaluation Pattern
CIA 50% ESE 50%  
MDS232  MATHEMATICAL FOUNDATIONS FOR DATA SCIENCE - II (2023 Batch)
Total Teaching Hours for Semester:45 
No of Lecture Hours/Week:3 
Max Marks:100 
Credits:3 
Course Objectives/Course Description 

This course aims at introducing data science related essential mathematics concepts such as fundamentals of topics on Calculus of several variables, Orthogonality, Convex optimization, and Graph Theory. 

Course Outcome 

CO1: Demonstrate the properties of multivariate calculus
CO2: Use the idea of orthogonality and projections effectively
CO3: Have a clear understanding of Convex Optimization
CO4: Know the basic terminologies and properties in Graph Theory
Unit1 
Teaching Hours:9 
Calculus of Several Variables


Functions of Several Variables: Functions of two, three variables – Limits and continuity in Higher Dimensions: Limits for functions of two variables, Functions of more than two variables – Partial Derivatives: partial derivatives of functions of two variables, partial derivatives of functions of more than two variables – The Chain Rule: chain rule on functions of two, three variables, chain rule on functions defined on surfaces
Unit2 
Teaching Hours:9 
Orthogonality


Perpendicular vectors and Orthogonality – Inner Products and Projections onto lines – Projections of Rank one – Projections and Least Squares Approximations – Projection Matrices – Orthogonal Bases, Orthogonal Matrices.
Unit3 
Teaching Hours:9 
Introduction to Convex Optimization


Affine and Convex Sets: Lines and Line segments, affine sets, affine dimension and relative interior, convex sets, cones – Hyperplanes and halfspaces – Euclidean balls and ellipsoids – Norm balls and Norm cones – polyhedra.
Unit4 
Teaching Hours:9 
Graph Theory – Basics


Graph Classes: Definition of a Graph and Graph terminology, isomorphism of graphs, Complete graphs, bipartite graphs, complete bipartite graphs – Vertex degree: adjacency and incidence, regular graphs – subgraphs, spanning subgraphs, induced subgraphs, removing or adding edges of a graph, removing vertices from graphs.
Unit5 
Teaching Hours:9 
Graph Theory – More concepts


Matrix Representation of Graphs, Adjacency matrices, Incidence Matrices, Trees and their properties, Bridges (cut-edges), spanning trees, weighted Graphs, minimal spanning tree problems, Shortest path problems – Applications of Graph Theory
Text Books And Reference Books: [1] M. D. Weir, J. Hass, and G. B. Thomas, Thomas' calculus. Pearson, 2016. (Unit 1) [2] G. Strang, Linear Algebra and its Applications, 4th ed., Cengage, 2006. (Unit 2) [3] S. P. Boyd and L. Vandenberghe, Convex optimization. Cambridge Univ. Pr., 2011. (Unit 3) [4] J. Clark, D. A. Holton, A first look at Graph Theory, Allied Publishers India, 1995. (Unit 4)
Essential Reading / Recommended Reading [1] J. Patterson and A. Gibson, Deep Learning: A Practitioner's Approach. O'Reilly Media, 2017 [2] S. Sra, S. Nowozin, and S. J. Wright, Optimization for Machine Learning. MIT Press, 2012 [3] D. Jungnickel, Graphs, Networks and Algorithms. Springer, 2014 [4] D. Simovici, Mathematical Analysis for Machine Learning and Data Mining, World Scientific Publishing Co. Pte. Ltd, 2018 [5] P. N. Klein, Coding the Matrix: Linear Algebra through Applications to Computer Science. Newtonian Press, 2015 [6] K. H. Rosen, Discrete Mathematics and its Applications, 7th ed., McGraw Hill, 2016
Evaluation Pattern CIA 50%, ESE 50%
MDS271  DATABASE TECHNOLOGIES (2023 Batch)  
Total Teaching Hours for Semester:75 
No of Lecture Hours/Week:4 
Max Marks:100 
Credits:4 
Course Objectives/Course Description 

The main objective of this course is to provide fundamental knowledge of, and practical experience with, database concepts. It covers the concepts and terminology needed to construct relational databases, write effective queries, and understand data warehouses and NoSQL databases and their types.

Course Outcome 

CO1: Demonstrate various databases and compose effective queries
CO2: Understand the process of OLAP system construction
CO3: Develop applications using Relational and NoSQL databases
Unit1 
Teaching Hours:15 
Introduction


Concept & Overview of DBMS, Data Models, Database Languages, Database Administrator, Database Users, Three Schema architecture of DBMS. Basic concepts, Design Issues, Mapping Constraints, Keys, Entity-Relationship Diagram, Weak Entity Sets, Extended ER features.
Lab Exercises
1. Data Definition
2. Table Creation
 
Unit2 
Teaching Hours:12 
Relational model and database design


SQL and Integrity Constraints, Concept of DDL, DML, DCL. Basic Structure, Set operations, Aggregate Functions, Null Values, Domain Constraints, Referential Integrity Constraints, Assertions, Views, Nested Subqueries, Functional Dependency, Different anomalies in designing a Database, Normalization: using functional dependencies, Boyce-Codd Normal Form.
Lab Exercises
1. Insert, Select, Update & Delete Commands
2. Nested Queries & Join Queries
3. Views
Unit3 
Teaching Hours:13 
Data warehouse: the building blocks


Defining Features, Database and Data Warehouses, Architectural Types, Overview of the Components, Metadata in the Data Warehouse, The Star Schema, Star Schema Keys, Advantages of the Star Schema, Star Schema: Examples, Snowflake Schema, Aggregate Fact Tables.
Lab Exercises
1. Importing source data structures
2. Design target data structures
3. Create target multidimensional cube
 
Unit4 
Teaching Hours:12 
Data Integration and Data Flow (ETL)


Requirements, ETL Data Structures, Extracting, Cleaning and Conforming, Delivering Dimension Tables, Delivering Fact Tables, Real-Time ETL Systems
Lab Exercises
1. Perform the ETL process and transform into data map
2. Create the cube and process it
3. Generating Reports
4. Creating the Pivot table and pivot chart using some existing data
 
Unit5 
Teaching Hours:12 
NOSQL Databases


Introduction to NOSQL Systems, The CAP Theorem, Document-Based NOSQL Systems and MongoDB, NOSQL Key-Value Stores, Column-Based or Wide Column NOSQL Systems, Graph databases, Multimedia databases.
Lab Exercises
1. MongoDB Exercise - 1
2. MongoDB Exercise - 2
Text Books And Reference Books: [1] Henry F. Korth and Abraham Silberschatz, “Database System Concepts”, McGraw Hill. [2] Thomas Connolly and Carolyn Begg, “Database Systems: A Practical Approach to Design, Implementation and Management”, Third Edition, Pearson Education, 2007. [3] Ralph Kimball and Margy Ross, “The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling”, 2nd Edition, John Wiley & Sons, Inc., New York, USA, 2002.
Essential Reading / Recommended Reading [1] Lior Rokach and Oded Maimon, Data Mining and Knowledge Discovery Handbook, Springer, 2nd edition, 2010.
Evaluation Pattern CIA 50%, ESE 50%
MDS272  INFERENTIAL STATISTICS USING R (2023 Batch)  
Total Teaching Hours for Semester:75 
No of Lecture Hours/Week:7 
Max Marks:100 
Credits:4 
Course Objectives/Course Description 

Statistical inference plays an important role when analyzing data and making decisions based on real-world phenomena. This course aims to teach students to test hypotheses and estimate parameters for real-life data sets.


Course Outcome 

CO1: Demonstrate the concepts of population and samples
CO2: Apply the idea of sampling distribution of different statistics in testing of hypothesis
CO3: Estimate the unknown population parameters using the concepts of point and interval estimation using R.
CO4: Test the hypothesis using nonparametric tests for real-world problems using R.
Unit1 
Teaching Hours:15 
INTRODUCTION


Population and Statistics – Finite and Infinite population – Parameter and Statistics – Types of sampling – Sampling Distribution – Sampling Error – Standard Error – Test of significance – concept of hypothesis – types of hypothesis – Errors in hypothesis testing – Critical region – level of significance – Power of the test – p-value.
Lab Exercises:
1. Calculation of sampling error and standard error
2. Calculation of probability of critical region using standard distributions
3. Calculation of power of the test using standard distributions.
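The standard error and power calculations named in these lab exercises can be sketched as follows. The course itself uses R; this sketch uses Python's standard library only as an illustration. The sample values, the helper name `z_test_power`, and the choice of a right-tailed z-test with known σ are assumptions made for the example.

```python
from math import sqrt
from statistics import NormalDist, stdev

# Illustrative sample (assumed data)
sample = [12.1, 11.8, 12.4, 12.0, 11.7, 12.3, 12.2, 11.9]
n = len(sample)

# Standard error of the sample mean: s / sqrt(n)
standard_error = stdev(sample) / sqrt(n)


def z_test_power(mu0, mu1, sigma, n, alpha=0.05):
    """Power of a right-tailed z-test of H0: mu = mu0 against H1: mu = mu1 > mu0.

    Assumes a normal population with known standard deviation sigma.
    """
    z = NormalDist()                           # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha)             # critical value of the test
    shift = (mu1 - mu0) / (sigma / sqrt(n))    # standardized true difference
    return 1 - z.cdf(z_alpha - shift)          # P(reject H0 | mu = mu1)


print(standard_error)
print(z_test_power(mu0=12.0, mu1=12.5, sigma=1.0, n=25))
```

When mu1 equals mu0 the power reduces to the level of significance, and for a fixed alternative it increases with the sample size.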
 
Unit2 
Teaching Hours:15 
Testing of Hypothesis I


Concept of large and small samples – Tests concerning a single population mean for known σ (and unknown σ) – equality of two means for known σ (and unknown σ) – Test for single variance – Test for equality of two variances for normal population – Tests for single proportion – Tests of equality of two proportions for the normal population.
Lab Exercises:
1. Test of the single sample mean for known and unknown σ.
2. Test of equality of two means for known and unknown σ.
3. Tests of single variance and equality of variances for large samples.
4. Tests for single proportion and equality of two proportions for large samples.
 
Unit3 
Teaching Hours:15 
Testing of Hypothesis II


Student's t-distribution and its properties (without proofs) – Single sample mean test – Independent sample mean test – Paired sample mean test – Tests of proportion (based on t-distribution) – F-distribution and its properties (without proofs) – Tests of equality of two variances using F-test – Chi-square distribution and its properties (without proofs) – Chi-square test for independence of attributes – Chi-square test for goodness of fit.
Lab Exercises:
1. Single sample mean test
2. Independent and Paired sample mean test
3. Tests of proportion of one and two samples based on t-distribution
 
Unit4 
Teaching Hours:15 
Analysis of Variance


Meaning and assumptions – Fixed, random and mixed effect models – Analysis of variance of one-way and two-way classified data with and without interaction effects – Multiple comparison tests: Tukey’s method – critical difference.
Lab Exercises:
1. Test of equality of two variances
2. Chi-square test for independence of attributes and goodness of fit.
3. Construction of one-way ANOVA
4. Construction of two-way ANOVA with interaction
5. Construction of two-way ANOVA without interaction
 
Unit5 
Teaching Hours:15 
Nonparametric Tests


Concept of Nonparametric tests – Run test for randomness – Sign test and Wilcoxon Signed Rank Test for one and paired samples – Run test, Median test and Mann-Whitney-Wilcoxon tests for two samples.
Lab Exercises:
1. Multiple comparison test using Tukey’s method and critical difference methods
2. Test of one sample using Run and Sign tests
3. Test of paired sample using Wilcoxon signed rank test
4. Test of two samples using Run test and Median test
Text Books And Reference Books: 1. Gupta S.C and Kapoor V.K, Fundamentals of Mathematical Statistics, 12th edition, Sultan Chand & Sons, New Delhi, 2020. 2. Brian Caffo, Statistical Inference for Data Science, Leanpub, 2016.
Essential Reading / Recommended Reading 1. Walpole R.E, Myers R.H and Myers S.L, Probability and Statistics for Engineers and Scientists, 9th edition, Pearson, New Delhi, 2017. 2. Montgomery, D. C., & Runger, G. C. (2010). Applied Statistics and Probability for Engineers. John Wiley & Sons. 3. Rajagopalan M and Dhanavanthan P, Statistical Inference, PHI Learning (P) Ltd, New Delhi, 2012. 4. Rohatgi V.K and Saleh E, An Introduction to Probability and Statistics, 3rd edition, John Wiley & Sons Inc, New Jersey, 2015.
Evaluation Pattern CIA  50% ESE  50%  
MDS273  FULL STACK WEB DEVELOPMENT (2023 Batch)  
Total Teaching Hours for Semester:75 
No of Lecture Hours/Week:7 
Max Marks:100 
Credits:4 
Course Objectives/Course Description 

On completion of this course, a student will be familiar with the full stack and able to develop web applications using advanced technologies, and will have cultivated good web programming style and discipline by solving real-world scenarios.

Course Outcome 

CO1: Apply JavaScript, HTML5, and CSS3 effectively to create interactive and dynamic websites.
CO2: Describe the main technologies and methods currently used in creating advanced web applications.
CO3: Design websites using appropriate security principles, focusing specifically on the vulnerabilities inherent in common web implementations.
CO4: Create modern web applications using MEAN.
Unit1 
Teaching Hours:15 
OVERVIEW OF WEB TECHNOLOGIES AND HTML5


Internet and Web Technologies - Client/Server model - Web Search Engine - Web Crawling - Web Indexing - Search Engine Optimization and Limitations - Web Services - Collective Intelligence - Mobile Web - Features of Web 3.0 - HTML vs HTML5 - Exploring Editors and Browsers Supported by HTML5 - New Elements - HTML5 Semantics - Canvas - HTML Media
Lab Exercises
1. Develop static pages for a given scenario using HTML
2. Creating Web Animation with audio using HTML5 & CSS3
3. Demonstrate Geolocation and Canvas using HTML5
 
Unit2 
Teaching Hours:15 
XML AND AJAX


XML - Documents and Vocabularies - Versions and Declaration - Namespaces - JavaScript and XML: Ajax - DOM based XML processing - Event - Transforming XML Documents - Selecting XML Data: XPath - Template based Transformations: XSLT - Displaying XML Documents in Browsers - Evolution of AJAX - Web applications with AJAX - AJAX Framework
Lab Exercises
1. Write an XML file and validate the file using XSD
2. Demonstrate XSL with XSD
3. Demonstrate DOM parser
 
Unit3 
Teaching Hours:15 
CLIENT SIDE SCRIPTING


JavaScript Implementation - Use JavaScript to interact with some of the new HTML5 APIs - Create and modify JavaScript objects - JS Forms - Events and Event handling - JS Navigator - JS Cookies - Introduction to JSON - JSON vs XML - JSON Objects - Importance of AngularJS in web - Angular Expressions and Directives - Single Page Application
Lab Exercises
1. Write a JavaScript program to demonstrate Form Validation and Event Handling
2. Create a web application using AngularJS with Forms
 
Unit4 
Teaching Hours:15 
SERVER SIDE SCRIPTING


Introduction to Node.js - REPL Terminal - Package Manager (NPM) - Node.js Modules and filesystem - Node.js Events - Debugging Node.js Applications - File System and streams - Testing Node.js with Jasmine
Lab Exercises
1. Implement a single page web application with CRUD operations using AngularJS
2. Implement web application using AJAX with JSON
3. Demonstrate fetching information from an XML file with AJAX
 
Unit5 
Teaching Hours:15 
NODE JS WITH MYSQL


Introduction to MySQL - Performing basic database operations (DML) (Insert, Delete, Update, Select) - Prepared Statement - Uploading Image or File to MySQL - Retrieve Image or File from MySQL
Lab Exercises
1. Demonstrate Node.js file system module
2. Implement MySQL with Node.js
3. Implement CRUD Operation using MongoDB
 
Text Books And Reference Books: [1] Internet and World Wide Web: How to Program, Paul Deitel, Harvey Deitel & Abbey Deitel, Pearson Education, 5th Edition, 2018. [2] HTML 5 Black Book (Covers CSS3, JavaScript, XML, XHTML, AJAX, PHP, jQuery), DT Editorial Services, Dreamtech Press, 2nd Edition, 2016.
Essential Reading / Recommended Reading [1] Chris Northwood, The Full Stack Developer: Your Essential Guide to the Everyday Skills Expected of a Modern Full Stack Web Developer, Apress Publications, 1st Edition, 2018. [2] Laura Lemay, Rafe Colburn & Jennifer Kyrnin, Mastering HTML, CSS & Javascript Web Publishing, BPB Publications, 1st Edition, 2016. [3] Alex Giamas, Mastering MongoDB 3.x, Packt Publishing Limited, First Edition, 2017.
Web Resources:
[2] http://www.php.net/docs.php
 
Evaluation Pattern CIA  50% ESE 50%  
MDS311  PROGRAMMING FOR DATA SCIENCE IN R (2022 Batch)  
Total Teaching Hours for Semester:30 
No of Lecture Hours/Week:2 
Max Marks:50 
Credits:2 
Course Objectives/Course Description 

This lab is designed to introduce the implementation of practical machine learning algorithms using the R programming language. The lab will make extensive use of datasets from real-life situations.

Course Outcome 

Unit1 
Teaching Hours:6 
R INSTALLATION, SETUP AND LINEAR REGRESSION


Download and install R – R IDE environments – Why R – Getting started with R – Vectors and Data Frames – Loading Data Frames – Data analysis with summary statistics and scatter plots – Summary tables – Working with Script Files
Linear Regression – Introduction – Regression model for one variable regression – Selecting best model – Error measures SSE, SST, RMSE, R2 – Interpreting R2 – Multiple linear regression – Lasso and ridge regression – Correlation – Recitation – A minimum of 3 data sets for practice
Unit2 
Teaching Hours:6 
LOGISTIC REGRESSION


Logistic Regression – The Logit – Confusion matrix – sensitivity, specificity – ROC curve – Threshold selection with ROC curve – Making predictions – Area under the ROC curve (AUC) – Recitation – A minimum of 3 data sets for practice
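The confusion matrix, sensitivity and specificity listed in this unit can be illustrated with a minimal sketch. The lab itself is run in R; Python is used here only for illustration. The labels, the predicted probabilities, and the 0.5 threshold are assumed example values, and `confusion_counts` is a hypothetical helper name.

```python
def confusion_counts(y_true, scores, threshold=0.5):
    """Return (tp, fp, tn, fn) when scores >= threshold are predicted as class 1."""
    tp = fp = tn = fn = 0
    for actual, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


# Assumed example: true classes and predicted probabilities from some fitted model
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.7, 0.8, 0.1]

tp, fp, tn, fn = confusion_counts(y_true, scores, threshold=0.5)
sensitivity = tp / (tp + fn)   # true positive rate
specificity = tn / (tn + fp)   # true negative rate
print(tp, fp, tn, fn, sensitivity, specificity)
```

Repeating the calculation over a range of thresholds and plotting sensitivity against 1 - specificity traces out the ROC curve.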
Unit3 
Teaching Hours:6 
DECISION TREES


Approaches to missing data – Data imputation – Multiple imputation – Classification and Regression Trees (CART) – CART with Cross Validation – Predictions from CART – ROC curve for CART – Random Forests – Building many trees – Parameter selection – K-fold Cross Validation – Recitation – A minimum of 3 data sets for practice
Unit4 
Teaching Hours:6 
TEXT ANALYTICS AND NLP


Using text as data – Text analytics – Natural language processing – Bag of words – Stemming – word clouds – Recitation – min 3 data sets for practice – Time series analysis – Clustering – k-means clustering – Random forest with clustering – Understanding cluster patterns – Impact of clustering – Heatmaps – Recitation – min 3 data sets for practice
Unit5 
Teaching Hours:6 
ENSEMBLE MODELLING


Support Vector Machines – Gradient Boosting – Naive Bayes – Bayesian GLM – GLMNET – Ensemble modeling – Experimenting with all of the above approaches (Units 1-5) with and without data imputation and assessing predictive accuracy – Recitation – min 3 data sets for practice
PROJECT – A concluding project work carried out individually for a common data set
Text Books And Reference Books: [1]. Statistics : An Introduction Using R, Michael J. Crawley, WILEY, Second Edition, 2015.
 
Essential Reading / Recommended Reading [1]. Hands-On Programming with R, Garrett Grolemund, O’Reilly, 1st Edition, 2014 [2]. R for Everyone, Jared Lander, Pearson, 1st Edition, 2014
Evaluation Pattern CIA 50% ESE 50%  
MDS331  NEURAL NETWORKS AND DEEP LEARNING (2022 Batch)  
Total Teaching Hours for Semester:60 
No of Lecture Hours/Week:4 
Max Marks:100 
Credits:4 
Course Objectives/Course Description 

The main aim of this course is to provide fundamental knowledge of neural networks and deep learning. On successful completion of the course, students will understand the basics of neural networks, shallow neural networks, deep neural networks, and the forward and backward propagation processes, and will be able to build various research projects.

Course Outcome 

CO1: Understand the major technology trends in neural networks and deep learning.
CO2: Build, train and apply neural networks and fully connected deep neural networks
CO3: Implement efficient (vectorized) neural networks for real-time applications.
Unit1 
Teaching Hours:12 
INTRODUCTION TO ARTIFICIAL NEURAL NETWORKS


Neural Networks - Application Scope of Neural Networks - Fundamental Concept of ANN: The Artificial Neural Network - Biological Neural Network - Comparison between Biological Neuron and Artificial Neuron - Evolution of Neural Network. Basic Models of ANN - Learning Methods - Activation Functions - Important Terminologies of ANN.
Unit2 
Teaching Hours:12 
SUPERVISED LEARNING NETWORK


Shallow neural networks - Perceptron Networks - Theory - Perceptron Learning Rule - Architecture - Flowchart for Training Process - Perceptron Training Algorithm for Single and Multiple Output Classes. Back Propagation Network - Theory - Architecture - Flowchart for Training Process - Training Algorithm - Learning Factors for Back-Propagation Network. Radial Basis Function Network (RBFN): Theory, Architecture, Flowchart and Algorithm.
Unit3 
Teaching Hours:12 
CONVOLUTIONAL NEURAL NETWORK


Introduction - Components of CNN Architecture - Rectified Linear Unit (ReLU) Layer - Exponential Linear Unit (ELU, or SELU) - Unique Properties of CNN - Architectures of CNN - Applications of CNN.
Unit4 
Teaching Hours:12 
RECURRENT NEURAL NETWORK


Introduction - The Architecture of Recurrent Neural Network - The Challenges of Training Recurrent Networks - Echo-State Networks - Long Short-Term Memory (LSTM) - Applications of RNN.
Unit5 
Teaching Hours:12 
AUTO ENCODER AND RESTRICTED BOLTZMANN MACHINE


Introduction - Features of Autoencoder - Types of Autoencoder - Restricted Boltzmann Machine - Boltzmann Machine - RBM Architecture - Example - Types of RBM.
Text Books And Reference Books: 1. S. N. Sivanandam, S. N. Deepa, Principles of Soft Computing, Wiley India, 3rd Edition, 2018. 2. Dr. S Lovelyn Rose, Dr. L Ashok Kumar, Dr. D Karthika Renuka, Deep Learning Using Python, Wiley India, 1st Edition, 2019.
Essential Reading / Recommended Reading 1. Charu C. Aggarwal, Neural Networks and Deep Learning, Springer, September 2018. 2. Francois Chollet, Deep Learning with Python, Manning Publications; 1st edition, 2017 3. John D. Kelleher, Deep Learning (MIT Press Essential Knowledge series), The MIT Press, 2019.  
Evaluation Pattern CIA 50% ESE 50%  
MDS341A  TIME SERIES ANALYSIS AND FORECASTING TECHNIQUES (2022 Batch)  
Total Teaching Hours for Semester:60 
No of Lecture Hours/Week:4 
Max Marks:100 
Credits:4 
Course Objectives/Course Description 

This course covers applied statistical methods pertaining to time series and forecasting techniques. Moving average models like simple, weighted and exponential are dealt with. Stationary time series models and nonstationary time series models like AR, MA, ARMA and ARIMA are introduced to analyse time series data. 

Course Outcome 

Unit1 
Teaching Hours:12 
UNIT 1


INTRODUCTION TO TIME SERIES AND STOCHASTIC PROCESS Introduction to time series and stochastic process, graphical representation, components and classical decomposition of time series data. Autocovariance and autocorrelation functions, Exploratory time series analysis, Test for trend and seasonality, Smoothing techniques such as Exponential and moving average smoothing, Holt-Winters smoothing, Forecasting based on smoothing.
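As an illustration of exponential smoothing and forecasting based on it, the sketch below applies the recursion s_t = α x_t + (1 - α) s_{t-1}, initialised with s_0 = x_0. The initialisation rule, the series values, and α = 0.5 are assumptions chosen for this example.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed levels s_t = alpha * x_t + (1 - alpha) * s_{t-1}, with s_0 = x_0."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


# Assumed example series
demand = [10.0, 12.0, 14.0, 13.0]
levels = simple_exponential_smoothing(demand, alpha=0.5)

# Under simple exponential smoothing the one-step-ahead forecast is the last level
forecast = levels[-1]
print(levels, forecast)
```

A larger α makes the smoothed series react faster to recent observations, while a smaller α gives heavier smoothing.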
Unit2 
Teaching Hours:12 
Unit 2


STATIONARY TIME SERIES MODELS Wold representation of linear stationary processes, Study of linear time series models: Autoregressive, Moving Average and Autoregressive Moving average models and their statistical properties like ACF and PACF function.  
Unit3 
Teaching Hours:12 
Unit 3


ESTIMATION OF ARMA MODELS Estimation of ARMA models: Yule Walker estimation of AR Processes, Maximum likelihood and least squares estimation for ARMA Processes, Residual analysis and diagnostic checking.  
Unit4 
Teaching Hours:12 
Unit 4


NONSTATIONARY TIME SERIES MODELS Concept of nonstationarity, general unit root tests for testing nonstationarity; basic formulation of the ARIMA model and its statistical properties - ACF and PACF.
Unit5 
Teaching Hours:12 
Unit 5


STATE SPACE MODELS Filtering, smoothing and forecasting using state space models, Kalman smoother, Maximum likelihood estimation, Missing data modifications  
Text Books And Reference Books: 1. George E. P. Box, G. M. Jenkins, G. C. Reinsel and G. M. Ljung, Time Series Analysis: Forecasting and Control, 5th Edition, John Wiley & Sons, Inc., New Jersey, 2016. 2. Montgomery D. C., Jennings C. L. and Kulahci M., Introduction to Time Series Analysis and Forecasting, 2nd Edition, John Wiley & Sons, Inc., New Jersey, 2016.
Essential Reading / Recommended Reading 1. Anderson T.W, Statistical Analysis of Time Series, John Wiley & Sons, Inc., New Jersey, 1971. 2. Shumway R.H and Stoffer D.S, Time Series Analysis and its Applications with R Examples, Springer, 2011. 3. P. J. Brockwell and R. A. Davis, Time Series: Theory and Methods, 2nd Edition, Springer-Verlag, 2009. 4. S.C. Gupta and V.K. Kapoor, Fundamentals of Applied Statistics, 4th Edition, Sultan Chand and Sons, 2008.
Evaluation Pattern CIA: 50% ESE: 50%  
MDS341B  BAYESIAN INFERENCE (2022 Batch)  
Total Teaching Hours for Semester:60 
No of Lecture Hours/Week:4 
Max Marks:100 
Credits:4 
Course Objectives/Course Description 

To equip the students with the knowledge of conceptual, computational, and practical methods of Bayesian data analysis. 

Course Outcome 

Unit1 
Teaching Hours:12 
INTRODUCTION


Basics on minimaxity: subjective and frequentist probability, Bayesian inference, Bayesian estimation, prior distributions, posterior distribution, loss function, principle of minimum expected posterior loss, quadratic and other common loss functions, advantages of being a Bayesian, HPD confidence intervals, testing, credible intervals, prediction of a future observation.
Unit2 
Teaching Hours:12 
BAYESIAN ANALYSIS WITH PRIOR INFORMATION


Robustness and sensitivity, classes of priors, conjugate class, neighbourhood class, density ratio class, different methods of objective priors: Jeffreys’ prior, probability matching prior, conjugate priors and mixtures, posterior robustness: measures and techniques
Unit3 
Teaching Hours:12 
MULTIPARAMETER AND MULTIVARIABLE MODELS


Basics of decision theory, multiparameter models, Multivariate models, linear regression, asymptotic approximation to posterior distributions.  
Unit4 
Teaching Hours:12 
MODEL SELECTION AND HYPOTHESIS TESTING


Selection criteria and testing of hypothesis based on objective probabilities and Bayes’ factors, large sample methods: limit of posterior distribution, consistency of posterior distribution, asymptotic normality of posterior distribution.  
Unit5 
Teaching Hours:12 
BAYESIAN COMPUTATIONS


Analytic approximation, E M Algorithm, Monte Carlo sampling, Markov Chain Monte Carlo Methods, Metropolis – Hastings Algorithm, Gibbs sampling, examples, convergence issues  
Text Books And Reference Books: 1. Albert Jim (2009) Bayesian Computation with R, second edition, Springer, New York 2. Bolstad W. M. and Curran, J.M. (2016) Introduction to Bayesian Statistics, 3rd Ed. Wiley, New York 3. Christensen R., Johnson W., Branscum A. and Hanson T.E. (2011) Bayesian Ideas and Data Analysis: An Introduction for Scientists and Statisticians, Chapman and Hall, London 4. A. Gelman, J.B. Carlin, H.S. Stern and D.B. Rubin (2004). Bayesian Data Analysis, 2nd Ed. Chapman & Hall
Essential Reading / Recommended Reading 1. Congdon P. (2006) Bayesian Statistical Modeling, Wiley, New York. 2. Ghosh, J.K., Delampady M. and T. Samanta (2006). An Introduction to Bayesian Analysis: Theory and Methods, Springer, New York. 3. Lee P.M. (2012) Bayesian Statistics: An Introduction, 4th Ed. Hodder Arnold, New York. 4. Rao C.R. and Dey D. (2006) Bayesian Thinking, Modeling and Computation, Handbook of Statistics, Vol. 25.
Evaluation Pattern CIA 50% ESE 50%  
MDS341C  ECONOMETRICS (2022 Batch)  
Total Teaching Hours for Semester:60 
No of Lecture Hours/Week:4 
Max Marks:100 
Credits:4 
Course Objectives/Course Description 

The course is designed to impart the principles of econometric methods and tools, and is expected to improve students' understanding of econometrics in the study of economics and finance. Its learning objective is to give students the basic knowledge and skills of econometric analysis, so that they are able to apply it to the investigation of economic relationships and processes and to understand the econometric methods, approaches, ideas, results and conclusions met in the majority of economic books and articles. It also introduces students to the traditional econometric methods developed mostly for work with cross-sectional data.

Course Outcome 

Unit1 
Teaching Hours:15 
INTRODUCTION


Introduction to Econometrics Meaning and Scope – Methodology of Econometrics – Nature and Sources of Data for Econometric analysis – Types of Econometrics  
Unit2 
Teaching Hours:15 
CORRELATION


Aitken’s Generalised Least Squares (GLS) Estimator, Heteroscedasticity, Autocorrelation, Test of Autocorrelation, Multicollinearity, Tools for Handling Multicollinearity
Unit3 
Teaching Hours:15 
REGRESSION


Linear Regression with Stochastic Regressors, Errors in Variable Models and Instrumental Variable Estimation, Independent Stochastic linear Regression, Auto regression, Linear regression, Lag Models  
Unit4 
Teaching Hours:15 
LINEAR EQUATIONS MODEL


Simultaneous Linear Equations Model: Structure of Linear Equations Model, Identification Problem, Rank and Order Conditions, Single Equation and Simultaneous Equations, Methods of Estimation: Indirect Least Squares, Least Variance Ratio and Two Stage Least Squares
Text Books And Reference Books: 1. Johnston, J. (1997). Econometric Methods, Fourth Edition, McGraw Hill 2. Gujarati, D., and Porter, D. (2008). Basic Econometrics, Fifth Edition, McGraw Hill
Essential Reading / Recommended Reading 1. Intriligator, M. D. (1980). Econometric Models, Techniques and Applications, Prentice Hall. 2. Theil, H. (1971). Principles of Econometrics, John Wiley. 3. Walters, A. (1970). An Introduction to Econometrics, Macmillan and Co.
Evaluation Pattern CIA 50% ESE 50%  
MDS341D  BIOSTATISTICS (2022 Batch)  
Total Teaching Hours for Semester:60 
No of Lecture Hours/Week:4 
Max Marks:100 
Credits:4 
Course Objectives/Course Description 

This course provides an understanding of various statistical methods in describing and analyzing biological data. Students will be equipped with an idea about the applications of statistical hypothesis testing, related concepts and interpretation in biological data. 

Course Outcome 

Unit1 
Teaching Hours:12 
INTRODUCTION TO BIOSTATISTICS


Presentation of data - graphical and numerical representations of data - Types of variables, measures of location - dispersion and correlation - inferential statistics - probability and distributions - Binomial, Poisson, Negative Binomial, Hypergeometric and Normal distributions.
Unit2 
Teaching Hours:12 
PARAMETRIC AND NON  PARAMETRIC METHODS


Parametric methods - one sample t-test - independent sample t-test - paired sample t-test - one-way analysis of variance - two-way analysis of variance - analysis of covariance - repeated measures analysis of variance - Pearson correlation coefficient - Non-parametric methods: Chi-square test of independence and goodness of fit - Mann-Whitney U test - Wilcoxon signed-rank test - Kruskal-Wallis test - Friedman’s test - Spearman’s correlation test
Unit3 
Teaching Hours:12 
GENERALIZED LINEAR MODELS


Review of simple and multiple linear regression  introduction to generalized linear models  parameter estimation of generalized linear models  models with different link functions  binary (logistic) regression  estimation and model fitting  Poisson regression for count data  mixed effect models and hierarchical models with practical examples.  
Unit4 
Teaching Hours:12 
EPIDEMIOLOGY


Introduction to epidemiology, measures of epidemiology, observational study designs: case report, case series, correlational studies, cross-sectional studies, retrospective and prospective studies, analytical epidemiological studies - case control study and cohort study, odds ratio, relative risk, bias in epidemiological studies.
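The odds ratio and relative risk named in this unit can be computed from a 2x2 table, as in the minimal sketch below. The cell labels a, b, c, d and the counts are assumptions for this example: a and b are exposed subjects with and without the disease, c and d are unexposed subjects with and without the disease.

```python
def relative_risk(a, b, c, d):
    """Risk of disease among the exposed divided by risk among the unexposed."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed


def odds_ratio(a, b, c, d):
    """Odds of disease among the exposed divided by odds among the unexposed."""
    return (a * d) / (b * c)


# Assumed cohort: 100 exposed (30 diseased), 100 unexposed (10 diseased)
a, b, c, d = 30, 70, 10, 90
rr = relative_risk(a, b, c, d)
or_ = odds_ratio(a, b, c, d)
print(rr, or_)
```

Relative risk is the natural measure for cohort studies, whereas case control studies generally support only the odds ratio because the exposed and unexposed risks are not directly observed.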
Unit5 
Teaching Hours:12 
DEMOGRAPHY


Introduction to demography, mortality and life tables, infant mortality rate, standardized death rates, fertility, crude and specific rates, migration - definition and concepts, population growth, measurement of population growth - arithmetic, geometric and exponential, population projection and estimation, different methods of population projection, logistic curve, urban population growth, components of urban population growth.
Text Books And Reference Books: 1. Marcello Pagano and Kimberlee Gauvreau (2018), Principles of Biostatistics, 2nd Edition, Chapman and Hall/CRC press 2. David Moore S. and George McCabe P., (2017) Introduction to practice of statistics, 9th Edition, W. H. Freeman. 3. Sundar Rao and Richard J., (2012) Introduction to Biostatistics and research methods, PHI Learning Private limited, New Delhi  
Essential Reading / Recommended Reading 1. Abhaya Indrayan and Rajeev Kumar M., (2018) Medical Biostatistics, 4th Edition, Chapman and Hall/CRC Press. 2. Gordis Leon (2018), Epidemiology, 6th Edition, Elsevier, Philadelphia. 3. Ram, F. and Pathak K. B., (2016): Techniques of Demographic Analysis, Himalaya Publishing House, Bombay. 4. Park K., (2019), Park's Text Book of Preventive and Social Medicine, Banarsidas Bhanot, Jabalpur.
Evaluation Pattern CIA 50% ESE 50%  
MDS371  CLOUD ANALYTICS (2022 Batch)  
Total Teaching Hours for Semester:90 
No of Lecture Hours/Week:6 
Max Marks:150 
Credits:5 
Course Objectives/Course Description 

The objective of this course is to explore the basics of cloud analytics and the major cloud solutions. Students will learn how to analyze extremely large data sets and to create visual representations of that data. The course also aims to provide students with hands-on experience working with data at scale.

Course Outcome 

CO1: Interpret the deployment and service models of cloud applications.
CO2: Describe big data analytical concepts.
CO3: Ingest, store, and secure data.
CO4: Process and visualize structured and unstructured data.
Unit1 
Teaching Hours:18 
INTRODUCTION


Introduction to cloud computing - Major benefits of cloud computing - Cloud computing deployment models - Private cloud - Public cloud - Hybrid cloud - Types of cloud computing services - Infrastructure as a Service (IaaS) - PaaS - SaaS - Emerging cloud technologies and services - Different ways to secure the cloud - Risks and challenges with the cloud - What is cloud analytics? - Parameters before adopting cloud strategy - Technologies utilized by cloud computing
1. Creating Virtual Machines using Hypervisors
2. IaaS: Compute service - Creating and running Virtual Machines
Unit2 
Teaching Hours:18 
CLOUD ENABLING TECHNOLOGIES


Virtualization - Load Balancing - Scalability & Elasticity - Deployment - Replication - Monitoring - Software Defined Networking - Network Function Virtualization - MapReduce - Identity and Access Management - Service Level Agreements - Billing
1. Storage as a Service: Ingesting & Querying data into cloud
2. Database as a Service: Building DB Server
Unit3 
Teaching Hours:18 
BASIC CLOUD SERVICES & PLATFORMS


Compute Services: Amazon Elastic Compute Cloud - Google Compute Engine - Windows Azure Virtual Machines. Storage Services: Amazon Simple Storage Service - Google Cloud Storage - Windows Azure Storage. Database Services: Amazon Relational Data Store - Amazon DynamoDB - Google Cloud SQL - Google Cloud Datastore - Windows Azure SQL Database - Windows Azure Table Service
1. PaaS: Working with Google App Engine
Unit4 
Teaching Hours:18 
DATA INGESTION AND STORING


Cloud Dataflow - The Dataflow programming model - Cloud Pub/Sub - Cloud storage - Cloud SQL - Cloud BigTable - Cloud Spanner - Cloud Datastore - Persistent disks
1. Database as a Service: Building DB Server
2. Transforming data
PROCESSING AND VISUALIZING
Google BigQuery - Cloud Dataproc - Google Cloud Datalab - Google Data Studio
1. Visualize structured data and unstructured data
Unit5 
Teaching Hours:18 
MACHINE LEARNING, DEEP LEARNING AND AI


Services on Artificial intelligence – Machine learning – Cloud Natural Language API – TensorFlow – Cloud Speech API – Cloud Translation API – Cloud Vision API – Cloud Video Intelligence – Dialogflow – AutoML. 1. Load and query data in a data warehouse 2. Setting up and executing a data pipeline job to load data into cloud  
Text Books And Reference Books: 1. Sanket Thodge, Cloud Analytics with Google Cloud Platform, Packt Publishing, 2018. 2. Arshdeep Bahga and Vijay Madisetti, Cloud Computing: A Hands-On Approach, CreateSpace Independent Publishing Platform, 2014.  
Essential Reading / Recommended Reading 1. Deven Shah, Kailash Jayaswal, Donald J. Houde, Jagannath Kallakurchi, Cloud Computing Black Book, Wiley, 2014. 2. Thomas Erl, Ricardo Puttini, Zaigham Mahmood, Cloud Computing: Concepts, Technology & Architecture, Prentice Hall, 2014.  
Evaluation Pattern CIA 50% ESE 50%  
MDS372  BUSINESS INTELLIGENCE (2022 Batch)  
Total Teaching Hours for Semester:75 
No of Lecture Hours/Week:5 
Max Marks:4 
Credits:4 
Course Objectives/Course Description 

This course is designed to introduce students to the concepts of business intelligence and to provide an understanding of data warehousing and data mining, along with the associated tools and techniques and their benefits to organizations of all sizes. 

Course Outcome 

CO1: Understand the fundamentals of Business Intelligence and Analytics CO2: Apply the data warehouse concepts required for Business Intelligence CO3: Build a performance dashboard using data visualization and visual analytics. CO4: Implement the business intelligence perspective of data mining and text mining 
Unit1 
Teaching Hours:15 
Overview of Business Intelligence


An Overview of Business Intelligence, Analytics, and Decision Support: Changing Business Environments and Computerized Decision Support – A Framework for Business Intelligence (BI) – Transaction Processing versus Analytic Processing – Successful BI Implementation – Business Analytics Overview: Descriptive Analytics – Predictive Analytics – Prescriptive Analytics – Brief Introduction to Big Data Analytics – Applications of BI. Lab Exercises: 1. Case study on Transaction Processing. 2. Case study on Predictive Analytics.
 
Unit2 
Teaching Hours:15 
Business Intelligence Tools and Applications


Ad hoc Analysis – Online Analytical Processing – Mobile BI – Real-time BI – Operational Intelligence – Open Source BI – Embedded BI – Collaborative BI – Location Intelligence – Business intelligence vendors and market
Lab Exercises: 1. Exercise on OLAP in BI. 2. Exercise on Real-time BI.  
Unit3 
Teaching Hours:15 
Power BI


Power BI Overview – Installation – Data Sources – Query Editor – Importing Files – Data Modeling – Lookup Data Tables – Active vs. Inactive Relationships – Roles – Refreshing Data and Hierarchies – Data Modeling – DAX – Calculated Columns – Measures – Design and Interactive Reports – Dashboard
Lab Exercises: 1. Exercise on Data modeling in Power BI 2. Exercise on Dashboards & Reports in Power BI.  
Unit4 
Teaching Hours:15 
Tableau Basics


Tableau Overview – Data Sources, First Bar Chart Graph – The Extracted Data – Knowledge of Aggregation, Granularity, and Time Series – Working with Charts and Filter – Overview of First Dashboard, Maps, and Scatter Plots – Joins and Relationship – Data Joining – Map Creation.
Lab Exercises: 1. Exercise on Extracted Data in Tableau. 2. Exercise on Joins, Maps, Plots in Tableau.  
Unit5 
Teaching Hours:15 
Working with Tableau


First Dashboard Creation with Highlighting and Filters – Overview of Dual-axis Chart, Joining, Relationship, and Blending – Joining with Different Conditions, i.e., Multiple Fields and Duplicate Values – Working on Blending Data – Creation of Dual Axis Chart – Understanding of Calculated Fields – Understanding of Relationship Data – Overview of New Dashboard – Updated Way of Data Preparation – Overview of New Design Feature and Many More Advancements in Tableau
Lab Exercises: 1. Exercise on Dashboard Creation and Blending in Tableau. 2. Exercise on Data preparation, Data relationship and fields in Tableau.  
Text Books And Reference Books: 1. Chandraish Sinha (2022). "Mastering Power BI", 1st Edition, BPB Publications. 2. Marleen, David, "Mastering Tableau 2021: Implement advanced business intelligence techniques and analytics with Tableau", 3rd Edition, Packt. 3. Ramesh Sharda, Dursun Delen, Efraim Turban (2017). "Business Intelligence: Managerial Perspective on Analytics", 3rd Edition, Pearson Publication.  
Essential Reading / Recommended Reading 1. Ahmed Sherif (2016). "Practical Business Intelligence", Packt Publishing.  
Evaluation Pattern CIA 50% ESE 50%  
MDS373A  NATURAL LANGUAGE PROCESSING (2022 Batch)  
Total Teaching Hours for Semester:90 
No of Lecture Hours/Week:6 
Max Marks:150 
Credits:5 
Course Objectives/Course Description 

The goal is to familiarize students with the study of human language from a computational perspective. The course covers syntactic, semantic and discourse processing models, emphasizing machine learning concepts. 

Course Outcome 

Unit1 
Teaching Hours:15 
INTRODUCTION


Introduction to NLP – Background and overview – NLP Applications – Why NLP is hard – Ambiguity – Algorithms and models – Knowledge Bottlenecks in NLP – Introduction to NLTK – Case study. Lab: 1. Write a program to tokenize text 2. Write a program to count word frequency and to remove stop words
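The two lab programs above (tokenizing text, then counting word frequency after removing stop words) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library rather than NLTK, which the unit introduces; the regular-expression tokenizer and the tiny STOP_WORDS set are assumptions made for the example, not the course's prescribed approach.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only;
# a real stop-word corpus (such as the one shipped with NLTK) is far larger.
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in"}

def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def word_frequencies(text, stop_words=STOP_WORDS):
    """Count every token that is not in the stop-word set."""
    return Counter(tok for tok in tokenize(text) if tok not in stop_words)

sample = "The cat and the hat: a cat in the hat is a cat."
freqs = word_frequencies(sample)
print(freqs.most_common(2))
```

Punctuation is dropped by the tokenizer here; a tokenizer that keeps punctuation as separate tokens would need a different pattern.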
 
Unit2 
Teaching Hours:15 
PARSING AND SYNTAX


Word Level Analysis: Regular Expressions, Text Normalization, Edit Distance. Parsing and Syntax: Spelling Error Detection and Correction – Words and Word Classes – Part-of-Speech Tagging – Naive Bayes and Sentiment Classification: Case study. Lab: 3. Write a program to tokenize non-English languages 4. Write a program to get synonyms from WordNet
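Edit distance, listed in this unit, underlies spelling error detection and correction. The sketch below computes the Levenshtein distance by dynamic programming, assuming a unit cost for insertion, deletion and substitution (some textbooks charge 2 for a substitution); the function name is illustrative.

```python
def edit_distance(source, target):
    """Levenshtein distance with unit cost for insert, delete and substitute."""
    # previous[j] holds the distance between the processed prefix of source
    # and the first j characters of target.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]  # turning i source characters into "" takes i deletions
        for j, t_char in enumerate(target, start=1):
            substitution = previous[j - 1] + (s_char != t_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]

print(edit_distance("kitten", "sitting"))
```

Only two rows of the table are kept at a time, so memory grows with the length of the target string rather than with the product of both lengths.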
 
Unit3 
Teaching Hours:15 
SMOOTHED ESTIMATION AND LANGUAGE MODELLING


N-gram Language Models: N-Grams, Evaluating Language Models – The language modelling problem. SEMANTIC ANALYSIS AND DISCOURSE PROCESSING: Semantic Analysis: Meaning Representation – Lexical Semantics – Ambiguity – Word Sense Disambiguation. Discourse Processing: Cohesion – Reference Resolution – Discourse Coherence and Structure. Lab:
5. Write a program to get antonyms from WordNet 6. Write a program for stemming non-English words
 
Unit4 
Teaching Hours:15 
NATURAL LANGUAGE GENERATION AND MACHINE TRANSLATION


Natural Language Generation: Architecture of NLG Systems, Applications. Machine Translation: Problems in Machine Translation – Machine Translation Approaches – Evaluation of Machine Translation systems. Case study: Characteristics of Indian Languages. Lab:
7. Write a program for lemmatizing words using WordNet 8. Write a program to differentiate stemming and lemmatizing words
 
Unit5 
Teaching Hours:15 
INFORMATION RETRIEVAL AND LEXICAL RESOURCES


Information Retrieval: Design features of Information Retrieval Systems – Classical, Non-classical, Alternative Models of Information Retrieval – Evaluation. Lexical Resources: Word Embeddings – Word2vec – GloVe. UNSUPERVISED METHODS IN NLP: Graphical Models for Sequence Labelling in NLP. Lab:
9. Write a program for POS Tagging or Word Embeddings. 10. Case study-based program (IBM) or Sentiment analysis
 
Text Books And Reference Books: 1. Speech and Language Processing, Daniel Jurafsky and James H. Martin, 2nd Edition, Prentice Hall, 2013. 2. Foundations of Statistical Natural Language Processing, Christopher D. Manning and Hinrich Schütze, Cambridge, MA: MIT Press, 1999.  
Essential Reading / Recommended Reading 1. Foundations of Computational Linguistics: Human-Computer Communication in Natural Language, Roland R. Hausser, Springer, 2014. 2. Steven Bird, Ewan Klein and Edward Loper, Natural Language Processing with Python, O’Reilly Media, 1st edition, 2009.  
Evaluation Pattern CIA 50% ESE 50%  
MDS373B  HADOOP (2022 Batch)  
Total Teaching Hours for Semester:90 
No of Lecture Hours/Week:6 
Max Marks:150 
Credits:5 
Course Objectives/Course Description 

The subject is intended to give knowledge of Big Data as it evolves in every real-time application and of how it is manipulated using emerging technologies. This course breaks down the walls of complexity in processing Big Data by providing a practical approach to developing Java applications on top of the Hadoop platform. It describes the Hadoop architecture and how to work with the Hadoop Distributed File System (HDFS) and HBase on the Ubuntu platform. 

Course Outcome 

CO1: Understand the Big Data concepts in real-time scenarios CO2: Understand the big data systems and identify the main sources of Big Data in the real world. CO3: Demonstrate an ability to use Hadoop framework for processing Big Data for Analytics. CO4: Evaluate the MapReduce approach for different domain problems. 
Unit1 
Teaching Hours:15 
INTRODUCTION


Distributed file system – Big Data and its importance, Four Vs, Drivers for Big data, Big data analytics, Big data applications, Algorithms using MapReduce, Matrix-Vector Multiplication by MapReduce.
Apache Hadoop – Moving Data in and out of Hadoop – Understanding inputs and outputs of MapReduce – Data Serialization, Problems with traditional large-scale systems – Requirements for a new approach – Hadoop – Scaling – Distributed Framework – Hadoop v/s RDBMS – Brief history of Hadoop.
Lab Exercise 1. Installing and Configuring Hadoop  
Unit2 
Teaching Hours:15 
CONFIGURATIONS OF HADOOP


Hadoop Processes (NN, SNN, JT, DN, TT) – Temporary directory – UI – Common errors when running a Hadoop cluster, solutions. Setting up Hadoop on a local Ubuntu host: Prerequisites, downloading Hadoop, setting up SSH, configuring the pseudo-distributed mode, HDFS directory, NameNode, Examples of MapReduce, Using Elastic MapReduce, Comparison of local versus EMR Hadoop. Understanding MapReduce: Key/value pairs, The Hadoop Java API for MapReduce, Writing MapReduce programs, Hadoop-specific data types, Input/output. Developing MapReduce Programs: Using languages other than Java with Hadoop, Analysing a large dataset.
Lab Exercise
1. Word count application in Hadoop.
2. Sorting the data using MapReduce.
3. Finding max and min value in Hadoop.  
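The word count exercise is the standard first MapReduce program. The sketch below imitates the map, shuffle and reduce phases inside a single Python process to show how key/value pairs flow between them; it does not use Hadoop or its Java API, and the function names are illustrative assumptions.

```python
from collections import defaultdict

def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)

def word_count(lines):
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

counts = word_count(["big data big ideas", "data moves"])
print(counts)
```

In a real cluster the mappers and reducers run on different nodes and the shuffle is performed by the framework; only the mapper and reducer logic is written by the programmer.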
Unit3 
Teaching Hours:15 
ADVANCED MAPREDUCE TECHNIQUES


Simple, advanced, and in-between Joins, Graph algorithms, using language-independent data structures. Hadoop configuration properties – Setting up a cluster, Cluster access control, managing the NameNode, Managing HDFS, MapReduce management, Scaling.
Lab Exercise:
1. Implementation of decision tree algorithms using MapReduce.
2. Implementation of K-means Clustering using MapReduce.
3. Generation of Frequent Itemset using MapReduce.  
Unit4 
Teaching Hours:15 
HADOOP STREAMING


Hadoop Streaming – Streaming Command Options – Specifying a Java Class as the Mapper/Reducer – Packaging Files With Job Submissions – Specifying Other Plugins for Jobs.
Lab Exercise: 1. Count the number of missing and invalid values by joining two large given datasets. 2. Using Hadoop’s MapReduce, evaluate the number of products sold in each country in the online shopping portal. Dataset is given. 3. Analyze the sentiment of product reviews using a MapReduce technique provided by Apache Hadoop.  
Unit5 
Teaching Hours:15 
HIVE & PIG


Architecture, Installation, Configuration, Hive vs RDBMS, Tables, DDL & DML, Partitioning & Bucketing, Hive Web Interface, Pig, Use case of Pig, Pig Components, Data Model, Pig Latin. Lab Exercise
1. Trend Analysis based on Access Pattern over Web Logs using Hadoop. 2. Service Rating Prediction by Exploring Social Mobile Users’ Geographical Locations.  
Unit6 
Teaching Hours:15 
HBase


RDBMS vs NoSQL, HBasics, Installation, Building an online query application – Schema design, Loading Data, Online Queries, Successful service.
Hands-On: Single Node Hadoop Cluster Setup in any cloud service provider – How to create an instance. How to connect to that instance using PuTTY. Installing the Hadoop framework on this instance. Run sample programs which come with the Hadoop framework.  
Text Books And Reference Books: [1] Boris Lublinsky, Kevin T. Smith, Alexey Yakubovich, Professional Hadoop Solutions, Wiley, 2015.
[2] Tom White, Hadoop: The Definitive Guide, O’Reilly Media Inc., 2015.
 
Essential Reading / Recommended Reading [1] Garry Turkington, Hadoop Beginner's Guide, Packt Publishing, 2013.
 
Evaluation Pattern CIA 50% ESE 50%  
MDS373C  BIO INFORMATICS (2022 Batch)  
Total Teaching Hours for Semester:90 
No of Lecture Hours/Week:6 
Max Marks:150 
Credits:5 
Course Objectives/Course Description 

To enable students to learn information search and retrieval, genome analysis and gene mapping, alignment of multiple sequences, and PERL for Bioinformatics. 

Course Outcome 

CO1: Understand the molecular Biology and Bioinformatics applications. CO2: Apply the modeling and simulation technologies in Biology and medicine. CO3: Evaluate the algorithms to find the similarity between protein and DNA sequences. 
Unit1 
Teaching Hours:18 
BIOINFORMATICS


Introduction, Historical Overview and Definition, Applications, Major databases in Bioinformatics, Data management and Analysis, Central Dogma of Molecular Biology. INFORMATION SEARCH AND RETRIEVAL: Introduction, Tools for web search, Data retrieval tools, Data mining of Biological databases. Lab Exercise: 1. Test and verify the basic Linux commands and Filters. 2. Create the file(s) and verify the file handling commands.  
Unit2 
Teaching Hours:18 
GENOME ANALYSIS AND GENE MAPPING


GENOME ANALYSIS AND GENE MAPPING: Introduction, Genome analysis, Genome mapping, Sequence assembly problem, Genetic mapping and linkage analysis, Physical maps, Cloning the entire Genome, Genome sequencing, Applications of Genetic maps, Identification of Genes in Contigs, Human Genome Project. ALIGNMENT OF PAIRS OF SEQUENCES: Introduction, Biological motivation of alignment, Methods of sequence alignments, Using score matrices, Measuring sequence detection. Lab Exercise: 1. Create directories and verify the directory commands. 2. Perform basic mathematical operations using PERL. 3. Write a PERL script to demonstrate the Array operations and Regular expressions.  
Unit3 
Teaching Hours:18 
ALIGNMENT OF MULTIPLE SEQUENCES


ALIGNMENT OF MULTIPLE SEQUENCES: Methods of multiple sequence alignment, Evaluating multiple alignments, Applications of multiple alignments, Phylogenetic analysis, Methods of phylogenetic analysis, Tree evaluation, Problems in Phylogenetic analysis. TOOLS FOR SIMILARITY SEARCH AND SEQUENCE ALIGNMENT: Introduction, Working with FASTA, Working with BLAST, Filtering and Gapped BLAST, FASTA and BLAST algorithm comparison. Lab Exercise: 1. Write a PERL script to concatenate DNA sequences. 2. Write a PERL script to transcribe a DNA sequence into an RNA sequence. 3. Write a PERL script to calculate the reverse complement of a strand of DNA.  
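The three lab scripts in this unit (concatenation, transcription, reverse complement) are specified in PERL. Purely as an illustration of the string operations involved, the sketch below performs the same steps in Python; the function names are assumptions, and transcription follows the simple convention of replacing T with U on the coding strand.

```python
# Translation table mapping each base to its complement, in both letter cases.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def concatenate(*fragments):
    """Join DNA fragments end to end."""
    return "".join(fragments)

def transcribe(dna):
    """Transcribe DNA to RNA by replacing thymine (T) with uracil (U)."""
    return dna.replace("T", "U").replace("t", "u")

def reverse_complement(dna):
    """Complement every base, then reverse the strand."""
    return dna.translate(COMPLEMENT)[::-1]

dna = concatenate("ACG", "GT")
print(dna, transcribe(dna), reverse_complement(dna))
```

Characters outside A, C, G and T (for example the ambiguity code N) pass through unchanged, which may or may not be the desired behaviour for real sequence data.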
Unit4 
Teaching Hours:18 
PERL FOR BIOINFORMATICS


Sequences and Strings: Representing sequence data, Program to store a DNA sequence, Concatenating DNA fragments, Transcription DNA to RNA, Proteins, Files and Arrays, Reading Proteins in Files, Arrays, Scalar and List Context.
Motifs and Loops: Flow control, Code layout, Finding motifs, Counting Nucleotides, Exploding strings and arrays, Operating on strings. Subroutines and Bugs: Subroutines, Scoping and Subroutines, Command-line arguments and Arrays, Passing data to Subroutines, Modules and Libraries of Subroutines. Lab Exercise: 1. Write a PERL script to read protein sequence data from a file. 2. Write a PERL script to search for a motif in a DNA sequence.  
Unit5 
Teaching Hours:18 
THE GENETIC CODE


Hashes, Data structure and algorithms for Biology, Translating DNA into Proteins, Reading DNA from files in FASTA format, Reading Frames. GenBank: GenBank files, GenBank Libraries, Separating Sequence and Annotation, Parsing Annotations, Indexing GenBank with DBM. Protein Data Bank: Files and Folders, PDB Files, Parsing PDB Files. Lab Exercises: 1. Write a PERL script to append ACGT to DNA using a subroutine. 2. Case Study: a. To retrieve the sequence of the Human keratin protein from the UniProt database and to interpret the results. b. To retrieve the sequence of the Human keratin protein from the GenBank database and to interpret the results.  
Text Books And Reference Books:
[1] Bioinformatics: Methods and Applications, S. C. Rastogi, Namita Mendirata and Parag Rastogi, 4th Edition, PHI Learning, 2013. [2] Beginning Perl for Bioinformatics, Tisdall James, 1st edition, Shroff Publishers (O’Reilly), 2009.  
Essential Reading / Recommended Reading [1] Introduction to Bioinformatics, Arthur M Lesk, 2nd Edition, Oxford University Press, 4th edition, 2014. [2] Bioinformatics Technologies, Yi-Ping Phoebe Chen (Ed), 1st edition, Springer, 2005. [3] Bioinformatics Computing, Bryan Bergeron, 2nd Edition, Prentice Hall, 1st edition, 2003. Web resources: [1] http://cac.annauniv.edu/PhpProject1/aidetails/afug_2013_fu/24.%20BIO%20MED.pdf [2] https://www.amrita.edu/school/biotechnology/academics/pg/introductionbioinformaticsbif410 [3] https://canvas.harvard.edu/courses/8084/assignments/syllabus [4] https://www.coursera.org/specializations/bioinformatics [5] http://www.dtc.ox.ac.uk/modules/introductionbioinformaticsbioscientists.html  
Evaluation Pattern CIA 50% ESE 50%  
MDS373D  EVOLUTIONARY ALGORITHMS (2022 Batch)  
Total Teaching Hours for Semester:90 
No of Lecture Hours/Week:6 
Max Marks:150 
Credits:5 
Course Objectives/Course Description 

Students will be able to understand the core concepts of evolutionary computing techniques and the popular evolutionary algorithms used in solving optimization problems. Students will also be able to implement custom solutions for real-time problems to which evolutionary computing applies.


Course Outcome 

CO1: Basic understanding of evolutionary computing concepts and techniques CO2: Classify relevant real-time problems for the applications of evolutionary algorithms 
Unit1 
Teaching Hours:18 
INTRODUCTION TO EVOLUTIONARY COMPUTING


Terminologies – Notations – Problems to be solved – Optimization – Modeling – Simulation – Search problems – Optimization constraints. Lab Program: 1. Implementation of single and multi-objective functions 2. Implementation of binary GA
 
Unit2 
Teaching Hours:18 
EVOLUTIONARY PROGRAMMING


Continuous evolutionary programming – Finite state machine optimization – Discrete evolutionary programming – The Prisoner’s dilemma. EVOLUTION STRATEGY: One plus one evolution strategy – The 1/5 Rule – (μ+1) evolution strategy – Self-adaptive evolution strategy. Lab Program: 1. Implementation of continuous GA 2. Implementation of evolutionary programming
 
Unit3 
Teaching Hours:18 
GENETIC PROGRAMMING


Fundamentals of genetic programming – Genetic programming for minimal time control. EVOLUTIONARY ALGORITHM VARIATION: Initialization – Convergence – Population diversity – Selection option – Recombination – Mutation. Lab Program: 1. Implementation of genetic programming 2. Implementation of Ant Colony Optimization  
Unit4 
Teaching Hours:18 
ANT COLONY OPTIMIZATION


ANT COLONY OPTIMIZATION: Pheromone models – Ant system – Continuous Optimization – Other Ant Systems. PARTICLE SWARM OPTIMIZATION: Velocity limiting – Inertia weighting – Global Velocity updates – Fully informed Particle Swarm. Lab Program: 1. Implementation of Particle Swarm Optimization 2. Implementation of Multi-Objective Optimization
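The Particle Swarm Optimization lab can be approached along the lines of the sketch below, which combines the velocity limiting, inertia weighting and global velocity update named in this unit. All parameter values (inertia 0.7, acceleration coefficients 1.5, velocity limit 1.0), the sphere test function and the fixed random seed are assumptions chosen for illustration, not values prescribed by the course.

```python
import random

def sphere(position):
    """Test objective: sum of squares, minimized at the origin."""
    return sum(x * x for x in position)

def pso(objective, dim=2, particles=20, iterations=200,
        inertia=0.7, cognitive=1.5, social=1.5, v_max=1.0, seed=0):
    """Global-best PSO with inertia weighting and velocity clamping."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    positions = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(particles)]
    velocities = [[0.0] * dim for _ in range(particles)]
    personal_best = [p[:] for p in positions]
    personal_val = [objective(p) for p in positions]
    best_index = min(range(particles), key=personal_val.__getitem__)
    global_best = personal_best[best_index][:]
    global_val = personal_val[best_index]

    for _ in range(iterations):
        for i in range(particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (inertia * velocities[i][d]
                     + cognitive * r1 * (personal_best[i][d] - positions[i][d])
                     + social * r2 * (global_best[d] - positions[i][d]))
                velocities[i][d] = max(-v_max, min(v_max, v))  # velocity limiting
                positions[i][d] += velocities[i][d]
            value = objective(positions[i])
            if value < personal_val[i]:
                personal_best[i], personal_val[i] = positions[i][:], value
                if value < global_val:
                    global_best, global_val = positions[i][:], value
    return global_best, global_val

best_position, best_value = pso(sphere)
print(best_position, best_value)
```

A fully informed particle swarm, also listed in the unit, would replace the single global-best term with contributions from every neighbour; that variant is not shown here.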
 
Unit5 
Teaching Hours:18 
MULTI-OBJECTIVE OPTIMIZATION


Pareto Optimality – Hypervolume – Relative coverage – Non-Pareto based EAs – Pareto based EAs – Multi-objective Biogeography based optimization. Lab Program: 1. Simulation of EA in Planning problems (routing, scheduling, packing) and Design problems (circuit, structure, art) 2. Simulation of EA in classification/prediction modelling
 
Text Books And Reference Books: [1] D. Simon, Evolutionary optimization algorithms: biologically inspired and population-based approaches to computer intelligence. New Jersey: John Wiley, 2013. [2] A. E. Eiben and J. E. Smith, Introduction to evolutionary computing. 2nd ed. Berlin: Springer, 2015.  
Essential Reading / Recommended Reading 1. D. Goldberg, Genetic algorithms in search, optimization, and machine learning. Boston: Addison-Wesley, 2012.
2. K. Deb, Multi-objective optimization using evolutionary algorithms. Chichester: John Wiley & Sons, 2009.
3. R. Poli, W. Langdon, N. McPhee and J. Koza, A field guide to genetic programming. [S.l.]: Lulu Press, 2008.
4. T. Bäck, Evolutionary algorithms in theory and practice. New York: Oxford Univ. Press, 1996.
Web Resources:
1. A. E. Eiben and J. E. Smith, "Introduction to Evolutionary Computing: The online accompaniment to the book Introduction to Evolutionary Computing", Evolutionarycomputation.org, 2015. [Online]. Available: http://www.evolutionarycomputation.org/.
2. F. Lobo, "Evolutionary Computation 2018/2019", Fernandolobo.info, 2018. [Online]. Available: http://www.fernandolobo.info/ec1819.
3. "EClab Tools", Cs.gmu.edu, 2008. [Online]. Available: https://cs.gmu.edu/~eclab/tools.html.
 
Evaluation Pattern CIA 50% ESE 50%  
MDS373E  OPTIMIZATION TECHNIQUE (2022 Batch)  
Total Teaching Hours for Semester:90 
No of Lecture Hours/Week:6 
Max Marks:150 
Credits:5 
Course Objectives/Course Description 

This course will help students acquire and demonstrate the ability to implement the algorithms needed for solving advanced-level optimization problems. 

Course Outcome 

CO1: Apply the notions of linear programming in solving transportation problems CO2: Understand the theory of games for solving simple games CO3: Use linear programming in the formulation of the shortest route problem. CO4: Apply algorithmic approach in solving various types of network problems CO5: Create applications using dynamic programming. 
Unit1 
Teaching Hours:18 
INTRODUCTION


INTRODUCTION: Operations Research Methods – Solving the OR model – Queuing and Simulation models – Art of modelling – Phases of OR study. MODELLING WITH LINEAR PROGRAMMING: Two variable LP model – Graphical LP solution – Applications. Simplex method and sensitivity analysis – Duality and post-optimal Analysis – Formulation of the dual problem. Lab Exercise: 1. Simplex Method 2. Dual Simplex Method
 
Unit2 
Teaching Hours:18 
TRANSPORTATION MODEL


TRANSPORTATION MODEL: Determination of the Starting Solution – Iterative computations of the transportation algorithm. Assignment Model: The Hungarian Method – Simplex explanation of the Hungarian Method – The transshipment Model. Lab Exercise: 1. Balanced Transportation Problem 2. Unbalanced Transportation Problem 3. Assignment Problems  
Unit3 
Teaching Hours:18 
NETWORK MODELS


NETWORK MODELS: Minimal Spanning tree Algorithm – Linear Programming formulation of the shortest route problem. Maximal Flow Model: Enumeration of cuts – Maximal Flow Diagram – Linear Programming Formulation of Maximal Flow Model. CPM and PERT: Network Representation – Critical Path Computations – Construction of the time Schedule – Linear Programming formulation of CPM – PERT networks. Lab Exercise: 1. Shortest path computations in a network 2. Maximum flow problem  
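For the first lab exercise, shortest path computations in a network, one possible starting point is Dijkstra's algorithm, sketched below. Note that this is a label-setting algorithm swapped in for illustration, not the linear programming formulation covered in the unit, and it assumes all arc lengths are non-negative. The small example network is hypothetical.

```python
import heapq

def shortest_paths(graph, source):
    """Dijkstra's algorithm: shortest distance from source to each reachable node.

    graph maps a node to a dict of {neighbour: non-negative arc length}.
    """
    distances = {source: 0}
    queue = [(0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if dist > distances.get(node, float("inf")):
            continue  # stale queue entry; a shorter route was already found
        for neighbour, weight in graph.get(node, {}).items():
            candidate = dist + weight
            if candidate < distances.get(neighbour, float("inf")):
                distances[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return distances

# Hypothetical four-node directed network.
network = {"A": {"B": 4, "C": 1}, "C": {"B": 2, "D": 5}, "B": {"D": 1}, "D": {}}
print(shortest_paths(network, "A"))
```

Nodes that cannot be reached from the source are simply absent from the returned dictionary.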
Unit4 
Teaching Hours:18 
GAME THEORY


GAME THEORY: Strategic Games and examples – Nash equilibrium and examples – Optimal Solution of two-person zero-sum games – Solution of Mixed strategy games – Mixed strategy Nash equilibrium – Dominated action with example. GOAL PROGRAMMING: Formulation – Tax Planning Problem – Goal Programming algorithms – Weights method – Preemptive method. Lab Exercise: 1. Critical path Computations 2. Game Programming  
Unit5 
Teaching Hours:18 
MARKOV CHAINS


MARKOV CHAINS: Definition – Absolute and n-step Transition Probability – Classification of states. DYNAMIC PROGRAMMING: Recursive nature of computation in Dynamic Programming – Forward and Backward Recursion – Knapsack / Fly Away / Cargo-Loading Model – Equipment Replacement Model. Lab Exercise: 1. Goal Programming 2. Dynamic Programming
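The recursive nature of dynamic programming can be seen in a small knapsack example. The sketch below solves the 0/1 variant, in which each item is either taken once or left out; this restriction is an assumption made for brevity, since the cargo-loading model may allow several units of an item. The function name and sample data are illustrative.

```python
def knapsack(weights, values, capacity):
    """Maximum total value of items that fit within capacity (0/1 knapsack)."""
    # best[c] is the best value achievable with capacity c using the items seen so far.
    best = [0] * (capacity + 1)
    for weight, value in zip(weights, values):
        # Walk capacities downwards so each item is used at most once.
        for remaining in range(capacity, weight - 1, -1):
            best[remaining] = max(best[remaining], best[remaining - weight] + value)
    return best[capacity]

print(knapsack([1, 3, 4, 5], [1, 4, 5, 7], 7))
```

Each stage corresponds to one item and each state to a remaining capacity, which mirrors the stage/state description used for the recursion in this unit.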
 
Text Books And Reference Books:
1. Hamdy A. Taha, Operations Research, 9th Edition, Pearson Education, 2012. 2. Garrido, José M. Introduction to Computational Models with Python. CRC Press, 2016.
 
Essential Reading / Recommended Reading 1. Rathindra P. Sen, Operations Research – Algorithms and Applications, PHI Learning Pvt. Limited, 2011. 2. R. Ravindran, D. T. Philips and J. J. Solberg, Operations Research: Principles and Practice, 2nd ed., John Wiley & Sons, 2007.
3. F. S. Hillier and G. J. Lieberman, Introduction to operations research, 8th ed., McGraw-Hill Higher Education, 2004.
4. K. C. Rao and S. L. Mishra, Operations research, Alpha Science International, 2005.
5. Hart, William E. Pyomo: Optimization Modeling in Python. Springer, 2012.
6. Martin J. Osborne, An introduction to Game theory, Oxford University Press, 2008.  
Evaluation Pattern CIA 50% ESE 50%  
MDS381  SPECIALIZATION PROJECT (2022 Batch)  
Total Teaching Hours for Semester:60 
No of Lecture Hours/Week:4 
Max Marks:100 
Credits:2 
Course Objectives/Course Description 

The course is designed to provide a real-world project development and deployment environment for the students. 

Course Outcome 

CO1: Identify the problem and relevant analytics for the selected domain. CO2: Apply appropriate design/development strategy and tools. 
Unit1 
Teaching Hours:60 
Specialization Project


The project will be based on the specialization domain that students have opted for during this semester.  
Text Books And Reference Books: [1] Statistics: An Introduction Using R, Michael J. Crawley, Wiley, Second Edition, 2015.
 
Essential Reading / Recommended Reading [1] Hands-On Programming with R, Garrett Grolemund, O’Reilly, 1st Edition, 2014. [2] R for Everyone, Jared Lander, Pearson, 1st Edition, 2014.  
Evaluation Pattern CIA 50% ESE 50%  
MDS481  INDUSTRY PROJECT (2022 Batch)  
Total Teaching Hours for Semester:30 
No of Lecture Hours/Week:2 
Max Marks:300 
Credits:12 
Course Objectives/Course Description 

This course helps students become globally competent and inculcates entrepreneurial skills.


Course Outcome 

CO1: Develop real-time projects CO2: Practice different data science principles and strategies in the project 
Unit1 
Teaching Hours:30 
Project Work


It is a full time project to be taken up either in the industry or in an R&D organization.
 
Text Books And Reference Books:   
Essential Reading / Recommended Reading   
Evaluation Pattern CIA 50% ESE 50% 