Uploader: 高宏飞

Shared on 2026-01-20

Author: Probyto Data Science and Consulting Pvt. Ltd.

The book first explains the what and why of Data Science and the process of solving a Data Science problem. It then discusses the fundamental concepts of Data Science, such as Statistics, Machine Learning, Business Intelligence, data pipelines, and Cloud Computing. Each topic is explained with an example problem that shows how the industry approaches solving it. The book poses questions to learners so that they solve problems, build problem-solving aptitude, and learn effectively. It uses mathematics wherever necessary and shows how it is implemented in Python with the help of an example dataset.

ISBN: 9389423287
Publisher: BPB Publications
Publish Year: 2020
Language: English
Pages: 366
File Format: PDF
File Size: 22.7 MB
Text Preview (First 20 pages)

Data Science for Business Professionals
A Practical Guide for Beginners
by Probyto Data Science and Consulting Pvt. Ltd.
ii  FIRST EDITION 2020 Copyright © BPB Publications, India ISBN: 978-93-89423-280 All Rights Reserved. No part of this publication may be reproduced or distributed in any form or by any means or stored in a database or retrieval system, without the prior written permission of the publisher with the exception to the program listings which may be entered, stored and executed in a computer system, but they can not be reproduced by the means of publication. LIMITS OF LIABILITY AND DISCLAIMER OF WARRANTY The information contained in this book is true to correct and the best of author’s & publisher’s knowledge. The author has made every effort to ensure the accuracy of these publications, but cannot be held responsible for any loss or damage arising from any information in this book. All trademarks referred to in the book are acknowledged as properties of their respective owners. Distributors: BPB PUBLICATIONS 20, Ansari Road, Darya Ganj New Delhi-110002 Ph: 23254990/23254991 MICRO MEDIA Shop No. 5, Mahendra Chambers, 150 DN Rd. Next to Capital Cinema, V.T. (C.S.T.) Station, MUMBAI-400 001 Ph: 22078296/22078297 DECCAN AGENCIES 4-3-329, Bank Street, Hyderabad-500195 Ph: 24756967/24756400 BPB BOOK CENTRE 376 Old Lajpat Rai Market, Delhi-110006 Ph: 23861747 Published by Manish Jain for BPB Publications, 20 Ansari Road, Darya Ganj, New Delhi-110002 and Printed by him at Repro India Ltd, Mumbai
Dedicated to
Students & Data Science Enthusiasts
The Probyto Team Members, who are at the forefront of sharing their knowledge
iv  About the Author Probyto Data Science and Consulting Private Limited (referred as Probyto) is leading solution provider in Artificial Intelligence (AI) domain for business from different sizes and industries. Probyto develops AI Solutions for businesses and delivers them through fully Managed AI platform. AI Platform enables businesses of any size to subscribe AI solutions and get started within 7 days. The vision of Probyto is to become “AI Success Partner” to our clients by “Accelerating the AI Journey” in quick, secure, scalable and affordable manner. Probyto create AI equity by feeding in the value cycle, good talent, quality output and client value, hence creating AI equity for societies and companies. With Probyto AI resources bouquet, the innovation process is streamlined to innovate at scale. • Founded in 2015 in India; Expanded our services to Ireland, Singapore and US • Striving to deliver best value through our rich experience across geographies and industries using finest of Data Science and Technology • Our approach is to become your AI Success Partner and help you climb the AI success ladder Probyto activities and contribution in the field of AI are driven by three key goals. • AI Democratization – Be it small shop or a large business the benefit of AI/ ML should reach everyone • Affordable AI - The cost of AI development and operations should allow higher adoption rate • Good for Society - Whatever AI solutions Probyto develops, it needs to keep overall good of society at its core To fulfil Probyto’s goals, the book has been written by collective experience of many of Probyto past client projects, academic collaborations and team members for last 5 years. The collective work is represented by different experts in data driven decision making and portion they deal with in creating value for the clients. The team has experienced professionals and freshers who have gained from the approach as mentioned in the book as well. 
Visit Probyto to know about our team and our offerings for academia & Industry: https://probyto.com
Acknowledgement

The book is not just a collection of topics in the Data Science domain but a journal of what the Probyto team has learned from the practical application of Data Science over the past 5+ years of implementing solutions and nurturing fresh talent. This book has been made possible by the support and work of a multidisciplinary team comprising researchers, cloud architects, developers and business consultants at Probyto. Special mention goes to the team members who led the effort of writing the book manuscript: Parvej Reja Saleh, Namachivayam Dharmalingam, Srivathshan KS, Devjit Dey, Jayeesha Ghosh and Md Rakibul Ashiquee. Special thanks go to Abhishek Singh from Probyto for facilitating the whole effort with BPB.

The learnings in the book have been gathered through our numerous interactions with academic institutions, our interns, researchers and, most importantly, our clients. The feedback from clients helps us build the right skillset in the team and influences freshers to look at data science as a tool to solve business problems rather than as mastery of the tools themselves.

“I would like to express my deep gratitude to all the Probyto Team members for their valuable contribution to this book. I would like to thank Abhishek Singh for his advice and assistance in keeping my progress on schedule. My grateful thanks are also extended to the co-author Namachivayam for his contribution and constant support. Finally, I wish to thank SM Saleh (Father), Roushanara Saleh (Mother), BS Hasina (Aunt) and Late Saheda Akhtar (Aunt) for their constant support and encouragement.”
- Parvej Reja Saleh

“I personally thank Probyto and our team members, who supported sharing the knowledge needed to successfully write this book. I hope this book will make a good starting point for the Data Science journey of students.”
- Namachivayam Dharmalingam

A big thanks to the team at BPB for making this book possible for our freshers in India and abroad. This book will open opportunities for students to see the Data Science domain from a professional perspective and give them a path to learn valuable skills.
vi  Preface Data Science has emerged as a standalone industry itself serving needs of multiple other industries and sectors by providing valuable factual insights and automation of data driven tasks. Further, due to multiple reasons of which talent being most significant one, the adoption rate of Data Science is slower. It has been proven that data driven decision tools can reduce cost for companies’ operations and at the same time create new markets. Nowadays, the data science training programs are growing with high rate due to steep increase in demand of skilled candidates for open roles in Data Science domain. Data Science trainings offered by various platforms are designed to cover three crucial parts of skilling the freshers; theoretical concepts in Machine Learning, technology and programming skills, and the skills to create data-based solutions for business problems. For a fresher or enthusiast, accessing so many different aspects of data science is a challenge due to; 1. Too much and too varied content provided by platforms 2. Difficulty in stitching together technology, business and cloud skills to build a solution 3. Lack of innovative and real-world examples of application implementation The core data science community has started to emphasise the need to re-structure the way we train the freshers by providing them a view of actual implementation of the end-to-end solution in a business set-up. It is important to equip the Data Scientist with the facts that “most accurate and optimised solution might not be the right solution in a dynamic business & technology environment”. Business value delivery is core to Data Science in any enterprise or in society. This book tries to set first step in combing the complex parts of Data Science skills and their application in creating a real business solution. This include having enough knowledge of business processes, mathematics, technology and other technological innovation in cloud computing.
The book is divided into eight sections covering all aspects of creating value from data science in a business set-up.

1. Data Science Overview: Explains everything a programmer needs to know about data science: what data science is all about, why it is important, and how it works in real implementations.
2. Mathematics and Statistics: Introduces the basics of linear algebra and its importance in solving data problems. Introduces the basic mathematics needed to understand machine learning, including optimization and calculus. Explains the importance of statistics in data science and covers the key statistical concepts required to solve data science problems.
3. Machine Learning: Introduces the basics of machine learning, including exploratory data analysis, data preparation steps and algorithms for model training.
4. Data Engineering: Introduces the concept of data pipelines and their significance, and discusses how to build simple data pipelines. It also touches upon big data systems and databases.
5. Cloud Computing: Introduces the key enabling concept behind cloud computing, the hypervisor. It then shows, with an example, how to work with the cloud to put an application on it for end-user use.
6. Business Intelligence: Introduces business intelligence concepts, what they are and how to make use of the tools. An example is then built in Power BI to show how to frame business questions and answer them using visualisation tools.
7. Industry Use Cases: Two use cases are discussed at length to show how, starting from a problem, we build a solution and put it to use by the end user as an AI application.
8. Self-Assessment: The self-assessment is a collection of typical questions and gaps that the industry looks at when hiring people for entry-level roles.
viii  Downloading the code bundle and coloured images: Please follow the link to download the Code Bundle and the Coloured Images of the book: https://rebrand.ly/bac0131 Errata We take immense pride in our work at BPB Publications and follow best practices to ensure the accuracy of our content to provide with an indulging reading experience to our subscribers. Our readers are our mirrors, and we use their inputs to reflect and improve upon human errors if any, occurred during the publishing processes involved. To let us maintain the quality and help us reach out to any readers who might be having difficulties due to any unforeseen errors, please write to us at : errata@bpbonline.com Your support, suggestions and feedbacks are highly appreciated by the BPB Publications’ Family.
 ix Table of Contents 1. Data Science Overview ....................................................................................... 1 Structure .......................................................................................................... 2 Objectives ........................................................................................................ 2 Evolution of data analytics ........................................................................... 2 Define data science ........................................................................................ 4 Domain knowledge ....................................................................................... 6 Mathematical and scientific techniques ...................................................... 7 Tools and technology ................................................................................... 10 Data science analysis types ......................................................................... 13 Data science job roles ................................................................................... 14 ML model development process ................................................................ 15 Data visualizations....................................................................................... 16 Result communication ................................................................................. 17 Responsible and ethical AI ......................................................................... 18 Career in data science .................................................................................. 19 Conclusion .................................................................................................... 20 2. Mathematics Essentials .................................................................................... 21 Structure ........................................................................................................ 
21 Objectives ...................................................................................................... 22 Introduction to linear algebra .................................................................... 22 Scalar, vectors, matrices, and tensors ........................................................ 23 Scalar ........................................................................................................ 23 Vectors ...................................................................................................... 23 Matrices ................................................................................................... 25 Tensors...................................................................................................... 27 The determinant ........................................................................................... 28 Eigenvalues and Eigenvectors ................................................................... 29 Eigenvalue decomposition and Singular Value Decomposition (SVD) .................................................................................. 30 Singular value decomposition ................................................................... 32
x  Principal component analysis .................................................................... 33 Multivariate calculus ................................................................................... 35 Differential Calculus .................................................................................... 36 Sum rule ................................................................................................... 37 Power rule ................................................................................................ 38 Special cases ............................................................................................. 38 Trigonometric functions ........................................................................... 40 Product rule ............................................................................................. 41 Chain rule................................................................................................. 41 Quotient rule ............................................................................................ 41 Multiple variables .................................................................................... 42 Partial differentiation ............................................................................... 42 Total derivative ......................................................................................... 42 Integral calculus ........................................................................................... 43 Slices ......................................................................................................... 43 Definite vs.indefinite integrals ................................................................. 44 The Gradient ............................................................................................ 45 The Jacobian ............................................................................................. 
45 The Hessian .............................................................................................. 45 The Lagrange multipliers ......................................................................... 46 Laplace interpolation ................................................................................ 46 Optimization ............................................................................................ 46 The Gradient Descent algorithm .............................................................. 47 Conclusion .................................................................................................... 48 3. Statistics Essentials............................................................................................ 49 Structure ........................................................................................................ 49 Objectives ...................................................................................................... 50 Introduction to probability and statistics ................................................. 50 Descriptive statistics .................................................................................... 51 The measure of central tendency .............................................................. 53 Mean ............................................................................................................. 53 Median .......................................................................................................... 54 Mode.............................................................................................................. 54 Measures of variability ............................................................................. 54
 xi Range ............................................................................................................ 54 Variance ........................................................................................................ 55 Covariance ..................................................................................................... 55 Standard Deviation ....................................................................................... 56 Measure of asymmetry ............................................................................. 57 Modality ........................................................................................................ 57 Skewness ....................................................................................................... 57 Populations and samples .......................................................................... 58 Central Limit Theorem ............................................................................. 59 Sampling distribution .............................................................................. 59 Conditional probability ............................................................................... 59 Random variables ........................................................................................ 60 Inferential statistics ...................................................................................... 61 Probability distributions .......................................................................... 62 What is a probability distribution? ............................................................... 62 Normal distribution ...................................................................................... 63 Binomial distribution .................................................................................... 65 Poisson distribution ...................................................................................... 
67 Geometric distribution .................................................................................. 67 Exponential distribution ............................................................................... 67 Conclusion .................................................................................................... 68 4. Exploratory Data Analysis ............................................................................... 69 Structure ........................................................................................................ 70 Objectives ...................................................................................................... 70 What is EDA?................................................................................................ 70 Need for the EDA ..................................................................................... 71 Understanding data ..................................................................................... 72 Categorical variables ................................................................................ 72 Numeric variables .................................................................................... 72 Binning (numeric to categorical) ............................................................. 73 Encoding .................................................................................................. 73 Methods of EDA ........................................................................................... 73 Key concepts of EDA ................................................................................... 74 Conclusion .................................................................................................... 79
xii  5. Data Preprocessing ............................................................................................ 81 Structure ........................................................................................................ 82 Objectives ...................................................................................................... 82 Introduction to data preprocessing ........................................................... 82 Methods in data preprocessing .................................................................. 83 Transformation into vectors ..................................................................... 83 Normalization .......................................................................................... 83 Dealing with the missing values .............................................................. 84 Conclusion .................................................................................................... 91 6. Feature Engineering .......................................................................................... 93 Structure ........................................................................................................ 94 Objectives ...................................................................................................... 94 Introduction to feature engineering .......................................................... 94 Importance of feature variable .................................................................. 95 Feature engineering in machine learning ................................................ 95 Feature engineering techniques ................................................................. 96 Imputation................................................................................................ 97 Handling outliers ..................................................................................... 
98 Binning .................................................................................................... 99 Log Transform ........................................................................................ 101 One-hot encoding ................................................................................... 102 Grouping operations .............................................................................. 103 Categorical column grouping ..................................................................... 103 Numerical column grouping....................................................................... 104 Feature split ........................................................................................... 104 Scaling .................................................................................................... 106 Extracting date ....................................................................................... 107 Applying feature engineering .................................................................. 108 Conclusion ..................................................................................................110 7. Machine Learning Algorithms .......................................................................111 Structure .......................................................................................................112 Objectives .....................................................................................................112 Introduction to machine learning .............................................................112
 xiii Brief history of machine learning ............................................................113 Classification of machine learning algorithms ........................................114 Top 10 algorithms of machine learning explained .................................119 Building a machine learning model ........................................................ 121 Conclusion .................................................................................................. 126 8. Productionizing Machine Learning Models .............................................. 127 Structure ...................................................................................................... 127 Objectives .................................................................................................... 128 Types of ML production system .............................................................. 128 Batch prediction ..................................................................................... 129 Batch learning ........................................................................................ 129 REST APIs ............................................................................................. 130 Online learning ...................................................................................... 130 Introduction to REST APIs ........................................................................ 130 Application Programming Interface (APIs) .......................................... 131 Hyper Text Transfer Protocol (HTTP) ................................................... 131 Client-server architecture....................................................................... 132 Resource ................................................................................................. 133 Flask framework ........................................................................................ 
134 Simple flask application ......................................................................... 136 Salary prediction model.......................................................................... 136 ML model user interface ........................................................................... 142 HTML template ..................................................................................... 142 Conclusion .................................................................................................. 145 9. Data Flows in Enterprises .............................................................................. 147 Structure ...................................................................................................... 147 Objectives .................................................................................................... 148 Introducing data pipeline ......................................................................... 148 Designing data pipeline ............................................................................ 149 ETL vs. ELT ................................................................................................. 156 Scheduling jobs ........................................................................................... 157 Messaging queue........................................................................................ 158
xiv  Passing arguments to data pipeline ........................................................ 159 Conclusion .................................................................................................. 165 10. Introduction to Databases .............................................................................. 167 Structure ...................................................................................................... 167 Objectives .................................................................................................... 168 Modern databases and terminology ....................................................... 168 Relational database or SQL database ...................................................... 170 Install PostgreSQL and pgAdmin ......................................................... 170 Set-up a database and table .................................................................... 171 Connect Python to Postgres ................................................................... 173 Modify data pipeline to store in Postgres ............................................... 174 Document-oriented database or No-SQL ............................................... 177 Install MongoDB and compass client .................................................... 178 Create a database and collection ............................................................. 179 Connect Python to MongoDB ............................................................... 179 Modify data pipeline to store in MongoDB .......................................... 181 Graph databases ......................................................................................... 183 Install and start Neo4j ........................................................................... 184 Add nodes and relations ......................................................................... 
184 Filesystem as storage ................................................................................. 189 What is Filesystem? ............................................................................... 189 Filesystem as data store .......................................................................... 190 Hierarchy to store CSV .......................................................................... 191 Conclusion .................................................................................................. 191 11. Introduction to Big Data ................................................................................. 193 Structure ...................................................................................................... 193 Objectives .................................................................................................... 194 Introducing Big Data ................................................................................. 194 Definition of Big Data ............................................................................ 195 Introducing Hadoop .................................................................................. 196 Hadoop Distributed File System (HDFS) ............................................. 197 MapReduce ............................................................................................. 198 YARN ..................................................................................................... 199
    Hadoop common ..... 200
    Setting-up a Hadoop Cluster ..... 201
    Installing a Hadoop Cluster ..... 201
    Starting Hadoop cluster in Docker ..... 201
    Word-count MapReduce Program ..... 204
    Map program ..... 205
    Reducer program ..... 206
    MapReduce JAR ..... 206
    Running Word Count in HDFS Cluster ..... 207
    Conclusion ..... 210
12. DevOps for Data Science ..... 211
    Structure ..... 211
    Objectives ..... 212
    Introduction to DevOps ..... 212
    Agile methodology, CI/CD, and DevOps ..... 214
    DevOps for data science ..... 215
    Source Code Management ..... 215
    Quality Assurance ..... 217
    Model objects and security ..... 219
    Production deployment ..... 220
    Communication and collaboration ..... 222
    Conclusion ..... 223
13. Introduction to Cloud Computing ..... 225
    Structure ..... 225
    Objectives ..... 226
    Introducing cloud computing ..... 226
    Operating system model ..... 226
    What is virtualization? ..... 227
    What is cloud computing? ..... 229
    Types of cloud services ..... 230
    Infrastructure as a Service (IaaS) ..... 231
    Platform as a Service (PaaS) ..... 233
    Software as a Service (SaaS) ..... 234
    Types of cloud infrastructure ..... 235
    Public cloud ..... 236
    Private cloud ..... 237
    Hybrid cloud ..... 238
    Data science and cloud computing ..... 239
    Data ..... 239
    Compute ..... 240
    Integration ..... 240
    Deployment ..... 240
    Market growth of cloud ..... 240
    Conclusion ..... 242
14. Deploy Model to Cloud ..... 243
    Structure ..... 243
    Objectives ..... 244
    Register for GCP free account ..... 244
    GCP console ..... 245
    Create VM and its properties ..... 246
    Connecting and uploading code to VM ..... 252
    Executing Python model on cloud ..... 261
    Access the model via browser ..... 262
    Scaling the resources in Cloud ..... 263
    Conclusion ..... 265
15. Introduction to Business Intelligence ..... 267
    Structure ..... 267
    Objectives ..... 268
    What is business intelligence? ..... 268
    Business intelligence analysis ..... 269
    Business intelligence process ..... 271
    Step 1: Data awareness ..... 272
    Data types ..... 272
    Data sources ..... 272
    Step 2: Store data ..... 272
    Data models ..... 273
    Data storage ..... 273
    Step 3: Business needs ..... 273
    Key Performance Indicators (KPI) ..... 274
    Data Visuals ..... 274
    Step 4: a Visualization tool ..... 274
    Time to insight ..... 275
    Ease of use ..... 275
    Step 5: Enable platform ..... 276
    Data access ..... 276
    Business users ..... 276
    Business intelligence trends ..... 276
    Gartner 2019 Magic Quadrant ..... 277
    Conclusion ..... 279
16. Data Visualization Tools ..... 281
    Structure ..... 281
    Objectives ..... 281
    Introduction to data visualization ..... 282
    Data visualization types ..... 283
    Data visualization tools ..... 284
    Visualization tool features ..... 285
    Introduction to Microsoft Power BI ..... 286
    Use case Microsoft Power BI ..... 286
    Microsoft Power BI console ..... 287
    Load the data ..... 288
    Create data visuals ..... 289
    Publish the visuals ..... 293
    Conclusion ..... 293
17. Industry Use Case 1 - Form Assist ..... 295
    Structure ..... 295
    Objective ..... 296
    Abstract ..... 296
    Introduction ..... 296
    Related Work ..... 297
    Proposed work ..... 298
    Work architecture ..... 299
    NIST dataset ..... 300
    Activation function – ReLU ..... 302
    Dropout ..... 302
    Data augmentation ..... 303
    Optimization ..... 303
    Feature extraction ..... 304
    Image thresholding ..... 305
    Classifier ..... 305
    Results ..... 306
    Conclusion ..... 308
    Acknowledgment ..... 309
    References ..... 309
18. Industry Use Case 2 - People Reporter ..... 311
    Structure ..... 311
    Objective ..... 311
    Abstract ..... 312
    Introduction ..... 312
    Event detection ..... 313
    Work architecture ..... 315
    Results ..... 317
    Nipah virus outbreak in Kerala ..... 317
    CSK enters the final of IPL 2018 ..... 318
    OnePlus 6 launched in India ..... 319
    Conclusion ..... 319
    Acknowledgment ..... 320
    References ..... 321
19. Data Science Learning Resources ..... 323
    Structure ..... 323
    Objective ..... 324
    Books ..... 324
    Online courses ..... 324
    Competitions ..... 325
    Blogs and magazines ..... 325
    University courses ..... 326
    Conferences and events ..... 326
    Meet-ups and interest groups ..... 326
    YouTube channels and Podcasts ..... 327
    Analytic reports and white paper ..... 327
    Talk to people ..... 327
    Conclusion ..... 327
20. Do It Yourself Challenges ..... 329
    Structure ..... 329
    Objectives ..... 329
    DIY challenge 1 – Analyzing the pathological slide for blood analysis ..... 330
    Challenge overview ..... 330
    Challenge statement ..... 330
    Target users ..... 330
    Resources ..... 331
    IP source ..... 331
    DIY challenge 2 – IoT based weather monitoring system ..... 331
    Challenge overview ..... 331
    Challenge statement ..... 331
    Target Users ..... 332
    Resources ..... 332
    IP source ..... 332
    DIY challenge 3 – Facial image-based BMI calculator ..... 332
    Challenge overview ..... 332
    Challenge statement ..... 332
    Target users ..... 333
    Resources ..... 333
    IP source ..... 333
    DIY challenge 4 – Chatbot assistant for Tourism in North East ..... 333
    Challenge overview ..... 333
    Challenge statement ..... 334
    Target users ..... 334
    Resources ..... 334
    IP source ..... 334
    DIY challenge 5 – Assaying and grading of fruits for e-procurement ..... 334
    Challenge overview ..... 334
    Challenge statement ..... 335
    Target users ..... 335
    Resources ..... 335
    IP source ..... 335
    Conclusion ..... 335
21. Qs for DS Assessment ..... 337
    Structure ..... 337
    Objectives ..... 338
    Data Science Overview ..... 338
    Mathematics Essentials ..... 339
    Statistics Essentials ..... 339
    Exploratory Data Analysis ..... 340
    Data Preprocessing ..... 341
    Feature Engineering ..... 341
    Machine Learning Algorithms ..... 341
    Productionizing Machine Learning Models ..... 342
    Data Flows in Enterprises ..... 342
    Introduction to Databases ..... 343
    Introduction to Big Data ..... 343
    DevOps for Data Science ..... 344
    Introduction to Cloud Computing ..... 344
    Deploy Model to Cloud ..... 345
    Introduction to Business Intelligence ..... 345
    Data Visualization Tools ..... 346
    Conclusion ..... 346