Kyran Dale Data Visualization with Python and JavaScript Scrape, Clean, Explore, and Transform Your Data 2nd Edition
DATA SCIENCE

“Kyran’s book includes a wealth of information, covering the minutiae of D3.js through to building a database-backed API that’s consumed by a custom interactive dashboard. It’s safe to say you’ll learn a huge amount with this book!”
Peter Cook, author of D3 Start to Finish

Data Visualization with Python and JavaScript

How do you turn raw, unprocessed data into dynamic interactive web visualizations? In this practical book, author Kyran Dale shows data scientists and analysts, as well as Python and JavaScript developers, how to create the ideal toolchain for the job. By providing engaging examples and sharing hard-earned good practices, this guide teaches you how to exploit best-of-breed Python and JavaScript libraries.

Python provides powerful, mature libraries for scraping, cleaning, and processing data. JavaScript is the best language when it comes to programming web visualizations. Together, these two languages complement each other perfectly to help you create a modern web-visualization toolchain. This book gets you started.

You’ll learn how to:
• Obtain your data using scraping or web APIs (Requests, Scrapy, Beautiful Soup)
• Clean and process data using Python’s heavyweight data processing libraries within the NumPy ecosystem (Jupyter notebooks with pandas, Matplotlib, and Seaborn)
• Deliver the data to a browser with static files or with a lightweight Python server (a Flask RESTful API)
• Pick up enough web development skills (HTML, CSS, JavaScript) to visualize your data on the web
• Use your mined and refined data to create web charts and visualizations (Plotly, D3)

Kyran Dale is a jobbing programmer, ex-research scientist, recreational hacker, independent researcher, and occasional entrepreneur. During 15-odd years as a research scientist he’s hacked a lot of code, learned a lot of libraries, and settled on some favorite tools. Kyran finds that Python, JavaScript, and a little C++ go a long way to solving most problems out there.

US $65.99 CAN $82.99
ISBN: 978-1-098-11187-8
Twitter: @oreillymedia
linkedin.com/company/oreilly-media
youtube.com/oreillymedia
Kyran Dale Data Visualization with Python and JavaScript Scrape, Clean, Explore, and Transform Your Data SECOND EDITION Boston Farnham Sebastopol Tokyo Beijing
978-1-098-11187-8 [LSI] Data Visualization with Python and JavaScript by Kyran Dale Copyright © 2023 Kyran Dale Limited. All rights reserved. Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472. O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com. Acquisitions Editor: Michelle Smith Development Editor: Shira Evans Production Editor: Kristen Brown Copyeditor: Liz Wheeler Proofreader: Piper Editorial Consulting, LLC Indexer: Ellen Troutman-Zaig Interior Designer: David Futato Cover Designer: Karen Montgomery Illustrator: Kate Dullea July 2016: First Edition December 2022: Second Edition Revision History for the Second Edition 2022-12-07: First Release See http://oreilly.com/catalog/errata.csp?isbn=9781098111878 for release details. The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Data Visualization with Python and JavaScript, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc. While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.
Table of Contents

Preface
Introduction

Part I. Basic Toolkit

1. Development Setup
The Accompanying Code; Python; Anaconda; Installing Extra Libraries; Virtual Environments; JavaScript; Content Delivery Networks; Installing Libraries Locally; Databases; Getting MongoDB Up and Running; Easy MongoDB with Docker; Integrated Development Environments; Summary

2. A Language-Learning Bridge Between Python and JavaScript
Similarities and Differences; Interacting with the Code; Python; JavaScript; Basic Bridge Work; Style Guidelines, PEP 8, and use strict; CamelCase Versus Underscore; Importing Modules, Including Scripts; JavaScript Modules; Keeping Your Namespaces Clean; Outputting “Hello World!”; Simple Data Processing; String Construction; Significant Whitespace Versus Curly Brackets; Comments and Doc-Strings; Declaring Variables Using let or var; Strings and Numbers; Booleans; Data Containers: dicts, objects, lists, Arrays; Functions; Iterating: for Loops and Functional Alternatives; Conditionals: if, else, elif, switch; File Input and Output; Classes and Prototypes; Differences in Practice; Method Chaining; Enumerating a List; Tuple Unpacking; Collections; Underscore; Functional Array Methods and List Comprehensions; Map, Reduce, and Filter with Python’s Lambdas; JavaScript Closures and the Module Pattern; A Cheat Sheet; Summary

3. Reading and Writing Data with Python
Easy Does It; Passing Data Around; Working with System Files; CSV, TSV, and Row-Column Data Formats; JSON; Dealing with Dates and Times; SQL; Creating the Database Engine; Defining the Database Tables; Adding Instances with a Session; Querying the Database; Easier SQL with Dataset; MongoDB; Dealing with Dates, Times, and Complex Data; Summary

4. Webdev 101
The Big Picture; Single-Page Apps; Tooling Up; The Myth of IDEs, Frameworks, and Tools; A Text-Editing Workhorse; Browser with Development Tools; Terminal or Command Prompt; Building a Web Page; Serving Pages with HTTP; The DOM; The HTML Skeleton; Marking Up Content; CSS; JavaScript; Data; Chrome DevTools; The Elements Tab; The Sources Tab; Other Tools; A Basic Page with Placeholders; Positioning and Sizing Containers with Flex; Filling the Placeholders with Content; Scalable Vector Graphics; The <g> Element; Circles; Applying CSS Styles; Lines, Rectangles, and Polygons; Text; Paths; Scaling and Rotating; Working with Groups; Layering and Transparency; JavaScripted SVG; Summary

Part II. Getting Your Data

5. Getting Data Off the Web with Python
Getting Web Data with the Requests Library; Getting Data Files with Requests; Using Python to Consume Data from a Web API; Consuming a RESTful Web API with Requests; Getting Country Data for the Nobel Dataviz; Using Libraries to Access Web APIs; Using Google Spreadsheets; Using the Twitter API with Tweepy; Scraping Data; Why We Need to Scrape; Beautiful Soup and lxml; A First Scraping Foray; Getting the Soup; Selecting Tags; Crafting Selection Patterns; Caching the Web Pages; Scraping the Winners’ Nationalities; Summary

6. Heavyweight Scraping with Scrapy
Setting Up Scrapy; Establishing the Targets; Targeting HTML with Xpaths; Testing Xpaths with the Scrapy Shell; Selecting with Relative Xpaths; A First Scrapy Spider; Scraping the Individual Biography Pages; Chaining Requests and Yielding Data; Caching Pages; Yielding Requests; Scrapy Pipelines; Scraping Text and Images with a Pipeline; Specifying Pipelines with Multiple Spiders; Summary

Part III. Cleaning and Exploring Data with pandas

7. Introduction to NumPy
The NumPy Array; Creating Arrays; Array Indexing and Slicing; A Few Basic Operations; Creating Array Functions; Calculating a Moving Average; Summary

8. Introduction to pandas
Why pandas Is Tailor-Made for Dataviz; Why pandas Was Developed; Categorizing Data and Measurements; The DataFrame; Indices; Rows and Columns; Selecting Groups; Creating and Saving DataFrames; JSON; CSV; Excel Files; SQL; MongoDB; Series into DataFrames; Summary

9. Cleaning Data with pandas
Coming Clean About Dirty Data; Inspecting the Data; Indices and pandas Data Selection; Selecting Multiple Rows; Cleaning the Data; Finding Mixed Types; Replacing Strings; Removing Rows; Finding Duplicates; Sorting Data; Removing Duplicates; Dealing with Missing Fields; Dealing with Times and Dates; The Full clean_data Function; Adding the born_in column; Merging DataFrames; Saving the Cleaned Datasets; Summary

10. Visualizing Data with Matplotlib
pyplot and Object-Oriented Matplotlib; Starting an Interactive Session; Interactive Plotting with pyplot’s Global State; Configuring Matplotlib; Setting the Figure’s Size; Points, Not Pixels; Labels and Legends; Titles and Axes Labels; Saving Your Charts; Figures and Object-Oriented Matplotlib; Axes and Subplots; Plot Types; Bar Charts; Scatter Plots; seaborn; FacetGrids; PairGrids; Summary

11. Exploring Data with pandas
Starting to Explore; Plotting with pandas; Gender Disparities; Unstacking Groups; Historical Trends; National Trends; Prize Winners Per Capita; Prizes by Category; Historical Trends in Prize Distribution; Age and Life Expectancy of Winners; Age at Time of Award; Life Expectancy of Winners; Increasing Life Expectancies over Time; The Nobel Diaspora; Summary

Part IV. Delivering the Data

12. Delivering the Data
Serving the Data; Organizing Your Flask Files; Serving Data with Flask; Delivering Data Files; Dynamic Data with Flask APIs; A Simple Data API with Flask; Using Static or Dynamic Delivery; Summary

13. RESTful Data with Flask
The Tools for a RESTful Job; Creating the Database; A Flask RESTful Data Server; Serializing with marshmallow; Adding our RESTful API Routes; Posting Data to the API; Extending the API with MethodViews; Paginating the Data Returns; Deploying the API Remotely with Heroku; CORS; Consuming the API Using JavaScript; Summary

Part V. Visualizing Your Data with D3 and Plotly

14. Bringing Your Charts to the Web with Matplotlib and Plotly
Static Charts with Matplotlib; Adapting to Screen Sizes; Using Remote Images or Assets; Charting with Plotly; Basic Charts; Plotly Express; Plotly Graph-Objects; Mapping with Plotly; Adding Custom Controls with Plotly; From Notebook to Web with Plotly; Native JavaScript Charts with Plotly; Fetching JSON Files; User-Driven Plotly with JavaScript and HTML; Summary

15. Imagining a Nobel Visualization
Who Is It For?; Choosing Visual Elements; Menu Bar; Prizes by Year; A Map Showing Selected Nobel Countries; A Bar Chart Showing Number of Winners by Country; A List of the Selected Winners; A Mini-Biography Box with Picture; The Complete Visualization; Summary

16. Building a Visualization
Preliminaries; Core Components; Organizing Your Files; Serving the Data; The HTML Skeleton; CSS Styling; The JavaScript Engine; Importing the Scripts; Modular JS with Imports; Basic Data Flow; The Core Code; Initializing the Nobel Prize Visualization; Ready to Go; Data-Driven Updates; Filtering Data with Crossfilter; Running the Nobel Prize Visualization App; Summary

17. Introducing D3—The Story of a Bar Chart
Framing the Problem; Working with Selections; Adding DOM Elements; Leveraging D3; Measuring Up with D3’s Scales; Quantitative Scales; Ordinal Scales; Unleashing the Power of D3 with Data Binding/Joining; Updating the DOM with Data; Putting the Bar Chart Together; Axes and Labels; Transitions; Updating the Bar Chart; Summary

18. Visualizing Individual Prizes
Building the Framework; Scales; Axes; Category Labels; Nesting the Data; Adding the Winners with a Nested Data-Join; A Little Transitional Sparkle; Updating the Bar Chart; Summary

19. Mapping with D3
Available Maps; D3’s Mapping Data Formats; GeoJSON; TopoJSON; Converting Maps to TopoJSON; D3 Geo, Projections, and Paths; Projections; Paths; graticules; Putting the Elements Together; Updating the Map; Adding Value Indicators; Our Completed Map; Building a Simple Tooltip; Updating the Map; Summary

20. Visualizing Individual Winners
Building the List; Building the Bio-Box; Updating the Winners List; Summary

21. The Menu Bar
Creating HTML Elements with D3; Building the Menu Bar; Building the Category Selector; Adding the Gender Selector; Adding the Country Selector; Wiring Up the Metric Radio Button; Summary

22. Conclusion
Recap; Part I: Basic Toolkit; Part II: Getting Your Data; Part III: Cleaning and Exploring Data with pandas; Part IV: Delivering the Data; Part V: Visualizing Your Data with D3 and Plotly; Future Progress; Visualizing Social Media Networks; Machine-Learning Visualizations; Final Thoughts

A. D3’s enter/exit Pattern

Index
Preface

The chief ambition of this book is to describe a data visualization (dataviz) toolchain that, in the era of the internet, is starting to predominate. The guiding principle of this toolchain is that whatever insightful nuggets you have managed to mine from your data deserve a home on the web browser. Being on the web means you can easily choose to distribute your dataviz to a select few (using authentication or restricting to a local network) or the whole world. This is the big idea of the internet and one that dataviz is embracing at a rapid pace.

And that means that the future of dataviz involves JavaScript, the only first-class language of the web browser. But JavaScript does not yet have the data-processing stack needed to refine raw data, which means data visualization is inevitably a multilanguage affair. I hope this book provides support for my belief that Python is the natural complementary language to JavaScript’s monopoly of browser visualizations.

Although this book is a big one (that fact is felt most keenly by the author right now), it has had to be very selective, leaving out a lot of really cool Python and JavaScript dataviz tools and focusing on the ones that provide the best building blocks. The number of helpful libraries I couldn’t cover reflects the enormous vitality of the Python and JavaScript data science ecosystems. Even while the book was being written, brilliant new Python and JavaScript libraries were being introduced, and the pace continues.

All data visualization is essentially transformative, and showing the journey from one reflection of a dataset (HTML tables and lists) to a more modern, engaging, interactive, and, fundamentally, browser-based one provides a good way to introduce key data visualization tools in a working context. The challenge is to transform a basic Wikipedia list of Nobel Prize winners into a modern, interactive, browser-based visualization.
Thus, the same dataset is presented in a more accessible, engaging form. The journey from unprocessed data to a fairly rich, user-driven visualization informs the choice of best-of-breed tools.

First, we need to get our dataset. Often this is provided by a colleague or client, but to increase the challenge and learn some pretty vital dataviz skills along the way, we learn how to scrape the dataset from the web (Wikipedia’s Nobel Prize pages) using Python’s powerful Scrapy library.

This unprocessed dataset then needs to be refined and explored, and there isn’t a much better ecosystem for this than Python’s pandas. Along with Matplotlib in support and driven by a Jupyter notebook, pandas is becoming the gold standard for this kind of forensic data work.

With clean data stored (to SQL with SQLAlchemy and SQLite) and explored, the cherry-picked data stories can be visualized. I cover the use of Matplotlib and Plotly to embed static and dynamic charts from Python to a web page. But for something more ambitious, the supreme dataviz library for the web is the JavaScript-based D3. We cover the essentials of D3 while using them to produce our showpiece Nobel data visualization.

This book is a collection of tools forming a chain, with the creation of the Nobel visualization providing a guiding narrative. You should be able to dip into relevant chapters when and if the need arises; the different parts of the book are self-contained so you can quickly review what you’ve learned when required.

This book is divided into five parts. The first part introduces a basic Python and JavaScript dataviz toolkit, while the next four show how to retrieve raw data, clean it, explore it, and finally transform it into a modern web visualization. Let’s summarize the key lessons of each part now.

Part I: Basic Toolkit

Our basic toolkit consists of:
• A language-learning bridge between Python and JavaScript. This is designed to smooth the transition between the two languages, highlighting their many similarities and setting the scene for the bilingual process of modern dataviz. With the advent of the latest JavaScript,[1] Python and JavaScript have even more in common, making switching between them that much less stressful.
• Being able to read from and write to the key data formats (e.g., JSON and CSV) and databases (both SQL and NoSQL) with ease is one of Python’s great strengths. We see how easy it is to pass data around in Python, translating formats and changing databases as we go. This fluid movement of data is the main lubricant of any dataviz toolchain.
• We cover the basic web development (webdev) skills needed to start producing modern, interactive, browser-based dataviz. By focusing on the concept of the single-page application rather than building whole websites, we minimize conventional webdev and place the emphasis on programming your visual creations in JavaScript. An introduction to Scalable Vector Graphics (SVG), the chief building block of D3 visualizations, sets the scene for the creation of our Nobel Prize visualization in Part V.

[1] There are many versions of JavaScript based on ECMAScript, but the most significant version, which provides the bulk of new functionality, is ECMAScript 6.

Part II: Getting Your Data

In this part of the book, we look at how to get data from the web using Python, assuming a nice, clean data file hasn’t been provided to the data visualizer:
• If you’re lucky, a clean file in an easily usable data format (e.g., JSON or CSV) is at an open URL, a simple HTTP request away. Alternatively, there may be a dedicated web API for your dataset, with any luck a RESTful one. As an example, we look at using the Twitter API (via Python’s Tweepy library). We also see how to use Google spreadsheets, a widely used data-sharing resource in dataviz.
• Things get more involved when the data of interest is present on the web in human-readable form, often in HTML tables, lists, or hierarchical content blocks. In this case, you have to resort to scraping, getting the raw HTML content and then using a parser to make its embedded content available. We see how to use Python’s lightweight Beautiful Soup scraping library and the much more featureful and heavyweight Scrapy, the biggest star in the Python scraping firmament.

Part III: Cleaning and Exploring Data with pandas

In this part, we turn the big guns of pandas, Python’s powerful programmatic spreadsheet, onto the problem of cleaning and then exploring datasets. We first see how pandas is part of Python’s NumPy ecosystem, which leverages the power of very fast, powerful low-level array processing libraries, while making them accessible. The focus is on using pandas to clean and then explore our Nobel Prize dataset:
• Most data, even that which comes from official web APIs, is dirty. And making it clean and usable will occupy far more of your time as a data visualizer than you probably anticipated.
Taking the Nobel dataset as an example, we progressively clean it, searching for dodgy dates, anomalous datatypes, missing fields, and all the common grime that needs cleaning before you can start to explore and then transform your data into a visualization.
• With our clean (as clean as we can make it) Nobel Prize dataset in hand, we see how easy it is to use pandas and Matplotlib to interactively explore data, easily creating inline charts, slicing the data every which way, and generally getting a feel for it, while looking for those interesting nuggets you want to deliver with visualization.
Part IV: Delivering the Data

In this part, we see how easy it is to create a minimal data API using Flask, to deliver data both statically and dynamically to the web browser:
• First, we see how to use Flask to serve static files and then how to roll your own basic data API, serving data from a local database. Flask’s minimalism allows you to create a very thin data-serving layer between the fruits of your Python data processing and their eventual visualization on the browser.
• The glory of open source software is that you can often find robust, easy-to-use libraries that solve your problem better than you could. In the second chapter of this part, we see how easy it is to use best-of-breed Python (Flask) libraries to craft a robust, flexible RESTful API, ready to serve your data online. We also cover the easy online deployment of this data server using Heroku, a favorite of Pythonistas.

Part V: Visualizing Your Data with D3 and Plotly

In the first chapter of this part, we see how to take the fruits of your pandas-driven exploration, in the form of charts or maps, and put them on the web, where they belong. Matplotlib can produce publication-standard static charts while Plotly brings user controls and dynamic charts to the table. We see how to take a Plotly chart directly from a Jupyter notebook and put it in a web page.

The part of the book that covers D3 is some of the most challenging, but you may well end up being employed to construct the kind of multielement visualizations it produces. One of the joys of D3 is the huge number of examples that can easily be found online, but most of them demonstrate a single technique and there are few showing how to orchestrate multiple visual elements. In these D3 chapters, we see how to synchronize the update of a timeline (featuring all the Nobel Prizes), a map, a bar chart, and a list as the user filters the Nobel Prize dataset or changes the prize-winning metric (absolute or per capita).
Mastery of the core themes demonstrated in these chapters should allow you to let loose your imagination and learn by doing. I’d recommend choosing some data close to your heart and designing a D3 creation around it.

The Second Edition

I was a little reluctant when O’Reilly offered me the opportunity of writing a second edition of this book. The first edition ended up larger than anticipated, and updating and augmenting it was potentially a lot of work. However, after reviewing the status of the libraries covered and changes to the Python and JavaScript dataviz ecosystem, it was clear that most of the libraries used (e.g., Scrapy, NumPy, pandas) were still solid choices and needed fairly small updates.
D3 was the library that had changed the most, but these changes had made D3 both easier to use and easier to teach. JavaScript modules were also solidly in place, making the code cleaner and more familiar to a Pythonista.

A few Python libraries no longer seemed like solid choices and a couple had been deprecated. The first edition dealt fairly extensively with MongoDB, a NoSQL database. I now think that good old-fashioned SQL is a better fit for dataviz work and that the minimal file-based, serverless SQLite represents a dataviz sweet spot if a database is required.

Rather than replace the deprecated RESTful data server with another Python library, I thought it would be particularly instructive to build a simple one from scratch, demonstrating the use of some brilliant Python libraries, such as marshmallow, which are useful in many dataviz scenarios.

With the time available for updating the book, I made the decision to use the first book’s dataset for demonstrating exploration and analysis with Matplotlib and pandas, focusing on updating all the libraries to their current (as of mid-2022) versions. This allowed time to be spent on new material, chief of which is a chapter dedicated to Python’s Plotly library, which allows you to easily transfer exploratory work from a Jupyter notebook to a web presentation with user interactions. A particular strength of this approach is the availability of Mapbox maps, a rich mapping ecosystem.

The main aims of the second edition were:
• To bring all the libraries up to date.
• To remove and/or replace libraries that hadn’t stood the test of time.
• To add some new material suggested by changes in the fast-developing world of Python and JavaScript dataviz.

The metaphor of the dataviz toolchain still holds good, I think, and the transformative pipeline, from raw, unprocessed web data through exploratory dataviz-driven analysis to polished web visualization, remains a good way to learn the key tools of the job.
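To make the case for file-based, serverless SQLite a little more concrete, here is a minimal sketch of storing a few records and handing them to the browser as JSON. It is not the book's own code: it uses Python's standard-library sqlite3 module rather than SQLAlchemy, and the table name, column names, helper functions, and three sample rows are hypothetical choices made purely for illustration.

```python
import json
import sqlite3

# Hypothetical sample rows for illustration only (not the book's dataset).
WINNERS = [
    ("Marie Curie", "Physics", 1903),
    ("Albert Einstein", "Physics", 1921),
    ("Marie Curie", "Chemistry", 1911),
]


def build_db(rows):
    """Create an in-memory SQLite database holding the sample rows."""
    # ":memory:" keeps the sketch self-contained; pass a filename instead
    # to get the file-based database described in the text.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE winners (name TEXT, category TEXT, year INTEGER)")
    conn.executemany("INSERT INTO winners VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def winners_as_json(conn, category):
    """Query one prize category and serialize the rows as a JSON string."""
    cursor = conn.execute(
        "SELECT name, category, year FROM winners "
        "WHERE category = ? ORDER BY year",
        (category,),
    )
    columns = [col[0] for col in cursor.description]
    records = [dict(zip(columns, row)) for row in cursor.fetchall()]
    return json.dumps(records)


conn = build_db(WINNERS)
physics_json = winners_as_json(conn, "Physics")
print(physics_json)
```

Because no database server is involved, the whole store is a single file (or, as here, lives in memory), which is what makes it a light fit for small dataviz projects.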
Conventions Used in This Book

The following typographical conventions are used in this book:

Italic
Indicates new terms, URLs, email addresses, filenames, and file extensions.

Constant width
Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, datatypes, environment variables, statements, and keywords.

Constant width bold
Shows commands or other text that should be typed literally by the user.

Constant width italic
Shows text that should be replaced with user-supplied values or by values determined by context.

This element signifies a tip or suggestion.

This element signifies a general note.

This element indicates a warning or caution.

Using Code Examples

Supplemental material (code examples, exercises, etc.) is available for download at https://github.com/Kyrand/dataviz-with-python-and-js-ed-2.

This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing a CD-ROM of examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.