Data Modeling for MongoDB: Building Well-Designed and Supportable MongoDB Databases (Steve Hoberman)

Building Well-Designed and Supportable MongoDB Databases
First Edition
Steve Hoberman

Published by: Technics Publications, LLC
2 Lindsley Road
Basking Ridge, NJ 07920 USA
http://www.TechnicsPub.com

Cover design by Mark Brye
Edited by Carol Lehn and Erin Elizabeth Long
Technical reviews by Rob Garrison and Richard Kreuter

All rights reserved. No part of this book may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording or by any information storage and retrieval system, without written permission from the publisher, except for the inclusion of brief quotations in a review.

The author and publisher have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein.

MongoDB® is a registered trademark of MongoDB, Inc. All other trademarks are property of their respective owners and should be treated as such.

Copyright © 2014 by Technics Publications, LLC
ISBN, print ed. 978-1-935504-70-2
ISBN, ePub ed. 978-1-935504-71-9
First Printing 2014
Library of Congress Control Number: 2014938921

To my brother, Gary, who is not only an impressive techno-wiz and CIO, but also knows how to apply technology to make amazing things happen.

Table of Contents
Foreword By Ryan Smith, Information Architect at Nike
Introduction
Conventions Used in This Book
Section I Getting Started
  Chapter 1 The Power of Data Modeling
    Many Forms to Represent an Information Landscape
    Confirming and Documenting Different Perspectives
    Data Modeling Is Not Optional!
    Embarking on Our Publishing Adventure
    EXERCISE 1: Life Without Data Modeling?
  Chapter 2 The Power of NoSQL and MongoDB
    NoSQL vs. the Traditional Relational Database
    Four Types of NoSQL Databases
    MongoDB Is a Document-Oriented NoSQL Database
    Installing MongoDB
  Chapter 3 MongoDB Objects
    Entities
    Attributes
    Domains
    Relationships
    Keys
  Chapter 4 MongoDB Functionality
    Adding Data in MongoDB
    Querying Data in MongoDB
    Updating Data in MongoDB
    Deleting Data in MongoDB
    EXERCISE 4: MongoDB Functionality
Section II Levels of Granularity
  Chapter 5 Conceptual Data Modeling
    Concept Explanation
    Conceptual Data Modeling Approach
    EXERCISE 5: Conceptual Data Modeling Mindset
  Chapter 6 Logical Data Modeling
    Logical Data Modeling Approach
    EXERCISE 6: Logical Data Modeling Mindset
  Chapter 7 Physical Data Modeling
    Physical Data Modeling Approach
Section III Case Study
  Chapter 8 Survey Data Entry
    Case Study Background
    Conceptual Data Modeling
    Logical Data Modeling
    Physical Data Modeling
APPENDIX A Answers to Exercises
  EXERCISE 1: Life without Data Modeling?
  EXERCISE 2: Subtyping in MongoDB
  EXERCISE 3: Interpreting Queries
  EXERCISE 4: MongoDB Functionality
  EXERCISE 5: Conceptual Data Modeling Mindset
  EXERCISE 6: Logical Data Modeling Mindset
  EXERCISE 7: Embed or Reference
APPENDIX B References
APPENDIX C Glossary
Index

Foreword
By Ryan Smith, Information Architect at Nike

How do you design for a database platform that doesn’t require design? Should you design your data at all, or just start building? I’d like to think I have a somewhat unique perspective on this, working at Nike, where I have a front-row seat to world-class product design and innovation.

Perspectives vary on the importance of designing data these days. Traditional databases fundamentally force a certain level of design, requiring you to define the structure of your information before content can be inserted. But many NoSQL databases will accept whatever data format you throw at them, and you can completely change that format from one record to the next. So in theory you don’t need to have a plan for your data at all!

I strongly believe that you need to have a plan. Designing your data brings clarity. It exposes what you do and don’t know about it, and what confusion may exist about it within your team and organization. Designing data is an essential form of leadership: charting a course and testing ideas, bringing forward a clear story while being open to changing it based on new learning. Most importantly, design is about understanding the consumer, knowing what audiences will be using the system and what they need the data to do for them. Data modeling can greatly accelerate development, improve the quality of the end product, reduce maintenance costs, and multiply the value of what has been built by establishing a common understanding of the information.

In many ways, up-front data design with NoSQL databases can actually be more important than it is with traditional relational databases. For example, since many of these databases don’t join data sets, their fast-retrieval performance potential excels only if the collections you’ve created contain all or most of the information required by a given request. If you have to retrieve a large percentage of your database content in order to scan and filter it down to a relevant result set, you would have been better off writing SQL to join tables within a relational database. But as I have seen in my experience, good data design practices can lead to excellent performance.

Beyond the performance topic, NoSQL databases with flexible schema capabilities similarly require more discipline in aligning to a common information model. For example, MongoDB would not hesitate to allow you to sequentially save four records in the same collection with the field names zipCode, zipcode, ZipCode, and postalCode, respectively. Each of these variations will be treated as a distinct field, with no warnings given. Everything will work great until you ask for your zipCode values back and only one document out of four has a field by that name. The flexible schema is a great innovation for quick evolution of your data model, and yet it requires discipline to harvest the benefits without experiencing major data quality issues and other frustrations as a result.

Data modeling is essential for success, but it’s not rocket science, and with this book it is easier than ever to implement effectively. Written by an exceptional teacher of data modeling, these chapters clearly explain the process needed to navigate these new capabilities with confidence. Anyone who has taken Steve’s Data Modeling Master Class can attest to his passion for both data and teaching. In reading the manuscript, it was immediately evident to me that his gift for teaching is here in his writing every bit as much as when he presents in person. Steve carefully crafts his explanations so that even the more abstract concepts can be easily grasped and internalized. Steve’s broad consulting experience also shows through.
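The zipCode scenario in the foreword can be simulated without a server. The following is a minimal sketch under stated assumptions: plain JavaScript objects stand in for four documents saved to one MongoDB collection, the store names and values are hypothetical sample data, and documentsWithField and normalizeZip are hypothetical helpers, not part of any MongoDB driver. It shows why an exact, case-sensitive field lookup finds only one of the four documents, and how agreeing on a single field name up front removes the problem.

```javascript
// Minimal sketch: plain JavaScript objects stand in for four documents saved
// to the same MongoDB collection. This is NOT the MongoDB driver API; the
// store names and zip values are hypothetical sample data.
const sampleDocs = [
  { store: "Store A", zipCode: "07920" },
  { store: "Store B", zipcode: "07920" },
  { store: "Store C", ZipCode: "07920" },
  { store: "Store D", postalCode: "07920" },
];

// Hypothetical helper: keep only documents that have a field with exactly
// this name. Object keys are case-sensitive, so "zipcode" is not "zipCode".
function documentsWithField(docs, field) {
  return docs.filter((doc) => Object.prototype.hasOwnProperty.call(doc, field));
}

// Hypothetical remedy: rename every known variant to the one agreed name,
// the kind of decision a common information model records up front.
const ZIP_VARIANTS = ["zipCode", "zipcode", "ZipCode", "postalCode"];

function normalizeZip(doc) {
  const result = {};
  for (const [key, value] of Object.entries(doc)) {
    result[ZIP_VARIANTS.includes(key) ? "zipCode" : key] = value;
  }
  return result;
}

console.log(documentsWithField(sampleDocs, "zipCode").length); // 1
console.log(documentsWithField(sampleDocs.map(normalizeZip), "zipCode").length); // 4
```

The remedy in this sketch is deliberately not a cleverer query but a single agreed field name, which is the discipline the foreword argues a shared data model provides.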
Seth Godin states in his book Linchpin that there is a “thrashing” process inherent in the creation of any product, meaning the brainstorming and iteration of different ideas and approaches. He writes, “Thrashing is essential. The question is: when to thrash? In the typical amateur project, all the thrashing is near the end…Professional creators thrash early.” Anyone who has found themselves at the tail end of a poorly executed project knows all about late thrashing. Early and ongoing modeling adds tremendous value, no matter how agile your methodology. Steve’s book is about how to work through uncertainties like a professional, so you can evolve your data models over time without devolving into chaos. May your data models provide calm and clarity which allow your work to thrive, so that you can focus on what matters most: your team, your objectives, and your consumers.

Introduction

Congratulations! You completed the MongoDB application within the given tight timeframe, and there is a party to celebrate your application’s release into production. Although people are congratulating you at the celebration, you are feeling some uneasiness inside. Completing the project on time required making a lot of assumptions about the data, such as what terms meant and how calculations were derived. In addition, the poor documentation about the application will be of limited use to the support team, and not investigating all of the inherent rules in the data may eventually lead to poorly performing structures in the not-so-distant future.

Now, what if you had a time machine and could go back and read this book before starting the project? You would learn that even NoSQL databases like MongoDB require some level of data modeling. Data modeling is the process of learning about the data, and regardless of technology, this process must be performed for a successful application. You would learn the value of conceptual, logical, and physical data modeling and how each stage increases our knowledge of the data and reduces assumptions and poor design decisions.

Read this book to learn how to do data modeling for MongoDB applications and accomplish these five objectives:

1. Understand how data modeling contributes to the process of learning about the data and is, therefore, a required technique even when the resulting database is not relational. That is, NoSQL does not mean NoDataModeling!
2. Know how NoSQL databases differ from traditional relational databases and where MongoDB fits.
3. Explore each MongoDB object and comprehend how each compares to its data modeling and traditional relational database counterparts, as well as learn the basics of adding, querying, updating, and deleting data in MongoDB.
4. Practice a streamlined, template-driven approach to performing conceptual, logical, and physical data modeling. Recognize that data modeling does not always have to lead to traditional data models!
5. Know the difference between top-down and bottom-up data modeling approaches and complete a top-down case study.

This book is written for anyone who is working with, or will be working with, MongoDB, including business analysts, data modelers, database administrators, developers, project managers, and data scientists. There are three sections:

In Section I, Getting Started, we will reveal the power of data modeling and the important connections to data models that exist when designing any type of database (Chapter 1); compare NoSQL with traditional relational databases and find where MongoDB fits (Chapter 2); explore each MongoDB object and comprehend how each compares to its data modeling and traditional relational database counterparts (Chapter 3); and explain the basics of adding, querying, updating, and deleting data in MongoDB (Chapter 4).

In Section II, Levels of Granularity, we cover Conceptual Data Modeling (Chapter 5), Logical Data Modeling (Chapter 6), and Physical Data Modeling (Chapter 7). Notice the “ing” at the end of each of these chapter titles. We focus on the process of building each of these models, which is where we gain essential business knowledge.

In Section III, Case Study, we will explain both top-down and bottom-up development approaches and complete a top-down case study where we start with conceptual data modeling and end with a MongoDB database. This case study will tie together the conceptual, logical, and physical techniques from Section II.
Key points are included at the end of each chapter as a way to reinforce concepts. In addition, this book is loaded with hands-on exercises along with their answers, which are provided in Appendix A. Appendix B contains all of the book’s references, and Appendix C contains a glossary of the terms used throughout the text. There is also a comprehensive index.

CONVENTIONS USED IN THIS BOOK

We will use the shortcut RDBMS for Relational Database Management System. RDBMS represents the traditional relational database invented by E. F. Codd at IBM in 1970 and first made commercially available in 1979, by Oracle [Wikipedia]. Popular RDBMSs today include Oracle, Sybase, Microsoft SQL Server, and Teradata.

We use the Embarcadero ER/Studio® Data Architect tool to build our data models throughout the text. Learn more about this tool at this website: http://www.embarcadero.com/products/er-studio-data-architect.

There is an important distinction between the term relational database and the term relational. The term relational database refers to the technology for storing the data, whereas the term relational refers to the technique of modeling business rules and applying normalization.

The term object includes any data model component such as entities, attributes, and relationships. Objects also include any MongoDB component such as fields, documents, and collections.

We make use of the following simple conventions:

Object names: Customer Last Name is an attribute of Customer.
MongoDB code: db.account.find( )
Object instances: Bob Smith is an instance of Student.

I am a firm believer in learning through playing. We might as well have fun learning, so throughout the book, “play” and build your own data models and MongoDB collections. It is very easy to get started with a MongoDB server and client on your computer, as you will see in Chapter 2. I hope you will realize as I do that data modeling and MongoDB go together like peanut butter and jelly!

Section I Getting Started

In this section we will reveal the power of data modeling and the important connections to data models that exist when designing any type of database (Chapter 1); compare NoSQL with traditional relational databases and find where MongoDB fits (Chapter 2); explore each MongoDB object and comprehend how each compares to its data modeling and traditional relational database counterparts (Chapter 3); and explain the basics of adding, querying, updating, and deleting data in MongoDB (Chapter 4).

By the end of this section, you will know why data modeling is so important to any database, including MongoDB, and be able to explain and put to use the basic set of MongoDB objects and functions. After reading the four chapters in this section, you will be prepared to tackle conceptual, logical, and physical data modeling in Section II.

Chapter 1 The Power of Data Modeling

I gave the steering wheel a heavy tap with my hands as I realized that, once again, I was completely lost. It was about an hour before dawn, I was driving in France, and an important business meeting awaited me. I spotted a gas station up ahead that appeared to be open. I parked, went inside, and showed the attendant the address of my destination. I don’t speak French and the attendant didn’t speak English. The attendant did, however, recognize the name of the company I needed to visit. Wanting to help and unable to communicate verbally, the attendant took out a pen and paper. He drew lines for streets, circles for roundabouts along with numbers for exit paths, and rectangles for his gas station and my destination, an organization called “MFoods”:

With this custom-made map, which contained only the information that was relevant to me, I arrived at my address without making a single wrong turn. The map was a model of the actual roads I needed to travel.

Now, what do my poor sense of direction and a gas station attendant skilled at drawing maps have to do with MongoDB? A map simplifies a complex geographic landscape in the same way that a data model simplifies a complex information landscape. In many cases with large data volumes, high velocity in receiving data, and diverse data types, the complexities we encounter in MongoDB applications can make those roundabouts in France look ridiculously simple. We therefore need maps (in the form of data models) to provide clear and precise documentation about an application’s information landscape.

It would probably have taken me hours of trial and error to reach my destination in France, whereas the simple map the gas station attendant drew gave me an almost instantaneous broad understanding of how to get there. A model makes use of standard symbols that allow one to grasp the content quickly. In the map he drew for me, the attendant used lines to symbolize streets and circles to symbolize roundabouts. His skillful use of those symbols helped me visualize the streets and roundabouts.

Data modeling is the process of learning about the data, and the data model is the end result of the data modeling process. A data model is a set of symbols and text that precisely explains a business information landscape. On a data model, a box with the word “Customer” within it represents the concept of a real Customer, such as Bob, IBM, or Walmart. A line represents a relationship between two concepts, such as capturing that a Customer may own one or many Accounts.

This chapter explains why data modeling is necessary for any database, relational or NoSQL, and also introduces the publishing case study that appears in each of the following chapters.
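The box-and-line description of Customer and Account above can also be written down as data, which makes the link between symbols and business assertions concrete. This is a hypothetical sketch, not ER/Studio’s or any other tool’s format: the object layout and the helpers toAssertions and checkModel are illustrative assumptions. Entities play the role of the boxes, a relationship object plays the role of the line, and toAssertions reads the line back as a sentence.

```javascript
// Hypothetical encoding of a tiny data model as plain data (not any tool's
// file format): entities are the "boxes", relationships are the "lines".
const customerAccountModel = {
  entities: ["Customer", "Account"],
  relationships: [
    {
      parent: "Customer",
      child: "Account",
      verb: "own",
      // How many child instances one parent instance may relate to.
      childCardinality: "one or many",
    },
  ],
};

// Reads each relationship line back as a business assertion.
// Pluralizing by appending "s" is a simplification for this sketch.
function toAssertions(model) {
  return model.relationships.map(
    (r) => `Each ${r.parent} may ${r.verb} ${r.childCardinality} ${r.child}s.`
  );
}

// Rejects a relationship that points at an undeclared entity (a line with no box).
function checkModel(model) {
  const known = new Set(model.entities);
  for (const r of model.relationships) {
    if (!known.has(r.parent) || !known.has(r.child)) {
      throw new Error(`Relationship refers to an unknown entity: ${r.parent} -> ${r.child}`);
    }
  }
  return true;
}

checkModel(customerAccountModel);
console.log(toAssertions(customerAccountModel)[0]);
// Each Customer may own one or many Accounts.
```

Keeping the model as data rather than only as a drawing is one way to let the same content be rendered in whichever form suits the audience.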
MANY FORMS TO REPRESENT AN INFORMATION LANDSCAPE

The result of the data modeling process is a data model, yet data models themselves can be represented through many different forms. Data models can look like the box-and-line drawings that are the subject of this book, or they can take other forms, such as Unified Modeling Language (UML) Class Diagrams, spreadsheets, or State Transition Diagrams. They can even take the form of precise business assertions generated from the answers to business questions. For example, here are four forms of data models:

[Figure: the same business area in four forms: Information Engineering; Fully Communication-Oriented Information Modeling; Unified Modeling Language; The Axis Technique]

Each depicts the same business area but uses a different set of symbols. Which form works best? It depends on the audience. The data modeler can get very creative with the form used to explain an application’s information landscape.

CONFIRMING AND DOCUMENTING DIFFERENT PERSPECTIVES

The reason we do data modeling is to confirm and document our understanding of different perspectives. A data model is a communication tool. Think of all of the people involved in building even a simple application: business professionals, business analysts, data modelers, data architects, database developers, database administrators, developers, managers, etc. People have different backgrounds and experiences and varying levels of business knowledge and technical expertise. The data model allows us to confirm our knowledge of the area and make sure people see the information landscape similarly or, at a minimum, have an understanding of the differences that exist.

A data model can describe a new information landscape, or it can describe an information landscape that currently exists. This figure contains the new and existing areas where data modeling can be leveraged:

Traditionally, data models have been built during the analysis and design phases of a project to ensure that the requirements for a new application are fully understood and correctly captured before the actual database is created (i.e., forward engineering). There are, however, other uses for modeling than simply building databases. Among these uses are the following:

Risk mitigation. A data model can capture the concepts and interactions that are impacted by a development project or program. What is the impact of adding or modifying structures for an application already in production? One example of impact analysis would be to use data modeling to determine what impact modifying its structures would have on purchased software.

Reverse engineering. We can derive a data model from an existing application by examining the application’s database and building a data model of its structures. The technical term for the process of building data models from existing applications is “reverse engineering.” The trend in many industries is to purchase
