Tamer Khraisha Foreword by Martijn Groot Afterword by Brian Buzzelli Financial Data Engineering Design and Build Data-Driven Financial Products
Financial Data Engineering

“This book delivers a profound exploration of data engineering tailored specifically to the complexities of the financial domain.”
Vipul Bharat Marlecha, Senior Data Engineer, Netflix

ISBN: 978-1-098-15999-3
US $69.99 CAN $87.99
DATA ENGINEERING | FINANCE

Today, investment in financial technology and digital transformation is reshaping the financial landscape and generating many opportunities. Too often, however, engineers and professionals in financial institutions lack a practical and comprehensive understanding of the concepts, problems, techniques, and technologies necessary to build a modern, reliable, and scalable financial data infrastructure. This is where financial data engineering is needed. A data engineer developing a data infrastructure for a financial product possesses not only technical data engineering skills but also a solid understanding of financial domain-specific challenges, methodologies, data ecosystems, providers, formats, technological constraints, identifiers, entities, standards, regulatory requirements, and governance.

This book offers a comprehensive, practical, domain-driven approach to financial data engineering, featuring real-world use cases, industry practices, and hands-on projects. You’ll learn:

• The data engineering landscape in the financial sector
• Specific problems encountered in financial data engineering
• The structure, players, and particularities of the financial data domain
• Approaches to designing financial data identification and entity systems
• Financial data governance frameworks, concepts, and best practices
• The financial data engineering lifecycle from ingestion to production
• The varieties and main characteristics of financial data workflows
• How to build financial data pipelines using open source tools and APIs

Tamer Khraisha, PhD, is a senior software engineer and scientific author with over a decade of experience in both industry and research.
Tamer combines a solid background in financial markets with substantial expertise in software and data engineering. He’s worked with various FinTech startups, where he designed and built data-driven solutions for financial research, AI, and asset management, as well as international payment systems.
Praise for Financial Data Engineering

This book transforms complex technical concepts into practical tools, empowering professionals to unlock new dimensions of innovation in finance. A rare blend of depth, clarity, and forward-thinking insight.
—Shivani Gole, Data Engineer, McKinsey & Company

Financial Data Engineering is a helpful and thorough guide on using data engineering in the financial sector. Tamer does a great job of mixing theory with practical examples, making it useful for both professionals and academics. This book is well-organized, with real-world examples that make complex ideas easy to understand. It is very relevant in today’s data-driven financial world.
—Pankaj Gupta, Manager, Data Engineering, Discover Financial Services

This book navigates through both the complex regulatory landscape and the deep technical workings of financial data engineering. Tamer’s approach simplifies challenging concepts, serving as both an accessible introduction to the topic and a valuable reference for professionals in the field.
—Aakash Atul Alurkar, Senior Product Manager, Financial Services, Zoom

In Financial Data Engineering, Tamer Khraisha provides a comprehensive overview of the many different aspects of financial data as well as the engineering capabilities required. Through helpful frameworks and many real-world examples, this book is a great resource for experienced practitioners as well as people new to the industry.
—Martijn Groot, Financial Data Management Executive

An essential read for those exploring financial data. This book offers clear guidance on building scalable, efficient, and compliant data solutions in finance.
—Ganesh Harke, Tech Lead at Citibank N.A.

Financial Data Engineering by Tamer Khraisha is an indispensable resource that masterfully bridges the worlds of finance and data engineering. This book offers a uniquely comprehensive approach with its careful balance between foundational finance concepts and cutting-edge data engineering applications. Its insightful blend of theory, practical examples, and contemporary case studies makes it a must-read for anyone involved in the dynamically changing world of financial data. Whether you are a finance professional seeking to deepen your data engineering knowledge or a data engineer exploring and developing financial applications, this book provides the clarity and depth needed to navigate today’s complex financial data landscape and master tomorrow’s financial data engineering challenges.
—Brian Buzzelli, Head of Data Practice, Meradia

Data from various sources demands diverse skills to prepare it for modeling. Tamer has done an outstanding job in illustrating these tools and processes, making financial data engineering much more accessible and understandable.
—Abdullah Karasan, Founder of Leveragai and Adjunct Faculty at UMBC

Financial Data Engineering is a necessary read for those data engineers planning to work in the financial sector—or within financial organizations of any corporation or government agency. Key takeaways from the book include a focus on the financial data ecosystem (regulatory responsibilities from a public institution and government perspective), financial data governance, and the significance of data engineers committing to developing knowledge of the financial domain.
—Johnnie Jones, Director of Data Engineering, Boeing Employee Credit Union (BECU)

This book balances finance concepts and modern data practices, making it a clear guide for professionals in the finance and data fields. Whether you’re new to finance and data or experienced in the field, you’ll find valuable insights here.
—William Jamir Silva, Senior Software Engineer

This book masterfully bridges the gap between data engineering and financial data, offering a well-balanced exploration of both fields with practical insights and timeless principles. The author’s ability to distill complex concepts into an accessible and comprehensive guide makes it an invaluable resource for professionals navigating the intersection of financial data and data engineering.
—Vipul Bharat Marlecha, Senior Data Engineer, Netflix
Financial Data Engineering
by Tamer Khraisha

Copyright © 2025 Tamer Khraisha. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (https://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Acquisitions Editor: Michelle Smith
Development Editor: Jill Leonard
Production Editor: Gregory Hyman
Copyeditor: Liz Wheeler
Proofreader: Sonia Saruba
Indexer: nSight, Inc.
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Kate Dullea

October 2024: First Edition

Revision History for the First Edition
2024-10-09: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781098159993 for release details.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Financial Data Engineering, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.

The views expressed in this work are those of the author and do not represent the publisher’s views. While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

978-1-098-15999-3
[LSI]
To my wife Marti, my constant cheerleader and greatest inspiration, and to our wonderful baby Mark, whose laughter is the melody that accompanies my writing. This book is a small reflection of the love and joy you bring into my world.
Table of Contents

Foreword
Preface

Part I. Foundations of Financial Data Engineering

1. Financial Data Engineering Clarified
    Defining Financial Data Engineering
        First of All, What Is Finance?
        Defining Data Engineering
        Defining Financial Data Engineering
    Why Financial Data Engineering?
        Volume, Variety, and Velocity of Financial Data
        Finance-Specific Data Requirements and Problems
        Financial Machine Learning
        The Disruptive FinTech Landscape
        Regulatory Requirements and Compliance
    The Financial Data Engineer Role
        Description of the Role
        Where Do Financial Data Engineers Work?
        Responsibilities and Activities of a Financial Data Engineer
        Skills of a Financial Data Engineer
    Summary

2. Financial Data Ecosystem
    Sources of Financial Data
        Public Financial Data
        Security Exchanges
        Commercial Data Vendors, Providers, and Distributors
        Survey Data
        Alternative Data
        Confidential and Proprietary Data
    Structures of Financial Data
        Time Series Data
        Cross-Sectional Data
        Panel Data
        Matrix Data
        Graph Data
        Text Data
    Types of Financial Data
        Fundamental Data
        Market Data
        Transaction Data
        Analytics Data
        Alternative Data
        Reference Data
        Entity Data
    Benchmark Financial Datasets
        Center for Research in Security Prices
        Compustat Financials
        Trade and Quote Database
        Institutional Brokers’ Estimate System
        IvyDB OptionMetrics
        Trade Reporting and Compliance Engine
        Orbis Global Database
        SDC Platinum
        Standard & Poor’s Dow Jones Indices
        Alternative Datasets
    Summary

3. Financial Identification Systems
    Financial Identifiers
        Financial Identifier and Identification System Defined
        The Need for Financial Identifiers
        Who Creates Financial Identification Systems?
    Desired Properties of a Financial Identifier
        Uniqueness
        Globality
        Scalability
        Completeness
        Accessibility
        Timeliness
        Authenticity
        Granularity
        Permanence
        Immutability
        Security
    Financial Identification Systems Landscape
        International Securities Identification Number
        Classification of Financial Instruments
        Financial Instrument Short Name
        Committee on Uniform Security Identification Procedures
        Legal Entity Identifier
        Transaction Identifiers
        Stock Exchange Daily Official List
        Ticker Symbols
        Derivative Identifiers
        Financial Instrument Global Identifier
        FactSet Permanent Identifier
        LSEG Permanent Identifier
        Digital Asset Identifiers
        Industry and Sector Identifiers
        Bank Identifiers
    Summary

4. Financial Entity Systems
    Financial Entity Defined
    Financial Named Entity Recognition
        Named Entity Recognition Described
        How Does Named Entity Recognition Work?
        Approaches to Named Entity Recognition
        Named Entity Recognition Software Libraries
    Financial Entity Resolution
        Entity Resolution Described
        The Importance of Entity Resolution in Finance
        How Does Entity Resolution Work?
        Approaches to Entity Resolution
        Entity Resolution Software Libraries
    Summary

5. Financial Data Governance
    Financial Data Governance
        Financial Data Governance Defined
        Financial Data Governance Justified
    Data Quality
        Dimension 1: Data Errors
        Dimension 2: Data Outliers
        Dimension 3: Data Biases
        Dimension 4: Data Granularity
        Dimension 5: Data Duplicates
        Dimension 6: Data Availability and Completeness
        Dimension 7: Data Timeliness
        Dimension 8: Data Constraints
        Dimension 9: Data Relevance
    Data Integrity
        Principle 1: Data Standards
        Principle 2: Data Backups
        Principle 3: Data Archiving
        Principle 4: Data Aggregation
        Principle 5: Data Lineage
        Principle 6: Data Catalogs
        Principle 7: Data Ownership
        Principle 8: Data Contracts
        Principle 9: Data Reconciliation
    Data Security and Privacy
        Data Privacy
        Data Anonymization
        Data Encryption
        Access Control
    Summary

Part II. The Financial Data Engineering Lifecycle

6. Overview of the Financial Data Engineering Lifecycle
    Financial Data Engineering Lifecycle Defined
    Criteria for Building the Financial Data Engineering Stack
        Criterion 1: Open Source Versus Commercial Software
        Criterion 2: Ease of Use Versus Performance
        Criterion 3: Cloud Versus On Premises
        Criterion 4: Public Versus Private Versus Hybrid Cloud
        Criterion 5: Single Versus Multi-Cloud
        Criterion 6: Monolithic Versus Modular Codebase
    Summary

7. Data Ingestion Layer
    Data Transmission and Arrival Processes
        Data Transmission Protocols
        Data Arrival Processes
    Data Ingestion Formats
        General-Purpose Formats
        Big Data Formats
        In-Memory Formats
        Standardized Financial Formats
    Data Ingestion Technologies
        Financial APIs
        Financial Data Feeds
        Secure File Transfer
        Cloud Access
        Web Access
        Specialized Financial Software
    Data Ingestion Best Practices
        Meet Business Requirements
        Design for Change
        Enforce Data Governance
        Perform Benchmarking and Stress Testing
    Summary

8. Data Storage Layer
    Principles of Data Storage System Design
        Principle 1: Business Requirements
        Principle 2: Data Modeling
        Principle 3: Transactional Guarantee
        Principle 4: Consistency Tradeoffs
        Principle 4: Scalability
        Principle 5: Security
    Data Storage Modeling
        SQL Versus NoSQL
        Primary Versus Secondary
        Operational Versus Analytical
        Native Versus Non-Native
        Multi-Model Versus Polyglot Persistence
    Data Storage Models
        The Data Lake Model
        The Relational Model
        The Document Model
        The Time Series Model
        The Message Broker Model
        The Graph Model
        The Warehouse Model
        The Blockchain Model
    Summary

9. Data Transformation and Delivery Layer
    Data Querying
        Querying Patterns
        Query Optimization
    Data Transformation
        Transformation Operations
        Transformation Patterns
        Computational Requirements
    Data Delivery
        Data Consumers
        Delivery Mechanisms
    Summary

10. The Monitoring Layer
    Metrics, Events, Logs, and Traces
        Metrics
        Events
        Logs
        Traces
    Data Quality Monitoring
    Performance Monitoring
    Cost Monitoring
    Business and Analytical Monitoring
    Data Observability
    Summary

11. Financial Data Workflows
    Workflow-Oriented Software Architectures
    What Is a Data Workflow?
    Workflow Management Systems
        Flexibility
        Configurability
        Dependency Management
        Coordination Patterns
        Scalability
        Integration
    Types of Financial Data Workflows
        Extract-Transform-Load Workflows
        Stream Processing Workflows
        Microservice Workflows
        Machine Learning Workflows
    Summary

12. Hands-On Projects
    Prerequisites
    Project 1: Designing a Bank Account Management System Database with PostgreSQL
        Conceptual Model: Business Requirements
        Logical Model: Entity Relationship Diagram
        Physical Model: Data Definition and Manipulation Language
        Project 1: Local Testing
        Project 1: Clean Up
        Project 1: Summary
    Project 2: Designing a Financial Data ETL Workflow with Mage and Python
        Project 2: Workflow Definition
        Project 2: Database Design
        Project 2: Local Testing
        Project 2: Clean Up
        Project 2: Summary
    Project 3: Designing a Microservice Workflow with Netflix Conductor, PostgreSQL, and Python
        Project 3: Workflow Definition
        Project 3: Database Design
        Project 3: Local Testing
        Project 3: Clean Up
        Project 3: Summary
    Project 4: Designing a Financial Reference Data Store with OpenFIGI, PermID, and GLEIF APIs
        Project 4: Prerequisites
        Project 4: Local Testing
        Project 4: Clean Up
        Project 4: Summary
    Conclusion
        Follow Updates on These Projects
        Report Issues or Ask Questions

The Path Forward: Trends Shaping Financial Markets
Afterword
Index
Foreword

The common metaphor for data in financial services is that of oil, lifeblood, or more generally life-giving fuel. However, equally important is how firms use this fuel to the benefit of their business. Data is often raw material that needs to be processed, refined and blended before it can be used. How this material is then used in decision-making workflows is a critical differentiator. From strategy formulation and product development down to process implementation, operations and reporting to investors, customers and regulators, how firms manage information can be the difference between thriving, surviving or indeed declining.

The rapid changes in how data is captured, aggregated, distilled, consumed and distributed have led to faster cycle times, new insights and a much more close-knit integration of computer science into business operations. The wide variety of traditional as well as new data sets, which often don’t fit the mold of traditional computer science, brings opportunities for business differentiation and provides the burgeoning field of financial data engineering with the raw materials it needs.

This variety has fostered rapid development of data governance and a better appreciation of the various aspects of data quality. Data pipelines do not exist in isolation but are shaped by the context of business goals, external reporting considerations as well as legal and commercial constraints. Cloud transition and the range of data sets, tools and data engineering techniques available can make us feel spoiled for choice. Invariably, trade-offs take place, quality can be in the eye of the beholder and the right combination of data and engineering methods bridges business and technology. However, getting this combination right is no mean feat and there are many challenges that can easily derail any financial data engineering project.

Perhaps a more apt metaphor for data in financial services is that of food, with data as the ingredients and financial data engineering as the process of cooking. The number of both basic ingredients and spices has grown and so has the number of culinary techniques and recipes, leading to a proliferation of options on the menu. How the kitchen is staffed is a key differentiator.

What Tamer Khraisha has done in his book, Financial Data Engineering, is to provide us with a comprehensive guide that helps to better understand the vast and varied landscape of financial information and how best to apply financial data engineering concepts to make the most of it. It provides ‘data chefs’ with a structured overview of financial data engineering and the raw materials to work with: from the different types of data sets and their identification to a structured treatment of the different aspects of data quality and data integrity. Subsequently, we get to work on these raw materials step by step, treating the entire financial data engineering lifecycle and taking the business context and trade-offs into account. Whether cloud adoption, database and data warehouse developments, tokenization, machine learning or gen AI, Tamer puts it into business context.

I especially liked that Tamer interlaced his overview and guidelines with many real-world case studies and ends the book with several in-depth data engineering projects. This will help practitioners build and finesse their own judgement as to what tools and data sets are fit for purpose and what controls or guardrails make sense.

With his training in financial economics, background in network science and experience as a fintech practitioner, Tamer bridges business and technology and brings a unique perspective to the field of financial data engineering. This book connects different topics that are often treated in a dispersed way, not least in financial services organizations themselves. This book will be a resource for technologists looking to get a better appreciation of the business context of financial data engineering, as well as for those in financial services trying to get a better sense of the art of the possible in financial data engineering. It will be helpful to people fresh in the job market but will also serve as a reference to experienced practitioners.

Financial data engineering bridges the old divide between computer science and business and this book will be a great help in successfully navigating that road.

— Martijn Groot
Financial Data Management expert
Author of Managing Financial Information in the Trade Lifecycle (Elsevier, 2008) and A Primer in Financial Data Management (Elsevier, 2017)
St. Paul’s Bay, Malta, September 2024