MANNING
Marco Peixeiro
Core concepts of time-series forecasting

Core concept                                               Chapter   Section
Model vs. algorithm                                        1         1.1
Definition of a foundation model                           1         1.1
Architecture of the transformer                            1         1.2
Benefits and drawbacks of foundation models                1         1.3
Basis expansion in N-BEATS                                 2         2.1.1
Architecture of N-BEATS                                    2         2.2
Pretraining                                                2         2.3
Transfer learning                                          2         2.3
Fine-tuning                                                2         2.6
Challenges of building foundation models                   2         2.9
Definition of generative pretrained transformers (GPT)     3         3.1
TimeGPT                                                    3         3.2
Conformal predictions                                      3         3.2.2
Forecasting with TimeGPT                                   3         3.3
Fine-tuning TimeGPT                                        3         3.4
TimeGPT with covariates                                    3         3.5
Cross-validation                                           3         3.6
Lag-Llama                                                  4         4.1
Architecture of Lag-Llama                                  4         4.1.1
Forecasting with Lag-Llama                                 4         4.2
Fine-tuning Lag-Llama                                      4         4.3
T5 language models                                         5         5.1
Chronos                                                    5         5.3
Selecting the right Chronos model                          5         5.4.3
Forecasting with Chronos                                   5         5.5

(Continued on inside back cover)
Time Series Forecasting Using Foundation Models

Marco Peixeiro

Manning
Shelter Island
For online information and ordering of this and other Manning books, please visit www.manning.com. The publisher offers discounts on this book when ordered in quantity. For more information, please contact

Special Sales Department
Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964
Email: orders@manning.com

© 2026 Manning Publications Co. All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.

Recognizing the importance of preserving what has been written, it is Manning's policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine. ∞

Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964

ISBN 9781633435896
Printed in the United States of America

The author and publisher have made every effort to ensure that the information in this book was correct at press time. The author and publisher do not assume and hereby disclaim any liability to any party for any loss, damage, or disruption caused by errors or omissions, whether such errors or omissions result from negligence, accident, or any other cause, or from any usage of the information herein.

Development editor: Sarah Harter
Technical editor: Anurag Lahon
Review editor: Dunja Nikitović
Production editor: Andy Marinkovich
Copy editor: Keir Simpson
Proofreader: Melody Dolab
Typesetter: Tamara Švelić Sabljić
Cover designer: Marija Tudor
To my wife, my parents, and my sister. Please read it this time.
And to my little peach, may this book put you to sleep.
brief contents

Part 1  The rise of foundation machine learning models  1
 1  ■  Understanding foundation models  3
 2  ■  Building a foundation model  16

Part 2  Foundation models developed for forecasting  33
 3  ■  Forecasting with TimeGPT  35
 4  ■  Zero-shot probabilistic forecasting with Lag-Llama  67
 5  ■  Learning the language of time with Chronos  88
 6  ■  Moirai: A universal forecasting transformer  114
 7  ■  Deterministic forecasting with TimesFM  141

Part 3  Using LLMs for time-series forecasting  161
 8  ■  Forecasting as a language task  163
 9  ■  Reprogramming an LLM for forecasting  195

Part 4  Capstone project  211
10  ■  Capstone project: Forecasting daily visits to a blog  213
contents

preface  xi
acknowledgments  xii
about this book  xiii
about the author  xvi
about the cover illustration  xvii

Part 1  The rise of foundation machine learning models  1

1  Understanding foundation models  3
   1.1  Defining a foundation model  4
   1.2  Exploring the transformer architecture  6
        Feeding the encoder 7 ■ Inside the encoder 9 ■ Making predictions 10
   1.3  Advantages and disadvantages of foundation models  12
        Benefits of foundation forecasting models 13 ■ Drawbacks of foundation forecasting models 13
   1.4  Next steps  14

2  Building a foundation model  16
   2.1  Exploring the architecture of N-BEATS  17
        Basis expansion 17
   2.2  Architecture of N-BEATS  18
        A block in N-BEATS 18 ■ A stack in N-BEATS 18 ■ Assembling N-BEATS 19
   2.3  Pretraining our model  20
   2.4  Pretraining N-BEATS  22
   2.5  Transfer learning with our pretrained model  23
   2.6  Fine-tuning our pretrained model  25
   2.7  Evaluating each approach  26
   2.8  Forecasting at another frequency  28
   2.9  Understanding the challenges of building a foundation model  31
   2.10 Next steps  32

Part 2  Foundation models developed for forecasting  33

3  Forecasting with TimeGPT  35
   3.1  Defining generative pretrained transformers  36
   3.2  Exploring TimeGPT  37
        Training TimeGPT 38 ■ Quantifying uncertainty in TimeGPT 39
   3.3  Forecasting with TimeGPT  42
        Initial setup 43 ■ Zero-shot forecasting 45 ■ Performance evaluation 47
   3.4  Fine-tuning with TimeGPT  48
        Fine-tuning TimeGPT 48 ■ Evaluating the fine-tuned model 50 ■ Controlling the depth of fine-tuning 50
   3.5  Forecasting with exogenous variables  52
        Preparing the exogenous features 52 ■ Forecasting with exogenous variables 53 ■ Explaining the effect of exogenous features with Shapley values 53 ■ Evaluating forecasts with exogenous features 56
   3.6  Cross-validating with TimeGPT  57
   3.7  Forecasting on a long horizon with TimeGPT  59
   3.8  Detecting anomalies with TimeGPT  61
        Detecting anomalies 62 ■ Evaluating anomaly detection 63
   3.9  Next steps  66

4  Zero-shot probabilistic forecasting with Lag-Llama  67
   4.1  Exploring Lag-Llama  68
        Viewing the architecture of Lag-Llama 68 ■ Pretraining Lag-Llama 71
   4.2  Forecasting with Lag-Llama  71
        Setting up Lag-Llama 72 ■ Zero-shot forecasting with Lag-Llama 72 ■ Changing the context length in Lag-Llama 77
   4.3  Fine-tuning Lag-Llama  82
        Handling initial setup 82 ■ Reading and splitting the data in Colab 84 ■ Launching the fine-tuning procedure 84 ■ Forecasting with a fine-tuned model 84 ■ Evaluating the fine-tuned model 85
   4.4  Model comparison table  86
   4.5  Next steps  86

5  Learning the language of time with Chronos  88
   5.1  Discovering the T5 family  89
   5.2  Exploring Chronos  89
   5.3  Using tokenization in Chronos  90
   5.4  Training a model with Chronos  93
        Tackling data scarcity with augmentation techniques 94 ■ Examining the pretrained Chronos models 96 ■ Selecting the appropriate Chronos model 96
   5.5  Forecasting with Chronos  97
        Initial setup 98 ■ Predictions 98
   5.6  Cross-validating with Chronos  100
        Running cross-validation 102 ■ Evaluating Chronos 102
   5.7  Fine-tuning Chronos  103
        Performing initial setup 104 ■ Configuring the fine-tuning parameters 105 ■ Launching the fine-tuning procedure 106 ■ Forecasting with a fine-tuned model 107 ■ Evaluating the fine-tuned model 107
   5.8  Detecting anomalies with Chronos  109
   5.9  Next steps  112

6  Moirai: A universal forecasting transformer  114
   6.1  Exploring Moirai  115
        Viewing the architecture of Moirai 115 ■ Pretraining Moirai 120 ■ Selecting the appropriate model 121
   6.2  Discovering Moirai-MoE  121
        Patching and embedding 122 ■ Studying the decoder-only transformer 123 ■ Pretraining Moirai-MoE 124
   6.3  Forecasting with Moirai  124
        Zero-shot forecasting with Moirai 124 ■ Cross-validation with Moirai 128 ■ Forecasting with exogenous features 131
   6.4  Detecting anomalies with Moirai  135
   6.5  Next steps  139

7  Deterministic forecasting with TimesFM  141
   7.1  Examining TimesFM  142
        Architecture of TimesFM 142 ■ Pretraining TimesFM 145
   7.2  Forecasting with TimesFM  147
        Zero-shot forecasting with TimesFM 148 ■ Cross-validation with TimesFM 150 ■ Forecasting with exogenous features 153
   7.3  Fine-tuning TimesFM and anomaly detection  158
   7.4  Next steps  158

Part 3  Using LLMs for time-series forecasting  161

8  Forecasting as a language task  163
   8.1  Overview of LLMs and prompting techniques  164
        Exploring Flan-T5 and Llama-3.2 164 ■ Understanding the basics of prompting 165
   8.2  Forecasting with Flan-T5  167
        Function to forecast with Flan-T5 167 ■ Forecast with Flan-T5 170
   8.3  Cross-validation with Flan-T5  173
        Running cross-validation 173 ■ Evaluating Flan-T5 174
   8.4  Forecasting with exogenous features with Flan-T5  175
        Including exogenous features with Flan-T5 175 ■ Extracting future values of exogenous variables 177 ■ Cross-validating with external features 178 ■ Evaluating Flan-T5 forecasts with exogenous features 179
   8.5  Detecting anomalies with Flan-T5  180
        Defining a function for anomaly detection with Flan-T5 181 ■ Running anomaly detection 183
   8.6  Forecasting with Llama-3.2  184
        Performing initial setup 184 ■ Creating a function to forecast via API call 184 ■ Making predictions 186
   8.7  Cross-validating with Llama-3.2  187
   8.8  Detecting anomalies with Llama-3.2  190
        Modifying the system prompt 190 ■ Defining a function for anomaly detection 190 ■ Running anomaly detection with Llama-3.2 191 ■ Evaluating anomaly detection 192
   8.9  Next steps  193

9  Reprogramming an LLM for forecasting  195
   9.1  Discovering Time-LLM  196
        Patch reprogramming 196 ■ Discovering Prompt-as-Prefix 197 ■ Making predictions 200
   9.2  Forecasting with Time-LLM  200
        Performing initial setup 201 ■ Generating forecasts 202
   9.3  Cross-validating with Time-LLM  202
   9.4  Evaluating Time-LLM  204
   9.5  Detecting anomalies with Time-LLM  205
        Detecting anomalies 205 ■ Evaluating anomaly detection 207
   9.6  Next steps  208

Part 4  Capstone project  211

10  Capstone project: Forecasting daily visits to a blog  213
   10.1 Introducing the use case  214
   10.2 Walking through the project  216
        Setting the constants 216 ■ Forecasting with a seasonal naïve model 217 ■ Forecasting with ARIMA 218 ■ Forecasting with TimeGPT 219 ■ Forecasting with Chronos 220 ■ Forecasting with Moirai 221 ■ Forecasting with TimesFM 223 ■ Forecasting with Time-LLM 225 ■ Evaluating all models 226
   10.3 Staying up to date  230

references  231
index  233
preface

In October 2023, I used TimeGPT, one of the foundation forecasting models that we explore in this book, for the first time. After running it for a project, I found that it made better predictions than the models I'd carefully built and tuned on my data. That's when I knew that large time models were about to change the field of time-series forecasting. A pretrained model not only performed better than my own but also was much faster and more convenient. This is the ultimate promise of foundation models: a single model enables you to deliver state-of-the-art forecasting performance without the hassle of training a model from scratch or maintaining multiple models for each use case.

Since then, many models have been proposed and developed, and a big shift has occurred in the scientific community, where a great deal of effort is now spent building better foundation forecasting models. Just as data professionals are expected to know large language models (LLMs), I anticipate that large time models will be must-know technology for practitioners, so I set out to write a book to bring readers up to speed.

This book explores the major contributions to large time models. It can't cover all that has been done or anticipate all that will happen next, of course, but it will enable you to use and optimize current large models. I included the most recent modifications to methods covered in the book to ensure that what you read is as up to date as possible. The book focuses on practicality and hands-on work with each model. The idea is that you'll master new tools and adapt them to your own scenarios. In a dedicated capstone project at the end, you compare large time models with more traditional approaches and evaluate their performance.

I had the chance to join Nixtla and have worked on TimeGPT since 2024, which gave me the opportunity to study other foundation models and work with them extensively, putting me in a particularly good position to write this book. I remain impartial in my evaluations, as you'll see throughout the chapters.
acknowledgments

First, thanks to my lovely wife for her patience during this project, although I think she was enjoying the quiet evenings without me toward the end.

Special thanks to Brian Sawyer. He allowed me to write my first book, and writing a second one is more than a dream come true. I thank him for his trust, and let's hope for a third book.

Huge thanks to Sarah Harter for her amazing work as my development editor. She helped me get organized and improve the book throughout the months. Working with her was an absolute pleasure. A big "thank you" to Jonathan Gennick for trusting my expertise and for going above and beyond to make this book a reality. Thanks also to my technical editor, Anurag Lahon, for his careful review of the code. Many thanks to everyone I haven't met who worked in the background to make this book come true.

To all the reviewers—Ako Heidari, Alireza Aghamohammadi, Anne Katrine Falk, Arjun Ashok, Ashish Patel, Aushim Nagarkatti, Avinash Tiwari, Chalamayya Batchu, Christoph Bergmeir, Felipe Coutinho, Gaurav Pandey, Guillermo Alcantara, Hardev Ranglani, Hatim Kagalwala, Jay Shah, Jeffrey Tackes, Johannes Stephan, Karanbir Singh, Kaushik Dutt, Kaushik Ruparel, Kavin Soni, Manu Joseph, Mariano Junge, Mariia Bulycheva, Meetu Malhotra, Natapong Sornprom, Olena Sokol, Peter Gruber, Prashanth Josyula, Ritwik Dubey, Saikrishna Chinthapatla, Sana Hassan, Sathya Narayanan Annamalai Geetha, Sharmila Devi Chandariah, Shubham Patel, Sofiia Shvets, Steven Edwards, Sudarshan Anand, Tony Dunsworth, and Vojta Tůma—your suggestions helped make this book better.

Finally, thank you to all my teammates at Nixtla, who built amazing open source software for the forecasting community and gave me the chance to contribute to it. Thanks to them, writing the code for this book was a breeze.
about this book

This book is meant to give you the knowledge you need to use large time models effectively and adapt them to your own use cases. We begin by exploring the transformer architecture, which still powers most foundation forecasting models. Then we attempt to build a tiny foundation model to experiment with concepts such as pretraining, fine-tuning, and transfer learning. This experience is a great way to appreciate the challenges of building a truly foundational model for forecasting.

Next, we explore foundation models specifically built for time-series forecasting, from TimeGPT to TimesFM. Then we experiment with LLMs applied to forecasting. We explore each method's inner workings and pretraining procedures, which dictate the model's capabilities and optimal use cases. That way, you'll understand when to use a particular model and how to use it optimally. The book concludes with an experiment that draws on all the methods we explored throughout the book.

Who should read this book?

This book is meant for practitioners who have some experience in time-series forecasting using Python and know how to train forecasting models. The book assumes knowledge of basic forecasting concepts such as seasonality, trend, and autoregression. It also assumes some familiarity with statistical models such as ARIMA, which we use in the last chapter to compare the performance of traditional models and foundation models.

By the end of the book, you'll have the skills and knowledge to apply the major available large time models to your own projects, making sure that the models are adapted and fine-tuned to your use cases.
How this book is organized: A roadmap

This book is divided into four parts covering 10 chapters.

In part 1, we explore the concept of foundation models and build a tiny foundation model:

■ Chapter 1 explores the transformer architecture, which is the backbone of many large time models that we use throughout the book. We also study the benefits and drawbacks of using foundation models for forecasting.
■ Chapter 2 details the technical steps and concepts involved in building a foundation model, such as pretraining, transfer learning, and fine-tuning. We apply those concepts in a hands-on experiment by building a small time model.

In part 2, we explore foundation models built specifically for time-series forecasting:

■ Chapter 3 introduces TimeGPT, one of the first foundation models proposed. We learn how it works, how it was pretrained, and how to use it for forecasting. We also fine-tune the model, include exogenous features, and produce explainability plots using shap. As a bonus, we use it for anomaly detection.
■ Chapter 4 explores Lag-Llama, a probabilistic model mostly geared toward research. We learn how its parameters affect its performance and how to fine-tune it.
■ Chapter 5 dives into Chronos, a framework that can adapt any LLM for forecasting tasks. After studying its architecture and pretraining protocol, we learn how to use it optimally. We also perform fine-tuning and anomaly detection.
■ Chapter 6 explores Moirai, a model built to handle exogenous features natively. We discover its architecture and learn to perform inference both with and without covariates.
■ Chapter 7 explores TimesFM, a deterministic model that is ideal for point forecasts.

In part 3, we experiment with LLMs in forecasting because the task of completing sentences with text can be analogous to forecasting with numbers:

■ Chapter 8 explores the use of Flan-T5 and Llama models for time-series forecasting. We use Flan-T5 as a local model and access Llama through an API. We learn how to adapt LLMs for forecasting and use techniques such as few-shot and chain-of-thought prompting to guide the models.
■ Chapter 9 introduces Time-LLM, a model that effectively reprograms LLMs for time-series forecasting. It can't perform zero-shot forecasting but can be a better choice than using LLMs directly.

The single chapter in part 4 is a capstone project:

■ Chapter 10 gives you the perfect opportunity to solidify your learning and implement your knowledge in a self-guided project. I provide a proposed solution and analysis of the results, but the goal is to let you experiment and come up with your own results.
TIP  To get the most value from chapters 3 through 7, readers who have never worked with foundation models should read the first two chapters to understand their capabilities and concepts.

About the code

All the code in this book is in Python. You may not reproduce the same results because there is always some variability in the output of models, especially probabilistic models.

This book contains many examples of source code both in numbered listings and in line with normal text. In both cases, source code is formatted in a fixed-width font like this to separate it from ordinary text. Sometimes code is also in bold to highlight code that has changed from previous steps in the chapter, such as when a new feature adds to an existing line of code.

In many cases, the original source code has been reformatted; we've added line breaks and reworked indentation to accommodate the available page space in the book. In rare cases, even this was not enough, and listings include line-continuation markers (➥). Additionally, comments in the source code have often been removed from the listings when the code is described in the text. Code annotations accompany many of the listings, highlighting important concepts.

You can get executable snippets of code from the liveBook (online) version of this book at https://livebook.manning.com/book/time-series-forecasting-using-foundation-models. The complete code for the examples in the book is available for download from the Manning website at https://www.manning.com and from GitHub at https://mng.bz/a9Q9.

liveBook discussion forum

Purchase of Time Series Forecasting Using Foundation Models includes free access to liveBook, Manning's online reading platform. Using liveBook's exclusive discussion features, you can attach comments to the book globally or to specific sections or paragraphs. It's a snap to make notes for yourself, ask and answer technical questions, and receive help from the author and other users. To access the forum, go to https://livebook.manning.com/book/time-series-forecasting-using-foundation-models/discussion.

Manning's commitment to our readers is to provide a venue where meaningful dialogue between individual readers and between readers and authors can take place. It is not a commitment to any specific amount of participation on the part of the author, whose contribution to the forum remains voluntary (and unpaid). We suggest that you try asking the author some challenging questions lest his interest stray! The forum and the archives of previous discussions will be accessible on the publisher's website for as long as the book is in print.
about the author

Marco Peixeiro is the author of Time Series Forecasting in Python, published by Manning Publications. He works at Nixtla, actively developing TimeGPT and maintaining open source forecasting libraries such as neuralforecast. He conducted time-series forecasting workshops for the Open Data Science Conference (ODSC) and is a guest lecturer at Harvard Business School. He also writes blog articles for his Medium publication The Forecaster (https://medium.com/the-forecaster) and hosts online courses on forecasting and other subjects on his website (https://www.datasciencewithmarco.com).
about the cover illustration

The figure on the cover of Time Series Forecasting Using Foundation Models, captioned "Tartare de Crimée," or "Crimean Tatar," is taken from a collection by Jacques Grasset de Saint-Sauveur, published in 1784. Each illustration is finely drawn and colored by hand. In those days, it was easy to identify where people lived and what their trade or station in life was by their dress alone. Manning celebrates the inventiveness and initiative of the computer business with book covers based on the rich diversity of regional culture centuries ago, brought back to life by pictures from collections such as this one.