
Practical Machine Learning with Rust: Creating Intelligent Applications in Rust

Author: Joydeep Bhattacharjee

RUST

Explore machine learning in Rust and learn about the intricacies of creating machine learning applications. This book begins by covering the important concepts of machine learning such as supervised, unsupervised, and reinforcement learning, and the basics of Rust. Further, you’ll dive into the more specific fields of machine learning, such as computer vision and natural language processing, and look at the Rust libraries that help create applications for those domains. We will also look at how to deploy these applications either on site or over the cloud. After reading Practical Machine Learning with Rust, you will have a solid understanding of creating high computation libraries using Rust. Armed with the knowledge of this amazing language, you will be able to create applications that are more performant, memory safe, and less resource heavy.

What You Will Learn
• Write machine learning algorithms in Rust
• Use Rust libraries for different tasks in machine learning
• Create concise Rust packages for your machine learning applications
• Implement NLP and computer vision in Rust
• Deploy your code in the cloud and on bare metal servers

Who This Book Is For
Machine learning engineers and software engineers interested in building machine learning applications in Rust.


📄 Page 1
Practical Machine Learning with Rust Creating Intelligent Applications in Rust — Joydeep Bhattacharjee
📄 Page 2
Practical Machine Learning with Rust: Creating Intelligent Applications in Rust ISBN-13 (pbk): 978-1-4842-5120-1 ISBN-13 (electronic): 978-1-4842-5121-8 https://doi.org/10.1007/978-1-4842-5121-8 Copyright © 2020 by Joydeep Bhattacharjee This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Trademarked names, logos, and images may appear in this book. Rather than use a trademark symbol with every occurrence of a trademarked name, logo, or image we use the names, logos, and images only in an editorial fashion and to the benefit of the trademark owner, with no intention of infringement of the trademark. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights. While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein. Managing Director, Apress Media LLC: Welmoed Spahr Acquisitions Editor: Celestin Suresh John Development Editor: Matthew Moodie Coordinating Editor: Aditee Mirashi Cover designed by eStudioCalamar Cover image designed by Freepik (www.freepik.com) Distributed to the book trade worldwide by Springer Science+Business Media New York, 233 Spring Street, 6th Floor, New York, NY 10013. 
Phone 1-800-SPRINGER, fax (201) 348-4505, e-mail orders-ny@springer-sbm.com, or visit www.springeronline.com. Apress Media, LLC is a California LLC and the sole member (owner) is Springer Science + Business Media Finance Inc (SSBM Finance Inc). SSBM Finance Inc is a Delaware corporation. For information on translations, please e-mail rights@apress.com, or visit http://www.apress.com/rights-permissions. Apress titles may be purchased in bulk for academic, corporate, or promotional use. eBook versions and licenses are also available for most titles. For more information, reference our Print and eBook Bulk Sales web page at http://www.apress.com/bulk-sales. Any source code or other supplementary material referenced by the author in this book is available to readers on GitHub via the book's product page, located at www.apress.com/978-1-4842-5120-1. For more detailed information, please visit http://www.apress.com/source-code. Printed on acid-free paper
Joydeep Bhattacharjee
Bangalore, India
📄 Page 3
To my wife, Saionee, for patiently hearing my ideas and giving me advice, support, and motivation. To my mom, father-in-law, and mother-in-law for believing in me throughout the years.
📄 Page 4
Table of Contents

About the Author ... xi
Acknowledgments ... xiii
Introduction ... xv

Chapter 1: Basics of Rust ... 1
  1.1 Why Rust? ... 1
  1.2 A Better Reference ... 2
  1.3 Rust Installation ... 5
  1.4 Package Manager and Cargo ... 7
  1.5 Creating New Applications in Rust ... 7
  1.6 Variables in Rust ... 9
    1.6.1 Mutation and Shadowing ... 11
    1.6.2 Variable Scoping ... 13
  1.7 Data Types ... 13
  1.8 Functions ... 14
  1.9 Conditions ... 15
    1.9.1 If Conditions ... 15
    1.9.2 Pattern Matching ... 16
  1.10 References and Borrowing ... 17
    1.10.1 Mutable References ... 20
📄 Page 5
  1.11 Object-Oriented Programming ... 22
    1.11.1 Structures ... 22
    1.11.2 Traits ... 23
    1.11.3 Methods and impl ... 24
    1.11.4 Enumerations ... 26
  1.12 Writing Tests ... 27
  1.13 Summary ... 28
  1.14 References ... 29

Chapter 2: Supervised Learning ... 31
  2.1 What Is Machine Learning? ... 31
  2.2 Dataset Specific Code ... 32
  2.3 Rusty_Machine Library ... 41
  2.4 Linear Regression ... 42
  2.5 Gaussian Process ... 52
  2.6 Generalized Linear Models ... 54
  2.7 Evaluation of Regression Models ... 57
    2.7.1 MAE and MSE ... 57
    2.7.2 R-Squared Error ... 59
  2.8 Classification Algorithms ... 61
    2.8.1 Iris Dataset ... 62
    2.8.2 Logistic Regression ... 67
    2.8.3 Decision Trees ... 68
    2.8.4 Random Forest ... 70
    2.8.5 XGBoost ... 72
    2.8.6 Support Vector Machines ... 77
    2.8.7 K Nearest Neighbors ... 79
📄 Page 6
    2.8.8 Neural Networks ... 84
    2.8.9 Model Evaluation ... 94
  2.9 Conclusion ... 102
  2.10 Bibliography ... 102

Chapter 3: Unsupervised and Reinforcement Learning ... 107
  3.1 K-Means Clustering ... 108
  3.2 Gaussian Mixture Model ... 112
  3.3 Density-Based Spatial Clustering of Applications with Noise (DBSCAN) ... 119
  3.4 Principal Component Analysis ... 121
  3.5 Testing an Unsupervised Model ... 123
  3.6 Reinforcement Learning ... 127
  3.7 Conclusion ... 137
  3.8 Bibliography ... 137

Chapter 4: Working with Data ... 141
  4.1 JSON ... 141
  4.2 XML ... 149
  4.3 Scraping ... 154
  4.4 SQL ... 158
  4.5 NoSQL ... 166
  4.6 Data on s3 ... 172
  4.7 Data Transformations ... 178
  4.8 Working with Matrices ... 183
  4.9 Conclusion ... 186
  4.10 Bibliography ... 186
📄 Page 7
Chapter 5: Natural Language Processing ... 187
  5.1 Sentence Classification ... 188
  5.2 Named Entity Recognition ... 201
  5.3 Chatbots and Natural Language Understanding (NLU) ... 213
    5.3.1 Building an Inference Engine ... 219
  5.4 Conclusion ... 227

Chapter 6: Computer Vision ... 229
  6.1 Image Classification ... 229
    6.1.1 Convolutional Neural Networks (CNN) ... 230
    6.1.2 Rust and Torch ... 232
    6.1.3 Torch Dataset ... 232
    6.1.4 CNN Model ... 240
    6.1.5 Model Building and Debugging ... 246
    6.1.6 Pretrained Models ... 249
  6.2 Transfer Learning ... 254
    6.2.1 Training ... 256
    6.2.2 Neural Style Transfer ... 257
  6.3 Tensorflow and Face Detection ... 264
  6.4 Conclusion ... 275
  6.5 Bibliography ... 276

Chapter 7: Machine Learning Domains ... 277
  7.1 Statistical Analysis ... 277
  7.2 Writing High Performance Code ... 290
  7.3 Recommender Systems ... 294
    7.3.1 Command Line ... 296
    7.3.2 Downloading Data ... 299
    7.3.3 Data ... 300
📄 Page 8
    7.3.4 Model Building ... 302
    7.3.5 Model Prediction ... 307
  7.4 Conclusion ... 312
  7.5 Bibliography ... 313

Chapter 8: Using Rust Applications ... 315
  8.1 Rust Plug-n-Play ... 315
    8.1.1 Python ... 316
    8.1.2 Java ... 327
  8.2 Rust in the Cloud ... 336
  8.3 Conclusion ... 346
  8.4 Bibliography ... 346

Index ... 347
📄 Page 9
About the Author

Joydeep Bhattacharjee is a Principal Engineer who works for Nineleaps Technology Solutions. After graduating from National Institute of Technology at Silchar, he started working in the software industry, where he stumbled upon Python. Through Python, he stumbled upon machine learning. He is the author of fastText Quick Start Guide (Packt, 2018). He has more than seven years’ experience in the software industry and around four years developing machine learning applications. He finds great pleasure in developing intelligent systems that can parse and process data to solve challenging problems at work. He believes in sharing knowledge and loves mentoring. He also maintains a machine learning blog on Medium.
📄 Page 10
Acknowledgments

First and foremost, I would like to thank all the open source maintainers of the Rust crates mentioned in this book and the developers of the Rust language itself, without whom this book would not have been possible. Additionally, I would like to thank my friend Sherin Thomas for his help on the PyTorch sections. Thanks to the Apress team for believing in me, to Celestin for believing in my ideas, and to Aditee for pushing me on the initial drafts and for coordinating the whole process.
📄 Page 11
Introduction

This book is all about exploring machine learning in Rust. We will learn about the intricacies of creating machine learning applications and how they fit in the Rust worldview. We will start from the very beginning by understanding some of the important concepts of machine learning as well as the basics of Rust. In the later chapters we will dive into the more specific areas of machine learning, such as data processing, computer vision, and natural language processing, and look at the Rust libraries that make creating applications for those domains easier. We will also look at how to deploy those applications either onsite or over the cloud. By the end of the book, the reader will have a solid understanding of creating high computation libraries using Rust. Armed with the knowledge of this amazing language, they can begin working toward creating applications that are more performant, memory safe, and less resource heavy.

Who Is the Target Audience?

This book is best suited for the programmer who works on industrial optimization problems and is looking for ways to write better code and create better applications. Although this book does not assume any machine learning experience and will explain all concepts, some prior machine learning experience is still helpful, especially in one of the major programming languages such as Python. This book does not assume any Rust knowledge and will be good for a budding Rust developer interested in machine learning or someone who is not satisfied with the current ecosystem and would like to take a look at the options available.
📄 Page 12
© Joydeep Bhattacharjee 2020
J. Bhattacharjee, Practical Machine Learning with Rust, https://doi.org/10.1007/978-1-4842-5121-8_1

CHAPTER 1
Basics of Rust

In this chapter we will explore Rust as a language and the unique constructs and programming models that Rust supports. We will also discuss the biggest selling points of Rust and what makes this language particularly appealing to machine learning applications. Once we have an overview of Rust as a language, we will start with its installation. Then we will move on to Cargo, which is the official package manager of Rust, and how we can create applications in Rust. Later we will look at Rust programming constructs such as variables, looping constructs, and the ownership model. We will end this chapter by writing unit tests for Rust code and showing how the tests can be run using the standard package manager. By the end of this chapter, you should have a fair understanding of how to write and compile simple applications in Rust.

1.1 Why Rust?
There is a general understanding that there are differences between low-level systems programming languages and high-level application programming languages. Received wisdom says that if you want to create performant applications and create libraries that work on bare metal, you will need to work in a language such as C or C++, and if you want to create applications for business use cases, you need to program in languages such as Java/Python/JavaScript.
📄 Page 13
The aim of the Rust language is to sit in the intersection between high-level languages and low-level languages. Programs that are close to the metal necessarily handle memory directly, which means that there is no garbage collection in the mix. In high-level languages, memory is managed for the programmer. Implementing garbage collection has costs associated with it. At the same time, the garbage collection strategies that we have are not perfect, and there are still examples of memory leaks in programs with automatic memory management. One of the main causes of memory leaks in higher-level languages is packages that expose an interface in the higher-level language while keeping the core implementation in a lower-level language. For example, the Python library pandas has quite a few memory leaks. Also, absence of evidence does not mean evidence of absence, and hence there is no formal proof that bringing in garbage collection strategies will remove all of the possible memory leaks.

The other issue is with referencing. References are easy to understand in principle, and they are inexpensive and essential to achieving developer productivity in software creation. As such, languages that target low-level software creation such as C and C++ allow unrestricted creation of references and mutation of the referenced object.

1.2 A Better Reference
Typically, in object systems, objects live in a global object space called the heap or object store. There are no strict constraints on which part of the object store the object can access, because there are no restrictions on the way the object references are passed around. This has repercussions when preventing representation exposure for aggregate objects. The components that constitute an aggregate object are considered to be contained within that aggregate, and part of its representation. But, because the object
📄 Page 14
store is global, there is, in general, no way to prevent other objects from accessing that representation. Enforcing the notion of containment with the standard reference semantics is impossible. A better solution can be to restrict the visibility of different types of objects that are created. This is done by saying that all objects have a context associated with them. All paths from the root of the system must pass through the objects’ owner. In Rust, types maintain a couple of key invariants that are listed here. To start, every storage location is guaranteed to have either
• 1 mutable reference and 0 immutable references to it, or
• 0 mutable references and n immutable references to it.
We will see how this translates to actual code in a later part of this chapter. This invariant prevents races in concurrent systems as it prohibits concurrent reads and writes to a single memory location. By itself, however, this invariant is not enough to guarantee memory safety, especially in the presence of movable objects. For instance, since a given variable becomes unusable after the object has been moved, the storage locations associated with that variable may be reused or freed. As a result, any references previously created will be dangling after a move. This issue is also resolved by the previous ownership rules in Rust. References to an object are created transitively through its owner. The type system guarantees that the owner does not change after a move while references are outstanding. Conversely, the type system allows a change of ownership when there are no outstanding references. Examples of this will be discussed in more detail in a later part of the chapter.

All of what has just been mentioned is even more important in a machine learning application. Training machine learning applications involves a lot of data, and the more variation in the data the better, which translates to a lot of object creation during the training phase.
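The reference invariant and the move rule described above can be sketched in a few lines. This is a minimal illustration, not code from the book: the helpers `sum` and `scale` and the variable names are made up for the example, and it assumes a 2018-edition compiler such as the one installed later in this chapter.

```rust
// Hypothetical helper: reads the data through a shared (immutable) reference.
fn sum(values: &[f64]) -> f64 {
    values.iter().sum()
}

// Hypothetical helper: needs an exclusive (mutable) reference to change the data.
fn scale(values: &mut Vec<f64>, factor: f64) {
    for v in values.iter_mut() {
        *v *= factor;
    }
}

fn main() {
    let mut weights = vec![1.0, 2.0, 3.0];

    // Any number of immutable references may coexist.
    let a = &weights;
    let b = &weights;
    println!("sum via a = {}, sum via b = {}", sum(a), sum(b));

    // A single mutable reference is allowed here because `a` and `b`
    // are not used again after this point.
    scale(&mut weights, 2.0);
    println!("scaled = {:?}", weights);

    // Moving transfers ownership to `moved`; `weights` is unusable afterwards.
    let moved = weights;
    // println!("{:?}", weights); // would not compile: value used after move
    println!("moved = {:?}", moved);
}
```

Uncommenting the marked line, or using `a` after the call to `scale`, should make the compiler reject the program, which is how the invariant is enforced before the code ever runs.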
You probably don’t want memory errors. After deployment of the models, the
📄 Page 15
models that get created are matrices and tensors, which are implemented as a collection of floats. It is probably not desirable to keep past objects and predictions dangling in memory even after they have no more use. There are advantages when creating a concurrent system as well. Rust types guarantee that there will be no race conditions, and hence programmers can safely create machine learning applications that try to spread out the computation as much as possible.

Along with all this talk about memory safety and high performance, we also have high-level features such as type inference, so we will not need to write types for all the variables. We will see when defining types is important in a later part of the chapter. Another interesting point from the earlier discussion is that when writing Rust code, we are not thinking about memory. So, from a usage point of view, it feels like memory is being managed for us. Then there are other constructs such as closures, iterators, and standard libraries, which make writing code in Rust more like writing in a high-level language. For machine learning applications, this is crucial. High-level languages such as Python have succeeded in machine learning because of the expressiveness of the language that supports free-form experimentation and developer productivity.

In this chapter we will be taking a look at the basics of Rust and the programming constructs that make Rust the language it is. We primarily cover topics such as Structs and Enums, which look and feel different in this language and might not be what we would expect. We will skip a lot of important things such as individual data types, which are similar to other languages such as C and Java.
One of the core design goals of Rust is that programming in it should feel similar to C and Java, which are more popular, so that programmers coming from these languages don’t have to do a lot of mental overhauling while also gaining a lot of memory advantages that have not been considered before.
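As a rough illustration of the high-level features mentioned in this section (type inference, closures, and iterators), here is a minimal sketch that centers a small set of samples around their mean. The function names `mean` and `center` are hypothetical and only serve the example; the sketch assumes a non-empty input, since an empty slice would produce NaN.

```rust
// Hypothetical helper: arithmetic mean of a slice of samples.
// Assumes `values` is non-empty; an empty slice yields NaN.
fn mean(values: &[f64]) -> f64 {
    values.iter().sum::<f64>() / values.len() as f64
}

// Hypothetical helper: subtract the mean from every sample.
fn center(values: &[f64]) -> Vec<f64> {
    let m = mean(values);
    // The closure captures `m`; its argument and return types are inferred.
    values.iter().map(|x| x - m).collect()
}

fn main() {
    // No type annotation is needed here: the compiler infers Vec<f64>.
    let samples = vec![2.0, 4.0, 4.0, 5.0];
    println!("mean = {}", mean(&samples));
    println!("centered = {:?}", center(&samples));
}
```

The only explicit types are in the function signatures; everything inside the bodies is inferred, which is part of what makes this style read like a higher-level language.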
📄 Page 16
1.3 Rust Installation
In this section we explore how to install Rust based on the operating system. The command and a possible output are shown.

$ curl https://sh.rustup.rs -sSf | sh
info: downloading installer

Welcome to Rust!

This will download and install the official compiler for the Rust programming language, and its package manager, Cargo.

It will add the cargo, rustc, rustup and other commands to Cargo's bin directory, located at:

  /home/ubuntu/.cargo/bin

This path will then be added to your PATH environment variable by modifying the profile file located at:

  /home/ubuntu/.profile

You can uninstall at any time with rustup self uninstall and these changes will be reverted.

Current installation options:

   default host triple: x86_64-unknown-linux-gnu
     default toolchain: stable
  modify PATH variable: yes

1) Proceed with installation (default)
2) Customize installation
3) Cancel installation
>1
📄 Page 17
info: syncing channel updates for 'stable-x86_64-unknown-linux-gnu'
info: latest update on 2019-02-28, rust version 1.33.0 (2aa4c46cf 2019-02-28)
info: downloading component 'rustc'
 84.7 MiB / 84.7 MiB (100 %) 67.4 MiB/s ETA: 0 s
info: downloading component 'rust-std'
 56.8 MiB / 56.8 MiB (100 %) 51.6 MiB/s ETA: 0 s
info: downloading component 'cargo'
info: downloading component 'rust-docs'
info: installing component 'rustc'
 84.7 MiB / 84.7 MiB (100 %) 10.8 MiB/s ETA: 0 s
info: installing component 'rust-std'
 56.8 MiB / 56.8 MiB (100 %) 12.6 MiB/s ETA: 0 s
info: installing component 'cargo'
info: installing component 'rust-docs'
 8.5 MiB / 8.5 MiB (100 %) 2.6 MiB/s ETA: 0 s
info: default toolchain set to 'stable'

  stable installed - rustc 1.33.0 (2aa4c46cf 2019-02-28)

Rust is installed now. Great!

To get started, you need Cargo's bin directory ($HOME/.cargo/bin) in your PATH environment variable. Next time you log in this will be done automatically.

To configure your current shell, run source $HOME/.cargo/env

If we study the earlier output, we will see the following points.
• The rustup script has successfully identified my distribution and will install Rust binaries that are compatible with it.
📄 Page 18
• Installation of Rust along with Cargo (the official Rust package manager) will be run through this command.
• The commands will be added to <home>/.cargo/bin and will be accessible from the command line.

1.4 Package Manager and Cargo
Cargo is a convenient build tool for development of Rust applications and libraries. The package information is supposed to be saved in a TOML (Tom's Obvious, Minimal Language) file. The TOML file format is relatively new, and according to the TOML repository on GitHub, it is designed to map unambiguously to a hash table.

1.5 Creating New Applications in Rust
Creating a new application is simple in Rust.

$ cargo new myfirstapp
     Created binary (application) `myfirstapp` package

Check the Cargo.toml file. You should see something like the following.

[package]
name = "myfirstapp"
version = "0.1.0"
authors = ["Joydeep Bhattacharjee"]
edition = "2018"

[dependencies]

As you can see, there is some basic information added here. Important items are the name and the version.

If you check the contents of the src/ folder, you can also see that there is a main.rs file. Check the contents of the main.rs file. You can see that
📄 Page 19
there is a minimal file written with a main function. To run a Rust app, you will need the main function that acts as the entry point for the code. The code in this case simply prints "Hello, world!".

fn main() {
    println!("Hello, world!");
}

We can now build the application using the build command. This will generate a binary file that can be used to run the application. Once development of the application is done, we can use the --release flag to create an optimized binary. This needs to be done because, by default, cargo build disables many optimizations so that the resulting builds are useful for testing. So when creating builds for production usage, the release flag should be used.

$ cargo build
   Compiling myfirstapp v0.1.0 (/tmp/myfirstapp)
    Finished dev [unoptimized + debuginfo] target(s) in 8.47s
$ ls target/debug/myfirstapp
target/debug/myfirstapp
$ ./target/debug/myfirstapp
Hello, world!

While developing, we can also use the cargo run command to shortcut the procedure just shown.

$ cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.42s
     Running `target/debug/myfirstapp`
Hello, world!
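The difference between the default build and a --release build can also be observed from inside a program. The following is a minimal sketch, assuming Cargo's default profiles, where debug assertions are enabled for cargo build and disabled for cargo build --release; a project that overrides this in its profile settings would report differently. The helper name `build_profile` is hypothetical.

```rust
// Hypothetical helper: reports which kind of build this binary came from.
// Assumption: the default Cargo profiles are in use, so `debug_assertions`
// is on for `cargo build` and off for `cargo build --release`.
fn build_profile() -> &'static str {
    if cfg!(debug_assertions) {
        "debug (unoptimized)"
    } else {
        "release (optimized)"
    }
}

fn main() {
    println!("Hello, world! This is a {} build.", build_profile());
}
```

Running the program once with cargo run and once with cargo run --release should print a different profile each time, with the release binary placed under target/release instead of target/debug.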
📄 Page 20
1.6 Variables in Rust
In Rust, variables are defined using the let keyword. The types of the variables will be inferred for us. Take a look at the next example.

let x = "learning rust";
println!("{}", x);

println! is used to print the variable. Note the special construct println! here. When you see the ! sign after a function name, that means that a macro is being used. Macros are special metaprogramming constructs in Rust, which are outside the scope of this book. The macro println! is used because Rust does not support variable arguments in functions, and hence println has to be a macro.

We can see the type of the variable using the following code.

#![feature(core_intrinsics)]

fn print_type_of<T>(_: &T) {
    println!("{}", unsafe { std::intrinsics::type_name::<T>() });
}

fn main() {
    let x = "learning rust";
    println!("{}", x);
    print_type_of(&x);
}

This will not run in a default stable version though and will need to be compiled in the nightly version. The nightly compiler will need to be enabled. The nightly version is the place where unstable or potentially unstable code is kept, and so language features such as the one that we are discussing right now will only be available in a nightly version.
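On toolchains newer than the 1.33.0 release shown in the installation output (Rust 1.38 and later), the standard library provides std::any::type_name, so a similar check can be sketched on the stable compiler. This is an alternative to the nightly listing above rather than the book's approach; the helper name `type_of` is made up for the example, and the exact string returned is described by the standard library as a best-effort description, so it should only be used for debugging.

```rust
// Returns the compiler's name for the type of the referenced value.
// Assumption: a stable toolchain of version 1.38 or later, where
// `std::any::type_name` is available without feature flags.
fn type_of<T>(_: &T) -> &'static str {
    std::any::type_name::<T>()
}

fn main() {
    let x = "learning rust";
    println!("{}", x);
    // Prints a description of the inferred type of `x`.
    println!("{}", type_of(&x));
}
```

Because the returned text is not guaranteed to stay the same between compiler versions, it is safer to print it for inspection than to compare it against a fixed string.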