Author:Dr. Shahin Rostami
A practical book on Data Analysis with Rust Notebooks that teaches you the concepts and how they’re implemented in practice.
DR. SHAHIN ROSTAMI DATA ANALYSIS WITH RUST NOTEBOOKS licensed to yz@mtx.gg on 18th January 2022
Biography

Dr. Shahin Rostami is the Founder & Principal Consultant at Polyra Limited, a company specialising in Data Science Research, Development, and Consulting. He holds a Ph.D. in the field of Computational Intelligence with applications to Concealed Weapon Detection. His research interests lie within Data Science and Artificial Intelligence, ranging from theory to their application to Digital Healthcare and Threat Detection.

Before his leap into industry research & development as the Head of Data Science at Xim Limited, he held the position of Senior Academic (Associate Professor) in Data Science & Artificial Intelligence at Bournemouth University, where he was a faculty member for 7 years and has since become a Visiting Fellow. He also led the Computational Intelligence Research Initiative (CIRI), and supervised 5 Ph.D. and many M.Sc. students in related subjects. He has founded and held the position of Programme Leader for many programmes at the postgraduate level: M.Sc. Data Science and Artificial Intelligence (DSAI); M.Sc. Digital Health and Artificial Intelligence (DHAI); and M.Sc. Applied Data Analytics (ADA). He has designed and taught both postgraduate and undergraduate curricula, such as Search and Optimisation, Artificial Intelligence, and Data Mining and Analytic Technologies.

He continues his academic activities and collaboration with many universities through joint publications and reviewing for high-impact journals and conferences, organisation and chairing of special sessions and conferences, supervision of Ph.D. students on university research projects, guest lectures, and open access dissemination of research and education content including those on YouTube. He has authored four books on the subjects of Data Science, Visualisation, and Evolutionary Computation.

He has also authored and published a profitable Software as a Service (SaaS), a full-featured visualisation API for producing beautiful interactive visualisations that have been used in publications by companies and institutions in industry, government, and academia.

shahinrostami.com
YT: ShahinRostami
@ShahinRostami
GitHub: shahinrostami
u/shahinrostami
Patreon: patreon.datacrayon.com
IG: Data.Crayon

Contents © 2021 Dr. Shahin Rostami
version 2021.9.3

Table of Contents

Preface
Setup Anaconda, Jupyter, and Rust
Plotting with Plotters
Plotting with Plotly
Better Plotting with Plotly
Finishing Touches for Visualisation
Multidimensional Arrays and Operations with NDArray
Better Output for 2D Arrays
Loading Datasets from CSV into NDArray
Typed Arrays from String Arrays for Dataset Operation
Descriptive Statistics with NDArray
Unique Array Elements and their Frequency
NDArray Index Arrays and Mask Index Arrays
Interactive Chord Diagrams
Visualisation of Co-occurring Types
Box Plots at the Olympics
Preface

The Rust programming language has become a popular choice amongst software engineers since its release in 2010. Besides being something new and interesting, Rust promised to offer exceptional performance and reliability. In particular, Rust achieves memory-safety and thread-safety through its ownership model. Instead of runtime checks, this safety is assured at compile time by Rust's borrow checker. This prevents undefined behaviour such as dangling pointers!

    println!("Hello World!");

    Hello World!

I first encountered Rust sometime around 2015 when I was updating my teaching materials on memory management in C. A year later in 2016, I implemented a simple multi-objective evolutionary algorithm in Rust as an academic exercise (available: https://github.com/shahinrostami/simple_ea). I didn't have any formal training with Rust, nor did I complete any tutorial series; I just figured things out using the documentation as I progressed through my project. Some example code from this project takes ZDT1 from its mathematical expression in Equation 1 to the Rust implementation below.

$$
\begin{aligned}
f_1(x_1) &= x_1 \\
f_2(x) &= g \cdot h \\
g(x_2, \ldots, x_D) &= 1 + 9 \cdot \sum_{d=2}^{D} \frac{x_d}{D - 1} \\
h(f_1, g) &= 1 - \sqrt{f_1 / g}
\end{aligned}
\tag{1}
$$
    pub fn zdt1(parameters: [f32; 30]) -> [f32; 2] {
        // f1 is simply the first decision variable.
        let f1 = parameters[0];
        // g starts at 1 and accumulates 9 / (D - 1) times each remaining variable.
        let mut g = 1_f32;
        for i in 1..parameters.len() {
            g = g + ((9_f32 / (parameters.len() as f32 - 1_f32)) * parameters[i]);
        }
        // h = 1 - sqrt(f1 / g), and the second objective is g * h.
        let h = 1_f32 - (f1 / g).sqrt();
        let f2 = g * h;
        return [f1, f2];
    }

It was interesting to see that since writing this code in 2016, some of my dependencies have been deprecated and replaced.

My greatest challenge was breaking away from what I already knew. Until this point, I was familiar with languages such as C, C++, C#, Java, Python, MATLAB, etc., with the majority of my time spent working with memory-managed languages. I found myself resisting Rust's intended usage, and it is still something I'm working on.

Now that I am about to commence my sabbatical from my University post, I've decided to try Rust again. This time, I'm going to write a book which focusses on using Rust and Jupyter Notebooks to implement algorithms and conduct experiments, most likely in the fields of search, optimisation, and machine learning. Can we write and execute all our code in a Jupyter Notebook? Yes! Should we? Probably not. However, I enjoy the workflow, and making this an enjoyable process is important to me.

Note: I aim to generate everything in this book through code. This means you will see the code for all my figures and tables, including things like flowcharts. This book is currently available in early access form. It is being actively worked on and updated. Every section is intended to be independent, so you will find some repetition as you progress from one section to another.

Dr. Shahin Rostami (@ShahinRostami): Looking through the November commits for Evcxr and incredibly flattered to see my book mentioned!
github.com/google/evcxr/c… #rustlang #rust (6:54 PM · Dec 7, 2020)
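Returning to the ZDT1 function shown earlier in this preface: the original signature fixes the number of decision variables at 30. As a minimal sketch, and not the code from the simple_ea project, the same formula can be written against a slice so that D is taken from the input length. The name `zdt1_slice` and the panic on inputs shorter than two elements are assumptions made for this illustration.

```rust
/// ZDT1 for any number of decision variables (at least two).
/// Returns [f1, f2]. The slice-based signature is an assumption for
/// this sketch; the function in the preface takes a [f32; 30].
pub fn zdt1_slice(parameters: &[f32]) -> [f32; 2] {
    assert!(parameters.len() >= 2, "ZDT1 needs at least two variables");
    let d = parameters.len() as f32;
    let f1 = parameters[0];
    // g = 1 + 9 * (x_2 + ... + x_D) / (D - 1)
    let sum: f32 = parameters[1..].iter().sum();
    let g = 1.0 + 9.0 * sum / (d - 1.0);
    // h = 1 - sqrt(f1 / g)
    let h = 1.0 - (f1 / g).sqrt();
    [f1, g * h]
}
```

When every variable after the first is zero, g equals 1 and the second objective reduces to 1 - sqrt(f1), which gives a quick way to sanity-check the function. Summing first and dividing once may differ from the loop in the original by floating-point rounding.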
Setup Anaconda, Jupyter, and Rust

Contents: Software Setup; Install Miniconda; Create Your Environment; Install Packages; Install Jupyter Lab Extensions; Install Rust; Install the EvCxR Jupyter Kernel; A Quick Test; Conclusion

Software Setup

We are taking a practical approach in the following sections. As such, we need the right tools and environments available in order to keep up with the examples and exercises. We will be using Rust along with packages that will form our scientific stack, such as ndarray (for multi-dimensional containers) and plotly (for interactive graphing), etc. We will write all of our code within a Jupyter Notebook, but you are free to use other IDEs.
Figure 1 - A Jupyter Notebook being edited within Jupyter Lab. Theme from https://github.com/shahinrostami/theme-purple-please

Install Miniconda

There are many different ways to get up and running with an environment that will facilitate our work. One approach I can recommend is to install and use Miniconda.

    Miniconda is a free minimal installer for conda. It is a small, bootstrap version of Anaconda that includes only conda, Python, the packages they depend on, and a small number of other useful packages, including pip, zlib and a few others.
    Source: https://docs.conda.io/en/latest/miniconda.html

You can skip Miniconda entirely if you prefer and install Jupyter Lab directly; however, I prefer using it to manage other environments too.

You can find installation instructions for Miniconda on their website, but if you're using Linux (e.g. Ubuntu) you can execute the following commands in your terminal:

    wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
    chmod +x Miniconda3-latest-Linux-x86_64.sh
    ./Miniconda3-latest-Linux-x86_64.sh

This will download the installation files and start the interactive installation process. Follow the process to the end, where you should see the following message:

    Thank you for installing Miniconda3!

All that's left is to close and re-open the terminal window.

Create Your Environment

Once Miniconda is installed, we need to create and configure our environment. If you added Miniconda to your PATH environment variable during the installation process, then you can run these commands directly from Terminal, PowerShell, or CMD.

Now we can create and configure our conda environment using the following command.

    conda create -n darn python=3

You can replace darn (Data Analytics with Rust Notebooks) with a name of your choosing. This will create a conda environment named darn with the latest Python 3 package ready to go. You should be presented with a list of packages that will be installed and asked if you wish to proceed. To do so, just enter the character y. If this operation is successful, you should see the following output at the end:
    Preparing transaction: done
    Verifying transaction: done
    Executing transaction: done
    #
    # To activate this environment, use
    #
    #     $ conda activate darn
    #
    # To deactivate an active environment, use
    #
    #     $ conda deactivate

As the message suggests, you will need to type the following command to activate and start entering commands within our environment named darn.

    conda activate darn

Once you do that, you should see your terminal prompt now leads with the environment name within parentheses:

    (darn) melica:~ shahin$

This will allow you to identify which environment you are currently operating in. If you restart your machine, you should be able to use conda activate darn within your conda prompt to get back into the same environment.

Note: The example above shows the macOS machine name "melica" and the user "shahin". You will see something different on your machine, and it may appear in a different format on a different operating system such as Windows. As long as the prompt leads with "(darn)", you are on the right track.

Install Packages

If your environment was already configured and ready, you would be able to enter the command jupyter lab to launch an instance of the Jupyter Lab IDE in the current directory. However, if we try that in our newly created environment, we will receive an error:
    (darn) melica:~ shahin$ jupyter lab
    -bash: jupyter: command not found

So let's fix that. Let's install Jupyter Lab and use the -y option, which automatically says "yes" to any questions asked during the installation process.

    conda install -c conda-forge jupyterlab=2.2.9 -y

We'll also need cmake later on.

    conda install -c anaconda cmake -y

Finally, let's install nodejs. This is needed to run our Jupyter Lab extension in the next section.

    conda install -c conda-forge nodejs=15 -y

Install Jupyter Lab Extensions

There's one last thing we need to do before we move on, and that's installing any Jupyter Lab extensions that we may need. One particular extension that we need is the plotly extension, which will allow our Jupyter Notebooks to render our Plotly visualisations. Within your conda environment, simply run the following command:

    jupyter labextension install jupyterlab-plotly

This may take some time, especially when it builds your jupyterlab assets, so keep an eye on it until control returns to the conda prompt, i.e. when you see the following:

    (darn) melica:~ shahin$

Optionally, you may wish to install the purple-looking theme from Figure 1 above.

    jupyter labextension install @shahinrostami/theme-purple-please

Now we're good to go!

Install Rust

Now we'll install Rust using rustup, but you can check out the other installation methods if you need them.

    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
The code samples in this book will work in many versions of Rust, but I can confirm them to be working with version 1.42.0. You can get the same version with:

    rustup default 1.42.0

You will be given instructions for adding Cargo's bin directory to your PATH environment variable.

    source $HOME/.cargo/env

This will work until you close your terminal, so make sure to add it to your shell profile. I use Z shell (Zsh), so this meant adding the following to .zshrc:

    export PATH="$HOME/.cargo/bin:$PATH"

You can make sure everything works by closing and re-opening your terminal and typing cargo. If this returns the usage documentation then you're all set.

Install the EvCxR Jupyter Kernel

Now we'll install the EvCxR Jupyter Kernel. If you're wondering how it's pronounced, it has been said to be "Evic-ser". This is what will allow us to execute Rust code in a Jupyter Notebook. You can find other installation methods for EvCxR if you need them, but we will be using:

    cargo install evcxr_jupyter --version 0.5.3
    evcxr_jupyter --install

A Quick Test

Let's test if everything is working as it should be. In your conda prompt, within your conda environment, run the following command:

    jupyter lab

Note: Don't forget to activate your environment when opening the terminal.
This should start the Jupyter Lab server and launch a browser window with the IDE ready to use.

Figure 2 - A fresh installation of Jupyter Lab.

Let's create a new notebook. In the Launcher tab which has opened by default, click "Rust" under the Notebook heading. This will create a new and empty notebook named Untitled.ipynb in the current directory. If everything is configured as it should be, you should see no errors. Type the following into the first cell and click the "play" button to execute it and create a new cell.

    println!("Hello World!");

    Hello World!

If we followed all the instructions and didn't encounter any errors, everything should be working. We should see "Hello World!" in the output cell.

Conclusion

In this section, we've downloaded, installed, configured, and tested our environment such that we're ready to run the following examples and experiments. If you ever find that you're missing Jupyter Lab packages, you can install them in the same way as we installed Jupyter Lab and the others in this section.
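Beyond printing a string, a slightly richer smoke test is to define a function in one cell and call it in the next, which checks that definitions persist across cells. The following is only a sketch using the standard library; the helper name `mean` is hypothetical and is not something the book defines at this point.

```rust
/// Arithmetic mean of a slice; returns None for an empty slice.
/// `mean` is a hypothetical helper used only to exercise the kernel.
fn mean(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    Some(values.iter().sum::<f64>() / values.len() as f64)
}
```

In a following cell, evaluating `mean(&[1.0, 2.0, 3.0])` as the last expression, without a trailing semicolon, should cause the kernel to display the returned value.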
Plotting with Plotters

I had originally planned to use Plotters for all the graphing in this book. However, shortly after finding Plotters, I found out that a Rust library had enabled Plotly support. You will see this in later sections, but for now, here is an example of how Plotters works.

    :dep plotters = { git = "https://github.com/38/plotters", default_features = false, features = ["evcxr", "line_series"] }
    extern crate plotters;
    use plotters::prelude::*;
    use plotters::series::*;

    let figure = evcxr_figure((640, 480), |root| {
        root.fill(&WHITE);
        let mut chart = ChartBuilder::on(&root)
            .caption("y=x^2", ("Arial", 50).into_font())
            .margin(5)
            .x_label_area_size(30)
            .y_label_area_size(30)
            .build_ranged(-1f32..1f32, -0.1f32..1f32)?;

        chart.configure_mesh().draw()?;

        chart.draw_series(LineSeries::new(
            (-50..=50).map(|x| x as f32 / 50.0).map(|x| (x, x * x)),
            &RED,
        )).unwrap()
            .label("y = x^2")
            .legend(|(x,y)| PathElement::new(vec![(x,y), (x + 20,y)], &RED));

        chart.configure_series_labels()
            .background_style(&WHITE.mix(0.8))
            .border_style(&BLACK)
            .draw()?;
        Ok(())
    });
    figure
[Figure: Plotters line chart captioned "y=x^2", with the legend entry "y = x^2".]
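To see what data the chart above is actually drawing, the iterator handed to LineSeries::new can be collected into a vector on its own. This is a sketch that needs only the standard library; the function name `parabola_points` is an assumption introduced here for illustration.

```rust
/// Sample y = x^2 at 101 evenly spaced points on [-1, 1], mirroring
/// the iterator passed to LineSeries::new in the Plotters example.
/// `parabola_points` is a hypothetical name used for this sketch.
fn parabola_points() -> Vec<(f32, f32)> {
    (-50..=50)
        .map(|i| i as f32 / 50.0) // integer steps scaled to [-1, 1]
        .map(|x| (x, x * x))      // pair each x with its square
        .collect()
}
```

Working from integer steps and dividing once avoids the drift that repeatedly adding 0.02 to a float would introduce, which is presumably why the example is written this way.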
Plotting with Plotly

Contents: Preamble; Plotly for Visualisation; Conclusion

Preamble

    :dep plotly = {version = "0.4.0"}
    extern crate plotly;
    use plotly::{Plot, Scatter};
    use plotly::common::{Mode};
    use std::fs;

Plotly for Visualisation

In my other book, Practical Evolutionary Algorithms, I relied on the Plotly graphic libraries to generate visualisations throughout each notebook. When I started writing Rust Notebooks a Plotly solution was not available; however, I found Plotters to be a suitable alternative for rendering visualisations. Less than 24 hours after making that decision, a plotting library for Rust powered by Plotly.js was posted on Reddit and caught my attention.

At the time of writing this section, there is no documented support for rendering within Jupyter Notebook cells; however, it is possible to use the .to_html() function to save to an HTML file, and then load and print that HTML file with Rust. We'll store this in a file named temp_plot.html.

    let plotly_file = "temp_plot.html";

Let's demonstrate this with the first code example listed on the Plotly with Rust README.
    let trace1 = Scatter::new(vec![1, 2, 3, 4], vec![10, 15, 13, 17])
        .name("trace1")
        .mode(Mode::Markers);
    let trace2 = Scatter::new(vec![2, 3, 4, 5], vec![16, 5, 11, 9])
        .name("trace2")
        .mode(Mode::Lines);
    let trace3 = Scatter::new(vec![1, 2, 3, 4], vec![12, 9, 15, 12]).name("trace3");

    let mut plot = Plot::new();
    plot.add_trace(trace1);
    plot.add_trace(trace2);
    plot.add_trace(trace3);

Next, we will save this to a file using the .to_html() function, read it to plotly_contents using Rust, print it using println!(), and finally delete the file created by .to_html() as we don't need it after it is embedded.

    plot.to_html(plotly_file);
    let plotly_contents = fs::read_to_string(plotly_file).unwrap();
    println!("EVCXR_BEGIN_CONTENT text/html\n{}\nEVCXR_END_CONTENT", plotly_contents);
    fs::remove_file(plotly_file)?;

[Figure: interactive Plotly scatter plot with three series, trace1 (markers), trace2 (lines), and trace3.]

Conclusion

In this section we've demonstrated how to embed Plotly visualisations in a Jupyter Notebook with a small workaround. The only disadvantage to this solution that I've noticed is the large file size, e.g. this notebook weighs in at around 3.4MB. It could be that
this feature is added to Plotly for Rust soon, or that it already exists and is just awaiting some documentation, but I'm happy with the solution so far.

Note: Since writing this section, Plotly for Rust is now able to display plots in output cells, such as those in Jupyter Lab, with evcxr_display(). The approach described in these sections gives more control over the markup, template, and size of the page, which is pertinent when displaying multiple plots on a single page with the intention to export and share the generated HTML with interactive plots.
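The read, print, and delete steps of the workaround in this section can be gathered into two small helpers. This is a sketch using only the standard library; the names `evcxr_html` and `take_html_file` are hypothetical, and the only detail taken from the section is the EVCXR_BEGIN_CONTENT / EVCXR_END_CONTENT marker format.

```rust
use std::fs;
use std::io;

/// Wrap an HTML fragment in the markers used in this section to make
/// EvCxR render it as rich output. `evcxr_html` is a hypothetical name.
fn evcxr_html(html: &str) -> String {
    format!("EVCXR_BEGIN_CONTENT text/html\n{}\nEVCXR_END_CONTENT", html)
}

/// Read an HTML file, delete it, and return its wrapped contents.
/// Errors are returned to the caller instead of panicking.
fn take_html_file(path: &str) -> io::Result<String> {
    let contents = fs::read_to_string(path)?;
    fs::remove_file(path)?;
    Ok(evcxr_html(&contents))
}
```

With these in scope, the last three lines of the notebook cell could become a single `println!("{}", take_html_file(plotly_file)?);`. Returning a `String` rather than printing inside the helper keeps it easy to test outside a notebook.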
Better Plotting with Plotly

Contents: Preamble; Introduction; Example Plotly Plot; Reducing the File Size; Allowing Multiple Plots; Archived: Loading Plotly with RequireJS; Loading Plotly on Demand; Putting Everything Together; Conclusion

Preamble

    :dep plotly = {version = "0.4.0"}
    :dep nanoid = {version = "0.3.0"}
    extern crate plotly;
    extern crate nanoid;
    use plotly::{Plot, Scatter};
    use plotly::common::{Mode};
    use nanoid::nanoid;
    use std::fs;
    let plotly_file = "temp_plot.html";

Introduction

In the last section, we covered how to get plotting with Plotly using Plotly for Rust paired with our very own workaround. If you continued experimenting with this approach before starting this section, you may have encountered some limitations:

File size. The notebook file from the previous section, plotting-with-plotly.ipynb, weighed in at around 3.4 MB. This is an unusually large file for what was only a few paragraphs and a single interactive plot.

Multiple plots. If you tried to output a second Plotly plot in the same notebook, only the first one would be rendered.

File size, again. If you did solve the issue regarding multiple plots, your file size would grow linearly for every plot output. A second plot would take you from 3.4 MB to 6.8
MB.

We're going to improve our workaround so that we can produce many of our nice interactive plots without bloating our notebooks and any HTML files we may save to.

Example Plotly Plot

Let's use the code from the previous section to generate our plot. We will then save this to a file as HTML, and load it back into a string for further processing.

    let trace1 = Scatter::new(vec![1, 2, 3, 4], vec![10, 15, 13, 17])
        .name("trace1")
        .mode(Mode::Markers);
    let trace2 = Scatter::new(vec![2, 3, 4, 5], vec![16, 5, 11, 9])
        .name("trace2")
        .mode(Mode::Lines);
    let trace3 = Scatter::new(vec![1, 2, 3, 4], vec![12, 9, 15, 12]).name("trace3");

    let mut plot = Plot::new();
    plot.add_trace(trace1);
    plot.add_trace(trace2);
    plot.add_trace(trace3);
    plot.to_html(plotly_file);
    let plotly_contents = fs::read_to_string(plotly_file).unwrap();

Reducing the File Size

If you open the HTML output that was saved to temp_plot.html, you may notice that the entire contents of plotly.js have also been embedded. This will be true for all output created by Plotly for Rust's .to_html() function. This also means that if we have two of these plots in our notebook using the workaround, we will have two copies of plotly.js also embedded. Because we're using the Plotly Jupyter Lab extension installed during setup, jupyterlab-plotly, we don't need to embed plotly.js at all.

So let's extract the part of this HTML file that we actually need. We can do this by slicing out a substring starting from one part of the string that we know starts off the part we need, <div id=\"plotly-html-element\" class=\"plotly-graph-div

    let start_bytes = plotly_contents
        .find("<div id=\"plotly-html-element\" class=\"plotly-graph-div\"")
        .unwrap_or(0);

and ending at another part that we know immediately follows the last part we need, </div></body></html>.
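The slicing idea described above, from a known start marker up to a known end marker, can be sketched as a single helper using only the standard library. This is an illustration under assumptions and not the book's own continuation: the name `extract_between` is hypothetical, and it returns `None` when a marker is missing, where the snippet above falls back to offset 0 with `.unwrap_or(0)`.

```rust
/// Return the part of `contents` that begins at `start_marker` and ends
/// just before the first `end_marker` found after it.
/// `extract_between` is a hypothetical helper written for this sketch.
fn extract_between<'a>(
    contents: &'a str,
    start_marker: &str,
    end_marker: &str,
) -> Option<&'a str> {
    // Byte offset where the wanted fragment begins.
    let start = contents.find(start_marker)?;
    // Search for the end marker only in the text after `start`.
    let end = start + contents[start..].find(end_marker)?;
    Some(&contents[start..end])
}
```

Because `str::find` returns byte offsets that fall on character boundaries, slicing with them is safe even if the HTML contains non-ASCII text. Returning an `Option` makes a missing marker visible to the caller instead of silently producing the whole file.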