Date: February 25, 2025
In the world of data science and automation, running Jupyter Notebooks manually can be repetitive and time-consuming, especially when working with different datasets or parameters. This is where Papermill comes in!
Papermill is an open-source Python library that allows you to parameterize and execute Jupyter Notebooks.
With Papermill, you can efficiently run the same notebook multiple times with different inputs, making it a powerful tool for data pipelines, automated reporting, and machine learning workflows.
Papermill is a Python library, so you can install it using pip:
pip install papermill
After installation, check if Papermill is installed correctly by running:
papermill --help
Papermill works with Jupyter Notebooks, so make sure Jupyter is installed:
pip install notebook
Once installed, you can use Papermill to execute a notebook from the command line:
papermill input_notebook.ipynb output_notebook.ipynb -p param_name param_value
Example: Running a notebook while setting a parameter
papermill my_notebook.ipynb output.ipynb -p num_epochs 10
You can also run Papermill inside a Python script:
import papermill as pm
pm.execute_notebook(
"input_notebook.ipynb",
"output_notebook.ipynb",
parameters={"num_epochs": 10, "learning_rate": 0.01}
)
Now you have Papermill installed and ready to use!
Papermill is a powerful tool that allows for automation and parameterization of Jupyter Notebooks. Below are its key functionalities:
What it does: Papermill allows users to pass parameters into a Jupyter Notebook at runtime, enabling dynamic execution with different inputs.
How it works: A notebook contains a cell tagged "parameters" in Jupyter, where variables are defined with default values. At execution time, Papermill injects a new cell directly after it that assigns the values you pass in, overriding the defaults.
Why it's useful:
Example Usage:
# This cell is tagged as "parameters" in Jupyter
alpha = 0.1
beta = 0.5
Running the notebook with different parameters using Papermill:
papermill input_notebook.ipynb output_notebook.ipynb -p alpha 0.2 -p beta 0.7
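To check which cell Papermill will treat as the parameters cell, it can help to look at the notebook file directly. The sketch below assumes the standard notebook JSON layout, where tags are stored under each cell's metadata. The function name find_tagged_cells is hypothetical and is not part of Papermill.

```python
import json


def find_tagged_cells(notebook_path, tag="parameters"):
    """Return (index, source) for every cell that carries the given tag."""
    with open(notebook_path, encoding="utf-8") as fh:
        notebook = json.load(fh)

    matches = []
    for index, cell in enumerate(notebook.get("cells", [])):
        tags = cell.get("metadata", {}).get("tags", [])
        if tag in tags:
            # Cell source may be stored as one string or as a list of lines.
            source = cell.get("source", "")
            if isinstance(source, list):
                source = "".join(source)
            matches.append((index, source))
    return matches
```

Calling find_tagged_cells("input_notebook.ipynb") should list the cell that defines alpha and beta. After a run, calling it on the output notebook with tag="injected-parameters" should show the cell Papermill added with the overriding values.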
What it does: Papermill automates the execution of Jupyter Notebooks programmatically.
How it works: It starts a fresh kernel for the notebook and executes each cell sequentially from top to bottom, so no state is carried over from earlier interactive sessions.
Why it's useful:
Example Usage:
papermill input_notebook.ipynb output_notebook.ipynb
What it does: Papermill records execution metadata, including runtime parameters, execution duration, and timestamps.
How it works: When a notebook is executed, metadata is embedded in its JSON structure. This metadata can be inspected later to track how a notebook was executed.
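As a rough illustration of inspecting that metadata, the sketch below reads an executed notebook with the standard json module. It assumes the records are stored under a "papermill" key in the notebook-level metadata and in each cell's metadata; the exact fields inside those records may differ between Papermill versions. The function name read_papermill_metadata is hypothetical.

```python
import json


def read_papermill_metadata(notebook_path):
    """Return (run_info, cell_info) recorded by Papermill, if present."""
    with open(notebook_path, encoding="utf-8") as fh:
        notebook = json.load(fh)

    # Notebook-level record of the run; empty if Papermill never ran it.
    run_info = notebook.get("metadata", {}).get("papermill", {})

    # Per-cell records, keyed by the cell's position in the notebook.
    cell_info = {}
    for index, cell in enumerate(notebook.get("cells", [])):
        record = cell.get("metadata", {}).get("papermill")
        if record:
            cell_info[index] = record
    return run_info, cell_info
```

For example, run_info.get("parameters") and run_info.get("duration") would then give the parameters and total runtime of the run, assuming those fields are present in the output notebook.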
Why it's useful:
What it does: Papermill captures errors that occur during execution and preserves partial execution states.
How it works: If a cell raises an error, Papermill stops execution and reports the failure, but still saves the output notebook with the cells executed up to the failure point.
Why it's useful:
Example Usage:
papermill input_notebook.ipynb output_notebook.ipynb --log-output
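From a Python script, the same failure can be caught as an exception. The following is a minimal sketch, assuming a failing cell surfaces as PapermillExecutionError from papermill.exceptions. The wrapper name run_notebook_safely is hypothetical and not part of Papermill.

```python
def run_notebook_safely(input_path, output_path, parameters=None):
    """Execute a notebook; return True on success, False if a cell raised."""
    # Imported inside the function so this module loads without Papermill.
    import papermill as pm
    from papermill.exceptions import PapermillExecutionError

    try:
        pm.execute_notebook(input_path, output_path, parameters=parameters or {})
    except PapermillExecutionError as err:
        # The output notebook is still written, so it can be opened to
        # inspect the cells that ran before the failure.
        print(f"Notebook failed: {err.ename}: {err.evalue}")
        print(f"Partial results saved to {output_path}")
        return False
    return True
```

Returning a boolean keeps the calling script in control, for example to continue with the next notebook in a batch instead of aborting on the first failure.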
What it does: Papermill integrates with workflow orchestration tools like Apache Airflow, Kubeflow, and AWS Step Functions.
How it works: It allows notebooks to be treated as reusable components in larger data pipelines.
Why it's useful:
What it does: Papermill supports reading and writing notebooks from various storage systems and executing notebooks on remote servers.
How it works: It can read and write notebooks from:
It can also execute notebooks remotely on:
Why it's useful:
Example Usage (S3 paths need Papermill's optional S3 dependencies, installed with pip install "papermill[s3]"):
papermill s3://my-bucket/input_notebook.ipynb s3://my-bucket/output_notebook.ipynb
What it does: Papermill saves executed notebooks with results and outputs.
How it works: After execution, the modified notebook is stored with:
Why it's useful:
Below is an example of how to use Papermill for automating Jupyter Notebooks.
First, make sure Papermill is installed by running the following command in your terminal:
pip install papermill
papermill input_notebook.ipynb output_notebook.ipynb -p alpha {value1} -p beta {value2}
papermill input_notebook.ipynb output_notebook.ipynb -p alpha 1 -p beta 1
papermill input_notebook.ipynb output_notebook.ipynb -p alpha 2 -p beta 2
papermill input_notebook.ipynb output_notebook.ipynb -p alpha 3 -p beta 3
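Note that the three commands above all write to output_notebook.ipynb, so each run overwrites the previous one. One possible way to keep every result is to loop in Python and give each run its own output file. This is only a sketch: the helper names plan_runs and run_all, the "runs" directory, and the output file naming are all assumptions made for illustration.

```python
from pathlib import Path


def plan_runs(values, output_dir="runs"):
    """Build one (output_path, parameters) pair per value of alpha and beta."""
    jobs = []
    for value in values:
        output_path = Path(output_dir) / f"output_alpha{value}_beta{value}.ipynb"
        jobs.append((str(output_path), {"alpha": value, "beta": value}))
    return jobs


def run_all(input_path, values, output_dir="runs"):
    """Execute the input notebook once per planned run."""
    # Imported here so plan_runs can be used without Papermill installed.
    import papermill as pm

    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for output_path, parameters in plan_runs(values, output_dir):
        pm.execute_notebook(input_path, output_path, parameters=parameters)
```

Calling run_all("input_notebook.ipynb", [1, 2, 3]) would then run the notebook three times with the same parameter values as the commands above, leaving one executed notebook per run in the runs directory.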
Output Notebook
CSV File
Below are some screenshots to illustrate key functionalities of Papermill.
Input Notebook: papermill_demo.ipynb
Output Notebook: output_notebook.ipynb
Parameters: alpha, beta
This shows Papermill executing a Jupyter Notebook with parameters.
Papermill automatically logs execution details for tracking and debugging.
This shows a Jupyter Notebook cell tagged as "parameters" with variables defined.
This shows a notebook with an error during execution and the cells executed up to the failure point.
This shows an executed notebook with updated parameter values and outputs.
Papermill is widely used in various domains to automate Jupyter Notebooks.
These use cases highlight the versatility of Papermill in automating and parameterizing Jupyter Notebooks, making it a valuable tool for data scientists, engineers, and analysts.
Papermill is a powerful and versatile tool that significantly enhances the capabilities of Jupyter Notebooks. By enabling parameterization, automated execution, and detailed execution tracking, Papermill streamlines workflows and boosts productivity for data scientists, engineers, and analysts.
Whether you are running experiments with different datasets, automating machine learning model training, generating periodic reports, or integrating notebooks into complex data pipelines, Papermill provides the flexibility and scalability needed to handle these tasks efficiently.
With its support for cloud deployments and various storage systems, Papermill also facilitates seamless integration into modern data science and machine learning environments. By leveraging Papermill, you can save time, ensure reproducibility, and scale your workflows effectively.
We hope this post has given you a clear picture of Papermill's key features and use cases. Start exploring Papermill today and unlock the full potential of your Jupyter Notebooks!