
Python notebooks are popular for both data analysis and data science. They're interactive, allowing you to write code, run it, and see the results in the same environment. But just like any code, your notebooks can become a mess if you're not careful. In this blog post, we'll cover some effective practices for writing clean, readable, and maintainable code in Python notebooks.
Use Markdown Cells for Documentation
Python notebooks are not just about code; they also let you incorporate Markdown cells. Use these to structure your notebook and add explanations, making it more understandable to others (or to your future self).
Example: Poor Documentation with Comments
# Load the dataset and perform basic cleaning
# Remove rows with missing values and convert date column
import pandas as pd
data = pd.read_csv('sales_data.csv')
data = data.dropna() # Remove missing values
data['date'] = pd.to_datetime(data['date']) # Convert to datetime
print(f"Dataset shape: {data.shape}")
Example: Better Documentation with Markdown
Instead, use a Markdown cell above your code:
Markdown Cell:
## Data Loading and Cleaning
In this section, we load our sales dataset and perform initial cleaning:
- Remove rows with missing values to ensure data quality
- Convert the date column to datetime format for time series analysis
- Display basic information about the cleaned dataset
Code Cell:
import pandas as pd
# Load and clean the dataset
data = pd.read_csv('sales_data.csv')
data = data.dropna()
data['date'] = pd.to_datetime(data['date'])
print(f"Dataset shape: {data.shape}")
This approach provides clear separation between documentation and code, making your notebook more readable and professional.
Break Down Complex Code into Multiple Cells
Python notebooks allow you to execute chunks of code independently. This feature can be used to improve readability and debug code more effectively. Rather than writing all your code in a single cell, you can break it down into smaller pieces, each accomplishing a specific task.
Example: Complex Code in Single Cell (Poor Practice)
# BAD: Everything in one cell
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
# Load and preprocess data
data = pd.read_csv('customer_data.csv')
data = data.dropna()
data['age_group'] = pd.cut(data['age'], bins=[0, 25, 45, 65, 100], labels=['Young', 'Adult', 'Middle', 'Senior'])
data = pd.get_dummies(data, columns=['age_group', 'gender'])
X = data.drop('target', axis=1)
y = data['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train model and evaluate
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy: {accuracy:.2f}")
print(classification_report(y_test, predictions))
# Create visualization
feature_importance = pd.DataFrame({'feature': X.columns, 'importance': model.feature_importances_})
feature_importance = feature_importance.sort_values('importance', ascending=False)
plt.figure(figsize=(10, 6))
plt.barh(feature_importance['feature'][:10], feature_importance['importance'][:10])
plt.title('Top 10 Feature Importances')
plt.show()
Example: Breaking Down into Multiple Cells (Better Practice)
Cell 1: Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
Cell 2: Load and Clean Data
# Load the dataset
data = pd.read_csv('customer_data.csv')
print(f"Original dataset shape: {data.shape}")
# Remove missing values
data = data.dropna()
print(f"After removing NaN: {data.shape}")
Cell 3: Feature Engineering
# Create age groups
data['age_group'] = pd.cut(data['age'],
                           bins=[0, 25, 45, 65, 100],
                           labels=['Young', 'Adult', 'Middle', 'Senior'])
# Convert categorical variables to dummy variables
data = pd.get_dummies(data, columns=['age_group', 'gender'])
print(f"Features after encoding: {data.columns.tolist()}")
Cell 4: Prepare Training Data
# Separate features and target
X = data.drop('target', axis=1)
y = data['target']
# Split the data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(f"Training set size: {X_train.shape}")
print(f"Test set size: {X_test.shape}")
Cell 5: Train Model
# Initialize and train the model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
print("Model training completed!")
Cell 6: Evaluate Model
# Make predictions and evaluate
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy: {accuracy:.2f}")
print("\nClassification Report:")
print(classification_report(y_test, predictions))
Cell 7: Visualize Feature Importance
# Create feature importance visualization
feature_importance = pd.DataFrame({
    'feature': X.columns,
    'importance': model.feature_importances_
})
feature_importance = feature_importance.sort_values('importance', ascending=False)
plt.figure(figsize=(10, 6))
plt.barh(feature_importance['feature'][:10], feature_importance['importance'][:10])
plt.title('Top 10 Feature Importances')
plt.xlabel('Importance')
plt.tight_layout()
plt.show()
Benefits of this approach:
- Each cell has a single, clear purpose
- Easy to debug individual steps
- Can re-run specific parts without rerunning everything
- Better readability and organization
Organize Your Notebook with Sections and Subsections
You can use Markdown cells to create sections and subsections, similar to how you would structure a standard report or document. This keeps your notebook organized and makes it easier for others to understand the flow of your analysis.
Example: Well-Organized Notebook Structure
Here's how you can structure a data analysis notebook using clear sections:
Markdown Cell - Main Title:
# Customer Churn Analysis Project
**Objective:** Analyze customer data to identify patterns and predict churn likelihood
**Dataset:** Customer transaction and demographic data (Jan 2023 - Dec 2023)
**Author:** Data Science Team
**Date:** 2024-01-15
Markdown Cell - Table of Contents:
## Table of Contents
1. [Data Import and Setup](#data-import)
2. [Exploratory Data Analysis](#eda)
   - 2.1. Data Overview
   - 2.2. Missing Values Analysis
   - 2.3. Feature Distributions
3. [Data Preprocessing](#preprocessing)
   - 3.1. Feature Engineering
   - 3.2. Data Cleaning
4. [Model Development](#modeling)
   - 4.1. Baseline Model
   - 4.2. Feature Selection
   - 4.3. Model Tuning
5. [Results and Conclusions](#results)
Markdown Cell - Section Header:
# 1. Data Import and Setup {#data-import}
In this section, we import necessary libraries and load our dataset.
Markdown Cell - Subsection:
## 2.1. Data Overview {#data-overview}
Let's examine the basic structure and characteristics of our dataset:
- Dataset dimensions
- Column data types
- Basic statistics
Markdown Cell - Analysis Results:
### Key Findings from EDA
From our exploratory analysis, we discovered:
1. **Missing Data**: 12% of records have missing income information
2. **Class Imbalance**: Only 23% of customers churned
3. **Key Patterns**:
   - Higher churn rate among customers with >3 support tickets
   - Customers with month-to-month contracts show 3x higher churn
   - Premium customers have significantly lower churn rates
**Next Steps**: Based on these findings, we'll focus on feature engineering around customer support interactions and contract types.
This organizational approach makes your notebook easier to navigate, easier to review, and far more useful to readers who only need a specific part of the analysis.
Avoid Hard-Coding Values
Hard-coding values in your code can lead to mistakes and make it harder to maintain. Instead, assign important values to variables at the beginning of your notebook.
Example: Hard-coded Values (Poor Practice)
# BAD: Hard-coded values scattered throughout the notebook
# Cell 1
data = pd.read_csv('customers_2023.csv')
# Cell 5
filtered_data = data[data['age'] >= 25]
# Cell 12
plt.figure(figsize=(12, 8))
plt.title('Customer Analysis 2023')
# Cell 18
train_test_split(X, y, test_size=0.2, random_state=42)
# Cell 25
model = RandomForestClassifier(n_estimators=100, max_depth=10, random_state=42)
# Cell 30
high_value_customers = data[data['total_spent'] >= 1000]
Example: Using Configuration Variables (Better Practice)
# GOOD: Configuration section at the beginning
# =============================================================================
# CONFIGURATION PARAMETERS
# =============================================================================
# File paths
DATA_FILE = 'customers_2023.csv'
OUTPUT_DIR = 'results/'
MODEL_SAVE_PATH = 'models/customer_model.pkl'
# Analysis parameters
MIN_AGE = 25
MIN_SPENDING_THRESHOLD = 1000
ANALYSIS_YEAR = 2023
# Model parameters
TEST_SIZE = 0.2
RANDOM_STATE = 42
N_ESTIMATORS = 100
MAX_DEPTH = 10
# Visualization parameters
FIGURE_SIZE = (12, 8)
DPI = 300
COLOR_PALETTE = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728']
# =============================================================================
# ANALYSIS CODE
# =============================================================================
# Cell 1: Load data
data = pd.read_csv(DATA_FILE)
print(f"Loaded data from {DATA_FILE}")
# Cell 2: Filter by age
filtered_data = data[data['age'] >= MIN_AGE]
print(f"Filtered to customers aged {MIN_AGE}+: {len(filtered_data)} records")
# Cell 3: Create visualization
plt.figure(figsize=FIGURE_SIZE, dpi=DPI)
plt.title(f'Customer Analysis {ANALYSIS_YEAR}')
# Cell 4: Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=TEST_SIZE, random_state=RANDOM_STATE
)
# Cell 5: Train model
model = RandomForestClassifier(
    n_estimators=N_ESTIMATORS,
    max_depth=MAX_DEPTH,
    random_state=RANDOM_STATE
)
# Cell 6: Identify high-value customers
high_value_customers = data[data['total_spent'] >= MIN_SPENDING_THRESHOLD]
print(f"High-value customers (${MIN_SPENDING_THRESHOLD}+): {len(high_value_customers)}")
Example: Using a Configuration Dictionary
# Alternative approach: Configuration dictionary
CONFIG = {
    'data': {
        'file_path': 'customers_2023.csv',
        'min_age': 25,
        'spending_threshold': 1000
    },
    'model': {
        'test_size': 0.2,
        'random_state': 42,
        'n_estimators': 100,
        'max_depth': 10
    },
    'visualization': {
        'figure_size': (12, 8),
        'color_palette': ['#1f77b4', '#ff7f0e', '#2ca02c']
    }
}
# Usage throughout the notebook
data = pd.read_csv(CONFIG['data']['file_path'])
filtered_data = data[data['age'] >= CONFIG['data']['min_age']]
model = RandomForestClassifier(
    n_estimators=CONFIG['model']['n_estimators'],
    max_depth=CONFIG['model']['max_depth'],
    random_state=CONFIG['model']['random_state']
)
Benefits of this approach:
- Easy maintenance: Change values in one place
- Better documentation: Clear parameter definitions
- Reproducibility: Consistent parameters across runs (see the sketch below for one way to record them)
- Flexibility: Easy to create different configurations for different scenarios
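Building on the reproducibility point, one option is to save the configuration next to your outputs so each run documents itself. Here is a minimal sketch, assuming the CONFIG dictionary above and an existing results/ directory (both are just examples); note that JSON stores tuples such as figure_size as lists:
import json
# Record the parameters used for this run alongside the outputs (path is an example)
with open('results/run_config.json', 'w') as f:
    json.dump(CONFIG, f, indent=2)
# Reload the same parameters later, or from another notebook
with open('results/run_config.json') as f:
    CONFIG = json.load(f)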
Include Error Handling
When your code encounters an error, it's helpful to know why. You can include error handling in your code to catch exceptions and provide helpful error messages.
Example: Basic Error Handling
import pandas as pd
import os
# Basic file loading with error handling
def load_data_safely(file_path):
    """Load data with proper error handling"""
    try:
        if not os.path.exists(file_path):
            raise FileNotFoundError(f"Data file not found: {file_path}")
        data = pd.read_csv(file_path)
        print(f"Successfully loaded {len(data)} records from {file_path}")
        return data
    except FileNotFoundError as e:
        print(f"File Error: {e}")
        print("Please check the file path and ensure the file exists.")
        return None
    except pd.errors.EmptyDataError:
        print(f"Data Error: The file {file_path} is empty")
        return None
    except pd.errors.ParserError as e:
        print(f"Parse Error: Could not parse {file_path}")
        print(f"  Details: {e}")
        return None
    except Exception as e:
        print(f"Unexpected error loading {file_path}: {e}")
        return None

# Usage
data = load_data_safely('customer_data.csv')
if data is not None:
    print("Data loaded successfully, proceeding with analysis...")
else:
    print("Cannot proceed without data. Please fix the data loading issue.")
Example: Robust Data Processing with Error Handling
def process_customer_data(data):
    """Process customer data with comprehensive error handling"""
    if data is None or data.empty:
        raise ValueError("Cannot process empty or None data")
    try:
        # Check required columns
        required_columns = ['customer_id', 'age', 'total_spent', 'signup_date']
        missing_columns = [col for col in required_columns if col not in data.columns]
        if missing_columns:
            raise KeyError(f"Missing required columns: {missing_columns}")

        # Data type conversions with error handling
        processed_data = data.copy()

        # Convert age to numeric
        try:
            processed_data['age'] = pd.to_numeric(processed_data['age'], errors='coerce')
            invalid_ages = processed_data['age'].isna().sum()
            if invalid_ages > 0:
                print(f"Warning: {invalid_ages} invalid age values converted to NaN")
        except Exception as e:
            print(f"Error converting age column: {e}")

        # Convert date column
        try:
            processed_data['signup_date'] = pd.to_datetime(processed_data['signup_date'])
        except Exception as e:
            print(f"Error converting signup_date: {e}")
            print("Expected date format: YYYY-MM-DD")

        # Validate data ranges
        if (processed_data['age'] < 0).any():
            print("Warning: Found negative age values")
        if (processed_data['total_spent'] < 0).any():
            print("Warning: Found negative spending values")

        print("Data processing completed successfully")
        return processed_data

    except KeyError as e:
        print(f"Column Error: {e}")
        print(f"  Available columns: {list(data.columns)}")
        return None
    except Exception as e:
        print(f"Processing Error: {e}")
        return None

# Usage with error handling
try:
    processed_data = process_customer_data(data)
    if processed_data is not None:
        print(f"Processing complete. Final dataset shape: {processed_data.shape}")
    else:
        print("Data processing failed. Check the errors above.")
except Exception as e:
    print(f"Critical error in data processing: {e}")
Example: API Calls with Retry Logic
import requests
import time
from typing import Optional, Dict, Any

def fetch_data_with_retry(url: str, max_retries: int = 3, delay: float = 1.0) -> Optional[Dict[Any, Any]]:
    """
    Fetch data from API with retry logic and proper error handling
    """
    for attempt in range(max_retries):
        try:
            print(f"Attempt {attempt + 1}/{max_retries}: Fetching data from {url}")
            response = requests.get(url, timeout=30)
            response.raise_for_status()  # Raises an HTTPError for bad responses
            data = response.json()
            print(f"Successfully fetched data (attempt {attempt + 1})")
            return data
        except requests.exceptions.Timeout:
            print(f"Timeout on attempt {attempt + 1}")
        except requests.exceptions.ConnectionError:
            print(f"Connection error on attempt {attempt + 1}")
        except requests.exceptions.HTTPError as e:
            if response.status_code == 429:  # Rate limited
                print(f"Rate limited on attempt {attempt + 1}, waiting longer...")
                time.sleep(delay * 2)  # Wait longer for rate limits
            else:
                print(f"HTTP error {response.status_code} on attempt {attempt + 1}: {e}")
        except requests.exceptions.RequestException as e:
            print(f"Request error on attempt {attempt + 1}: {e}")
        except ValueError as e:  # JSON decode error
            print(f"JSON decode error on attempt {attempt + 1}: {e}")
        except Exception as e:
            print(f"Unexpected error on attempt {attempt + 1}: {e}")

        # Wait before retrying (except on last attempt)
        if attempt < max_retries - 1:
            print(f"Waiting {delay} seconds before retry...")
            time.sleep(delay)
            delay *= 1.5  # Exponential backoff

    print(f"Failed to fetch data after {max_retries} attempts")
    return None

# Usage
api_data = fetch_data_with_retry('https://api.example.com/customers')
if api_data:
    print("API data loaded successfully")
else:
    print("Using cached/backup data instead")
Benefits of proper error handling:
- Better user experience: Clear, helpful error messages
- Easier debugging: Specific information about what went wrong
- Graceful degradation: Notebook continues running even when some operations fail
- Production readiness: Robust code that handles edge cases
Regularly Restart the Kernel and Run All Cells
Over time, your notebook's state may become messy due to running cells out of order or modifying variables. To ensure that your code is reproducible and doesn't depend on any particular cell execution order, regularly restart the kernel and run all cells from the top. This can help you catch hidden bugs that might otherwise go unnoticed.
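If you want to go one step further, you can run the same check non-interactively. Here is a minimal sketch that executes a notebook top-to-bottom with jupyter nbconvert via subprocess; the notebook filenames are placeholders:
import subprocess
# Execute the notebook from top to bottom and write the executed copy to a new file
result = subprocess.run(
    [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--output", "analysis_checked.ipynb",
        "analysis.ipynb",
    ],
    capture_output=True,
    text=True,
)
if result.returncode == 0:
    print("Notebook runs cleanly from top to bottom.")
else:
    print("Execution failed:")
    print(result.stderr)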
Minimize the Output
While it's helpful to use print statements for debugging and understanding how your code works, it's a good practice to minimize the output when you're done. This doesn't mean you have to remove all print statements, but you should ensure that your code doesn't produce an overwhelming amount of output, which can make your notebook hard to navigate.
For example, if you're looping through a large list and printing the result for each iteration, consider removing the print statement or only printing for a subset of the iterations once you're done.
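A small sketch of this idea, using a hypothetical records list: report progress only every 1,000 iterations and print a single summary at the end.
records = list(range(10_000))
for i, record in enumerate(records):
    # ... process the record here ...
    if i % 1_000 == 0:
        print(f"Processed {i} of {len(records)} records...")
print(f"Done. Processed {len(records)} records in total.")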
Share Code Across Notebooks
Often, you'll find that you have a function or a class that could be useful in multiple notebooks. Instead of duplicating the code, consider putting it in a separate Python file and importing it. This makes your code more modular and easier to maintain.
Example: Creating Reusable Utility Functions
Step 1: Create a utility file (data_utils.py)
# data_utils.py
"""
Data processing utilities for customer analysis notebooks
"""
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from typing import List, Dict, Any, Optional, Tuple

def load_and_validate_data(file_path: str, required_columns: List[str]) -> Optional[pd.DataFrame]:
    """
    Load CSV data and validate required columns exist

    Args:
        file_path: Path to the CSV file
        required_columns: List of column names that must be present

    Returns:
        DataFrame if successful, None if validation fails
    """
    try:
        data = pd.read_csv(file_path)
        missing_columns = [col for col in required_columns if col not in data.columns]
        if missing_columns:
            print(f"Missing required columns: {missing_columns}")
            return None
        print(f"Successfully loaded {len(data)} records with all required columns")
        return data
    except Exception as e:
        print(f"Error loading data: {e}")
        return None

def clean_customer_data(data: pd.DataFrame) -> pd.DataFrame:
    """
    Standard cleaning pipeline for customer data

    Args:
        data: Raw customer DataFrame

    Returns:
        Cleaned DataFrame
    """
    cleaned = data.copy()

    # Remove duplicates
    initial_rows = len(cleaned)
    cleaned = cleaned.drop_duplicates()
    duplicates_removed = initial_rows - len(cleaned)
    if duplicates_removed > 0:
        print(f"Removed {duplicates_removed} duplicate records")

    # Clean age column
    cleaned['age'] = pd.to_numeric(cleaned['age'], errors='coerce')
    cleaned = cleaned[cleaned['age'].between(0, 120)]

    # Clean spending data
    cleaned['total_spent'] = pd.to_numeric(cleaned['total_spent'], errors='coerce')
    cleaned = cleaned[cleaned['total_spent'] >= 0]

    # Convert dates
    if 'signup_date' in cleaned.columns:
        cleaned['signup_date'] = pd.to_datetime(cleaned['signup_date'], errors='coerce')

    print(f"Data cleaning complete. Final shape: {cleaned.shape}")
    return cleaned

def create_age_groups(data: pd.DataFrame, age_col: str = 'age') -> pd.DataFrame:
    """Create standardized age groups"""
    result = data.copy()
    result['age_group'] = pd.cut(
        result[age_col],
        bins=[0, 25, 35, 50, 65, 100],
        labels=['18-25', '26-35', '36-50', '51-65', '65+']
    )
    return result

def plot_distribution(data: pd.DataFrame, column: str, title: str = None) -> None:
    """Create a standardized distribution plot"""
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

    # Histogram
    data[column].hist(bins=30, ax=ax1, alpha=0.7)
    ax1.set_title(f'Distribution of {column}' if title is None else title)
    ax1.set_xlabel(column)
    ax1.set_ylabel('Frequency')

    # Box plot
    data.boxplot(column=column, ax=ax2)
    ax2.set_title(f'{column} Box Plot')

    plt.tight_layout()
    plt.show()

def calculate_customer_metrics(data: pd.DataFrame) -> Dict[str, Any]:
    """Calculate standard customer metrics"""
    metrics = {
        'total_customers': len(data),
        'avg_age': data['age'].mean(),
        'avg_spending': data['total_spent'].mean(),
        'spending_std': data['total_spent'].std(),
        'high_value_customers': len(data[data['total_spent'] >= 1000]),
        'age_groups': data['age_group'].value_counts().to_dict() if 'age_group' in data.columns else {}
    }
    return metrics

class CustomerAnalyzer:
    """Reusable customer analysis class"""

    def __init__(self, data: pd.DataFrame):
        self.data = data
        self.metrics = {}

    def run_basic_analysis(self) -> Dict[str, Any]:
        """Run comprehensive basic analysis"""
        self.metrics = calculate_customer_metrics(self.data)
        print("Customer Analysis Summary:")
        print(f"  Total Customers: {self.metrics['total_customers']:,}")
        print(f"  Average Age: {self.metrics['avg_age']:.1f}")
        print(f"  Average Spending: ${self.metrics['avg_spending']:,.2f}")
        print(f"  High-Value Customers: {self.metrics['high_value_customers']:,}")
        return self.metrics

    def plot_spending_by_age(self) -> None:
        """Create spending vs age visualization"""
        plt.figure(figsize=(10, 6))
        plt.scatter(self.data['age'], self.data['total_spent'], alpha=0.6)
        plt.xlabel('Age')
        plt.ylabel('Total Spent ($)')
        plt.title('Customer Spending by Age')
        plt.show()
Step 2: Use utilities in your notebooks
Notebook 1: Data Exploration
# Import our utilities
import sys
sys.path.append('.') # Add current directory to path
from data_utils import load_and_validate_data, clean_customer_data, create_age_groups, CustomerAnalyzer
# Configuration
REQUIRED_COLUMNS = ['customer_id', 'age', 'total_spent', 'signup_date']
DATA_FILE = 'customer_data.csv'
# Load and clean data using our utilities
data = load_and_validate_data(DATA_FILE, REQUIRED_COLUMNS)
if data is not None:
    cleaned_data = clean_customer_data(data)
    cleaned_data = create_age_groups(cleaned_data)

    # Run analysis
    analyzer = CustomerAnalyzer(cleaned_data)
    metrics = analyzer.run_basic_analysis()
    analyzer.plot_spending_by_age()
Notebook 2: Advanced Analysis
# Same utilities, different analysis focus
from data_utils import load_and_validate_data, clean_customer_data, plot_distribution
# Load data using the same reliable functions
data = load_and_validate_data('customer_data.csv', ['customer_id', 'age', 'total_spent'])
if data is not None:
    cleaned_data = clean_customer_data(data)

    # Focus on different visualizations
    plot_distribution(cleaned_data, 'total_spent', 'Customer Spending Distribution')
    plot_distribution(cleaned_data, 'age', 'Age Distribution')
Step 3: Create a package structure for larger projects
project/
├── notebooks/
│   ├── 01_data_exploration.ipynb
│   ├── 02_customer_segmentation.ipynb
│   └── 03_predictive_modeling.ipynb
├── src/
│   ├── __init__.py
│   ├── data_processing/
│   │   ├── __init__.py
│   │   ├── cleaning.py
│   │   └── validation.py
│   ├── analysis/
│   │   ├── __init__.py
│   │   ├── customer_metrics.py
│   │   └── visualization.py
│   └── models/
│       ├── __init__.py
│       └── customer_segmentation.py
└── requirements.txt
Import in notebooks:
# Add src to path and import
import sys
sys.path.append('../src')
from data_processing.cleaning import clean_customer_data
from analysis.customer_metrics import CustomerAnalyzer
from analysis.visualization import plot_distribution
Benefits of this approach:
- Code reusability: Write once, use in multiple notebooks
- Easier maintenance: Fix bugs in one place
- Better testing: Can unit test utility functions separately (see the sketch below)
- Cleaner notebooks: Focus on analysis, not boilerplate code
- Team collaboration: Shared utilities across team members
- Version control: Track changes to utilities separately
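To illustrate the testing benefit, utilities that live in a module can be covered by ordinary unit tests. Here is a minimal sketch, assuming pytest is installed and the clean_customer_data function from the data_utils.py file above; the test file name and sample values are just illustrative:
# test_data_utils.py
import pandas as pd
from data_utils import clean_customer_data

def test_clean_customer_data_drops_invalid_rows():
    # One valid row plus rows with an impossible age or negative spending
    raw = pd.DataFrame({
        'age': [30, -5, 200, 45],
        'total_spent': [100.0, 50.0, 20.0, -10.0],
    })

    cleaned = clean_customer_data(raw)

    # Only the first row should survive the standard cleaning rules
    assert len(cleaned) == 1
    assert cleaned['age'].between(0, 120).all()
    assert (cleaned['total_spent'] >= 0).all()
Run it with pytest test_data_utils.py from the project directory.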
These practices, along with the previous ones, will help you create Python notebooks that are clean, efficient, and understandable, both to others and your future self. Clean code is a continuous practice, but the effort pays off in better productivity, easier debugging, and more maintainable code.
Happy coding!