August 28, 2023
Notebooks
2 min
Diego García

Python Logging in Jupyter Notebooks

A practical guide to implementing Python logging in Jupyter Notebooks for better debugging and monitoring.

Introduction

Logging is an essential part of software development, especially when building complex data pipelines, dashboards, or scientific notebooks. In a Jupyter Notebook environment, logging can be slightly different than in a standard Python script. This blog post aims to guide you through setting up and using Python's built-in logging module in Jupyter Notebooks.

Why Logging in Jupyter Notebooks?

  1. Debugging: Easier to debug errors and exceptions.
  2. Monitoring: Keep track of variable values, data transformations, and function calls.
  3. Audit Trail: Maintain a record of actions for compliance and review.

Setting Up Logging

Firstly, let's import the logging module and configure it. The basic setup involves setting the logging level and format.

import logging

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

When you set up logging using logging.basicConfig(), the format parameter lets you specify the layout of log messages. This is done through a format string containing %-style placeholders, such as %(asctime)s, which are substituted with the corresponding log record attributes when a message is emitted.

Here are some commonly used placeholders:

  • %(asctime)s: The time when the log record was created.
  • %(levelname)s: The level of the log (DEBUG, INFO, etc.).
  • %(message)s: The log message itself.
  • %(name)s: The name of the logger.
  • %(filename)s: The filename where the log call was made.
  • %(lineno)d: The line number in the file where the log call was made.
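For example, a slightly richer format string combining several of these placeholders might look like the following minimal sketch (the message text and the sample output line are only illustrative; in a notebook the filename will be the temporary name of the executing cell):

import logging

# Run in a fresh kernel: basicConfig configures the root logger only once per session
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(filename)s:%(lineno)d - %(message)s'
)

logging.info("Notebook started")
# Example output (timestamp, filename, and line number depend on your session):
# 2023-08-28 10:15:42,123 - root - INFO - 1393956590.py:7 - Notebook started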

Basic Logging

The most straightforward way to log is to use the logging levels provided by the Python logging module: DEBUG, INFO, WARNING, ERROR, and CRITICAL.

logging.debug("This is a debug message")
logging.info("This is an info message")
logging.warning("This is a warning message")
logging.error("This is an error message")
logging.critical("This is a critical message")

Log levels in Python's logging module help you control the granularity of log output. Here's a brief explanation of each:

  1. DEBUG: Provides detailed information for diagnostic purposes. Use this level to output everything, including data that might help diagnose issues or understand the flow of the application.
  2. INFO: Confirms that things are working as expected. Useful for general runtime confirmations and tracking the state of the application.
  3. WARNING: Indicates something unexpected happened or may happen soon, but the software is still functioning. Use this level to log events that might cause problems but are not necessarily errors.
  4. ERROR: Records errors that have occurred, affecting some functionality but not causing the program to terminate. Use this level to log severe issues that prevent certain operations from being carried out.
  5. CRITICAL: Logs severe errors that cause the program to terminate. Use this level for unrecoverable errors that stop the application from running.

Each level has a numeric value (DEBUG=10, INFO=20, WARNING=30, ERROR=40, CRITICAL=50). Setting the logging level to a particular value will capture all logs at that level and above. For example, setting the level to WARNING will capture WARNING, ERROR, and CRITICAL logs, but ignore DEBUG and INFO.
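As a quick illustration of this filtering (a minimal sketch), setting the root logger's level to WARNING silences DEBUG and INFO messages while letting everything else through:

import logging

logging.getLogger().setLevel(logging.WARNING)  # WARNING has numeric value 30

logging.debug("Not shown: DEBUG (10) is below WARNING (30)")
logging.info("Not shown: INFO (20) is below WARNING (30)")
logging.warning("Shown: WARNING (30)")
logging.error("Shown: ERROR (40)")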

Tricks for using logging in Jupyter Notebooks

1. Logging in Functions and Classes

Logging can be particularly useful when encapsulated within functions or classes.

def data_transformation(data):
    logging.info("Data transformation started.")
    # Your code here
    logging.info("Data transformation completed.")

class DataPipeline:
    def __init__(self):
        logging.info("DataPipeline initialized")

2. Custom Handlers and Formatters

In Jupyter Notebooks, you might want finer control over where log output goes and how it is formatted. You can achieve this by attaching a custom handler and formatter to the logger.

from logging import StreamHandler

handler = StreamHandler()
formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')
handler.setFormatter(formatter)

logger = logging.getLogger()
logger.addHandler(handler)
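If you also want a persistent copy of the logs alongside the notebook, you can attach a FileHandler in the same way. A minimal sketch (the file name notebook.log is just an example):

import logging
from logging import FileHandler

file_handler = FileHandler("notebook.log")  # example file name
file_handler.setFormatter(logging.Formatter('%(asctime)s - %(levelname)s - %(message)s'))

logger = logging.getLogger()
logger.addHandler(file_handler)

# Written to notebook.log; any StreamHandler added above will still echo it in the notebook
logger.warning("Persisted to notebook.log")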

3. Logging Variable Data

To log variable data, prefer the logging module's %-style formatting: pass the format string and the values separately, so the final message is only built if the record is actually emitted.

variable = 42
logging.info("The answer to the ultimate question is %s.", variable)

4. Avoid duplicated handlers

Running a cell multiple times can add duplicate handlers, leading to repeated log messages. Make sure to remove existing handlers before adding new ones:

logger = logging.getLogger()
if logger.hasHandlers():
    logger.handlers.clear()
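Putting the two steps together, a small helper that is safe to re-run in any cell might look like this minimal sketch (the function name setup_logging is just an illustration):

import logging

def setup_logging(level=logging.INFO):
    """Idempotent logging setup: safe to re-run in any notebook cell."""
    logger = logging.getLogger()
    if logger.hasHandlers():
        logger.handlers.clear()  # drop handlers left over from earlier runs
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter('%(asctime)s - %(levelname)s - %(message)s'))
    logger.addHandler(handler)
    logger.setLevel(level)
    return logger

logger = setup_logging()
logger.info("Exactly one handler is configured, no matter how often this cell runs")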

5. Use Rich Output

You can use libraries like rich (install it with pip install rich) to make the log output more readable and colorful:

from rich.logging import RichHandler
logging.basicConfig(level=logging.INFO, handlers=[RichHandler()])

6. Dynamic Log Level Switching

You can dynamically change the log level without resetting the entire logger. This is useful for debugging specific cells.

logger.setLevel(logging.DEBUG)  # Switch to DEBUG level temporarily

7. Use Context Managers for Temporary Logging Levels

For temporary logging level changes, you can use a context manager to ensure the level reverts back after a specific block of code.

from contextlib import contextmanager

@contextmanager
def temporary_log_level(logger, level):
    old_level = logger.level
    logger.setLevel(level)
    try:
        yield
    finally:
        logger.setLevel(old_level)  # restored even if the block raises

with temporary_log_level(logger, logging.DEBUG):
    logging.debug("Debug-level logs will show here")

Best practices to write great log messages

1. Be Descriptive but Concise

Log messages should provide enough context to understand what's happening but be concise enough to not overwhelm the reader. Use clear language that describes the action, state, or condition.

  • 👍 Good: logging.info("Connection to database established.")
  • 👎 Bad: logging.info("DB OK.")

2. Include Relevant Variables or Identifiers

When logging events, include any relevant variables, identifiers, or parameters that could be useful for debugging or auditing. Use string formatting to include these in the log message.

  • 👍 Good: logging.info("User %s successfully authenticated.", user_id)
  • 👎 Bad: logging.info("Authentication successful.")

3. Choose the Appropriate Log Level

Use the correct log level to indicate the severity or importance of the log message. This helps in filtering logs and understanding the system state quickly.

  • DEBUG for detailed diagnostic information.
  • INFO for confirmation of successful operations.
  • WARNING for unexpected situations that don't cause errors.
  • ERROR for issues that disrupt normal functionality.
  • CRITICAL for severe problems that cause program termination.

4. Use Consistent Formatting

Maintain a consistent format for your log messages. This makes it easier to search, filter, and analyze logs. Consistency should apply to the structure of the message, the terminology used, and even the tense.

  • 👍 Good: logging.info("File uploaded: filename={}, size={}KB".format(file_name, file_size))
  • 👎 Bad: logging.info("Uploaded file. Name of file is {}. Size is {} kilobytes.".format(file_name, file_size))

5. Avoid Logging Sensitive Information

Be cautious about the data you log. Never log sensitive information like passwords, API keys, or personally identifiable information (PII). This is crucial for security and compliance reasons.

  • 👍 Good: logging.info("User {} requested password reset.".format(user_id))
  • 👎 Bad: logging.info("User {} requested password reset. New password is {}.".format(user_id, new_password))

Caveats and solutions in Jupyter Notebooks

  1. Statefulness: Jupyter Notebooks are stateful, which means logging configurations persist across cells. Reset the kernel to clear configurations.
  2. Multiple Handlers: Running a cell multiple times can add duplicate handlers. Make sure to remove existing handlers before adding new ones (one way to do this is shown in the sketch after this list).
  3. Kernel Restart Required for Global Changes: If you make global changes to the logging configuration and want them to take effect, you may need to restart the Jupyter Notebook kernel, which will also clear all your variables and imports.
  4. Asynchronous Output: Jupyter Notebooks can sometimes produce asynchronous output, making logs appear out of order. This can be confusing when you're trying to debug the sequence of events.
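For the first two caveats, one convenient option on Python 3.8+ is the force=True flag of logging.basicConfig, which removes and closes any handlers already attached to the root logger before applying the new configuration:

import logging

# force=True (Python 3.8+) discards existing root handlers first,
# so re-running this cell never produces duplicate log output.
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    force=True,
)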

Conclusion

Logging in Jupyter Notebooks is a straightforward yet powerful way to monitor, debug, and audit your data applications. It becomes even more potent when used in a comprehensive platform like MINEO, where Python notebooks serve as the backbone for various data-centric tasks.

By incorporating logging into your notebooks, you can build more robust, maintainable, and transparent data apps.

Happy coding!

