Transforming Data Summaries: Moving Beyond Pandas Describe with Two Powerful Libraries

Probably the first (or second) thing I do when I load any Pandas or Polars DataFrame is describe it, using the df.describe() method.

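However, I always find its output pretty bare and of limited use. In Pandas, for instance, the default output covers only the count, mean, standard deviation, min, max, and quartiles of the numeric columns, so it hardly highlights key information such as data types, missing values, or distributions.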

Source: Daily Dose of Data Science
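As a point of reference, here is a minimal sketch of that baseline in Pandas. The DataFrame and its column names are made up for illustration. Note that the default output skips the non-numeric `city` column entirely and hints at missing values only through a lower `count`.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: two numeric columns (one with a gap) and one text column
df = pd.DataFrame({
    "age": [25, 32, 47, np.nan, 51],
    "city": ["Pune", "Delhi", "Delhi", "Mumbai", "Pune"],
    "income": [42000.0, 58000.0, 61000.0, 39000.0, 75000.0],
})

# Default describe(): summary statistics for the numeric columns only
summary = df.describe()
print(summary)
```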

But some time back, I came across two pretty cool libraries that IMMENSELY supercharge this DataFrame summary.

Since then, I don’t think I have ever used the describe() method.

Let’s learn about them!

Skimpy

The first one is Skimpy, a lightweight tool that works in Jupyter as well as in the console and provides a standardized and comprehensive data summary.

This includes data shape, column data types, column summary statistics, distribution charts, missing stats, etc., as shown below:

Source: Daily Dose of Data Science

What’s more, the summary is grouped by data type for faster analysis.

This is the code to use Skimpy:

Source: Daily Dose of Data Science

SummaryTools

The second one is SummaryTools, which does almost the exact same thing as Skimpy, i.e., it generates a standardized report:

Source: Daily Dose of Data Science

This is the code to use SummaryTools:

Source: Daily Dose of Data Science

Two pretty cool things about SummaryTools are that it can create:

1. A collapsible summary of the dataset, as illustrated below:

Source: Daily Dose of Data Science

2. A tabbed summary of the dataset, as shown below:

Source: Daily Dose of Data Science

Happy Learning!

ABOUT THE AUTHOR

Harshit Sanwal

Marketing Analyst, DataMantra