Transforming Data Summaries: Moving Beyond Pandas Describe with Two Powerful Libraries

Probably the first (or second) thing I do when I load any Pandas or Polars DataFrame is describe it, using the df.describe() method.

However, I always find its output pretty bare: a short table of counts, means, and quantiles, which in Pandas covers only the numeric columns by default. In other words, it hardly highlights any key information about the data.
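To make that concrete, here is a small sketch on a made-up DataFrame (the column names and values are illustrative assumptions, not a real dataset). By default, Pandas describes only the numeric columns, and missing values show up only indirectly as a lower count:

```python
import pandas as pd

# Made-up data for illustration: two numeric columns and one text column
df = pd.DataFrame(
    {
        "age": [25.0, 32.0, None, 41.0],
        "income": [52000, 61000, 58000, 75000],
        "city": ["Pune", "Delhi", "Pune", "Mumbai"],
    }
)

default_summary = df.describe()            # numeric columns only
full_summary = df.describe(include="all")  # also covers the text column

print(default_summary)
print(full_summary)
```

Even with include="all", the result is still one flat table with no charts and no grouping by data type.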

But some time back, I came across two pretty cool libraries that IMMENSELY supercharge this DataFrame summary.

Since then, I don’t think I have ever used the describe() method.

Let’s learn about them!

Skimpy

The first one is Skimpy, a lightweight tool that prints a standardized and comprehensive data summary in the console or in a Jupyter notebook.

This includes the data shape, column data types, per-column summary statistics, distribution charts, missing-value statistics, and more.

What’s more, the summary is grouped by datatypes for faster analysis.

Using Skimpy takes a single call: import skim from the skimpy package and pass it the DataFrame.

SummaryTools

The second one is SummaryTools, which does almost exactly the same thing as Skimpy: it generates a standardized report of the DataFrame.

Using SummaryTools is just as short: import dfSummary from the summarytools package and call it on the DataFrame.

Two pretty cool things about SummaryTools are that it can create:

1. A collapsible summary of the dataset.

2. A tabbed summary of the dataset.

Happy Learning!

ABOUT THE AUTHOR

Marketing Analyst, DataMantra