Transforming Data Summaries: Moving Beyond Pandas' describe() with Two Powerful Libraries

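Probably the first (or second) thing I do after loading any Pandas or Polars DataFrame is to describe it with the df.describe() method.
However, I usually find its output pretty naive and of little practical use. In other words, it hardly highlights the key information about the data. In Pandas, for instance, it summarizes only numeric columns by default and reports neither column data types nor explicit missing-value counts.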
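To make this concrete, here is a minimal sketch on a small, made-up DataFrame (the column names and values are purely illustrative). In Pandas, describe() silently leaves out the text column unless you ask for it with include="all".

```python
import numpy as np
import pandas as pd

# Small, made-up DataFrame with a numeric column, a text column, and gaps
df = pd.DataFrame({
    "age": [25, 32, np.nan, 41],
    "city": ["Pune", "Delhi", "Pune", None],
})

# Default describe(): only the numeric "age" column is summarized
summary = df.describe()
print(summary)

# The text column appears only when explicitly requested
full_summary = df.describe(include="all")
print(full_summary)
```

Note that the count row (3 for age, out of 4 rows) is the only hint that a value is missing; you have to compare it against the row count yourself.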

But some time back, I came across two pretty cool libraries that IMMENSELY supercharge this DataFrame summary.
Since then, I don’t think I have used the describe() method even once.
Let’s learn about them!
Skimpy
The first one is Skimpy, a lightweight tool that works in Jupyter notebooks (as well as in the terminal) and provides a standardized, comprehensive data summary.
This includes the data shape, column data types, per-column summary statistics, small distribution charts, missing-value statistics, and more.

What’s more, the summary is grouped by data type, which makes it faster to scan.
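For a sense of what that saves, here is a rough plain-Pandas sketch of just three of those pieces (shape, missing-value statistics, and columns grouped by data type), again on made-up data. It is only an approximation for comparison, not how Skimpy works internally.

```python
import numpy as np
import pandas as pd

# Made-up example data with one missing value per column
df = pd.DataFrame({
    "age": [25, 32, np.nan, 41],
    "score": [88.5, 92.0, 79.5, np.nan],
    "city": ["Pune", "Delhi", "Pune", None],
})

print("Shape:", df.shape)

# Missing-value count and percentage per column
missing = pd.DataFrame({
    "n_missing": df.isna().sum(),
    "pct_missing": df.isna().mean() * 100,
})
print(missing)

# Group column names by dtype, loosely mirroring Skimpy's per-type sections
by_dtype = {}
for col, dtype in df.dtypes.items():
    by_dtype.setdefault(str(dtype), []).append(col)
print(by_dtype)
```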
Using Skimpy takes just one function call, skim(), on the DataFrame.

SummaryTools
The second one is SummaryTools, which does almost exactly what Skimpy does: it generates a standardized summary report of the DataFrame.

Its main entry point is the dfSummary() function, which renders the report inside a Jupyter notebook.

Two pretty cool things about SummaryTools are that it can create:
1. A collapsible summary of a dataset.
2. A tabbed summary, for example with one tab per dataset.

Happy Learning!
ABOUT THE AUTHOR

Harshit Sanwal
Marketing Analyst, DataMantra