Transforming Data Summaries: Moving Beyond Pandas Describe with Two Powerful Libraries
Probably the first (or second) thing I do when I load any Pandas or Polars DataFrame is describe it, using the df.describe() method.
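However, I always find its output pretty basic: for numeric columns it reports little more than the count, mean, standard deviation, and quartiles. In other words, it hardly highlights any key information about the data, such as how much is missing or what each distribution looks like.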
But some time back, I came across two pretty cool libraries that IMMENSELY supercharge this DataFrame summary.
Since then, I don’t think I have ever used the describe() method.
Let’s learn about them!
Skimpy
The first one is Skimpy, a lightweight tool that works in Jupyter notebooks (and in the console) and provides a standardized and comprehensive data summary.
This includes data shape, column data types, column summary statistics, distribution charts, missing-value statistics, etc.
What’s more, the summary is grouped by datatypes for faster analysis.
This is the code to use Skimpy:
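A minimal sketch of that usage, assuming Skimpy is installed (pip install skimpy). The DataFrame here is a made-up example rather than data from this post, and the fallback to describe() is only there so the snippet still runs where Skimpy is missing.

```python
import pandas as pd

# Made-up dataset, used only for illustration.
df = pd.DataFrame(
    {
        "age": [25, 32, None, 41],          # numeric column with one missing value
        "city": ["Pune", "Delhi", "Pune", "Mumbai"],
        "is_active": [True, False, True, True],
    }
)

try:
    from skimpy import skim  # pip install skimpy

    # Prints a summary table grouped by column data type.
    skim(df)
except ImportError:
    # Skimpy is not installed: fall back to the built-in summary.
    print(df.describe(include="all"))
```

In a notebook, the skim(df) call on its own is all you need once the import has succeeded.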
SummaryTools
The second one is SummaryTools, which does almost the exact same thing as Skimpy, i.e., it generates a standardized report:
This is the code to use SummaryTools:
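A minimal sketch of that usage, assuming the Python summarytools package is installed (pip install summarytools) and run inside a Jupyter notebook, where the returned object is displayed as an HTML table. The DataFrame is again a made-up example, and the import guard is only there so the snippet does not fail where the package is missing.

```python
import pandas as pd

# Made-up dataset, used only for illustration.
df = pd.DataFrame(
    {
        "age": [25, 32, None, 41],          # numeric column with one missing value
        "city": ["Pune", "Delhi", "Pune", "Mumbai"],
        "is_active": [True, False, True, True],
    }
)

try:
    from summarytools import dfSummary  # pip install summarytools
except ImportError:
    dfSummary = None  # package not installed

summary = None
if dfSummary is not None:
    # One row per column: stats, frequencies, a small graph, missing counts.
    summary = dfSummary(df)

# In a notebook, leave `summary` as the last expression of the cell to render it.
summary
```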
Two pretty cool things about SummaryTools are that it can create:
1. A collapsible summary of the dataset.
2. A tabbed summary of the dataset.
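Both variants can be sketched roughly as follows. This assumes the is_collapsible argument of dfSummary and the tabset helper as I understand them from the package README, and it assumes dfSummary returns a pandas Styler so that to_html() is available (older pandas versions used render() instead). Treat the exact signatures as assumptions and check them against the version you have installed. The two small DataFrames and the tab names are hypothetical.

```python
import pandas as pd

# Two made-up datasets, used only for illustration.
customers = pd.DataFrame({"age": [25, 32, None, 41], "city": ["Pune", "Delhi", "Pune", "Mumbai"]})
orders = pd.DataFrame({"amount": [120.5, 80.0, 42.3], "paid": [True, False, True]})

try:
    from summarytools import dfSummary, tabset  # pip install summarytools
except ImportError:
    dfSummary = tabset = None  # package not installed

if dfSummary is not None:
    # 1. Collapsible summary: the report is wrapped in an expandable section.
    collapsible = dfSummary(customers, is_collapsible=True)

    # 2. Tabbed summary: one tab per dataset, keyed by a label of your choice.
    #    Assumes each value is the HTML string of a rendered summary.
    tabset(
        {
            "customers": dfSummary(customers).to_html(),
            "orders": dfSummary(orders).to_html(),
        }
    )
```

Both outputs are meant to be viewed in a Jupyter notebook. Outside a notebook the calls still run, but there is nothing to render the HTML.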
Happy Learning!
ABOUT THE AUTHOR
Harshit Sanwal
Marketing Analyst, DataMantra