Range — A Measure of Spread

Lets see how scattered our data is ?

Range — A Measure of Spread

The range is a simple measure of spread or dispersion in statistics, representing the difference between the maximum and minimum values in a dataset. It provides an indication of how spread out the values are.

How to Calculate the Range

  1. Identify the Maximum Value: Find the highest value in the dataset.
  2. Identify the Minimum Value: Find the lowest value in the dataset.
  3. Subtract the Minimum from the Maximum: The range is the result of this subtraction.

Range=Maximum Value−Minimum Value

Example:-

Let’s Consider the following dataset: 3,7,8,15,203, 7, 8, 15, 203,7,8,15,20

  1. Maximum Value: 20
  2. Minimum Value: 3
  3. Range: 20−3= 17

Also, What the range provides is a quick and rough estimate of the spread of data values within a set.

Let’s say, we have below data.

Since the range of Class A is smaller than in Class B, can we claim that the age distribution in Class A is more clustered (closely related) than in Class B ?

In other words, are the ages listed in Class A more uniform than in Class B ?

Not so fast ! This is, in fact, the biggest limitation of using the range to describe the spread of data within a set.

The reason is that it can drastically be affected by outliers (values that are not typical as compared to the rest of the elements in the set).

If we disregard the outliers in Class B (ages 11 and 18), the “new” range becomes…

This is now equal to the range of Class A. So the “big take” from this example is to be very careful when interpreting the values of the range, especially when comparing two sets.

Let’s see some more use cases of the Range in data analysis:-

     1. Quick Summary of Spread: The range gives a quick sense of the spread of the data.

  • Example: In a classroom, knowing that the range of test scores is 40 (with a minimum score of 50 and a maximum of 90) quickly indicates the spread of scores.

2. Initial Data Analysis: The range can be used in exploratory data analysis to get an initial sense of the variability.

  • Example: In quality control, the range of measurements can quickly indicate if a process is producing items within acceptable limits. If you have worked in the manufacturing sector, by computing the range the manufacturers can ensure consistency and adherence to specifications, and control charts are often used in the same. UCL & LCL Limit

3. Comparing Variability: When comparing datasets, the range can give an idea of which dataset has more variability.

  • Example: Comparing the range of daily temperatures in two different cities can indicate which city has more variable weather.

My Favourite one now,

4) Finance: Range plays a significant role in financial analysis and risk management. For instance, in stock market analysis, calculating the range of price fluctuations over a specific period helps investors assess the volatility of a particular stock.

A wider range indicates higher volatility, which may imply greater risk but also potential for higher returns. Traders often use range-based indicators like average True range (ATR) to determine stop-loss levels or identify potential entry and exit points.

Limitations of the Range

  • Ignores Intermediate Values: The range does not consider any data points other than the minimum and maximum, so it ignores the distribution of the rest of the values.
  • Sensitive to Outliers: A single extreme value can significantly alter the range, making it less reliable for datasets with outliers.
  • Limited Information: The range does not provide information about the distribution of the values within the dataset.

To be honest they are not oftenly used in Data Science because there are other smart measures like Variance and Standard Deviation who works most of the time.

Bonus Tip:-

What is the Rule of Thumb?

As per the rule of thumb, the range of data lies within four standard deviations. Two standard deviations above the mean and two standard deviations below the mean.

Python Code for Calculating Range:-

import pandas as pd
# Example dataset
data = {'values': [3, 7, 8, 15, 20]}
df = pd.DataFrame(data)
# Calculate the range
data_range = df['values'].max() - df['values'].min()
print("Range:", data_range)
def calculate_range(data):
max_value = max(data)
min_value = min(data)
return max_value - min_value
# Example dataset
data = [3, 7, 8, 15, 20]
data_range = calculate_range(data)
print("Range:", data_range)

About the Author

                                     Feel Free to reach out to tarunsachdeva7997@gmail.com