

Blog do Marcellini
July 24, 2025

In this part, we assess normality in real data both visually (histograms, Q-Q plots) and theoretically (Law of Large Numbers and Central Limit Theorem). Approximate normality is the bridge between descriptive analysis and statistical inference.
Objectives of this post
In this part, based on Levine et al., Statistics for Managers Using Microsoft Excel, we explore when and how the normal distribution can be used as a valid approximation in real-world contexts.
Objectives:
Let's deepen our understanding!
A variable is said to be approximately normally distributed when, even without following the normal curve exactly, its distribution is close enough to normal for statistical methods based on normality to be applied.
Main characteristics:
Important notes:
Typical examples:
Approximately normal variables:
Non-normally distributed variables:
Note: Some variables can approach normality after transformations, such as logarithm or square root.
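To illustrate the note on transformations, the sketch below simulates a right-skewed variable and compares it before and after taking logarithms. The log-normal data, the seed, the variable name `income`, and the helper `skewness` are all assumptions chosen for illustration; none of them come from the original post.

```r
# Illustrative sketch: a log transform can bring right-skewed data closer to normal.
# Assumption: the data are simulated from a log-normal distribution.
set.seed(123)
income <- rlnorm(500, meanlog = 8, sdlog = 0.6)  # hypothetical right-skewed variable

# Simple moment-based sample skewness (hypothetical helper; near 0 for symmetric data)
skewness <- function(x) {
  z <- (x - mean(x)) / sd(x)
  mean(z^3)
}

skew_raw <- skewness(income)       # positive: long right tail
skew_log <- skewness(log(income))  # close to zero: roughly symmetric

# Side-by-side Q-Q plots, before and after the transformation
par(mfrow = c(1, 2))
qqnorm(income, main = "Original data")
qqline(income, col = "red")
qqnorm(log(income), main = "After log transform")
qqline(log(income), col = "red")
par(mfrow = c(1, 1))
```

In the left panel the points should bend away from the line in the upper tail, while the right panel should follow the line much more closely.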
Examples of variables with approximately normal distribution:
Important: Even if the actual distribution is not perfectly normal, a normal approximation is often sufficient for practical applications and statistical inference.
The Q-Q Plot (Quantile-Quantile Plot) is a chart used to compare the distribution of sample data with a theoretical distribution, usually the normal.
Objectives:
How to interpret:
Note: The Q-Q Plot is especially useful with large samples; small departures from the reference line are expected and do not compromise the overall interpretation.
Situation:
Chart:

Chart generated in R from 200 simulated observations of \(X \sim \mathcal N(170,\,8^2)\).
Interpretation: The histogram shows a bell shape, symmetric around the mean. Small variations are expected, but the approximation to the normal distribution is very good.
Situation: The same sample of 200 adult heights (\(\mu=170,\; \sigma=8\)) was used to build the Q-Q plot.
Chart:

Chart generated in R with 200 simulated observations of \(X \sim \mathcal N(170,\,8^2)\).
Interpretation:
Objective: Generate a sample of heights and visualize the histogram and the Q-Q plot directly in RStudio.
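A minimal sketch of this step, assuming the parameters stated for the heights example (200 observations, mean 170, standard deviation 8). The seed, the variable name `heights`, and the plot options are illustrative choices and not necessarily the post's own listing.

```r
# Minimal sketch: simulate 200 adult heights and inspect normality visually.
# Assumptions: seed, variable name, and plot styling are illustrative.
set.seed(42)
heights <- rnorm(200, mean = 170, sd = 8)

# Histogram on the density scale, with the theoretical normal curve overlaid
hist(heights, breaks = 15, freq = FALSE,
     main = "Histogram of simulated heights", xlab = "Height")
curve(dnorm(x, mean = 170, sd = 8), add = TRUE, lwd = 2)

# Q-Q plot: points close to the reference line suggest approximate normality
qqnorm(heights, main = "Normal Q-Q plot of simulated heights")
qqline(heights, col = "red", lwd = 2)
```

Changing the seed produces a different sample, so the histogram will vary slightly from run to run while keeping its bell shape.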


Note: The code generates the plots directly in the RStudio window.
Objective: Build the Histogram and the Q-Q Plot of the height sample using RStudio.
(1) Generate the sample: use rnorm() to create random data from a normal distribution.
(2) Build the Histogram: use the hist() function to visualize the data distribution.
(3) Build the Q-Q Plot: use qqnorm() to plot the sample quantiles and qqline() to add the reference line.
Important: Visualize and interpret the plots on screen before applying statistical methods!
Before applying any statistical technique, it is essential to explore the data visually. Plots such as histograms and Q-Q plots help verify fundamental assumptions, like normality, the presence of outliers, and symmetry of the distribution.
Applying statistical tests without this prior check may lead to misleading or statistically invalid conclusions. Visualization allows you to detect patterns, deviations, and anomalies that numbers alone may not reveal; it is therefore a critical step in the data analysis workflow.
Objective: Apply what you've learned to generate new plots in RStudio.
Task:
(1) Generate a new sample of 200 normally distributed observations with:
(2) Build:
(3) Compare visually:
Hint: Use the same functions: rnorm(), hist(), qqnorm(), qqline().
R Code:
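One possible solution sketch follows. The mean and standard deviation for the task are not shown above, so `new_mean` and `new_sd` are placeholder values chosen only for illustration; substitute the values from the task before running.

```r
# Sketch of a solution using the functions from the hint.
# Assumption: new_mean and new_sd are placeholders, not the task's actual values.
new_mean <- 100  # hypothetical
new_sd   <- 15   # hypothetical

set.seed(2025)
new_sample <- rnorm(200, mean = new_mean, sd = new_sd)

# Histogram of the new sample
hist(new_sample, main = "Histogram of the new sample", xlab = "Value")

# Q-Q plot with reference line
qqnorm(new_sample, main = "Normal Q-Q plot of the new sample")
qqline(new_sample, col = "red")
```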


Interpretation: The new data also approximately follow a normal distribution.
Objective: Build the Histogram and the Q-Q Plot of the height sample using Excel.
Histogram in Excel:
Enter the sample data in a column.
Select the data.
Go to Insert → Statistical Charts → Histogram.
Adjust the number of bins as needed.
Q-Q Plot in Excel:
Sort the sample data (ascending).
Compute the theoretical quantiles: =NORM.INV((ROW()-0.5)/Total, Mean, StdDev), where Total is the sample size. ROW() returns the worksheet row number, so this formula assumes the sorted data start in row 1; if they start lower, subtract the number of rows above the first data cell. (Tip: you can obtain Mean and StdDev from the data using AVERAGE(range) and STDEV.S(range).)
Build an XY (Scatter) plot of sample data vs. theoretical quantiles.
Add a linear trendline for reference.
Note: The Q-Q Plot is manual in Excel, but easy to build!
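The same manual construction can be sketched in R, which makes the role of each Excel step explicit. This is an illustration only: the simulated `heights` stand in for the spreadsheet column, and the variable names are assumptions.

```r
# Sketch: the manual Q-Q construction from the Excel steps, written in R.
# Assumption: simulated heights stand in for the spreadsheet column.
set.seed(42)
heights <- rnorm(200, mean = 170, sd = 8)

n <- length(heights)
sorted_data <- sort(heights)         # step 1: sort ascending
positions <- (seq_len(n) - 0.5) / n  # plotting positions (i - 0.5) / n

# step 2: theoretical quantiles, the counterpart of NORM.INV in Excel
theoretical <- qnorm(positions, mean = mean(heights), sd = sd(heights))

# step 3: scatter plot of sample data against theoretical quantiles
plot(theoretical, sorted_data,
     xlab = "Theoretical quantiles", ylab = "Sample data",
     main = "Manual normal Q-Q plot")

# step 4: least-squares line, comparable to Excel's linear trendline
abline(lm(sorted_data ~ theoretical), col = "red")
```

Here `seq_len(n)` plays the role of `ROW()` for data that start in row 1.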
With large samples, the sample mean tends to be close to the true population mean.
The variability of the sample mean decreases as the sample size increases.
Summary: The LLN ensures that sample means approach the population mean as the sample size grows.
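A short simulation can make the LLN concrete. The sketch below assumes the population from the heights example, \(\mathcal N(170,\,8^2)\), and tracks the running sample mean as observations accumulate; the seed and the total of 10,000 draws are arbitrary choices.

```r
# Sketch: running mean of simulated heights approaching the population mean.
# Assumptions: population N(170, 8^2); seed and number of draws are arbitrary.
set.seed(1)
x <- rnorm(10000, mean = 170, sd = 8)
running_mean <- cumsum(x) / seq_along(x)  # mean of the first n observations

plot(running_mean, type = "l", log = "x",
     xlab = "Sample size n (log scale)", ylab = "Sample mean",
     main = "Law of Large Numbers")
abline(h = 170, col = "red", lty = 2)     # true population mean

# Distance from the true mean at a few sample sizes
abs(running_mean[c(10, 100, 1000, 10000)] - 170)
```

The distances do not have to shrink at every step, but they tend to get smaller as n grows, which is what the LLN describes.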
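The mean of large samples tends to follow a normal distribution.
This holds regardless of the shape of the original distribution, provided the observations are independent and the distribution has finite variance!
Conclusion: The CLT is the theoretical basis for using the normal distribution in practice.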
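The CLT can also be checked by simulation. The sketch below assumes a strongly skewed population (exponential with rate 1, so mean 1 and standard deviation 1) and samples of size 40; these choices, the seed, and the number of replications are illustrative only.

```r
# Sketch: means of samples from a skewed population look approximately normal.
# Assumptions: exponential population (rate 1), n = 40, 5000 replications.
set.seed(7)
n <- 40
sample_means <- replicate(5000, mean(rexp(n, rate = 1)))

par(mfrow = c(1, 2))
hist(rexp(5000, rate = 1), breaks = 40,
     main = "Population (exponential)", xlab = "x")
hist(sample_means, breaks = 40, freq = FALSE,
     main = "Means of samples of size 40", xlab = "Sample mean")
# Normal curve suggested by the CLT: mean 1, standard deviation 1 / sqrt(n)
curve(dnorm(x, mean = 1, sd = 1 / sqrt(n)), add = TRUE, lwd = 2)
par(mfrow = c(1, 1))

# Compare the simulated spread with the CLT value
c(mean = mean(sample_means), sd = sd(sample_means), clt_sd = 1 / sqrt(n))
```

The left histogram is far from bell-shaped, while the histogram of sample means should sit close to the overlaid normal curve.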
\(\sigma\) small \(\rightarrow\) narrower, taller curve.
\(\sigma\) large \(\rightarrow\) wider, flatter curve.
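The effect of \(\sigma\) can be seen by drawing two normal curves with the same mean. In this sketch the mean of 170 follows the heights example, while the two \(\sigma\) values (4 and 12) are arbitrary illustrative choices.

```r
# Sketch: effect of sigma on the normal curve, same mean for both curves.
# Assumption: sigma values 4 and 12 are arbitrary illustrative choices.
curve(dnorm(x, mean = 170, sd = 4), from = 130, to = 210,
      ylab = "Density", main = "Normal curves with different sigma")
curve(dnorm(x, mean = 170, sd = 12), add = TRUE, lty = 2)
legend("topright", legend = c("sigma = 4", "sigma = 12"), lty = c(1, 2))

# The peak height is 1 / (sigma * sqrt(2 * pi)), so larger sigma means a lower peak
peak_small <- dnorm(170, mean = 170, sd = 4)
peak_large <- dnorm(170, mean = 170, sd = 12)
```

Because the total area under each curve is 1, the curve with the larger \(\sigma\) must be lower at the center and more spread out.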

A normal curve with larger \(\sigma\) is narrower? (T or F)
According to the LLN, small samples already reflect the true mean? (T or F)
The CLT explains the prevalence of normality? (T or F)
A normal curve with larger \(\sigma\) is narrower? (F)
According to the LLN, small samples already reflect the true mean? (F)
The CLT explains the prevalence of normality? (T)
Note: Understanding normality is essential to correctly apply statistical tests and make data-driven decisions!
In this final part of the course, you learned:
Part 1: Introduction to the Normal Distribution
Part 2: z-Score and z-Table
Part 3: Plots, CLT, and Approximate Normality (you are here!)
Blog do Marcellini - Exploring Statistics with Rigor and Beauty.
Created by Blog do Marcellini with ❤️ and code.