Looking at Distributions

Sociology 312/412/512, University of Oregon

Aaron Gullickson

What is a Distribution?

The concept of a distribution

When we refer to the distribution of a variable, we are referring to how the different values of that variable are distributed across the given observations.

Look at it

We can make a plot that shows the distribution.
We make different kinds of plots for categorical and quantitative variables.
- Barplots for categorical variables
- Histograms for quantitative variables

Measure it

We can calculate summary measures of the center and spread of the distribution.
We can only calculate summary measures for quantitative variables.

The Center of a Distribution

Calculating the mean

The mean (represented mathematically as $\bar{x}$ ) is calculated by summing the values of the variable and dividing by the number of observations, or in math speak: $\bar{x} = \frac{\sum_{i = 1}^{n} x_{i}}{n}$

😱 Equations??!!

Don’t panic! We will walk through what these symbols mean.

$x_{i}$ : We use a lower-case letter like $x$ or $y$ to refer to a generic variable. The subscript indicates a particular observation. So, $x_{1}$ means the value of variable $x$ for the first observation, and $x_{i}$ means the value of $x$ for some generic observation.
$n$ : We use $n$ to refer generically to the number of observations. So, $x_{n}$ gives the value of $x$ for the last observation.
We use the $\sum(\text{something})$ term to say “sum something up.” In this case, $\sum_{i = 1}^{n} x_{i}$ means to “sum the variable $x$ from the first observation to the last.”
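
To see the notation in action, here is a small worked example. The three values are made up for illustration and are not taken from the course data. Suppose $n = 3$ with $x_{1} = 90$, $x_{2} = 100$, and $x_{3} = 110$:

```latex
\[
\sum_{i = 1}^{3} x_{i} = x_{1} + x_{2} + x_{3} = 90 + 100 + 110 = 300,
\qquad
\bar{x} = \frac{\sum_{i = 1}^{3} x_{i}}{3} = \frac{300}{3} = 100
\]
```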

Calculate the mean in R

$\bar{x} = \frac{\sum_{i = 1}^{n} x_{i}}{n}$

To calculate the mean we just sum up all the values of $x$ and divide by the number of observations. The sum command will sum up a variable and the nrow command will give us the number of observations, so:

sum(movies$runtime)/nrow(movies)

[1] 106.8222

The mean movie runtime is 106.8 minutes.

Alternatively, we could just use the mean command in R: 😎

mean(movies$runtime)

[1] 106.8222
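
If you want to convince yourself that the two approaches agree, you can try them on a tiny dataset where the arithmetic is easy to check by hand. The sketch below assumes a made-up data frame called toy with four hypothetical runtimes. It is not the course's movies dataset.

```r
# Toy data: hypothetical runtimes in minutes (an assumption for illustration,
# not the movies dataset used in the slides)
toy <- data.frame(runtime = c(90, 100, 110, 120))

# Mean by hand: sum the values, then divide by the number of observations
mean_by_hand <- sum(toy$runtime) / nrow(toy)

# Mean using R's built-in mean function
mean_builtin <- mean(toy$runtime)

# Print both results; they should be identical
mean_by_hand
mean_builtin
```

Here the sum is 420 and there are 4 observations, so both lines should print 105.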

