Statistical Thinking
Statistics is about making decisions under uncertainty. You have incomplete information (a sample) and need to draw conclusions about something bigger (a population). The whole field exists because we can't study everything - we have to work with what we can observe.
Key Vocabulary
- Population: The entire group you're interested in (all UTSC students, all Canadians, all widgets from a factory)
- Sample: A subset of the population that you actually observe
- Parameter: A number that describes the population (e.g., true mean height of all Canadians). Usually unknown.
- Statistic: A number calculated from your sample (e.g., average height of 200 surveyed Canadians). Used to estimate parameters.
- Variable: A characteristic being measured (height, grade, opinion)
Types of Data
| Type | Description | Examples |
| --- | --- | --- |
| Categorical (Nominal) | Categories with no order | Eye color, major, country |
| Categorical (Ordinal) | Categories with a meaningful order | Letter grades (A, B, C), survey ratings (strongly agree → disagree) |
| Numerical (Discrete) | Countable numbers | Number of courses, number of siblings |
| Numerical (Continuous) | Measurable on a scale | Height, weight, GPA, temperature |
Why This Matters
The type of data determines which statistical methods you can use. You can't calculate a mean for eye color (nominal). You can't do a t-test on letter grades (ordinal). Getting this wrong is the first mistake students make on exams.
Descriptive Statistics
Descriptive statistics summarize your data. They answer: "What does this dataset look like?"
Measures of Center
- Mean (x̄): Sum of all values divided by the count. Sensitive to outliers.
- Median: The middle value when data is sorted. Resistant to outliers.
- Mode: The most frequent value. Can have multiple modes or none.
Mean: x̄ = (Σxᵢ) / n
Median: middle value of sorted data (or average of two middle values if n is even)
When to Use Which
Mean when data is roughly symmetric with no extreme outliers (test scores, heights). Median when data is skewed or has outliers (income, house prices). If one person earns $10M and 9 people earn $50K, the mean salary is about $1.05M - wildly misleading. The median is $50K - a much better summary of the typical earner.
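The salary example can be checked directly with Python's standard `statistics` module; a quick sketch using the figures from the text:

```python
# Why the median resists outliers: nine $50K earners plus one $10M earner.
from statistics import mean, median

salaries = [50_000] * 9 + [10_000_000]

print(mean(salaries))    # pulled up to $1,045,000 by the single outlier
print(median(salaries))  # stays at $50,000, the typical salary
```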
Measures of Spread
- Range: Max - Min. Simple but heavily affected by outliers.
- IQR (Interquartile Range): Q3 - Q1. The range of the middle 50% of data. Resistant to outliers.
- Variance (s²): Average of squared deviations from the mean. Hard to interpret directly (units are squared).
- Standard Deviation (s): Square root of variance. In the same units as the data. Most commonly used.
Variance: s² = Σ(xᵢ - x̄)² / (n - 1)
Standard Deviation: s = √s²
Why n-1? This is called Bessel's correction. We divide by n-1 (not n) when calculating sample variance because a sample tends to underestimate population variability. Dividing by n-1 corrects this bias. On exams, always use n-1 for samples.
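The n-1 formula can be verified by hand against the standard library, which also applies Bessel's correction; a short sketch with made-up data:

```python
# Sample variance and standard deviation with Bessel's correction (n - 1).
import math
from statistics import variance, stdev

data = [4, 8, 6, 5, 3, 7]
n = len(data)
xbar = sum(data) / n

s2 = sum((x - xbar) ** 2 for x in data) / (n - 1)  # divide by n - 1, not n
s = math.sqrt(s2)

print(s2, s)  # 3.5 and about 1.87
print(variance(data), stdev(data))  # the statistics module also uses n - 1
```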
Percentiles and Quartiles
- Q1 (25th percentile): 25% of data falls below this value
- Q2 (50th percentile): The median
- Q3 (75th percentile): 75% of data falls below this value
- IQR = Q3 - Q1
- Outlier rule: Values below Q1 - 1.5×IQR or above Q3 + 1.5×IQR are considered outliers
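The quartile and outlier rules above can be sketched with `statistics.quantiles`. One caveat: different software computes quartiles slightly differently, and `quantiles` defaults to the "exclusive" method, so Q1 and Q3 may differ a little from a textbook's values:

```python
# Flagging outliers with the 1.5×IQR rule on illustrative data.
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]

q1, q2, q3 = quantiles(data, n=4)  # Q2 is the median
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = [x for x in data if x < lower or x > upper]
print(outliers)  # [50]
```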
Data Visualization
Different chart types serve different purposes. Choosing the wrong one misrepresents your data.
| Chart | Best For | Data Type |
| --- | --- | --- |
| Histogram | Distribution shape of one numerical variable | Continuous |
| Boxplot | Comparing distributions, spotting outliers | Continuous |
| Bar chart | Comparing counts/proportions across categories | Categorical |
| Scatterplot | Relationship between two numerical variables | Two continuous |
| Line chart | Trends over time | Time series |
| Pie chart | Parts of a whole (use sparingly) | Categorical |
Reading a Boxplot
Boxplots show five-number summary at a glance: minimum, Q1, median, Q3, maximum. The "box" spans from Q1 to Q3 (the IQR). The line inside is the median. "Whiskers" extend to the most extreme non-outlier values. Individual dots beyond the whiskers are outliers.
Distribution Shapes
- Symmetric: Mean ≈ Median. Bell-shaped (normal) or uniform.
- Right-skewed: Long tail to the right. Mean > Median. (Income, house prices)
- Left-skewed: Long tail to the left. Mean < Median. (Age at retirement, exam scores on an easy test)
- Bimodal: Two peaks. May indicate two groups mixed together.
Probability Basics
Probability is the language of uncertainty. It tells you how likely events are, which is the foundation for everything in inferential statistics.
Rules
- P(A) is always between 0 and 1 (or 0% to 100%)
- Complement rule: P(not A) = 1 - P(A)
- Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
- Multiplication rule (independent): P(A and B) = P(A) × P(B)
- Conditional probability: P(A|B) = P(A and B) / P(B)
Bayes' Theorem: P(A|B) = P(B|A) × P(A) / P(B)
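Bayes' theorem is easiest to see with numbers. A sketch with made-up disease-screening figures (1% prevalence, 95% sensitivity, 10% false-positive rate - all illustrative):

```python
# Bayes' theorem: P(disease | positive test) from the pieces we know.
p_disease = 0.01                # P(A): prevalence
p_pos_given_disease = 0.95      # P(B|A): sensitivity
p_pos_given_healthy = 0.10      # false-positive rate

# Total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Bayes: P(A|B) = P(B|A) × P(A) / P(B)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))  # ~0.088: most positives are false alarms
```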
Independence vs. Mutual Exclusivity
These are different concepts that students constantly confuse. Independent: knowing A happened doesn't change the probability of B (coin flips). Mutually exclusive: A and B can't both happen (rolling a 3 and a 5 on one die). If events are mutually exclusive, they CANNOT be independent (unless one has probability 0).
Counting
- Permutations (order matters): P(n,r) = n! / (n-r)!
- Combinations (order doesn't matter): C(n,r) = n! / (r!(n-r)!)
- Quick test: Is "ABC" different from "CBA"? If yes → permutation. If no → combination.
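Python's `math` module implements both counts directly (`math.perm` and `math.comb`, Python 3.8+), so the factorial formulas above are easy to check:

```python
# Permutations vs. combinations, checked against the factorial formulas.
import math

n, r = 5, 3

print(math.perm(n, r))  # 60: ordered selections of 3 from 5
print(math.comb(n, r))  # 10: unordered selections of 3 from 5

assert math.perm(n, r) == math.factorial(n) // math.factorial(n - r)
assert math.comb(n, r) == math.factorial(n) // (math.factorial(r) * math.factorial(n - r))
```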
Distributions
Normal Distribution
The most important distribution in statistics. Bell-shaped, symmetric, defined by mean (μ) and standard deviation (σ).
Empirical Rule (68-95-99.7):
68% of data within 1σ of the mean
95% of data within 2σ of the mean
99.7% of data within 3σ of the mean
Z-Scores
A z-score tells you how many standard deviations a value is from the mean.
z = (x - μ) / σ
z = 0 means the value equals the mean. z = 2 means 2 standard deviations above the mean. Use z-tables or calculators to find probabilities.
Interpreting Z-Scores
If your exam score has z = 1.5, you scored 1.5 standard deviations above the class average - roughly better than 93% of students. Z-scores let you compare across different scales: a z = 1.5 in chemistry is "the same amount of above-average" as z = 1.5 in physics, even if the raw scores look completely different.
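`statistics.NormalDist` can turn a z-score into a percentile. The chemistry and physics class means and SDs below are made up purely to show that different raw scores can share the same z-score:

```python
# Z-scores and percentiles with the standard normal distribution.
from statistics import NormalDist

std_normal = NormalDist(0, 1)
print(round(std_normal.cdf(1.5), 3))  # 0.933: z = 1.5 beats ~93% of scores

# Different scales, same standing (illustrative class means and SDs):
z_chem = (82 - 70) / 8    # chemistry: mean 70, sd 8
z_phys = (78 - 60) / 12   # physics:   mean 60, sd 12
print(z_chem, z_phys)     # both 1.5
```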
Other Common Distributions
| Distribution | When It Appears | Key Feature |
| --- | --- | --- |
| Binomial | Fixed number of yes/no trials | n trials, probability p each |
| Poisson | Counting events in a fixed interval | Rare events, rate λ |
| t-distribution | Small samples, unknown σ | Like normal but heavier tails |
| Chi-square (χ²) | Categorical data, goodness-of-fit | Always right-skewed, df determines shape |
| F-distribution | Comparing variances, ANOVA | Ratio of two chi-squares |
Binomial Distribution
P(X = k) = C(n,k) × p^k × (1-p)^(n-k)
Mean: μ = np
Standard deviation: σ = √(np(1-p))
Requirements: Fixed number of trials (n), each trial is independent, each trial has same probability of success (p), outcomes are binary (success/failure).
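The pmf formula can be coded directly with `math.comb`; the coin-flip numbers below are illustrative:

```python
# Binomial pmf from P(X = k) = C(n,k) · p^k · (1-p)^(n-k).
import math

def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.5
print(binom_pmf(5, n, p))                  # 0.24609375: exactly 5 heads in 10 flips
print(n * p, math.sqrt(n * p * (1 - p)))   # mean 5.0, sd ~1.58

# Sanity check: the pmf sums to 1 over k = 0..n
assert abs(sum(binom_pmf(k, n, p) for k in range(n + 1)) - 1) < 1e-12
```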
Sampling & Estimation
Sampling Distributions
If you take many samples from a population and compute the mean of each sample, those means form a sampling distribution. This is the key idea behind all of inferential statistics.
Central Limit Theorem (CLT)
The most important theorem in statistics:
For large enough n (usually n ≥ 30), the sampling distribution of x̄ is approximately normal,
with mean μ and standard error σ/√n - regardless of the population's shape.
Why CLT Matters
The CLT is why we can use normal-distribution-based methods (z-tests, confidence intervals) even when the population isn't normal. As long as your sample is large enough, the sample mean is approximately normally distributed. This single theorem powers most of the hypothesis testing you'll learn.
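The CLT is easy to see by simulation. A sketch that samples repeatedly from a strongly right-skewed exponential population (mean 1, sd 1) and looks at the distribution of the sample means:

```python
# Rough CLT check: means of samples from a skewed population cluster
# around μ with spread about σ/√n, even though the population isn't normal.
import random
import statistics

random.seed(0)
n = 50            # sample size
n_samples = 2000  # number of repeated samples

sample_means = [
    statistics.mean(random.expovariate(1.0) for _ in range(n))
    for _ in range(n_samples)
]

print(round(statistics.mean(sample_means), 2))   # close to μ = 1.0
print(round(statistics.stdev(sample_means), 2))  # close to σ/√n ≈ 0.14
```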
Confidence Intervals
A confidence interval gives a range of plausible values for a population parameter.
CI = point estimate ± margin of error
For a mean: x̄ ± z* × (s/√n) or x̄ ± t* × (s/√n)
Interpretation: "We are 95% confident that the true population mean falls between [lower, upper]."
Common misconception: A 95% CI does NOT mean there's a 95% probability the parameter is in this interval. The parameter is fixed - it's either in or out. The 95% refers to the method: if we repeated this process many times, 95% of our intervals would contain the true value.
Margin of Error
Margin of error depends on three things:
- Confidence level: Higher confidence → wider interval (99% CI is wider than 95% CI)
- Sample size: Larger n → narrower interval (more data = more precision)
- Variability: More spread in data → wider interval
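Putting the pieces together for a 95% CI for a mean. The data are made up; z* is used here for simplicity, though with n = 10 and unknown σ the t* version from the formula above would be the better choice:

```python
# 95% confidence interval: x̄ ± z* · s/√n  (illustrative data).
import math
from statistics import NormalDist, mean, stdev

data = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3, 12.2, 11.7, 12.4, 12.1]
n = len(data)
xbar, s = mean(data), stdev(data)

z_star = NormalDist().inv_cdf(0.975)  # ≈ 1.96 for 95% confidence
moe = z_star * s / math.sqrt(n)       # margin of error

print(f"{xbar:.2f} ± {moe:.2f}")
print((xbar - moe, xbar + moe))
```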
Hypothesis Testing
This is where most students get confused. Hypothesis testing is a structured way to decide if your data provides enough evidence against a claim.
The Framework
- State hypotheses:
  - H₀ (null): The "nothing interesting is happening" claim. Always includes = sign.
  - H₁ (alternative): What you're trying to show. Can be ≠, <, or >.
- Choose significance level (α): Usually 0.05. This is your threshold for "unlikely enough to reject H₀."
- Calculate test statistic: Measures how far your sample result is from what H₀ predicts.
- Find p-value: Probability of getting a test statistic this extreme (or more) if H₀ is true.
- Make a decision: If p-value ≤ α → reject H₀. If p-value > α → fail to reject H₀.
Critical Language
You never "accept" the null hypothesis - you "fail to reject" it. This matters because absence of evidence is not evidence of absence. Also: "statistically significant" means p ≤ α. It does NOT mean "important" or "large effect." A tiny, meaningless difference can be statistically significant with a large enough sample.
Types of Errors
| Decision | H₀ True | H₀ False |
| --- | --- | --- |
| Reject H₀ | Type I Error (α) - false alarm | Correct decision (Power) |
| Fail to reject H₀ | Correct decision | Type II Error (β) - missed signal |
Type I (α): Concluding there's an effect when there isn't one. Like a fire alarm going off with no fire.
Type II (β): Missing a real effect. Like a fire alarm NOT going off when there IS a fire.
Power = 1 - β: The probability of correctly detecting a real effect. Higher power is better. Power increases with a larger sample size, a higher α, or a larger true effect.
Common Tests
| Test | What It Tests | Requirements |
| --- | --- | --- |
| One-sample z-test | Is the population mean equal to μ₀? | σ known, large n or normal population |
| One-sample t-test | Is the population mean equal to μ₀? | σ unknown, roughly normal or large n |
| Two-sample t-test | Are two population means equal? | Independent samples, roughly normal or large n |
| Paired t-test | Is the mean difference zero? | Paired/matched observations |
| Chi-square goodness-of-fit | Does data fit an expected distribution? | Categorical data, expected counts ≥ 5 |
| Chi-square test of independence | Are two categorical variables related? | Two categorical variables, expected counts ≥ 5 |
| One-sample z-test for proportion | Is the population proportion equal to p₀? | np₀ ≥ 10 and n(1-p₀) ≥ 10 |
Test Statistic Formulas
z-test (mean): z = (x̄ - μ₀) / (σ/√n)
t-test (mean): t = (x̄ - μ₀) / (s/√n), df = n - 1
z-test (proportion): z = (p̂ - p₀) / √(p₀(1-p₀)/n)
Chi-square: χ² = Σ (O - E)² / E
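As a worked example, the z-test for a proportion from the formulas above, with made-up survey numbers (116 yeses out of 200, testing H₀: p = 0.5):

```python
# One-sample z-test for a proportion (two-sided), illustrative numbers.
import math
from statistics import NormalDist

n, successes, p0 = 200, 116, 0.5
p_hat = successes / n  # 0.58

# z = (p̂ - p₀) / √(p₀(1-p₀)/n); conditions np₀ ≥ 10 and n(1-p₀) ≥ 10 hold
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value

print(round(z, 2), round(p_value, 3))
# z ≈ 2.26, p ≈ 0.024 ≤ 0.05 → reject H₀ at α = 0.05
```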
Correlation & Regression
Correlation (r)
Measures the strength and direction of a linear relationship between two numerical variables.
- r = 1: perfect positive linear relationship
- r = -1: perfect negative linear relationship
- r = 0: no linear relationship (but there could still be a nonlinear one!)
- Rough guidelines: |r| > 0.7 is strong, 0.3-0.7 moderate, below 0.3 weak (exact cutoffs vary by field)
Correlation ≠ Causation
This gets repeated a lot, but it's genuinely the most important thing in statistics. Ice cream sales and drowning deaths are strongly correlated - but ice cream doesn't cause drowning. Both increase in summer (confounding variable). Always ask: could a third variable explain this relationship?
Simple Linear Regression
ŷ = b₀ + b₁x
b₁ = r × (s_y / s_x)
b₀ = ȳ - b₁x̄
- b₁ (slope): For each 1-unit increase in x, ŷ changes by b₁ units
- b₀ (intercept): The predicted value of y when x = 0 (may not be meaningful)
- ŷ (y-hat): The predicted value - NOT the actual observed value
- Residual: y - ŷ (actual minus predicted). Positive = model underpredicted.
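The slope and intercept formulas can be applied by hand. A sketch with made-up study-hours vs. exam-score data (all numbers are illustrative):

```python
# Simple linear regression from b₁ = r·(s_y/s_x) and b₀ = ȳ - b₁·x̄.
import math
from statistics import mean, stdev

hours = [1, 2, 3, 4, 5, 6, 7, 8]
score = [52, 58, 61, 67, 70, 75, 79, 84]

xbar, ybar = mean(hours), mean(score)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(hours, score))
sxx = sum((x - xbar) ** 2 for x in hours)
syy = sum((y - ybar) ** 2 for y in score)

r = sxy / math.sqrt(sxx * syy)
b1 = r * (stdev(score) / stdev(hours))  # equivalent to sxy / sxx
b0 = ybar - b1 * xbar

print(f"ŷ = {b0:.2f} + {b1:.2f}x   (r = {r:.3f}, R² = {r * r:.3f})")

resid = score[0] - (b0 + b1 * hours[0])  # actual minus predicted
print(round(resid, 3))  # negative: the model overpredicted this point
```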
R² (Coefficient of Determination)
R² = r². It tells you what percentage of the variation in y is explained by x.
Example: If r = 0.8, then R² = 0.64, meaning 64% of the variation in y is explained by the linear relationship with x. The other 36% is unexplained (other factors, randomness).
Checking Regression Assumptions
- Linearity: Scatterplot should show a roughly linear pattern
- Independence: Observations are independent (no time-series autocorrelation)
- Normal residuals: Residuals should be roughly normally distributed
- Equal variance (homoscedasticity): Residuals should have constant spread across all x values
Check these by plotting residuals vs. fitted values. If you see a pattern (funnel shape, curve), the assumptions are violated.
Choosing the Right Test
This is the skill that separates students who get A's from students who memorize formulas. On exams, you'll often need to identify which test to use from a word problem.
Decision Guide
- What are you comparing?
- One group vs. a known value → one-sample test
- Two independent groups → two-sample test
- Same group measured twice → paired test
- Relationship between variables → correlation/regression
- What type of data?
- Numerical (means) → z-test or t-test
- Categorical (counts/proportions) → chi-square or z-test for proportions
- Do you know σ?
- Yes → z-test (rare in practice)
- No → t-test (almost always)
Exam Strategy
Look for keywords in the problem. "Proportion" or "percentage" → z-test for proportions. "Average" or "mean" → t-test. "Relationship between" → correlation/regression. "Categories" or "counts" → chi-square. "Before and after" or "same subjects" → paired t-test.
Common Mistakes
- Confusing p-value with probability of H₀ - p-value is P(data | H₀), not P(H₀ | data). This is the single most common misconception in statistics.
- "Accepting" the null hypothesis - You never accept H₀. You "fail to reject" it. Not finding evidence against something is not the same as proving it's true.
- Using mean for skewed data - Median is better for skewed distributions. Report both and explain which is more appropriate.
- Confusing correlation with causation - Always consider confounding variables and study design (observational vs. experimental).
- Forgetting to check conditions - Every test has assumptions (normality, independence, sample size). State and verify them.
- Wrong test choice - Using a z-test when σ is unknown (use t-test), or using a two-sample test when data is paired.
- Misinterpreting confidence intervals - "95% confident" refers to the method, not the probability that this specific interval contains the parameter.
- Using n instead of n-1 for sample variance - Always use n-1 (Bessel's correction) when working with samples.
- Extrapolating beyond the data - A regression line for study hours (1-8) tells you nothing about what happens at 20 hours.
- Ignoring practical significance - A result can be statistically significant but practically meaningless. A drug that lowers blood pressure by 0.1 mmHg might have p = 0.01 with 100,000 subjects, but the effect is clinically worthless.
Frequently Asked Questions
What topics are covered in intro to statistics?
A typical intro stats course covers: types of data, descriptive statistics (mean, median, standard deviation), data visualization (histograms, boxplots, scatterplots), probability basics, normal distribution, sampling distributions, confidence intervals, hypothesis testing (z-tests, t-tests, chi-square), correlation, and simple linear regression.
Is statistics harder than calculus?
Statistics and calculus are hard in different ways. Calculus is procedural - learn the rules, apply them. Statistics is conceptual - you need to understand what tests mean, when to use them, and how to interpret results. Many students find stats harder because there's more judgment involved. But stats requires less raw math ability than calculus.
How do I study for a statistics exam?
Focus on understanding concepts, not memorizing formulas. Practice interpreting results - what does a p-value of 0.03 mean in context? Work through practice problems that require choosing the right test, not just computing. Make a decision tree for which test to use when. Do past exams under timed conditions.
What is a p-value in simple terms?
A p-value is the probability of getting results as extreme as yours (or more extreme) if the null hypothesis were true. A small p-value (typically < 0.05) means your results would be unlikely under the null hypothesis, so you reject it. A p-value is NOT the probability that the null hypothesis is true.
What is the difference between a population and a sample?
A population is the entire group you want to study (e.g., all UTSC students). A sample is a subset that you actually observe (e.g., 200 randomly selected UTSC students). We use samples because studying entire populations is usually impossible. Statistics gives us tools to make inferences about populations based on samples.
Do I need to memorize all the formulas?
Most stats exams provide a formula sheet. Your job isn't to memorize formulas - it's to know which formula to use and how to interpret the result. That said, you should be able to write z = (x̄ - μ₀)/(σ/√n) and the confidence interval formula from memory, since these come up constantly. Focus on understanding over memorization.