Statistics for Beginners

A complete guide to intro statistics for university students. From descriptive stats to hypothesis testing - everything you need for your Stats 101 course.

Contents
1. Statistical Thinking
2. Descriptive Statistics
3. Data Visualization
4. Probability Basics
5. Distributions
6. Sampling & Estimation
7. Hypothesis Testing
8. Correlation & Regression
9. Choosing the Right Test
10. Common Mistakes
11. FAQ

Statistical Thinking

Statistics is about making decisions under uncertainty. You have incomplete information (a sample) and need to draw conclusions about something bigger (a population). The whole field exists because we can't study everything - we have to work with what we can observe.

Key Vocabulary

Types of Data

| Type | Description | Examples |
| --- | --- | --- |
| Categorical (Nominal) | Categories with no order | Eye color, major, country |
| Categorical (Ordinal) | Categories with a meaningful order | Letter grades (A, B, C), survey ratings (strongly agree → disagree) |
| Numerical (Discrete) | Countable numbers | Number of courses, number of siblings |
| Numerical (Continuous) | Measurable on a scale | Height, weight, GPA, temperature |

Why This Matters

The type of data determines which statistical methods you can use. You can't calculate a mean for eye color (nominal). You can't do a t-test on letter grades (ordinal). Getting this wrong is the first mistake students make on exams.

Descriptive Statistics

Descriptive statistics summarize your data. They answer: "What does this dataset look like?"

Measures of Center

Mean: x̄ = (Σxi) / n
Median: middle value of sorted data (or average of two middle values if n is even)

When to Use Which

Mean when data is roughly symmetric with no extreme outliers (test scores, heights). Median when data is skewed or has outliers (income, house prices). If someone earns $10M and 9 people earn $50K, the mean salary is $1.05M - misleading. The median is $50K - accurate.
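A quick sketch with Python's standard library shows the salary example in action (the figures are the hypothetical ones from the text):

```python
from statistics import mean, median

# Hypothetical salaries: nine people at $50K and one at $10M
salaries = [50_000] * 9 + [10_000_000]

print(mean(salaries))    # 1_045_000 -> the outlier drags the mean up
print(median(salaries))  # 50_000    -> the median resists the outlier
```

The single $10M salary pulls the mean to about $1.05M, a value that describes nobody, while the median stays at the typical $50K.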

Measures of Spread

Variance: s² = Σ(xi - x̄)² / (n - 1)
Standard Deviation: s = √s²

Why n-1? This is called Bessel's correction. We divide by n-1 (not n) when calculating sample variance because a sample tends to underestimate population variability. Dividing by n-1 corrects this bias. On exams, always use n-1 for samples.
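Bessel's correction is easy to verify against Python's statistics module, which divides by n - 1 in variance() and by n in pvariance() (the sample data here is made up):

```python
from statistics import pvariance, variance

sample = [4, 8, 6, 5, 3, 7]  # made-up sample data
n = len(sample)
xbar = sum(sample) / n

# Sample variance with Bessel's correction: divide by n - 1
s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)
assert s2 == variance(sample)  # matches the library's sample variance

# Dividing by n instead gives the smaller, biased estimate
print(s2, pvariance(sample))  # 3.5 vs. 2.916...
```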

Percentiles and Quartiles

The pth percentile is the value below which p% of the data falls. Quartiles split the sorted data into four equal parts: Q1 (25th percentile), Q2 (the median, 50th percentile), and Q3 (75th percentile). The interquartile range, IQR = Q3 - Q1, measures the spread of the middle 50% of the data and is the basis of the usual outlier rule.

Data Visualization

Different chart types serve different purposes. Choosing the wrong one misrepresents your data.

| Chart | Best For | Data Type |
| --- | --- | --- |
| Histogram | Distribution shape of one numerical variable | Continuous |
| Boxplot | Comparing distributions, spotting outliers | Continuous |
| Bar chart | Comparing counts/proportions across categories | Categorical |
| Scatterplot | Relationship between two numerical variables | Two continuous |
| Line chart | Trends over time | Time series |
| Pie chart | Parts of a whole (use sparingly) | Categorical |

Reading a Boxplot

Boxplots show five-number summary at a glance: minimum, Q1, median, Q3, maximum. The "box" spans from Q1 to Q3 (the IQR). The line inside is the median. "Whiskers" extend to the most extreme non-outlier values. Individual dots beyond the whiskers are outliers.
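As a rough sketch, the five-number summary and the 1.5 × IQR outlier rule behind a boxplot can be computed with the standard library (made-up data; note that quantiles() uses the "exclusive" method by default, so hand calculations with another method may differ slightly):

```python
from statistics import quantiles

data = [2, 4, 4, 5, 6, 7, 8, 9, 12, 30]  # made-up data with one large value

q1, q2, q3 = quantiles(data, n=4)  # quartiles ("exclusive" method by default)
iqr = q3 - q1

# Common outlier rule: flag points beyond 1.5 * IQR past the box
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in data if x < low_fence or x > high_fence]

print(min(data), q1, q2, q3, max(data))  # five-number summary: 2 4.0 6.5 9.75 30
print(outliers)                          # [30]
```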

Distribution Shapes

Symmetric: a mirror image around the center, with mean ≈ median. Right-skewed: a long tail to the right, so mean > median (e.g., income). Left-skewed: a long tail to the left, so mean < median (e.g., scores on an easy exam).

Probability Basics

Probability is the language of uncertainty. It tells you how likely events are, which is the foundation for everything in inferential statistics.

Rules

Complement: P(not A) = 1 - P(A)
Addition: P(A or B) = P(A) + P(B) - P(A and B)
Multiplication (independent events): P(A and B) = P(A) × P(B)
Conditional Probability: P(A|B) = P(A and B) / P(B)
Bayes' Theorem: P(A|B) = P(B|A) × P(A) / P(B)

Independence vs. Mutual Exclusivity

These are different concepts that students constantly confuse. Independent: knowing A happened doesn't change the probability of B (coin flips). Mutually exclusive: A and B can't both happen (rolling a 3 and a 5 on one die). If events are mutually exclusive, they CANNOT be independent (unless one has probability 0).
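A tiny check with exact fractions makes the distinction concrete, using the one-die example above:

```python
from fractions import Fraction

# One fair die: A = "roll a 3", B = "roll a 5" (mutually exclusive)
p_a = Fraction(1, 6)
p_b = Fraction(1, 6)
p_a_and_b = Fraction(0)  # they can't both happen on one roll

# Independence would require P(A and B) == P(A) * P(B)
print(p_a * p_b)    # 1/36
print(p_a_and_b)    # 0 -> not equal, so not independent
```

Since P(A and B) = 0 but P(A) × P(B) = 1/36, the mutually exclusive events fail the independence condition.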

Counting

Permutations (order matters): P(n,k) = n! / (n-k)!
Combinations (order doesn't matter): C(n,k) = n! / (k!(n-k)!)
Combinations count the ways to choose k items from n, and appear in the binomial formula below.

Distributions

Normal Distribution

The most important distribution in statistics. Bell-shaped, symmetric, defined by mean (μ) and standard deviation (σ).

Empirical Rule (68-95-99.7):
68% of data within 1σ of the mean
95% of data within 2σ of the mean
99.7% of data within 3σ of the mean

Z-Scores

A z-score tells you how many standard deviations a value is from the mean.

z = (x - μ) / σ

z = 0 means the value equals the mean. z = 2 means 2 standard deviations above the mean. Use z-tables or calculators to find probabilities.

Interpreting Z-Scores

If your exam score has z = 1.5, you scored 1.5 standard deviations above the class average - roughly better than 93% of students. Z-scores let you compare across different scales: a z = 1.5 in chemistry is "the same amount of above-average" as z = 1.5 in physics, even if the raw scores look completely different.
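A minimal sketch of converting a score to a z-score and then to a percentile, using the error function in place of a z-table (the exam numbers are hypothetical):

```python
from math import erf, sqrt

def z_score(x, mu, sigma):
    # How many standard deviations x lies from the mean
    return (x - mu) / sigma

def normal_cdf(z):
    # P(Z <= z) for a standard normal, via the error function
    return 0.5 * (1 + erf(z / sqrt(2)))

# Hypothetical exam: class mean 70, standard deviation 8, your score 82
z = z_score(82, 70, 8)
print(z)              # 1.5
print(normal_cdf(z))  # ~0.933, i.e. better than about 93% of the class
```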

Other Common Distributions

| Distribution | When It Appears | Key Feature |
| --- | --- | --- |
| Binomial | Fixed number of yes/no trials | n trials, probability p each |
| Poisson | Counting events in a fixed interval | Rare events, rate λ |
| t-distribution | Small samples, unknown σ | Like normal but heavier tails |
| Chi-square (χ²) | Categorical data, goodness-of-fit | Always right-skewed, df determines shape |
| F-distribution | Comparing variances, ANOVA | Ratio of two chi-squares |

Binomial Distribution

P(X = k) = C(n,k) × p^k × (1-p)^(n-k)
Mean: μ = np    Standard deviation: σ = √(np(1-p))

Requirements: Fixed number of trials (n), each trial is independent, each trial has same probability of success (p), outcomes are binary (success/failure).
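The binomial formula can be sketched directly with math.comb (the true/false quiz scenario here is a made-up illustration):

```python
from math import comb, sqrt

def binom_pmf(k, n, p):
    # P(X = k) = C(n,k) * p^k * (1-p)^(n-k)
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Hypothetical quiz: 10 true/false questions answered by pure guessing
n, p = 10, 0.5
print(binom_pmf(5, n, p))            # P(exactly 5 correct) ~ 0.246
print(n * p, sqrt(n * p * (1 - p)))  # mean 5.0, sd ~ 1.58
```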

Sampling & Estimation

Sampling Distributions

If you take many samples from a population and compute the mean of each sample, those means form a sampling distribution. This is the key idea behind all of inferential statistics.

Central Limit Theorem (CLT)

The most important theorem in statistics:

For large enough n (usually n ≥ 30), the sampling distribution of x̄ is approximately normal,
with mean μ and standard error σ/√n - regardless of the population's shape.

Why CLT Matters

The CLT is why we can use normal-distribution-based methods (z-tests, confidence intervals) even when the population isn't normal. As long as your sample is large enough, the sample mean is approximately normally distributed. This single theorem powers most of the hypothesis testing you'll learn.
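A short simulation illustrates the point: even for a strongly skewed exponential population (mean 1, sd 1), the sample means cluster around μ with spread close to σ/√n. The sample size and number of repetitions here are arbitrary choices for the sketch:

```python
import random
from statistics import mean, stdev

random.seed(0)  # reproducible simulation

# Skewed population: exponential distribution with mean 1 and sd 1
def sample_mean(n):
    return mean(random.expovariate(1.0) for _ in range(n))

n = 30
means = [sample_mean(n) for _ in range(5000)]

# CLT: sample means center on mu = 1 with spread near 1/sqrt(30) ~ 0.183
print(mean(means))
print(stdev(means))
```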

Confidence Intervals

A confidence interval gives a range of plausible values for a population parameter.

CI = point estimate ± margin of error
For a mean: x̄ ± z* × (s/√n)   or   x̄ ± t* × (s/√n)

Interpretation: "We are 95% confident that the true population mean falls between [lower, upper]."

Common misconception: A 95% CI does NOT mean there's a 95% probability the parameter is in this interval. The parameter is fixed - it's either in or out. The 95% refers to the method: if we repeated this process many times, 95% of our intervals would contain the true value.

Margin of Error

Margin of error depends on three things:

  1. Confidence level: Higher confidence → wider interval (99% CI is wider than 95% CI)
  2. Sample size: Larger n → narrower interval (more data = more precision)
  3. Variability: More spread in data → wider interval
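Putting the pieces together, a 95% CI for a mean can be sketched as follows (the scores are fabricated, and z* = 1.96 is used since n = 40 is large):

```python
from math import sqrt
from statistics import mean, stdev

# Fabricated sample of n = 40 exam scores (10 values repeated for illustration)
scores = [72, 68, 75, 80, 64, 77, 71, 69, 83, 74] * 4
n = len(scores)
xbar, s = mean(scores), stdev(scores)

z_star = 1.96  # critical value for 95% confidence
moe = z_star * s / sqrt(n)  # margin of error

print(f"95% CI: ({xbar - moe:.2f}, {xbar + moe:.2f})")
```

Raising the confidence level increases z* (wider interval), while increasing n shrinks s/√n (narrower interval), matching the three factors above.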

Hypothesis Testing

This is where most students get confused. Hypothesis testing is a structured way to decide if your data provides enough evidence against a claim.

The Framework

  1. State hypotheses:
    • H₀ (null): The "nothing interesting is happening" claim. Always includes = sign.
    • H₁ (alternative): What you're trying to show. Can be ≠, <, or >.
  2. Choose significance level (α): Usually 0.05. This is your threshold for "unlikely enough to reject H₀."
  3. Calculate test statistic: Measures how far your sample result is from what H₀ predicts.
  4. Find p-value: Probability of getting a test statistic this extreme (or more) if H₀ is true.
  5. Make a decision: If p-value ≤ α → reject H₀. If p-value > α → fail to reject H₀.

Critical Language

You never "accept" the null hypothesis - you "fail to reject" it. This matters because absence of evidence is not evidence of absence. Also: "statistically significant" means p ≤ α. It does NOT mean "important" or "large effect." A tiny, meaningless difference can be statistically significant with a large enough sample.

Types of Errors

| | H₀ True | H₀ False |
| --- | --- | --- |
| Reject H₀ | Type I Error (α) - false alarm | Correct decision (Power) |
| Fail to reject H₀ | Correct decision | Type II Error (β) - missed signal |

Type I (α): Concluding there's an effect when there isn't one. Like a fire alarm going off with no fire.

Type II (β): Missing a real effect. Like a fire alarm NOT going off when there IS a fire.

Power = 1 - β: The probability of correctly detecting a real effect. Higher power is better. You can increase power by increasing the sample size or raising α; power is also naturally higher when the true effect is larger.
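A quick simulation (with σ = 1 assumed known and a made-up effect size of 0.5) shows power rising with sample size:

```python
import random
from math import sqrt
from statistics import mean

random.seed(1)  # reproducible

def power_sim(n, effect=0.5, z_crit=1.96, trials=2000):
    # Simulated power of a two-sided z-test of H0: mu = 0, sigma = 1 known,
    # when the true mean is `effect` (a made-up effect size).
    rejections = 0
    for _ in range(trials):
        xbar = mean(random.gauss(effect, 1) for _ in range(n))
        z = xbar / (1 / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

print(power_sim(20))  # roughly 0.6
print(power_sim(80))  # close to 1: larger n, higher power
```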

Common Tests

| Test | What It Tests | Requirements |
| --- | --- | --- |
| One-sample z-test | Is the population mean equal to μ₀? | σ known, large n or normal population |
| One-sample t-test | Is the population mean equal to μ₀? | σ unknown, roughly normal or large n |
| Two-sample t-test | Are two population means equal? | Independent samples, roughly normal or large n |
| Paired t-test | Is the mean difference zero? | Paired/matched observations |
| Chi-square goodness-of-fit | Does data fit an expected distribution? | Categorical data, expected counts ≥ 5 |
| Chi-square test of independence | Are two categorical variables related? | Two categorical variables, expected counts ≥ 5 |
| One-sample z-test for proportion | Is the population proportion equal to p₀? | np₀ ≥ 10 and n(1-p₀) ≥ 10 |

Test Statistic Formulas

z-test (mean): z = (x̄ - μ₀) / (σ/√n)

t-test (mean): t = (x̄ - μ₀) / (s/√n),   df = n - 1

z-test (proportion): z = (p̂ - p₀) / √(p₀(1-p₀)/n)

Chi-square: χ² = Σ (O - E)² / E
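The one-sample t statistic above can be computed by hand in a few lines (a made-up sample; the cutoff ±2.365 is the two-sided α = 0.05 critical value for df = 7):

```python
from math import sqrt
from statistics import mean, stdev

# Made-up sample: did this class average differ from mu0 = 70?
sample = [72, 75, 68, 74, 77, 71, 73, 76]
n = len(sample)
xbar, s = mean(sample), stdev(sample)
mu0 = 70

t = (xbar - mu0) / (s / sqrt(n))  # t = (xbar - mu0) / (s / sqrt(n))
df = n - 1

print(t, df)  # t ~ 3.15 with df = 7; beyond +/-2.365, so reject H0 at alpha = 0.05
```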

Correlation & Regression

Correlation (r)

Measures the strength and direction of a linear relationship between two numerical variables.

Correlation ≠ Causation

This gets repeated a lot, but it's genuinely the most important thing in statistics. Ice cream sales and drowning deaths are strongly correlated - but ice cream doesn't cause drowning. Both increase in summer (confounding variable). Always ask: could a third variable explain this relationship?

Simple Linear Regression

ŷ = b₀ + b₁x

b₁ = r × (s_y / s_x)     b₀ = ȳ - b₁x̄

R² (Coefficient of Determination)

R² = r². It tells you what percentage of the variation in y is explained by x.

Example: If r = 0.8, then R² = 0.64, meaning 64% of the variation in y is explained by the linear relationship with x. The other 36% is unexplained (other factors, randomness).
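The slope and intercept formulas above can be sketched directly from their definitions (the hours/scores data is invented for illustration):

```python
from statistics import mean, stdev

# Hypothetical data: study hours vs. exam score (made up for illustration)
hours  = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [55, 60, 58, 68, 70, 75, 74, 82]

n = len(hours)
xbar, ybar = mean(hours), mean(scores)
sx, sy = stdev(hours), stdev(scores)

# Pearson correlation from the definition
r = sum((x - xbar) * (y - ybar) for x, y in zip(hours, scores)) / ((n - 1) * sx * sy)

# Regression coefficients via b1 = r * sy/sx and b0 = ybar - b1 * xbar
b1 = r * sy / sx
b0 = ybar - b1 * xbar

print(r, r**2)  # r and R^2 (share of variation in y explained by x)
print(b1, b0)   # slope and intercept of the fitted line
```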

Checking Regression Assumptions

  1. Linearity: Scatterplot should show a roughly linear pattern
  2. Independence: Observations are independent (no time-series autocorrelation)
  3. Normal residuals: Residuals should be roughly normally distributed
  4. Equal variance (homoscedasticity): Residuals should have constant spread across all x values

Check these by plotting residuals vs. fitted values. If you see a pattern (funnel shape, curve), the assumptions are violated.

Choosing the Right Test

This is the skill that separates students who get A's from students who memorize formulas. On exams, you'll often need to identify which test to use from a word problem.

Decision Guide

  1. What are you comparing?
    • One group vs. a known value → one-sample test
    • Two independent groups → two-sample test
    • Same group measured twice → paired test
    • Relationship between variables → correlation/regression
  2. What type of data?
    • Numerical (means) → z-test or t-test
    • Categorical (counts/proportions) → chi-square or z-test for proportions
  3. Do you know σ?
    • Yes → z-test (rare in practice)
    • No → t-test (almost always)

Exam Strategy

Look for keywords in the problem. "Proportion" or "percentage" → z-test for proportions. "Average" or "mean" → t-test. "Relationship between" → correlation/regression. "Categories" or "counts" → chi-square. "Before and after" or "same subjects" → paired t-test.

Common Mistakes

  1. Confusing p-value with probability of H₀ - p-value is P(data | H₀), not P(H₀ | data). This is the single most common misconception in statistics.
  2. "Accepting" the null hypothesis - You never accept H₀. You "fail to reject" it. Not finding evidence against something is not the same as proving it's true.
  3. Using mean for skewed data - Median is better for skewed distributions. Report both and explain which is more appropriate.
  4. Confusing correlation with causation - Always consider confounding variables and study design (observational vs. experimental).
  5. Forgetting to check conditions - Every test has assumptions (normality, independence, sample size). State and verify them.
  6. Wrong test choice - Using a z-test when σ is unknown (use t-test), or using a two-sample test when data is paired.
  7. Misinterpreting confidence intervals - "95% confident" refers to the method, not the probability that this specific interval contains the parameter.
  8. Using n instead of n-1 for sample variance - Always use n-1 (Bessel's correction) when working with samples.
  9. Extrapolating beyond the data - A regression line for study hours (1-8) tells you nothing about what happens at 20 hours.
  10. Ignoring practical significance - A result can be statistically significant but practically meaningless. A drug that lowers blood pressure by 0.1 mmHg might have p = 0.01 with 100,000 subjects, but the effect is clinically worthless.

Frequently Asked Questions

What topics are covered in intro to statistics?
A typical intro stats course covers: types of data, descriptive statistics (mean, median, standard deviation), data visualization (histograms, boxplots, scatterplots), probability basics, normal distribution, sampling distributions, confidence intervals, hypothesis testing (z-tests, t-tests, chi-square), correlation, and simple linear regression.
Is statistics harder than calculus?
Statistics and calculus are hard in different ways. Calculus is procedural - learn the rules, apply them. Statistics is conceptual - you need to understand what tests mean, when to use them, and how to interpret results. Many students find stats harder because there's more judgment involved. But stats requires less raw math ability than calculus.
How do I study for a statistics exam?
Focus on understanding concepts, not memorizing formulas. Practice interpreting results - what does a p-value of 0.03 mean in context? Work through practice problems that require choosing the right test, not just computing. Make a decision tree for which test to use when. Do past exams under timed conditions.
What is a p-value in simple terms?
A p-value is the probability of getting results as extreme as yours (or more extreme) if the null hypothesis were true. A small p-value (typically < 0.05) means your results would be unlikely under the null hypothesis, so you reject it. A p-value is NOT the probability that the null hypothesis is true.
What is the difference between a population and a sample?
A population is the entire group you want to study (e.g., all UTSC students). A sample is a subset that you actually observe (e.g., 200 randomly selected UTSC students). We use samples because studying entire populations is usually impossible. Statistics gives us tools to make inferences about populations based on samples.
Do I need to memorize all the formulas?
Most stats exams provide a formula sheet. Your job isn't to memorize formulas - it's to know which formula to use and how to interpret the result. That said, you should be able to write z = (x̄ - μ₀)/(σ/√n) and the confidence interval formula from memory, since these come up constantly. Focus on understanding over memorization.
