Statistical Thinking
Statistics is about making decisions under uncertainty. You have incomplete information (a sample) and need to draw conclusions about something bigger (a population). The whole field exists because we can't study everything - we have to work with what we can observe.
Key Vocabulary
- Population: The entire group you're interested in (all UTSC students, all Canadians, all widgets from a factory)
- Sample: A subset of the population that you actually observe
- Parameter: A number that describes the population (e.g., true mean height of all Canadians). Usually unknown.
- Statistic: A number calculated from your sample (e.g., average height of 200 surveyed Canadians). Used to estimate parameters.
- Variable: A characteristic being measured (height, grade, opinion)
Types of Data
| Type | Description | Examples |
| --- | --- | --- |
| Categorical (Nominal) | Categories with no order | Eye color, major, country |
| Categorical (Ordinal) | Categories with a meaningful order | Letter grades (A, B, C), survey ratings (strongly agree → disagree) |
| Numerical (Discrete) | Countable numbers | Number of courses, number of siblings |
| Numerical (Continuous) | Measurable on a scale | Height, weight, GPA, temperature |
Why This Matters
The type of data determines which statistical methods you can use. You can't calculate a mean for eye color (nominal). You can't do a t-test on letter grades (ordinal). Getting this wrong is the first mistake students make on exams.
Descriptive Statistics
Descriptive statistics summarize your data. They answer: "What does this dataset look like?"
Measures of Center
- Mean (x̄): Sum of all values divided by the count. Sensitive to outliers.
- Median: The middle value when data is sorted. Resistant to outliers.
- Mode: The most frequent value. Can have multiple modes or none.
Mean: x̄ = (Σxᵢ) / n
Median: middle value of sorted data (or average of two middle values if n is even)
When to Use Which
Mean when data is roughly symmetric with no extreme outliers (test scores, heights). Median when data is skewed or has outliers (income, house prices). If one person earns $10M and 9 people earn $50K, the mean salary is about $1.05M - wildly misleading. The median is $50K - a much better summary of the typical earner.
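The salary example can be checked directly with Python's standard `statistics` module; a quick sketch using the figures from the text:

```python
# Why the median resists outliers: nine $50K earners plus one $10M earner.
from statistics import mean, median

salaries = [50_000] * 9 + [10_000_000]

print(mean(salaries))    # pulled up to $1,045,000 by the single outlier
print(median(salaries))  # stays at $50,000, the typical salary
```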
Measures of Spread
- Range: Max - Min. Simple but heavily affected by outliers.
- IQR (Interquartile Range): Q3 - Q1. The range of the middle 50% of data. Resistant to outliers.
- Variance (s²): Average of squared deviations from the mean. Hard to interpret directly (units are squared).
- Standard Deviation (s): Square root of variance. In the same units as the data. Most commonly used.
Variance: s² = Σ(xᵢ - x̄)² / (n - 1)
Standard Deviation: s = √s²
Why n-1? This is called Bessel's correction. We divide by n-1 (not n) when calculating sample variance because a sample tends to underestimate population variability. Dividing by n-1 corrects this bias. On exams, always use n-1 for samples.
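The n-1 formula can be verified by hand against the standard library, which also applies Bessel's correction; a short sketch with made-up data:

```python
# Sample variance and standard deviation with Bessel's correction (n - 1).
import math
from statistics import variance, stdev

data = [4, 8, 6, 5, 3, 7]
n = len(data)
xbar = sum(data) / n

s2 = sum((x - xbar) ** 2 for x in data) / (n - 1)  # divide by n - 1, not n
s = math.sqrt(s2)

print(s2, s)  # 3.5 and about 1.87
print(variance(data), stdev(data))  # the statistics module also uses n - 1
```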
Percentiles and Quartiles
- Q1 (25th percentile): 25% of data falls below this value
- Q2 (50th percentile): The median
- Q3 (75th percentile): 75% of data falls below this value
- IQR = Q3 - Q1
- Outlier rule: Values below Q1 - 1.5×IQR or above Q3 + 1.5×IQR are considered outliers
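The quartile and outlier rules above can be sketched with `statistics.quantiles`. One caveat: different software computes quartiles slightly differently, and `quantiles` defaults to the "exclusive" method, so Q1 and Q3 may differ a little from a textbook's values:

```python
# Flagging outliers with the 1.5×IQR rule on illustrative data.
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]

q1, q2, q3 = quantiles(data, n=4)  # Q2 is the median
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = [x for x in data if x < lower or x > upper]
print(outliers)  # [50]
```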
Data Visualization
Different chart types serve different purposes. Choosing the wrong one misrepresents your data.
| Chart | Best For | Data Type |
| --- | --- | --- |
| Histogram | Distribution shape of one numerical variable | Continuous |
| Boxplot | Comparing distributions, spotting outliers | Continuous |
| Bar chart | Comparing counts/proportions across categories | Categorical |
| Scatterplot | Relationship between two numerical variables | Two continuous |
| Line chart | Trends over time | Time series |
| Pie chart | Parts of a whole (use sparingly) | Categorical |
Reading a Boxplot
Boxplots show five-number summary at a glance: minimum, Q1, median, Q3, maximum. The "box" spans from Q1 to Q3 (the IQR). The line inside is the median. "Whiskers" extend to the most extreme non-outlier values. Individual dots beyond the whiskers are outliers.
Distribution Shapes
- Symmetric: Mean ≈ Median. Bell-shaped (normal) or uniform.
- Right-skewed: Long tail to the right. Mean > Median. (Income, house prices)
- Left-skewed: Long tail to the left. Mean < Median. (Age at retirement, exam scores on an easy test)
- Bimodal: Two peaks. May indicate two groups mixed together.
Probability Basics
Probability is the language of uncertainty. It tells you how likely events are, which is the foundation for everything in inferential statistics.
Rules
- P(A) is always between 0 and 1 (or 0% to 100%)
- Complement rule: P(not A) = 1 - P(A)
- Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
- Multiplication rule (independent): P(A and B) = P(A) × P(B)
- Conditional probability: P(A|B) = P(A and B) / P(B)
Bayes' Theorem: P(A|B) = P(B|A) × P(A) / P(B)
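Bayes' theorem is easiest to see with numbers. A sketch with made-up disease-screening figures (1% prevalence, 95% sensitivity, 10% false-positive rate - all illustrative):

```python
# Bayes' theorem: P(disease | positive test) from the pieces we know.
p_disease = 0.01                # P(A): prevalence
p_pos_given_disease = 0.95      # P(B|A): sensitivity
p_pos_given_healthy = 0.10      # false-positive rate

# Total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Bayes: P(A|B) = P(B|A) × P(A) / P(B)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))  # ~0.088: most positives are false alarms
```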
Independence vs. Mutual Exclusivity
These are different concepts that students constantly confuse. Independent: knowing A happened doesn't change the probability of B (coin flips). Mutually exclusive: A and B can't both happen (rolling a 3 and a 5 on one die). If events are mutually exclusive, they CANNOT be independent (unless one has probability 0).
Counting
- Permutations (order matters): P(n,r) = n! / (n-r)!
- Combinations (order doesn't matter): C(n,r) = n! / (r!(n-r)!)
- Quick test: Is "ABC" different from "CBA"? If yes → permutation. If no → combination.
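Python's `math` module implements both counts directly (`math.perm` and `math.comb`, Python 3.8+), so the factorial formulas above are easy to check:

```python
# Permutations vs. combinations, checked against the factorial formulas.
import math

n, r = 5, 3

print(math.perm(n, r))  # 60: ordered selections of 3 from 5
print(math.comb(n, r))  # 10: unordered selections of 3 from 5

assert math.perm(n, r) == math.factorial(n) // math.factorial(n - r)
assert math.comb(n, r) == math.factorial(n) // (math.factorial(r) * math.factorial(n - r))
```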
Distributions
Normal Distribution
The most important distribution in statistics. Bell-shaped, symmetric, defined by mean (μ) and standard deviation (σ).
Empirical Rule (68-95-99.7):
68% of data within 1σ of the mean
95% of data within 2σ of the mean
99.7% of data within 3σ of the mean
Z-Scores
A z-score tells you how many standard deviations a value is from the mean.
z = (x - μ) / σ
z = 0 means the value equals the mean. z = 2 means 2 standard deviations above the mean. Use z-tables or calculators to find probabilities.
Interpreting Z-Scores
If your exam score has z = 1.5, you scored 1.5 standard deviations above the class average - roughly better than 93% of students. Z-scores let you compare across different scales: a z = 1.5 in chemistry is "the same amount of above-average" as z = 1.5 in physics, even if the raw scores look completely different.
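`statistics.NormalDist` can turn a z-score into a percentile. The chemistry and physics class means and SDs below are made up purely to show that different raw scores can share the same z-score:

```python
# Z-scores and percentiles with the standard normal distribution.
from statistics import NormalDist

std_normal = NormalDist(0, 1)
print(round(std_normal.cdf(1.5), 3))  # 0.933: z = 1.5 beats ~93% of scores

# Different scales, same standing (illustrative class means and SDs):
z_chem = (82 - 70) / 8    # chemistry: mean 70, sd 8
z_phys = (78 - 60) / 12   # physics:   mean 60, sd 12
print(z_chem, z_phys)     # both 1.5
```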
Other Common Distributions
| Distribution | When It Appears | Key Feature |
| --- | --- | --- |
| Binomial | Fixed number of yes/no trials | n trials, probability p each |
| Poisson | Counting events in a fixed interval | Rare events, rate λ |
| t-distribution | Small samples, unknown σ | Like normal but heavier tails |
| Chi-square (χ²) | Categorical data, goodness-of-fit | Always right-skewed, df determines shape |
| F-distribution | Comparing variances, ANOVA | Ratio of two chi-squares |
Binomial Distribution
P(X = k) = C(n,k) × p^k × (1-p)^(n-k)
Mean: μ = np
Standard deviation: σ = √(np(1-p))
Requirements: Fixed number of trials (n), each trial is independent, each trial has same probability of success (p), outcomes are binary (success/failure).
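The pmf formula can be coded directly with `math.comb`; the coin-flip numbers below are illustrative:

```python
# Binomial pmf from P(X = k) = C(n,k) · p^k · (1-p)^(n-k).
import math

def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.5
print(binom_pmf(5, n, p))                  # 0.24609375: exactly 5 heads in 10 flips
print(n * p, math.sqrt(n * p * (1 - p)))   # mean 5.0, sd ~1.58

# Sanity check: the pmf sums to 1 over k = 0..n
assert abs(sum(binom_pmf(k, n, p) for k in range(n + 1)) - 1) < 1e-12
```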
Sampling & Estimation
Sampling Distributions
If you take many samples from a population and compute the mean of each sample, those means form a sampling distribution. This is the key idea behind all of inferential statistics.
Central Limit Theorem (CLT)
The most important theorem in statistics:
For large enough n (usually n ≥ 30), the sampling distribution of x̄ is approximately normal,
with mean μ and standard error σ/√n - regardless of the population's shape.
Why CLT Matters
The CLT is why we can use normal-distribution-based methods (z-tests, confidence intervals) even when the population isn't normal. As long as your sample is large enough, the sample mean is approximately normally distributed. This single theorem powers most of the hypothesis testing you'll learn.
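The CLT is easy to see by simulation. A sketch that samples repeatedly from a strongly right-skewed exponential population (mean 1, sd 1) and looks at the distribution of the sample means:

```python
# Rough CLT check: means of samples from a skewed population cluster
# around μ with spread about σ/√n, even though the population isn't normal.
import random
import statistics

random.seed(0)
n = 50            # sample size
n_samples = 2000  # number of repeated samples

sample_means = [
    statistics.mean(random.expovariate(1.0) for _ in range(n))
    for _ in range(n_samples)
]

print(round(statistics.mean(sample_means), 2))   # close to μ = 1.0
print(round(statistics.stdev(sample_means), 2))  # close to σ/√n ≈ 0.14
```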
Confidence Intervals
A confidence interval gives a range of plausible values for a population parameter.
CI = point estimate ± margin of error
For a mean: x̄ ± z* × (s/√n) or x̄ ± t* × (s/√n)
Interpretation: "We are 95% confident that the true population mean falls between [lower, upper]."
Common misconception: A 95% CI does NOT mean there's a 95% probability the parameter is in this interval. The parameter is fixed - it's either in or out. The 95% refers to the method: if we repeated this process many times, 95% of our intervals would contain the true value.
Margin of Error
Margin of error depends on three things:
- Confidence level: Higher confidence → wider interval (99% CI is wider than 95% CI)
- Sample size: Larger n → narrower interval (more data = more precision)
- Variability: More spread in data → wider interval
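Putting the pieces together for a 95% CI for a mean. The data are made up; z* is used here for simplicity, though with n = 10 and unknown σ the t* version from the formula above would be the better choice:

```python
# 95% confidence interval: x̄ ± z* · s/√n  (illustrative data).
import math
from statistics import NormalDist, mean, stdev

data = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3, 12.2, 11.7, 12.4, 12.1]
n = len(data)
xbar, s = mean(data), stdev(data)

z_star = NormalDist().inv_cdf(0.975)  # ≈ 1.96 for 95% confidence
moe = z_star * s / math.sqrt(n)       # margin of error

print(f"{xbar:.2f} ± {moe:.2f}")
print((xbar - moe, xbar + moe))
```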
Hypothesis Testing
This is where most students get confused. Hypothesis testing is a structured way to decide if your data provides enough evidence against a claim.
The Framework
- State hypotheses:
  - H₀ (null): The "nothing interesting is happening" claim. Always includes = sign.
  - H₁ (alternative): What you're trying to show. Can be ≠, <, or >.
- Choose significance level (α): Usually 0.05. This is your threshold for "unlikely enough to reject H₀."
- Calculate test statistic: Measures how far your sample result is from what H₀ predicts.
- Find p-value: Probability of getting a test statistic this extreme (or more) if H₀ is true.
- Make a decision: If p-value ≤ α → reject H₀. If p-value > α → fail to reject H₀.
Critical Language
You never "accept" the null hypothesis - you "fail to reject" it. This matters because absence of evidence is not evidence of absence. Also: "statistically significant" means p ≤ α. It does NOT mean "important" or "large effect." A tiny, meaningless difference can be statistically significant with a large enough sample.
Types of Errors
| Decision | H₀ True | H₀ False |
| --- | --- | --- |
| Reject H₀ | Type I Error (α) - false alarm | Correct decision (Power) |
| Fail to reject H₀ | Correct decision | Type II Error (β) - missed signal |
Type I (α): Concluding there's an effect when there isn't one. Like a fire alarm going off with no fire.
Type II (β): Missing a real effect. Like a fire alarm NOT going off when there IS a fire.
Power = 1 - β: The probability of correctly detecting a real effect. Higher power is better. Power increases with a larger sample size, a higher α, or a larger true effect.
Common Tests
| Test | What It Tests | Requirements |
| --- | --- | --- |
| One-sample z-test | Is the population mean equal to μ₀? | σ known, large n or normal population |
| One-sample t-test | Is the population mean equal to μ₀? | σ unknown, roughly normal or large n |
| Two-sample t-test | Are two population means equal? | Independent samples, roughly normal or large n |
| Paired t-test | Is the mean difference zero? | Paired/matched observations |
| Chi-square goodness-of-fit | Does data fit an expected distribution? | Categorical data, expected counts ≥ 5 |
| Chi-square test of independence | Are two categorical variables related? | Two categorical variables, expected counts ≥ 5 |
| One-sample z-test for proportion | Is the population proportion equal to p₀? | np₀ ≥ 10 and n(1-p₀) ≥ 10 |
Test Statistic Formulas
z-test (mean): z = (x̄ - μ₀) / (σ/√n)
t-test (mean): t = (x̄ - μ₀) / (s/√n), df = n - 1
z-test (proportion): z = (p̂ - p₀) / √(p₀(1-p₀)/n)
Chi-square: χ² = Σ (O - E)² / E
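As a worked example, the z-test for a proportion from the formulas above, with made-up survey numbers (116 yeses out of 200, testing H₀: p = 0.5):

```python
# One-sample z-test for a proportion (two-sided), illustrative numbers.
import math
from statistics import NormalDist

n, successes, p0 = 200, 116, 0.5
p_hat = successes / n  # 0.58

# z = (p̂ - p₀) / √(p₀(1-p₀)/n); conditions np₀ ≥ 10 and n(1-p₀) ≥ 10 hold
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value

print(round(z, 2), round(p_value, 3))
# z ≈ 2.26, p ≈ 0.024 ≤ 0.05 → reject H₀ at α = 0.05
```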
Correlation & Regression
Correlation (r)
Measures the strength and direction of a linear relationship between two numerical variables.
- r = 1: perfect positive linear relationship
- r = -1: perfect negative linear relationship
- r = 0: no linear relationship (but there could still be a nonlinear one!)
- Rough guidelines: |r| > 0.7 is strong, 0.3-0.7 moderate, below 0.3 weak (exact cutoffs vary by field)
Correlation ≠ Causation
This gets repeated a lot, but it's genuinely the most important thing in statistics. Ice cream sales and drowning deaths are strongly correlated - but ice cream doesn't cause drowning. Both increase in summer (confounding variable). Always ask: could a third variable explain this relationship?
Simple Linear Regression
ŷ = b₀ + b₁x
b₁ = r × (s_y / s_x)
b₀ = ȳ - b₁x̄
- b₁ (slope): For each 1-unit increase in x, ŷ changes by b₁ units
- b₀ (intercept): The predicted value of y when x = 0 (may not be meaningful)
- ŷ (y-hat): The predicted value - NOT the actual observed value
- Residual: y - ŷ (actual minus predicted). Positive = model underpredicted.
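The slope and intercept formulas can be applied by hand. A sketch with made-up study-hours vs. exam-score data (all numbers are illustrative):

```python
# Simple linear regression from b₁ = r·(s_y/s_x) and b₀ = ȳ - b₁·x̄.
import math
from statistics import mean, stdev

hours = [1, 2, 3, 4, 5, 6, 7, 8]
score = [52, 58, 61, 67, 70, 75, 79, 84]

xbar, ybar = mean(hours), mean(score)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(hours, score))
sxx = sum((x - xbar) ** 2 for x in hours)
syy = sum((y - ybar) ** 2 for y in score)

r = sxy / math.sqrt(sxx * syy)
b1 = r * (stdev(score) / stdev(hours))  # equivalent to sxy / sxx
b0 = ybar - b1 * xbar

print(f"ŷ = {b0:.2f} + {b1:.2f}x   (r = {r:.3f}, R² = {r * r:.3f})")

resid = score[0] - (b0 + b1 * hours[0])  # actual minus predicted
print(round(resid, 3))  # negative: the model overpredicted this point
```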
R² (Coefficient of Determination)
R² = r². It tells you what percentage of the variation in y is explained by x.
Example: If r = 0.8, then R² = 0.64, meaning 64% of the variation in y is explained by the linear relationship with x. The other 36% is unexplained (other factors, randomness).
Checking Regression Assumptions
- Linearity: Scatterplot should show a roughly linear pattern
- Independence: Observations are independent (no time-series autocorrelation)
- Normal residuals: Residuals should be roughly normally distributed
- Equal variance (homoscedasticity): Residuals should have constant spread across all x values
Check these by plotting residuals vs. fitted values. If you see a pattern (funnel shape, curve), the assumptions are violated.
Choosing the Right Test
This is the skill that separates students who get A's from students who memorize formulas. On exams, you'll often need to identify which test to use from a word problem.
Decision Guide
- What are you comparing?
- One group vs. a known value → one-sample test
- Two independent groups → two-sample test
- Same group measured twice → paired test
- Relationship between variables → correlation/regression
- What type of data?
- Numerical (means) → z-test or t-test
- Categorical (counts/proportions) → chi-square or z-test for proportions
- Do you know σ?
- Yes → z-test (rare in practice)
- No → t-test (almost always)
Exam Strategy
Look for keywords in the problem. "Proportion" or "percentage" → z-test for proportions. "Average" or "mean" → t-test. "Relationship between" → correlation/regression. "Categories" or "counts" → chi-square. "Before and after" or "same subjects" → paired t-test.
Common Mistakes
- Confusing p-value with probability of H₀ - p-value is P(data | H₀), not P(H₀ | data). This is the single most common misconception in statistics.
- "Accepting" the null hypothesis - You never accept H₀. You "fail to reject" it. Not finding evidence against something is not the same as proving it's true.
- Using mean for skewed data - Median is better for skewed distributions. Report both and explain which is more appropriate.
- Confusing correlation with causation - Always consider confounding variables and study design (observational vs. experimental).
- Forgetting to check conditions - Every test has assumptions (normality, independence, sample size). State and verify them.
- Wrong test choice - Using a z-test when σ is unknown (use t-test), or using a two-sample test when data is paired.
- Misinterpreting confidence intervals - "95% confident" refers to the method, not the probability that this specific interval contains the parameter.
- Using n instead of n-1 for sample variance - Always use n-1 (Bessel's correction) when working with samples.
- Extrapolating beyond the data - A regression line for study hours (1-8) tells you nothing about what happens at 20 hours.
- Ignoring practical significance - A result can be statistically significant but practically meaningless. A drug that lowers blood pressure by 0.1 mmHg might have p = 0.01 with 100,000 subjects, but the effect is clinically worthless.
Frequently Asked Questions
What topics are covered in intro to statistics?
A typical intro stats course covers: types of data, descriptive statistics (mean, median, standard deviation), data visualization (histograms, boxplots, scatterplots), probability basics, normal distribution, sampling distributions, confidence intervals, hypothesis testing (z-tests, t-tests, chi-square), correlation, and simple linear regression.
Is statistics harder than calculus?
Statistics and calculus are hard in different ways. Calculus is procedural - learn the rules, apply them. Statistics is conceptual - you need to understand what tests mean, when to use them, and how to interpret results. Many students find stats harder because there's more judgment involved. But stats requires less raw math ability than calculus.
How do I study for a statistics exam?
Focus on understanding concepts, not memorizing formulas. Practice interpreting results - what does a p-value of 0.03 mean in context? Work through practice problems that require choosing the right test, not just computing. Make a decision tree for which test to use when. Do past exams under timed conditions.
What is a p-value in simple terms?
A p-value is the probability of getting results as extreme as yours (or more extreme) if the null hypothesis were true. A small p-value (typically < 0.05) means your results would be unlikely under the null hypothesis, so you reject it. A p-value is NOT the probability that the null hypothesis is true.
What is the difference between a population and a sample?
A population is the entire group you want to study (e.g., all UTSC students). A sample is a subset that you actually observe (e.g., 200 randomly selected UTSC students). We use samples because studying entire populations is usually impossible. Statistics gives us tools to make inferences about populations based on samples.
Do I need to memorize all the formulas?
Most stats exams provide a formula sheet. Your job isn't to memorize formulas - it's to know which formula to use and how to interpret the result. That said, you should be able to write z = (x̄ - μ₀)/(σ/√n) and the confidence interval formula from memory, since these come up constantly. Focus on understanding over memorization.