Mathematics for Machine Learning

Our Mathematics for Machine Learning course provides a comprehensive foundation in the essential mathematical tools required to study modern machine learning.

This course is divided into three main areas: linear algebra, multivariable calculus, and probability & statistics. The linear algebra section covers machine learning fundamentals such as matrices, vector spaces, diagonalization, projections, singular value decomposition, and regression, as well as dimensionality reduction techniques such as principal component analysis. The multivariable calculus section develops the tools needed for optimization and learning algorithms, including vector and matrix calculus. Finally, the probability and statistics section covers random variables, point estimation, maximum likelihood estimation, and confidence intervals.

On completing this course, students will be well-prepared for a university-level machine learning course that tackles concepts such as gradient descent, neural networks, backpropagation, support vector machines, naive Bayes classifiers, and Gaussian mixture models.


The course begins with foundational topics in set theory and logic. Students develop a precise mathematical language for describing sets, functions, and predicates, and are introduced to key ideas such as supremum and infimum, as well as argmax and argmin notation. Logical reasoning and Boolean functions are also studied, providing important conceptual motivation for understanding the expressive limitations of some simple machine learning models, such as why certain functions (e.g. XOR) cannot be represented by a single-layer neural network.
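
The XOR limitation mentioned above is easy to check numerically. The sketch below (illustrative Python, not part of the course materials; the weight grid is an arbitrary choice) brute-forces every single threshold unit step(w1*x1 + w2*x2 + b) over a coarse grid, and finds units that compute OR but none that compute XOR:

```python
import itertools

# Truth tables for OR and XOR on two Boolean inputs.
INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
OR_OUT = [0, 1, 1, 1]
XOR_OUT = [0, 1, 1, 0]

def realizable(targets, grid):
    """Check whether any single threshold unit step(w1*x1 + w2*x2 + b)
    reproduces the target outputs exactly on all four inputs."""
    for w1, w2, b in itertools.product(grid, repeat=3):
        outputs = [int(w1 * x1 + w2 * x2 + b > 0) for x1, x2 in INPUTS]
        if outputs == targets:
            return True
    return False

grid = [k / 2 for k in range(-8, 9)]  # weights and bias in {-4.0, ..., 4.0}
print(realizable(OR_OUT, grid))   # True: a linear unit can compute OR
print(realizable(XOR_OUT, grid))  # False: no linear unit computes XOR
```

A finite grid search is not a proof, of course, but it reflects the underlying fact that XOR is not linearly separable.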

Students then explore matrices in depth. They study Gaussian elimination, solve systems of equations, learn about determinants and their properties, and compute matrix inverses. These tools form the computational backbone of many machine learning algorithms.
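
For a flavor of these computations, here is an illustrative NumPy sketch (not part of the course, and using library routines rather than hand row reduction) that solves a small system, checks the determinant, and verifies the inverse:

```python
import numpy as np

# A small linear system A x = b of the kind solved by Gaussian elimination.
A = np.array([[2.0, 1.0, -1.0],
              [-3.0, -1.0, 2.0],
              [-2.0, 1.0, 2.0]])
b = np.array([8.0, -11.0, -3.0])

x = np.linalg.solve(A, b)   # row reduction under the hood (LU decomposition)
det = np.linalg.det(A)      # nonzero determinant <=> unique solution
A_inv = np.linalg.inv(A)    # inverse, so x = A^{-1} b as well

print(x)                          # the solution x = 2, y = 3, z = -1
print(np.allclose(A_inv @ b, x))  # True
```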

As part of this course, students perform a deep dive into vector spaces, exploring linear independence, subspaces, bases, dimension, rank, and nullity. They also study operations such as the Hadamard product, which appears naturally in element-wise computations in neural networks. Students generalize key concepts to abstract vector spaces and inner product spaces, and study orthogonality, projections, and the Gram–Schmidt process.
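
Two of these ideas translate directly into short code. The sketch below (illustrative only; the input vectors are arbitrary choices, not course data) runs the Gram–Schmidt process on a pair of vectors and computes a Hadamard product:

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a linearly independent list of vectors."""
    basis = []
    for v in vectors:
        w = v - sum(np.dot(v, u) * u for u in basis)  # subtract projections
        basis.append(w / np.linalg.norm(w))
    return basis

u1, u2 = gram_schmidt([np.array([3.0, 1.0]), np.array([2.0, 2.0])])
print(np.dot(u1, u2))      # ~0: the vectors are orthogonal
print(np.linalg.norm(u1))  # ~1.0: unit length

# The Hadamard (element-wise) product, ubiquitous in neural-network code.
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
print(a * b)               # [ 4. 10. 18.]
```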

Students will learn how to find eigenvectors of a matrix, compute matrix diagonalizations, and extend these ideas to orthogonal diagonalization of symmetric matrices. These techniques underpin dimensionality reduction methods and are central to understanding the structure of data.
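
As an illustration (a NumPy sketch, not course material; the matrix is an arbitrary example), orthogonal diagonalization of a symmetric matrix amounts to writing A = Q D Qᵀ:

```python
import numpy as np

# Orthogonally diagonalize a symmetric matrix: A = Q D Q^T with Q orthogonal.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigvals, Q = np.linalg.eigh(A)  # eigh: for symmetric/Hermitian matrices
D = np.diag(eigvals)

print(eigvals)                          # eigenvalues 1 and 3 (ascending)
print(np.allclose(Q @ D @ Q.T, A))      # True: reconstruction
print(np.allclose(Q.T @ Q, np.eye(2)))  # True: Q is orthogonal
```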

The course then develops key applications of linear algebra in machine learning. Students study singular value decomposition and its relationship to principal component analysis, and learn how these tools are used for dimensionality reduction, feature extraction, and identifying structure in high-dimensional datasets. Linear least-squares problems and regression techniques are also studied as foundational methods for fitting models to data.
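
The SVD–PCA connection can be sketched in a few lines. The example below (illustrative only; the synthetic data and mixing matrix are arbitrary assumptions) centers a correlated 2-D dataset, takes its SVD, and reads off the variance explained by each principal component:

```python
import numpy as np

rng = np.random.default_rng(0)

# Correlated 2-D data: most of the variance lies along one direction.
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [1.5, 0.5]])
Xc = X - X.mean(axis=0)          # center the data

# PCA via SVD of the centered data matrix.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = S**2 / np.sum(S**2)  # fraction of variance per component

Z = Xc @ Vt[0]                   # project onto the first principal component
print(explained)                 # the first component dominates
print(Z.shape)                   # (200,)
```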

A solid grasp of multivariable calculus is essential for understanding modern machine learning algorithms. In this course, students become well-versed in partial derivatives and gradient vectors (used in gradient-based optimization), the multivariable chain rule (essential for backpropagation), and vector-valued functions. Students extend differentiation to maps between multi-dimensional vector spaces and develop the tools of vector and matrix calculus used to compute gradients of loss functions in neural networks.
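
For instance, the gradient of a least-squares loss, derived by the chain rule, can be verified against finite differences. This sketch (illustrative, not course material; the data matrix is an arbitrary example) does exactly that:

```python
import numpy as np

# Loss L(w) = ||X w - y||^2; by the chain rule its gradient is 2 X^T (X w - y).
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, -0.5])

def loss(w):
    r = X @ w - y
    return r @ r

analytic = 2 * X.T @ (X @ w - y)

# Central-difference check of each partial derivative.
eps = 1e-6
numeric = np.array([
    (loss(w + eps * e) - loss(w - eps * e)) / (2 * eps)
    for e in np.eye(2)
])
print(np.allclose(analytic, numeric, atol=1e-4))  # True
```

This kind of gradient check is the standard sanity test when implementing backpropagation by hand.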

Students also study optimization of multivariable functions, including critical points, second derivative tests, and constrained optimization using Lagrange multipliers. These techniques form the mathematical foundation of algorithms such as support vector machines and constrained optimization problems in machine learning.
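
A tiny worked example (illustrative only; the objective and constraint are arbitrary choices): maximize f(x, y) = xy subject to x + y = 1. The Lagrange condition ∇f = λ∇g forces x = y, so the constraint gives x = y = 1/2 with λ = 1/2:

```python
import numpy as np

# Maximize f(x, y) = x*y subject to g(x, y) = x + y - 1 = 0.
# Stationarity: grad f = lambda * grad g, i.e. (y, x) = lambda * (1, 1).
x, y, lam = 0.5, 0.5, 0.5

grad_f = np.array([y, x])
grad_g = np.array([1.0, 1.0])

print(np.allclose(grad_f, lam * grad_g))  # True: stationarity holds
print(x + y - 1.0)                        # 0.0: the constraint holds

# Cross-check against a sweep of feasible points on x + y = 1.
candidates = [(t, 1 - t) for t in np.linspace(0, 1, 101)]
best = max(candidates, key=lambda p: p[0] * p[1])
print(best)  # the constrained maximum at x = y = 1/2
```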

Students work with standard multivariable surfaces to build intuition for loss surfaces, and study double integrals as a tool for understanding continuous probability distributions.
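
As a small illustration of that connection (a sketch, not course material; the density f(x, y) = x + y is an arbitrary example), a midpoint Riemann sum confirms that a joint density integrates to 1 over its support:

```python
import numpy as np

# The joint density f(x, y) = x + y on the unit square integrates to 1;
# approximate the double integral by a midpoint Riemann sum on an n x n grid.
n = 200
h = 1.0 / n
mid = (np.arange(n) + 0.5) * h      # midpoints of each subinterval
Xg, Yg = np.meshgrid(mid, mid)

approx = np.sum((Xg + Yg) * h * h)  # sum of f(x_i, y_j) * cell area
print(approx)                       # ~1.0
```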

On the probability and statistics side, students study discrete and continuous random variables, including probability density functions, cumulative distribution functions, transformations, expectation, moments, variance, and Bayes' theorem. Important probability distributions are explored in detail.
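
As one worked instance (an illustrative sketch, not course material; the exponential distribution and rate are arbitrary choices), the mean 1/λ and variance 1/λ² of an exponential random variable can be checked by numerical integration of its density:

```python
import numpy as np

# Exponential density f(x) = lam * exp(-lam * x) on [0, inf):
# E[X] = 1/lam and Var[X] = 1/lam^2, checked by a simple Riemann sum.
lam = 2.0
dx = 1e-4
x = np.arange(0.0, 40.0, dx)            # truncate the infinite domain
f = lam * np.exp(-lam * x)

total = np.sum(f) * dx                  # ~1: f is a valid density
mean = np.sum(x * f) * dx               # ~1/lam = 0.5
var = np.sum((x - mean) ** 2 * f) * dx  # ~1/lam^2 = 0.25
print(total, mean, var)
```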

Students extend their understanding to joint, marginal, and conditional distributions, as well as sums and products of random variables. Topics such as covariance, correlation, and multivariate normal distributions are introduced, which are essential in probabilistic modeling.
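
These quantities are easy to see in simulation. The sketch below (illustrative only; the covariance matrix is an arbitrary assumption) samples from a bivariate normal distribution and recovers its covariance matrix and correlation coefficient empirically:

```python
import numpy as np

rng = np.random.default_rng(42)

# Draw from a bivariate normal with known covariance, then recover it.
mean = np.array([0.0, 0.0])
cov = np.array([[2.0, 1.2],
                [1.2, 1.0]])
samples = rng.multivariate_normal(mean, cov, size=100_000)

sample_cov = np.cov(samples, rowvar=False)  # empirical covariance matrix
corr = np.corrcoef(samples, rowvar=False)   # correlation coefficients

print(sample_cov)  # close to the true covariance matrix
print(corr[0, 1])  # ~1.2 / sqrt(2 * 1), about 0.85
```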

The course concludes with parametric inference, focusing on point estimation, maximum likelihood estimation, and confidence intervals, which form the statistical foundation for many machine learning methods.
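
A closing illustration (a sketch under assumed parameters, not course material): for i.i.d. Normal(μ, σ²) data with σ known, the maximum likelihood estimate of μ is the sample mean, and a 95% confidence interval is xbar ± 1.96σ/√n:

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulated Normal(mu, sigma^2) data with sigma known.
mu_true, sigma, n = 5.0, 2.0, 400
data = rng.normal(mu_true, sigma, size=n)

mu_hat = data.mean()                    # maximum likelihood estimate of mu
half_width = 1.96 * sigma / np.sqrt(n)  # 95% z-interval, known variance
ci = (mu_hat - half_width, mu_hat + half_width)

print(mu_hat)  # close to 5.0
print(ci)      # an interval of width ~0.392 around the estimate
```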

Upon successful completion of this course, students will have mastered the following:
1. Set Theory (25 topics)
1.1. Introduction to Set Theory
1.1.1. Special Sets
1.1.2. Equivalent Sets
1.1.3. The Constructive Definition of a Set
1.1.4. The Conditional Definition of a Set
1.1.5. Describing Sets Using Set-Builder Notation
1.1.6. Describing Planar Regions Using Set-Builder Notation
1.1.7. Indicator Functions
1.1.8. Indicator Functions for Predicates
1.2. Set Operations
1.2.1. The Difference of Sets
1.2.2. Set Complements
1.2.3. The Cartesian Product
1.2.4. Visualizing Cartesian Products
1.2.5. Indexed Sets
1.2.6. Sets and Functions
1.3. Properties of Sets
1.3.1. Subsets
1.3.2. Cardinality of Finite Sets
1.3.3. Infinite Sets
1.3.4. Interior and Boundary Points
1.3.5. Interiors and Boundaries of Sets
1.3.6. Open and Closed Sets
1.3.7. Disjoint Sets
1.3.8. The Maximum and Minimum of a Set
1.3.9. Supremum and Infimum
1.3.10. Argmax and Argmin Notation
1.3.11. Finding Argmax and Argmin From Tables and Graphs
2. Logic (19 topics)
2.4. Logical Statements
2.4.1. Statements and Predicates
2.4.2. The "And" and "Or" Connectives
2.4.3. Truth Tables
2.4.4. The "Not" Connective
2.4.5. Logical Equivalence
2.4.6. Logical Associative and Commutative Laws
2.4.7. The Distributive Laws
2.4.8. The Absorption Laws
2.4.9. De Morgan's Laws for Logic
2.5. Implications and Biconditionals
2.5.1. Conditional Statements
2.5.2. Logical Equivalence with Conditional Statements
2.5.3. Biconditional Statements
2.5.4. Truth Sets of Predicates
2.5.5. The "And" and "Or" Connectives With Predicates
2.5.6. The "Not" Connective With Predicates
2.5.7. Simplifying Predicate Expressions Using De Morgan's Laws
2.5.8. Conditional Statements With Predicates
2.6. Introduction to Boolean Algebra
2.6.1. Boolean Functions
2.6.2. Boolean Functions and Logical Operations
3. Vectors and Matrices (35 topics)
3.7. Vector Geometry
3.7.1. The Vector Equation of a Line
3.7.2. The Parametric Equations of a Line
3.7.3. The Cartesian Equation of a Line
3.7.4. The Vector Equation of a Plane
3.7.5. The Cartesian Equation of a Plane
3.7.6. The Parametric Equations of a Plane
3.7.7. The Intersection of Two Planes
3.7.8. The Shortest Distance Between a Plane and a Point
3.7.9. The Intersection Between a Line and a Plane
3.8. Determinants
3.8.1. The Determinant of an NxN Matrix
3.8.2. Finding Determinants Using Laplace Expansions
3.8.3. Basic Properties of Determinants
3.8.4. Further Properties of Determinants
3.8.5. Row and Column Operations on Determinants
3.8.6. Conditions When a Determinant Equals Zero
3.9. Gaussian Elimination
3.9.1. Systems of Equations as Augmented Matrices
3.9.2. Row Echelon Form
3.9.3. Solving Systems of Equations Using Back Substitution
3.9.4. Elementary Row Operations
3.9.5. Creating Rows or Columns Containing Zeros Using Gaussian Elimination
3.9.6. Solving 2x2 Systems of Equations Using Gaussian Elimination
3.9.7. Solving 2x2 Singular Systems of Equations Using Gaussian Elimination
3.9.8. Solving 3x3 Systems of Equations Using Gaussian Elimination
3.9.9. Identifying the Pivot Columns of a Matrix
3.9.10. Solving 3x3 Singular Systems of Equations Using Gaussian Elimination
3.9.11. Reduced Row Echelon Form
3.9.12. Gaussian Elimination For NxM Systems of Equations
3.10. The Inverse of a Matrix
3.10.1. Finding the Inverse of a 2x2 Matrix Using Row Operations
3.10.2. Finding the Inverse of a 3x3 Matrix Using Row Operations
3.10.3. Matrices With Easy-to-Find Inverses
3.10.4. The Invertible Matrix Theorem in Terms of 2x2 Systems of Equations
3.10.5. Triangular Matrices
3.11. Affine Transformations
3.11.1. Affine Transformations
3.11.2. The Image of an Affine Transformation
3.11.3. The Inverse of an Affine Transformation
4. Vector Spaces (21 topics)
4.12. Vectors in N-Dimensional Space
4.12.1. Vectors in N-Dimensional Euclidean Space
4.12.2. Linear Combinations of Vectors in N-Dimensional Euclidean Space
4.12.3. Linear Span of Vectors in N-Dimensional Euclidean Space
4.12.4. Linear Dependence and Independence
4.12.5. The Hadamard Product
4.13. Subspaces of N-Dimensional Space
4.13.1. Subspaces of N-Dimensional Space
4.13.2. Subspaces of N-Dimensional Space: Geometric Interpretation
4.13.3. The Column Space of a Matrix
4.13.4. The Null Space of a Matrix
4.14. Bases of N-Dimensional Space
4.14.1. Finding a Basis of a Span
4.14.2. Finding a Basis of the Column Space of a Matrix
4.14.3. Finding a Basis of the Null Space of a Matrix
4.14.4. Expressing the Coordinates of a Vector in a Given Basis
4.14.5. Writing Vectors in Different Bases
4.14.6. The Change-of-Coordinates Matrix
4.14.7. Changing a Basis Using the Change-of-Coordinates Matrix
4.15. Dimension and Rank in N-Dimensional Space
4.15.1. The Dimension of a Span
4.15.2. The Rank of a Matrix
4.15.3. The Dimension of the Null Space of a Matrix
4.15.4. The Invertible Matrix Theorem in Terms of Dimension, Rank and Nullity
4.15.5. The Rank-Nullity Theorem
5. Diagonalization of Matrices (13 topics)
5.16. Eigenvectors and Eigenvalues
5.16.1. The Eigenvalues and Eigenvectors of a 2x2 Matrix
5.16.2. Calculating the Eigenvalues of a 2x2 Matrix
5.16.3. Calculating the Eigenvectors of a 2x2 Matrix
5.16.4. The Characteristic Equation of a Matrix
5.16.5. Calculating the Eigenvectors of a 3x3 Matrix With Distinct Eigenvalues
5.16.6. Calculating the Eigenvectors of a 3x3 Matrix in the General Case
5.17. Diagonalization
5.17.1. Diagonalizing a 2x2 Matrix
5.17.2. Properties of Diagonalization
5.17.3. Diagonalizing a 3x3 Matrix With Distinct Eigenvalues
5.17.4. Diagonalizing a 3x3 Matrix in the General Case
5.17.5. Symmetric Matrices
5.17.6. Diagonalization of 2x2 Symmetric Matrices
5.17.7. Diagonalization of 3x3 Symmetric Matrices
6. Orthogonality & Projections (18 topics)
6.18. Inner Products
6.18.1. The Dot Product in N-Dimensional Euclidean Space
6.18.2. The Norm of a Vector in N-Dimensional Euclidean Space
6.18.3. Euclidean, Manhattan, and Minkowski Distance
6.18.4. Introduction to Abstract Vector Spaces
6.18.5. Defining Abstract Vector Spaces
6.18.6. Inner Product Spaces
6.19. Orthogonality
6.19.1. Orthogonal Vectors in Euclidean Spaces
6.19.2. The Cauchy-Schwarz Inequality and the Angle Between Two Vectors
6.19.3. Orthogonal Complements
6.19.4. Orthogonal Sets in Euclidean Spaces
6.19.5. Orthogonal Matrices
6.19.6. Orthogonal Linear Transformations
6.20. Orthogonal Projections
6.20.1. Projecting Vectors Onto One-Dimensional Subspaces
6.20.2. The Components of a Vector with Respect to an Orthogonal or Orthonormal Basis
6.20.3. Projecting Vectors Onto Subspaces in Euclidean Spaces (Orthogonal Bases)
6.20.4. Projecting Vectors Onto Subspaces in Euclidean Spaces (Arbitrary Bases)
6.20.5. Projecting Vectors Onto Subspaces in Euclidean Spaces (Arbitrary Bases): Applications
6.20.6. The Gram-Schmidt Process for Two Vectors
7. Singular Value Decomposition (12 topics)
7.21. Quadratic Forms
7.21.1. Bilinear Forms
7.21.2. Quadratic Forms
7.21.3. Change of Variables in Quadratic Forms
7.21.4. Positive-Definite and Negative-Definite Quadratic Forms
7.21.5. Constrained Optimization of Quadratic Forms
7.21.6. Constrained Optimization of Quadratic Forms: Determining Where Extrema are Attained
7.22. Singular Value Decomposition
7.22.1. The Singular Values of a Matrix
7.22.2. Computing the Singular Values of a Matrix
7.22.3. Singular Value Decomposition of 2x2 Matrices
7.22.4. Singular Value Decomposition of 2x2 Matrices With Zero or Repeated Eigenvalues
7.22.5. Singular Value Decomposition of Larger Matrices
7.22.6. Singular Value Decomposition and the Pseudoinverse Matrix
8. Applications of Linear Algebra (9 topics)
8.23. Linear Least-Squares Problems
8.23.1. The Least-Squares Solution of a Linear System (Without Collinearity)
8.23.2. The Least-Squares Solution of a Linear System (With Collinearity)
8.24. Linear Regression
8.24.1. Linear Regression With Matrices
8.24.2. Polynomial Regression With Matrices
8.24.3. Multiple Linear Regression With Matrices
8.25. Principal Component Analysis
8.25.1. Introduction to Principal Component Analysis
8.25.2. Computing Principal Components
8.25.3. The Connection Between PCA and SVD
8.25.4. Feature Extraction Using PCA
9. Multivariable Calculus (57 topics)
9.26. Quadric Surfaces and Cylinders
9.26.1. Ellipsoids
9.26.2. Hyperboloids
9.26.3. Paraboloids
9.26.4. Elliptic Cones
9.26.5. Cylinders
9.26.6. Intersections of Lines and Planes With Surfaces
9.26.7. Identifying Quadric Surfaces
9.27. Partial Derivatives
9.27.1. The Domain of a Multivariable Function
9.27.2. Level Curves
9.27.3. Level Surfaces
9.27.4. Limits and Continuity of Multivariable Functions
9.27.5. Introduction to Partial Derivatives
9.27.6. Computing Partial Derivatives Using the Rules of Differentiation
9.27.7. Geometric Interpretations of Partial Derivatives
9.27.8. Partial Differentiability of Multivariable Functions
9.27.9. Higher-Order Partial Derivatives
9.27.10. Equality of Mixed Partial Derivatives
9.27.11. Tangent Planes to Surfaces
9.27.12. Linearization of Multivariable Functions
9.27.13. The Multivariable Chain Rule
9.28. Vector-Valued Functions
9.28.1. The Domain of a Vector-Valued Function
9.28.2. Tangent Vectors and Tangent Lines to Curves
9.28.3. The Gradient Vector
9.28.4. The Gradient as a Normal Vector
9.28.5. Directional Derivatives
9.28.6. The Multivariable Chain Rule in Vector Form
9.28.7. Gradients With Respect to a Variable Subset
9.29. Differentiation
9.29.1. The Jacobian
9.29.2. The Inverse Function Theorem
9.29.3. The Jacobian of a Three-Dimensional Transformation
9.29.4. The Derivative of a Multivariable Function
9.29.5. The Second Derivative of a Multivariable Function
9.29.6. Second-Degree Taylor Polynomials of Multivariable Functions
9.30. Matrix Calculus
9.30.1. Total and Tensor Derivatives
9.30.2. The Chain Rule for Total Derivatives
9.30.3. Vector Gradients
9.30.4. Further Vector Gradients
9.30.5. Matrix Gradients
9.31. Approximating Volumes With Riemann Sums
9.31.1. Partitions of Intervals
9.31.2. Calculating Double Summations Over Partitions
9.31.3. Approximating Volumes Using Lower Riemann Sums
9.31.4. Approximating Volumes Using Upper Riemann Sums
9.31.5. Lower Riemann Sums Over General Rectangular Partitions
9.31.6. Upper Riemann Sums Over General Rectangular Partitions
9.31.7. Defining Double Integrals Using Lower and Upper Riemann Sums
9.32. Double Integrals
9.32.1. Double Integrals Over Rectangular Domains
9.32.2. Double Integrals Over Non-Rectangular Domains
9.32.3. Properties of Double Integrals
9.32.4. Type I and II Regions in Two-Dimensional Space
9.32.5. Double Integrals Over Type I Regions
9.32.6. Double Integrals Over Type II Regions
9.33. Optimization
9.33.1. Global vs. Local Extrema and Critical Points of Multivariable Functions
9.33.2. The Second Partial Derivatives Test
9.33.3. Calculating Global Extrema of Multivariable Functions
9.33.4. Lagrange Multipliers With One Constraint
9.33.5. Lagrange Multipliers With Multiple Constraints
9.33.6. Optimizing Multivariable Functions Using Lagrange Multipliers
10. Probability & Random Variables (41 topics)
10.34. Probability
10.34.1. Extending the Law of Total Probability
10.34.2. Bayes' Theorem
10.34.3. Extending Bayes' Theorem
10.35. Random Variables
10.35.1. Probability Density Functions of Continuous Random Variables
10.35.2. Calculating Probabilities With Continuous Random Variables
10.35.3. Continuous Random Variables Over Infinite Domains
10.35.4. Cumulative Distribution Functions for Continuous Random Variables
10.35.5. Approximating Discrete Random Variables as Continuous
10.35.6. Simulating Random Observations
10.36. Transformations of Random Variables
10.36.1. One-to-One Transformations of Discrete Random Variables
10.36.2. Many-to-One Transformations of Discrete Random Variables
10.36.3. The Distribution Function Method
10.36.4. The Change-of-Variables Method for Continuous Random Variables
10.36.5. The Distribution Function Method With Many-to-One Transformations
10.37. Expectation
10.37.1. Expected Values of Discrete Random Variables
10.37.2. Properties of Expectation for Discrete Random Variables
10.37.3. Moments of Discrete Random Variables
10.37.4. Variance of Discrete Random Variables
10.37.5. Properties of Variance for Discrete Random Variables
10.37.6. Expected Values of Continuous Random Variables
10.37.7. Moments of Continuous Random Variables
10.37.8. Variance of Continuous Random Variables
10.37.9. The Rule of the Lazy Statistician
10.37.10. The Law of Total Expectation for Discrete Random Variables
10.38. Discrete Probability Distributions
10.38.1. The Bernoulli Distribution
10.38.2. Modeling With the Binomial Distribution
10.38.3. The CDF of the Binomial Distribution
10.38.4. Mean and Variance of the Binomial Distribution
10.38.5. The Discrete Uniform Distribution
10.38.6. Modeling With Discrete Uniform Distributions
10.38.7. Mean and Variance of the Discrete Uniform Distribution
10.38.8. The Poisson Distribution
10.38.9. Modeling With the Poisson Distribution
10.38.10. The CDF of the Poisson Distribution
10.39. Continuous Probability Distributions
10.39.1. The Continuous Uniform Distribution
10.39.2. Mean and Variance of the Continuous Uniform Distribution
10.39.3. Modeling With Continuous Uniform Distributions
10.39.4. The Gamma Function
10.39.5. The Chi-Square Distribution
10.39.6. The Student's T-Distribution
10.39.7. The Exponential Distribution
11. Combining Random Variables (29 topics)
11.40. Distributions of Two Discrete Random Variables
11.40.1. Double Summations
11.40.2. Joint Distributions for Discrete Random Variables
11.40.3. Marginal Distributions for Discrete Random Variables
11.40.4. Independence of Discrete Random Variables
11.40.5. Conditional Distributions for Discrete Random Variables
11.40.6. The Joint CDF of Two Discrete Random Variables
11.41. Distributions of Two Continuous Random Variables
11.41.1. Joint Distributions for Continuous Random Variables
11.41.2. Marginal Distributions for Continuous Random Variables
11.41.3. Independence of Continuous Random Variables
11.41.4. Conditional Distributions for Continuous Random Variables
11.41.5. The Joint CDF of Two Continuous Random Variables
11.41.6. Properties of the Joint CDF of Two Continuous Random Variables
11.42. Expectation for Joint Distributions
11.42.1. Expected Values of Sums and Products of Random Variables
11.42.2. Variance of Sums of Independent Random Variables
11.42.3. Computing Expected Values From Joint Distributions
11.42.4. Conditional Expectation for Discrete Random Variables
11.42.5. Conditional Variance for Discrete Random Variables
11.42.6. Conditional Expectation for Continuous Random Variables
11.42.7. Conditional Variance for Continuous Random Variables
11.42.8. The Rule of the Lazy Statistician for Two Random Variables
11.43. Covariance of Random Variables
11.43.1. The Covariance of Two Random Variables
11.43.2. Variance of Sums of Random Variables
11.43.3. The Correlation Coefficient for Two Random Variables
11.43.4. The Covariance Matrix
11.44. Normally Distributed Random Variables
11.44.1. Normal Approximations of Binomial Distributions
11.44.2. Combining Two Normally Distributed Random Variables
11.44.3. Combining Multiple Normally Distributed Random Variables
11.44.4. I.I.D. Normal Random Variables
11.44.5. The Bivariate Normal Distribution
12. Parametric Inference (22 topics)
12.45. Point Estimation
12.45.1. The Sample Mean
12.45.2. Sampling Distributions
12.45.3. Variance of Sample Means
12.45.4. The Sample Variance
12.45.5. Sample Means From Normal Populations
12.45.6. The Central Limit Theorem
12.45.7. Sampling Proportions From Finite Populations
12.45.8. Point Estimates of Population Proportions
12.45.9. The Sample Covariance Matrix
12.46. Maximum Likelihood
12.46.1. Product Notation
12.46.2. Logarithmic Differentiation
12.46.3. Likelihood Functions for Discrete Probability Distributions
12.46.4. Log-Likelihood Functions for Discrete Probability Distributions
12.46.5. Likelihood Functions for Continuous Probability Distributions
12.46.6. Log-Likelihood Functions for Continuous Probability Distributions
12.46.7. Maximum Likelihood Estimation
12.47. Confidence Intervals
12.47.1. Confidence Intervals for One Mean: Known Population Variance
12.47.2. Confidence Intervals for One Mean: Unknown Population Variance
12.47.3. Confidence Intervals for One Proportion
12.47.4. Confidence Intervals for Two Means: Known and Unequal Population Variances
12.47.5. Confidence Intervals for Linear Regression Slope Parameters
12.47.6. Confidence Intervals for Linear Regression Intercept Parameters