Mathematics for Machine Learning

Our Mathematics for Machine Learning course provides a comprehensive foundation in the essential mathematical tools required to study machine learning.

This course is divided into three main categories: linear algebra, multivariable calculus, and probability & statistics. The linear algebra section covers crucial machine learning fundamentals such as matrices, vector spaces, diagonalization, projections, singular value decomposition, and regression. The multivariable calculus section examines vector-valued functions, partial derivatives, and multiple integrals. Finally, the probability and statistics section covers random variables, point estimation, maximum likelihood, hypothesis testing, and confidence intervals.

On completing this course, students will be well-prepared for a university-level machine learning course that tackles concepts such as gradient descent, neural networks, backpropagation, support vector machines, naive Bayes classifiers, and Gaussian mixture models.

Content

After briefly covering some essential set theory, logic, and vector geometry, students explore matrices in depth. They will study Gaussian elimination, solve systems of equations, learn about determinants and their properties, and compute inverse matrices.
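For instance, solving a linear system by Gaussian elimination can be mirrored in a few lines of NumPy. This is an illustrative sketch only; the matrix and right-hand side are invented for the example and are not taken from the course materials:

```python
import numpy as np

# An illustrative 3x3 system Ax = b; np.linalg.solve factors A
# (LU decomposition, i.e. Gaussian elimination with pivoting).
A = np.array([[ 2.0,  1.0, -1.0],
              [-3.0, -1.0,  2.0],
              [-2.0,  1.0,  2.0]])
b = np.array([8.0, -11.0, -3.0])

x = np.linalg.solve(A, b)       # unique solution, since det(A) != 0
det_A = np.linalg.det(A)        # here det(A) = -1
A_inv = np.linalg.inv(A)        # the inverse matrix, so A @ A_inv = I
```

Here x comes out to (2, 3, -1), which can be checked by substituting back into the three equations.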

As part of this course, students perform a deep dive into vector spaces, exploring linear independence, subspaces, bases, dimension, rank, and nullity. Students will generalize key concepts to abstract vector spaces and inner product spaces. Various aspects of orthogonality in vector spaces are considered, including orthogonal sets, complements, orthogonal matrices, orthogonal projections, and the Gram-Schmidt process.
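As a taste of the orthogonality material, here is a minimal NumPy sketch of the Gram-Schmidt process for two vectors; the input vectors are arbitrary examples:

```python
import numpy as np

def gram_schmidt_two(v1, v2):
    """Turn two linearly independent vectors into an orthonormal pair."""
    u1 = v1 / np.linalg.norm(v1)    # normalize the first vector
    w = v2 - (v2 @ u1) * u1         # remove the component of v2 along u1
    u2 = w / np.linalg.norm(w)      # normalize what remains
    return u1, u2

u1, u2 = gram_schmidt_two(np.array([3.0, 1.0]), np.array([2.0, 2.0]))
```

The resulting pair u1, u2 is orthonormal: each has unit length and their dot product is zero.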

Students will learn how to find the eigenvectors of a matrix, compute a matrix diagonalization, and extend this understanding to symmetric matrices.
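For a symmetric matrix, diagonalization takes an especially clean form, A = PDPᵀ with orthonormal eigenvectors in P. A small NumPy sketch, using an arbitrary example matrix:

```python
import numpy as np

# Diagonalize a symmetric 2x2 matrix: A = P D P^T, where the columns
# of P are orthonormal eigenvectors and D holds the eigenvalues.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigenvalues, P = np.linalg.eigh(A)   # eigh: eigensolver for symmetric matrices
D = np.diag(eigenvalues)
A_rebuilt = P @ D @ P.T              # reconstructs A (up to rounding)
```

For this matrix the eigenvalues are 1 and 3, and P @ D @ P.T recovers A.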

In addition, this course discusses various linear algebra applications relevant to machine learning, such as singular value decomposition, linear least-squares, regression, and principal component analysis.
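To illustrate how these applications connect, the sketch below fits a least-squares regression line using the SVD-based pseudoinverse; the data points are invented for the example:

```python
import numpy as np

# Least-squares fit of y ≈ c0 + c1*x via the pseudoinverse,
# which NumPy computes from the singular value decomposition.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 1.9, 3.2, 3.8])
X = np.column_stack([np.ones_like(x), x])   # design matrix with intercept column
c0, c1 = np.linalg.pinv(X) @ y              # least-squares coefficients
```

For these points the fitted line is y = 1.09 + 0.94x.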

A solid grasp of key multivariable calculus concepts is needed to fully understand fundamental machine learning algorithms. In this course, students will become well-versed in partial derivatives and gradient vectors (used in gradient descent), the multivariable chain rule (essential for backpropagation), vector-valued functions, and, more generally, the differential calculus of maps between multi-dimensional vector spaces (which arise when machine learning models are expressed in matrix notation). Students will also work with standard multivariable surfaces to build intuition for the loss surface of a machine learning model. The remainder of the multivariable calculus section discusses double integrals, a crucial tool for fully grasping continuous probability distributions and related concepts.
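The link between gradient vectors and gradient descent can be made concrete in a few lines of NumPy. This sketch minimizes an invented two-variable quadratic, not a function from the course:

```python
import numpy as np

# Gradient descent on f(x, y) = (x - 1)^2 + 2*(y + 2)^2,
# which has its unique minimum at (1, -2).
def grad_f(p):
    x, y = p
    return np.array([2.0 * (x - 1.0), 4.0 * (y + 2.0)])

p = np.array([5.0, 5.0])        # arbitrary starting point
learning_rate = 0.1
for _ in range(200):
    p = p - learning_rate * grad_f(p)   # step against the gradient
```

After 200 steps, p is numerically indistinguishable from the minimizer (1, -2).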

On the probability and statistics side, students will study discrete and continuous random variables. They will familiarize themselves with probability density functions, random variable transformations, expectation, moments, and variance. Several important discrete and continuous probability distributions will be discussed in detail.
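For example, the expectation and variance of a discrete random variable reduce to short weighted sums; the values and probabilities below are made up for illustration:

```python
import numpy as np

# X takes the values 0, 1, 2 with probabilities 0.2, 0.5, 0.3.
values = np.array([0.0, 1.0, 2.0])
probs  = np.array([0.2, 0.5, 0.3])

mean = np.sum(values * probs)                  # E[X] = 1.1
var  = np.sum((values - mean) ** 2 * probs)    # Var(X) = E[(X - E[X])^2] = 0.49
```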

Students then extend their knowledge of random variables to include joint, marginal, and conditional probability distributions, sums and products of random variables, conditional expectations, and variances. Special attention will be given to combinations of normally distributed random variables and the bivariate normal distribution.
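The bivariate normal distribution lends itself to quick numerical experiments. This sketch draws samples from an invented covariance structure and recovers that structure empirically:

```python
import numpy as np

# Sample a bivariate normal and check that the empirical covariance
# matrix matches the one we specified.
rng = np.random.default_rng(0)
mean = np.array([0.0, 0.0])
cov = np.array([[1.0, 0.8],
                [0.8, 2.0]])
samples = rng.multivariate_normal(mean, cov, size=200_000)
empirical_cov = np.cov(samples.T)   # 2x2 sample covariance matrix
```

With this many samples, the empirical covariance agrees with cov to within a few hundredths.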

The statistics part of the course concludes with an in-depth study of parametric inference, exploring point estimation, maximum likelihood estimation, and hypothesis testing. Students will also learn to construct confidence intervals for various parameters, including means, proportions, variances, and regression coefficients.
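As a flavor of maximum likelihood estimation: for exponentially distributed data, setting the derivative of the log-likelihood to zero gives the rate estimate 1/x̄. The sketch below checks this on simulated data; the true rate is an invented example value:

```python
import numpy as np

# MLE for the rate of an exponential distribution: the log-likelihood
# n*log(lam) - lam*sum(x) is maximized at lam_hat = 1 / sample_mean.
rng = np.random.default_rng(42)
true_rate = 2.0
data = rng.exponential(scale=1.0 / true_rate, size=100_000)
rate_mle = 1.0 / data.mean()   # close to true_rate for large samples
```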

Upon successful completion of this course, students will be able to do the following:
• Develop a solid understanding of sets, logical quantifiers, vector geometry, and hyperbolic functions.
• Master determinants and Gaussian elimination, and learn how to compute inverse matrices efficiently.
• Develop a thorough understanding of vector spaces, including bases, dimension, rank, nullity, and the rank-nullity theorem.
• Understand the concept of diagonalization and how to apply it.
• Develop knowledge of orthogonality in vector spaces, including orthogonal sets, complements, and orthogonal projections.
• Learn bilinear and quadratic forms, including positive-definite and negative-definite quadratic forms.
• Understand singular value decomposition and its applications, including the pseudoinverse matrix.
• Understand and apply principal component analysis, including its connection with singular value decomposition.
• Solve linear least-squares problems, both with and without collinearity.
• Perform linear, polynomial, and multiple linear regressions.
• Compute partial derivatives, gradient vectors, Jacobian matrices, Hessian matrices, and understand their geometric interpretations.
• Understand vector-valued functions and their properties.
• Extend differential calculus to maps between multi-dimensional vector spaces.
• Identify and graph a variety of standard multivariable surfaces in 3D space.
• Use Riemann sums to approximate volumes.
• Calculate double integrals over rectangular and non-rectangular domains.
• Understand and apply fundamental probability concepts such as the law of total probability and Bayes' theorem.
• Work with discrete and continuous random variables.
• Apply transformations to discrete and continuous random variables, and compute the CDFs and PDFs of the resulting distributions.
• Understand and compute expectation and variance for discrete and continuous random variables.
• Understand and work with joint distributions for discrete and continuous random variables.
• Calculate expectation for joint distributions.
• Understand and compute the covariance of random variables.
• Work with normally distributed random variables and understand their combinations.
• Apply point estimation and maximum likelihood.
• Understand and apply hypothesis testing and confidence intervals to situations in context.
1.
Preliminaries
28 topics
1.1. Introduction to Set Theory
 1.1.1. Special Sets 1.1.2. Statements and Predicates 1.1.3. Equivalent Sets 1.1.4. The Constructive Definition of a Set 1.1.5. The Conditional Definition of a Set 1.1.6. Describing Sets Using Set-Builder Notation 1.1.7. Describing Planar Regions Using Set-Builder Notation 1.1.8. Subsets
1.2. Set Operations
 1.2.1. The Difference of Sets 1.2.2. Set Complements 1.2.3. The Cartesian Product 1.2.4. Visualizing Cartesian Products 1.2.5. Indexed Sets 1.2.6. Sets and Functions
1.3. Properties of Sets
 1.3.1. Cardinality of Finite Sets 1.3.2. Infinite Sets 1.3.3. Interior and Boundary Points 1.3.4. Interiors and Boundaries of Sets 1.3.5. Open and Closed Sets
1.4. Vector Geometry
 1.4.1. The Vector Equation of a Line 1.4.2. The Parametric Equations of a Line 1.4.3. The Cartesian Equation of a Line 1.4.4. The Vector Equation of a Plane 1.4.5. The Cartesian Equation of a Plane 1.4.6. The Parametric Equations of a Plane 1.4.7. The Intersection of Two Planes
1.5. The Hyperbolic Functions
 1.5.1. The Hyperbolic Functions 1.5.2. Graphs of the Hyperbolic Functions
2.
Matrices
26 topics
2.6. Determinants
 2.6.1. The Determinant of an NxN Matrix 2.6.2. Finding Determinants Using Laplace Expansions 2.6.3. Basic Properties of Determinants 2.6.4. Further Properties of Determinants 2.6.5. Row and Column Operations on Determinants 2.6.6. Conditions When a Determinant Equals Zero
2.7. Gaussian Elimination
 2.7.1. Systems of Equations as Augmented Matrices 2.7.2. Row Echelon Form 2.7.3. Solving Systems of Equations Using Back Substitution 2.7.4. Elementary Row Operations 2.7.5. Creating Rows or Columns Containing Zeros Using Gaussian Elimination 2.7.6. Solving 2x2 Systems of Equations Using Gaussian Elimination 2.7.7. Solving 2x2 Singular Systems of Equations Using Gaussian Elimination 2.7.8. Solving 3x3 Systems of Equations Using Gaussian Elimination 2.7.9. Identifying the Pivot Columns of a Matrix 2.7.10. Solving 3x3 Singular Systems of Equations Using Gaussian Elimination 2.7.11. Reduced Row Echelon Form 2.7.12. Gaussian Elimination For NxM Systems of Equations
2.8. The Inverse of a Matrix
 2.8.1. Finding the Inverse of a 2x2 Matrix Using Row Operations 2.8.2. Finding the Inverse of a 3x3 Matrix Using Row Operations 2.8.3. Matrices With Easy-to-Find Inverses 2.8.4. The Invertible Matrix Theorem in Terms of 2x2 Systems of Equations 2.8.5. Triangular Matrices
2.9. Affine Transformations
 2.9.1. Affine Transformations 2.9.2. The Image of an Affine Transformation 2.9.3. The Inverse of an Affine Transformation
3.
Vector Spaces
20 topics
3.10. Vectors in N-Dimensional Space
 3.10.1. Vectors in N-Dimensional Euclidean Space 3.10.2. Linear Combinations of Vectors in N-Dimensional Euclidean Space 3.10.3. Linear Span of Vectors in N-Dimensional Euclidean Space 3.10.4. Linear Dependence and Independence
3.11. Subspaces of N-Dimensional Space
 3.11.1. Subspaces of N-Dimensional Space 3.11.2. Subspaces of N-Dimensional Space: Geometric Interpretation 3.11.3. The Column Space of a Matrix 3.11.4. The Null Space of a Matrix
3.12. Bases of N-Dimensional Space
 3.12.1. Finding a Basis of a Span 3.12.2. Finding a Basis of the Column Space of a Matrix 3.12.3. Finding a Basis of the Null Space of a Matrix 3.12.4. Expressing the Coordinates of a Vector in a Given Basis 3.12.5. Writing Vectors in Different Bases 3.12.6. The Change-of-Coordinates Matrix 3.12.7. Changing a Basis Using the Change-of-Coordinates Matrix
3.13. Dimension and Rank in N-Dimensional Space
 3.13.1. The Dimension of a Span 3.13.2. The Rank of a Matrix 3.13.3. The Dimension of the Null Space of a Matrix 3.13.4. The Invertible Matrix Theorem in Terms of Dimension, Rank and Nullity 3.13.5. The Rank-Nullity Theorem
4.
Diagonalization of Matrices
12 topics
4.14. Eigenvectors and Eigenvalues
 4.14.1. The Eigenvalues and Eigenvectors of a 2x2 Matrix 4.14.2. Calculating the Eigenvalues of a 2x2 Matrix 4.14.3. Calculating the Eigenvectors of a 2x2 Matrix 4.14.4. The Characteristic Equation of a Matrix 4.14.5. Calculating the Eigenvectors of a 3x3 Matrix With Distinct Eigenvalues 4.14.6. Calculating the Eigenvectors of a 3x3 Matrix in the General Case
4.15. Diagonalization
 4.15.1. Diagonalizing a 2x2 Matrix 4.15.2. Diagonalizing a 3x3 Matrix With Distinct Eigenvalues 4.15.3. Diagonalizing a 3x3 Matrix in the General Case 4.15.4. Symmetric Matrices 4.15.5. Diagonalization of 2x2 Symmetric Matrices 4.15.6. Diagonalization of 3x3 Symmetric Matrices
5.
Orthogonality & Projections
17 topics
5.16. Inner Products
 5.16.1. The Dot Product in N-Dimensional Euclidean Space 5.16.2. The Norm of a Vector in N-Dimensional Euclidean Space 5.16.3. Introduction to Abstract Vector Spaces 5.16.4. Defining Abstract Vector Spaces 5.16.5. Inner Product Spaces
5.17. Orthogonality
 5.17.1. Orthogonal Vectors in Euclidean Spaces 5.17.2. The Cauchy-Schwarz Inequality and the Angle Between Two Vectors 5.17.3. Orthogonal Complements 5.17.4. Orthogonal Sets in Euclidean Spaces 5.17.5. Orthogonal Matrices 5.17.6. Orthogonal Linear Transformations
5.18. Orthogonal Projections
 5.18.1. Projecting Vectors Onto One-Dimensional Subspaces 5.18.2. The Components of a Vector with Respect to an Orthogonal or Orthonormal Basis 5.18.3. Projecting Vectors Onto Subspaces in Euclidean Spaces (Orthogonal Bases) 5.18.4. Projecting Vectors Onto Subspaces in Euclidean Spaces (Arbitrary Bases) 5.18.5. Projecting Vectors Onto Subspaces in Euclidean Spaces (Arbitrary Bases): Applications 5.18.6. The Gram-Schmidt Process for Two Vectors
6.
Singular Value Decomposition
12 topics
6.19. Bilinear and Quadratic Forms
 6.19.1. Bilinear Forms 6.19.2. Quadratic Forms 6.19.3. Change of Variables in Quadratic Forms 6.19.4. Positive-Definite and Negative-Definite Quadratic Forms 6.19.5. Constrained Optimization of Quadratic Forms 6.19.6. Constrained Optimization of Quadratic Forms: Determining Where Extrema are Attained
6.20. Singular Value Decomposition
 6.20.1. The Singular Values of a Matrix 6.20.2. Computing the Singular Values of a Matrix 6.20.3. Singular Value Decomposition of 2x2 Matrices 6.20.4. Singular Value Decomposition of 2x2 Matrices With Zero or Repeated Eigenvalues 6.20.5. Singular Value Decomposition of Larger Matrices 6.20.6. Singular Value Decomposition and the Pseudoinverse Matrix
7.
Applications of Linear Algebra
8 topics
7.21. Principal Component Analysis
 7.21.1. Introduction to Principal Component Analysis 7.21.2. Computing Principal Components 7.21.3. The Connection Between PCA and SVD
7.22. Linear Least-Squares Problems
 7.22.1. The Least-Squares Solution of a Linear System (Without Collinearity) 7.22.2. The Least-Squares Solution of a Linear System (With Collinearity)
7.23. Linear Regression
 7.23.1. Linear Regression 7.23.2. Polynomial Regression 7.23.3. Multiple Linear Regression
8.
Multivariable Calculus
42 topics
8.24. Quadric Surfaces
 8.24.1. Ellipsoids 8.24.2. Hyperboloids 8.24.3. Paraboloids 8.24.4. Elliptic Cones 8.24.5. Cylinders 8.24.6. Identifying Quadric Surfaces
8.25. Partial Derivatives
 8.25.1. The Domain of a Multivariable Function 8.25.2. Level Curves 8.25.3. Limits and Continuity of Multivariable Functions 8.25.4. Introduction to Partial Derivatives 8.25.5. Computing Partial Derivatives Using the Rules of Differentiation 8.25.6. Geometric Interpretations of Partial Derivatives 8.25.7. Partial Differentiability of Multivariable Functions 8.25.8. Higher-Order Partial Derivatives 8.25.9. Equality of Mixed Partial Derivatives 8.25.10. Tangent Planes to Surfaces 8.25.11. Linearization of Multivariable Functions 8.25.12. The Multivariable Chain Rule
8.26. Vector-Valued Functions
 8.26.1. The Domain of a Vector-Valued Function 8.26.2. Tangent Vectors and Tangent Lines to Curves 8.26.3. The Gradient Vector 8.26.4. Directional Derivatives 8.26.5. The Multivariable Chain Rule in Vector Form
8.27. Differentiation
 8.27.1. The Jacobian 8.27.2. The Inverse Function Theorem 8.27.3. The Jacobian of a Three-Dimensional Transformation 8.27.4. The Derivative of a Multivariable Function 8.27.5. The Second Derivative of a Multivariable Function 8.27.6. Second-Degree Taylor Polynomials of Multivariable Functions
8.28. Approximating Volumes With Riemann Sums
 8.28.1. Partitions of Intervals 8.28.2. Calculating Double Summations Over Partitions 8.28.3. Approximating Volumes Using Lower Riemann Sums 8.28.4. Approximating Volumes Using Upper Riemann Sums 8.28.5. Lower Riemann Sums Over General Rectangular Partitions 8.28.6. Upper Riemann Sums Over General Rectangular Partitions 8.28.7. Defining Double Integrals Using Lower and Upper Riemann Sums
8.29. Double Integrals
 8.29.1. Double Integrals Over Rectangular Domains 8.29.2. Double Integrals Over Non-Rectangular Domains 8.29.3. Properties of Double Integrals 8.29.4. Type I and II Regions in Two-Dimensional Space 8.29.5. Double Integrals Over Type I Regions 8.29.6. Double Integrals Over Type II Regions
9.
Probability & Random Variables
37 topics
9.30. Probability
 9.30.1. The Law of Total Probability (Extended) 9.30.2. Bayes' Theorem 9.30.3. Extending Bayes' Theorem
9.31. Random Variables
 9.31.1. Probability Density Functions of Continuous Random Variables 9.31.2. Calculating Probabilities With Continuous Random Variables 9.31.3. Continuous Random Variables Over Infinite Domains 9.31.4. Cumulative Distribution Functions for Continuous Random Variables 9.31.5. Approximating Discrete Random Variables as Continuous 9.31.6. Simulating Random Observations
9.32. Transformations of Random Variables
 9.32.1. One-to-One Transformations of Discrete Random Variables 9.32.2. Many-to-One Transformations of Discrete Random Variables 9.32.3. The Distribution Function Method 9.32.4. The Change-of-Variables Method for Continuous Random Variables 9.32.5. The Distribution Function Method With Many-to-One Transformations
9.33. Expectation
 9.33.1. Expected Values of Discrete Random Variables 9.33.2. Properties of Expectation for Discrete Random Variables 9.33.3. Moments of Discrete Random Variables 9.33.4. Variance of Discrete Random Variables 9.33.5. Properties of Variance for Discrete Random Variables 9.33.6. Expected Values of Continuous Random Variables 9.33.7. Moments of Continuous Random Variables 9.33.8. Variance of Continuous Random Variables 9.33.9. The Rule of the Lazy Statistician
9.34. Discrete Probability Distributions
 9.34.1. The Bernoulli Distribution 9.34.2. Mean and Variance of the Binomial Distribution 9.34.3. The Discrete Uniform Distribution 9.34.4. Modeling With Discrete Uniform Distributions 9.34.5. Mean and Variance of Discrete Uniform Distributions 9.34.6. The Poisson Distribution 9.34.7. Modeling With the Poisson Distribution
9.35. Continuous Probability Distributions
 9.35.1. The Continuous Uniform Distribution 9.35.2. Mean and Variance of Continuous Uniform Distributions 9.35.3. Modeling With Continuous Uniform Distributions 9.35.4. The Gamma Function 9.35.5. The Chi-Square Distribution 9.35.6. The Student's T-Distribution 9.35.7. The Exponential Distribution
10.
Combining Random Variables
29 topics
10.36. Distributions of Two Discrete Random Variables
 10.36.1. Double Summations 10.36.2. Joint Distributions for Discrete Random Variables 10.36.3. Marginal Distributions for Discrete Random Variables 10.36.4. Independence of Discrete Random Variables 10.36.5. Conditional Distributions for Discrete Random Variables 10.36.6. The Joint CDF of Two Discrete Random Variables
10.37. Distributions of Two Continuous Random Variables
 10.37.1. Joint Distributions for Continuous Random Variables 10.37.2. Marginal Distributions for Continuous Random Variables 10.37.3. Independence of Continuous Random Variables 10.37.4. Conditional Distributions for Continuous Random Variables 10.37.5. The Joint CDF of Two Continuous Random Variables 10.37.6. Properties of the Joint CDF of Two Continuous Random Variables
10.38. Expectation for Joint Distributions
 10.38.1. Expected Values of Sums and Products of Random Variables 10.38.2. Variance of Sums of Independent Random Variables 10.38.3. Computing Expected Values From Joint Distributions 10.38.4. Conditional Expectation for Discrete Random Variables 10.38.5. Conditional Variance for Discrete Random Variables 10.38.6. Conditional Expectation for Continuous Random Variables 10.38.7. Conditional Variance for Continuous Random Variables 10.38.8. The Rule of the Lazy Statistician for Two Random Variables
10.39. Covariance of Random Variables
 10.39.1. The Covariance of Two Random Variables 10.39.2. Variance of Sums of Random Variables 10.39.3. The Correlation Coefficient for Two Random Variables 10.39.4. The Covariance Matrix
10.40. Normally Distributed Random Variables
 10.40.1. Normal Approximations of Binomial Distributions 10.40.2. Combining Two Normally Distributed Random Variables 10.40.3. Combining Multiple Normally Distributed Random Variables 10.40.4. I.I.D. Normal Random Variables 10.40.5. The Bivariate Normal Distribution
11.
Parametric Inference
29 topics
11.41. Point Estimation
 11.41.1. The Sample Mean 11.41.2. Statistics and Sampling Distributions 11.41.3. Variance of Sample Means 11.41.4. The Sample Variance 11.41.5. Sample Means From Normal Populations 11.41.6. The Central Limit Theorem 11.41.7. Sampling Proportions From Finite Populations 11.41.8. Point Estimates of Population Proportions 11.41.9. The Sample Covariance Matrix
11.42. Maximum Likelihood
 11.42.1. Product Notation 11.42.2. Logarithmic Differentiation 11.42.3. Likelihood Functions for Discrete Probability Distributions 11.42.4. Log-Likelihood Functions for Discrete Probability Distributions 11.42.5. Likelihood Functions for Continuous Probability Distributions 11.42.6. Log-Likelihood Functions for Continuous Probability Distributions 11.42.7. Maximum Likelihood Estimation
11.43. Hypothesis Testing
 11.43.1. One-Tailed Hypothesis Tests 11.43.2. Two-Tailed Hypothesis Tests 11.43.3. Type I and Type II Errors in Hypothesis Testing 11.43.4. Hypothesis Tests for One Mean: Known Population Variance 11.43.5. Hypothesis Tests for One Mean: Unknown Population Variance 11.43.6. Hypothesis Tests for Two Means: Known Population Variances
11.44. Confidence Intervals
 11.44.1. Confidence Intervals for One Mean: Known Population Variance 11.44.2. Confidence Intervals for One Mean: Unknown Population Variance 11.44.3. Confidence Intervals for Proportions 11.44.4. Confidence Intervals for Two Means: Known and Unequal Population Variances 11.44.5. Confidence Intervals for Variances 11.44.6. Confidence Intervals for Slope Parameters in Linear Regression 11.44.7. Confidence Intervals for Intercept Parameters in Linear Regression