Sets and Quantifiers
Formal mathematical notation is used throughout machine learning textbooks and papers. The symbols themselves are relatively simple to learn through direct instruction, but their meaning is difficult to pick up from context clues alone, so notation that is not understood beforehand becomes a source of bewilderment and intimidation. A short code analogy follows the objectives below.
- Construct sets using set-builder notation and demonstrate fluency with set operations and terminology.
- Write and interpret functions using arrow notation.
- Translate between formal and informal language using quantifiers.
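For readers who think in code, there is a rough Python analogy (purely illustrative, with made-up values): a set comprehension mirrors set-builder notation, and all()/any() mirror the universal and existential quantifiers.

```python
# Set-builder notation {x^2 : x in {0, ..., 9}, x even} as a set comprehension
evens_squared = {x**2 for x in range(10) if x % 2 == 0}
print(sorted(evens_squared))  # [0, 4, 16, 36, 64]

# Quantifiers: "for all" maps to all(), "there exists" maps to any()
print(all(x % 4 == 0 for x in evens_squared))  # True: every even square is divisible by 4
print(any(x > 50 for x in evens_squared))      # True: 64 > 50
```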
Hyperbolic Functions
Hyperbolic tangent is a common activation function in neural networks (a short sketch of its definition follows the objective below).
- Evaluate and graph hyperbolic functions.
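As a minimal sketch of the definition, using only the standard library, tanh can be written in terms of exponentials and checked against math.tanh:

```python
import math

# tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))
def tanh_from_exp(x: float) -> float:
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

for x in (-2.0, 0.0, 0.5, 2.0):
    print(x, tanh_from_exp(x), math.tanh(x))  # the two columns agree up to rounding
```

Note that tanh squashes any real input into the interval (-1, 1), which is part of what makes it useful as an activation function.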
Determinants, Gaussian Elimination, and Subspace Projection
Many algorithms in machine learning rely on advanced matrix methods that in turn rest on more foundational linear algebra topics. For example, principal component analysis involves finding eigenvalues and eigenvectors, which require determinants and Gaussian elimination, respectively. Likewise, fitting a linear regression model involves using subspace projection to project the desired outputs onto the subspace of outputs the model could possibly generate (see the sketch after the objectives below).
- Compute the determinant of an N×N matrix using Laplace expansions, and use properties of the determinant to simplify calculations.
- Use Gaussian elimination to solve systems of linear equations and compute inverse matrices.
- Find the projection of a vector onto a subspace formed by a span of other vectors.
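As a concrete companion to the projection objective, here is a minimal NumPy sketch (with made-up vectors) of the standard formula p = A(AᵀA)⁻¹Aᵀb, which assumes the columns of A are linearly independent:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])        # columns span a 2-D subspace of R^3
b = np.array([6.0, 0.0, 0.0])

P = A @ np.linalg.inv(A.T @ A) @ A.T   # projection matrix onto span(A)
p = P @ b                              # closest point to b within span(A)
print(p)              # [ 5.  2. -1.]
print(A.T @ (b - p))  # ~[0, 0]: the residual is orthogonal to the subspace
```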
Eigenvalues and Singular Values
Recommender systems and other machine learning systems often use latent factor models to identify the most prominent patterns underlying individual records in a data set, sometimes with the additional goal of reducing the complexity of the data by discarding negligible patterns. This is often accomplished using advanced techniques from linear algebra: the eigenvectors of a square matrix represent independent patterns, their eigenvalues represent the prominence of those patterns, and singular values generalize the idea of eigenvalues to rectangular matrices (see the sketch after the objectives below).
- Understand eigenvalues/eigenvectors geometrically, calculate them algebraically, and use them to diagonalize a matrix.
- Compute and interpret properties of quadratic forms including definiteness and principal axes.
- Compute the singular values of a matrix, understand their relationship to the constrained optimization of quadratic forms, and find the singular value decomposition of a matrix.
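The following NumPy sketch (with made-up matrices) illustrates diagonalization of a symmetric matrix and the singular value decomposition of a rectangular one:

```python
import numpy as np

S = np.array([[2.0, 1.0],
              [1.0, 2.0]])           # symmetric, so eigenvectors are orthogonal
vals, vecs = np.linalg.eigh(S)       # eigh is specialized for symmetric matrices
print(vals)                          # [1. 3.]
print(vecs @ np.diag(vals) @ vecs.T) # diagonalization reconstructs S

M = np.array([[3.0, 0.0, 2.0],
              [1.0, 4.0, 0.0]])      # rectangular: no eigendecomposition, but an SVD
U, s, Vt = np.linalg.svd(M, full_matrices=False)
print(s)                             # singular values, largest first
print(U @ np.diag(s) @ Vt)           # M = U diag(s) V^T reconstructs M
```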
Inner Product Spaces
Support vector machines classify data into two classes by drawing the best possible boundary between the classes. Even when data is not linearly separable, a kernel function can be used to map the data into an inner product space where it becomes linearly separable. This is known as the “kernel trick”, and it tends to baffle those who are not familiar with inner product spaces. Knowledge of inner product spaces also provides intuition for the similarity measures used in clustering algorithms, since similarity measures are conceptually analogous to inner products (cosine similarity, for instance, is a normalized dot product). A short sketch follows the objectives below.
- Compute dot products, norms, and distances between vectors in N-dimensional Euclidean space.
- Extend the concept of the dot product to the more general concept of an inner product, and use the inner product to compute norms and distances between vectors in abstract vector spaces.
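Here is a minimal NumPy sketch of these ideas (the vectors and kernel bandwidth are made up): the dot product as an inner product, the norm and distance it induces, cosine similarity as a normalized inner product, and an RBF kernel, which behaves like an inner product in an implicit feature space:

```python
import numpy as np

u = np.array([1.0, 2.0, 2.0])
v = np.array([2.0, 0.0, 1.0])

dot = u @ v                                  # inner product <u, v>
norm_u = np.sqrt(u @ u)                      # ||u|| = sqrt(<u, u>)
dist = np.sqrt((u - v) @ (u - v))            # distance ||u - v||
cos_sim = dot / (norm_u * np.sqrt(v @ v))    # cosine similarity

gamma = 0.5                                  # hypothetical bandwidth parameter
rbf = np.exp(-gamma * np.sum((u - v) ** 2))  # RBF kernel k(u, v)
print(dot, norm_u, dist, cos_sim, rbf)
```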
Multivariable Calculus
Gradient descent, the most popular family of optimization methods in machine learning, involves computing partial derivatives of multivariable functions. Likewise, extending methods from probability and statistics to multivariate distributions (used in, e.g., Gaussian mixture models) requires integrating multivariable functions. Finally, the concept of a “hyperplane,” which comes up frequently in the context of classification algorithms, can feel nebulous to anyone not already familiar with equations of planes in 3D space. A gradient sketch follows the objectives below.
- Construct equations of lines and planes in 3D space.
- Extend prior knowledge of single-variable derivative rules, including the chain rule, to compute partial derivatives of multivariable functions.
- Compute the gradient of a multivariable function and interpret it geometrically as representing direction and magnitude of the function’s greatest rate of increase.
- Extend prior knowledge of single-variable integrals to evaluate double integrals using the fundamental theorem of calculus.
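As a sketch of the gradient objective (the function f below is made up for the example), partial derivatives computed analytically can be checked against central finite differences:

```python
import numpy as np

def f(x, y):
    return x**2 * y + np.sin(y)

def grad_f(x, y):
    return np.array([2 * x * y,           # df/dx
                     x**2 + np.cos(y)])   # df/dy

def grad_numeric(x, y, h=1e-6):
    return np.array([(f(x + h, y) - f(x - h, y)) / (2 * h),
                     (f(x, y + h) - f(x, y - h)) / (2 * h)])

print(grad_f(1.0, 2.0))        # analytic gradient at (1, 2)
print(grad_numeric(1.0, 2.0))  # agrees to ~6 decimal places
```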
Random Variables and Distributions
To understand the advanced probability topics that appear in machine learning, one must be able to manipulate random variables and distributions. For example, the covariance matrix of multiple random variables is central to principal component analysis because the principal components are themselves the eigenvectors of the covariance matrix. Likewise, uniform and normal distributions are frequently used for parameter initialization, and the multivariate normal distribution is central to the Gaussian mixture model, a popular clustering algorithm (see the sketch after the objectives below).
- Combine prior knowledge of discrete random variables and integration to compute probabilities, the mean, and the variance of a continuous random variable.
- Generalize prior knowledge of univariate probability distributions to joint distributions.
- Compute the mean, variance, and covariance matrix for a given sample of observations.
- Demonstrate fluency with the uniform distribution and the multivariate normal distribution.
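The sketch below (with made-up parameters) draws from a multivariate normal distribution and verifies that the sample mean and sample covariance matrix approximate the parameters used to generate the data:

```python
import numpy as np

rng = np.random.default_rng(0)

mean = np.array([1.0, -2.0])
cov = np.array([[2.0, 0.8],
                [0.8, 1.0]])     # symmetric positive definite
X = rng.multivariate_normal(mean, cov, size=10_000)   # shape (10000, 2)

print(X.mean(axis=0))            # ~[1.0, -2.0]
print(np.cov(X, rowvar=False))   # ~cov; rows of X are observations
```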
Conditional Probability and Likelihood Functions
Conditional probability and likelihood functions are central to machine learning models and algorithms such as naive Bayes classifiers and the expectation-maximization algorithm, which is commonly used to fit Gaussian mixture models. A worked sketch follows the objectives below.
- Apply Bayes’ theorem to compute conditional probabilities and solve problems in real-world context.
- Combine Bayes’ theorem with knowledge of probability distributions to conceptualize, compute, and apply marginal distributions and conditional distributions.
- Understand the concept of a point estimator and what it means for an estimator to be unbiased.
- Compute likelihood functions and fit probability models to data using maximum likelihood estimation.
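Here is a worked sketch of Bayes’ theorem on a hypothetical diagnostic-test scenario, plus a one-line maximum likelihood estimate for made-up coin-flip data:

```python
import numpy as np

# Hypothetical numbers: P(D) = 0.01, P(+|D) = 0.95, P(+|no D) = 0.05
p_d, p_pos_d, p_pos_nd = 0.01, 0.95, 0.05
p_pos = p_pos_d * p_d + p_pos_nd * (1 - p_d)   # law of total probability
p_d_pos = p_pos_d * p_d / p_pos                # Bayes' theorem: P(D|+)
print(p_d_pos)   # ~0.161: a positive test is far from conclusive

# The MLE for a Bernoulli parameter is the sample proportion of successes
flips = np.array([1, 0, 1, 1, 0, 1, 1, 0, 1, 1])   # made-up data
print(flips.mean())   # 0.7 maximizes the Bernoulli likelihood
```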
Hypothesis Testing and Regression
Many machine learning models (e.g. logistic regression and lasso regression) are extensions of linear regression. Confidence intervals are often used to place bounds on the uncertainty of a model’s predictions or parameters. Models are usually trained on a sample drawn from a population, and hypothesis testing can determine whether there is sufficient evidence to draw a conclusion about the population as a whole (see the sketch after the objectives below).
- Carry out z-tests and t-tests: formulate null and alternative hypotheses, compute p-values, and reject or fail to reject the null hypothesis at a desired level of significance.
- Identify type I and type II errors and their consequences in modeling contexts.
- Apply subspace projection to fit linear, polynomial, and multiple linear regression models to data.
- Construct confidence intervals for statistical quantities including linear regression coefficients.
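The closing sketch below (on synthetic data with made-up coefficients) fits a simple linear regression by least squares, which is exactly a subspace projection, and computes the t-statistic for the slope:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: y = 2 + 3x + noise
x = rng.uniform(0, 10, size=50)
y = 2.0 + 3.0 * x + rng.normal(0, 1.5, size=50)

X = np.column_stack([np.ones_like(x), x])      # design matrix [1, x]
beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # projects y onto span(X)
print(beta)                                    # ~[2, 3]

# t-statistic for H0: slope = 0, using the usual OLS standard error
resid = y - X @ beta
sigma2 = resid @ resid / (len(y) - 2)          # unbiased noise-variance estimate
se_slope = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
print(beta[1] / se_slope)                      # large |t| => reject H0
```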