## Probability Theory and Statistics

Some of the fundamental probability and statistics topics needed for Machine Learning are:

- Probability Rules & Axioms
- Bayes’ Theorem
- Prior and Posterior
- Random Variables
- Variance and Expectation
- Combinatorics
- Probability Distributions
- Conditional and Joint Distributions
- Standard Distributions (Bernoulli, Binomial, Multinomial, Uniform and Gaussian)
- Moment Generating Functions
- Maximum Likelihood Estimation (MLE) and Maximum a Posteriori Estimation (MAP)
- Sampling Methods
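
To see how Bayes' theorem, priors, posteriors, MLE and MAP connect, here is a minimal sketch for estimating a coin's probability of heads. It assumes a Bernoulli likelihood with a Beta(alpha, beta) prior. This is a conjugate pair, so the posterior is again a Beta distribution. The function names, the prior parameters and the flip data are made up for illustration.

```python
def mle_bernoulli(flips):
    """MLE of P(heads) from 0/1 outcomes: the sample proportion k / n."""
    return sum(flips) / len(flips)


def posterior_beta(flips, alpha, beta):
    """Posterior parameters for a Beta(alpha, beta) prior and a Bernoulli likelihood.

    By Bayes' theorem, posterior is proportional to likelihood * prior; for this
    conjugate pair the posterior is Beta(alpha + k, beta + n - k), k = number of heads.
    """
    k, n = sum(flips), len(flips)
    return alpha + k, beta + n - k


def map_bernoulli(flips, alpha, beta):
    """MAP estimate: the mode of the Beta posterior, (a - 1) / (a + b - 2), valid for a, b > 1."""
    a, b = posterior_beta(flips, alpha, beta)
    return (a - 1) / (a + b - 2)


flips = [1, 1, 1, 0, 1]               # made-up data: 4 heads in 5 flips
print(mle_bernoulli(flips))           # 0.8
print(map_bernoulli(flips, 2, 2))     # 5/7 ≈ 0.714, pulled toward the prior mode 0.5
print(map_bernoulli(flips, 1, 1))     # 0.8, a uniform prior gives back the MLE
```

With a uniform Beta(1, 1) prior the MAP estimate equals the MLE. A stronger prior pulls the estimate further toward its own mode, which matters most when there is little data.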

## Linear Algebra

Linear algebra is a cornerstone because almost everything in machine learning is represented as a **vector** or a **matrix**.

- Dot products
- Distance
- Rank and inversion
- Eigenvalues and eigenvectors
- Symmetric Matrices
- Orthogonalization & Orthonormalization
- Matrix factorization: SVD, Latent Semantic Analysis, Non-negative Matrix Factorization, Principal Component Analysis (PCA), eigendecomposition, LU decomposition, QR decomposition, etc. Several of these (SVD, LSA, NMF, PCA) are useful for reducing the dimensionality of the feature space.
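
A quick way to get a feel for these operations is to try them in NumPy. The sketch below uses a small made-up data matrix and touches most of the topics listed, including PCA computed from the SVD of centred data. It is an illustration, not a production pipeline.

```python
import numpy as np

# A small made-up data matrix: 4 samples (rows) x 3 features (columns).
X = np.array([[2.0, 0.0, 1.0],
              [1.0, 1.0, 0.0],
              [0.0, 2.0, 1.0],
              [1.0, 1.0, 1.0]])

u, v = X[0], X[1]
dot = u @ v                        # dot product: 2*1 + 0*1 + 1*0 = 2
dist = np.linalg.norm(u - v)       # Euclidean distance: sqrt(3)
rank = np.linalg.matrix_rank(X)    # 3: the columns are linearly independent

# X^T X is symmetric, so eigh returns real eigenvalues and orthonormal eigenvectors.
G = X.T @ X
eigvals, eigvecs = np.linalg.eigh(G)
G_inv = np.linalg.inv(G)           # invertible because X has full column rank

# QR decomposition: Q has orthonormal columns spanning the column space of X.
Q, R = np.linalg.qr(X)

# PCA via SVD: centre the columns, then project onto the top-k right singular vectors.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2
X_reduced = Xc @ Vt[:k].T          # 4 samples x 2 components

print(dot, dist, rank)
print(X_reduced.shape)             # (4, 2)
```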

## Multivariate Calculus

Calculus is also important for understanding learning algorithms and the optimization process, in particular how the gradient of the cost function and the learning rate are used to reduce the error at each iteration.
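
To make the role of the learning rate concrete, here is a minimal gradient-descent sketch for linear regression with a mean-squared-error cost. The data are simulated, and the function name, learning rate and iteration count are illustrative assumptions.

```python
import numpy as np


def gradient_descent(X, y, learning_rate=0.1, n_iters=500):
    """Fit linear-regression weights w by minimizing the mean squared error.

    Cost: J(w) = (1/n) * ||X w - y||^2, gradient: (2/n) * X^T (X w - y).
    """
    n, d = X.shape
    w = np.zeros(d)
    history = []
    for _ in range(n_iters):
        error = X @ w - y                      # residuals for the current weights
        history.append(np.mean(error ** 2))    # cost at this iteration
        grad = (2.0 / n) * X.T @ error         # gradient of the cost w.r.t. w
        w -= learning_rate * grad              # step against the gradient, scaled by the learning rate
    return w, history


rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.uniform(-1, 1, 100)])  # bias column + one feature
true_w = np.array([1.0, 3.0])
y = X @ true_w + rng.normal(scale=0.1, size=100)

w, history = gradient_descent(X, y)
print(w)                                       # approximately [1, 3]
print(history[0], history[-1])                 # the cost shrinks over the iterations
```

If the learning rate is too large the updates overshoot and the cost can diverge; if it is too small, convergence is slow.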

- Differential Calculus: differentiation matters because of **gradient descent**, which is almost everywhere. It has even found its way into tree-based models in the form of gradient boosting, which is gradient descent in function space.
- Integral Calculus
- Partial Derivatives
- Vector-Valued Functions
- Gradients and Directional Derivatives
- Hessian
- Jacobian
- Laplacian and Lagrangian
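
Partial derivatives, Jacobians, Hessians, Laplacians and directional derivatives can all be approximated with central finite differences, which is also a common way to sanity-check hand-derived gradients. The helper names and the example function f(x, y) = x^2 y + sin y below are hypothetical; the comments give the analytic values for comparison.

```python
import numpy as np


def numerical_gradient(f, x, h=1e-5):
    """Central-difference estimate of the partial derivatives of scalar f at x."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (f(x + e) - f(x - e)) / (2 * h)
    return grad


def numerical_jacobian(F, x, h=1e-5):
    """Jacobian of vector-valued F: row i holds the partial derivatives of F_i."""
    x = np.asarray(x, dtype=float)
    cols = []
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = h
        cols.append((np.asarray(F(x + e)) - np.asarray(F(x - e))) / (2 * h))
    return np.column_stack(cols)


def numerical_hessian(f, x, h=1e-4):
    """Hessian of scalar f, computed as the Jacobian of its numerical gradient."""
    return numerical_jacobian(lambda z: numerical_gradient(f, z, h), x, h)


def f(v):
    # Hypothetical example function f(x, y) = x^2 * y + sin(y).
    return v[0] ** 2 * v[1] + np.sin(v[1])


x0 = np.array([1.0, 2.0])
g = numerical_gradient(f, x0)      # analytic: [2xy, x^2 + cos y] = [4, 1 + cos 2]
H = numerical_hessian(f, x0)       # analytic: [[2y, 2x], [2x, -sin y]] = [[4, 2], [2, -sin 2]]
lap = np.trace(H)                  # Laplacian: sum of the unmixed second partials
direction = np.array([3.0, 4.0]) / 5.0
d = g @ direction                  # directional derivative along a unit vector
print(g, H, lap, d, sep="\n")
```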

## Algorithms and Complex Optimizations

This is important for understanding the computational efficiency and scalability of Machine Learning algorithms and for exploiting sparsity in datasets. It requires knowledge of data structures (Binary Trees, Hashing, Heaps, Stacks, etc.), Dynamic Programming, Randomized and Sublinear Algorithms, Graphs, Gradient and Stochastic Gradient Descent, and Primal-Dual methods.
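
As one example of exploiting sparsity with a basic data structure, a sparse vector can be stored as a hash map from index to non-zero value. This is a minimal pure-Python sketch with hypothetical function names; in practice, libraries such as SciPy's `scipy.sparse` provide compressed formats like CSR.

```python
def to_sparse(dense):
    """Keep only the non-zero entries of a vector in a dict (a hash map): index -> value."""
    return {i: val for i, val in enumerate(dense) if val != 0}


def sparse_dot(a, b):
    """Dot product of two dict-based sparse vectors.

    Iterating over the smaller dict and doing average O(1) hash lookups in the
    other makes the cost proportional to min(nnz(a), nnz(b)), not the full dimension.
    """
    if len(a) > len(b):
        a, b = b, a
    return sum(val * b[i] for i, val in a.items() if i in b)


u = {10: 2.0, 999_999: 1.5}   # a 1,000,000-dimensional vector with 2 non-zeros
v = {10: 3.0, 500: 4.0}
print(sparse_dot(u, v))        # 6.0: only index 10 is non-zero in both
print(to_sparse([0, 1.5, 0, 2]))  # {1: 1.5, 3: 2}
```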

## Statistics

I highly recommend learning statistics with a heavy focus on coding up examples, preferably in Python or R.
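
In that spirit, a small simulation is often clearer than a formula. The sketch below computes a percentile-bootstrap confidence interval for a mean using NumPy; the simulated exponential data, the function name and its defaults are assumptions made for illustration.

```python
import numpy as np


def bootstrap_mean_ci(sample, n_resamples=10_000, confidence=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the mean of a 1-D sample."""
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample, dtype=float)
    # Draw many resamples with replacement, each the same size as the original sample.
    idx = rng.integers(0, sample.size, size=(n_resamples, sample.size))
    boot_means = sample[idx].mean(axis=1)
    alpha = 1.0 - confidence
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return lo, hi


rng = np.random.default_rng(42)
data = rng.exponential(scale=2.0, size=200)   # simulated, skewed data with true mean 2.0
lo, hi = bootstrap_mean_ci(data)
print(f"sample mean = {data.mean():.3f}, 95% CI ≈ ({lo:.3f}, {hi:.3f})")
```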

## Others

This covers other math topics not included in the major areas described above: Real and Complex Analysis (Sets and Sequences, Topology, Metric Spaces, Single-Valued and Continuous Functions, Limits), Information Theory (Entropy, Information Gain), Function Spaces and Manifolds.
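
Entropy and information gain are easy to compute directly. This sketch follows the usual decision-tree definitions (entropy in bits; gain as parent entropy minus the size-weighted entropy of the child groups); the function names and toy labels are illustrative.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy H(Y) = -sum p * log2(p), in bits, of a list of class labels."""
    n = len(labels)
    probs = (count / n for count in Counter(labels).values())
    return sum(-p * log2(p) for p in probs)


def information_gain(labels, groups):
    """Entropy of the parent labels minus the size-weighted entropy of the child groups."""
    n = len(labels)
    weighted = sum(len(g) / n * entropy(g) for g in groups)
    return entropy(labels) - weighted


y = ["yes", "yes", "no", "no"]
print(entropy(y))                                            # 1.0
print(information_gain(y, [["yes", "yes"], ["no", "no"]]))   # 1.0 (perfect split)
print(information_gain(y, [["yes", "no"], ["yes", "no"]]))   # 0.0 (uninformative split)
```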

## References

- Matrix Cookbook
- Linear Algebra Lectures by Dr. Strang
- Application of Linear Algebra – Part 1
- Application of Linear Algebra – Part 2
- Linear Algebra Problems and Methods
- Deep Learning Book
- Khan Academy’s Linear Algebra, Probability & Statistics, Multivariable Calculus and Optimization courses
- Coding the Matrix: Linear Algebra through Computer Science Applications by Philip Klein, Brown University
- Linear Algebra – Foundations to Frontiers by Robert van de Geijn, University of Texas (on edX)
- Joseph Blitzstein’s Harvard Stat 110 lectures
- Larry Wasserman’s book All of Statistics: A Concise Course in Statistical Inference
- Boyd and Vandenberghe’s Convex Optimization course from Stanford
- Udacity’s Introduction to Statistics
- Coursera/Stanford’s Machine Learning course by Andrew Ng
- The real prerequisite for machine learning isn’t math, it’s data analysis
