Institute for Quantitative Social Science, Harvard University
Konstantin Kashin is a Fellow at the Institute for Quantitative Social Science at Harvard University and will be joining Facebook's Core Data Science group in September 2015. Konstantin develops new statistical methods for diverse applications in the social sciences, with a focus on causal inference, text as data, and Bayesian forecasting. He holds a PhD in Political Science and an AM in Statistics from Harvard University.
I've finally updated and uploaded a detailed note on maximum likelihood estimation, based in part on material I taught in Gov 2001. It is available in full [here](http://www.konstantinkashin.com/notes/stat/Maximum_Likelihood_Estimation.pdf).
To summarize the note without getting into too much math, let's first define the likelihood as proportional to the joint probability of the data conditional on the parameter of interest ($\theta$):
$$L(\theta|\mathbf{x}) \propto f(\mathbf{x}|\theta) = \prod\limits_{i=1}^n f(x_i|\theta)$$
The maximum likelihood estimate (MLE) of $\theta$ is the value of $\theta$ in the parameter space $\Omega$ that maximizes the likelihood function:
$$\hat{\theta}_{MLE} = \max_{\theta \in \Omega} L(\theta|\mathbf{x}) = \max_{\theta \in \Omega} \prod\limits_{i=1}^n f(x_i|\theta)$$
This turns out to be equivalent to maximizing the log-likelihood function (which is often simpler):
$$\hat{\theta}_{MLE} = \max_{\theta \in \Omega} \log L(\theta|\mathbf{x}) = \max_{\theta \in \Omega} \ell (\theta|\mathbf{x}) = \max_{\theta \in \Omega} \sum\limits_{i=1}^n \log (f(x_i|\theta))$$