Institute for Quantitative Social Science, Harvard University
Konstantin Kashin is a Fellow at the Institute for Quantitative Social Science at Harvard University and will be joining Facebook's Core Data Science group in September 2015. Konstantin develops new statistical methods for diverse applications in the social sciences, with a focus on causal inference, text as data, and Bayesian forecasting. He holds a PhD in Political Science and an AM in Statistics from Harvard University.
I've finally updated and uploaded a detailed note on maximum likelihood estimation, based in part on material I taught in Gov 2001. It is available in full [here](http://www.konstantinkashin.com/notes/stat/Maximum_Likelihood_Estimation.pdf).
To summarize the note without getting into too much math, let's first define the likelihood as proportional to the joint probability of the data conditional on the parameter of interest ($\theta$), which factors into a product when the observations are independent and identically distributed:
$$L(\theta|\mathbf{x}) \propto f(\mathbf{x}|\theta) = \prod\limits_{i=1}^n f(x_i|\theta)$$
The maximum likelihood estimate (MLE) of $\theta$ is the value of $\theta$ in the parameter space $\Omega$ that maximizes the likelihood function:
$$\hat{\theta}_{MLE} = \underset{\theta \in \Omega}{\arg\max}\; L(\theta|\mathbf{x}) = \underset{\theta \in \Omega}{\arg\max} \prod\limits_{i=1}^n f(x_i|\theta)$$
Because the logarithm is strictly increasing, this is equivalent to maximizing the log-likelihood function, which is often simpler:
$$\hat{\theta}_{MLE} = \underset{\theta \in \Omega}{\arg\max}\; \log L(\theta|\mathbf{x}) = \underset{\theta \in \Omega}{\arg\max}\; \ell (\theta|\mathbf{x}) = \underset{\theta \in \Omega}{\arg\max} \sum\limits_{i=1}^n \log (f(x_i|\theta))$$
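As a rough illustration of the definitions above, here is a minimal sketch in R that maximizes a log-likelihood numerically and compares the result to the closed-form answer. The exponential model, the simulated data, and the names `loglik`, `rate_numeric`, and `rate_closed` are assumptions made for this example and are not taken from the note.

```r
# Simulated data: the exponential model and the true rate of 2 are
# illustrative assumptions, not part of the original note.
set.seed(1)
x <- rexp(200, rate = 2)

# Log-likelihood of an i.i.d. exponential sample as a function of the rate:
# the sum of the log densities of the individual observations.
loglik <- function(rate, x) sum(dexp(x, rate = rate, log = TRUE))

# Numerically maximize the log-likelihood over a bounded interval.
fit <- optimize(loglik, interval = c(1e-6, 50), x = x, maximum = TRUE)

rate_numeric <- fit$maximum   # numerical MLE
rate_closed  <- 1 / mean(x)   # analytical MLE for the exponential rate

c(numeric = rate_numeric, closed_form = rate_closed)
```

The two values should agree up to the tolerance of the optimizer, which is a quick sanity check that the log-likelihood was coded correctly.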
This post briefly sketches out the types of bootstrapped confidence intervals commonly used, along with code in R for how to calculate them from scratch. Specifically, I focus on nonparametric confidence intervals. The post is structured around the list of bootstrap confidence interval methods provided by Canty et al. (1996). This is just a quick introduction to the world of bootstrapping; for an excellent R package for doing all sorts of bootstrapping, see the [boot package](http://cran.r-project.org/web/packages/boot/boot.pdf) by Brian Ripley.
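To give a flavor of what "from scratch" can look like, here is a minimal sketch of one widely used nonparametric interval, the percentile interval, for a sample mean. The simulated data, the number of resamples `B`, and the variable names are assumptions for illustration only; this is not the code from the full post.

```r
# Illustrative data: a simulated sample (an assumption for this sketch).
set.seed(42)
x <- rnorm(100, mean = 5, sd = 2)

# Number of bootstrap resamples (an arbitrary choice for this sketch).
B <- 2000

# Resample the data with replacement and recompute the statistic each time.
boot_means <- replicate(B, mean(sample(x, replace = TRUE)))

# Percentile interval: the empirical 2.5% and 97.5% quantiles of the
# bootstrap distribution of the statistic.
ci <- quantile(boot_means, probs = c(0.025, 0.975))
ci
```

The same pattern works for other statistics by swapping `mean` for another function of the resampled data.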
A graphical approach to displaying regression coefficients / effect sizes across multiple specifications can often be significantly more powerful and intuitive than presenting a regression table. Moreover, we can easily express uncertainty in the form of confidence intervals around our estimates. As a quick example, suppose that we wanted to compare the effect of British colonial status upon country-level corruption across multiple specifications and two methods (OLS and WLS) from the following paper: Treisman, Daniel. 2000. "The causes of corruption: a cross-national study," *Journal of Public Economics* 76: 399-457.
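The general idea can be sketched as follows. Since the estimates from the paper are not reproduced here, the sketch fits three specifications to simulated data; the variable names (`colony`, `gdp`, `corruption`, `w`), the data-generating process, and the specification labels are all hypothetical and chosen only to mirror the setup described above.

```r
library(ggplot2)

# Simulated data standing in for a cross-country dataset (purely hypothetical).
set.seed(7)
n <- 150
d <- data.frame(colony = rbinom(n, 1, 0.4), gdp = rnorm(n))
d$corruption <- 5 - 1 * d$colony - 0.8 * d$gdp + rnorm(n)
d$w <- runif(n, 0.5, 2)  # arbitrary weights for the WLS fit

# Three specifications: two OLS fits and one WLS fit.
fits <- list(
  "OLS, bivariate"    = lm(corruption ~ colony, data = d),
  "OLS, with control" = lm(corruption ~ colony + gdp, data = d),
  "WLS, with control" = lm(corruption ~ colony + gdp, data = d, weights = w)
)

# Collect the coefficient of interest and its 95% confidence interval.
est <- do.call(rbind, lapply(names(fits), function(nm) {
  ci <- confint(fits[[nm]])["colony", ]
  data.frame(spec = nm,
             estimate = coef(fits[[nm]])[["colony"]],
             lower = ci[[1]], upper = ci[[2]])
}))

# One point per specification, with a line segment for the interval.
p <- ggplot(est, aes(x = spec, y = estimate)) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_pointrange(aes(ymin = lower, ymax = upper)) +
  coord_flip() +
  labs(x = NULL, y = "Estimated coefficient (95% CI)")
p
```

Putting all specifications on a common axis makes it easy to see at a glance whether the estimate is stable and whether any interval covers zero.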
Following up on the previous post, another way to construct DAGs is using R. I think the [igraph package](http://cran.r-project.org/web/packages/igraph/index.html) is one of the most customizable ways to do so. This is a powerful package designed for the visualization and analysis of networks and offers much more functionality than you will use for DAGs.
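As a minimal sketch of the approach, the snippet below builds a three-node DAG from an edge list and plots it. The node names (`Z`, `X`, `Y`) and the styling choices are assumptions for illustration, not the example from the full post.

```r
library(igraph)

# Edge list for a simple confounding structure: Z -> X, Z -> Y, X -> Y.
# The node names are hypothetical.
edges <- data.frame(from = c("Z", "Z", "X"),
                    to   = c("X", "Y", "Y"))

# Build a directed graph from the edge list.
g <- graph_from_data_frame(edges, directed = TRUE)

# Confirm that the graph contains no directed cycles.
is_dag(g)

# Plot with a few of the many available styling options.
plot(g,
     layout = layout_in_circle(g),
     vertex.color = "white",
     vertex.size = 30,
     edge.arrow.size = 0.5)
```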
[rgdal](http://cran.r-project.org/web/packages/rgdal/index.html) provides an interface between R and the [GDAL](http://www.gdal.org/)/[OGR](http://www.gdal.org/ogr/) library, which provides extensive support for a variety of geospatial formats. It is extremely useful for data import and export tasks, particularly because it can read projection information (from .prj files). However, to use rgdal, you must first install GDAL and other frameworks on your system. This is a guide for how to install rgdal on a Mac.
In this post, I'm going to go through how to make plots of distributions (either density plots or histograms) in ggplot2. I'm going to draw upon examples of Fisherian testing in the context of causal inference, but the examples should be completely understandable without knowledge of Fisher's approach to inference.
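For a sense of what such a plot involves, here is a minimal sketch that simulates a randomization distribution under the sharp null of no effect and draws it as a density with the observed statistic marked. The simulated outcomes, the helper `diff_means`, and the number of permutations are assumptions for illustration rather than the examples from the full post.

```r
library(ggplot2)

# Simulated experiment: 40 units, half treated (hypothetical data).
set.seed(3)
n <- 40
treat <- rep(c(1, 0), each = n / 2)
y <- rnorm(n) + 0.5 * treat

# Test statistic: difference in mean outcomes between treated and control.
diff_means <- function(y, z) mean(y[z == 1]) - mean(y[z == 0])
obs <- diff_means(y, treat)

# Under the sharp null, outcomes are fixed, so we re-randomize treatment
# and recompute the statistic to approximate its null distribution.
null_dist <- replicate(5000, diff_means(y, sample(treat)))

# Two-sided p-value: share of re-randomizations at least as extreme.
p_value <- mean(abs(null_dist) >= abs(obs))

# Density of the null distribution with the observed statistic marked.
p <- ggplot(data.frame(stat = null_dist), aes(x = stat)) +
  geom_density(fill = "grey80") +
  geom_vline(xintercept = obs, linetype = "dashed") +
  labs(x = "Difference in means", y = "Density")
p
```

Replacing `geom_density()` with `geom_histogram()` gives the histogram version of the same plot.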
Here's a quick example of plotting histograms next to one another in `ggplot2`. I wanted to plot the estimated propensity scores for treated and control units for the [Lalonde non-experimental data](/data/dta.nooutcome.RData).
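One way to get this layout is with facets. The sketch below uses simulated scores rather than the Lalonde data; the data frame `ps`, its columns `pscore` and `group`, and the Beta distributions used to generate the scores are assumptions for illustration.

```r
library(ggplot2)

# Simulated propensity scores for two groups (not the Lalonde data).
set.seed(1)
ps <- data.frame(
  pscore = c(rbeta(200, 4, 2), rbeta(400, 2, 4)),
  group  = rep(c("Treated", "Control"), times = c(200, 400))
)

# One histogram per group, placed side by side with a shared x-axis.
p <- ggplot(ps, aes(x = pscore)) +
  geom_histogram(binwidth = 0.05, fill = "grey40", colour = "white") +
  facet_wrap(~ group) +
  labs(x = "Estimated propensity score", y = "Count")
p
```

Because the panels share an x-axis, differences in overlap between the two groups are easy to compare visually.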
One of the occasionally annoying features of R and thus `ggplot2` is dealing with factors. In this post, I'll go through how to handle ordering of factors in `ggplot2` and the manual assignment of colors to those categories.
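A minimal sketch of both ideas, assuming a toy data frame with a three-level categorical variable (the data, level names, and color codes below are hypothetical): the order of categories in a plot follows the factor's levels, and colors can be pinned to categories with a named vector.

```r
library(ggplot2)

# Toy data (hypothetical).
df <- data.frame(group = c("low", "medium", "high"),
                 value = c(3, 5, 2))

# By default, factor levels are sorted alphabetically (high, low, medium).
# Setting the levels explicitly controls the order on the axis and legend.
df$group <- factor(df$group, levels = c("low", "medium", "high"))

# A named vector ties each color to a specific category, so the mapping
# does not change if the level order changes later.
cols <- c(low = "#1b9e77", medium = "#d95f02", high = "#7570b3")

p <- ggplot(df, aes(x = group, y = value, fill = group)) +
  geom_col() +
  scale_fill_manual(values = cols)
p
```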