This is a doctoral-level methods class. As such, my primary goal is to help students become familiar with doing research that involves large-scale, real-world data. Two features of this type of data are relevant here: the datasets are large enough that processing them by hand is infeasible, and the data comes from real-world settings, where the researcher has to figure out what each variable means rather than constructing his or her own variables.
I have two high-level goals for this class. First, students should understand how to collect, parse, munge, and store large-scale data. Most students should be able to do this work themselves by the end of the class. Everyone should understand what work needs to be done and be able to guide others who have the appropriate technical skills in doing it. Second, students should understand how to examine real-world variables, construct new variables by recombining existing data, and determine what kinds of questions the data they have can answer.