CAS 992: Big Data

Fall 2013

Instructor: Professor Rick Wash
Email: wash@msu.edu
Meets: Thursdays, 9:00-11:50 in CAS 025
Office Hours: By appointment in room CAS 342

Course Description

This course is a research methods course covering a new and increasingly popular method of conducting social science research: large-scale data analysis. The advent of the Internet has given social scientists access to extremely large datasets about the behavior of millions (or billions) of people. However, collecting and analyzing this data isn’t straightforward and requires specific skills. The goal of this course is to expose PhD students to the skills required for this type of research, help them understand both the challenges and the opportunities available, and help them understand what good big data research is.

Course Structure and Schedule

The high-level goal of this class is to teach students how to do big data research. To achieve that goal, the class has been divided into two halves.

The first half of the class will focus on developing three specific technical skills: data storage in SQL databases, data manipulation and parsing using Python, and using existing data to create meaningful measures of phenomena. During this half of the class, there will be weekly technical assignments designed to practice and develop these skills. Hopefully these assignments won’t take long, but they will give you an opportunity to put these skills into practice.
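
As a rough illustration of how the three skills fit together, here is a minimal sketch and not an actual course assignment. The log format, the table and column names, and the "active days" measure are all hypothetical, invented for this example. It parses raw text with Python, stores the result in a SQL database (an in-memory SQLite database via Python's standard `sqlite3` module), and computes a simple measure with a query.

```python
import sqlite3

# Hypothetical raw activity log (format invented for illustration):
# one "day,username,kind" record per line, with one malformed line.
RAW_LINES = [
    "2013-09-05,alice,post",
    "2013-09-05,bob,comment",
    "2013-09-06,alice,comment",
    "malformed line",
    "2013-09-07,alice,post",
]


def parse_line(line):
    """Parsing: return (day, username, kind), or None if the line is malformed."""
    parts = [part.strip() for part in line.split(",")]
    if len(parts) != 3 or not all(parts):
        return None
    return tuple(parts)


def load_events(conn, lines):
    """Storage: insert every well-formed line into a SQL table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events (day TEXT, username TEXT, kind TEXT)"
    )
    rows = [row for row in map(parse_line, lines) if row is not None]
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def active_days_per_user(conn):
    """Measurement: number of distinct days on which each user did anything."""
    cursor = conn.execute(
        "SELECT username, COUNT(DISTINCT day) FROM events "
        "GROUP BY username ORDER BY username"
    )
    return dict(cursor.fetchall())


conn = sqlite3.connect(":memory:")  # throwaway in-memory database
loaded = load_events(conn, RAW_LINES)
activity = active_days_per_user(conn)
print(loaded, activity)
```

Whether "active days" is a meaningful measure of engagement is exactly the kind of question the third skill is about; the query itself is the easy part.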

The second half of the class will focus on understanding how to do research based on big data, using the skills developed in the first half. In each class, we will discuss a different topic or challenge that comes up when doing big data research. Every week we will read and discuss one or two “big data” research papers. Each student will also spend the second half of the semester on their own big data research project, and each week one or two students will lead a “work in progress” discussion with the rest of the class.

Bring your laptop computer (if you have one) to class if at all possible.

The weekly schedule of topics is available.

Assignments and Grading

For a PhD class, there are a surprisingly large number of assignments. However, I will work to keep the assignments short and sweet to allow you to focus on learning the skills necessary to do big data research. The assignments will all be linked from the schedule of topics.

The weekly technical assignments in the first half of the class will each be graded on a check / check plus / check minus scale (roughly 3.5, 4.0, and 3.0). Together they will total 40% of your final grade.

The final research paper (really a detailed, work-in-progress proposal) will account for 50% of your final grade: 10% for your in-class presentation, and the remaining 40% for the final paper that you turn in.

The last 10% of your grade will be class participation, particularly during the discussions of the readings and providing feedback during other students’ presentations.
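
To make the weighting concrete, here is a minimal sketch of the arithmetic. It assumes, which this syllabus does not state, that every component is scored on a 4.0 scale and that the final grade is a simple weighted average. The function name and the example scores are hypothetical.

```python
# Approximate grade-point values for the check scale described above.
CHECK_POINTS = {"check minus": 3.0, "check": 3.5, "check plus": 4.0}

# Weights taken from this section; they sum to 100%.
WEIGHTS = {
    "assignments": 0.40,
    "presentation": 0.10,
    "paper": 0.40,
    "participation": 0.10,
}


def final_grade(marks, presentation, paper, participation):
    """Weighted average on a 4.0 scale (an assumed formula, not official policy)."""
    assignments = sum(CHECK_POINTS[mark] for mark in marks) / len(marks)
    scores = {
        "assignments": assignments,
        "presentation": presentation,
        "paper": paper,
        "participation": participation,
    }
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical student: four weekly assignments plus the other components.
example = final_grade(
    ["check", "check plus", "check", "check minus"],
    presentation=4.0,
    paper=3.5,
    participation=4.0,
)
print(round(example, 2))
```

Under these assumptions, the weekly assignments and the final paper each move the final grade four times as much as the presentation or participation.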

Readings

The following are some of the papers we will read (this list is subject to change):

All of these papers can be found through the MSU library. For many of them, the links above take you to the main website of the publication. However, I have created a bookmarklet called “Proxify” that automatically redirects a webpage through the MSU Library proxy, allowing you to get access to all of these papers for free. Drag this Proxify link to your bookmarks folder. Then when you are viewing the page of one of the papers, click the link to redirect through the proxy.

There is no textbook for the course. There will be a number of online lecture notes made available to students.

Procedures

Attendance: I don’t take attendance, and it isn’t part of your grade. That said, I think you will find it difficult to learn the material in the class if you do not attend regularly. Please try to attend all classes so you can learn the skills and participate in the discussions that benefit everyone in the class. Doctoral students who have finished their coursework are permitted to sit in on individual classes that cover topics they wish to learn or need a refresher on.

Expectations: I expect all students to be familiar with the documents related to this class on this website, and to be aware of all assignments and responsibilities. Students are also responsible for knowing all announcements in class and over email.

Assignments: Assignments can be submitted via email to the instructor. Late assignments will lose one letter grade for each day they are late; in other words, turn in your assignments on time. Of course, if negotiated in advance, reasonable exceptions may be granted by the professor.

Academic Dishonesty: Michigan State University and the Department of Telecommunications, Information Studies, and Media both have policies about academic dishonesty. Basically, make sure that everything you turn in with your name on it is your own original work, and don’t cheat or lie. If it feels like cheating, it probably is; if you are unsure, please ask. Students caught cheating or plagiarizing will receive a 0 for the assignment and be reported to the university. Working together with other students in this class and other classes, however, is encouraged.

For classes like this one that involve programming, I strongly encourage you to work together and ask each other for help. Often when you have a problem or a nasty bug, the best place to go for help is your colleagues who are working on similar code. The Internet is also a fantastic source of information when you are stuck. Use these resources copiously. However, make sure that you personally write and understand all of the code that you turn in. Directly copying code that you don’t understand from the Internet or from others is academically dishonest and will make your code very difficult to debug.

Accommodations for Disabilities: If you have a documented disability from the Resource Center for Persons with Disabilities and wish to discuss academic accommodations, please contact the professor by the end of the second week of class.

Religious Holidays: You may make up course work missed to observe a major religious holiday only if you make arrangements in advance with the instructor.

Required Activity: To make up course work missed to participate in a university-sanctioned event, you must provide the instructor with adequate advance notice and written authorization from a university administrator.