Mathematics of Big Data

Instructor: Prof. Gu

Office: Shan 3481
Email: gu@g.hmc.edu
Phone: ext. 1-8929
Office Hours: Tuesdays, 3-4 pm PST; also by appointment.

TA/Tutoring Hours:

Names: Ian Li, Teja Reddy, Zoe Shao
Email: math189bigdata@gmail.com
Tutoring Hours:
Ian Li: email me at ili@hmc.edu with potential meeting time and/or specific questions/topics.
Zoe Shao: email me at zoshao@hmc.edu with potential meeting time and/or specific questions/topics.
Teja Reddy: email me at treddy@hmc.edu with potential meeting time and/or specific questions/topics.

Textbook:

All members of the class will be required to obtain the following text:
Kevin Patrick Murphy, Machine Learning: A Probabilistic Perspective. MIT Press, 2012.

Grading:

● 5% Reading Summary
● 35% Homework
● 20% Midterm Project
● 40% Final Project
● [Up to 5% Extra Credit]

Course Requirements and Evaluation:

Reading Presentations
All readings are compulsory, but some are more compulsory than others.
To encourage reading active research in the field, we will assign each non-Murphy reading to a group of two students, who will write a summary of 1-2 pages to be turned in at the start of class. Each student will do approximately two summaries in total. Summaries must be clear and must demonstrate convincingly that you have read the paper. Each summary should be written at a high level and should focus on the main point of the reading (i.e. avoid complicated math). Credit will be given on a 0-10 scale for each summary; as long as your summary is reasonable, you will be given full credit.

Homework
Homework is due every week at the beginning of lecture. Each assignment has two parts, math and coding, and is split approximately evenly between mathematical analysis and extension of our course material, and application of algorithms to real-world data.

For coding: We highly recommend using Python 3. For each problem, the starter code and the sample solution are implemented in Python 3. All the results and graphs for the sample solutions were produced with Python 3.5.2 on macOS Sierra; different versions of Python or different system environments may produce different results. You are also welcome to use Jupyter Notebooks, but the starter code is not provided in notebook format.

NumPy and Pandas are two important Python libraries to know for the coding assignments in this course. You might also want to look at Matplotlib for generating plots. If you have never used these libraries before, make sure you check out the tutorials online before starting the first assignment.

Note:
1) When doing the coding problem for each homework set, you are not allowed to use any machine learning algorithms implemented by external libraries, such as LinearRegression in sklearn. However, you may use these algorithms in your final project.

2) Each homework has both pdf and tex versions. To compile the tex files successfully, make sure that you have downloaded both macros.tex and hmcpset.cls and placed them in the same folder as the hw tex file.
If you have any questions about compiling the tex files, feel free to ask the grutors for help.

3) For each coding problem, please submit your code to GitHub; please print out any graphs and printed output and submit them with the written part.
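
As an illustration of note 1, the following is a minimal sketch of what fitting a linear model without a library estimator such as sklearn's LinearRegression might look like. It uses only NumPy and the normal equations. The function names and the toy data are hypothetical and are not taken from the course starter code; whether a particular NumPy helper is acceptable in a given homework is for the course staff to decide.

```python
import numpy as np


def fit_linear_regression(X, y):
    """Return least-squares weights (intercept first) via the normal equations."""
    # Prepend a column of ones so the first weight acts as the intercept.
    X_aug = np.column_stack([np.ones(len(X)), X])
    # Solve (X^T X) w = X^T y without forming an explicit matrix inverse.
    return np.linalg.solve(X_aug.T @ X_aug, X_aug.T @ y)


def predict(X, weights):
    """Apply the fitted weights to new inputs."""
    X_aug = np.column_stack([np.ones(len(X)), X])
    return X_aug @ weights


# Hypothetical noise-free toy data: y = 2 + 3x
X = np.linspace(-1.0, 1.0, 50).reshape(-1, 1)
y = 2.0 + 3.0 * X[:, 0]

weights = fit_linear_regression(X, y)
predictions = predict(X, weights)
```

Because the toy data are noise-free, the recovered weights should match the intercept and slope used to generate them, up to floating-point error.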

Midterm
The midterm will be either a take-home exam covering all topics seen in the first week of the course or a project in which you apply the methods learned in the first half of the course (TBD).

Final Project
The final project is the largest component of the course. Each student will discover, explore, and attack a real-world problem of their choosing. The detailed description and requirements for the final project can be found under the "Final Project" tab.

GitHub
As we stated in the course overview, students are expected to become comfortable with GitHub. Hence, each student is required to create a GitHub account for submitting coding assignments and the final project. If you already have a GitHub account, that's perfect. If not, please create a personal GitHub account and go over the tutorials online.
Note: Please make sure to send your GitHub username to the TAs for homework grading.

Classroom Policies:

Attendance
Attendance at each lecture is mandatory and is expected of all class members. If you're going to miss a lecture, you must inform the instructor as soon as possible. You are also responsible for obtaining notes from another class member.

Devices
You are welcome to use your computer or tablet for note-taking (the PowerPoint slides will also be posted shortly after the lecture for your convenience).

Disabilities:

Students who need disability-related accommodations are encouraged to discuss this with the instructor as soon as possible.