Mathematics of Big Data
Instructor: Prof. Gu
Office: Shan 3481
Email: gu@g.hmc.edu
Phone: ext. 1-8929
Office Hours: 3-4 pm PST Tuesday; also by appointment.
TA/Tutoring Hours:
Names: Ian Li, Teja Reddy, Zoe Shao
Email: math189bigdata@gmail.com
Tutoring Hours:
Ian Li: email me at ili@hmc.edu with potential meeting time and/or specific questions/topics.
Zoe Shao: email me at zoshao@hmc.edu with potential meeting time and/or specific questions/topics.
Teja Reddy: email me at treddy@hmc.edu with potential meeting time and/or specific questions/topics.
Textbook:
All members of the class will be required to obtain the
following text:
Kevin Patrick Murphy, Machine Learning: A Probabilistic Perspective. MIT Press, 2012.
Grading:
● 5% Reading Summary
● 35% Homework
● 20% Midterm Project
● 40% Final Project
● [Up to 5% Extra Credit]
Course Requirements and Evaluation:
● Reading Presentations
All readings are compulsory, but some are more compulsory
than others.
To encourage the goal of reading active research in
the field, we will assign each non-Murphy reading to a group of
two students who will write a summary of 1-2 pages
to be turned in at the start of class.
Each student will do approximately
two summaries in total. They must be clear and
demonstrate, with a high degree of confidence, that you
have read the paper. Credit will be
given on a 0-10 scale for each summary. Your summary
should be done at a high level, and should focus on
the main point of the readings (i.e. avoid complicated
math). As long as your summary is reasonable, you will
be given full credit.
● Homework
Homework is due every week at the beginning of lecture.
Each assignment has two parts, math and coding, split
approximately evenly between mathematical analysis and
extension of our course material and the application of
algorithms to real-world data.
For coding: We strongly recommend Python 3. For each problem, the starter code
and the sample solution are implemented in Python 3. All results and graphs for the sample
solutions were produced with Python 3.5.2 on macOS Sierra; different versions of
Python or a different system environment may produce different results. You are also welcome to use Jupyter
Notebooks, but the starter code is not provided in notebook format.
NumPy and Pandas
are two important Python libraries to know for the coding assignments in this
course. You might also want to look at Matplotlib for
generating plots.
If you have never used these libraries before, make sure you
check out the tutorials online before starting the first assignment.
Note:
1) When doing the coding problems for each homework set, you are not allowed to use
any machine learning algorithms implemented by external libraries, such as LinearRegression
in sklearn. However, you may use these algorithms in your final project.
2) Each homework has both pdf and tex versions. To compile the tex files successfully,
make sure that you have downloaded both macros.tex
and hmcpset.cls and put them
and the hw tex file in the same folder.
If you have any questions about compiling
the tex files, feel free to ask the grutors for help.
3) For each coding problem, please submit your code to GitHub; please print out any graphs and
printed output and submit them with the written part.
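Note 1 above rules out library implementations such as LinearRegression in sklearn for homework. As a rough illustration of what writing such an algorithm yourself could look like, here is a minimal sketch of least-squares linear regression via the normal equations, using only NumPy. It is not the course's starter code or sample solution. The function names fit_linear_regression and predict and the toy data are hypothetical, and the sketch assumes that NumPy linear-algebra routines such as np.linalg.solve are acceptable; check with the grutors if in doubt.

```python
import numpy as np


def fit_linear_regression(X, y):
    """Return least-squares weights by solving (X^T X) w = X^T y.

    Hypothetical helper for illustration; the first weight is the intercept.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first weight acts as the intercept.
    X_b = np.hstack([np.ones((X.shape[0], 1)), X])
    # Solve the linear system rather than forming an explicit inverse.
    return np.linalg.solve(X_b.T @ X_b, X_b.T @ y)


def predict(X, w):
    """Apply fitted weights w (intercept first) to new inputs X."""
    X = np.asarray(X, dtype=float)
    X_b = np.hstack([np.ones((X.shape[0], 1)), X])
    return X_b @ w


# Toy, noise-free data generated from y = 1 + 2x (made up for this sketch).
X_toy = np.array([[0.0], [1.0], [2.0], [3.0]])
y_toy = 1.0 + 2.0 * X_toy[:, 0]
w_toy = fit_linear_regression(X_toy, y_toy)
```

Because the toy data is noise free, w_toy should recover an intercept close to 1 and a slope close to 2. On real data, X^T X can be singular or ill-conditioned, in which case this simple solve would fail or be unstable.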
● Midterm
The midterm will either be a take-home exam covering all
topics seen in the first week of the course or a project
where you will apply the methods learned in the first half
of the course (TBD).
● Final Project
The final project is the largest component of the course.
Each student will discover, explore, and attack a real-world
problem of their choosing. The detailed description and requirements
for the final project can be found under the "Final Project" tab.
● GitHub
As we stated in the course overview, students are expected to
become comfortable with GitHub. Hence, each student is required
to create a GitHub account for coding assignment submission and
final project submission. If you already have a GitHub account,
that's perfect. If not, please create a personal GitHub account
and go over the tutorials online.
Note: Please make sure to send the username of your GitHub
account to the TAs for homework grading.
Classroom Policies:
● Attendance
Attendance at each lecture is mandatory and is expected of all
class members. If you are going to miss a lecture, you must
inform the instructor as soon as possible. You are also
responsible for obtaining notes from another class member.
● Devices
You are welcome to use your computer or tablet for note-taking
(the PowerPoint slides will also be posted shortly after the
lecture for your convenience).
Disabilities:
Students who need disability-related accommodations are encouraged to discuss this with the instructor as soon as possible.