CSCI2244-Randomness and Computation

Spring, 2019





General Course Information Course Content Required Background Software
Textbooks Required Work Grading Detailed Course Schedule

General Course Information


Instructor


Howard Straubing
Computer Science Department
St. Mary's South 251
tel.: 617-552-3977
e-mail: straubin@cs.bc.edu
Office hours: Monday and Wednesday, 2-4, or by appointment.

Teaching Assistants


Kevin Deng
Justin Kim
dengke@bc.edu
kimavz@bc.edu
OH: Thursday, 5PM-7PM
OH: Friday, 1PM-3PM

TA office hours are in Fulton 160

Meeting Times

The course meets Tuesdays and Thursdays, 12:00-1:15, in Fulton 250. There is no class on March 5 and 7, or on April 18.  I will be away at a conference on March 26 and 28. Prof. Alvarez will teach the class on March 26.  March 28 is up in the air, but I will provide recorded material if there is no one to cover the class.


Course Content

Methods from probability and statistics are used throughout Computer Science, in the design and analysis of algorithms, artificial intelligence, machine learning, data mining, natural language processing, cryptography, theory of computation.... This is a course on the mathematical fundamentals of probability and their applications in Computer Science.


Required Background

The official prerequisites for CSCI2244 are Computer Science I (CSCI1101) and calculus. In particular, it is assumed that students taking CSCI2244 are comfortable with fundamental programming techniques, as well as with basic concepts of differential and integral calculus.

I intend to make all the programming assignments in Python, supplemented with the matplotlib library which enables, among other things, the drawing of nice plots.  If you have not programmed in Python before, you will find it reasonably easy to pick up, particularly since we don't use any of the really fancy stuff (just basics of functions, branching and looping, manipulation of one-dimensional lists).

Since we use the basic language of sets and functions throughout the course, it is also strongly recommended that you take CSCI2243 before CSCI2244.  However, it is understood that some students are taking this sequence in the reverse order, so I will provide supplemental materials to help these students bridge whatever gaps are present in their preparation.

Software

All of our programming demos and assignments will be in Python 3, supplemented by the matplotlib library. Instructions for installing the software can be found by following the links in the grid below.  I recommend that you do the installation right away, and inform me of any problems that arise.  If you have already installed Python, but only have Python 2, you will need to install Python 3.  This won't overwrite your installation of Python 2, so you will still be able to use that in any other work that you need it for. But the work in this course has to be in 3; while the differences between the two versions are slight, they are not compatible.


Textbooks


The official textbook for this course is Randomness and Computation, by Prof. Sergio Alvarez.  This book, which is available for free in electronic form, was prepared specifically for this course, and is posted on the Canvas site.  In the detailed course syllabus below, you will find references to the relevant sections of this book.

You might also want to have a look at  Introduction to Probability, 2nd Revised Edition, by Charles M. Grinstead and J. Laurie Snell.  This book,  along with a lot of supplemental material, is available for free on line. The associated code samples are in different programming languages from the one we are using, but it is the text itself that I am most interested in. If you would like to have a paper-and-ink copy of this book, you can buy one from the online bookstore of the American Mathematical Society for $60, a very reasonable price compared to most textbooks.

I will also post lecture notes.  These focus on just what we cover in class, and are less detailed than the material in the books.

The core mathematical material in this course is quite standard, and is also covered in a variety of sources, including printed textbooks, online lecture notes from courses at other universities, even lectures on YouTube.  Feel free to explore, and please let me know if you run across anything that you find particularly useful, and that ought to be shared with the class. You should be aware that different books may use slightly different terminology, and that there is considerable variability in the order in which topics are presented. It goes without saying that there is a huge variation in quality of YouTube offerings.

Required Work

  There will be approximately nine problem assignments during the semester.  Typically, every assignment  will involve some coding, and some written work. The assignments are not meant to be completed in a single late-night session, and I strongly advise you to start every assignment within a day or two of receiving it, so that you will be able to find out where the troublesome parts are and ask meaningful questions in class or in office hours.   This does not mean that assignments will necessarily require many hours to complete, but rather that questions are likely to arise during the work, and you may need some answers before you can proceed further.

Assignments will be posted on this web site.  I will try to post solutions to assignments in a timely fashion.  For this reason (and others) late homework assignments will not be accepted. Assignments will be submitted through the Canvas site for the class.  There is a required format for assignment submissions, described in detail in the first row of the course grid below.

There will be a number --probably four--of brief quizzes, and two in-class midterm exams, at dates to be announced. (The dates for the midterms that appear in the grid below are estimates.) There will be a common final exam for all three sections of CSCI2244 on Thursday, May 9, at 4PM.  This does not mean that you will be taking exactly the same exam as the other sections, it just secures a common time.

While I do not take regular attendance, students are nonetheless expected to attend class, notify me of any absence, and inform themselves about any material or announcements they miss in the event of an absence.


Eyes Front!

My colleague Bob Muller calls this policy the 'laptop-free classroom'. (For 'laptop', read 'laptop and smartphone'.)

A successful class requires your attention, engagement, and participation.  You need to be prepared to ask and answer questions during the lectures, and to attend to the questions and answers of your fellow students and of the instructor.  That screen open to your e-mail or Facebook page distracts not only you, but the students sitting behind you. For this reason, open laptops are not permitted in the classroom, unless they are part of some planned activity for which your computer is required.

You say you're taking notes on that laptop instead of shopping on Amazon? You should be aware that (a) it is very difficult to take notes for this class on a computer, given the large number of mathematical symbols, formulas and diagrams that we use; and (b) taking notes by hand is better for you.  It is acceptable to take notes on a tablet computer with a stylus.

Exceptions to this rule will be made for the rare instances of in-class activities that require students to use their computers. There will be one such activity on the first day of class.

Academic Integrity

While I encourage you to discuss problem assignments both among yourselves and with me, when it is time  to finally sit down and prepare the completed assignment, you are required to work alone.  Thus it is acceptable to learn from another student the general idea for writing program code to perform a particular task, but unacceptable to  take an extended piece of code written by another person and incorporate it into your submitted assignment as your own work.  Similarly, you may get ideas from additional reading that you do both in books and on the Web, but you may not incorporate it verbatim into what you submit and claim that it is your own work. If you have any uncertainty about the application of this policy, please check with me.

It should go without saying that during quizzes and exams there is to be no communication between students, and no use of outside materials except those explicitly authorized in advance.

Failure to comply with these guidelines will be considered a violation of the University policies on academic integrity. Please make sure that you are familiar with these policies, which are posted at http://www.bc.edu/offices/stserv/academic/integrity.html.

Services for Students with Disabilities

If you are a student with a documented disability seeking reasonable accommodations in this course, please contact Kathy Duggan, (617) 552-8093, dugganka@bc.edu, at the Connors Family Learning Center regarding learning disabilities and ADHD, or Rory Stein, (617) 552-3470, steinr@bc.edu, in the Disability Services Office regarding all other types of disabilities, including temporary disabilities. Advance notice and appropriate documentation are required for accommodations.

Grading

To be eligible for  a passing grade in this course, you must complete at least 60% of the assignments in a satisfactory manner. Once this criterion is met, your grade will be computed from the various course components, with the problem assignments, quizzes and midterms, and final exam each weighted approximately equally. I do not have a fixed scale, made in advance, for translating raw scores into letter grades, but instead consider the difficulty of the work and class performance as a whole.  In a large required course such as this one, the median grade is typically B.


Detailed Course Schedule

Take this with a grain of salt.  The list of topics is largely accurate, but now and again I will decide to eliminate or add a topic, or change the order in which I present them.  The dates are almost surely inaccurate. Expect  this section of the website to grow and change as the semester progresses. The numbers next to each topic are the corresponding sections of the course textbook.

Date
Topic
Handouts and Lecture Notes
Assignments           
Week 1
January 15-17
A coin-tossing experiment.

Simulations in Python and matplotlib.
Monkey Tacos

Installing and using the software

NumPy reference

matplotlib plotting reference

Coin-tossing experiments (heavily commented Python code to demonstrate the plotting and numerical tools)

How to prepare the homework (a mockup homework assignment).



Non-required reading

More than you ever wanted to know about flipping coins: 

Is there such a thing as an unfair coin? No, say these statistics professors.
Yes, claims this guy (along with disturbing photographic instructions for how to make such a coin).   
And maybe, say these other statistics professors, they're all unfair.

The 'Hot Hand' paper

Demonstration of a Galton Board (YouTube video), a low-tech mechanical device for simulating the a large number of coin-tosses and drawing the resulting histogram.
Assignment 1, due January 23

text file with PT's coin tosses (one of the homework problems)

LaTeX source for assignment

The class data from the coin-tossing experiment (Excel spreadsheet with macros, showing length of longest run and number of runs).

The class data from the experiment last year.
Week 2
January 22-24
Discrete probability spaces, Independent events (2.1,2.2)
Lecture 1--Probability spaces

Lecture 2--Some worked examples; sampling with and without replacement; independent events.

Simulation of two dice

Simulation of drawing cards from a deck, pulling beans from a jar.


Assignment 2, due January 30

LaTeX source for assignment
Week 3
January 29-31
Counting (2.3)
Lecture 3:  Basic counting principles; birthdays

Plot of birthday probabilities

Lecture 4: Binomial coefficients
Assignment 3, due February 8

LaTeX source for assignment

.csv file containing the real birthday data

Code for reading the .csv file

Week 4
February 5-February 7
Discrete random variables. (3.1-3.4 and 3.6.1 on Expected value) Lecture 5: Discrete random variables

Simulating the roll of two dice using the CDF

iPython notebook of this simulation (this is just an html view of the class demo, both inputs and outputs,  not an active notebook that you can modify)


Assignment 4, due February 20

LaTeX source for assignment
Week 5
February 12-February 14
More discrete random variables. Geometric distribution. Poisson distribution.

Expected value.
Lecture 6: Important discrete random variables

Lecture 7: Expected value of a discrete random variable

Lecture 8: Some worked examples--numbers racket, coupon collector's problem, average-case running time of quicksort.



Week 6
February 19-February 21
 

More on expected value



Week 7
  February 26-28


First test

Conditional Probability and Bayes's theorem
(Chapter 6, but note that the textbook does
conditional probability after continuous probability spaces, so the material here about continuous conditional probability can be skipped for now.)
Last year's midterm

Last year's midterm with solutions. (Advice:  Don't look at this until you've tried to work the problems.)

This year's midterm with solutions.

Lecture 9: Conditional probability and Bayes's Theorem


Assignment 5, part 1, due Friday, March 15

LaTeX source for assignment

Assignment 5, part 2, due Monday, April 1

The data sets for Part 2
Support code for Part 2
Spring break



Week 8
March 12-14
Continuous sample spaces and continuous random variables.
Lecture 10: Continuous Probability Spaces

Simulation of Buffon needle problem
Assignment 6, due March 27

LaTeX source for assignment
Week 9
March 19-21
Continuous sample spaces and continuous random variables.
Lecture 11: Continuous Random Variables

Simulation of dartboard example (iPython notebook).

Lecture 12: More on the exponential distribution

Week 10
March 26-28

Variance, Markov's and Chebyshev's inequalities. Law of Large numbers


Lecture 13: Variance, Markov's and Chebyshev's inequalities, Law of Large Numbers

Week 11
April 2-4
Normal distribution and Central Limit Theorem

Second Test
The second test from last year

The second test from last year, with solutions


The test with solutions
Assignment 7, due April 12

The 911 call dataset (CSV file)

Code for reading the elapsed times from the data file.
Week 12
April 9-11
More on Central Limit Theorem (Chapter 5)


Normal distribution demo (iPython notebook)

Lecture 14:  Normal distribution Part 1(approximation to binomial distribution)

Lecture 15: Normal distribution Part 2 (Central Limit Theorem)


Probability plots and normal distributions in natural data (iPython notebook)
Assignment 8, due April 26

Week 13
April 16 (just one class)
Markov Chains (Chapter 9)

Lecture 16: Markov Chains Part 1 (basics and analysis of absorbing chains)





Assignment 9, due May 2
Week 14
April 23-25
Markov Chains
Lecture 17: Markov Chains Part 2 (analysis of regular Markov chains)

Week 15
April 30-May 2
Review
The final exam from 2018

Solutions to the 2018 final

The final exam from 2014 (Note:  Take the last question, which is about Markov chains, with a grain of salt!  Parts (a) and (b) refer to some linear algebra that we did not study in detail this year.  Part  (d) can be solved without worrying about things like eigenvalues, just by solving a simple linear system.)

Review problems from several years ago (ignore the next to last question about eigenvalues and principal components)

Review problems with solutions