Computing for Big Data
Subject & Catalog Number: BST 262
Course Information
Description
This course gives a critical presentation of software implementations, theoretical and algorithmic software development, and modern software tools for collecting, storing, and processing data at scale. It includes hands-on programming practice, R package development (with C++ integration), software design and good software development practice, multiprocessing with OpenMP, cloud computing on the Harvard computing cluster, container images (Docker), and an introduction to big data stacks. A basic level of programming in R and C++ is required. The goal of the course is not limited to recipes for manipulating data; students will also learn state-of-the-art workflows for software design and dissemination, software tool selection, and maintenance. We will see how big data influences several aspects of data science (for instance, software development and data management) and how we can leverage modern tools to work with data efficiently.
Class Notes
THIS CLASS HAS PRIORITY ENROLLMENT
Priority Wave Groups
Wave 1 | SM60-HDS
Wave 2 | BIO SM2 / BIO SM60 / BIO SM1 / CBQG SM2
Wave 3 | Open Enrollment
-------------------------------------------------------------
Priority Wave Timing
Wave 1 | 8/14/2025 11:00 AM - 8/24/2025 11:59 PM
Wave 2 | 8/25/2025 12:00 AM - 8/27/2025 11:59 PM
Wave 3 | 8/28/2025 12:00 AM - Enrollment Deadline (varies by session)
Any student who does not meet the Wave 1 or Wave 2 criteria can join the waitlist (if enrollment requirements are met) at any time during the enrollment period. At the beginning of each priority wave, waitlisted students who meet that wave's criteria will be automatically enrolled into any remaining seats in the course (provided there are no time conflicts).
**Cross-Registrants and Non-Degree Students will be enrolled on a space-available basis after the enrollment deadline for the course.
Available for Harvard Cross Registration