
Vibe Coding the MIT Course Catalog

I recently left Microsoft to join MIT's Media Arts and Sciences program. The transition brought an immediate problem: how do you navigate course selection when faced with the "unknown unknowns"? You can easily find courses you already know you want to learn, the "known unknowns". But discovering courses you never knew existed, courses that might reshape your thinking entirely, requires different tools altogether.

MIT's official course catalog runs on what appears to be a CGI script. The technology dates back to the 1990s. You cannot search as you type. Everything feels slow. The popular student-built alternative, Hydrant, offers fast search but displays details for one course at a time. Neither tool works well for browsing or screening multiple options simultaneously. More importantly, both tools remain stubbornly human-centric in an age where LLMs should help us make better decisions.

It's time for a hackathon!

I built Courseek to solve this problem while testing how far I could push AI-assisted development. Until now, GitHub Copilot had been little more than an expensive tool for autocompleting functions and generating boilerplate. Could I go all in on vibe-coding this time? I also wanted to test whether an LLM could successfully use my personal (Un)framework. It would be nice to break the React + Tailwind duopoly.

Spoiler alert: it works!

Proof of Concept

MIT's course catalog contains around 2.3k courses. I scraped them from MIT Course Picker with an embarrassingly simple script:

// Run in the browser console on MIT Course Picker: grab each course card and pull out its fields.
[...document.querySelectorAll(".course-name")]
  .map((e) => e.closest(".course-lens"))
  .map((d) => ({
    title: d.querySelector(".course-name")?.textContent,
    description: d.querySelector(".course-description")?.textContent,
    semester: d.querySelector(".course-semester")?.textContent,
    prereq: d.querySelector(`[data-ex-content=".prereqs"]`)?.textContent,
    instructor: d.querySelector(".course-instructor")?.textContent,
    units: d.querySelector(`[data-ex-content=".units"]`)?.textContent,
    level: d.querySelector(`[data-ex-content=".level"]`)?.textContent,
  }));
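As an aside, most DevTools consoles can put the result straight onto the clipboard as JSON; a minimal sketch, assuming the array above is stashed in a variable named courses and that the console provides the copy() utility (Chrome and Firefox do):

// `courses` is a placeholder for the array produced by the snippet above.
copy(JSON.stringify(courses, null, 2));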

The script outputs a JSON array in the console, which I copied and ran through OpenAI's tokenizer: 343k tokens for the entire catalog! The count feels high, but JSON is more verbose than plain text and several metadata fields are irrelevant for LLM use, so this is essentially an upper bound. Given that modern LLMs handle million-token contexts easily, I could feed the entire dataset as context and let an LLM guide me through course discovery.
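To make that concrete, here is a minimal sketch of stuffing the whole catalog into a single prompt using OpenAI's Node SDK; the file name, model, and prompts are placeholders of mine, not the exact setup behind Courseek:

import fs from "node:fs";
import OpenAI from "openai";

// Load the scraped catalog (assumed saved from the console dump above).
const catalog = fs.readFileSync("courses.json", "utf8");

// Reads OPENAI_API_KEY from the environment.
const client = new OpenAI();

const response = await client.chat.completions.create({
  model: "gpt-4o", // placeholder; any long-context model works
  messages: [
    {
      role: "system",
      content: "You are a course advisor. Answer using only the catalog below.\n\n" + catalog,
    },
    {
      role: "user",
      content: "Which courses would stretch a new Media Arts and Sciences student?",
    },
  ],
});

console.log(response.choices[0].message.content);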

Benchmarking the Competition
