Which npm package has the largest version number?
I spent way too much time on this
I was recently working on a project that uses the AWS SDK for JavaScript. When updating the dependencies in said project, I noticed that the version of that dependency was v3.888.0. Eight hundred eighty-eight. That’s a big number as far as versions go.
That got me thinking: I wonder what package in the npm registry has the largest number in its version. It could be a major, minor, or patch version, and it doesn’t have to be the latest version of the package. In other words, out of the three numbers in 
TL;DR? Jump to the results to see the answer.
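To make the question concrete, here is a minimal sketch of "largest number in a version", assuming plain major.minor.patch strings with an optional leading `v`. The function names are made up for illustration and are not taken from the script used in this post; prerelease and build suffixes are simply ignored.

```typescript
// Hypothetical helper: the largest of the major, minor, and patch numbers.
// Returns null when the string does not start with major.minor.patch.
// Note: Number loses precision above Number.MAX_SAFE_INTEGER, so a real
// script hunting for huge versions might prefer BigInt.
function largestVersionNumber(version: string): number | null {
  const match = /^v?(\d+)\.(\d+)\.(\d+)/.exec(version.trim());
  if (match === null) return null;
  const parts = match.slice(1, 4).map(Number);
  return Math.max(...parts);
}

// Hypothetical helper: the largest number across all versions of one package,
// since the answer does not have to come from the latest version.
function largestAcrossVersions(versions: string[]): number {
  let best = 0;
  for (const v of versions) {
    const n = largestVersionNumber(v);
    if (n !== null && n > best) best = n;
  }
  return best;
}
```

For the AWS SDK version mentioned above, `largestVersionNumber("v3.888.0")` picks out the minor version.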
The npm API

Obviously npm has some kind of API, so it shouldn’t be too hard to get a list of all… 3,639,812 packages. Oh. That’s a lot of packages. Well, considering npm had 374 billion package downloads in the past month, I’m sure they wouldn’t mind me making a few million HTTP requests.

Doing a quick search for “npm api” leads me to a readme in the npm/registry repo on GitHub. There’s a /-/all endpoint listed in the table of contents which seems promising. That section doesn’t actually exist in the readme, but maybe it still works?

    $ curl 'https://registry.npmjs.org/-/all'
    {"code":"ResourceNotFound","message":"/-/all does not exist"}

Whelp, maybe npm packages have an ID and I can just start at 1 and count up? It looks like packages have an _id field… never mind, the _id field is the package name. Okay, let’s try to find something else.

A little more digging brings me to this GitHub discussion about the npm replication API. So npm replicates package info in CouchDB at https://replicate.npmjs.com, and conveniently, they support the _all_docs endpoint. Let’s give that a try:

    $ curl 'https://replicate.npmjs.com/registry/_all_docs'
    {
      "total_rows": 3628088,
      "offset": 0,
      "rows": [
        {
          "id": "-",
          "key": "-",
          "value": {
            "rev": "5-f0890cdc1175072e37c43859f9d28403"
          }
        },
        {
          "id": "--------------------------------------------------------------------------------------------------------------------------------whynunu",
          "key": "--------------------------------------------------------------------------------------------------------------------------------whynunu",
          "value": {
            "rev": "1-1d26131b0f8f9702c444e061278d24f2"
          }
        },
        {
          "id": "-----hsad-----",
          "key": "-----hsad-----",
          "value": {
            "rev": "1-47778a3a6f9d8ce1e0530611c78c4ab4"
          }
        },
        # 997 more packages...

Those are some interesting package names.
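The response shape can be written down as TypeScript types. This is a sketch inferred only from the sample output above: the field names come from that output, while the type and function names are invented here.

```typescript
// One row of the _all_docs response, as seen in the sample output.
interface AllDocsRow {
  id: string; // the package name
  key: string;
  value: { rev: string };
}

// One page of the _all_docs response.
interface AllDocsResponse {
  total_rows: number;
  offset: number;
  rows: AllDocsRow[];
}

// Hypothetical helper: pull the package names (row ids) out of one page.
function packageIds(page: AllDocsResponse): string[] {
  return page.rows.map((row) => row.id);
}

// A trimmed copy of the sample response, used as example input.
const samplePage: AllDocsResponse = JSON.parse(`{
  "total_rows": 3628088,
  "offset": 0,
  "rows": [
    { "id": "-", "key": "-", "value": { "rev": "5-f0890cdc1175072e37c43859f9d28403" } },
    { "id": "-----hsad-----", "key": "-----hsad-----", "value": { "rev": "1-47778a3a6f9d8ce1e0530611c78c4ab4" } }
  ]
}`);

console.log(packageIds(samplePage));
```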
Looks like this data is paginated and by default I get 1,000 packages at a time. When I write the final script, I can set the limit query parameter to the max of 10,000 to make pagination a little less painful.

Fortunately, the CouchDB docs have a guide for pagination, and it looks like it’s as simple as using the skip query parameter.

    $ curl 'https://replicate.npmjs.com/registry/_all_docs?skip=1000'
    "Bad Request"

Never mind. According to the GitHub discussion linked above, skip is no longer supported. The “Paging (Alternate Method)” section of the same page says that I can use startkey_docid instead. If I grab the id of the last row, I should be able to use that to return the next set of rows. Fun fact: The 1000th package (alphabetically) on npm is 03-webpack-number-test.

    $ curl 'https://replicate.npmjs.com/registry/_all_docs?startkey_docid="03-webpack-number-test"'
    {
      "total_rows": 3628102,
      "offset": 999,
      "rows": [
        # another 1000 packages...

Nice. Also, another 3628102 - 3628088 = 14 packages have been published in the ~15 minutes since I ran the last query.

Now, there’s one more piece of the puzzle to figure out. How do I get all the versions for a given package? Unfortunately, it doesn’t seem like I can get package version information along with the base info returned by _all_docs. I have to separately fetch each package’s metadata from https://registry.npmjs.org/
Fetch package data in batches so we’re not just doing 1 HTTP request at a time
Save the package data to a file (again, hopefully I only have to fetch everything once)

Once I have all the package data, I can answer the original question of “largest number in version” and look at a few other interesting things.

(A few hours and many iterations later…)

    $ bun npm-package-versions.ts
    Fetching package IDs...
    Fetched 10000 packages IDs starting from offset 0
    # this goes on for a while...
    Finished fetching package IDs
    Fetched 50 packages in 884ms (57 packages/s)
    Fetched 50 packages in 852ms (59 packages/s)
    # this goes on for a really long while...

See the script section at the end if you want to see what it looks like.
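The paging and batching steps can be sketched roughly as follows. This is not the script from this post, just an illustration under stated assumptions: `FetchPage` is a hypothetical function that returns one sorted page of package IDs starting at a given ID, and the start key is assumed to be inclusive, which is what the `"offset": 999` in the earlier response suggests. All names are invented for this example, and no real network calls are made.

```typescript
// Hypothetical: returns one page of sorted package IDs, starting at `startId`
// (inclusive), or from the very beginning when `startId` is undefined.
type FetchPage = (startId: string | undefined) => Promise<string[]>;

// Walk every page. Because the start key is assumed inclusive, the first row
// of each later page repeats the last row of the previous page and is dropped.
// The page size must be at least 2, or no progress would be made.
async function collectAllIds(fetchPage: FetchPage): Promise<string[]> {
  const ids: string[] = [];
  let startId: string | undefined = undefined;
  while (true) {
    const page: string[] = await fetchPage(startId);
    const fresh: string[] =
      startId !== undefined && page[0] === startId ? page.slice(1) : page;
    if (fresh.length === 0) break; // nothing new: we reached the end
    ids.push(...fresh);
    startId = fresh[fresh.length - 1];
  }
  return ids;
}

// Split a list into consecutive chunks of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Run `task` over all items one batch at a time: requests inside a batch run
// concurrently, while the batches themselves run one after another.
async function inBatches<T, R>(
  items: T[],
  size: number,
  task: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = [];
  for (const batch of chunk(items, size)) {
    results.push(...(await Promise.all(batch.map(task))));
  }
  return results;
}
```

In a real run, `fetchPage` would presumably wrap a request to the `_all_docs` endpoint with the `limit` and `startkey_docid` parameters, and `task` would fetch one package's metadata. The log output above suggests a batch size of 50, though that is a guess from the log rather than a confirmed setting.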
Results

Some stats:

Time to fetch all ~3.6 million package IDs: A few minutes
Time to fetch version data for each one of those packages: ~12 hours (yikes)