*All but 8 we didn’t scrape (or that got deleted between me checking the website and me scraping) and 42 missing from extensions.json.[1] Technically, we only installed 99.94% of the extensions.
It turns out there are only 84 thousand Firefox extensions. That sounds feasibly small. That even sounds like it’s less than 50 gigabytes. Let’s install them all!
Scraping every Firefox extension
There’s a public API for the add-ons store. No authentication required, and seemingly no rate limits. This should be easy.
The search endpoint can take an empty query. Let’s read every page:
```javascript
let url =
  "https://addons.mozilla.org/api/v5/addons/search/?page_size=50&type=extension&app=firefox&appversion=150.0"

let extensions = []
let page = 1

while (true) {
  let res = await fetch(url)
  let data = await res.json()
  console.log(`PAGE ${page++}: ${data.results.length} EXTENSIONS`)
  extensions.push(...data.results)
  url = data.next
  if (!data.next) break
}

Bun.write("extensions-default.json", JSON.stringify(extensions))
```
The search API only gives me 600 pages. At 50 results per page, that means I can only see 30 thousand extensions, fewer than half of them.
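To make the gap concrete, here is a quick back-of-the-envelope check. It only uses the numbers quoted above; the total is the rough "84 thousand" figure, so the percentage is approximate.

```javascript
// Numbers taken from the text; the total is the rough "84 thousand" figure.
const pageSize = 50
const maxPages = 600
const totalExtensions = 84000 // approximate

// How many results a single sort order can ever return.
const reachable = pageSize * maxPages
const share = reachable / totalExtensions

console.log(reachable) // 30000
console.log((share * 100).toFixed(1) + "%") // 35.7%
```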
A solution I found is to use different sorts. The default sort is sort=recommended,users: first recommended extensions, then sorted by users, descending. Changing to just sort=created gave me some of the long tail:
```diff
 let url =
-  "https://addons.mozilla.org/api/v5/addons/search/?page_size=50&type=extension&app=firefox&appversion=150.0"
+  "https://addons.mozilla.org/api/v5/addons/search/?page_size=50&type=extension&app=firefox&appversion=150.0&sort=created"
```
```diff
-Bun.write("extensions-default.json", JSON.stringify(extensions))
+Bun.write("extensions-newest.json", JSON.stringify(extensions))
```
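Scraping with more than one sort means the result files overlap, so they need merging at some point. The following is a minimal sketch of one way to do that, not necessarily how it was actually done here. The helper names `searchUrl` and `mergeById` are hypothetical, the sample data is made up, and it assumes each result object carries a unique `id` field.

```javascript
const BASE = "https://addons.mozilla.org/api/v5/addons/search/"

// Hypothetical helper: build the search URL for a given sort order.
// Passing no sort leaves the API's default sort in place.
function searchUrl(sort) {
  const url = new URL(BASE)
  url.searchParams.set("page_size", "50")
  url.searchParams.set("type", "extension")
  url.searchParams.set("app", "firefox")
  url.searchParams.set("appversion", "150.0")
  if (sort) url.searchParams.set("sort", sort)
  return url.toString()
}

// Hypothetical helper: merge several scraped lists, keeping the first
// copy of each add-on. Assumes every result has a unique `id`.
function mergeById(...lists) {
  const seen = new Map()
  for (const list of lists) {
    for (const ext of list) {
      if (!seen.has(ext.id)) seen.set(ext.id, ext)
    }
  }
  return [...seen.values()]
}

// Made-up sample data standing in for the two scraped files.
const byDefault = [{ id: 1, slug: "a" }, { id: 2, slug: "b" }]
const byNewest = [{ id: 2, slug: "b" }, { id: 3, slug: "c" }]
const merged = mergeById(byDefault, byNewest)

console.log(searchUrl("created"))
console.log(merged.length) // 3
```

A `Map` keyed by `id` keeps insertion order, so add-ons from the first list stay first and later duplicates are simply skipped.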