I was recently building a feature for Greptile (an AI-powered code review tool), when I hit a weird snag with GitHub's API.
The feature should have been simple: I wanted to add clickable links to GitHub PR comments, so users could jump directly from our reviews to relevant GitHub discussions. We already stored the comment IDs, so I just needed to construct the URLs.
The problem was, when I tested it, the links didn't work.
Searching through GitHub's documentation for answers revealed that their team maintains two separate ID systems. We'd been using GitHub's GraphQL API, which returns node IDs like PRRC_kwDOL4aMSs6Tkzl8 . GitHub designed these node IDs to uniquely identify any object across its entire system. But web URLs required database IDs, integer values visible in URLs and often associated with REST responses, like 2475899260 .
I was looking at either backfilling millions of records or migrating our entire database, and neither sounded fun. So I did what any annoyed engineer would do: I stared at these IDs for way too long, looking for a way out of the migration.
And I found it.
Spotting a pattern between node and database IDs
I looked for a relationship between these two ID formats. I pulled up a few of our stored node IDs and opened the corresponding PR comments from the same pull request in my editor:
Node ID Database ID PRRC_kwDOL4aMSs 6Tkzl8 2475 899260 PRRC_kwDOL4aMSs 6Tkzya 2475 900058 PRRC_kwDOL4aMSs 6Tkz3e 2475 900382
The database IDs incremented sequentially, and the node IDs were almost identical too, differing only in their last few characters. GitHub's documentation mentioned that node IDs are base64 encoded. I tried decoding just the part after PRRC_ :
... continue reading