Obviously, Instagram does not want you to automate engagement. Their HTML is a mess of randomly generated class names and deeply nested divs. The structure changes with every deployment. Any script that relies on DOM selectors breaks within weeks because the class names it targets no longer exist.
But it doesn't matter anyway. Instagram can obfuscate their code all they want because code is for machines. But UI... The UI is for humans. A heart icon has to look like a heart icon. A comment button has to be where users expect it. The layout has to be consistent enough that a person can easily navigate it.
So instead of fighting the DOM, let's just bypass it entirely. Take a screenshot. Find the heart by its visual appearance. Get its coordinates. Move the cursor there. Click. Done.
This works on anything that renders to pixels. Web apps, native apps, games, terminals. If a human can see it and click it, a computer can too. No selectors, no APIs, no platform-specific hooks. Just computer vision and cursor automation.
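The screenshot, locate, click loop described above can be sketched in a few lines. This is a minimal illustration, not a definitive implementation: `grab_screen`, `locate`, and `click` are hypothetical hooks that you would wire to whatever screenshot, vision, and cursor library you use, and the bounding-box format is an assumption made for the example.

```python
from typing import Any, Callable, Optional, Tuple

# Assumed box format for this sketch: left, top, width, height in screen pixels.
Box = Tuple[int, int, int, int]


def box_center(box: Box) -> Tuple[int, int]:
    """Return the pixel at the middle of a bounding box."""
    left, top, width, height = box
    return left + width // 2, top + height // 2


def click_icon(
    grab_screen: Callable[[], Any],
    locate: Callable[[Any], Optional[Box]],
    click: Callable[[int, int], None],
) -> bool:
    """Take a screenshot, find the icon, and click its center.

    All three callables are hypothetical hooks supplied by the caller.
    Returns False when the icon is not visible, so nothing is clicked blindly.
    """
    frame = grab_screen()      # 1. take a screenshot
    box = locate(frame)        # 2. find the icon by its appearance
    if box is None:
        return False
    x, y = box_center(box)     # 3. turn the match into coordinates
    click(x, y)                # 4. move the cursor there and click
    return True
```

Nothing in this loop knows about HTML or selectors. The only input is pixels, which is why the same structure applies to any app that draws to the screen.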
Unfortunately, you can't just hardcode a position. Things move around all the time. A long caption pushes the action bar down. A location tag adds a line. A carousel of images takes up more vertical space. Every post compresses or expands the layout differently.
Navigate between two posts and watch what happens to the heart's position:
Hearts move between posts. The positions are never the same.
Computer vision solves this. Instead of guessing where the hearts should be, you look at the screen and find where they actually are.
Too Much Screen, Too Many False Positives
The naive approach is simple: take the heart icon as a template and find it on the screen. Wherever it matches, that's a heart. It's the most basic computer vision operation you can do.
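To make the naive approach concrete, here is a small pure-Python sketch of template matching: slide the template over every position on the screen and keep the positions where the pixels differ by less than a threshold. The grayscale list-of-lists image format, the function name `match_template`, and the `max_mean_sq_diff` parameter are assumptions made for illustration. In practice you would more likely reach for a library routine such as OpenCV's `cv2.matchTemplate`, which does the same sliding comparison far faster.

```python
from typing import List, Tuple

# Assumed image format for this sketch: grayscale pixels, row-major, values 0-255.
Image = List[List[int]]


def match_template(
    screen: Image, template: Image, max_mean_sq_diff: float = 0.0
) -> List[Tuple[int, int]]:
    """Return the (x, y) top-left corner of every window that resembles the template.

    A window counts as a match when its mean squared pixel difference from the
    template is at most max_mean_sq_diff (0.0 means an exact match).
    """
    th, tw = len(template), len(template[0])
    sh, sw = len(screen), len(screen[0])
    hits = []
    for y in range(sh - th + 1):          # every vertical offset
        for x in range(sw - tw + 1):      # every horizontal offset
            total = 0
            for dy in range(th):
                screen_row = screen[y + dy]
                template_row = template[dy]
                for dx in range(tw):
                    diff = screen_row[x + dx] - template_row[dx]
                    total += diff * diff
            if total / (th * tw) <= max_mean_sq_diff:
                hits.append((x, y))
    return hits
```

The threshold is the knob that matters. Keep it strict and small rendering differences cause misses. Loosen it and unrelated regions of the screen start to qualify as matches.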