Welcome to dystopia: I helped ChatGPT pass a CAPTCHA and doomscroll my Facebook

Calvin Wankhede / Android Authority

Last week, OpenAI released what may be the most ambitious (and potentially controversial) update to ChatGPT since its launch: Agent mode. Unlike the standard conversational interface, Agent mode gives ChatGPT control over a virtual machine running Chrome, allowing it to interact with websites like a human would. It can identify elements on websites, scroll, click buttons, fill out forms, and, if granted credentials, even log into your online accounts.

For the first time since the AI’s launch, it can perform tasks itself instead of spitting out some text on how to do them yourself. On the surface, the potential seems endless. The chatbot could reply to your emails, shop for groceries, book a flight, or perform even more complex tasks spanning multiple websites. The best part is that you can watch ChatGPT “move” its mouse cursor around the virtual web browser and navigate the internet (as you can see in the video below). Admittedly, it’s a lot like a toddler struggling to walk at times, but it’s endlessly fascinating nonetheless.

So what can ChatGPT’s Agent mode actually do with all of these capabilities? To answer that question, I tested the feature with a couple of real-world tasks — the kind you might actually want to offload to an AI assistant. Here’s how it handled them, and what ChatGPT did when it encountered an obstacle.

Putting ChatGPT Agent to work: A grocery run

Amazon’s Alexa can add toilet paper to your cart with a voice command, but ChatGPT’s Agent mode can be entrusted to do a whole lot more. Specifically, it can shop your entire grocery list on any platform of your choice. Case in point: I gave the agent a simple task, which was to buy everything I would need for a homemade pizza from Walmart. I didn’t specify any ingredients or items, or even offer guidance on price, just to see what it would pick.

The agent booted up a virtual computer and navigated to Walmart in no time. But it ran into a roadblock almost immediately — Walmart threw up an anti-bot verification screen requiring a human to press and hold a button. Shockingly, the agent recognized this screen and asked me to briefly take control of the browser and complete the task. I took control and about ten seconds later, we were in. I handed control back, and the agent immediately got to work. It looks like CAPTCHAs will need to evolve yet again if they are to keep bots out in the future.

ChatGPT summoned me when it needed a human touch, which it turns out means just solving CAPTCHAs.

Moving on, I watched the agent methodically search for “pizza dough,” “pizza sauce,” “mozzarella cheese,” and “pepperoni.” But to my surprise, the agent didn’t just grab the first result. Instead, it prioritized familiar and well-priced alternatives just like I personally would. In more than one instance, I watched it pick the third or fourth item in the results or call a competing product overpriced. The agent also correctly moved past inaccurate search results like a fully premade frozen pepperoni pizza when it was merely shopping for pepperoni, the ingredient.

Within four minutes, my virtual cart was filled with everything I needed to make a pizza. The agent navigated to the checkout page and then handed control back to me to complete another CAPTCHA, log in, and enter my payment details securely. ChatGPT says it cannot see your inputs when you’re in control of its virtual machine, presumably meaning it can’t store your login or credit card info. Even so, I chose not to enter my login details and therefore spent the night without any pizza.
