
I asked Codex to handle a months-old Verizon billing dispute I’d been putting off. Two hours later, it had fixed the bill, recovered every back payment, and negotiated a $100 credit on top.
Last fall, my wife and I upgraded to new iPhones through Verizon’s trade-in promo. Essentially, you trade in a phone and Verizon credits you each month for most of what the new one costs. We handed over two iPhones at the store and walked out expecting the next bill to barely move.
The next bill was full price for both phones. So was the one after that. So was the one after that.
Fixing it meant a Verizon support chat, a trip to the store, or sitting on hold for an hour. Any of those meant setting aside a few hours, re-explaining the same situation twice, and staying chained to a chat window the whole time. I ended up putting it off for three months.
Then I decided to give the problem to Codex.
GPT-5.5 is genuinely good at computer use, and the Codex App ships with native computer use built in. The Codex Agent can see and operate any app or window on your Mac, including the browser. What better task than to let it sit through the support chat and go back and forth with Verizon for me?
Gathering Context
An agent is only as capable as the context it has.
Before any of this turned into a chat with Verizon, I wanted Codex to investigate the situation, get a real understanding of the problem, and walk in with enough data to handle the chat on its own. It had been six months since the upgrades and the store visit, so my own memory of the details was fuzzy. I didn’t want Codex pinging me every few minutes for context that wasn’t easily verifiable anyway.
I gave it a deliberately vague starter prompt and let it decide what to chase:

It navigated Verizon’s account site on its own, opened bill PDFs from across the last several months, and pulled the right numbers out of pages of dense charges. About twelve minutes, start to finish, while I worked on other things in another window.
I couldn’t remember what the discount actually was, so I asked Codex to find it.
can you look up the discounted bill maybe publicizied somewhere?
It pulled up Verizon’s iPhone 17 launch promo: up to $1,100 off over 36 months for a qualifying trade-in. That worked out to roughly $30 a month per line in credits, which lined up with what I half-remembered from the store and confirmed the gap.
Making a Plan
Asking Codex to draft a strategy before opening the chat.
I wanted to make sure Codex and I were on the same page before letting it loose on a Verizon agent.
It also pre-drafted responses to the four most likely versions of pushback, plus a checklist of things to leave the chat with: case number, PCA confirmation, expected monthly credit per line, retroactive catch-up amount, courtesy credit.
Negotiating with Verizon
Cold but thorough.
Armed with all of this information, I sent Codex off to negotiate with Verizon. Aside from a couple nudges, I let it work on its own.
Codex would type a message, send it, and wait. Kate, the Verizon support agent on the other end, would reply. Sometimes in a few minutes. Longer when she was working the back-end.
What surprised me was how willing Codex was to push back. Every time Kate skipped over a number, the next message asked for it. Monthly credit per line. Catch-up credits. Reference number.
Watching Codex Think
The replies themselves were fine. The thinking between them was the interesting part. The Codex App shows the model’s reasoning as it works, and that’s where Codex’s personality came through.
When Kate gave a generic explanation instead of an account-specific finding, Codex flagged it in its thinking before replying:
She’s now giving a generic explanation rather than an account-specific finding. If you reply, I’d steer her back to the concrete ask: check whether the promo is attached to each device agreement and, if not, file a correction.
Then it asked Kate for the exact monthly credit amount per line, when those credits would start, and whether missed credits would be backdated.
When Kate said the corrections “may take 1-2 billing cycles,” Codex paused before accepting:
Found it. Verizon’s latest is basically: they “need to work on promo corrections on each line,” changes may take 1-2 billing cycles, and unapplied credits should show on the next bill. I’m going to avoid saying “sounds good” until they give exact amounts and a ticket/reference.
Instead of moving on, it pressed for a reference number.
When the courtesy credit came back as “approved” without a number, Codex caught the vagueness:
I asked for the exact amount because “approved credit” is a little too vague for something we may need to verify later.
It asked Kate to put the dollar figure in writing.
GPT-5.5 has real attention to detail and doesn’t just nod along. Mid-chat, Kate swapped which trade-in phone went with which line. I hadn’t asked Codex to verify any of that. It caught it on its own:
Negotiating a $100 Courtesy Credit
I told Codex earlier in the session: if Verizon was at fault, ask for a goodwill credit, but only after the substantive fix was locked in, and only after asking me first.
It waited.
Once Kate confirmed the new monthly amount and the catch-up credits, Codex paused, recommended $100, and waited for me to say go. Then it sent:
Kate took it to a manager and came back with approval.
With little intervention from me, Codex successfully:
- Got the bill fixed. A refund for the previously missed monthly credits will appear on the next bill.
- Negotiated a one-time $100 courtesy credit that will also appear on the next bill.
- Got a note added to the account for a loyalty discount when one becomes available.
What I Learned
A few things stuck with me afterward.
Giving agents context is everything. Engineers building with these models already know this, but the Gathering Context section above is the only reason it worked. The chat itself was the easy part.
Computer use is past the demo phase. This was the first time I’d hand an agent a real task on a live, signed-in account and not regret it ten minutes later. Reading PDFs, navigating tabs, typing into the right field of a support widget.
You can let agents do the grungy work. Filling out nasty forms. Navigating ugly bank, utility, and government UIs. Sitting in support chats. Reading dense PDFs across months of statements. None of these is worth a human’s afternoon on its own, which is exactly why they pile up. An agent doesn’t care.
I was still in the loop. I checked in six times in two hours, said yes or no, gave it the price anchor I half-remembered from the store, and recovered the chat session when it timed out.
You can read through the full raw Codex session on Hugging Face via the Custom Agent Trace Viewer.