The Cost of Clarity: A Comparison of Codex and ChatGPT
I’ve been using Codex for several weeks now. Here are the differences I see between it and ChatGPT:
First, the models are essentially equivalent. My perception is that Anthropic and OpenAI make small tweaks constantly, so comparing the models today will have little bearing on how they compare a month from now. On the release day of Opus 4.7, it was too wordy and wasted tokens, but it was generally capable. Lately it seems less wordy. ChatGPT tends to be a smidge more succinct, but only a smidge. 5.5 hallucinates a bit, but it’s manageable. In a week that will probably fall off. ChatGPT follows instructions much more closely. That’s probably good, but weird if you’ve gotten used to Opus and Sonnet.
The differences in harness are interesting. Claude Code has 1M-context models, which I think are an anti-pattern, personally, because they waste tokens. For most practical purposes that means you don’t need to compact, at the literal cost of burning money on tokens once you get past 200k. Claude Code has auto-compact, which blocks until it finishes, and at 2 minutes that’s pretty annoying. Codex, on the other hand, compacts in the background. It’s all but invisible if you’re running a session with lots of short turns. If the session is one long request that you want to run for a very long time, you might run out of context and have to tell it to continue, because it can’t apply the compaction in the background while the turn is still running.
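To make that trade-off concrete, here is a toy model of the two compaction styles. It is only a sketch of the behavior described above, not how either harness is actually implemented: the `simulate` function, the `Outcome` type, the 80% compaction threshold, and the 10x shrink ratio are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    user_wait_seconds: float  # time the user spends watching a compaction
    stalled_turns: int        # turns that ran out of context mid-turn


def simulate(turn_tokens, limit, compact_seconds, background):
    """Toy model of a session; every number here is hypothetical.

    turn_tokens: tokens each turn adds to the context window.
    background:  True  -> compaction is free between turns, but cannot
                          run while a turn is in progress.
                 False -> compaction always blocks for compact_seconds.
    """
    threshold = int(0.8 * limit)  # assumed trigger point, not a real setting
    used = 0
    wait = 0.0
    stalls = 0
    for tokens in turn_tokens:
        # Between turns: compact once the window is nearly full.
        if used >= threshold:
            used //= 10  # assumed 10x shrink
            if not background:
                wait += compact_seconds
        # During the turn: the turn alone may outgrow the window.
        if used + tokens > limit:
            if background:
                stalls += 1  # user has to say "continue"
            else:
                wait += compact_seconds
            used //= 10
        used = min(used + tokens, limit)
    return Outcome(wait, stalls)


if __name__ == "__main__":
    short_turns = [20_000] * 20
    one_long_turn = [500_000]
    for name, turns in [("short turns", short_turns), ("one long turn", one_long_turn)]:
        for background in (False, True):
            print(name, "background" if background else "blocking",
                  simulate(turns, 200_000, 120, background))
```

Under these made-up numbers, the blocking style costs visible waiting time on a session of short turns while the background style costs none, and the single long turn is where the background style stalls instead.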
Both have saved prompts. Codex uses $ and Claude Code uses /. That’s a meaningless difference really.
The BIGGEST difference I’ve seen is that the $100 plan with OpenAI gets me more weekly usage than the $200 plan with Anthropic. That’s very hard to measure, but is a pretty significant difference.