Codex!!!!!!
Laurifer
I've just been through a very exciting few days. A couple of weeks ago I was contacted by OpenAI, who had read a previous post of mine and asked whether I'd be interested in joining a group of academics and scientists who are trying to figure out what LLMs can do for them. I said yes, and by way of a very generous reward, was comped a six-month subscription to their most powerful model. My contact urged me to use an app called Codex, about which I knew nothing, and report back. Codex is a bit different from the usual chat because it can work on files on your computer: edit, compare and save. Until recently I had not much use for it.
Then I found out my next task was to write a huge European Research Council grant in a new scheme called ERC Plus, an explicitly blue-skies research program. Such things only come along rarely, and I've been lucky in the past with them, so I decided to apply. For those lucky enough to be unfamiliar with scientific grant applications, the way it works in practice is you need to produce a sheaf of documents (41 for the ERC) detailing every aspect of the grant from the top level, i.e. why what you plan to do is interesting, to such things as depreciation schedules over the duration of the grant. Large universities have entire departments dedicated to getting this right. The maddening peculiarity of this is that your grant can be killed by criticism aimed at it on any level from grand plan to small detail. Gone are the days of the job description given to Carothers, the discoverer of Nylon®: "To pursue such things as may be of interest."
I have two months to write it, and a savvy colleague two weeks ago opined that I was unlikely to make the deadline. I was curious to see what AI would do for me. I uploaded old and new publications and various drafts of papers, applications and experimental data. I then asked Codex "is there any way that Codex can help me organise these documents and works schedule for a pending grant application?" What happened next was a white-knuckle ride lasting two sixteen-hour days. What Codex did was go to the ERC website and identify the 41 documents needed, their detailed requirements and their specific formatting. It then built a replica of the website on my machine. It worked out all the dependencies of one document on another (you can't work out depreciation until you know what you're buying) and drew up a list of things to do. Then it created a tracker spreadsheet to keep track of progress.
What AI has done is rub our noses in the fact that, conservatively, 80% of what we do is boring and has been done before. The worst tasks are those where the remaining 20%, the interesting, novel stuff, is chopped so fine and mixed so well with the tedium that you have to eat them both or none. Codex breezes through the small stuff and lets you worry about the big issues. It also deals with intermediate obstacles put up by the system to make you want to give up and cry: things like Work Packages, another word for subtasks borrowed from the world of engineering and awkwardly applied to science. If you're building an airplane, a logical sequence is required. If you're hunting the unicorn, less so. There is also the dreaded Risk Mitigation part, where you have to explain plans B-K at every point. On those bits I used AI to see what the best task split was and what alternatives it suggested, and picked among different versions.
The back-and-forth generated at lightning speed a hundred or so pages of dialogue ranging from the minute "fix the spelling of aluminum" to the grand "check project description for logic and fluency." I estimate conservatively that Codex did the work of at least two full-time assistants, except much faster and without ever sleeping. At MIT I might have found them in the Research Office if I took a number and got in line. With Codex they were fetching the ball and coming back for more in seconds. At one point I was naughty and pitted Claude against GPT for comments. When you use the top-level models they become seriously smart and persnickety. Feeding comments from one to the other was amusing: they remained polite ("those are very good comments") and tore into each other until the versions converged. Finally I had to say to GPT "you're overthinking this" and to my relief it agreed.
On one thing AI failed miserably. The ERC requires a 1-page Vision Statement explaining why your project needs to be funded by this scheme rather than an ordinary grant, and why you are the right person to make it happen. I had not been asked such a thing before and had no drafts. I knew AI couldn't write it, but just for fun I let both LLMs have a go, and it was pathetic babble. Why? Because at this point LLMs know little about me, and do not know in which perspective I hold things in my mind, what is bigger and what is smaller, what comes first and what can wait. The AI vision thing was a raving laundry list of disparate things. I'm glad LLMs don't know about this Substack, or they might have tried to connect perfume writing to Drosophila behavior.
The heady feeling of revenge against bureaucratic gatekeeping felt like an irresistible surge through enemy lines. In my report to OpenAI I described the effect as being like a headlong cavalry charge. Now I must calm down and read the whole thing again.



What I find interesting is that Codex did everything except the one page even a research office couldn't have written for you. And I don't think that's because it doesn't know you well enough yet. A vision statement is supposed to be arguable: you're putting your name on a bet that could be wrong. A model trained to never rub anyone the wrong way just can't do that. Put two of them together and it's worse, because they talk each other into the safest version :)
I'm a product manager in AI, and this is the pattern I see all the time. The demo looks magic because most of the work has been done before. What it can't do is supply the nerve. Good that the nerve is still yours. Good luck with the ERC ❤️
"The heady feeling of revenge against bureaucratic gatekeeping felt like an irresistible surge through enemy lines."
I've always felt like AI taking care of the procedural and repetitive is inherently anti-bureaucracy and gives more power to the interesting or even 'soulful' parts, but I find it hard to explain to the people around me who (legitimate job loss/instability fears aside) often take the stance that it's undignified to use AI.