7777777phil 2 hours ago
On a meta note, I like that Tao is publishing his failed attempts alongside the successful one. Two prior recordings didn't work out: in one the machine crashed mid-run, and in the other he forgot to share his screen properly. He just tells you that upfront. Most people would quietly delete those and only show the clean take.
For me the interesting part of this video isn't really the AI though. It's how Tao breaks the work apart. First attempt was "just do the whole thing." Ran 45 minutes, crashed the machine, burned through the token budget, produced nothing. Second time he decomposed it into steps and got it done in 25 minutes. By this third attempt he'd written out a whole recipe beforehand:
>I decided to write up here a step-by-step recipe for what we're going to do
Imo that recipe is where the actual value is. He's figured out which subtasks he can hand off and which ones need his eyes on them. At one point he's manually fixing a proof step while Claude skeletonizes the next lemma in the background. That's not "AI did my homework," that's just two workers on separate parts of the same job.
>it didn't mind that I was editing something else. It just went ahead and implemented these edits independently, which is great
One thing that surprised me: the agent handled the high-level formalization fine but choked on the mechanical low-level steps.
>it actually struggles a lot with the lowest level steps of the proof actually, which is surprising because I would have thought that would have been the easiest part
He also said something that tbh I keep coming back to when I look at how firms adopt these tools:
>you do need to keep doing that. Otherwise, if you rely too much on these tools and something goes wrong, you may have no idea what to do
I see this constantly. The people getting the most out of coding agents (in my world that's usually quant or strategy work) are the ones who stay close enough to catch it when things go sideways. The ones who fully check out just get quietly worse at their jobs in ways that don't show up until something breaks. There's no magic automation dial you set to the right number and forget about. You kind of have to keep adjusting it task by task, and honestly that judgment call is the hard part, not the tooling.