pre.dev @predotdev

Self-driving coding agent
pre.dev
Joined August 2023

Tweets: 928 | Followers: 2K | Following: 168 | Likes: 262

pre.dev @predotdev

14 hours ago

Full Podcast Episode of the Intelli-Gents x.com/predotdev/stat…

Have you spent the last few days thinking about how to move from writing prompts to designing loops? To truly achieve long-running agents, you need a long-term execution graph. Here is how we do it:

Last month, we had a user run our coding agent for 60 hours straight. To sustain runs of that length, you have to solve two foundational challenges: memory compaction and predictive planning.

In last week's podcast, Arjun (@ArjunRajJain) and Adam (@adampredev) break down how they built the predev harness, which can handle tens of thousands of messages in a single session.

Recursive Memory Compaction: Context stuffing, or pollution, occurs naturally if the agent simply keeps going, and it inevitably causes drift. To maintain coherence over a 60-hour window, the agent shouldn't have to break the loop to summarize its history. It requires a memory layout that manages context dynamically without losing the core intent.

The Predictive Planning Layer: To build a long-horizon agent loop, you must approximate the series of events before execution begins. By establishing a contract of milestones, user stories, and acceptance criteria up front, the harness translates a vague idea into a structured execution graph.

An additional benefit of knowing the full architectural scope ahead of time is efficient infrastructure management. Because the harness understands the roadmap before execution starts, the agent can configure its own isolated cloud sandboxes and route each subtask to the model tier that offers just the right performance for that specific action.

Oddly enough, the deeper you go into the long-term execution graph, the closer you get to traditional project management. It turns out you manage coding agents the same way you manage a human engineering team. You break down complex goals into distinct user stories, estimate them, and design the system so the agents report back on their progress, upcoming subtasks, and impediments. Sound familiar?

The real engineering enabler isn't the underlying model. The magic lies in building a harness that can decompress a stakeholder's high-level request, map out the execution graph, and dispatch specialized agents with a shared, evolving memory.

If you want to see for yourself how to run your coding agents for 60 hours on autopilot, give predev a spin and explore our native agent right in your browser. Link in the comments.
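To make the planning-layer idea in this post concrete, here is a minimal sketch in Python. It is not the predev implementation. It assumes each user story carries a hypothetical 1-to-5 complexity estimate and that the harness chooses among three model tiers called "small", "medium", and "large"; every name in it (`Story`, `pick_tier`, `plan`) is invented for illustration.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class Story:
    """One user story in the up-front contract (hypothetical structure)."""
    name: str
    acceptance: str          # acceptance criterion agreed before execution
    complexity: int          # assumed scale: 1 (trivial) to 5 (hard)
    depends_on: list[str] = field(default_factory=list)


def pick_tier(complexity: int) -> str:
    """Route a story to an assumed model tier based on its estimate."""
    if complexity <= 2:
        return "small"
    if complexity <= 4:
        return "medium"
    return "large"


def plan(stories: list[Story]) -> list[tuple[str, str]]:
    """Order stories so dependencies run first, then attach a tier to each."""
    by_name = {s.name: s for s in stories}
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {s.name: set(s.depends_on) for s in stories}
    order = TopologicalSorter(graph).static_order()
    return [(name, pick_tier(by_name[name].complexity)) for name in order]


stories = [
    Story("schema", "migrations apply cleanly", 2),
    Story("api", "endpoints return 200 on valid input", 4, ["schema"]),
    Story("ui", "user can submit the form", 5, ["api"]),
]
schedule = plan(stories)
print(schedule)
# [('schema', 'small'), ('api', 'medium'), ('ui', 'large')]
```

Because the whole graph exists before anything runs, the schedule, including which tier handles which story, can be inspected or priced ahead of time. A real harness would presumably also track milestones and re-plan as stories finish.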

pre.dev @predotdev

14 hours ago

Try our self-driving coding agent: try.pre.dev/CCqFYUp

pre.dev @predotdev

4 days ago

"It vibes on my machine." We’ve been here before.

The acceleration of AI coding agents has curiously resurfaced some familiar and frustrating challenges in professional software development. If your engineers are walking around the office with their laptops open, babysitting agents that run loops inside a local terminal, you are heading toward a very predictable wall.

By executing agents locally, engineering teams are re-introducing classic infrastructure liabilities:

- Context Pollution & Agent Drift: Local history and environments alter the agent’s behavior. The prompt templates and guardrails that run cleanly on one machine fail to replicate across the team, making harmonization and discipline impossible.
- Local Harness Customizations: When developers hack local settings to force an agent to cooperate in a certain way, you lose a standardized, deterministic execution framework, something you may have invested in heavily when onboarding coding agents for your organization.
- Resource Exhaustion: Next-gen local harnesses eat memory like crazy. Running bloated agent loops on a local device inevitably leads to memory leaks and system crashes during long-duration tasks. Beefier machines and more RAM are not a permanent fix.

The solution to agent drift isn’t any different from how the industry solved the classic "it works on my machine" era a decade ago. You move the runtime off the local box.

We built predev to be entirely browser-native and cloud-sandboxed. No local setup, no local dependencies. Because the runtime lives in isolated cloud infrastructure, the operational dynamic changes:

- Persistent Execution: You can close your laptop without your agents going to sleep or dropping a thread. You can even check on your agents from your phone.
- Isolated Sandboxing: Separating the agent from your local network and local file system isn't just a baseline security requirement. It also allows the agent to run blind, reproducible verification loops to ensure acceptance criteria are met without environmental bias.
- Predictive Resource Allocation: Because predev maps out the architecture and user stories before writing a single line of code, the system understands its upcoming constraints. The agent can autonomously configure its own isolated cloud resources, spinning up the exact sandbox environment and model tiers required for the specific task ahead.

Software orchestration belongs in a stable, standardized cloud runtime, not crammed into a local terminal.

Below is a clip from our latest podcast episode, where Adam (@adampredev) and Arjun (@ArjunRajJain) reflect on the architecture of agent sandboxing and why local execution is a scaling liability.

pre.dev @predotdev

5 days ago

@_itsjustshubh @adampredev Yes agreed, though the vast majority of MVPs never even make it to $1M ARR

pre.dev @predotdev

5 days ago

@fabiolauria92 @adampredev Yes this is exactly right. The hard part though is calibrating for your exact stage, because building for 50k active connections when you only have 3 users is also overkill. You need a model that evolves with scale

pre.dev @predotdev

5 days ago

@adampredev @ArjunRajJain Full Episode. Intelli-Gents: We Beat Claude Opus with a Smaller Model x.com/predotdev/stat…

pre.dev @predotdev

5 days ago

On the latest episode of the Artificial Intelli-Gents Podcast, Adam (@adampredev) and Arjun (@ArjunRajJain) discuss the growing demand for forward-deployed engineers as AI-native startups rapidly ship new products and post tremendous ARR growth. Our partnership with Pangea (@pangea_ai) allows founders building with predev to access top-tier, hands-on tech talent to help navigate their GTM growth, all while orchestrating their product roadmap through our architecture-first coding agent.

Adam Elkassas @adampredev

5 days ago

Going from $0 to $1M ARR is easier, and faster, than ever. What happens to your MVP as you try to scale from $1M to $10M ARR is where the reality check hits. Coding agents are incredible for getting a prototype in front of users and driving early revenue. But viably translating

pre.dev @predotdev

5 days ago

We believe the single biggest point of failure occurs before any code is written. This is solved with the planning layer, something that no other company has touched on adequately. Claude dynamic workflows might be a type of decomposition strategy similar to RLM, but that is still the execution layer. If you can estimate complexity and costs up front, you plug the biggest token leaks before they start.
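As a rough illustration of estimating costs up front, the sketch below prices a plan before any task runs and checks it against a budget. It is an assumption-laden toy, not how predev does it: the tier names, the per-million-token prices, and the token estimates are all made-up numbers, and `Task`, `estimate_cost`, and `within_budget` are hypothetical names.

```python
from dataclasses import dataclass

# Hypothetical prices in USD per million tokens; real prices vary by provider.
PRICE_PER_MTOK = {"small": 1.0, "large": 3.0}


@dataclass
class Task:
    name: str
    est_tokens: int  # token estimate made at planning time (assumed input)
    tier: str        # key into PRICE_PER_MTOK


def estimate_cost(tasks: list[Task]) -> float:
    """Sum the projected spend for every planned task, in USD."""
    return sum(t.est_tokens / 1_000_000 * PRICE_PER_MTOK[t.tier] for t in tasks)


def within_budget(tasks: list[Task], budget_usd: float) -> bool:
    """Gate execution: only start if the projected spend fits the budget."""
    return estimate_cost(tasks) <= budget_usd


tasks = [
    Task("scaffold project", 2_000_000, "small"),
    Task("design data model", 500_000, "large"),
]
total = estimate_cost(tasks)
print(f"projected spend: ${total:.2f}")  # projected spend: $3.50
print(within_budget(tasks, budget_usd=5.0))  # True
```

The point of the gate is ordering: the check runs on estimates before execution, so an over-budget plan can be re-scoped or re-routed to a cheaper tier before any tokens are spent.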

pre.dev @predotdev

6 days ago

How do you compete in the AI coding game against giants like Anthropic and OpenAI? You don't outspend them. You out-engineer their harnesses.

"The harness matters. Especially when it comes to cost efficiency and token usage. If you can get both an intelligence gain and a cost gain, then that’s really how you're chasing the edge in this coding game."

We break down how we just broke the model cost continuum.

1. On off-the-shelf SDKs: "We hand-built this harness because we tried the Claude Code SDK and the Open Code SDK. They both had gaping leaks."
2. On picking a benchmark that actually mirrors real engineering: "You have two types of benchmarks. Some just introduce isolated patches, but they aren't really multi-agent turn. We picked Terminal-Bench because it tests multi-file, multi-turn execution right in the terminal."
3. On the results that broke the cost curve: "Our Haiku scored higher than Sonnet 4.5 on the leaderboard. We did about 16% better than Claude Code on Haiku. That’s a three times cost difference, and we somehow jumped that."
4. On the core thesis of next-gen software engineering: "The harness matters. Especially when it comes to cost efficiency and token usage. If you can get both an intelligence gain and a cost gain, then that’s really how you're chasing the edge in this coding game."

pre.dev @predotdev

3 weeks ago

If you have been using Claude Code professionally, take a minute to read this. We beat Opus with Sonnet by using the predev harness. Here is what it means for agentic coding: Orchestration beats brute reasoning. A smaller model running on our architecture just beat Claude Opus

pre.dev @predotdev

6 days ago

🃏

Adam Elkassas @adampredev

6 days ago

Coding with AI feels a lot like poker: I have to wager tokens with high variance on whether the solution is legitimate, while balancing the possibility that the opponent (the agent) is bluffing me. I have imperfect information because I can't account for every single bash command

pre.dev @predotdev

6 days ago

@elena_builds Exactly. There is a reason everyone is rushing to optimize the harness. It's where efficiency and token ROI gains are greatest.

pre.dev @predotdev

6 days ago

Full Episode: x.com/predotdev/stat…

pre.dev @predotdev

7 days ago

Instead of a tokenmaxxing leaderboard for devs, we should have a leaderboard for enterprises. wdyt?

pre.dev @predotdev

2 weeks ago

Uber spent its entire annual AI budget in one quarter. The creator of Openclaw burns $1M a month on Codex tokens. What is the true ROI on AI token spend? Here is how we measure and increase it. When Fortune 100 CFOs question tokenmaxxing and Microsoft cancels Claude

pre.dev @predotdev

7 days ago

Can a custom software harness make a low-tier model outperform a premium frontier model that costs three times as much?

Most enterprise teams are facing skyrocketing token bills because they think raw capital is the only path to intelligence. On the latest episode of the Artificial Intelli-Gents Podcast, predev co-founders Adam and Arjun break down exactly why the industry has hit a model cost wall.

Instead of waiting for updates from foundational labs, Adam Elkassas (@adampredev) and Arjun Raj Jain (@ArjunRajJain) hand-built a native cloud harness from scratch to fix the severe memory leaks and structural shortcuts found in standard SDKs. They ran their system against Terminal-Bench, the most rigorous multi-file coding benchmark in the industry.

The results broke the standard model cost continuum: Running on a low-cost tier like Claude Sonnet, predev’s native harness outperformed Anthropic’s own Claude Code running on premium Opus 4.5.

How do you manufacture that kind of lopsided intelligence gain while cutting token costs by two-thirds? Here is a breakdown of the episode and how they out-architected foundational labs valued at hundreds of billions:

- Why Terminal-Bench: The truth about why standard benchmarks fail to test true, multi-turn agent capability in the wild.
- Breaking the Cost Continuum: The exact mechanics of how predev helps users maximize intelligence per token without breaking the bank.
- What Makes Our Harness Unique: Moving past simple for-loops into long-horizon planning layers and long-term execution graphs.
- Building Custom Browser Agents: How Arjun built a specialized browser agent layer that runs at 3x the speed and 1/3 the cost of market alternatives.
- Implementing Production RLM: The blueprint behind being the first team to truly implement Recursive Language Models to achieve unlimited execution depth.
- The Reality of Enterprise AI: Why raw agents fail out of the box in production, and how forward-deployed engineers scale MVPs to enterprise security standards.
- Upcoming Releases: A sneak peek into predev's next-gen CLI release, local-to-cloud syncing, and multi-session isolated sandboxes.

While frontier labs pour tens of billions into physical data centers, the real software alpha is being captured at the orchestration layer. If you want to see how architectural execution beats raw compute capital, this episode is your blueprint.

Full episode in the quoted post below.

Adam Elkassas @adampredev

a week ago

Harness “engineering” is a combination of research and engineering. It takes both

pre.dev @predotdev

a week ago

The Artificial Intelli-Gents Ep. 7: We Beat Claude Opus with a Smaller Model

We did it: We dropped an entire model tier and still finished ahead. By proving that orchestration beats brute reasoning, our predev harness paired with Sonnet 4.6 (56.2%) officially beat Claude Code running Opus 4.5 (53.9%) on the grueling Terminal-Bench 2.0. The best part? We achieved higher accuracy while significantly cutting our per-task token bill.

Also in episode 7 of The Artificial Intelli-Gents, we dive into the current state of RLM, why "harness engineering" is the next frontier, and how to leverage forward-deployed engineers.

Timestamps:
2:28 - Why Terminal-Bench
9:33 - How we did it & what the results mean
18:05 - What makes our harness unique
27:00 - Building our own Browser Agents
34:35 - Implementing Recursive Language Models
41:20 - The future of benchmarks
51:00 - Forward Deployed Engineers & Dev Shops
1:04:00 - The harness of harnesses
1:17:37 - Upcoming Releases

@adampredev @ArjunRajJain

pre.dev @predotdev

a week ago

@adampredev @ArjunRajJain Let's go, the Intelli-Gents are back!

pre.dev @predotdev

a week ago

Another state of the union for AI coding live and uncut from the founders of predev

Adam Elkassas @adampredev

a week ago

Episode 7 of the Intelli-Gents Podcast is out now, with @ArjunRajJain and @predotdev! We cover topics including TBench2.0, harness engineering, coding agent self-improvement, RLM, and more! youtube.com/watch?v=GB36g2…

pre.dev @predotdev

a week ago

Founder live sessions continue

Adam Elkassas @adampredev

a week ago

@ArjunRajJain and I recorded a 90-minute episode of the Intelli-Gents. We are releasing it on YouTube on Monday. We touch on topics such as coding benchmarks, RLM, self-improvement, harness engineering, and more!