AfterQuery @AfterQuery

Applied research lab curating data solutions to accelerate foundation model development.

afterquery.com · Joined February 2025

Tweets: 28 · Followers: 1K · Following: 6 · Likes: 64

AfterQuery @AfterQuery

11 hours ago

Blog: afterquery.com/blog/on-policy…

Interested in research or other roles at AfterQuery? Check this out! afterquery.com/careers


AfterQuery @AfterQuery

11 hours ago

x.com/i/article/2063…


Spencer Mateega @spencermateega

2 months ago

@AfterQuery will be at ICLR next week! We’ll be at booth 404. Happy to chat about anything related to tool use/agents, RL environments, code gen, or evals. DM me if you wanna meet up!


Alex Shaw @alexgshaw

2 months ago

AfterQuery post-trained GPT-OSS-20B using Harbor + Tinker and saw a 14% bump on TB2 performance. Love seeing people pick up Harbor for more than just evals.

Quoting AfterQuery @AfterQuery · 2 months ago

x.com/i/article/2038…


AfterQuery @AfterQuery

2 months ago

afterquery.com/blog/terminal-…



YC and @GoogleDeepMind are hosting the Multimodal Frontier Hackathon this Saturday. Most AI apps still don't utilize the full multimodal stack. So we’re giving you access to Gemini 3.1, Lyria, & NanoBanana 2 to see what you can build! Sign up at: events.ycombinator.com/deepmind-march…


AfterQuery @AfterQuery

4 months ago

Paper: arxiv.org/abs/2601.20886


AfterQuery @AfterQuery

4 months ago

Introducing IDE-Bench! A multi-language, full-stack benchmark evaluating LLMs acting as autonomous IDE agents.

IDE-Bench assesses agents' ability to navigate, reason about, and modify complex repositories using the same tools available in modern AI-native IDEs like Cursor.

Models tested from @AnthropicAI, @OpenAI, @Alibaba_Qwen, @GoogleDeepMind, @xai, @deepseek_ai, @Meta, and @cohere.

Check out the full results at ide-bench.com!



AfterQuery @AfterQuery

4 months ago

Our findings show that current models lack the ability to perform even the most basic tasks in high-impact, real-world domains like quantitative trading. We hope Market-Bench can serve as a shared framework to evaluate models’ understanding of trading strategies and code generation for quantitative finance. Excited to track how these capabilities evolve!


AfterQuery @AfterQuery

4 months ago

Leaderboard: marketbench.ai
Paper: arxiv.org/abs/2512.12264


AfterQuery @AfterQuery

4 months ago

Introducing Market-Bench by @AfterQuery! The first-of-its-kind benchmark on LLMs for quantitative finance.

We challenged models to attempt a common introductory quantitative trading task: coding an executable backtester from a natural-language strategy description and market assumptions.

> 13 models built backtesting systems for directional, pair trading, and delta hedging strategies
> evaluated on reliability (executable passes) and accuracy (MAE) across 5 attempts per strategy
> real order book data with exchange delays and liquidity constraints
> @xAI’s Grok 4 achieved the lowest overall mean MAE (deviation from the golden backtest), followed closely by @OpenAI’s GPT 5.2
> @AnthropicAI's Sonnet 4.5 and @AlibabaGroup's Qwen 3 Max reached perfect executability but had high MAE
> Models from @Meta, @Amazon, @NVIDIA, and @Cohere continued to fail to produce executable backtesters

Leaderboard & full paper below!


Spencer Mateega @spencermateega

6 months ago

How far can vibe coding actually go?

Introducing App-Bench by @AfterQuery, a benchmark for end-to-end web app development. We tested 6 production web apps on 10 coding agents from @OpenAI, @GoogleDeepMind, @AnthropicAI, @cursor_ai, @budapp, @v0, @boltdotnew, @Replit, and @Lovable.

One-shot generation. Zero human edits. 4,530 evaluations.


AfterQuery @AfterQuery

7 months ago

@cigdemoztabak_ 👀


shrawberry @shrawberryy

7 months ago

Really excited to have contributed to this sick creative vision and brought the @AfterQuery website to life 😎😎

Quoting Spencer Mateega @spencermateega · 7 months ago


AfterQuery @AfterQuery

7 months ago

Today, humanity is shackled by a scarcity of expertise. When expertise becomes infinitely scalable, humans will be freed to tackle problems we can't even conceive of today.

Introducing @AfterQuery. We’re building a world where expertise is abundant. Domain by domain, profession by profession, AfterQuery is crafting datasets that encode excellence into forms that machines can learn from.

Data is the final frontier.


AfterQuery @AfterQuery

8 months ago

@shrawberryy @sashabirukoff shrawberry 🙌


Spencer Mateega @spencermateega

8 months ago

The frontier begets the frontier. I highly recommend reading @jaminball's latest Clouded Judgement article, which spells out the AfterQuery thesis (thread)