AutoTwitter
Control plane

Draft

Detail editor for a single draft: save it, approve it into the queue, or reject it from the editorial flow.

local DB · private
Draft editor
Edit the text, keep it concise, then approve it into the ordered queue or reject it from the mobile flow.
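The save / approve / reject actions can be read as a small state machine over draft statuses. The sketch below is a hypothetical model, not AutoTwitter's actual code: only the `ready_for_review` status appears in this view, while `approved`, `rejected`, the `Status` enum, `TRANSITIONS` and `apply_action` are illustrative assumptions. Allowing an approved draft to be rejected is also an assumption.

```python
from enum import Enum


class Status(Enum):
    # "ready_for_review" is shown in the editor badge; the other names are hypothetical.
    READY_FOR_REVIEW = "ready_for_review"
    APPROVED = "approved"
    REJECTED = "rejected"


# Allowed (status, action) -> next status. "save" never changes the status.
# Assumption: "rejected" is terminal, and an approved draft may still be rejected.
TRANSITIONS = {
    (Status.READY_FOR_REVIEW, "save"): Status.READY_FOR_REVIEW,
    (Status.READY_FOR_REVIEW, "approve"): Status.APPROVED,
    (Status.READY_FOR_REVIEW, "reject"): Status.REJECTED,
    (Status.APPROVED, "save"): Status.APPROVED,
    (Status.APPROVED, "reject"): Status.REJECTED,
}


def apply_action(status: Status, action: str) -> Status:
    """Return the next status, or raise ValueError if the action is not allowed."""
    try:
        return TRANSITIONS[(status, action)]
    except KeyError:
        raise ValueError(f"cannot {action} a draft in status {status.value}") from None
```

Keeping the transitions in one table makes disallowed moves (for example, approving a rejected draft) fail loudly instead of silently changing state.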
saved
POST · ready_for_review · risk low · score 96538 chars
Source
2026-03-12 02:02:45.000000
Exponential improvements* everywhere for those with the eyes to see them. This is a cool benchmark, and was impossible for early non-reasoner LLMs to do at all. * Okay, technically "logistic improvement" because the maximum score is bounded at 100 (and logistic has a lower AIC) https://t.co/9kVj4o7Gz0
Quoted original
Justin Waugh (@JustinWaugh) · Tue Mar 03 16:06:29 +0000 2026
(1/N) Pencil Puzzle Bench is out! 51 LLMs tested on pencil puzzles (multi-step, logical reasoning, verifiable at each step) Dataset: 62k unique puzzles, 94 types. Evaluation: covers 300 puzzles across 20 types Best score: GPT 5.2@xhigh 56%, half the puzzles are still unsolved https://t.co/R7vLAaorW2
Draft text
Req 2026-03-12T0231-TOP1
Queue membership is preserved when editing an already approved draft.
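One way to honour this rule is to keep the queue order separate from the draft contents, so that saving edited text never touches the queue. The sketch below is a hypothetical in-memory model that assumes this design; `Draft`, `EditorialQueue` and their methods are illustrative names, not the app's real API, and treating a repeated approve as a no-op for queue position is also an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Draft:
    draft_id: str
    text: str
    status: str = "ready_for_review"


@dataclass
class EditorialQueue:
    drafts: dict[str, Draft] = field(default_factory=dict)
    order: list[str] = field(default_factory=list)  # approved draft ids, in queue order

    def add(self, draft: Draft) -> None:
        self.drafts[draft.draft_id] = draft

    def approve(self, draft_id: str) -> None:
        draft = self.drafts[draft_id]
        draft.status = "approved"
        # Assumption: approving twice does not move the draft or duplicate it.
        if draft_id not in self.order:
            self.order.append(draft_id)

    def reject(self, draft_id: str) -> None:
        self.drafts[draft_id].status = "rejected"
        if draft_id in self.order:
            self.order.remove(draft_id)

    def save(self, draft_id: str, new_text: str) -> None:
        # Editing only replaces the text; status and queue position are untouched,
        # so an already approved draft keeps its place in the queue.
        self.drafts[draft_id].text = new_text
```

For example, approving drafts `a` and `b` and then editing `a` leaves the queue order as `["a", "b"]`; only rejecting `a` removes it.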