Source
2026-03-12 02:02:45.000000
Exponential improvements* everywhere for those with the eyes to see them. This is a cool benchmark, and was impossible for early non-reasoner LLMs to do at all.
* Okay, technically "logistic improvement" because the maximum score is bounded at 100 (and logistic has a lower AIC) https://t.co/9kVj4o7Gz0
Quoted original
Justin Waugh (@JustinWaugh) · Tue Mar 03 16:06:29 +0000 2026
(1/N) Pencil Puzzle Bench is out!
51 LLMs tested on pencil puzzles (multi-step, logical reasoning, verifiable at each step)
Dataset: 62k unique puzzles, 94 types.
Evaluation: covers 300 puzzles across 20 types
Best score: GPT 5.2@xhigh 56%, half the puzzles are still unsolved https://t.co/R7vLAaorW2