Numinous - self-improving world models

We introduce a new paradigm for SN6: a live, self‑play environment for forecasting agents.

Miners no longer send forecasts; they send forecasting agents. Validators then evaluate these agents in sandboxes with access to a curated set of tools and data. Agent code and execution are now fully visible to the subnet protocol.

The sandbox is the environment in which the agent operates. In a given environment, an agent has access to inference (e.g., reasoning models), a set of tools (e.g., news providers), and context (historical data, baseline reasoning).
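
As a rough sketch of these three ingredients, an environment could be modelled as follows. The names `Environment`, `run_agent`, the `"news"` tool key, and the `"baseline_probability"` context key are hypothetical and chosen only for illustration; they are not the subnet's actual interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Environment:
    """Hypothetical view of what a sandbox exposes to an agent."""
    infer: Callable[[str], str]                    # inference, e.g. a reasoning model
    tools: Dict[str, Callable[[str], List[str]]]   # tools, e.g. a news provider
    context: Dict[str, object] = field(default_factory=dict)  # historical data, baseline reasoning


def run_agent(event: str, env: Environment) -> float:
    """Toy agent: gather headlines, query the model, return a probability in [0, 1]."""
    headlines = env.tools["news"](event)
    baseline = float(env.context.get("baseline_probability", 0.5))
    prompt = f"Event: {event}\nHeadlines: {headlines}\nBaseline: {baseline}\nProbability?"
    raw = env.infer(prompt)
    try:
        p = float(raw)
    except ValueError:
        p = baseline  # fall back to the baseline if the reply is not a number
    return min(1.0, max(0.0, p))  # clamp to a valid probability


# Stubbed environment so the sketch runs offline (no real model or news API).
stub_env = Environment(
    infer=lambda prompt: "0.7",
    tools={"news": lambda query: [f"stub headline about {query}"]},
    context={"baseline_probability": 0.5},
)
forecast = run_agent("Will X happen by 2026?", stub_env)
```

In this sketch the agent only sees what the environment hands it, which is the property that lets a validator control inference, tools, and context per run.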

We believe this is the optimal architecture for developing superhuman LLM forecasters on Bittensor.

We draw from Ridges’ (SN62) foundational principles: open‑source competition where miners see each other’s code, and winner‑takes‑all rewards that direct all incentives to the best performer. This creates a maximally competitive environment driving evolution among miners.
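
As a minimal illustration of winner‑takes‑all allocation, assume each miner has a single scalar score where higher is better. The function name, the score format, and the tie‑breaking rule (first miner in insertion order wins) are our assumptions for this sketch, not the subnet's actual reward logic.

```python
from typing import Dict


def winner_takes_all(scores: Dict[str, float]) -> Dict[str, float]:
    """Give the full reward weight to the single best-scoring miner.

    Assumes higher scores are better. Ties go to the first miner
    in insertion order (an arbitrary choice for this sketch).
    """
    if not scores:
        return {}
    best = max(scores, key=scores.get)
    return {miner: 1.0 if miner == best else 0.0 for miner in scores}


weights = winner_takes_all({"miner_a": 0.61, "miner_b": 0.74, "miner_c": 0.70})
```

Here `miner_b` receives weight 1.0 and the other two receive 0.0, so a small edge in score translates into the entire reward.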

To bootstrap the ecosystem, we’ve partnered with Desearch (SN22) and Chutes (SN64). Desearch’s decentralized API infrastructure gives agents access to live, verifiable data streams from day one. Chutes provides the base models. The rest is up to the miners.

Why this matters

Richer parameter space for products. Subnet 6 gains access to a significantly expanded parameter space for building forecasting products. Meta‑models powering these products can re‑query miner agents at any time and control which information each agent receives.
First competitive forecasting benchmark. This marks the first high‑stakes, high‑frequency, competition‑driven, open‑source forecasting benchmark. Existing benchmarks like NoF1, Prophet Arena, or ForecastBench focus on evaluation rather than competitive aggregation.
Miner specialisation through open source. Miners can benefit from each other’s innovations: one might specialise in prompt optimisation, another in model selection or architectural design.

What this solves

Uniform and objective evaluation. Miners are evaluated consistently and objectively across all events. They cannot selectively skip certain events (e.g., longer‑term events because of their longer evaluation horizon), choose only favourable evaluation scenarios, or allocate resources to irrelevant strategies not tied to the core agent logic.
Transparency and explainability. One can directly integrate the best subnet agents into forecasting benchmarks without having to accommodate the context of the subnet competition: miners being registered or not, differing evaluation timeframes (e.g., several months), or differing dataset structures (e.g., only time‑series questions).
Private data compatibility. The subnet can work with private datasets by running the forecasting agents in private sandboxes where they can access the data securely without further exposure.
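
To make the uniform evaluation point concrete, here is a minimal sketch of a validator loop that runs one agent on every event and averages the result. It assumes binary outcomes and uses the Brier score (mean squared error between forecast probability and outcome, lower is better). The scoring rule, the `evaluate` name, and the 0.5 fallback for failed runs are assumptions for illustration; the source does not specify how the subnet scores agents.

```python
from typing import Callable, List, Tuple


def evaluate(agent: Callable[[str], float], events: List[Tuple[str, int]]) -> float:
    """Average Brier score of `agent` over ALL events (lower is better).

    `events` is a list of (question, outcome) pairs with outcome 0 or 1.
    The validator, not the miner, decides which events are run, so an
    agent cannot skip an event: a crash or invalid reply is scored as
    an uninformative 0.5 forecast (assumed fallback for this sketch).
    """
    if not events:
        raise ValueError("no events to evaluate")
    total = 0.0
    for question, outcome in events:
        try:
            p = float(agent(question))
        except Exception:
            p = 0.5  # failed run still counts toward the average
        p = min(1.0, max(0.0, p))  # clamp to a valid probability
        total += (p - outcome) ** 2
    return total / len(events)


# Toy agent that always answers 0.8, scored on one "yes" and one "no" event.
score = evaluate(lambda q: 0.8, [("event A", 1), ("event B", 0)])
```

Because every agent is scored on the same event set with the same rule, scores are directly comparable across miners.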

Use cases