System Overview

ML-Driven Edge Detection for Prediction Markets

74 Data Sources  |  Category Ensemble AUC 0.686  |  Backtested Sharpe 3.97

74 Data Sources  |  335 Features  |  256,296 Markets  |  19 Services  |  55.9% Win Rate  |  3.97 Sharpe
Development Pipeline
01 Data Infrastructure: Complete
74 data sources across 14 categories, harvested by 86 scripts. data_processor.py (1,800+ lines) produces 335 features per market. Output: market_features.parquet (256,296 rows × 335 features, 33 MB).
02 Model Training: 60%
Category ensemble AUC 0.686 (+6.2%). 6 custom category models, including Sports (0.955), Health (0.794), and Entertainment (0.731). Ensemble weights: LightGBM 80%, MLP 10%, XGBoost 10%. Autoresearch: 129+ experiments on the 3090.
03 Live Market Feed: Complete
live_scanner.py (1,501 lines). Gamma API polling, 246 features with batch enrichment, model scoring in 4.4 seconds.
04 Real-Time Features: Complete
246 features generated per market, computed in three tiers: classification (instant), financial markets (near-instant), and batch enrichment (volume rank, uniqueness, network).
05 Edge Detection: Complete
Bayesian blend: P_final = 0.3 * P_model + 0.7 * P_market. Trump-code + MiroFish enrichment.
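The blend formula can be sketched in a few lines of Python. The function names are hypothetical. The edge definition is an assumption: raw model probability minus market price on the YES outcome, with a negative edge meaning BUY NO. It was chosen to match the sign convention of the paper-trading table below, because the system's exact edge formula is not stated here.

```python
from typing import Tuple


def blend_probability(p_model: float, p_market: float, w_model: float = 0.3) -> float:
    """Market-anchored blend: P_final = w_model * P_model + (1 - w_model) * P_market."""
    for p in (p_model, p_market):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must be in [0, 1]")
    return w_model * p_model + (1.0 - w_model) * p_market


def edge_and_side(p_model: float, p_market: float) -> Tuple[float, str]:
    """ASSUMED edge definition: model-vs-market disagreement on YES.

    A positive edge suggests BUY YES; a non-positive edge maps to NO here
    (a real filter would also skip edges below some minimum threshold).
    """
    edge = p_model - p_market
    return edge, ("YES" if edge > 0 else "NO")


if __name__ == "__main__":
    p_final = blend_probability(p_model=0.40, p_market=0.60)
    edge, side = edge_and_side(0.40, 0.60)
    print(f"P_final={p_final:.3f} edge={edge:+.1%} side={side}")
```

Suppose the model estimates 0.40 against a market price of 0.60. The blended probability is then 0.54, and the sketch points to the NO side, because the model sees the YES price as too high.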
06 Position Sizing: Complete
Kelly criterion (25% fractional). Caps: 10% of bankroll per position, 50% total portfolio exposure, 30% per category. 20% drawdown circuit breaker.
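The sizing rules can be expressed as a small function. This is a minimal sketch, not the project's implementation. It assumes a binary share bought at `price` that pays $1 on a win, with `bankroll` as current equity. It also assumes the drawdown is measured from a supplied `peak_equity`. The function and parameter names are hypothetical.

```python
from typing import Optional


def kelly_fraction(p_win: float, price: float) -> float:
    """Full-Kelly bankroll fraction for a binary share bought at `price` paying $1 on a win."""
    if not 0.0 < price < 1.0:
        raise ValueError("price must be strictly between 0 and 1")
    # Net odds b = (1 - price) / price, so f* = (b*p - (1 - p)) / b = (p - price) / (1 - price).
    return max(0.0, (p_win - price) / (1.0 - price))


def position_size(
    bankroll: float,
    p_win: float,
    price: float,
    deployed: float = 0.0,               # dollars already at risk across the portfolio
    category_deployed: float = 0.0,      # dollars already at risk in this market's category
    peak_equity: Optional[float] = None, # hypothetical input for the drawdown check
    kelly_mult: float = 0.25,            # 25% fractional Kelly
    max_position: float = 0.10,          # max 10% of bankroll per position
    max_portfolio: float = 0.50,         # max 50% of bankroll deployed in total
    max_category: float = 0.30,          # max 30% of bankroll per category
    max_drawdown: float = 0.20,          # 20% drawdown circuit breaker
) -> float:
    """Dollar stake after fractional Kelly, exposure caps, and the drawdown breaker."""
    peak = bankroll if peak_equity is None else peak_equity
    if peak > 0 and (peak - bankroll) / peak >= max_drawdown:
        return 0.0  # breaker tripped: open no new positions
    stake = kelly_mult * kelly_fraction(p_win, price) * bankroll
    return max(0.0, min(
        stake,
        max_position * bankroll,
        max_portfolio * bankroll - deployed,
        max_category * bankroll - category_deployed,
    ))
```

With the $500 paper bankroll, any opportunity whose quarter-Kelly stake exceeds $50 is capped at $50. That cap matches the $50 positions in the paper-trading table.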
07 Liquidity Filter: Complete
Order book depth, bid-ask spread analysis, and slippage estimation. Rejects trades with estimated slippage above 1%.
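One common way to estimate slippage is to walk the ask side of the book until the order is filled. The average fill price is then compared with the best ask. The sketch below does this under stated assumptions: the book is a list of `(price, size)` levels sorted best-first, and sizes are in shares. The helper names are hypothetical and are not a Polymarket or py-clob-client API.

```python
from typing import Optional, Sequence, Tuple


def estimate_slippage(asks: Sequence[Tuple[float, float]], shares: float) -> Optional[float]:
    """Fractional slippage of a market buy vs. the best ask; None if the book is too thin."""
    if shares <= 0:
        raise ValueError("shares must be positive")
    if not asks:
        return None
    remaining, cost = shares, 0.0
    for price, size in asks:  # best price first
        take = min(remaining, size)
        cost += take * price
        remaining -= take
        if remaining <= 0:
            break
    if remaining > 0:
        return None  # insufficient depth to fill the order
    best_ask = asks[0][0]
    avg_fill = cost / shares
    return (avg_fill - best_ask) / best_ask


def passes_liquidity_filter(asks: Sequence[Tuple[float, float]], shares: float,
                            max_slippage: float = 0.01) -> bool:
    """Reject when slippage exceeds 1% (the threshold stated above) or depth is insufficient."""
    slip = estimate_slippage(asks, shares)
    return slip is not None and slip <= max_slippage
```

For example, consider a book with 100 shares at 0.50 and 100 at 0.52. It fills 100 shares with no slippage. Filling 150 shares pushes the average price about 1.3% above the best ask, so the filter would reject that order.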
08 Execution: Complete
Paper trading (default) and live trading via py-clob-client. Limit orders, manual approval, kill switch.
09 P&L Tracking: Complete
Sharpe ratio, max drawdown, Calmar, calibration analysis. Category-level breakdown. CSV export.
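The standard definitions of these performance metrics are sketched below using only the Python standard library. The annualization assumes daily returns (252 periods per year) and a zero risk-free rate. Both are assumptions, since the tracker's exact conventions are not given here.

```python
import math
import statistics
from typing import Sequence


def sharpe_ratio(returns: Sequence[float], periods_per_year: int = 252) -> float:
    """Annualized Sharpe ratio of per-period returns (risk-free rate assumed to be 0)."""
    sd = statistics.stdev(returns)  # needs at least two observations
    if sd == 0:
        return float("nan")
    return statistics.mean(returns) / sd * math.sqrt(periods_per_year)


def max_drawdown(equity: Sequence[float]) -> float:
    """Largest peak-to-trough decline of a positive equity curve, as a fraction of the peak."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def calmar_ratio(equity: Sequence[float], periods_per_year: int = 252) -> float:
    """Compound annualized return divided by max drawdown."""
    periods = len(equity) - 1
    if periods < 1:
        raise ValueError("need at least two equity points")
    annual_return = (equity[-1] / equity[0]) ** (periods_per_year / periods) - 1
    mdd = max_drawdown(equity)
    return annual_return / mdd if mdd > 0 else float("inf")
```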
10 Live News & Sentiment: Complete
7 real-time sources: USGS, HN, Reddit, GDELT, NWS, Google Trends, financial markets.
Phase 1

74 Data Sources

Organized across 14 categories with full 4-year coverage (2023-2026)

Phase 2

Model Architecture

Category-specific ensemble with Bayesian market-price anchoring and multi-agent AI consensus

Base Model
0.745 validation AUC
LightGBM · 232 trees · 247 features · 15s training
Category Ensemble
0.686 AUC (+6.2%)
6 custom models routing by market category
MLP Neural Net
0.661 AUC
3-layer MLP (128-64-32) · 10% ensemble weight
Bayesian Blend (Live Scoring)
P_final = 0.3 × P_model + 0.7 × P_market
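Market price receives 70% weight; the model's 30% contribution lets it identify mispricings. An adversarial validation AUC of 1.0 means a classifier can perfectly distinguish training-period markets from the markets being scored. This confirms a significant distribution shift.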
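Both the model AUCs quoted in this section and the adversarial AUC are areas under the ROC curve. AUC can be computed from its rank interpretation: the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counting as half. The sketch below uses that definition. It makes the adversarial result concrete. Suppose a classifier is trained to separate training-era markets (label 0) from live markets (label 1). If it scores every live market above every training market, the AUC is exactly 1.0. The quadratic pairwise loop is for clarity only, and the inputs are illustrative.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```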
MiroFish AI Consensus
Analyst · Contrarian · Insider · Bayesian · Superforecaster
5 agents debate each opportunity via the Kimi/Moonshot LLM, and their views are combined into a confidence-weighted consensus.
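A confidence-weighted consensus can be as simple as a weighted mean of each agent's probability. The sketch below assumes that form. The `AgentView` schema, the weighting rule, and the example numbers are hypothetical; only the five agent names come from this page. It does not call the Kimi/Moonshot LLM.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class AgentView:
    """One agent's output (hypothetical schema): P(YES) and a self-reported confidence in [0, 1]."""
    agent: str
    probability: float
    confidence: float


def weighted_consensus(views: Sequence[AgentView]) -> float:
    """Confidence-weighted mean of the agents' YES probabilities."""
    total = sum(v.confidence for v in views)
    if total <= 0:
        raise ValueError("at least one agent needs positive confidence")
    return sum(v.confidence * v.probability for v in views) / total


views = [
    AgentView("Analyst", 0.35, 0.8),
    AgentView("Contrarian", 0.60, 0.4),
    AgentView("Insider", 0.30, 0.5),
    AgentView("Bayesian", 0.40, 0.9),
    AgentView("Superforecaster", 0.33, 1.0),
]

if __name__ == "__main__":
    print(f"consensus P(YES) = {weighted_consensus(views):.3f}")
```

Weighting by confidence lets a low-confidence Contrarian register dissent without dominating the result.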
Validation

Backtesting Results

Out-of-sample performance on 7,346 resolved 2026 markets

55.9% Win Rate  |  4.49 Profit Factor  |  3.97 Sharpe Ratio  |  8.5% Max Drawdown
Category Performance
Category      Trades   Win Rate   Status
Geopolitics       10      70.0%   Best
Other             98      61.2%   Strong
Crypto           194      56.2%   Good
Sports            51      43.1%   Weak
Monthly P&L
Key Metrics
Markets Scanned: 7,346
Trades Taken: 358 (4.9% selectivity)
Avg Edge at Entry: 9.6%
Edge Accuracy: 55.9%
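The summary metrics follow standard definitions. A win is a trade with positive P&L. Profit factor is gross profit divided by gross loss. Selectivity is trades taken divided by markets scanned. The sketch below assumes the backtester uses these definitions; the function name is hypothetical. As a check, 358 trades out of 7,346 markets gives 4.87%, which rounds to the 4.9% shown above.

```python
from typing import Dict, Sequence


def summarize_backtest(trade_pnls: Sequence[float], markets_scanned: int) -> Dict[str, float]:
    """Win rate, profit factor, and selectivity from closed-trade P&L values."""
    if not trade_pnls:
        raise ValueError("no trades")
    if markets_scanned <= 0:
        raise ValueError("markets_scanned must be positive")
    wins = [p for p in trade_pnls if p > 0]
    gross_profit = sum(wins)
    gross_loss = -sum(p for p in trade_pnls if p < 0)
    return {
        "trades": len(trade_pnls),
        "win_rate": len(wins) / len(trade_pnls),
        "profit_factor": gross_profit / gross_loss if gross_loss > 0 else float("inf"),
        "selectivity": len(trade_pnls) / markets_scanned,
    }
```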
Live System

Paper Trading

Active paper trading with a $500 bankroll, scanning every 5 minutes

Bankroll: $500  |  Deployed: $250 (50% exposure)  |  Positions: 5  |  P&L: $0.00
Open Positions
#   Market                       Side   Entry   Size   Edge
1   BTC above $70K March 16      NO     0.983   $50    -27.0%
2   BTC above $72K March 16      NO     0.885   $50    -24.2%
3   No change Fed rates April    NO     0.935   $50    -24.2%
4   BTC reach $75K March         NO     0.855   $50    -22.3%
5   Gen.G vs JD Gaming (LoL)     NO     0.885   $50    -21.8%
Documentation

Technical Whitepaper

Comprehensive system documentation covering architecture, data, models, risk management, and competitive edge

Interactive Visualizations

Visualization Suite

11 interactive dashboards

Satellite Intelligence
🛰️
Global Satellite Map
ESRI satellite imagery with geopolitical markers, live USGS earthquakes, fire zones, chokepoint routes. 17 monitored targets.
Three.js Simulation
🌙
Moon & Sun 3D
Real-time 3D orbiting moon with procedural terrain, glowing sun with corona, starfield. Live phase calculation + trading signal HUD.
3D Mathematical Surfaces
📐
Quant Viz Engine
6 interactive 3D surfaces: Black-Scholes PDE, Heston Vol Smile, Lorenz Attractor, Efficient Frontier, Greeks, Polymarket Surface.
Market Intelligence
📊
Treemap & Sentiment
Bloomberg-style market treemap (size=volume, color=edge). 5 animated gauges: VIX Fear, Crypto F&G, Social Buzz, Stress, Confidence.
Network Analysis
🕸️
Market Graph & Agent Radar
Force-directed market similarity graph. MiroFish 5-agent debate radar showing Analyst, Contrarian, Insider, Bayesian, Superforecaster.
Advanced Analytics
🔬
Solar/Lunar · Sankey · Drawdown
Lunar cycle win rates, trade flow Sankey diagram, drawdown underwater chart with circuit breaker overlay.
Model Diagnostics
🧠
Sunburst · Calibration · Heatmap
Feature importance sunburst, prediction calibration reliability plot, daily win rate calendar heatmap.
Trading Tools
🔧
Kelly · Vol Bands · Order Flow
Interactive Kelly calculator with live sliders, VIX volatility regime bands with FOMC markers, order flow waterfall chart.
Satellite Data
🌍
24 Satellite Features
NDVI vegetation cycle, daylight hours, fire risk heatmap, economic activity gauges, Maxar coverage, urban activity, 24 feature cards.
Real-Time
⚑
Live WebSocket Dashboard
Auto-updating opportunities table, P&L charts, news feed, portfolio panel. Scans every 60 seconds via WebSocket.
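The Moon & Sun and Solar/Lunar dashboards depend on a lunar phase calculation. Their actual method is not documented here. A common low-precision approach, sketched below, counts mean synodic months (about 29.530588853 days) from a reference new moon on 2000-01-06 18:14 UTC. Because real lunations vary around that mean, the result can be off by several hours. The function names are hypothetical.

```python
import math
from datetime import datetime, timedelta, timezone

SYNODIC_MONTH_DAYS = 29.530588853  # mean length of one lunar cycle, in days
REFERENCE_NEW_MOON = datetime(2000, 1, 6, 18, 14, tzinfo=timezone.utc)  # widely used epoch


def moon_age_days(when: datetime) -> float:
    """Days since the most recent mean new moon (0 = new, ~14.77 = full)."""
    if when.tzinfo is None:
        raise ValueError("use a timezone-aware datetime")
    elapsed = (when - REFERENCE_NEW_MOON).total_seconds() / 86400.0
    return elapsed % SYNODIC_MONTH_DAYS  # Python's % stays non-negative for dates before 2000


def illuminated_fraction(when: datetime) -> float:
    """Approximate lit fraction of the disk: 0 at new moon, 1 at full moon."""
    angle = 2 * math.pi * moon_age_days(when) / SYNODIC_MONTH_DAYS
    return (1 - math.cos(angle)) / 2


def next_full_moon(when: datetime) -> datetime:
    """Approximate time of the next mean full moon (age = half a synodic month)."""
    half = SYNODIC_MONTH_DAYS / 2
    days_ahead = (half - moon_age_days(when)) % SYNODIC_MONTH_DAYS
    return when + timedelta(days=days_ahead)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(f"moon age {moon_age_days(now):.1f} d, {illuminated_fraction(now):.0%} illuminated")
```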
System Design

Architecture

Data (74 sources) → Features (335 signals) → ML Model (AUC 0.686) → Bayesian (30/70 split) → Edge (9.6% avg) → Kelly (25% fractional) → Executor (paper/live)
News Monitor
7 real-time sources. USGS, HN, Reddit, GDELT, NWS, Trends, Financial.
Trump-Code
100 validated rules built on 414 days of Truth Social posts. 6.2-hour early-signal window.
MiroFish
5 AI agents via Kimi/Moonshot. Analyst, Contrarian, Insider, Bayesian, Superforecaster.
Reference

Commands

Trading Commands
# Full orchestrator
python -m services.orchestrator --bankroll 500 --interval 300

# Scan only
python -m services.live_scanner --top 20 --min-volume 5000

# Paper trading
python run_paper_trading.py --bankroll 500 --loop 300

# Dashboard server
python -m services.dashboard_server --port 8080
3090 VM Commands
# Check autoresearch
ssh douglaswhittingham@10.0.0.3 "tail ~/autoresearch_log.txt"

# Train model
~/ml_venv/bin/python ~/train_model.py

# Category models
python -m backtesting.scripts.train_category_models

# Historical backtest
python -m backtesting.scripts.backtest_engine --bankroll 500
Main Dashboard
Hero Stats
Key system metrics. 74 data sources feed 335 features across 256,296 markets. Win rate and Sharpe ratio come from out-of-sample 2026 backtesting.
Pipeline
10-phase development roadmap. All phases are complete except Model Training (60%), where model optimization continues via autoresearch on the 3090.
Data Sources
All 74 sources in 14 categories. A green dot marks full 2023-2026 coverage. Sources range from Reddit to lunar phases to satellite imagery.
Models
Base LightGBM improved by category routing (+6.2%). Bayesian blend: 30% model / 70% market price. MiroFish = 5 AI agents debating each market.
Backtesting
Simulated trading on 7,346 resolved 2026 markets. Win rate 55.9%, profit factor 4.49, Sharpe 3.97.
Paper Trading
Live simulated trades with a $500 bankroll. BUY NO means the model considers the market's YES price too high. Edge is the disagreement between the model and the market price.
Whitepaper
Full 14-section technical documentation.