Foundation Model

AlphaMind Prediction Engine

A Time Series Foundation Model, Trained for FX and Futures

A decoder-only Transformer that learns the “language” of K-line sequences. We tokenize price action, train autoregressively on FX and futures data, and decode predictions back as probabilistic price paths.

Sequence Ingestion · OHLCV stream
Hierarchical Tokens · Coarse + fine subtokens
Autoregressive Forecast · Token-by-token generation

Decoder-only Transformer · Self-supervised · End-to-end deep learning

01 — Premise

What is a Time-Series Foundation Model

A foundation model is a single large network pre-trained on a massive corpus in a self-supervised way, then specialized to downstream tasks. GPT did this for natural language. Kronos-style architectures do it for K-lines.

The AlphaMind Prediction Engine is a decoder-only Transformer pre-trained on the “language” of financial time series. It treats a sequence of OHLCV candles the way GPT treats a sequence of words — tokenize, learn the conditional distribution, and sample forward.

Two stages: (1) a learned tokenizer that converts continuous OHLCV into hierarchical discrete tokens, and (2) a causal Transformer that autoregressively predicts the next token. At inference, we sample many token paths, decode them back into prices, and aggregate into a probabilistic forecast.
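
A minimal sketch of that two-stage flow, assuming hypothetical encode / sample / decode interfaces (the method names, horizon, and path count below are illustrative, not the engine's actual API):

import numpy as np

def forecast(tokenizer, model, ohlcv_history, horizon=24, n_paths=256):
    # Stage 1: continuous OHLCV candles -> hierarchical discrete tokens
    context = tokenizer.encode(ohlcv_history)

    paths = []
    for _ in range(n_paths):
        # Stage 2: autoregressively sample `horizon` future tokens, one at a time
        future_tokens = model.sample(context, steps=horizon)
        # Decode the sampled tokens back into an OHLCV price path
        paths.append(tokenizer.decode(future_tokens))

    # The sampled paths together approximate P(x_{t+1:T} | x_{1:t})
    return np.stack(paths)   # shape: (n_paths, horizon, 5)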

02 — Architecture

Three stages, end-to-end

Price encoding → reasoning core → signal reconstruction. The animation above shows one full inference pass.

Stage 1

Price Encoding

A learned tokenizer maps continuous OHLCV streams into hierarchical discrete tokens — each token splits into a coarse-grained subtoken (k_c bits) and a fine-grained subtoken (k_f bits) via Binary Spherical Quantization.

Token = (BSQ_c(x), BSQ_f(x)) ∈ {0,1}^(k_c+k_f)
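
A minimal sketch of the quantization step, assuming the standard BSQ formulation (the bit widths k_c = 8 and k_f = 12, and the two-head split that produces the subtokens, are illustrative placeholders):

import torch
import torch.nn.functional as F

def bsq(z: torch.Tensor):
    # Project the latent onto the unit hypersphere, then snap each
    # coordinate to +/- 1/sqrt(d): one bit per dimension.
    d = z.shape[-1]
    u = F.normalize(z, dim=-1)
    bits = (u > 0).long()                   # binary code in {0,1}^d
    q = (2.0 * bits - 1.0) / d ** 0.5       # quantized point back on the sphere
    return bits, q

k_c, k_f = 8, 12                                         # illustrative bit widths
z_coarse, z_fine = torch.randn(k_c), torch.randn(k_f)    # stand-ins for encoder outputs
coarse_bits, _ = bsq(z_coarse)
fine_bits, _ = bsq(z_fine)
token = torch.cat([coarse_bits, fine_bits])              # element of {0,1}^(k_c + k_f)
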
Stage 2

Reasoning Core

A causal Transformer with N stacked layers — multi-head self-attention, feed-forward networks, residuals, layer-norm. Trained autoregressively with cross-entropy on next-token prediction.

L = − Σₜ log P(tokₜ₊₁ | tok₁:ₜ; θ)
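
As a rough illustration (the layer count, width, and vocabulary size below are placeholders, not the engine's configuration), a decoder-only stack trained with this next-token loss can be sketched in PyTorch as:

import torch
import torch.nn as nn

class CausalTransformer(nn.Module):
    def __init__(self, vocab=1024, d_model=256, heads=4, layers=4, max_len=512):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        block = nn.TransformerEncoderLayer(d_model, heads, 4 * d_model,
                                           batch_first=True, norm_first=True)
        self.blocks = nn.TransformerEncoder(block, layers)
        self.head = nn.Linear(d_model, vocab)

    def forward(self, tokens):                      # tokens: (batch, seq)
        t = tokens.size(1)
        x = self.tok_emb(tokens) + self.pos_emb(torch.arange(t, device=tokens.device))
        mask = nn.Transformer.generate_square_subsequent_mask(t).to(tokens.device)
        x = self.blocks(x, mask=mask)               # causal (masked) self-attention
        return self.head(x)                         # logits over the next token

model = CausalTransformer()
tokens = torch.randint(0, 1024, (8, 64))            # dummy token sequences
logits = model(tokens[:, :-1])                      # predict token t+1 from tokens 1..t
loss = nn.functional.cross_entropy(logits.reshape(-1, 1024), tokens[:, 1:].reshape(-1))
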
Stage 3

Signal Reconstruction

Generated tokens are decoded back into OHLCV via the tokenizer's decoder. We sample many paths, aggregate them into a predictive distribution, and return the median path with confidence bands.

P(x_{t+1:T} | x_{1:t}) ≈ {x^(i)}_{i=1..N}
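
The decode direction can be sketched the same way, with a small MLP standing in for the tokenizer's actual decoder (the architecture below is purely illustrative):

import torch
import torch.nn as nn

k_c, k_f = 8, 12                                  # illustrative bit widths, as above
decoder = nn.Sequential(                          # stand-in for the tokenizer's decoder
    nn.Linear(k_c + k_f, 64), nn.GELU(), nn.Linear(64, 5)
)

def decode_token(bits: torch.Tensor) -> torch.Tensor:
    # Map the binary code back toward the sphere, then reconstruct one OHLCV candle.
    q = (2.0 * bits.float() - 1.0) / (bits.numel() ** 0.5)
    return decoder(q)                             # (open, high, low, close, volume)

candle = decode_token(torch.randint(0, 2, (k_c + k_f,)))
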
03 — Training Data

Specialized for FX and Futures

The engine is not a general-purpose time-series model — it is pre-trained specifically on the markets we care about.

Universe

FX major + cross pairs

EUR/USD, GBP/USD, USD/JPY, USD/CHF, AUD/USD, USD/CAD, NZD/USD plus major crosses. M1 to H1 candles with ≥10 years of history per pair.

Universe

Commodity & Index Futures

Gold, silver, crude oil, Brent, natural gas, copper; S&P 500, NASDAQ-100, Dow, DAX, Nikkei 225 futures. M1 to H1, continuous contracts adjusted for roll.
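
As an illustration of how such a universe might be specified, a hypothetical configuration (the structure and field names are assumptions; the instruments and timeframe range mirror the descriptions above):

TRAINING_UNIVERSE = {
    "fx": {
        "pairs": ["EURUSD", "GBPUSD", "USDJPY", "USDCHF",
                  "AUDUSD", "USDCAD", "NZDUSD"],        # plus major crosses
        "timeframe_range": ("M1", "H1"),
        "min_history_years": 10,
    },
    "futures": {
        "contracts": ["gold", "silver", "wti_crude", "brent", "natural_gas", "copper",
                      "sp500", "nasdaq100", "dow", "dax", "nikkei225"],
        "timeframe_range": ("M1", "H1"),
        "continuous": True,        # stitched continuous contracts
        "roll_adjusted": True,     # back-adjusted at each roll
    },
}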

04 — Specialization

Why specialize, not generalize

A generic time-series foundation model trained on every dataset under the sun (weather, traffic, retail) will be average everywhere. We made an explicit trade-off: narrow universe, deep specialization.

FX and futures share crucial properties — deep 24/5 liquidity, standardized quoting conventions, microsecond-resolution order flow, and well-defined session structure. A single tokenizer + a single Transformer can model them as one “dialect” of the same market language.

Crypto, equities, options — different microstructure, different volatility regimes, different sessions. We deliberately leave them out so the engine doesn't average toward a market it isn't built for.

05 — Inference

Probabilistic, not point-estimate

The engine doesn't emit a single number. For each forecast horizon we sample N token paths from the model, decode each back into a price path, and aggregate them into a full probabilistic distribution P(x_{t+1:T} | x_{1:t}).

What downstream gets: a median path, percentile bands (10/25/50/75/90), and a calibrated probability that price exits a given range within the horizon. Every signal is sized against this distribution — not a single point estimate.
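
A minimal sketch of this aggregation, assuming the N decoded close-price paths are already in hand (the band levels follow the list above; the range-exit number is a plain empirical frequency, not the engine's calibration procedure):

import numpy as np

def summarize_paths(close_paths, lower, upper):
    # close_paths: (n_paths, horizon) array of sampled close prices
    bands = {p: np.percentile(close_paths, p, axis=0) for p in (10, 25, 50, 75, 90)}
    # Empirical probability that price leaves [lower, upper] anywhere in the horizon
    exits = np.any((close_paths < lower) | (close_paths > upper), axis=1)
    return bands[50], bands, exits.mean()

# Usage with dummy sampled paths around 1.10
paths = 1.10 + 0.002 * np.random.randn(256, 24).cumsum(axis=1)
median_path, bands, p_exit = summarize_paths(paths, lower=1.095, upper=1.105)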

06 — Inputs

The engine doesn't see raw prices alone

Six classical and modern quantitative models pre-process the input — regime, volatility, denoising, frequency, memory, decomposition. The engine sees a feature-engineered embedding, not a raw OHLCV stream.
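
A hedged sketch of the idea, with simple stand-in features for several of the model families named above (these are not the framework's actual models or outputs):

import numpy as np
import pandas as pd

def engineer_features(ohlcv: pd.DataFrame) -> pd.DataFrame:
    # ohlcv: columns open / high / low / close / volume, one row per candle
    close = ohlcv["close"]
    log_ret = np.log(close).diff()
    feats = pd.DataFrame(index=ohlcv.index)
    feats["volatility"] = log_ret.rolling(20).std()                       # volatility proxy
    feats["regime"] = (close > close.rolling(50).mean()).astype(float)    # trend-regime flag
    feats["denoised"] = close.ewm(span=10).mean()                         # smoothed price
    feats["memory"] = log_ret.rolling(50).apply(lambda r: r.autocorr(), raw=False)  # serial dependence
    feats["range_frac"] = (ohlcv["high"] - ohlcv["low"]) / close          # intrabar range proxy
    return feats.dropna()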

Multi-Model Feature Engineering Framework

Don't Miss the Only AI That Actually Helps You Win in Day Trading

No complex jargon. No noise. No confusing indicators.
Just clear, actionable signals.