Foundation Model

AlphaMind Prediction Engine

A Time Series Foundation Model, Trained for FX and Futures

A decoder-only Transformer that learns the “language” of K-line sequences. We tokenize price action, train autoregressively on FX and futures data, and decode predictions back as probabilistic price paths.

Sequence Ingestion · OHLCV stream
Hierarchical Tokens · Coarse + fine subtokens
Autoregressive Forecast · Token-by-token generation

Decoder-only Transformer · Self-supervised · End-to-end deep learning

01 — Premise

What is a Time-Series Foundation Model

A foundation model is a single large network pre-trained on a massive corpus in a self-supervised way, then specialized to downstream tasks. GPT did this for natural language. Kronos-style architectures do it for K-lines.

The AlphaMind Prediction Engine is a decoder-only Transformer pre-trained on the “language” of financial time series. It treats a sequence of OHLCV candles the way GPT treats a sequence of words — tokenize, learn the conditional distribution, and sample forward.

Two stages: (1) a learned tokenizer that converts continuous OHLCV into hierarchical discrete tokens, and (2) a causal Transformer that autoregressively predicts the next token. At inference, we sample many token paths, decode them back into prices, and aggregate into a probabilistic forecast.
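
A minimal sketch of that two-stage flow, assuming hypothetical encode / sample / decode interfaces (the method names, horizon, and path count below are illustrative, not the engine's actual API):

import numpy as np

def forecast(tokenizer, model, ohlcv_history, horizon=24, n_paths=256):
    # Stage 1: continuous OHLCV candles -> hierarchical discrete tokens
    context = tokenizer.encode(ohlcv_history)

    paths = []
    for _ in range(n_paths):
        # Stage 2: autoregressively sample `horizon` future tokens, one at a time
        future_tokens = model.sample(context, steps=horizon)
        # Decode the sampled tokens back into an OHLCV price path
        paths.append(tokenizer.decode(future_tokens))

    # The sampled paths together approximate P(x_{t+1:T} | x_{1:t})
    return np.stack(paths)   # shape: (n_paths, horizon, 5)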

02 — Architecture

Three stages, end-to-end

Price encoding → reasoning core → signal reconstruction. The animation above shows one full inference pass.

Stage 1

Price Encoding

A learned tokenizer maps continuous OHLCV streams into hierarchical discrete tokens — each token splits into a coarse-grained subtoken (k_c bits) and a fine-grained subtoken (k_f bits) via Binary Spherical Quantization.

Token = (BSQ_c(x), BSQ_f(x)) ∈ {0,1}^(k_c+k_f)
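
A minimal sketch of the quantization step, assuming the standard BSQ formulation (the bit widths k_c = 8 and k_f = 12, and the two-head split that produces the subtokens, are illustrative placeholders):

import torch
import torch.nn.functional as F

def bsq(z: torch.Tensor):
    # Project the latent onto the unit hypersphere, then snap each
    # coordinate to +/- 1/sqrt(d): one bit per dimension.
    d = z.shape[-1]
    u = F.normalize(z, dim=-1)
    bits = (u > 0).long()                   # binary code in {0,1}^d
    q = (2.0 * bits - 1.0) / d ** 0.5       # quantized point back on the sphere
    return bits, q

k_c, k_f = 8, 12                                         # illustrative bit widths
z_coarse, z_fine = torch.randn(k_c), torch.randn(k_f)    # stand-ins for encoder outputs
coarse_bits, _ = bsq(z_coarse)
fine_bits, _ = bsq(z_fine)
token = torch.cat([coarse_bits, fine_bits])              # element of {0,1}^(k_c + k_f)
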
Stage 2

Reasoning Core

A causal Transformer with N stacked layers — multi-head self-attention, feed-forward networks, residuals, layer-norm. Trained autoregressively with cross-entropy on next-token prediction.

L = − Σₜ log P(tokₜ₊₁ | tok₁:ₜ; θ)
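
As a rough illustration (the layer count, width, and vocabulary size below are placeholders, not the engine's configuration), a decoder-only stack trained with this next-token loss can be sketched in PyTorch as:

import torch
import torch.nn as nn

class CausalTransformer(nn.Module):
    def __init__(self, vocab=1024, d_model=256, heads=4, layers=4, max_len=512):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        block = nn.TransformerEncoderLayer(d_model, heads, 4 * d_model,
                                           batch_first=True, norm_first=True)
        self.blocks = nn.TransformerEncoder(block, layers)
        self.head = nn.Linear(d_model, vocab)

    def forward(self, tokens):                      # tokens: (batch, seq)
        t = tokens.size(1)
        x = self.tok_emb(tokens) + self.pos_emb(torch.arange(t, device=tokens.device))
        mask = nn.Transformer.generate_square_subsequent_mask(t).to(tokens.device)
        x = self.blocks(x, mask=mask)               # causal (masked) self-attention
        return self.head(x)                         # logits over the next token

model = CausalTransformer()
tokens = torch.randint(0, 1024, (8, 64))            # dummy token sequences
logits = model(tokens[:, :-1])                      # predict token t+1 from tokens 1..t
loss = nn.functional.cross_entropy(logits.reshape(-1, 1024), tokens[:, 1:].reshape(-1))
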
Stage 3

Signal Reconstruction

Generated tokens are decoded back into OHLCV via the tokenizer's decoder. We sample many paths, aggregate them into a predictive distribution, and return the median path with confidence bands.

P(x_{t+1:T} | x_{1:t}) ≈ {x^(i)}_{i=1..N}
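
The decode direction can be sketched the same way, with a small MLP standing in for the tokenizer's actual decoder (the architecture below is purely illustrative):

import torch
import torch.nn as nn

k_c, k_f = 8, 12                                  # illustrative bit widths, as above
decoder = nn.Sequential(                          # stand-in for the tokenizer's decoder
    nn.Linear(k_c + k_f, 64), nn.GELU(), nn.Linear(64, 5)
)

def decode_token(bits: torch.Tensor) -> torch.Tensor:
    # Map the binary code back toward the sphere, then reconstruct one OHLCV candle.
    q = (2.0 * bits.float() - 1.0) / (bits.numel() ** 0.5)
    return decoder(q)                             # (open, high, low, close, volume)

candle = decode_token(torch.randint(0, 2, (k_c + k_f,)))
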
03 — Training Data

Specialized for FX and Futures

The engine is not a general-purpose time-series model — it is pre-trained specifically on the markets we care about.

Universe

FX major + cross pairs

EUR/USD, GBP/USD, USD/JPY, USD/CHF, AUD/USD, USD/CAD, NZD/USD plus major crosses. M1 to H1 candles with ≥10 years of history per pair.

Universe

Commodity & Index Futures

Gold, silver, crude oil, Brent, natural gas, copper; S&P 500, NASDAQ-100, Dow, DAX, Nikkei 225 futures. M1 to H1, continuous contracts adjusted for roll.
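
As an illustration of how such a universe might be specified, a hypothetical configuration (the structure and field names are assumptions; the instruments and timeframe range mirror the descriptions above):

TRAINING_UNIVERSE = {
    "fx": {
        "pairs": ["EURUSD", "GBPUSD", "USDJPY", "USDCHF",
                  "AUDUSD", "USDCAD", "NZDUSD"],        # plus major crosses
        "timeframe_range": ("M1", "H1"),
        "min_history_years": 10,
    },
    "futures": {
        "contracts": ["gold", "silver", "wti_crude", "brent", "natural_gas", "copper",
                      "sp500", "nasdaq100", "dow", "dax", "nikkei225"],
        "timeframe_range": ("M1", "H1"),
        "continuous": True,        # stitched continuous contracts
        "roll_adjusted": True,     # back-adjusted at each roll
    },
}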

04 — Specialization

Why specialize, not generalize

A generic time-series foundation model trained on every dataset under the sun (weather, traffic, retail) will be average everywhere. We made an explicit trade-off: narrow universe, deep specialization.

FX and futures share crucial properties — deep 24/5 liquidity, standardized quoting conventions, microsecond-resolution order flow, and well-defined session structure. A single tokenizer + a single Transformer can model them as one “dialect” of the same market language.

Crypto, equities, options — different microstructure, different volatility regimes, different sessions. We deliberately leave them out so the engine doesn't average toward a market it isn't built for.

05 — Inference

Probabilistic, not point-estimate

The engine doesn't emit a single number. For each forecast horizon we sample N token paths from the model, decode each back into a price path, and aggregate them into a full probabilistic distribution P(x_{t+1:T} | x_{1:t}).

What downstream gets: a median path, percentile bands (10/25/50/75/90), and a calibrated probability that price exits a given range within the horizon. Every signal is sized against this distribution — not a single point estimate.
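
A minimal sketch of this aggregation, assuming the N decoded close-price paths are already in hand (the band levels follow the list above; the range-exit number is a plain empirical frequency, not the engine's calibration procedure):

import numpy as np

def summarize_paths(close_paths, lower, upper):
    # close_paths: (n_paths, horizon) array of sampled close prices
    bands = {p: np.percentile(close_paths, p, axis=0) for p in (10, 25, 50, 75, 90)}
    # Empirical probability that price leaves [lower, upper] anywhere in the horizon
    exits = np.any((close_paths < lower) | (close_paths > upper), axis=1)
    return bands[50], bands, exits.mean()

# Usage with dummy sampled paths around 1.10
paths = 1.10 + 0.002 * np.random.randn(256, 24).cumsum(axis=1)
median_path, bands, p_exit = summarize_paths(paths, lower=1.095, upper=1.105)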

06 — Inputs

The engine doesn't see raw prices alone

Six classical and modern quantitative models pre-process the input — regime, volatility, denoising, frequency, memory, decomposition. The engine sees a feature-engineered embedding, not a raw OHLCV stream.
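
A hedged sketch of the idea, with simple stand-in features for several of the model families named above (these are not the framework's actual models or outputs):

import numpy as np
import pandas as pd

def engineer_features(ohlcv: pd.DataFrame) -> pd.DataFrame:
    # ohlcv: columns open / high / low / close / volume, one row per candle
    close = ohlcv["close"]
    log_ret = np.log(close).diff()
    feats = pd.DataFrame(index=ohlcv.index)
    feats["volatility"] = log_ret.rolling(20).std()                       # volatility proxy
    feats["regime"] = (close > close.rolling(50).mean()).astype(float)    # trend-regime flag
    feats["denoised"] = close.ewm(span=10).mean()                         # smoothed price
    feats["memory"] = log_ret.rolling(50).apply(lambda r: r.autocorr(), raw=False)  # serial dependence
    feats["range_frac"] = (ohlcv["high"] - ohlcv["low"]) / close          # intrabar range proxy
    return feats.dropna()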

Multi-Model Feature Engineering Framework

Don't Miss the Only AI That Actually Helps You Win in Day Trading

No complex jargon. No noise. No confusing indicators.
Just clear, actionable signals.