MSc Machine Learning · Thesis
Repricing-intensity modelling of high-frequency Polymarket order-book data, with terminal absorption at resolution.
Juan Mediavilla · University College London · MSc Machine Learning
Supervisor: Prof. Philip Treleaven
Abstract
Prediction markets price real-world events as continuously traded probabilities, yet the high-frequency process by which those prices form has not been studied at order-book resolution, because no public dataset of a prediction venue's live order book has existed. This thesis introduces such a dataset — a purpose-built, gap-audited, millisecond-resolution corpus of the 250 most active Polymarket markets, captured continuously and through each market's resolution — and uses it to characterise and model the latent dynamics of prediction-market price paths.
Two pre-registered probes establish the governing structure. The natural first candidate, a regime-switching model of price variance, is decisively falsified: apparent volatility regimes do not persist at any timescale. Instead, the price is sticky and reprices in clustered bursts, and the persistent structure lives in the intensity of repricing (how often the market moves), which is strongly clustered and non-Poisson on every market tested.
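One common way to make "clustered and non-Poisson" concrete is the dispersion index (Fano factor) of event counts in fixed windows: a homogeneous Poisson stream gives a value near 1, while bursty repricing gives a value well above 1. The sketch below is illustrative only and is not claimed to be the probe used in the thesis. The function names, window width, and synthetic rates are all assumptions, and the data are simulated rather than drawn from the corpus.

```python
import random


def fano_factor(event_times, window, horizon):
    """Variance-to-mean ratio of event counts in consecutive equal-width windows."""
    n_bins = int(horizon // window)
    counts = [0] * n_bins
    for t in event_times:
        k = int(t // window)
        if 0 <= k < n_bins:
            counts[k] += 1
    mean = sum(counts) / n_bins
    var = sum((c - mean) ** 2 for c in counts) / (n_bins - 1)
    return var / mean


def poisson_times(rate, horizon, rng):
    """Homogeneous Poisson event times on [0, horizon) via exponential gaps."""
    times = []
    t = rng.expovariate(rate)
    while t < horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times


rng = random.Random(0)

# Reference stream: constant rate, so counts should be roughly Poisson.
poisson = poisson_times(rate=2.0, horizon=5000.0, rng=rng)

# Toy clustered stream (hypothetical rates): alternating quiet and busy
# blocks of 50 time units, with the same average rate as the reference.
bursty = []
for block in range(100):
    rate = 3.8 if block % 2 else 0.2
    start = block * 50.0
    bursty.extend(start + t for t in poisson_times(rate, 50.0, rng))

fano_poisson = fano_factor(poisson, window=10.0, horizon=5000.0)
fano_bursty = fano_factor(bursty, window=10.0, horizon=5000.0)
print(f"Poisson stream: {fano_poisson:.2f}  clustered stream: {fano_bursty:.2f}")
```

Both streams have the same mean rate, so the difference in the index reflects clustering alone, not overall activity.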
The thesis's contribution is a generative model of this repricing intensity as a latent hot/cold state process (a Markov-modulated Poisson process), extended so that the intensity is modulated by time-to-resolution — the defining feature of prediction markets, in which every price is absorbed at $0 or $1 by a bounded, approximately known deadline. The model is inferred by variational methods and validated on held-out markets and against realised outcomes via the complete path-to-resolution samples in the corpus.
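To illustrate the model class described above, the following minimal sketch simulates a two-state Markov-modulated Poisson process whose intensity is scaled by a function of time-to-resolution. The abstract does not state the functional form of the modulation, so the exponential ramp g, every parameter value, and the function name simulate_mmpp are assumptions made purely for illustration. Sampling uses standard thinning against an upper-bound rate; the variational inference step is not shown.

```python
import math
import random


def simulate_mmpp(horizon, base_rate, switch_rate, boost, scale, rng):
    """Simulate repricing times on [0, horizon) by thinning.

    base_rate    -- (cold, hot) baseline intensities (hypothetical values)
    switch_rate  -- (cold->hot, hot->cold) switching rates of the hidden chain
    boost, scale -- parameters of the assumed deadline modulation g
    """
    def g(time_left):
        # Assumed form: intensity rises smoothly as resolution approaches.
        return 1.0 + boost * math.exp(-time_left / scale)

    # 1) Hidden hot/cold path: exponential holding time in each state.
    state, t = 0, 0.0  # state 0 = cold, 1 = hot
    path = [(0.0, state)]
    while True:
        t += rng.expovariate(switch_rate[state])
        if t >= horizon:
            break
        state = 1 - state
        path.append((t, state))

    # 2) Candidate events at the bounding rate lam_max, each accepted
    #    with probability lambda(t) / lam_max.
    lam_max = max(base_rate) * (1.0 + boost)
    events, t, idx = [], 0.0, 0
    while True:
        t += rng.expovariate(lam_max)
        if t >= horizon:
            break
        # Advance to the hidden-state segment that contains t.
        while idx + 1 < len(path) and path[idx + 1][0] <= t:
            idx += 1
        lam = base_rate[path[idx][1]] * g(horizon - t)
        if rng.random() < lam / lam_max:
            events.append(t)
    return events, path


rng = random.Random(1)
events, path = simulate_mmpp(
    horizon=1000.0,           # time of resolution, arbitrary units
    base_rate=(0.05, 2.0),    # cold and hot repricing intensities
    switch_rate=(0.02, 0.05),
    boost=4.0,
    scale=50.0,
    rng=rng,
)
print(len(events), "repricing events across", len(path), "hidden-state segments")
```

Under these assumptions the hot state produces bursts of repricing, and both states become more active as the deadline nears, which is the qualitative behaviour the abstract describes.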
Contributions
A gap-audited, millisecond-resolution order-book corpus of Polymarket, and the 24/7 collection pipeline that produces it — a contribution in its own right.
The first characterisation of prediction-market price dynamics at order-book resolution, identifying repricing-intensity clustering as the governing structure.
An intensity model for terminally absorbed price processes that modulates repricing by time-to-resolution, a model class with no standard analogue in financial time series.
Read on
The full abstract and eight chapter micro-abstracts — context and data, the empirical discovery, the model the discovery dictates, then validation — with datasets and risks.
Open framework →
A navigable ten-slide web deck: motivation, background, objectives, methodology, the three experiments, contributions and impact, with real figures and results.
Open deck →