US 11,704,682 B2
Pre-processing financial market data prior to machine learning training
Ari L. Studnitzer, Northbrook, IL (US); David John Geddes, Antrim (GB); Inderdeep Singh, Palatine, IL (US); Steven Hutt, Sutton (GB); and Bernard Pieter Hosman, Amsterdam (NL)
Assigned to Chicago Mercantile Exchange Inc., Chicago, IL (US)
Filed by Chicago Mercantile Exchange Inc., Chicago, IL (US)
Filed on Jul. 5, 2017, as Appl. No. 15/642,038.
Claims priority of provisional application 62/359,007, filed on Jul. 6, 2016.
Prior Publication US 2018/0012239 A1, Jan. 11, 2018
Int. Cl. G06N 20/00 (2019.01); G06Q 40/04 (2012.01); G06N 3/08 (2023.01); G06N 3/044 (2023.01); G06Q 30/0201 (2023.01); G06F 16/28 (2019.01); G06Q 40/06 (2012.01)
CPC G06Q 30/0201 (2013.01) [G06F 16/285 (2019.01); G06N 3/044 (2023.01); G06N 3/08 (2013.01); G06N 20/00 (2019.01); G06Q 40/04 (2013.01); G06Q 40/06 (2013.01)]
10 Claims
 
1. A computer system comprising:
a processor;
a tangible computer-readable medium containing computer-executable instructions that when executed by the processor cause the computer system to pre-process a collection of raw market data for use by a machine learning computer by performing the steps comprising:
(a) receiving, from a client computer via an electronic communication network, a collection of raw market data that includes time stamps, price levels and order quantities, the collection of raw market data characterized by a first size;
(b) determining, for each time stamp, a difference in order quantity at each price level when compared to order quantity at the same price level at the previous time stamp;
(c) partitioning the collection of raw market data into a sequence of time period windows, comparing order quantities prior to a time period window to order quantities within the time period window, and determining quantiles for changes in order quantities;
(d) dividing the determined differences into predefined portions, each of which is characterized by one of a plurality of categories, each category being assigned to the time period window in accordance with the division of the determined differences and the determined quantiles;
(e) generating a new pre-processed data set comprising the sequence of time period windows, each of which includes a multi-dimensional one-hot binary vector encoding of the plurality of categories representative of each price level and time stamp therein, the new pre-processed data set characterized by a second size less than the first size;
(f) transmitting the new pre-processed data set as input to a computer system that executes a machine learning algorithm, wherein the execution of the machine learning algorithm includes training a recurrent neural network to identify structure in the pre-processed data and executing a lossy encoded compression to compress the sequence of time period windows to provide a feature mapping from the sequence of time period windows to a feature space, wherein the lossy encoded compression of the sequence removes noise from the sequence of time period windows while retaining the unique features of the feature space; and
(g) outputting the compressed sequence of time period windows to a display for user interaction.
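
Illustrative sketch (not part of the claim): the pre-processing in steps (b) through (e) can be pictured roughly as below. This is a minimal Python sketch under stated assumptions, not the patented implementation. The function names, the row-per-time-stamp layout of `book`, the fixed window length, and the choice of three categories are all hypothetical. Step (c) is also simplified: the quantile cut points are computed once over all observed changes, rather than by comparing quantities before and within each window as the claim recites.

```python
from bisect import bisect_right
from statistics import quantiles


def quantity_deltas(book):
    """Step (b): change in quantity at each price level versus the previous time stamp.

    `book` is a hypothetical layout: one row per time stamp, one column per price level.
    """
    return [
        [cur - prev for cur, prev in zip(book[t], book[t - 1])]
        for t in range(1, len(book))
    ]


def quantile_edges(deltas, n_categories):
    """Step (c), simplified: quantile cut points over all observed quantity changes."""
    flat = [d for row in deltas for d in row]
    return quantiles(flat, n=n_categories)  # returns n_categories - 1 cut points


def one_hot(category, n_categories):
    """Binary vector with a single 1 at the index of `category`."""
    return [1 if i == category else 0 for i in range(n_categories)]


def encode_windows(book, window, n_categories=3):
    """Steps (d) and (e): categorize each change by quantile, one-hot encode it,
    and group the encoded time stamps into consecutive windows of `window` rows."""
    deltas = quantity_deltas(book)
    edges = quantile_edges(deltas, n_categories)
    encoded = [
        [one_hot(bisect_right(edges, d), n_categories) for d in row]
        for row in deltas
    ]
    return [encoded[i:i + window] for i in range(0, len(encoded), window)]


# Made-up order quantities at two price levels over five time stamps.
book = [[10, 5], [12, 5], [12, 2], [7, 2], [7, 9]]
windows = encode_windows(book, window=2)
```

Each entry of `windows` holds, per time stamp and price level, a one-hot vector naming the quantile category of the quantity change. The sketch does not attempt the recurrent neural network training or lossy compression of step (f).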