CPC G06Q 30/0201 (2013.01) [G06F 16/285 (2019.01); G06N 3/044 (2023.01); G06N 3/08 (2013.01); G06N 20/00 (2019.01); G06Q 40/04 (2013.01); G06Q 40/06 (2013.01)] | 10 Claims |
1. A computer system comprising:
a processor;
a tangible computer-readable medium containing computer-executable instructions that when executed by the processor cause the computer system to pre-process a collection of raw market data for use by a machine learning computer by performing the steps comprising:
(a) receiving, from a client computer via an electronic communication network, a collection of raw market data that includes time stamps, price levels and order quantities, the collection of raw market data characterized by a first size;
(b) determining, for each time stamp, a difference in order quantity at each price level when compared to order quantity at the same price level at the previous time stamp;
(c) partitioning the collection of raw market data into a sequence of time period windows, comparing order quantities prior to a time period window to order quantities within the time period window, and determining quantiles for changes in order quantities;
(d) dividing the determined differences into predefined portions, each of which is characterized by one of a plurality of categories, each category being assigned to the time period window in accordance with the division of the determined differences and the determined quantiles;
(e) generating a new pre-processed data set comprising the sequence of time period windows, each of which includes a multi-dimensional one-hot binary vector encoding of the plurality of categories representative of each price level and time stamp therein, the new pre-processed data set characterized by a second size less than the first size;
(f) transmitting the new pre-processed data set as input to a computer system that executes a machine learning algorithm, wherein the execution of the machine learning algorithm includes training a recurrent neural network to identify structure in the pre-processed data and executing a lossy encoded compression to compress the sequence of time period windows to provide a feature mapping from the sequence of time period windows to a feature space, wherein the lossy encoded compression of the sequence removes noise from the sequence of time period windows while retaining the unique features of the feature space; and
(g) outputting the compressed sequence of time period windows to a display for user interaction.
|
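Steps (b) through (e) of claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation, and it rests on assumptions the claim does not fix: the function names, the fixed window length `window_len`, the number of categories `n_categories`, and the choice to pool all quantity changes when computing quantile cut points are hypothetical. The claim's comparison of quantities before and within each window is simplified here to the per-time-stamp differences of step (b).

```python
from bisect import bisect_right
from statistics import quantiles


def quantity_diffs(snapshots, prices):
    """Step (b): change in order quantity at each price level
    relative to the previous time stamp."""
    diffs = []
    for (_, prev), (ts, cur) in zip(snapshots, snapshots[1:]):
        diffs.append((ts, [cur.get(p, 0) - prev.get(p, 0) for p in prices]))
    return diffs


def one_hot(index, size):
    """Binary vector with a single 1 at position `index`."""
    vec = [0] * size
    vec[index] = 1
    return vec


def preprocess(snapshots, prices, window_len=2, n_categories=4):
    """Steps (c)-(e), under the assumptions stated above."""
    diffs = quantity_diffs(snapshots, prices)

    # Step (c): quantile cut points for the changes in order quantity.
    # Assumption: changes are pooled across all time stamps and price levels.
    pooled = [d for _, row in diffs for d in row]
    cuts = quantiles(pooled, n=n_categories, method="inclusive")

    windows = []
    for start in range(0, len(diffs), window_len):
        window = []
        for _, row in diffs[start:start + window_len]:
            # Step (d): the category is the quantile bin of the change.
            # Step (e): each category is one-hot encoded.
            window.append(
                [one_hot(bisect_right(cuts, d), n_categories) for d in row]
            )
        windows.append(window)
    return windows


# Hypothetical raw data: (time stamp, {price level: order quantity}).
snapshots = [
    (0, {100: 5, 101: 3}),
    (1, {100: 7, 101: 3}),
    (2, {100: 2, 101: 4}),
    (3, {100: 2, 101: 9}),
    (4, {100: 6, 101: 1}),
]
prices = [100, 101]
windows = preprocess(snapshots, prices)
```

Each entry of `windows` is one time period window, indexed by time stamp and then by price level, holding a one-hot vector of length `n_categories`. Replacing raw quantities with a small set of categories is what allows the pre-processed set to be smaller than the raw collection, although this sketch does not measure either size.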