US 9,811,544 B1
Management of real-time and historical streaming data
Katherine Fullington Taylor, Raleigh, NC (US)
Assigned to SAS INSTITUTE INC., Cary, NC (US)
Filed by SAS Institute Inc., Cary, NC (US)
Filed on Nov. 7, 2016, as Appl. No. 15/345,091.
Application 15/345,091 is a continuation of application No. 15/344,868, filed on Nov. 7, 2016.
Claims priority of provisional application 62/359,415, filed on Jul. 7, 2016.
Claims priority of provisional application 62/371,824, filed on Aug. 7, 2016.
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 7/00 (2006.01); G06F 17/00 (2006.01); G06F 17/30 (2006.01)
CPC G06F 17/30309 (2013.01) [G06F 17/30365 (2013.01); G06F 17/30551 (2013.01)]
30 Claims
 
1. A non-transitory computer readable medium comprising program code for a batch-processing engine usable to process streaming data in batches in real-time, the program code being executable by a processor for causing the processor to:
receive a request for causing the batch-processing engine to operate in a historical mode in which the batch-processing engine analyzes a dataset in a state as of a previous date, wherein the previous date is at least one day prior to a current date, and wherein the dataset includes records previously communicated to the batch-processing engine by remote entities via a network;
based on receiving the request, determine the state of the dataset as of the previous date by:
accessing a database that includes a plurality of entries, each entry in the database being associated with a respective record that was previously communicated to the batch-processing engine from a respective remote entity via the network and including:
(i) an indicator of the respective remote entity that communicated the respective record,
(ii) a respective identifier of a file that comprises the respective record, and
(iii) a respective timestamp indicating when the file was generated; and
performing a filtering process that includes, for each remote entity referenced in the database, filtering entries in the database by timestamp to determine a particular file that includes a most current record for the remote entity as of the previous date by:
receiving user input indicating the previous date;
identifying which particular entry in the database is associated with the most current record for the remote entity as of the previous date by:
obtaining, from the database, respective timestamps associated with a group of entries related to the remote entity; and
comparing the respective timestamps to one another to determine that the particular entry in the group of entries has a most current timestamp as of the previous date; and
determining that the particular file corresponds to the particular entry;
wherein the filtering process results in a determination of a plurality of files that includes the most current record as of the previous date for each remote entity referenced in the database; and
process the plurality of files concurrently as a batch of data to determine an output for the state of the dataset as of the previous date by:
joining together, in memory, records that are in the plurality of files to generate a combined group of records, the combined group of records including the most current record for each remote entity referenced in the database as of the previous date; and
processing the combined group of records to determine the output for the state of the dataset as of the previous date.
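The filtering and joining steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `Entry`, `files_as_of`, `join_records`, and `load_file`, along with the sample entity and file names, are hypothetical and chosen only for illustration. The sketch assumes that "as of the previous date" means a file timestamp falling on or before that date, and it joins records sequentially in memory rather than modeling concurrent batch processing.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Entry:
    """One database entry, mirroring items (i)-(iii) of the claim."""
    entity: str          # (i) indicator of the remote entity
    file_id: str         # (ii) identifier of the file holding the record
    generated: datetime  # (iii) timestamp indicating when the file was generated


def files_as_of(entries: Iterable[Entry], as_of: date) -> Dict[str, str]:
    """For each entity, pick the file with the latest timestamp on or before as_of."""
    latest: Dict[str, Entry] = {}
    for e in entries:
        # Assumption: entries generated after the requested date are ignored.
        if e.generated.date() > as_of:
            continue
        best = latest.get(e.entity)
        # Compare timestamps within the entity's group of entries.
        if best is None or e.generated > best.generated:
            latest[e.entity] = e
    return {entity: e.file_id for entity, e in latest.items()}


def join_records(file_ids: Iterable[str],
                 load_file: Callable[[str], List[dict]]) -> List[dict]:
    """Join the records of the selected files into one in-memory group."""
    combined: List[dict] = []
    for fid in file_ids:
        combined.extend(load_file(fid))
    return combined


# Hypothetical sample data: two remote entities, two files each.
entries = [
    Entry("store-a", "a_0701.csv", datetime(2016, 7, 1, 8, 0)),
    Entry("store-a", "a_0705.csv", datetime(2016, 7, 5, 8, 0)),
    Entry("store-b", "b_0702.csv", datetime(2016, 7, 2, 9, 30)),
    Entry("store-b", "b_0710.csv", datetime(2016, 7, 10, 9, 30)),
]
# Stand-in for file storage; a real system would read the files from disk.
fake_files = {
    "a_0701.csv": [{"entity": "store-a", "value": 1}],
    "a_0705.csv": [{"entity": "store-a", "value": 2}],
    "b_0702.csv": [{"entity": "store-b", "value": 3}],
    "b_0710.csv": [{"entity": "store-b", "value": 4}],
}

selected = files_as_of(entries, date(2016, 7, 6))
combined = join_records(selected.values(), fake_files.__getitem__)
```

With the sample data above, a request for the state as of July 6, 2016 selects `a_0705.csv` for `store-a` and `b_0702.csv` for `store-b`, because `b_0710.csv` was generated after the requested date. The combined group then holds one most-current record per entity, which a downstream step could process to produce the historical output.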