US 11,755,534 B2
Data caching method and node based on hyper-converged infrastructure
Chin-Hsing Hsu, New Taipei (TW)
Assigned to QNAP SYSTEMS, INC., New Taipei (TW)
Filed by QNAP SYSTEMS, INC., New Taipei (TW)
Filed on Jul. 21, 2020, as Appl. No. 16/934,048.
Claims priority of application No. 108126666 (TW), filed on Jul. 26, 2019.
Prior Publication US 2021/0026809 A1, Jan. 28, 2021
Int. Cl. G06F 7/00 (2006.01); G06F 16/172 (2019.01); G06F 16/14 (2019.01); G06N 20/00 (2019.01); G06F 16/182 (2019.01); G06F 18/214 (2023.01)
CPC G06F 16/172 (2019.01) [G06F 16/144 (2019.01); G06F 16/156 (2019.01); G06F 16/1824 (2019.01); G06F 18/214 (2023.01); G06N 20/00 (2019.01)] 10 Claims
OG exemplary drawing
 
1. A data caching method based on a hyper-converged infrastructure comprising a plurality of nodes, wherein a computing node of the nodes executes a machine learning training program and prefetches computing data required for executing the machine learning training program from a data node of the nodes, the computing node comprising a cache memory having a higher read/write speed than a hard disk drive, the data caching method comprising steps of:
the machine learning training program designating the computing data to be prefetched before using the computing data as training samples to train a machine learning model;
the machine learning training program requesting the computing node to prefetch the computing data designated by the machine learning training program before using the computing data, by calling a cache population function provided by a machine learning framework to communicate with a file system client operating in the computing node, and providing a population parameter;
the machine learning framework requesting the file system client to store the computing data corresponding to the population parameter into the cache memory, and the file system client storing the computing data in the cache memory in response to the cache population function, wherein the computing data comprises a complete file, all files and subdirectory content in a directory, or all files listed in a file-listing document, so that the computing node acquires the computing data from the data node and stores the computing data in the cache memory as requested by the machine learning training program;
the machine learning training program receiving the computing data from the cache memory as the training samples to train the machine learning model;
the machine learning training program designating the computing data to be discarded when the machine learning training program determines that the computing data stored in the cache memory is no longer required for training the machine learning model;
the machine learning training program requesting the computing node to discard the computing data designated by the machine learning training program by calling a cache discard function provided by the machine learning framework and providing a discard parameter; and
the machine learning framework requesting the file system client to discard the computing data corresponding to the discard parameter from the cache memory, and the file system client discarding the computing data from the cache memory in response to the cache discard function, wherein the computing data comprises a complete file, all files and subdirectory content in the directory, or all files listed in the file-listing document, so that the computing node discards the computing data from the cache memory as requested by the machine learning training program.
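To make the claimed prefetch/train/discard cycle concrete, the sketch below models it in Python. All names (FileSystemClient, cache_populate, cache_discard, resolve_param, the ".list" suffix convention) are hypothetical illustrations, not APIs disclosed by the patent or provided by any real machine learning framework; a dict stands in for the cache memory that the claim assumes is faster than the data node's hard disk drive, and the local disk stands in for the data node.

```python
import os
from pathlib import Path
from typing import Dict, Iterable, List


class FileSystemClient:
    """Stand-in for the file system client running on the computing node."""

    def __init__(self) -> None:
        # Models the cache memory of the computing node.
        self._cache: Dict[str, bytes] = {}

    def populate(self, paths: Iterable[str]) -> None:
        # Acquire the computing data from the data node (here, the local
        # disk stands in for it) and store it in the cache memory.
        for p in paths:
            self._cache[p] = Path(p).read_bytes()

    def read(self, path: str) -> bytes:
        # Serve training samples from the cache when present; fall back
        # to the slow path otherwise.
        cached = self._cache.get(path)
        return cached if cached is not None else Path(path).read_bytes()

    def discard(self, paths: Iterable[str]) -> None:
        # Drop data the training program no longer requires.
        for p in paths:
            self._cache.pop(p, None)


def resolve_param(param: str) -> List[str]:
    """Expand a population or discard parameter into concrete file paths.

    Mirrors the three cases recited in the claim: a complete file, all
    files and subdirectory content in a directory, or all files listed
    in a file-listing document (assumed here to use a '.list' suffix).
    """
    if os.path.isdir(param):
        return [str(p) for p in Path(param).rglob("*") if p.is_file()]
    if param.endswith(".list"):
        return Path(param).read_text().split()
    return [param]


# Hypothetical framework-side entry points the training program would call.
def cache_populate(client: FileSystemClient, param: str) -> None:
    client.populate(resolve_param(param))


def cache_discard(client: FileSystemClient, param: str) -> None:
    client.discard(resolve_param(param))


def train_step(samples: List[bytes]) -> None:
    # Placeholder for one round of model training on the cached samples.
    pass


if __name__ == "__main__":
    import tempfile

    client = FileSystemClient()
    with tempfile.TemporaryDirectory() as d:
        # Create two small illustrative sample files so the demo runs as-is.
        batch = []
        for name in ("batch0.bin", "batch1.bin"):
            p = Path(d) / name
            p.write_bytes(b"sample-data")
            batch.append(str(p))
        for path in batch:
            cache_populate(client, path)            # prefetch before use
        train_step([client.read(p) for p in batch])  # train from cache
        for path in batch:
            cache_discard(client, path)             # release when done
```

The resolve_param helper corresponds to the claim's three population-parameter cases; an actual file system client would more likely pin pages or blocks in the cache rather than copy whole files into process memory as this sketch does.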