US 11,818,165 B2
Malware detection using machine learning
Joseph H. Levy, Farmington, UT (US)
Assigned to Sophos Limited, Abingdon (GB)
Filed by Sophos Limited, Abingdon (GB)
Filed on Nov. 13, 2020, as Appl. No. 17/098,084.
Application 17/098,084 is a continuation of application No. 15/864,329, filed on Jan. 8, 2018, granted, now Pat. No. 10,841,333.
Prior Publication US 2021/0144154 A1, May 13, 2021
This patent is subject to a terminal disclaimer.
Int. Cl. H04L 9/40 (2022.01); G06N 20/00 (2019.01); G06F 21/56 (2013.01)
CPC H04L 63/145 (2013.01) [G06F 21/56 (2013.01); G06N 20/00 (2019.01); H04L 63/1416 (2013.01)]
20 Claims
1. A computer program product comprising computer executable code embodied in a non-transitory computer readable medium that, when executing on one or more computing devices, performs the steps of:
providing a first training set including a plurality of malware samples;
configuring a first antimalware system to detect the malware samples;
characterizing one or more functional blocks of the malware samples by extracting abstracted features, functions, or behaviors of the malware samples to provide characterizations of the one or more functional blocks;
generating a first number of synthetic malware samples including modifications of the one or more functional blocks of the malware samples based on the characterizations;
validating the first number of synthetic malware samples to provide a validated sample set containing one or more of the first number of synthetic malware samples that execute and perform an unwanted task in a target computing context;
filtering the validated sample set to provide a filtered sample set containing one or more of the first number of synthetic malware samples in the validated sample set that are not detected by the first antimalware system;
creating a second antimalware system by training a machine learning malware detection engine to detect malicious code including the one or more of the first number of synthetic malware samples in the filtered sample set;
repeating one or more of generating, validating, filtering, or training until a predetermined threshold for positive detection by the second antimalware system is reached; and
deploying the second antimalware system on an endpoint.
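
The generate, validate, filter, train, and repeat steps recited in claim 1 can be illustrated with a toy loop. The following is only a minimal sketch under loud assumptions and is not the patented implementation: samples are harmless tuples of integers standing in for abstracted functional-block features, the "antimalware systems" are simple set-membership detectors rather than machine learning engines, the validity rule is arbitrary, and every name (`generate`, `validate`, `SignatureDetector`, `harden`) is hypothetical. Deployment on an endpoint is not modeled.

```python
import random
from typing import Iterable, List, Set, Tuple

# Hypothetical stand-in for the abstracted features of a sample's functional blocks.
Sample = Tuple[int, ...]


def generate(seeds: List[Sample], count: int, rng: random.Random) -> List[Sample]:
    """Create synthetic samples by modifying one block of a randomly chosen seed."""
    out = []
    for _ in range(count):
        blocks = list(rng.choice(seeds))
        blocks[rng.randrange(len(blocks))] = rng.randrange(10)
        out.append(tuple(blocks))
    return out


def validate(samples: List[Sample]) -> List[Sample]:
    """Toy validity rule (assumption): a sample 'still works' if its first block is nonzero."""
    return [s for s in samples if s[0] != 0]


class SignatureDetector:
    """Toy detector (assumption): flags exactly the samples it has been trained on."""

    def __init__(self, known: Iterable[Sample]) -> None:
        self.known: Set[Sample] = set(known)

    def detects(self, sample: Sample) -> bool:
        return sample in self.known

    def train(self, samples: Iterable[Sample]) -> None:
        self.known.update(samples)


def harden(seeds: List[Sample], threshold: float = 0.9, batch: int = 50,
           max_rounds: int = 100, seed: int = 0):
    """Repeat generate/validate/filter/train until the second detector's
    detection rate on a fresh validated batch reaches the threshold."""
    rng = random.Random(seed)
    first = SignatureDetector(seeds)   # first system: knows only the training set
    second = SignatureDetector(seeds)  # second system: retrained each round
    rate, round_no = 0.0, 0
    for round_no in range(1, max_rounds + 1):
        synthetic = generate(seeds, batch, rng)
        validated = validate(synthetic)
        # Keep only validated samples that evade the first system.
        filtered = [s for s in validated if not first.detects(s)]
        # Measure the second system on the fresh batch before training on it.
        rate = (sum(second.detects(s) for s in validated) / len(validated)
                if validated else 1.0)
        if rate >= threshold:
            break
        second.train(filtered)
    return second, rate, round_no
```

Calling `harden([(1, 2, 3), (4, 5, 6)])` returns the retrained detector, the last measured detection rate, and the number of rounds run. Measuring the rate on each new batch before training on it is a design choice of this sketch, so that the stopping threshold reflects detection of samples the second detector has not yet been trained on in that round.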