CPC H04L 63/1483 (2013.01) [G06F 40/279 (2020.01); H04L 63/1416 (2013.01); H04L 63/1425 (2013.01); H04L 63/1433 (2013.01)] | 19 Claims |
1. A computer-implemented method for generating a first set of longest common sequences from a plurality of known malicious webpages, said first set of longest common sequences representing input data which is used to generate a set of regular expressions for detecting phishing webpages, comprising:
obtaining HTML source strings from said plurality of known malicious webpages;
transforming said HTML source strings to reduce the number of at least one of stop words and repeated tags, thereby obtaining a set of transformed source strings;
performing string alignment on said set of transformed source strings, thereby obtaining at least a scoring matrix;
obtaining a second set of longest common sequences responsive to said performing said string alignment;
filtering said second set of longest common sequences, thereby obtaining said first set of longest common sequences;
using said first set of longest common sequences to generate the set of regular expressions; and
using the set of regular expressions to detect a phishing attack.
|
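The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the claim does not name an alignment algorithm, a stop-word list, or a filtering rule. The sketch assumes a classic longest-common-subsequence dynamic program as the "string alignment" (its table stands in for the scoring matrix), a minimum-length threshold as the filter, and lazy wildcards between tokens as the regular-expression form. All function names, the `STOP_WORDS` set, and the `min_len` parameter are hypothetical.

```python
from __future__ import annotations

import re
from itertools import combinations

# Hypothetical stop-word list; the claim does not specify one.
STOP_WORDS = {"the", "a", "an", "of", "and", "to"}


def transform(html: str) -> list[str]:
    """Tokenize HTML, drop stop words, and collapse consecutive repeated tags."""
    tokens = re.findall(r"<[^>]+>|[^<\s]+", html)
    out: list[str] = []
    for tok in tokens:
        if tok.lower() in STOP_WORDS:
            continue
        if tok.startswith("<") and out and out[-1] == tok:
            continue  # same tag repeated back to back
        out.append(tok)
    return out


def lcs(a: list[str], b: list[str]) -> list[str]:
    """Longest common subsequence of two token lists via a DP scoring matrix."""
    m, n = len(a), len(b)
    score = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                score[i][j] = score[i - 1][j - 1] + 1
            else:
                score[i][j] = max(score[i - 1][j], score[i][j - 1])
    # Trace back through the matrix to recover one common sequence.
    seq: list[str] = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            seq.append(a[i - 1])
            i, j = i - 1, j - 1
        elif score[i - 1][j] >= score[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return seq[::-1]


def build_patterns(pages: list[str], min_len: int = 3) -> list[re.Pattern]:
    """Align every pair of pages, filter the sequences, and emit regexes."""
    transformed = [transform(p) for p in pages]
    candidates = {tuple(lcs(x, y)) for x, y in combinations(transformed, 2)}
    # Assumed filter: discard sequences too short to be distinctive.
    kept = [seq for seq in candidates if len(seq) >= min_len]
    return [
        re.compile(".*?".join(re.escape(tok) for tok in seq), re.DOTALL)
        for seq in kept
    ]


def detect(html: str, patterns: list[re.Pattern]) -> bool:
    """Flag a page if any generated regular expression matches its source."""
    return any(p.search(html) for p in patterns)
```

Because `transform` only removes tokens and never reorders them, a pattern built from transformed strings can be searched directly against the raw HTML of a candidate page. A production system would likely need a stricter filter and bounded wildcards to limit false positives and regex backtracking.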