Systems and methods regarding keyword extraction
First Claim
1. A computer system comprising one or more processors that function as:
(a) a preprocessing unit that extracts text from a webpage to produce at least a first set of candidate keywords, applies language processing to produce at least a second set of candidate keywords, and combines said first and second sets of candidate keywords into a first candidate pool;
(b) a candidate extraction unit that receives data from said preprocessing unit describing at least said first candidate pool and produces a second candidate pool;
(c) a feature extraction unit that receives data describing at least said second candidate pool and analyzes said second candidate pool for general features and linguistic features, wherein said general features include number of times a term appears in the text extracted from the webpage; and
(d) a classification unit that receives said data describing at least said second candidate pool and related data from said feature extraction unit, and determines a likelihood of each candidate in said second candidate pool being a primary or secondary keyword.
Abstract
One exemplary aspect comprises a computer system comprising: (a) a preprocessing unit that extracts text from a webpage to produce at least a first set of candidate keywords, applies language processing to produce at least a second set of candidate keywords, and combines said first and second sets of candidate keywords into a first candidate pool; (b) a candidate extraction unit that receives data from said preprocessing unit describing at least said first candidate pool and produces a second candidate pool; (c) a feature extraction unit that receives data describing at least said second candidate pool and analyzes said second candidate pool for general features and linguistic features; and (d) a classification unit that receives said data describing at least said second candidate pool and related data from said feature extraction unit, and determines a likelihood of each candidate in said second candidate pool being a primary or secondary keyword.
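As a rough illustration of units (a) and (b) described above, the following Python sketch builds a first candidate pool from a webpage and narrows it to a second pool. The abstract does not say how text is extracted, which language processing is applied, or how the second pool is derived, so the stopword list, the adjacent-word-pair heuristic, the filtering rule, and all function names here are assumptions made for illustration only.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


# Tiny illustrative stopword list (an assumption, not from the source).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "is", "on"}


def preprocess(html):
    """Unit (a): return (extracted text, first candidate pool)."""
    parser = _TextExtractor()
    parser.feed(html)
    text = " ".join(parser.chunks)
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # First set: single words taken directly from the extracted text.
    first_set = {t for t in tokens if t not in STOPWORDS}
    # Second set: a stand-in for "language processing". Adjacent word pairs
    # containing no stopword; a real system might use a tagger or chunker.
    second_set = {
        f"{a} {b}"
        for a, b in zip(tokens, tokens[1:])
        if a not in STOPWORDS and b not in STOPWORDS
    }
    # Combine both sets into the first candidate pool.
    return text, first_set | second_set


def extract_candidates(pool, min_chars=3):
    """Unit (b): produce a second pool (hypothetical rule: drop very
    short and digits-only candidates)."""
    return {
        c for c in pool
        if len(c) >= min_chars and not c.replace(" ", "").isdigit()
    }
```

Keeping the two pools as plain sets makes the combining step a simple union; any ordering or scoring is left to the later feature extraction and classification units.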
19 Claims
1. A computer system comprising one or more processors that function as:
(a) a preprocessing unit that extracts text from a webpage to produce at least a first set of candidate keywords, applies language processing to produce at least a second set of candidate keywords, and combines said first and second sets of candidate keywords into a first candidate pool;
(b) a candidate extraction unit that receives data from said preprocessing unit describing at least said first candidate pool and produces a second candidate pool;
(c) a feature extraction unit that receives data describing at least said second candidate pool and analyzes said second candidate pool for general features and linguistic features, wherein said general features include number of times a term appears in the text extracted from the webpage; and
(d) a classification unit that receives said data describing at least said second candidate pool and related data from said feature extraction unit, and determines a likelihood of each candidate in said second candidate pool being a primary or secondary keyword.
Dependent claims: 2-17
18. A method comprising steps implemented by a computer processing system, said steps comprising:
(a) extracting text from a webpage to produce at least a first set of candidate keywords, applying language processing to produce at least a second set of candidate keywords, and combining said first and second sets of candidate keywords into a first candidate pool;
(b) receiving data describing at least said first candidate pool and producing a second candidate pool;
(c) receiving data describing at least said second candidate pool and analyzing said second candidate pool for general features and linguistic features, wherein said general features include number of times a term appears in the text extracted from the webpage; and
(d) receiving said data describing at least said second candidate pool and related data from analyzing said second candidate pool, and determining a likelihood of each candidate in said second candidate pool being a primary or secondary keyword.
19. A non-transitory computer readable medium storing software instructions comprising:
(a) extracting text from a webpage to produce at least a first set of candidate keywords, applying language processing to produce at least a second set of candidate keywords, and combining said first and second sets of candidate keywords into a first candidate pool;
(b) receiving data describing at least said first candidate pool and producing a second candidate pool;
(c) receiving data describing at least said second candidate pool and analyzing said second candidate pool for general features and linguistic features, wherein said general features include number of times a term appears in the text extracted from the webpage; and
(d) receiving said data describing at least said second candidate pool and related data from analyzing said second candidate pool, and determining a likelihood of each candidate in said second candidate pool being a primary or secondary keyword.
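Steps (c) and (d) of the claims can be pictured with a small Python sketch. The only feature taken from the claims is the number of times a term appears in the extracted text; the other two features, the weights, the bias, and the use of a logistic function are placeholders assumed for illustration, since the claims do not name a particular classifier or feature set. In practice the weights would be learned from labeled pages rather than set by hand.

```python
import math
import re


def extract_features(candidate, text):
    """Step (c): compute features for one candidate keyword."""
    matches = re.findall(
        r"\b" + re.escape(candidate) + r"\b", text, flags=re.IGNORECASE
    )
    return {
        # General feature named in the claims: occurrences in the text.
        "frequency": len(matches),
        # Assumed linguistic-style features (not taken from the claims).
        "word_count": len(candidate.split()),
        "capitalized": int(any(m[0].isupper() for m in matches)),
    }


# Hand-set placeholder parameters (assumptions, not from the source).
WEIGHTS = {"frequency": 0.8, "word_count": 0.5, "capitalized": 0.7}
BIAS = -3.0


def keyword_likelihood(features):
    """Step (d): map features to a likelihood in (0, 1) with a logistic function."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def classify(candidates, text):
    """Score every candidate in the pool, highest likelihood first."""
    scores = {
        c: keyword_likelihood(extract_features(c, text)) for c in candidates
    }
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```

A caller would pass the second candidate pool and the extracted page text to `classify` and then apply whatever threshold or ranking cut-off suits the application.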
Specification