Systems And Methods Regarding Keyword Extraction
Abstract
One exemplary aspect comprises a computer system comprising: (a) a preprocessing unit that extracts text from a webpage to produce at least a first set of candidate keywords, applies language processing to produce at least a second set of candidate keywords, and combines said first and second sets of candidate keywords into a first candidate pool; (b) a candidate extraction unit that receives data from said preprocessing unit describing at least said first candidate pool and produces a second candidate pool; (c) a feature extraction unit that receives data describing at least said second candidate pool and analyzes said second candidate pool for general features and linguistic features; and (d) a classification unit that receives said data describing at least said second candidate pool and related data from said feature extraction unit, and determines a likelihood of each candidate in said second candidate pool being a primary or secondary keyword.
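The four units recited above can be read as a pipeline. The following is a minimal sketch of that flow, not the patented implementation. Every function name, the stopword list, the use of bigrams as a stand-in for "language processing", the particular features, and the hand-set logistic weights are assumptions chosen for illustration; the source specifies none of them.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical stopword list and weights; the source does not specify any.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "is", "on", "with"}
WEIGHTS = {"freq": 0.6, "in_title": 1.5, "capitalized": 0.5, "n_words": 0.4}
BIAS = -2.5


class _TextExtractor(HTMLParser):
    """Collects title text and remaining text from an HTML page."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title, self.body = [], []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        (self.title if self.in_title else self.body).append(data)


def tokenize(text):
    return re.findall(r"[A-Za-z][A-Za-z'-]*", text)


def ngrams(tokens):
    """Unigrams plus adjacent bigrams, preserving original case."""
    return tokens + [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def preprocess(html):
    """(a) Build the first candidate pool from two candidate sets."""
    parser = _TextExtractor()
    parser.feed(html)
    title_tokens = tokenize(" ".join(parser.title))
    tokens = title_tokens + tokenize(" ".join(parser.body))
    grams = ngrams(tokens)
    first_set = {t.lower() for t in tokens}              # words taken straight from the page text
    second_set = {g.lower() for g in grams if " " in g}  # bigrams, an assumed stand-in for language processing
    return {
        "pool": first_set | second_set,
        "counts": Counter(g.lower() for g in grams),
        "title_terms": {g.lower() for g in ngrams(title_tokens)},
        "capitalized": {g.lower() for g in grams if all(w[0].isupper() for w in g.split())},
    }


def extract_candidates(pre):
    """(b) Narrow the first pool to a second pool (assumed rule: no stopword at either end)."""
    def keep(c):
        words = c.split()
        return words[0] not in STOPWORDS and words[-1] not in STOPWORDS and len(c) > 2
    return sorted(c for c in pre["pool"] if keep(c))


def extract_features(candidates, pre):
    """(c) General features (frequency, title membership) and simple linguistic-style features."""
    return {
        c: {
            "freq": pre["counts"][c],
            "in_title": float(c in pre["title_terms"]),
            "capitalized": float(c in pre["capitalized"]),
            "n_words": len(c.split()),
        }
        for c in candidates
    }


def classify(features):
    """(d) Map each candidate's features to a likelihood in (0, 1) with a logistic function."""
    return {
        c: 1 / (1 + math.exp(-(BIAS + sum(WEIGHTS[k] * v for k, v in f.items()))))
        for c, f in features.items()
    }


def keyword_likelihoods(html):
    pre = preprocess(html)
    return classify(extract_features(extract_candidates(pre), pre))
```

Calling `keyword_likelihoods(page_html)` returns a dictionary mapping each surviving candidate to a score between 0 and 1. A trained classifier would normally replace the fixed weights, and separate scores for primary and secondary keywords could be produced by running two such models.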
19 Claims
1. A computer system comprising:
(a) a preprocessing unit that extracts text from a webpage to produce at least a first set of candidate keywords, applies language processing to produce at least a second set of candidate keywords, and combines said first and second sets of candidate keywords into a first candidate pool;
(b) a candidate extraction unit that receives data from said preprocessing unit describing at least said first candidate pool and produces a second candidate pool;
(c) a feature extraction unit that receives data describing at least said second candidate pool and analyzes said second candidate pool for general features and linguistic features; and
(d) a classification unit that receives said data describing at least said second candidate pool and related data from said feature extraction unit, and determines a likelihood of each candidate in said second candidate pool being a primary or secondary keyword.
Dependent claims: 2 through 17.
18. A method comprising steps implemented by a computer processing system, said steps comprising:
(a) extracting text from a webpage to produce at least a first set of candidate keywords, applying language processing to produce at least a second set of candidate keywords, and combining said first and second sets of candidate keywords into a first candidate pool;
(b) receiving data describing at least said first candidate pool and producing a second candidate pool;
(c) receiving data describing at least said second candidate pool and analyzing said second candidate pool for general features and linguistic features; and
(d) receiving said data describing at least said second candidate pool and related data from said feature extraction unit, and determining a likelihood of each candidate in said second candidate pool being a primary or secondary keyword.
19. A tangible computer readable medium storing software operable to perform steps comprising:
(a) extracting text from a webpage to produce at least a first set of candidate keywords, applying language processing to produce at least a second set of candidate keywords, and combining said first and second sets of candidate keywords into a first candidate pool;
(b) receiving data describing at least said first candidate pool and producing a second candidate pool;
(c) receiving data describing at least said second candidate pool and analyzing said second candidate pool for general features and linguistic features; and
(d) receiving said data describing at least said second candidate pool and related data from said feature extraction unit, and determining a likelihood of each candidate in said second candidate pool being a primary or secondary keyword.
Specification