SYSTEMS AND METHODS FOR EXTRACTING PHRASES FROM TEXT
First Claim
1. A method for extracting phrases from text, comprising:
preprocessing desired phrases into at least one phrase indexing data structure for efficient matching;
scanning text to construct a hash table including keys and corresponding entries;
locating suffix trie trees for each word in the hash table;
matching each position in the hash table against the suffix trie trees; and
outputting phrases matched in the scanned text.
Abstract
Systems and methods for extracting phrases from text are disclosed. In an exemplary embodiment, a method may include preprocessing desired phrases into at least one phrase indexing data structure for efficient matching. The method may also include scanning text to construct a hash table including keys and corresponding entries. The method may also include locating suffix trie trees for each word in the hash table. The method may also include matching each position in the hash table against the suffix trie trees, and outputting phrases matched in the scanned text.
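The steps named in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation described in the specification: the phrase index is assumed to be a dictionary from each phrase's first word to a trie of the words that follow it, the hash table is assumed to map each word of the text to the positions where it occurs, and tokenization is assumed to be lowercase whitespace splitting. The names `build_phrase_index`, `build_position_table`, `extract_phrases`, and `END` are hypothetical.

```python
from collections import defaultdict

# Hypothetical end-of-phrase marker. Tokens are always strings,
# so a None key can never collide with a word in the trie.
END = None


def build_phrase_index(phrases):
    """Preprocess phrases: first word -> trie of the words that follow it."""
    index = {}
    for phrase in phrases:
        words = phrase.lower().split()
        if not words:
            continue
        node = index.setdefault(words[0], {})
        for word in words[1:]:
            node = node.setdefault(word, {})
        node[END] = phrase  # remember the full phrase where it ends
    return index


def build_position_table(tokens):
    """Scan the text: keys are words, entries are lists of positions."""
    table = defaultdict(list)
    for pos, word in enumerate(tokens):
        table[word].append(pos)
    return table


def extract_phrases(text, phrases):
    """Return sorted (start_position, phrase) pairs found in the text."""
    index = build_phrase_index(phrases)
    tokens = text.lower().split()  # assumption: whitespace tokenization
    table = build_position_table(tokens)
    matches = []
    for word, positions in table.items():
        trie = index.get(word)  # locate the trie for this word, if any
        if trie is None:
            continue
        for start in positions:
            node, pos = trie, start + 1
            if END in node:  # single-word phrase
                matches.append((start, node[END]))
            # Walk the trie along the words that follow this position.
            while pos < len(tokens) and tokens[pos] in node:
                node = node[tokens[pos]]
                pos += 1
                if END in node:
                    matches.append((start, node[END]))
    return sorted(matches)
```

Because the sketch looks up a trie once per distinct word and then only walks from positions where that word occurs, words that begin no desired phrase are skipped without any per-position work. A real system would also need punctuation handling and a policy for overlapping matches, which this sketch reports in full.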
46 Citations
21 Claims
9. The method of claim 9 further comprising selecting top related phrases for each desired phrase based on relevance.
10. A system for extracting phrases from text, comprising:
at least one phrase indexing data structure residing in memory, the at least one phrase indexing data structure including desired phrases;
a hash table constructed in memory, the hash table including keys and corresponding entries; and
program code for locating suffix trie trees for each word in the hash table and matching each position in the hash table against the suffix trie trees to match phrases in a scanned text.
12. The system of claim 12 wherein the keyword is a word appearing least frequent in a typical text.
20. A system for extracting movie titles from web-based text, comprising:
memory means for storing at least one phrase indexing data structure including desired movie titles;
table means for storing keys and corresponding entries; and
means for locating suffix trie trees for each word in the hash table and matching each position in the hash table against the suffix trie trees to match movie titles in a scanned web-based text.
Specification