Systems and methods for extracting phrases from text
First Claim
1. A method for extracting phrases from text, comprising:
preprocessing desired phrases into at least one phrase indexing data structure for efficient matching;
during preprocessing building suffix trie trees, wherein one of the suffix trie trees is built at a word level, and then an order of words is reversed to build another one of the suffix trie trees;
after preprocessing, scanning text to construct a hash table including keys and corresponding entries;
locating suffix trie trees in the at least one phrase indexing data structure for each word in the hash table;
matching each position in the hash table against the suffix trie trees; and
outputting phrases matched in the scanned text.
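The preprocessing step in the claim can be illustrated with a minimal sketch. It assumes a plain nested-dictionary trie keyed by whole words, built once in phrase order and once with the word order reversed. The names `build_word_trie` and `END`, the lowercasing, and the whitespace tokenization are illustrative assumptions and are not taken from the patent.

```python
# Hypothetical sketch: word-level tries built in forward and reversed word order.
END = ""  # terminal marker; str.split() never yields an empty string, so it cannot collide with a word


def build_word_trie(phrases, reverse=False):
    """Build a nested-dict trie whose edges are whole words, not characters."""
    root = {}
    for phrase in phrases:
        words = phrase.lower().split()  # assumption: simple whitespace tokenization
        if reverse:
            words = words[::-1]  # reversed word order for the second trie
        node = root
        for word in words:
            node = node.setdefault(word, {})
        node[END] = phrase  # remember which phrase ends at this node
    return root


phrases = ["star wars", "star trek", "the empire strikes back"]
forward_trie = build_word_trie(phrases)
reverse_trie = build_word_trie(phrases, reverse=True)
```

With both tries available, a matcher could in principle walk rightward from a word using the forward trie or leftward using the reversed one; how the claimed method combines the two is not spelled out in this section.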
Abstract
Systems and methods for extracting phrases from text are disclosed. In an exemplary embodiment, a method may include preprocessing desired phrases into at least one phrase indexing data structure for efficient matching. The method may also include scanning text to construct a hash table including keys and corresponding entries. The method may also include locating suffix trie trees for each word in the hash table. The method may also include matching each position in the hash table against the suffix trie trees, and outputting phrases matched in the scanned text.
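The flow described in the abstract can be sketched as follows. This is a rough illustration under stated assumptions, not the patented implementation: the hash table is assumed to map each word of the scanned text to the list of positions where it occurs, and the trie is a nested dictionary keyed by words. The function names, the lowercasing, and the whitespace tokenization (which ignores punctuation) are all hypothetical.

```python
from collections import defaultdict

END = ""  # terminal marker; never produced by str.split()


def build_trie(phrases):
    """Preprocess the desired phrases into a word-level trie."""
    root = {}
    for phrase in phrases:
        node = root
        for word in phrase.lower().split():
            node = node.setdefault(word, {})
        node[END] = phrase
    return root


def build_position_table(words):
    """Hash table: key = word, entry = list of positions of that word in the text."""
    table = defaultdict(list)
    for index, word in enumerate(words):
        table[word].append(index)
    return table


def extract_phrases(text, phrases):
    """Return sorted (start_position, phrase) pairs for phrases found in the text."""
    trie = build_trie(phrases)
    words = text.lower().split()
    table = build_position_table(words)
    matches = []
    for word, positions in table.items():
        subtrie = trie.get(word)  # locate the trie branch for this word, if any
        if subtrie is None:
            continue
        for start in positions:  # match each recorded position against the trie
            node = subtrie
            i = start + 1
            if END in node:  # single-word phrase
                matches.append((start, node[END]))
            while i < len(words) and words[i] in node:
                node = node[words[i]]
                i += 1
                if END in node:
                    matches.append((start, node[END]))
    return sorted(matches)


found = extract_phrases(
    "I watched Star Wars and then Star Trek",
    ["star wars", "star trek", "the empire strikes back"],
)
```

Because the table is keyed by word, the trie lookup happens once per distinct word rather than once per occurrence, which is one plausible reason for building the table before matching.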
21 Claims
1. A method for extracting phrases from text, comprising:
preprocessing desired phrases into at least one phrase indexing data structure for efficient matching;
during preprocessing building suffix trie trees, wherein one of the suffix trie trees is built at a word level, and then an order of words is reversed to build another one of the suffix trie trees;
after preprocessing, scanning text to construct a hash table including keys and corresponding entries;
locating suffix trie trees in the at least one phrase indexing data structure for each word in the hash table;
matching each position in the hash table against the suffix trie trees; and
outputting phrases matched in the scanned text.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A system for extracting phrases from text, comprising:
at least one phrase indexing data structure residing in non-transitory computer readable media, the at least one phrase indexing data structure including desired phrases;
program code stored on non-transitory computer readable media and executable by a processor for improving accuracy of matches in scanned text by reducing false positives using phrases related to the scanned text, the preprocessing program code building suffix trie trees, wherein one of the suffix trie trees is built at a word level, and then an order of words is reversed to build another one of the suffix trie trees;
a hash table constructed in non-transitory computer readable media, the hash table including keys and corresponding entries; and
program code stored on non-transitory computer readable media and executable by a processor for locating suffix trie trees for each word in the hash table and matching each position in the hash table against the suffix trie trees to match phrases in the scanned text.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20
21. A system for extracting movie titles from web-based text, comprising:
memory means for storing at least one phrase indexing data structure including desired movie titles;
preprocessing means for improving accuracy of matches of movie titles in the web-based text by reducing false positives using phrases related to the movie titles, the preprocessing means building suffix trie trees, wherein one of the suffix trie trees is built at a word level, and then an order of words is reversed to build another one of the suffix trie trees;
table means for storing keys and corresponding entries, the keys representing words in the web-based text, and the entries representing a list of positions of the words in the web-based text; and
means for locating suffix trie trees for each word in the table means and matching each position in the table means against the suffix trie trees to match movie titles in a scanned web-based text.
Specification