System and method for identifying phrases in text
Abstract
A method includes accessing text that includes a plurality of words, tagging each of the plurality of words with one of a plurality of parts of speech (POS) tags, and creating a plurality of tokens, each token comprising one of the plurality of words and its associated POS tag. The method further includes clustering one or more of the created tokens into a chunk of tokens, the one or more tokens clustered into the chunk of tokens based on the POS tags of the one or more tokens, and forming a phrase based on the chunk of tokens, the phrase comprising the words of the one or more tokens clustered into the chunk of tokens.
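The steps named in the abstract (tagging, token creation, clustering into chunks, phrase formation) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the toy lexicon `TOY_LEXICON`, the fallback tag for unknown words, and the rule in `CHUNK_TAGS` (runs of adjectives and nouns form a chunk) are all hypothetical, since the abstract does not say which tagger or chunking rule is used.

```python
from itertools import groupby

# Hypothetical toy lexicon; a real system would use a trained POS tagger.
TOY_LEXICON = {
    "the": "DT", "a": "DT",
    "latent": "JJ", "semantic": "JJ",
    "system": "NN", "text": "NN", "phrases": "NNS",
    "identifies": "VBZ", "in": "IN",
}

# Assumed chunking rule: consecutive adjectives and nouns belong to one chunk.
CHUNK_TAGS = {"JJ", "NN", "NNS"}


def tag_words(words):
    """Create tokens: each token pairs a word with its POS tag."""
    # Unknown words default to "NN" (an assumption made for this sketch).
    return [(word, TOY_LEXICON.get(word.lower(), "NN")) for word in words]


def chunk_tokens(tokens):
    """Cluster consecutive tokens whose POS tags are in CHUNK_TAGS."""
    chunks = []
    for in_chunk, group in groupby(tokens, key=lambda tok: tok[1] in CHUNK_TAGS):
        if in_chunk:
            chunks.append(list(group))
    return chunks


def form_phrases(chunks):
    """Form one phrase per chunk from the words of its tokens."""
    return [" ".join(word for word, _tag in chunk) for chunk in chunks]


text = "The system identifies latent semantic phrases in the text"
tokens = tag_words(text.split())
phrases = form_phrases(chunk_tokens(tokens))
print(phrases)  # ['system', 'latent semantic phrases', 'text']
```

Grouping on a boolean key keeps the clustering decision entirely a function of the POS tags, which mirrors the abstract's wording that tokens are clustered "based on the POS tags".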
20 Claims
1. A system for discovering latent relationships in data, the system comprising:
one or more memory units configured to store a plurality of noun tags; and
one or more processing units operable to:
access a phrase comprising a plurality of nouns;
create a plurality of tokens, each token comprising one of the plurality of nouns and an associated noun tag of the plurality of noun tags;
cluster one or more of the created tokens into a chunk of tokens, the one or more tokens clustered into the chunk of tokens based on the noun tags of the one or more tokens;
form one or more sub-phrases of the accessed phrase based on the chunk of tokens, the one or more sub-phrases comprising the nouns of the one or more tokens clustered into the chunk of tokens; and
perform Latent Semantic Analysis (LSA) using the one or more sub-phrases.
Dependent claims: 2, 3, 10, 11, 12, 13, 14
4. A computer-implemented method for discovering latent relationships in data, the method comprising:
accessing a phrase by a processing system, the phrase comprising a plurality of nouns;
creating, by the processing system, a plurality of tokens, each token comprising one of the plurality of nouns and an associated noun tag of a plurality of noun tags;
clustering, by the processing system, one or more of the created tokens into a chunk of tokens, the one or more tokens clustered into the chunk of tokens based on the noun tags of the one or more tokens;
forming, by the processing system, one or more sub-phrases of the accessed phrase based on the chunk of tokens, the one or more sub-phrases comprising the nouns of the one or more tokens clustered into the chunk of tokens; and
performing, by the processing system, Latent Semantic Analysis (LSA) using the one or more sub-phrases.
Dependent claims: 5, 6, 15, 16, 17
7. A non-transitory computer-readable medium comprising software, the software when executed by one or more processing units operable to perform operations comprising:
accessing a phrase comprising a plurality of nouns;
creating a plurality of tokens, each token comprising one of the plurality of nouns and an associated noun tag of a plurality of noun tags;
clustering one or more of the created tokens into a chunk of tokens, the one or more tokens clustered into the chunk of tokens based on the noun tags of the one or more tokens;
forming one or more sub-phrases of the accessed phrase based on the chunk of tokens, the one or more sub-phrases comprising the nouns of the one or more tokens clustered into the chunk of tokens; and
performing Latent Semantic Analysis (LSA) using the one or more sub-phrases.
Dependent claims: 8, 9, 18, 19, 20
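The operations recited in the independent claims (noun tokens, chunking by noun tag, sub-phrase formation, LSA) might look like the sketch below. Several choices are assumptions and do not come from the claims: the Penn Treebank noun tags in `NOUN_TAGS`, the rule that a sub-phrase is any contiguous run of two or more nouns from the chunk, the three example phrases, and the use of power iteration with deflation as a dependency-free stand-in for a library SVD routine. All function names are hypothetical.

```python
import math

# Assumed noun tag set (Penn Treebank style); the claims do not name one.
NOUN_TAGS = frozenset({"NN", "NNS", "NNP", "NNPS"})


def noun_subphrases(tokens):
    """Chunk the noun-tagged tokens, then list contiguous sub-phrases (2+ nouns)."""
    chunk = [word for word, tag in tokens if tag in NOUN_TAGS]
    return [" ".join(chunk[i:j])
            for i in range(len(chunk))
            for j in range(i + 2, len(chunk) + 1)]


def term_document_matrix(docs):
    """Rows are sub-phrases, columns are documents, entries are counts."""
    vocab = sorted({sp for doc in docs for sp in doc})
    return vocab, [[doc.count(term) for doc in docs] for term in vocab]


def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def truncated_svd(A, k, iters=200):
    """Top-k singular triplets via power iteration on A^T A with deflation."""
    A = [row[:] for row in A]
    triplets = []
    for _ in range(k):
        At = [list(col) for col in zip(*A)]
        v = [1.0 / (j + 1) for j in range(len(At))]  # deterministic start vector
        for _ in range(iters):
            w = matvec(At, matvec(A, v))
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        Av = matvec(A, v)
        sigma = math.sqrt(sum(x * x for x in Av))
        if sigma < 1e-12:
            break
        u = [x / sigma for x in Av]
        triplets.append((sigma, u, v))
        # Deflate: remove the component just found before the next round.
        A = [[A[i][j] - sigma * u[i] * v[j] for j in range(len(v))]
             for i in range(len(u))]
    return triplets


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Hypothetical input: three phrases, already tokenized and noun-tagged.
tagged_phrases = [
    [("computer", "NN"), ("network", "NN"), ("security", "NN")],
    [("network", "NN"), ("security", "NN"), ("policy", "NN")],
    [("garden", "NN"), ("flower", "NN"), ("seed", "NN")],
]
docs = [noun_subphrases(tokens) for tokens in tagged_phrases]
vocab, A = term_document_matrix(docs)

# Document coordinates in a rank-2 latent space: sigma_k * v_k[j].
triplets = truncated_svd(A, k=2)
doc_vectors = [[sigma * v[j] for sigma, _u, v in triplets] for j in range(len(docs))]

raw_columns = [list(col) for col in zip(*A)]
raw_similarity = cosine(raw_columns[0], raw_columns[1])
lsa_similarity = cosine(doc_vectors[0], doc_vectors[1])
lsa_unrelated = cosine(doc_vectors[0], doc_vectors[2])
print(raw_similarity, lsa_similarity, lsa_unrelated)
```

In this toy data the first two phrases share only the sub-phrase "network security", so their raw cosine similarity is 1/3, while in the rank-2 latent space it is close to 1 and the unrelated third phrase stays close to 0. That gap is the kind of latent relationship LSA is meant to surface.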
Specification