System and Method for Identifying Phrases in Text
Abstract
A method includes accessing text that includes a plurality of words, tagging each of the plurality of words with one of a plurality of parts of speech (POS) tags, and creating a plurality of tokens, each token comprising one of the plurality of words and its associated POS tag. The method further includes clustering one or more of the created tokens into a chunk of tokens, the one or more tokens clustered into the chunk of tokens based on the POS tags of the one or more tokens, and forming a phrase based on the chunk of tokens, the phrase comprising the words of the one or more tokens clustered into the chunk of tokens.
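The first three steps of the method in the abstract are accessing text, tagging each word with a POS tag, and pairing each word with its tag as a token. They can be sketched as follows. This is a minimal illustration, not the patented implementation. The toy lexicon `LEXICON`, the fallback tag `"NN"`, the Penn Treebank-style tag names, the whitespace tokenization, and the `Token` type are all assumptions introduced here. A real system would typically use a trained POS tagger.

```python
from typing import NamedTuple


class Token(NamedTuple):
    """A token pairs one word with its POS tag (hypothetical type)."""
    word: str
    tag: str


# Hypothetical toy lexicon standing in for a trained POS tagger.
LEXICON = {
    "the": "DT", "a": "DT",
    "quick": "JJ", "brown": "JJ", "lazy": "JJ",
    "fox": "NN", "dog": "NN",
    "jumps": "VBZ", "over": "IN",
}


def tag_words(words: list[str]) -> list[str]:
    """Assign one POS tag to each word; unknown words default to "NN" (assumption)."""
    return [LEXICON.get(w.lower(), "NN") for w in words]


def create_tokens(text: str) -> list[Token]:
    """Access text, split it into words, tag each word, and pair each word with its tag."""
    words = text.split()  # naive whitespace tokenization (assumption)
    tags = tag_words(words)
    return [Token(w, t) for w, t in zip(words, tags)]


tokens = create_tokens("The quick brown fox jumps over the lazy dog")
print(tokens[:2])  # [Token(word='The', tag='DT'), Token(word='quick', tag='JJ')]
```

Each token keeps the original word form, while tag lookup is case-insensitive. As a result, "The" and "the" receive the same tag but remain distinguishable in the tokens.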
20 Claims
1. A system, comprising:
one or more memory units; and
one or more processing units operable to:
access text comprising a plurality of words;
tag each of the plurality of words with one of a plurality of parts of speech (POS) tags;
create a plurality of tokens, each token comprising one of the plurality of words and its associated POS tag;
cluster one or more of the created tokens into a chunk of tokens, the one or more tokens clustered into the chunk of tokens based on the POS tags of the one or more tokens; and
form a phrase based on the chunk of tokens, the phrase comprising the words of the one or more tokens clustered into the chunk of tokens.
Dependent claims: 2, 3, 4, 5, 6, 7.
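The clustering and phrase-forming steps of claim 1 can be illustrated with a simple rule-based chunker over already-tagged tokens. The claim does not say which POS patterns define a chunk. This sketch assumes Penn Treebank-style tags and a hypothetical noun-phrase rule: an optional determiner, any number of adjectives, then one or more nouns. The names `chunk_tokens`, `form_phrase`, `NP_PATTERN`, and `Token` are illustrative assumptions, not taken from the patent.

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    """A token pairs one word with its POS tag (hypothetical type)."""
    word: str
    tag: str


def _tag_class(tag: str) -> str:
    """Collapse a Penn Treebank-style tag into a one-letter class (assumption)."""
    if tag == "DT":
        return "D"
    if tag.startswith("JJ"):
        return "J"
    if tag.startswith("NN"):
        return "N"
    return "O"


# Hypothetical chunk rule: optional determiner, any adjectives, then nouns.
NP_PATTERN = re.compile(r"D?J*N+")


def chunk_tokens(tokens: list[Token]) -> list[list[Token]]:
    """Cluster tokens into chunks based only on their POS tags."""
    # One character per token, so regex match offsets index directly into tokens.
    classes = "".join(_tag_class(t.tag) for t in tokens)
    return [tokens[m.start():m.end()] for m in NP_PATTERN.finditer(classes)]


def form_phrase(chunk: list[Token]) -> str:
    """Form a phrase from the words of the tokens clustered into a chunk."""
    return " ".join(t.word for t in chunk)


tagged = [Token("The", "DT"), Token("quick", "JJ"), Token("brown", "JJ"),
          Token("fox", "NN"), Token("jumps", "VBZ"), Token("over", "IN"),
          Token("the", "DT"), Token("lazy", "JJ"), Token("dog", "NN")]
phrases = [form_phrase(c) for c in chunk_tokens(tagged)]
print(phrases)  # ['The quick brown fox', 'the lazy dog']
```

Encoding each tag as a single character lets a standard regular-expression engine perform the clustering. Expressing chunk rules as patterns over tag sequences is one common approach. The claim language, however, covers clustering based on POS tags generally rather than this particular rule.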
8. A computer-implemented method, comprising:
accessing text by a processing system, the text comprising a plurality of words;
tagging, by the processing system, each of the plurality of words with one of a plurality of parts of speech (POS) tags;
creating, by the processing system, a plurality of tokens, each token comprising one of the plurality of words and its associated POS tag;
clustering, by the processing system, one or more of the created tokens into a chunk of tokens, the one or more tokens clustered into the chunk of tokens based on the POS tags of the one or more tokens; and
forming, by the processing system, a phrase based on the chunk of tokens, the phrase comprising the words of the one or more tokens clustered into the chunk of tokens.
Dependent claims: 9, 10, 11, 12, 13, 14.
15. A non-transitory computer-readable medium comprising software, the software when executed by one or more processing units operable to perform operations comprising:
accessing text comprising a plurality of words;
tagging each of the plurality of words with one of a plurality of parts of speech (POS) tags;
creating a plurality of tokens, each token comprising one of the plurality of words and its associated POS tag;
clustering one or more of the created tokens into a chunk of tokens, the one or more tokens clustered into the chunk of tokens based on the POS tags of the one or more tokens; and
forming a phrase based on the chunk of tokens, the phrase comprising the words of the one or more tokens clustered into the chunk of tokens.
Dependent claims: 16, 17, 18, 19, 20.
Specification