Text segmentation
First Claim
1. A computer-implemented method performed by data processing apparatus comprising:
- receiving a string of characters;
- identifying segmented results from the string of characters, wherein an identified segmented result includes one or more words that are formed from segmenting the string of characters into two or more sub-strings;
- determining levels at which the identified segmented results occur in one or more corpora;
- selecting one or more segmented results from the identified segmented results based on at least the determined levels; and
- providing the selected one or more segmented results in association with the string of characters.
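The steps of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the `vocabulary` set, the `corpus_documents` list, and the use of simple phrase counts as the "level" of occurrence are all assumptions made for illustration.

```python
from collections import Counter


def segmentations(s, vocabulary):
    """Yield each split of s into two or more sub-strings that are all known words.

    `vocabulary` is a hypothetical word list; the claim does not say how
    words are recognized.
    """
    def helper(rest, parts):
        if not rest:
            if len(parts) >= 2:  # the claim requires two or more sub-strings
                yield tuple(parts)
            return
        for i in range(1, len(rest) + 1):
            head = rest[:i]
            if head in vocabulary:
                yield from helper(rest[i:], parts + [head])

    yield from helper(s, [])


def occurrence_levels(candidates, corpus_documents):
    """Assumed 'level': how often the space-joined words occur in the corpus.

    Uses plain substring counting, which is crude but enough for a sketch.
    """
    levels = Counter()
    for candidate in candidates:
        phrase = " ".join(candidate)
        levels[candidate] = sum(doc.count(phrase) for doc in corpus_documents)
    return levels


def select_segmentations(s, vocabulary, corpus_documents, top_n=1):
    """Return the top-ranked segmented results keyed by the original string."""
    candidates = list(segmentations(s, vocabulary))
    levels = occurrence_levels(candidates, corpus_documents)
    ranked = sorted(candidates, key=lambda c: levels[c], reverse=True)
    return {s: ranked[:top_n]}


# Toy data, invented for this example only.
vocabulary = {"no", "where", "now", "here", "nowhere"}
corpus_documents = [
    "we are now here at last",
    "now here is the plan",
    "there is no where else",
]
result = select_segmentations("nowhere", vocabulary, corpus_documents)
```

With this toy corpus, the candidates are ("no", "where") and ("now", "here"); the second occurs more often, so it is the one returned in association with the input string.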
Abstract
Methods and systems for improving text segmentation are disclosed. In one embodiment, at least a first segmented result and a second segmented result are determined from a string of characters, a first frequency of occurrence for the first segmented result and a second frequency of occurrence for the second segmented result are determined, and an operable segmented result is identified from the first segmented result and the second segmented result based at least in part on the first frequency of occurrence and the second frequency of occurrence.
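The embodiment described in the abstract, in which two segmented results are compared by frequency of occurrence, might be sketched as follows. The function name `identify_operable`, the `frequency_of` callback, the tie-breaking rule, and the counts in the example are hypothetical and are not taken from the specification.

```python
def identify_operable(first, second, frequency_of):
    """Pick the segmented result with the higher frequency of occurrence.

    `frequency_of` is an assumed callback that maps a segmented result to a
    count; ties go to the first result, which is an arbitrary choice here.
    """
    first_frequency = frequency_of(first)
    second_frequency = frequency_of(second)
    return first if first_frequency >= second_frequency else second


# Illustrative counts only; real frequencies would come from a corpus or log.
example_counts = {
    ("used", "cars"): 120,
    ("use", "dcars"): 0,
}
operable = identify_operable(
    ("used", "cars"),
    ("use", "dcars"),
    lambda result: example_counts.get(result, 0),
)
```

Passing the frequency lookup as a callback keeps the comparison independent of where the counts come from.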
18 Claims
1. A computer-implemented method performed by data processing apparatus comprising:
- receiving a string of characters;
- identifying segmented results from the string of characters, wherein an identified segmented result includes one or more words that are formed from segmenting the string of characters into two or more sub-strings;
- determining levels at which the identified segmented results occur in one or more corpora;
- selecting one or more segmented results from the identified segmented results based on at least the determined levels; and
- providing the selected one or more segmented results in association with the string of characters.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A computer-implemented system, comprising:
- one or more computer processors; and
- a computer-readable medium storing instructions that, when executed by the one or more computer processors, cause the one or more computer processors to perform operations comprising:
  - receiving a string of characters;
  - identifying segmented results from the string of characters, wherein an identified segmented result includes one or more words that are formed from segmenting the string of characters into two or more sub-strings;
  - determining levels at which the identified segmented results occur in one or more corpora;
  - selecting one or more segmented results from the identified segmented results based on at least the determined levels; and
  - providing the selected one or more segmented results in association with the string of characters.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18
Specification