METHODS AND SYSTEMS FOR IMPROVING TEXT SEGMENTATION
Abstract
Methods and systems for improving text segmentation are disclosed. In one embodiment, at least a first segmented result and a second segmented result are determined from a string of characters, a first frequency of occurrence for the first segmented result and a second frequency of occurrence for the second segmented result are determined, and an operable segmented result is identified from the first segmented result and the second segmented result based at least in part on the first frequency of occurrence and the second frequency of occurrence.
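The comparison described in the abstract can be illustrated with a minimal sketch. It assumes a toy in-memory corpus and a simple phrase-match count as the "frequency of occurrence"; the names `frequency_of_occurrence`, `pick_operable_result`, and `TOY_CORPUS` are hypothetical and do not come from the patent, which does not prescribe a particular counting method here.

```python
# Hypothetical toy corpus; a real system would query a much larger document collection.
TOY_CORPUS = [
    "we choose spain for our holiday",
    "many tourists choose spain every year",
    "nobody chooses pain willingly",
]


def frequency_of_occurrence(segmented_result, corpus_documents):
    """Count the documents that contain the segmented result as a whole-word phrase."""
    phrase = " ".join(segmented_result).lower()
    # Padding with spaces makes the substring test match whole words only.
    return sum(f" {phrase} " in f" {doc.lower()} " for doc in corpus_documents)


def pick_operable_result(first, second, corpus_documents):
    """Return whichever segmented result occurs more often (ties go to the first)."""
    first_freq = frequency_of_occurrence(first, corpus_documents)
    second_freq = frequency_of_occurrence(second, corpus_documents)
    return first if first_freq >= second_freq else second


# Two segmented results for the unbroken string "choosespain".
first_result = ["choose", "spain"]
second_result = ["chooses", "pain"]
operable = pick_operable_result(first_result, second_result, TOY_CORPUS)
```

In this toy corpus the first segmented result appears in two documents and the second in one, so the first is identified as the operable result.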
20 Claims
1. A computer-implemented method, comprising:
receiving, at a computer system, a string of characters that includes no word-delineating breaks;
generating, by the computer system from the string of characters, a plurality of candidate word groups that are portions of the string of characters;
determining, by the computer system, frequencies with which all or a portion of each of the candidate word groups occur in a corpus; and
selecting, by the computer system using the determined frequencies, one or more of the candidate word groups for submission to an entity.
15. A computer program product encoded on a computer-readable medium, operable to cause a data processing apparatus to perform operations comprising:
receiving a string of characters that includes no word-delineating breaks;
generating, from the string of characters, a plurality of candidate word groups that are portions of the string of characters;
determining frequencies with which all or a portion of each of the candidate word groups occur in a corpus; and
selecting, using the determined frequencies, one or more of the candidate word groups for submission to an entity.
20. A system for segmenting a string of characters, the system comprising:
one or more server devices;
an interface to the one or more server devices that is configured to receive a string of characters that includes no word-delineating breaks;
a segmentation processor of the one or more server devices that is configured to generate, from the string of characters, a plurality of candidate word groups that are portions of the string of characters; and
means for selecting one or more of the combinations of words, wherein the means for selecting is configured to determine frequencies with which all or a portion of each of the candidate word groups occur in a corpus, and wherein the means for selecting is further configured to select, using the determined frequencies, one or more of the candidate word groups for submission to an entity.
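The receive, generate, determine, and select steps recited in the independent claims can be sketched end to end as follows. This is an illustrative sketch only, not the claimed implementation. It assumes that candidate word groups are formed by splitting the string into words from a known vocabulary, and that a group is scored by the corpus frequency of its least frequent word (one possible reading of "all or a portion of each of the candidate word groups"). The function names and the word counts are hypothetical.

```python
def candidate_word_groups(text, vocabulary):
    """Yield every way to split text into consecutive words found in vocabulary."""
    if not text:
        yield []
        return
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in vocabulary:
            # Recurse on the remainder and prepend the matched word.
            for rest in candidate_word_groups(text[end:], vocabulary):
                yield [head] + rest


def group_frequency(words, corpus_counts):
    """Assumed scoring rule: the frequency of the rarest word in the group."""
    return min((corpus_counts.get(word, 0) for word in words), default=0)


def select_word_groups(text, corpus_counts, top_n=1):
    """Generate candidates, rank them by corpus frequency, and keep the best top_n."""
    candidates = list(candidate_word_groups(text, set(corpus_counts)))
    candidates.sort(key=lambda words: group_frequency(words, corpus_counts), reverse=True)
    return candidates[:top_n]


# Hypothetical word counts standing in for frequencies measured in a corpus.
CORPUS_COUNTS = {"choose": 50, "spain": 40, "chooses": 5, "pain": 30}

all_candidates = list(candidate_word_groups("choosespain", set(CORPUS_COUNTS)))
selected = select_word_groups("choosespain", CORPUS_COUNTS)
```

With these assumed counts, the string yields two candidate word groups, and the group "choose spain" is selected because its rarest word is more frequent than the rarest word of "chooses pain". Other scoring rules, such as whole-phrase frequency, would fit the claim language equally well.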
Specification