Using source-channel models for word segmentation
Abstract
A method and apparatus for segmenting text is provided that identifies a sequence of entity types from a sequence of characters and thereby identifies a segmentation for the sequence of characters. Under the invention, the sequence of entity types is identified using probabilistic models that describe the likelihood of a sequence of entities and the likelihood of sequences of characters given particular entities. Under one aspect of the invention, organization name entities are identified from a first sequence of identified entities to form a final sequence of identified entities.
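The two likelihoods named in the abstract can be read as the two factors of a source-channel (noisy-channel) objective. The formulation below is a sketch under assumed notation, not text from the patent: S stands for the character sequence, E = e_1 ... e_n for a candidate entity sequence, and s_i for the segment of S generated by entity e_i. The factorization of P(S | E) over segments is an independence assumption made here for illustration.

```latex
\hat{E} \;=\; \arg\max_{E} P(E \mid S)
        \;=\; \arg\max_{E} P(E)\, P(S \mid E)
        \;\approx\; \arg\max_{E} P(E) \prod_{i=1}^{n} P(s_i \mid e_i)
```

Here P(E) plays the role of the likelihood of a sequence of entities and P(s_i | e_i) the likelihood of a sequence of characters given a particular entity. Selecting the best E also fixes the segment boundaries, which is how the entity sequence yields a segmentation.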
51 Claims
1. A method of segmenting text formed of a sequence of characters, the method comprising:
- determining a class model probability of an entity given a candidate segment of the sequence of characters;
- determining a context probability of a sequence of entities; and
- combining the class model probability and the context model probability to select a sequence of entities and thereby select a sequence of candidate segments as a segmentation of the text.
Dependent claims: 2-29
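The three steps of claim 1 can be illustrated with a small dynamic-programming search. This is a minimal sketch under stated assumptions, not the patented implementation: the entity types, the toy probability tables `CLASS_MODEL` and `CONTEXT_MODEL`, the bigram form of the context model, and the function name `segment` are all hypothetical and invented for illustration.

```python
import math

# Hypothetical toy models; every probability below is invented for illustration.
# CLASS_MODEL[entity][segment] stands in for the class model probability.
CLASS_MODEL = {
    "WORD": {"new": 0.2, "york": 0.1, "times": 0.2, "newyork": 0.001},
    "LOCATION": {"newyork": 0.5, "york": 0.1},
}
# CONTEXT_MODEL[(prev, cur)] stands in for a bigram context model over entity types.
CONTEXT_MODEL = {
    ("<s>", "WORD"): 0.6, ("<s>", "LOCATION"): 0.4,
    ("WORD", "WORD"): 0.5, ("WORD", "LOCATION"): 0.5,
    ("LOCATION", "WORD"): 0.7, ("LOCATION", "LOCATION"): 0.3,
}


def segment(text, max_len=8):
    """Return the best-scoring list of (segment, entity) pairs, or None."""
    # best[i][entity] = (log score, (start, previous entity)) for text[:i]
    # when the last segment ends at i and is labeled with `entity`.
    best = [dict() for _ in range(len(text) + 1)]
    best[0]["<s>"] = (0.0, None)
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_len), end):
            candidate = text[start:end]
            for entity, table in CLASS_MODEL.items():
                p_class = table.get(candidate)
                if p_class is None:
                    continue  # this entity cannot produce the candidate
                for prev, (prev_score, _) in best[start].items():
                    p_context = CONTEXT_MODEL.get((prev, entity))
                    if p_context is None:
                        continue
                    # Combine class and context probabilities in log space.
                    score = prev_score + math.log(p_class) + math.log(p_context)
                    if entity not in best[end] or score > best[end][entity][0]:
                        best[end][entity] = (score, (start, prev))
    if not best[len(text)]:
        return None  # no segmentation covers the whole text
    # Follow back-pointers from the best final entity to recover the segments.
    end = len(text)
    entity = max(best[end], key=lambda e: best[end][e][0])
    result = []
    while end > 0:
        _, (start, prev) = best[end][entity]
        result.append((text[start:end], entity))
        end, entity = start, prev
    result.reverse()
    return result
```

With these toy tables, `segment("newyorktimes")` selects "newyork" as a LOCATION followed by "times" as a WORD, because that path has the highest combined score. Working in log space avoids numeric underflow on longer inputs.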
30. A computer-readable storage medium having encoded thereon computer-executable instructions for performing steps comprising:
- determining a class model probability for a segment of a text given a first entity;
- determining a class model probability for a segment of the text given a second entity; and
- using the class model probabilities for the first entity and the second entity to select a sequence of entities that is represented by the text and thereby segment the text.
Dependent claims: 31-45
46. A method of identifying organization names in an unsegmented text, the method comprising:
- identifying a sequence of entities in the unsegmented text to thereby segment the text;
- identifying a possible organization name from a portion of the segmented text;
- determining a probability for the possible organization name based on at least a portion of the sequence of entities; and
- using the probability to determine whether to accept the possible organization name as an organization name.
Dependent claims: 47-51
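The second pass described in claim 46 might look roughly like the sketch below. Everything specific in it is an assumption made for illustration and is not taken from the patent: the keyword trigger in `ORG_KEYWORDS`, the per-type probabilities in `ORG_TYPE_PROB`, the product used as the candidate's probability, the fixed `threshold`, and the function name `find_organizations`.

```python
# Hypothetical keyword list and probabilities, invented for illustration only.
ORG_KEYWORDS = {"university", "corporation"}
# ORG_TYPE_PROB[t]: assumed probability that an entity of type t occurs
# inside an organization name.
ORG_TYPE_PROB = {"LOCATION": 0.4, "PERSON": 0.3, "WORD": 0.05, "ORG_KEYWORD": 0.9}


def find_organizations(entities, max_span=4, threshold=0.1):
    """Return accepted (name, probability) pairs.

    `entities` is a list of (segment, entity_type) pairs produced by an
    earlier segmentation pass.
    """
    accepted = []
    for end, (segment_text, _) in enumerate(entities):
        if segment_text not in ORG_KEYWORDS:
            continue  # candidates are assumed to end in an organization keyword
        best = None
        # Try every span of up to max_span entities that ends at the keyword.
        for start in range(max(0, end - max_span + 1), end):
            span = entities[start:end + 1]
            prob = ORG_TYPE_PROB["ORG_KEYWORD"]
            for _, entity_type in span[:-1]:
                # Score the candidate from the entity types it contains.
                prob *= ORG_TYPE_PROB.get(entity_type, 0.0)
            if prob >= threshold and (best is None or prob > best[1]):
                best = (" ".join(text for text, _ in span), prob)
        if best is not None:
            accepted.append(best)
    return accepted
```

For the input `[("visited", "WORD"), ("boston", "LOCATION"), ("university", "WORD")]`, the sketch accepts "boston university" and rejects the longer candidate that starts at "visited", because the low WORD probability pushes that span below the threshold.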
Specification