PROPER NAME IDENTIFICATION IN CHINESE
Abstract
A word segmentation method to identify proper names in input text includes locating a sequence of single-characters in the input text not forming part of a multiple-character word. The method further includes comparing the sequence of single-characters to a lexical knowledge base to identify if a first portion of the sequence corresponds to stored identifiable portions of a proper name, and comparing the sequence of single-characters to the lexical knowledge base to identify if a second portion of the sequence proximate the first portion includes characters known to comprise a second portion of a proper name. Instructions can be provided on a computer readable medium to implement the method.
48 Claims
1. A computer readable medium including instructions readable by a computer which, when implemented, cause the computer to identify proper names in input text by performing steps comprising:
locating a sequence of single-characters in the input text not forming part of a multiple-character word;
comparing the sequence of single-characters to a lexical knowledge base to identify if a first portion of the sequence corresponds to stored identifiable portions of a proper name; and
comparing the sequence of single-characters to the lexical knowledge base to identify if a second portion of the sequence proximate the first portion includes characters known to comprise a second portion of a proper name.
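The three steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes the input has already been split into tokens by a dictionary-based segmenter, so that any leftover single-character tokens are the characters "not forming part of a multiple-character word". The sets `FIRST_PORTIONS` and `SECOND_PORTION_CHARS`, the function names, and the `max_second_len` limit are all hypothetical stand-ins for the lexical knowledge base.

```python
# Hypothetical toy knowledge base; the entries are illustrative assumptions only.
FIRST_PORTIONS = {"王", "李", "张"}                   # e.g. single-character family names
SECOND_PORTION_CHARS = {"伟", "明", "小", "红", "华"}  # characters seen in given names


def find_single_char_runs(tokens):
    """Yield (start, end) index pairs for maximal runs of single-character tokens."""
    start = None
    for i, tok in enumerate(tokens):
        if len(tok) == 1:
            if start is None:
                start = i
        elif start is not None:
            yield (start, i)
            start = None
    if start is not None:
        yield (start, len(tokens))


def identify_names(tokens, max_second_len=2):
    """Return candidate names found inside runs of single-character tokens."""
    names = []
    for start, end in find_single_char_runs(tokens):
        i = start
        while i < end:
            if tokens[i] in FIRST_PORTIONS:
                # Extend over adjacent characters known to occur in a second portion.
                j = i + 1
                while (j < end and j - (i + 1) < max_second_len
                       and tokens[j] in SECOND_PORTION_CHARS):
                    j += 1
                if j > i + 1:  # at least one second-portion character matched
                    names.append("".join(tokens[i:j]))
                    i = j
                    continue
            i += 1
    return names
```

For example, `identify_names(["我", "和", "王", "小", "明", "去", "北京"])` treats the first six tokens as one run of single characters and joins 王, 小 and 明 into one candidate name.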
6. The computer readable medium of claim 6 including instructions readable by a computer which, when implemented, cause the computer to perform a step comprising:
locating single-character words in the input text.
15. A computer readable medium including instructions readable by a computer which, when implemented, cause the computer to identify non-Chinese originated names contained in Chinese text by performing steps comprising:
locating a sequence of five or more single-characters in the input text not forming part of a multiple-character word; and
comparing the sequence of single-characters to a lexical knowledge base to identify if characters contained in the sequence of characters correspond to characters used in non-Chinese originated names.
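A rough sketch of the check in claim 15 follows, again assuming pre-segmented tokens. The character set `TRANSLITERATION_CHARS` and the function `find_foreign_names` are hypothetical. As a simplification, the sketch only reports runs in which every character belongs to the set, which is one possible reading of "correspond to characters used in non-Chinese originated names". The `min_len` parameter defaults to the five-character threshold of claim 15 and can be lowered to three for the variant in claim 44.

```python
# Hypothetical set of characters commonly used to transliterate foreign names.
TRANSLITERATION_CHARS = set("斯克尔德特拉罗巴马尼亚夫基")


def find_foreign_names(tokens, min_len=5):
    """Return runs of at least min_len single-character tokens drawn from the set."""
    names = []
    run = []
    for tok in tokens + [""]:  # the empty sentinel flushes the final run
        if len(tok) == 1 and tok in TRANSLITERATION_CHARS:
            run.append(tok)
        else:
            if len(run) >= min_len:
                names.append("".join(run))
            run = []
    return names
```

A length threshold of this kind keeps short coincidental runs, such as two adjacent single-character words, from being reported as names.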
20. A computer readable medium comprising a lexical knowledge base for use in identifying proper names in input text, the lexical knowledge base comprising:
for each of a plurality of words, an indication that the word corresponds to a first portion of a proper name; and
for each of a plurality of characters, an indication that the character is part of a second portion of a proper name.
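One way to picture the knowledge base of claim 20 is as two lookup sets, one keyed by words and one keyed by characters. The class below is only an illustrative sketch: the name `LexicalKnowledgeBase`, its fields, and its methods are assumptions, and the claim does not prescribe any particular storage layout.

```python
from dataclasses import dataclass, field


@dataclass
class LexicalKnowledgeBase:
    # Words flagged as a possible first portion of a proper name (hypothetical field).
    first_portion_words: set = field(default_factory=set)
    # Characters flagged as belonging to a second portion (hypothetical field).
    second_portion_chars: set = field(default_factory=set)

    def is_first_portion(self, word):
        """True if the word carries the first-portion indication."""
        return word in self.first_portion_words

    def is_second_portion_char(self, char):
        """True if the character carries the second-portion indication."""
        return char in self.second_portion_chars
```

A lookup then reduces to a set membership test, for instance `kb.is_first_portion("王")` after `kb.first_portion_words.add("王")`.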
28. A computer readable medium comprising a lexical knowledge base for use in identifying non-Chinese originated names in Chinese text, the lexical knowledge base comprising:
for each of a plurality of characters, an indication that the character is part of a non-Chinese originated name.
29. A computer readable medium including instructions readable by a computer which, when implemented, cause the computer to create a lexical knowledge base for identifying proper names in input text by performing steps comprising:
comparing a list of full proper names to be identified and a list of known portions of the full proper names and removing from each of the proper names any known portions contained therein to obtain a list comprising remaining portions of the full proper names; and
storing indications in the lexical knowledge base for the list of full proper names, for the list of known portions of the full proper names, for the list of remaining portions of the full proper names, and for positional information of characters in each of the remaining portions of the full proper names.
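The construction steps of claim 29 can be sketched as follows. This is a simplified illustration under a stated assumption: the known portion is taken to be a prefix of the full name (as with a family name written first), whereas the claim speaks more generally of "any known portions contained therein". The function `build_knowledge_base` and the dictionary keys are hypothetical names.

```python
from collections import defaultdict


def build_knowledge_base(full_names, known_portions):
    """Strip known leading portions from full names and record what remains."""
    # Try longer known portions first so that a two-character portion
    # is preferred over a one-character portion that it starts with.
    ordered = sorted(known_portions, key=len, reverse=True)

    remaining = []
    for name in full_names:
        for portion in ordered:
            if name.startswith(portion) and len(name) > len(portion):
                remaining.append(name[len(portion):])
                break

    # Positional information: for each character, the zero-based positions
    # at which it occurs within the remaining portions.
    char_positions = defaultdict(set)
    for rest in remaining:
        for index, char in enumerate(rest):
            char_positions[char].add(index)

    return {
        "full_names": set(full_names),
        "known_portions": set(known_portions),
        "remaining_portions": set(remaining),
        "char_positions": dict(char_positions),
    }
```

Given the names 王小明, 欧阳修 and 李华 with known portions 王, 欧阳, 李 and 欧, the remaining portions are 小明, 修 and 华, and the character 明 is recorded at position 1.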
33. A word segmentation method to identify proper names in input text, the method comprising:
locating a sequence of single-characters in the input text not forming part of a multiple-character word;
comparing the sequence of single-characters to a lexical knowledge base to identify if a first portion of the sequence corresponds to stored identifiable portions of a proper name; and
comparing the sequence of single-characters to the lexical knowledge base to identify if a second portion of the sequence proximate the first portion includes characters known to comprise a second portion of a proper name.
44. A word segmentation method to identify non-Chinese originated names contained in Chinese text, the method comprising:
locating a sequence of three or more single-characters in the input text not forming part of a multiple-character word; and
comparing the sequence of single-characters to a lexical knowledge base to identify if characters contained in the sequence of characters correspond to characters used in non-Chinese originated names.
Specification