Method and apparatus for identifying synonyms and using synonyms to search
First Claim
1. A method for identifying synonyms, the method comprising:
obtaining, by a server, any two words to be identified;
determining that a shortest edit distance between the two words is less than or equal to an edit distance threshold;
determining whether both of the two words exist in a preset knowledge database;
if at least one of the two words does not exist in the preset knowledge database, segmenting one or more unfound words;
determining whether all of the words after segmentation exist in the knowledge database; and
if all of the words after segmentation exist in the knowledge database, finding a smallest granularity type with highest weight value for each such word in the knowledge database; and
if both of the two words exist in the preset knowledge database, then finding the smallest granularity type with highest weight value for each word in the knowledge database;
determining whether the two words have a same smallest granularity type with highest weight value;
if the two words have the same smallest granularity type with highest weight value, then determining that the two words are synonyms; and
if the two words do not have the same smallest granularity type with highest weight value, then determining that the two words are non-synonyms.
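The edit-distance determination that gates the rest of the method can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the "shortest edit distance" is the standard Levenshtein distance, and the function names and the default threshold of 2 are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # deleting the first i characters of a
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def within_threshold(w1: str, w2: str, threshold: int = 2) -> bool:
    """True if the pair passes the edit-distance gate (threshold value is an assumption)."""
    return edit_distance(w1, w2) <= threshold
```

Only pairs that pass this gate would go on to the knowledge database lookup; a pair such as "colour" and "color" has distance 1 and passes under the assumed threshold.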
Abstract
A method and an apparatus for identifying synonyms and using such synonyms to conduct a search are disclosed. The disclosed method includes: obtaining any two words to be identified; determining whether a shortest edit distance between the two words is less than or equal to an edit distance threshold; determining whether the two words to be identified exist in a preset knowledge database and, if so, finding a smallest granularity type with highest weight value for each word in the knowledge database; and, if the two words have the same smallest granularity type with highest weight value, determining that the two words are synonyms, and otherwise determining that they are non-synonyms. The disclosed techniques improve the accuracy of synonym identification and help ensure its effectiveness.
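The type lookup and comparison described above can be sketched as follows. This is one possible reading, not the disclosed implementation: it assumes each knowledge database entry lists types with a numeric granularity (smaller meaning finer) and a weight, and that the "smallest granularity type with highest weight value" is the highest-weight type among those at the finest granularity. `TypeEntry`, `best_type`, `same_type`, and the contents of `TOY_KB` are all hypothetical and made up for illustration.

```python
from typing import Dict, List, NamedTuple, Optional


class TypeEntry(NamedTuple):
    name: str
    granularity: int  # assumption: a smaller number means a finer-grained type
    weight: float


KnowledgeDB = Dict[str, List[TypeEntry]]

# Made-up toy data, only to exercise the functions below.
TOY_KB: KnowledgeDB = {
    "colour": [TypeEntry("attribute", 2, 0.9),
               TypeEntry("visual property", 1, 0.7),
               TypeEntry("brand", 1, 0.2)],
    "color": [TypeEntry("visual property", 1, 0.8)],
    "collar": [TypeEntry("clothing part", 1, 0.9)],
}


def best_type(word: str, kb: KnowledgeDB) -> str:
    """Among the finest-granularity types of a word, return the highest-weight one."""
    finest = min(entry.granularity for entry in kb[word])
    candidates = [entry for entry in kb[word] if entry.granularity == finest]
    return max(candidates, key=lambda entry: entry.weight).name


def same_type(w1: str, w2: str, kb: KnowledgeDB) -> Optional[bool]:
    """True/False when both words are in the database; None when either is missing."""
    if w1 not in kb or w2 not in kb:
        return None  # this sketch leaves the undecided case to the caller
    return best_type(w1, kb) == best_type(w2, kb)
```

With the toy data, "colour" and "color" share the type "visual property" and would be treated as synonyms, while "colour" and "collar" would not.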
18 Claims
1. A method for identifying synonyms, the method comprising:
obtaining, by a server, any two words to be identified;
determining that a shortest edit distance between the two words is less than or equal to an edit distance threshold;
determining whether both of the two words exist in a preset knowledge database;
if at least one of the two words does not exist in the preset knowledge database, segmenting one or more unfound words;
determining whether all of the words after segmentation exist in the knowledge database; and
if all of the words after segmentation exist in the knowledge database, finding a smallest granularity type with highest weight value for each such word in the knowledge database; and
if both of the two words exist in the preset knowledge database, then finding the smallest granularity type with highest weight value for each word in the knowledge database;
determining whether the two words have a same smallest granularity type with highest weight value;
if the two words have the same smallest granularity type with highest weight value, then determining that the two words are synonyms; and
if the two words do not have the same smallest granularity type with highest weight value, then determining that the two words are non-synonyms.
Dependent claims: 2, 3, 4, 5, 6
7. An apparatus for identifying synonyms, the apparatus comprising:
a server, the server including hardware, the server storing:
a retrieval unit that obtains any two words to be identified;
a first determination unit that determines that a shortest edit distance between the two words is less than or equal to an edit distance threshold to inform a second determination unit;
the second determination unit that determines that both of the two words exist in a preset knowledge database and to inform a query unit;
the query unit that finds a smallest granularity type with highest weight value for each word in the knowledge database; and
a third determination unit that determines whether the two words have a same smallest granularity type with highest weight value,
if the two words have the same smallest granularity type with highest weight value, a common character table query unit that determines whether any changeable character or word of the two words is among changeable characters of a preset common character table,
if any changeable character or word of the two words is among changeable characters of the preset common character table, the common character table query unit informs the third determination unit to determine that the two words are synonyms; and
if any changeable character or word of the two words is not among changeable characters of the preset common character table, the common character table query unit informs the third determination unit to determine that the two words are non-synonyms; and
if the two words do not have the same smallest granularity type with highest weight value, the third determination unit determines that the two words are non-synonyms.
Dependent claims: 8, 9, 10, 11
12. A server comprising hardware configured to perform operations comprising:
obtaining two words to be identified;
determining that a shortest edit distance between the two words is less than or equal to an edit distance threshold;
determining whether both of the two words exist in a preset knowledge database;
if at least one of the two words does not exist in the preset knowledge database, segmenting one or more unfound words;
determining whether all of the words after segmentation exist in the knowledge database; and
if all of the words after segmentation exist in the knowledge database, finding a smallest granularity type with highest weight value for each such word in the knowledge database; and
if both of the two words exist in the preset knowledge database, finding the smallest granularity type with highest weight value for each word in the knowledge database;
determining whether the two words have a same smallest granularity type with highest weight value;
if the two words have the same smallest granularity type with highest weight value, determining that the two words are synonyms; and
if the two words do not have the same smallest granularity type with highest weight value, determining that the two words are non-synonyms.
Dependent claims: 13, 14, 15, 16, 17
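The segmentation branch recited in claims 1 and 12 can be illustrated with a short sketch. The claims do not name a segmentation algorithm, so forward maximum matching against the knowledge database vocabulary is swapped in here purely as an assumption; `segment_fmm` and `segments_all_known` are hypothetical names.

```python
from typing import Iterable, List, Optional


def segment_fmm(word: str, vocabulary: Iterable[str]) -> List[str]:
    """Greedy forward maximum matching: take the longest known piece at each position."""
    vocab = set(vocabulary)
    max_len = max([len(v) for v in vocab] + [1])
    pieces: List[str] = []
    i = 0
    while i < len(word):
        # Try the longest candidate first; a single character is always accepted
        # so that the scan makes progress even on unknown text.
        for size in range(min(max_len, len(word) - i), 0, -1):
            piece = word[i:i + size]
            if size == 1 or piece in vocab:
                pieces.append(piece)
                i += size
                break
    return pieces


def segments_all_known(word: str, vocabulary: Iterable[str]) -> Optional[List[str]]:
    """Return the segments if every one exists in the vocabulary, else None."""
    vocab = set(vocabulary)
    pieces = segment_fmm(word, vocab)
    return pieces if all(p in vocab for p in pieces) else None
```

When `segments_all_known` returns a list, each segment could then be looked up for its smallest granularity type with highest weight value; how the claims treat a word whose segments are not all found is not stated, so the sketch simply returns `None` in that case.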
18. A method for identifying synonyms, the method comprising:
obtaining, by a server, any two words to be identified;
determining that a shortest edit distance between the two words is less than or equal to an edit distance threshold;
determining whether both of the two words exist in a preset knowledge database;
if both of the two words exist in the preset knowledge database, then finding a smallest granularity type with highest weight value for each word in the knowledge database;
determining whether the two words have a same smallest granularity type with highest weight value;
if the two words have the same smallest granularity type with highest weight value, determining whether any changeable character or word of the two words is among changeable characters of a preset common character table;
if any changeable character or word of the two words is among changeable characters of the preset common character table, determining that the two words are synonyms; and
if any changeable character or word of the two words is not among changeable characters of the preset common character table, determining that the two words are non-synonyms; and
if the two words do not have the same smallest granularity type with highest weight value, determining that the two words are non-synonyms.
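The common character table check in claims 7 and 18 can be sketched as follows. The claims do not say how the "changeable" characters of a pair are extracted, so this sketch assumes they are the fragments that differ between the two words, found with Python's `difflib.SequenceMatcher`; the function names and the table contents used in the example are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Set


def changeable_fragments(w1: str, w2: str) -> List[str]:
    """Collect the substrings that differ between the two words (assumed reading)."""
    fragments: List[str] = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, w1, w2).get_opcodes():
        if tag == "equal":
            continue
        if i2 > i1:
            fragments.append(w1[i1:i2])  # text present only in w1 at this position
        if j2 > j1:
            fragments.append(w2[j1:j2])  # text present only in w2 at this position
    return fragments


def passes_common_character_check(w1: str, w2: str, table: Set[str]) -> bool:
    """True if any differing fragment appears in the preset common character table."""
    return any(fragment in table for fragment in changeable_fragments(w1, w2))
```

For example, "colour" and "color" differ only by the fragment "u", so the pair passes when the table contains "u" and fails otherwise. The use of `any` follows the claim wording literally; a stricter variant would require every differing fragment to be in the table.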
Specification