Method and apparatus for identifying synonyms and using synonyms to search
First Claim
1. A computer implemented method for identifying synonyms, the method comprising:
- obtaining, by a server, a first word and a second word, each of the first word and the second word including at least one term;
determining that a shortest edit distance between the first word and the second word is less than or equal to an edit distance threshold;
determining whether both of the first word and the second word exist in a preset knowledge database;
in response to determining at least the first word does not exist in the preset knowledge database, segmenting the first word to obtain one or more terms included in the first word;
determining whether the one or more terms after segmentation exist in the preset knowledge database; and
searching, in response to determining that the one or more terms after segmentation exist in the preset knowledge database, a smallest granularity type with a highest weight value for each of the one or more terms in the preset knowledge database;
finding, in response to determining that both of the first word and the second word exist in the preset knowledge database, the smallest granularity type with the highest weight value for each of the first word and the second word in the preset knowledge database; and
determining whether the first word and second word have a same smallest granularity type with a highest weight value including, determining that the first word and the second word are synonyms, in response to determining that the first word and the second word have the same smallest granularity type with the highest weight value; and
determining that the two words are non-synonyms, in response to determining that the first word and the second word do not have the same smallest granularity type with the highest weight value.
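The knowledge-database steps of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes the edit-distance check has already passed, and it models the preset knowledge database as a hypothetical dictionary mapping each term to a list of `(type, granularity, weight)` entries, where a smaller granularity number means a finer-grained type. The names `best_type`, `type_signature`, and `same_type_decision`, the whitespace segmentation, the tuple comparison for segmented words, and the `False` result for unknown terms are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical knowledge database layout (an assumption, not from the patent):
# term -> list of (type name, granularity level, weight).
# A smaller granularity level means a finer-grained ("smaller") type.
KnowledgeDB = Dict[str, List[Tuple[str, int, float]]]


def best_type(term: str, db: KnowledgeDB) -> str:
    """Return the smallest-granularity type, breaking ties by highest weight."""
    name, _, _ = min(db[term], key=lambda entry: (entry[1], -entry[2]))
    return name


def type_signature(word: str, db: KnowledgeDB) -> Optional[Tuple[str, ...]]:
    """Look up the word directly, or fall back to its segmented terms."""
    if db.get(word):
        return (best_type(word, db),)
    # Assumed segmentation: split on whitespace. The claim only says
    # "segmenting"; a real system would likely use a word segmenter.
    terms = word.split()
    if terms and all(db.get(t) for t in terms):
        return tuple(best_type(t, db) for t in terms)
    return None  # the word and its terms are unknown to the database


def same_type_decision(first: str, second: str, db: KnowledgeDB) -> bool:
    """Decide synonym vs. non-synonym for a pair that passed the distance check."""
    sig_first = type_signature(first, db)
    sig_second = type_signature(second, db)
    if sig_first is None or sig_second is None:
        # Assumption: the claim leaves this case open; treat as non-synonyms.
        return False
    return sig_first == sig_second
```

With a toy database in which "apple" and "apples" both resolve to the type "fruit", `same_type_decision("apple", "apples", db)` returns `True`, while a pair resolving to different types returns `False`.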
Abstract
A method and an apparatus for identifying synonyms and using the synonyms to conduct a search are disclosed. The disclosed method includes: obtaining any two words to be identified; determining whether a shortest edit distance between the two words is less than or equal to an edit distance threshold; determining whether the two words to be identified exist in a preset knowledge database and, if so, searching the knowledge database for a smallest granularity type with a highest weight value for each word; and, if the two words have the same smallest granularity type with the highest weight value, determining that the two words are synonyms, and otherwise determining that they are non-synonyms. The disclosed techniques are stated to improve the accuracy and effectiveness of synonym identification.
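The edit-distance check described in the abstract can be sketched as follows. This is a minimal illustration that uses the standard Levenshtein distance (insertions, deletions, and substitutions each cost 1). The source does not say which edit operations are counted or what threshold value is used, so the function names and the default `threshold=1` are assumptions.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row dynamic program."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ch_a in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(
                prev[j] + 1,         # delete ch_a
                curr[j - 1] + 1,     # insert ch_b
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def within_threshold(first: str, second: str, threshold: int = 1) -> bool:
    """True when the pair is close enough to be a synonym candidate.

    The default threshold is an illustrative assumption.
    """
    return edit_distance(first, second) <= threshold
```

Only pairs for which `within_threshold` returns `True` would go on to the knowledge-database comparison, which keeps the more expensive lookups limited to near-identical spellings.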
30 Citations
16 Claims
1. A computer implemented method for identifying synonyms, the method comprising:
obtaining, by a server, a first word and a second word, each of the first word and the second word including at least one term;
determining that a shortest edit distance between the first word and the second word is less than or equal to an edit distance threshold;
determining whether both of the first word and the second word exist in a preset knowledge database;
in response to determining at least the first word does not exist in the preset knowledge database, segmenting the first word to obtain one or more terms included in the first word;
determining whether the one or more terms after segmentation exist in the preset knowledge database; and
searching, in response to determining that the one or more terms after segmentation exist in the preset knowledge database, a smallest granularity type with a highest weight value for each of the one or more terms in the preset knowledge database;
finding, in response to determining that both of the first word and the second word exist in the preset knowledge database, the smallest granularity type with the highest weight value for each of the first word and the second word in the preset knowledge database; and
determining whether the first word and second word have a same smallest granularity type with a highest weight value including, determining that the first word and the second word are synonyms, in response to determining that the first word and the second word have the same smallest granularity type with the highest weight value; and
determining that the two words are non-synonyms, in response to determining that the first word and the second word do not have the same smallest granularity type with the highest weight value.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. An apparatus for identifying synonyms, the apparatus comprising:
a processor;
a memory device communicatively coupled with the processor; and
a server storing;
a retrieval unit that obtains a first word and a second word, each of the first word and the second word including at least one term;
a first determination unit that determines that a shortest edit distance between the first word and the second word is less than or equal to an edit distance threshold;
a second determination unit that determines whether both of the first word and the second word exist in a preset knowledge database;
a query unit that finds a smallest granularity type with a highest weight value for each of the first word and the second word in the preset knowledge database, in response to determining that both of the first word and the second word exist in the preset knowledge database;
a segmentation unit that segments the first word to obtain one or more terms included in the first word and informs the second determination unit;
wherein the second determination unit further determines if all of the one or more terms after segmentation exist in the preset knowledge database, informs the query unit; and
determines if not all of the one or more terms after segmentation exist in the preset knowledge database, informs the segmentation unit, in response to determining that at least the first word does not exist in the preset knowledge database; and
a third determination unit that determines that the first word and the second word are synonyms when the first word and the second word have a same smallest granularity type with a highest weight value, and that the first word and the second word are non-synonyms when the first word and the second word do not have the same smallest granularity type with the highest weight value.
Dependent claims: 12, 13, 14
15. One or more non-transitory computer-readable storage media having stored thereon computer executable units that are executable to perform actions comprising:
obtaining a query log of a search engine;
determining a threshold of a ranking of queries in the query log;
selecting a plurality of queries with rankings higher than the threshold;
obtaining a first word and a second word from the plurality of queries;
determining that a shortest edit distance between the first word and the second word is less than or equal to an edit distance threshold;
determining whether both of the first word and the second word exist in a preset knowledge database;
in response to determining at least the first word does not exist in the preset knowledge database, segmenting the first word to obtain one or more terms; and
determining whether the one or more terms after segmentation exist in the preset knowledge database, in response to determining that the one or more terms after segmentation exist in the preset knowledge database, searching a smallest granularity type with a highest weight value for each of the one or more terms in the preset knowledge database;
in response to determining that both of the first word and the second word exist in the preset knowledge database, finding the smallest granularity type with the highest weight value for each of the first word and the second word in the preset knowledge database;
determining whether the two words have a same smallest granularity type with a highest weight value;
in response to determining that the first word and the second word have the same smallest granularity type with the highest weight value, determining that the first word and the second word are synonyms; and
in response to determining that the first word and the second word do not have the same smallest granularity type with the highest weight value, determining that the two words are non-synonyms.
Dependent claims: 16
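The query-log steps that open claim 15 can be sketched as below. This is one possible reading, not the claimed implementation: it assumes that queries are ranked by how often they occur in the log, and that "rankings higher than the threshold" means keeping the top `rank_threshold` queries. The function names, the lowercase normalization, and the pairing of every two selected queries are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Tuple


def top_queries(query_log: Iterable[str], rank_threshold: int) -> List[str]:
    """Rank distinct queries by frequency and keep the top `rank_threshold`.

    Assumption: rank is by occurrence count; the claim does not define it.
    """
    counts = Counter(q.strip().lower() for q in query_log if q.strip())
    ranked = [query for query, _ in counts.most_common()]
    return ranked[:rank_threshold]


def candidate_pairs(queries: List[str]) -> List[Tuple[str, str]]:
    """Every unordered pair of selected queries is a (first, second) candidate."""
    return list(combinations(queries, 2))
```

Each candidate pair would then go through the edit-distance check and the knowledge-database comparison recited in the rest of the claim. Restricting the input to high-ranking queries keeps the number of pairs manageable, since the pair count grows quadratically with the number of selected queries.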
Specification