Method and apparatus for determining a measure of similarity between natural language sentences
Abstract
Systems and methods for classifying natural language (NL) sentences using a combination of NL algorithms or techniques are disclosed. Each NL algorithm or technique may identify a different similarity trait between two or more sentences, and each may help compare the meaning of the sentences. By combining the various similarity factors, preferably by weighting each factor, a distance metric can be computed. The distance metric provides a measure of the overall similarity between sentences and can be used to assign a sentence to an appropriate sentence category.
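The abstract does not say how the weighted factors become a distance or how the distance selects a category. The sketch below is one possible reading, not the patented method. It assumes every similarity factor is already scaled to [0, 1], takes the weighted average as the overall similarity, and uses its complement as the distance. A sentence is then assigned to the category with the smallest distance. The function names `distance_metric` and `assign_category` are hypothetical.

```python
from typing import Dict, List


def distance_metric(factors: List[float], weights: List[float]) -> float:
    """Combine similarity factors (assumed to lie in [0, 1]) into a distance.

    The weighted average of the factors is treated as the overall similarity,
    and the distance is its complement, so 0.0 means "identical under every
    factor". This scaling is an assumption, not taken from the patent.
    """
    if len(factors) != len(weights):
        raise ValueError("need exactly one weight per similarity factor")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    similarity = sum(f * w for f, w in zip(factors, weights)) / total
    return 1.0 - similarity


def assign_category(distances: Dict[str, float]) -> str:
    """Return the category whose reference sentence is closest (smallest distance)."""
    return min(distances, key=distances.get)
```

For example, `distance_metric([0.8, 0.5], [3.0, 1.0])` weights the first factor three times as heavily as the second. The result is a distance of about 0.275. Given per-category distances such as `{"billing": 0.275, "shipping": 0.6}`, `assign_category` picks `"billing"`.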
147 Citations
54 Claims
1. A method for providing a measure of similarity between a pair of sentences, each sentence having a number of words and phrases, the method comprising:
calculating multiple similarity factors, based on selections of words in the first sentence and the second sentence; and
generating a distance metric using one or more of the similarity factors, the distance metric representing a measure of the similarity between the first sentence and the second sentence.
Dependent claims: 2-21.
22. A method for determining a distance metric that indicates the similarity of a first sentence and a second sentence, the method comprising the steps of:
comparing the first sentence and the second sentence using a first comparing function, the first comparing function providing a first similarity indicator;
comparing the first sentence and the second sentence using a second comparing function, the second comparing function providing a second similarity indicator; and
calculating the distance metric by combining the first similarity indicator and the second similarity indicator.
Dependent claims: 23-45, 47, 49.
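Claim 22 leaves the two comparing functions and the combining rule open. The sketch below picks two simple functions purely for illustration. The first is the Jaccard overlap of the two word sets. The second is the ratio of the shorter sentence's word count to the longer one's. The sketch combines them as a weighted average and subtracts the result from 1. The function names, the default weights `w1=0.7` and `w2=0.3`, and the whitespace tokenization are all hypothetical choices.

```python
def word_set_overlap(a: str, b: str) -> float:
    """First comparing function (hypothetical): Jaccard overlap of lowercase word sets."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 1.0  # two empty sentences are treated as identical
    return len(words_a & words_b) / len(words_a | words_b)


def length_ratio(a: str, b: str) -> float:
    """Second comparing function (hypothetical): shorter word count over longer word count."""
    len_a, len_b = len(a.split()), len(b.split())
    if max(len_a, len_b) == 0:
        return 1.0
    return min(len_a, len_b) / max(len_a, len_b)


def combined_distance(a: str, b: str, w1: float = 0.7, w2: float = 0.3) -> float:
    """Combine both similarity indicators into one distance (0.0 = most similar)."""
    s1 = word_set_overlap(a, b)  # first similarity indicator
    s2 = length_ratio(a, b)      # second similarity indicator
    return 1.0 - (w1 * s1 + w2 * s2) / (w1 + w2)
```

For "the cat sat" and "the cat ran", the word overlap is 0.5 and the length ratio is 1.0, so `combined_distance` returns about 0.35. Any other pair of comparing functions that return values in [0, 1] could be dropped in without changing the combining step.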
46. A method for comparing the similarity of a first text block and a second text block, each text block having a number of words, the method comprising:
providing a list of correlating words, each correlating word having a corresponding correlation factor that provides a measure of correlation of the correlating word to a particular category;
providing a similarity factor; and
for each correlating word, increasing the similarity factor by the corresponding correlation factor when both the first text block and second text block include the correlating word, and decreasing the similarity factor by the corresponding correlation factor when only one of the first and second text blocks includes the correlating word, the similarity factor providing a measure of the similarity between the first text block and the second text block.
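The update rule in claim 46 can be sketched directly. The sketch makes three assumptions not stated in the claim: the similarity factor starts at 0.0, text blocks are split on whitespace after lowercasing, and the correlating words arrive as a plain dictionary. The function name `correlation_similarity` is hypothetical.

```python
from typing import Dict


def correlation_similarity(block_a: str, block_b: str,
                           correlating_words: Dict[str, float]) -> float:
    """Score two text blocks using per-word correlation factors.

    A correlating word found in both blocks raises the score by its factor.
    A word found in exactly one block lowers the score by its factor.
    A word found in neither block leaves the score unchanged.
    """
    words_a = set(block_a.lower().split())
    words_b = set(block_b.lower().split())
    similarity = 0.0  # assumed starting value for the provided similarity factor
    for word, factor in correlating_words.items():
        in_a, in_b = word in words_a, word in words_b
        if in_a and in_b:
            similarity += factor
        elif in_a or in_b:  # exactly one of the blocks contains the word
            similarity -= factor
    return similarity
```

With the factors `{"refund": 2.0, "invoice": 1.5, "late": 0.5}`, consider "I want a refund for this invoice" and "please refund the late invoice". Both blocks contain "refund" (+2.0) and "invoice" (+1.5), but only the second contains "late" (-0.5), so the score is 3.0. Claim 54 applies the same increase/decrease rule to sentences and "model words", so this sketch covers that case as well.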
48. A method for categorizing sentences in a document, comprising:
processing a number of selected sentences, said processing step assigning each of the number of selected sentences to one or more predefined categories, the particulars of the assignment of the number of selected sentences being dependent on a number of operating parameters;
displaying a correspondence between the one or more predefined categories and the selected sentences;
allowing a user to change the assigned category for selected sentences;
updating one or more of the operating parameters to reflect the change in the assigned categories; and
repeating the processing, displaying, allowing and updating steps until a desired accuracy level is achieved.
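The loop in claim 48 can be sketched as follows. Two assumptions are loud here. First, the "operating parameters" are modeled as hypothetical per-category word weights. Second, the user is simulated by a dictionary of correct labels (`user_labels`) so that the example runs without interaction. Processing scores each sentence by summing its word weights per category. Updating adds 1.0 to each word's weight in the corrected category and subtracts 1.0 in the wrongly assigned one. The `max_rounds` cap is an extra safeguard not found in the claim.

```python
from typing import Dict, List

# Hypothetical operating parameters: category -> word -> weight.
Params = Dict[str, Dict[str, float]]


def classify(sentence: str, params: Params) -> str:
    """Assign the category whose word weights sum highest (ties go to the first category)."""
    words = sentence.lower().split()
    return max(params, key=lambda cat: sum(params[cat].get(w, 0.0) for w in words))


def training_loop(sentences: List[str], user_labels: Dict[str, str], params: Params,
                  target_accuracy: float = 1.0, max_rounds: int = 10) -> Params:
    """Process, display, collect corrections and update until accuracy is reached."""
    for _ in range(max_rounds):
        assigned = {s: classify(s, params) for s in sentences}   # processing step
        for s, cat in assigned.items():                          # displaying step
            print(f"{cat:>10} | {s}")
        # Simulated user: any sentence whose label differs from the assignment is corrected.
        corrections = {s: user_labels[s] for s in sentences if user_labels[s] != assigned[s]}
        accuracy = 1 - len(corrections) / len(sentences)
        if accuracy >= target_accuracy:
            break
        for s, correct in corrections.items():                   # updating step
            for w in s.lower().split():
                params[correct][w] = params[correct].get(w, 0.0) + 1.0
                params[assigned[s]][w] = params[assigned[s]].get(w, 0.0) - 1.0
    return params
```

Starting from `{"billing": {"invoice": 1.0}, "shipping": {"parcel": 1.0}}`, the sentence "package never arrived" first ties and lands in "billing". After one correction its words are weighted toward "shipping", and the second round classifies every sentence correctly. Every category named in `user_labels` must already be a key of `params`.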
50. A method for providing a measure of similarity between a first sentence and a second sentence, each sentence having a number of words, the method comprising:
comparing one or more words of the first sentence with one or more words of the second sentence, the comparing step returning how many words in the first sentence match a word in the second sentence, as a percentage of the number of words in the longer of the first sentence and second sentence.
Dependent claims: 51.
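The percentage in claim 50 can be sketched in a few lines. The sketch rests on three assumptions. Matching is exact after lowercasing and whitespace splitting. Each word token of the first sentence is counted separately, so a repeated word can match more than once. Two empty inputs score 0.0. The function name `word_match_percentage` is hypothetical.

```python
def word_match_percentage(first: str, second: str) -> float:
    """Percentage of first-sentence words found in the second, relative to the longer sentence."""
    first_words = first.lower().split()
    second_words = second.lower().split()
    longer = max(len(first_words), len(second_words))
    if longer == 0:
        return 0.0  # assumption: two empty inputs are treated as having no match
    second_vocab = set(second_words)
    # Count each word token of the first sentence that also occurs in the second.
    matches = sum(1 for w in first_words if w in second_vocab)
    return 100.0 * matches / longer
```

For "the cat sat on the mat" against "a cat sat", two of the first sentence's tokens match ("cat" and "sat"). The longer sentence has six words, so the result is about 33.3. Dividing by the longer sentence penalizes pairs of very different length. The measure is also asymmetric, because only the first sentence's words are counted. Claim 52 states the same computation for phrases, so the sketch applies to phrases unchanged.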
52. A method for providing a measure of similarity between a first phrase and a second phrase, each phrase having a number of words, the method comprising:
comparing one or more words of the first phrase with one or more words of the second phrase, the comparing step returning how many words in the first phrase match a word in the second phrase, as a percentage of the number of words in the longer of the first phrase and second phrase.
Dependent claims: 53.
54. A method for providing a sentence similarity value for a first sentence and a second sentence, each sentence having a number of words, the method comprising:
providing a number of model words, each model word having a corresponding correlation factor; and
computing a sentence similarity value by increasing the sentence similarity value by the correlation factor of a model word when both the first sentence and second sentence include the model word, and decreasing the sentence similarity value by the correlation factor of a model word when only one of the first and second sentences includes the model word.
Specification