Method and apparatus for determining a measure of similarity between natural language sentences
Abstract
Systems and methods for classifying natural language (NL) sentences using a combination of NL algorithms or techniques are disclosed. Each NL algorithm or technique may identify a different similarity trait between two or more sentences, and each may help compare the meaning of the sentences. By combining the various similarity factors, preferably using weighting factors, a distance metric can be computed. The distance metric provides a measure of the overall similarity between sentences and can be used to assign a sentence to an appropriate sentence category.
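The weighted combination described in the abstract can be sketched as follows. This is a minimal illustration, not the patented method: it assumes every similarity factor is normalized to the range [0, 1] and that the distance metric is one minus the weighted mean of the factors. The factor names and weights are hypothetical.

```python
from typing import Mapping


def distance_metric(similarities: Mapping[str, float],
                    weights: Mapping[str, float]) -> float:
    """Combine similarity factors into a single distance.

    Assumption: each factor lies in [0, 1] and the distance is
    1 - (weighted mean of the factors), so identical sentences score 0.0
    and completely dissimilar sentences score 1.0.
    """
    # Only the weights of the factors actually supplied contribute.
    total_weight = sum(weights[name] for name in similarities)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(weights[name] * value for name, value in similarities.items())
    return 1.0 - weighted / total_weight
```

For example, a string-overlap factor of 0.5 weighted 1 and a keyword-match factor of 1.0 weighted 3 give a distance of 1 - 3.5/4 = 0.125. Raising a factor's weight makes the distance more sensitive to that trait.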
Claims (38)
1. A method for categorizing a sentence into one of two or more sentence categories, wherein each sentence category has at least one associated categorized sentence, and wherein the sentence and each categorized sentence has a number of words and/or phrases, the method comprising:
calculating one or more similarity factors between the sentence and at least one categorized sentence in each of the two or more sentence categories based on selections of words in the sentence and the at least one categorized sentence in each of the two or more sentence categories;
generating a distance metric for each of the sentence categories using one or more of the similarity factors, the distance metric representing a measure of the similarity between the sentence and the at least one categorized sentence in each of the two or more sentence categories;
categorizing the sentence into one of the sentence categories based on the distance metrics; and
providing a result that at least in part reflects the categorization of the sentence to a user and/or computer.
Dependent claims: 2 to 21.
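The categorization step of claim 1 can be sketched as a nearest-category rule. This is an illustrative sketch under stated assumptions: a category's distance metric is taken to be the smallest distance between the new sentence and any of that category's categorized sentences, and the sentence goes to the category with the smallest metric. The `word_overlap_distance` helper is a hypothetical stand-in for the combined similarity factors.

```python
from typing import Callable, Dict, List, Optional


def categorize(sentence: str,
               categories: Dict[str, List[str]],
               distance: Callable[[str, str], float]) -> str:
    """Return the category whose closest categorized sentence is nearest.

    Assumption: per-category distance = minimum distance to its examples.
    """
    best: Optional[str] = None
    best_distance = float("inf")
    for name, examples in categories.items():
        if not examples:
            continue  # each category should have at least one example
        category_distance = min(distance(sentence, ex) for ex in examples)
        if category_distance < best_distance:
            best, best_distance = name, category_distance
    if best is None:
        raise ValueError("no category has a categorized sentence")
    return best


def word_overlap_distance(a: str, b: str) -> float:
    """Hypothetical demo distance: 1 - Jaccard overlap of lowercase words."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa and not wb:
        return 0.0
    return 1.0 - len(wa & wb) / len(wa | wb)
```

With categories `{"billing": ["my invoice is wrong"], "shipping": ["where is my package"]}`, the sentence "the invoice amount is wrong" is 0.5 from the billing example and 0.875 from the shipping example, so it is assigned to `"billing"`.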
22. A method for determining a distance metric that indicates the similarity of a first sentence and a second sentence, wherein the first sentence and the second sentence are extracted from one or more documents, the method comprising the steps of:
comparing the first sentence and the second sentence using a first comparing function, the first comparing function including a string overlap function and providing a first similarity indicator, the string overlap function returning how many words in the first sentence match a word in the second sentence, as a percentage of the number of words in the longer of the first sentence and the second sentence;
comparing the first sentence and the second sentence using a second comparing function, the second comparing function providing a second similarity indicator;
calculating the distance metric by combining the first similarity indicator and the second similarity indicator; and
providing a result that at least in part reflects the distance metric to a user and/or computer.
Dependent claims: 23 and 24.
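The string overlap function of claim 22 (also the comparison in claim 34) can be sketched as below. The tokenization is an assumption not stated in the claim: words are lowercased alphanumeric runs with punctuation discarded, and a word "matches" when the identical token appears anywhere in the other sentence.

```python
import re


def tokenize(sentence: str) -> list[str]:
    # Assumption: lowercase alphanumeric tokens; punctuation is discarded.
    return re.findall(r"[a-z0-9']+", sentence.lower())


def string_overlap(first: str, second: str) -> float:
    """Percentage of words in `first` that match some word in `second`,
    relative to the word count of the longer sentence."""
    w1, w2 = tokenize(first), tokenize(second)
    longer = max(len(w1), len(w2))
    if longer == 0:
        return 0.0
    second_words = set(w2)
    matches = sum(1 for w in w1 if w in second_words)
    return 100.0 * matches / longer
```

For "The cat sat on the mat" against "A cat sat on a mat", four of the six words in the first sentence match and the longer sentence has six words, giving about 66.7 percent. Dividing by the longer sentence keeps a short sentence from scoring 100 percent against a long one that merely contains it.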
25. A method for determining a distance metric that indicates the similarity of a first sentence and a second sentence, wherein the first sentence and the second sentence are extracted from one or more documents, the method comprising the steps of:
comparing the first sentence and the second sentence using a first comparing function, the first comparing function including a keyword match function and providing a first similarity indicator, wherein the keyword match function returns how many words in the first sentence match a word in the second sentence with both words having a common word type, as a percentage of the greater of the number of keywords in the first sentence and the number of keywords in the second sentence;
comparing the first sentence and the second sentence using a second comparing function, the second comparing function providing a second similarity indicator;
calculating the distance metric by combining the first similarity indicator and the second similarity indicator; and
providing a result that at least in part reflects the distance metric to a user and/or computer.
Dependent claims: 26 to 28.
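A sketch of the keyword match function of claim 25 follows. It assumes that "word type" means a part-of-speech tag and that sentences arrive already tagged as `(word, tag)` pairs, so no tagger is needed here. The set of tags treated as keywords (`KEYWORD_TYPES`) is a hypothetical choice.

```python
from typing import List, Tuple

TaggedSentence = List[Tuple[str, str]]  # (word, word_type) pairs

# Hypothetical choice of which word types count as keywords.
KEYWORD_TYPES = {"NOUN", "VERB", "ADJ"}


def keywords(sentence: TaggedSentence) -> List[Tuple[str, str]]:
    """Keep only keyword tokens, lowercased, together with their word type."""
    return [(w.lower(), t) for w, t in sentence if t in KEYWORD_TYPES]


def keyword_match(first: TaggedSentence, second: TaggedSentence) -> float:
    """Percentage of keywords in `first` that match a keyword in `second`
    with the same word type, relative to the larger keyword count."""
    k1, k2 = keywords(first), keywords(second)
    denominator = max(len(k1), len(k2))
    if denominator == 0:
        return 0.0
    second_set = set(k2)  # a match requires the same word AND the same type
    matches = sum(1 for pair in k1 if pair in second_set)
    return 100.0 * matches / denominator
```

Requiring a common word type means "book" used as a verb ("Book a flight") does not match "book" used as a noun ("I read a book"), which plain string overlap would count as a match.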
29. A method for determining a distance metric that indicates the similarity of a first sentence and a second sentence, wherein the first sentence and the second sentence are extracted from one or more documents, the method comprising the steps of:
comparing the first sentence and the second sentence using a first comparing function, the first comparing function including a phrase match function and providing a first similarity indicator, wherein the phrase match function determines a phrase similarity value for each phrase pair by identifying how many words in a first phrase match a word in a second phrase, as a percentage of the greater of the number of words in the first phrase and the number of words in the second phrase;
comparing the first sentence and the second sentence using a second comparing function, the second comparing function providing a second similarity indicator;
calculating the distance metric by combining the first similarity indicator and the second similarity indicator; and
providing a result that at least in part reflects the distance metric to a user and/or computer.
Dependent claims: 30 and 31.
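The per-pair phrase similarity value of claim 29 (and the phrase comparison of claim 36) can be sketched as below. The sketch assumes each sentence has already been split into phrases. The claim does not say how the pair values become a single similarity indicator, so `phrase_match` uses a hypothetical aggregation: the mean, over the first sentence's phrases, of each phrase's best pair value.

```python
from itertools import product
from typing import Dict, List, Tuple


def phrase_similarity(p1: str, p2: str) -> float:
    """Percentage of words in p1 matching a word in p2, relative to the
    word count of the longer phrase."""
    w1, w2 = p1.lower().split(), p2.lower().split()
    denominator = max(len(w1), len(w2))
    if denominator == 0:
        return 0.0
    w2_set = set(w2)
    return 100.0 * sum(1 for w in w1 if w in w2_set) / denominator


def phrase_pair_values(phrases1: List[str],
                       phrases2: List[str]) -> Dict[Tuple[str, str], float]:
    """Phrase similarity value for every (first phrase, second phrase) pair."""
    return {(a, b): phrase_similarity(a, b) for a, b in product(phrases1, phrases2)}


def phrase_match(phrases1: List[str], phrases2: List[str]) -> float:
    """Hypothetical aggregation: mean of each first phrase's best pair value."""
    if not phrases1 or not phrases2:
        return 0.0
    values = phrase_pair_values(phrases1, phrases2)
    best = [max(values[(a, b)] for b in phrases2) for a in phrases1]
    return sum(best) / len(phrases1)
```

For the phrases `["the red car", "drove away"]` and `["a red car", "sped away quickly"]`, the best pair values are about 66.7 and 33.3, so the aggregated indicator is about 50.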
32. A method for categorizing sentences in a document, comprising:
processing a number of selected sentences, said processing step assigning each of the number of selected sentences to one or more predefined categories without using a user-entered query, the particulars of the assignment of the number of selected sentences being dependent on a number of operating parameters;
displaying a correspondence between the one or more predefined categories and the selected sentences;
allowing a user to change the assigned category for selected sentences;
updating one or more of the operating parameters to reflect the change in the assigned categories; and
repeating the processing, displaying, allowing and updating steps until a desired accuracy level is achieved.
Dependent claim: 33.
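The feedback loop of claim 32 can be sketched as below. Everything concrete here is an assumption for illustration: the operating parameters are a hypothetical keyword set per category, a sentence goes to the category sharing the most keywords, the user is modeled as a `review` callback that returns corrected labels, and accuracy is the fraction of assignments the user left unchanged.

```python
from typing import Callable, Dict, List, Set

Params = Dict[str, Set[str]]  # hypothetical: keyword set per category


def assign(sentence: str, params: Params) -> str:
    """Pick the category sharing the most keywords (ties go to the first)."""
    words = set(sentence.lower().split())
    return max(params, key=lambda cat: len(words & params[cat]))


def refine(sentences: List[str], params: Params,
           review: Callable[[Dict[str, str]], Dict[str, str]],
           target_accuracy: float = 1.0, max_rounds: int = 10) -> Params:
    if not sentences:
        return params
    for _ in range(max_rounds):
        # Processing step: assign each sentence without a user query.
        assigned = {s: assign(s, params) for s in sentences}
        # Displaying step: show the category/sentence correspondence.
        for s, cat in assigned.items():
            print(f"{cat:>10} | {s}")
        # Allowing step: the user may change any assigned category.
        corrected = review(assigned)
        changed = [s for s in sentences if corrected[s] != assigned[s]]
        accuracy = 1 - len(changed) / len(sentences)
        if accuracy >= target_accuracy:
            break
        # Updating step: add corrected sentences' words to their category.
        for s in changed:
            params.setdefault(corrected[s], set()).update(s.lower().split())
    return params
```

Starting from `{"billing": {"invoice"}, "shipping": {"package"}}`, "parcel is late" is first misassigned to billing; after the user's correction its words are added to the shipping keywords, and the second round reaches the target accuracy.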
34. A method for providing a measure of similarity between a first sentence and a second sentence, each sentence having a number of words, the method comprising:
comparing one or more words of the first sentence with one or more words of the second sentence, the comparing step returning how many words in the first sentence match a word in the second sentence, as a percentage of the number of words in the longer of the first sentence and the second sentence; and
providing a result that is related to the comparing step to a user and/or computer.
Dependent claim: 35.
36. A method for providing a measure of similarity between a first phrase and a second phrase, each phrase having a number of words, the method comprising:
comparing one or more words of the first phrase with one or more words of the second phrase, the comparing step returning how many words in the first phrase match a word in the second phrase, as a percentage of the number of words in the longer of the first phrase and the second phrase; and
providing a result that is related to the comparing step to a user and/or computer.
Dependent claim: 37.
38. A method of comparing a first sentence and a second sentence, the method comprising the steps of:
comparing a word in the first sentence with a word in the second sentence;
determining that the word in the first sentence matches the word in the second sentence if the word in the first sentence and the word in the second sentence each have a minimum of four characters, and the word in the first sentence shares at least 80 percent of the characters with the word in the second sentence; and
providing a result to a user and/or computer.
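The approximate word match of claim 38 can be sketched as below. The claim does not define "shares at least 80 percent of the characters", so this sketch assumes a multiset character overlap divided by the length of the longer word. It also assumes that identical words always match, even when shorter than four characters. Note that a pure character-count test ignores order, so anagrams such as "listen" and "silent" would also match.

```python
from collections import Counter


def words_match(a: str, b: str, min_len: int = 4, threshold: float = 0.8) -> bool:
    """Approximate word match: both words have at least `min_len` characters
    and share at least `threshold` of their characters."""
    a, b = a.lower(), b.lower()
    if a == b:
        return True  # assumption: identical words always match
    if len(a) < min_len or len(b) < min_len:
        return False
    # Multiset intersection counts each shared character at most min(count) times.
    shared = sum((Counter(a) & Counter(b)).values())
    return shared / max(len(a), len(b)) >= threshold
```

Under these assumptions "walk" matches "walks" (4 of 5 characters) and "colour" matches "color" (5 of 6), while "walked" and "walking" do not (4 of 7), and "cat" never fuzzily matches "cats" because it is shorter than four characters.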
Specification