Text comparison apparatus
Abstract
A text comparison apparatus computes the occurrence count of text elements, stores in a text element storage unit those text elements whose occurrence count is at least an occurrence count threshold for storage, calculates similarity using those text elements whose occurrence count is at least an occurrence count threshold for similarity calculation, and calculates discrepancy for those text elements for which the difference of occurrence counts is at least an occurrence count threshold for discrepancy calculation. As a result, the similarity and the discrepancy between two texts in the same or different languages can be determined.
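The threshold step described in the abstract can be illustrated with a minimal Python sketch. The function name `filter_by_threshold`, the whitespace tokenization, and the sample threshold value are assumptions made for illustration only; the abstract does not specify how text elements are extracted or what threshold values are used.

```python
from collections import Counter


def filter_by_threshold(counts, threshold):
    """Keep only the elements whose occurrence count is at least `threshold`.

    Hypothetical helper: one way to read "stores those text elements that
    have an occurrence count of at least an occurrence count threshold".
    """
    return Counter({elem: n for elem, n in counts.items() if n >= threshold})


# Assumption: text elements are whitespace-separated tokens.
counts = Counter("a a a b b c".split())
stored = filter_by_threshold(counts, threshold=2)
```

With this sample input, only the elements occurring at least twice remain in `stored`.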
35 Claims
1. A text comparison apparatus, comprising:
a text storage means for storing a plurality of texts;
a text input means for inputting a text from the text storage means;
a text element extraction means for extracting elements from the text obtained with the text input means;
a text element counting means for counting how many of each of the text elements are included in the text;
a text element storage means for storing text elements and occurrence counts as sets;
a text element input means for inputting text elements and their occurrence counts with regard to two texts from the text element storage means; and
a similarity calculation means for calculating a similarity of the texts by dividing the sum of the occurrence counts of the text elements included in both of the two texts by the sum of the occurrence counts of all text elements in each of the texts.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
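The similarity calculation of claim 1 can be sketched as follows. This is one possible reading of the claim language, not a definitive implementation: for each text, the counts of the elements that also occur in the other text are summed and divided by that text's total element count, giving one value per text. The names `count_elements` and `similarity`, and the use of lowercase whitespace-separated tokens as text elements, are assumptions.

```python
from collections import Counter


def count_elements(text):
    # Assumption: text elements are lowercase whitespace-separated tokens.
    return Counter(text.lower().split())


def similarity(counts_a, counts_b):
    """Return (similarity for text A, similarity for text B).

    For each text: sum of the counts of elements present in both texts,
    divided by the sum of the counts of all elements in that text.
    """
    shared = counts_a.keys() & counts_b.keys()
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    sim_a = sum(counts_a[e] for e in shared) / total_a if total_a else 0.0
    sim_b = sum(counts_b[e] for e in shared) / total_b if total_b else 0.0
    return sim_a, sim_b


counts_a = count_elements("the cat sat on the mat")
counts_b = count_elements("the dog sat")
sim_a, sim_b = similarity(counts_a, counts_b)
```

In this example the shared elements are "the" and "sat", so text A scores 3/6 and text B scores 2/3. The two values differ because each is normalized by its own text's total count.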
9. A text comparison apparatus, comprising:
a text storage means for storing a plurality of texts;
a text input means for inputting a text from the text storage means;
a text element extraction means for extracting elements from the text obtained with the text input means;
a text element counting means for counting how many of each of the text elements are included in the text;
a text element storage means for storing text elements and occurrence counts as sets;
a text element input means for inputting text elements and their occurrence counts with regard to two texts from the text element storage means;
a storage means for storing occurrence count threshold settings for discrepancy calculation, which gives an occurrence count threshold of text elements to be used when calculating the discrepancy of two texts; and
a discrepancy calculation means for calculating the discrepancy of two texts by summing up the differences between the occurrence counts of text elements included in the two texts for the text elements with a value of at least that given by the storage means for storing occurrence count threshold settings for discrepancy calculation while assigning the differences to the text containing more of the text elements, and dividing this sum by the sum of the occurrence counts of all text elements in each of the texts.
Dependent claims: 10, 11, 12, 13
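The discrepancy calculation of claim 9 can be sketched in the same way. The reading assumed here is that the threshold applies to the difference in occurrence counts (as the abstract suggests), that an element absent from one text counts as zero there, and that each qualifying difference is credited to the text containing more of the element. The function name `discrepancy` and the sample counts are hypothetical.

```python
from collections import Counter


def discrepancy(counts_a, counts_b, threshold=1):
    """Return (discrepancy for text A, discrepancy for text B).

    Differences of at least `threshold` are assigned to the text with the
    larger count, then each sum is divided by that text's total count.
    """
    surplus_a = 0
    surplus_b = 0
    for elem in counts_a.keys() | counts_b.keys():
        diff = counts_a.get(elem, 0) - counts_b.get(elem, 0)
        if abs(diff) < threshold:
            continue  # difference too small to be considered
        if diff > 0:
            surplus_a += diff
        else:
            surplus_b += -diff
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    disc_a = surplus_a / total_a if total_a else 0.0
    disc_b = surplus_b / total_b if total_b else 0.0
    return disc_a, disc_b


counts_a = Counter({"the": 2, "cat": 1, "sat": 1, "on": 1, "mat": 1})
counts_b = Counter({"the": 1, "dog": 1, "sat": 1})
disc_a, disc_b = discrepancy(counts_a, counts_b, threshold=1)
```

With a threshold of 1, text A is credited with a surplus of 4 out of 6 elements and text B with 1 out of 3. Raising the threshold to 2 leaves no qualifying differences in this example, so both values drop to zero.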
14. A text comparison apparatus, comprising:
a text storage means for storing a plurality of texts;
a text input means for inputting a text from the text storage means;
a word analysis means for analyzing words and their part of speech from the text obtained with the text input means;
a word counting means for counting how many of each of the respective words are included in the text, the counting being carried out for each of the part-of-speech data in case of words that have a plurality of part-of-speech data;
a word storage means for storing words, part-of-speech data and occurrence counts as sets;
a word input means for inputting words, part-of-speech data and occurrence counts with regard to two texts from the word storage means; and
a similarity calculation means for calculating a similarity of the texts by dividing the sum of the occurrence counts of the words that are included in both of the two texts and have matching part-of-speech data by the sum of the occurrence counts of all words in each of the texts.
Dependent claims: 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27
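The part-of-speech variant of claim 14 can be sketched by counting (word, part-of-speech) pairs instead of bare words, so that a word used as both noun and verb is counted separately for each. The sketch assumes the word analysis has already been done by some external tagger; the hand-written tagged lists, the tag labels, and the function name `pos_similarity` are illustrative assumptions.

```python
from collections import Counter


def pos_similarity(tagged_a, tagged_b):
    """Similarity over (word, part_of_speech) pairs, one value per text.

    A pair counts as shared only if both the word and its part of speech
    match in the two texts.
    """
    counts_a = Counter(tagged_a)
    counts_b = Counter(tagged_b)
    shared = counts_a.keys() & counts_b.keys()
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    sim_a = sum(counts_a[k] for k in shared) / total_a if total_a else 0.0
    sim_b = sum(counts_b[k] for k in shared) / total_b if total_b else 0.0
    return sim_a, sim_b


# Assumption: tagging output is a list of (word, part_of_speech) tuples.
tagged_a = [("book", "noun"), ("book", "verb"), ("flight", "noun")]
tagged_b = [("book", "verb"), ("ticket", "noun")]
sim_a, sim_b = pos_similarity(tagged_a, tagged_b)
```

Here "book" appears in both texts, but only its verb use matches, so the noun occurrence in text A does not contribute to the shared count.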
28. A text comparison apparatus, comprising:
a text storage means for storing a plurality of texts;
a text input means for inputting a text from the text storage means;
a word analysis means for analyzing words and their part of speech from the text obtained with the text input means;
a word counting means for counting how many of each of the respective words are included in the text, the counting being carried out for each of the part-of-speech data in case of words that have a plurality of part-of-speech data;
a word storage means for storing words, part-of-speech data and occurrence counts as sets;
a word input means for inputting words, part-of-speech data and occurrence counts with regard to two texts from the word storage means;
a storage means for storing occurrence count threshold settings for discrepancy calculation, which gives an occurrence count threshold of words to be used when calculating the discrepancy of two texts; and
a discrepancy calculation means for calculating the discrepancy of two texts by summing up the differences between the occurrence counts of words that are included in the two texts and that have matching part-of-speech data, for the words with a value of at least that given by the storage means for storing occurrence count threshold settings for discrepancy calculation while assigning the differences to the text containing more of the words, and dividing this sum by the sum of the occurrence counts of all words in each of the texts.
Dependent claims: 29, 30, 31, 32, 33, 34, 35
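The part-of-speech discrepancy of claim 28 can be sketched along the same lines. The claim wording leaves open whether only (word, part-of-speech) pairs present in both texts are compared or whether pairs missing from one text count as zero there, so the hypothetical `shared_only` flag below exposes both readings. The function name, the flag, the tag labels, and the sample data are all assumptions.

```python
from collections import Counter


def pos_discrepancy(tagged_a, tagged_b, threshold=1, shared_only=True):
    """Discrepancy over (word, part_of_speech) pairs, one value per text.

    shared_only=True compares only pairs present in both texts; False also
    compares pairs missing from one text, treating the missing count as 0.
    """
    counts_a = Counter(tagged_a)
    counts_b = Counter(tagged_b)
    if shared_only:
        keys = counts_a.keys() & counts_b.keys()
    else:
        keys = counts_a.keys() | counts_b.keys()
    surplus_a = 0
    surplus_b = 0
    for key in keys:
        diff = counts_a.get(key, 0) - counts_b.get(key, 0)
        if abs(diff) < threshold:
            continue  # below the discrepancy threshold
        if diff > 0:
            surplus_a += diff
        else:
            surplus_b += -diff
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    disc_a = surplus_a / total_a if total_a else 0.0
    disc_b = surplus_b / total_b if total_b else 0.0
    return disc_a, disc_b


tagged_a = [("run", "verb")] * 3 + [("run", "noun"), ("fast", "adverb")]
tagged_b = [("run", "verb")] + [("run", "noun")] * 2 + [("slow", "adjective")]
strict = pos_discrepancy(tagged_a, tagged_b)
loose = pos_discrepancy(tagged_a, tagged_b, shared_only=False)
```

Under the stricter reading, text A is credited with its two extra verb uses of "run" (2 of 5 words) and text B with its one extra noun use (1 of 4 words). The looser reading additionally credits "fast" to text A and "slow" to text B.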
Specification