Word Use Difference Information Acquisition Program and Device

US 20090089046A1
Filed: 07/10/2006
Published: 04/02/2009
Est. Priority Date: 07/12/2005
Status: Active Grant

Abstract

A device or computer implemented program for accurately and automatically obtaining general-purpose information regarding the usage difference between a plurality of synonyms and quasi-synonyms, such as the types of words with which the synonyms and quasi-synonyms are often used, is provided with: means for receiving the input of a plurality of words; means for extracting sentence data including an inputted word from a corpus; means for analyzing the sentence structure of the sentence data and extracting nouns that are in a grammatical relationship with the inputted word included in the sentence data; means for extracting the nodes representing the nouns and the nodes representing the semantic category of the noun from a thesaurus and forming a directional graph for each inputted word; means for comparing a plurality of directional graphs and extracting the difference nodes; and means for outputting the extracted difference nodes as information relating to the usage difference of the inputted words.
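The graph-forming and comparison steps summarized above can be illustrated with a minimal sketch. It assumes a toy thesaurus stored as a child-to-parent dictionary and treats "difference nodes" as plain set differences between the node sets of two graphs. The thesaurus contents, the example nouns, and the names `PARENT`, `build_graph`, and `difference_nodes` are all hypothetical and are not taken from the patent.

```python
# Toy thesaurus (hypothetical data): each concept maps to its higher-ranking concept.
PARENT = {
    "coffee": "beverage",
    "tea": "beverage",
    "beverage": "substance",
    "engine": "machine",
    "computer": "machine",
    "machine": "artifact",
    "substance": "entity",
    "artifact": "entity",
}


def build_graph(nouns, parent=PARENT):
    """Return (nodes, edges) covering each noun and every ancestor category."""
    nodes, edges = set(), set()
    for noun in nouns:
        node = noun
        nodes.add(node)
        # Walk upward through the conceptual hierarchy until the root.
        while node in parent:
            upper = parent[node]
            edges.add((upper, node))  # directed link: higher node -> lower node
            nodes.add(upper)
            node = upper
    return nodes, edges


def difference_nodes(graph_a, graph_b):
    """Return the nodes found only in graph_a and the nodes found only in graph_b."""
    nodes_a, _ = graph_a
    nodes_b, _ = graph_b
    return nodes_a - nodes_b, nodes_b - nodes_a


# Nouns assumed (for illustration only) to co-occur with two near-synonyms.
graph_strong = build_graph(["coffee", "tea"])
graph_powerful = build_graph(["engine", "computer"])

only_strong, only_powerful = difference_nodes(graph_strong, graph_powerful)
common = graph_strong[0] & graph_powerful[0]
print(sorted(only_strong), sorted(only_powerful), sorted(common))
```

In this toy case the two graphs share only the root category, so every lower category is reported as a difference node for one word or the other.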

22 Claims

1. In a computer searchably provided with, or connected to, a corpus, which is a usage example database containing example sentences for a plurality of target vocabulary terms having the same or similar meaning, and a thesaurus, which is a database describing the word-to-word relationship between one word and another within a conceptual hierarchy, a word use difference information acquisition program for extracting and outputting information relating to the difference in usage for a plurality of target terms having the same or similar meaning, said program causing the computer to execute processing comprising:
- a target word inputting step of receiving the input of a plurality of target words,
- a sentence extracting step for accessing the corpus, searching the corpus for each target word for which input thereof has been received in the target word inputting step, and extracting from the corpus each sentence data containing any of said target words,
- a noun extracting step for analyzing the structure of each sentence data extracted in the sentence extracting step, and extracting from each sentence data nouns that exist in a grammatical relationship with the target word contained therein,
- a directional graph forming step for accessing the thesaurus, searching the thesaurus for the nouns extracted in the noun extracting step, extracting the node representing each of said nouns and the node representing the higher ranking conceptual category with respect to each said noun, and forming a directional graph constructed from the thus extracted nodes and links that connect respective higher and lower ranking nodes and show the relationship therebetween with respect to the conceptual hierarchy, for each corresponding target word,
- a difference extracting step for comparing each of the directional graphs formed in the directional graph forming step, and extracting the difference nodes between the directional graphs of different target words, and
- a difference outputting step for outputting the difference between the directional graphs extracted in the difference extracting step as information representing difference in usage between the target words.
- Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
  - 2. A word use difference information acquisition program as described in claim 1, wherein in the difference extracting step, the computer executes processing in which the difference nodes are extracted by demarcating as shared the same nodes or portions linked to said same nodes occurring in each of the directional graphs, and superimposing each of said directional graphs.
  - 3. A word use difference information acquisition program as described in claim 1, wherein if the number of target words received in the target word inputting step is three or more, in the difference extracting step, the computer is caused to execute processing in which each of the directional graphs formed for a plurality of target words aside from a single specific target word are combined to form a common directional graph, and said common directional graph is compared with the directional graph of said single specific target word, and the difference nodes between said directional graphs are extracted, and the procedure of extracting the difference nodes between a common directional graph and a directional graph of a single specific target word is repeated for each target word of said plurality of target words utilizing each of said target words as said single specific target word.
  - 4. A word use difference information acquisition program as described in claim 1, wherein the computer is caused to execute processing in which in the noun extracting step, the nouns contained in each sentence data and existing therein in a grammatical relationship with the target word are extracted together with data relating to the frequency with which said each noun appears with the target word in the sentence data, in the directional graph forming step, each node of the directional graphs to be formed is weighted with the data relating to the frequency, and in the difference extracting step, each of the weighted directional graphs is compared, utilizing the weighted directional graphs formed in the directional graph forming step, and the difference nodes between the weighted directional graphs of different target words are extracted.
  - 5. A word use difference information acquisition program as described in claim 4, wherein a frequency ratio representing a ratio of a rate of a frequency of co-occurrence with a corresponding target word occupied by each of the nouns extracted for said corresponding target word to a total rate of the frequency of co-occurrence with the corresponding target word for all of said nouns combined is applied as the data relating to frequency in the noun extracting step, and the computer is caused to execute processing in which in the directional graph forming step, the directional graphs are weighted based on said frequency ratio by appending the frequency to the nodes corresponding to the nouns in the directional graphs that are to be formed, appending the total value of the combined frequencies of said nodes representing the nouns to the node representing the higher conceptual category thereof, and appending to all nodes a frequency ratio that is a normalization of each of the individual frequencies, in the difference extracting step, the ratio of the frequency rates of a same node occurring in each of two weighted directional graphs formed in the directional graph forming step and which are subjects for comparison is calculated for each same node, and the nodes for which the ratio calculated thereof is greater than or equal to a predetermined value are incorporated into the difference nodes, which are the distinctive nodes, and said difference nodes are extracted.
  - 6. A word use difference information acquisition program as described in claim 5, wherein in the difference extracting step, the computer is caused to execute processing in which a procedure wherein a ratio of the frequency rate of a same node occurring in two directional graphs that are subjects for comparison is calculated for each of said same nodes, if the calculated ratio is greater than or equal to a predetermined value, said same node is provisionally incorporated into the difference portion of the directional graph as a difference node, a predetermined number of the top nodes within said difference portion are extracted for each target word in the order starting with the top node having the largest frequency and the proportion of the common nodes among the extracted nodes is calculated, and the procedure of calculating the proportion of common nodes among the extracted nodes is repeated while the frequency ratio is gradually diminished, whereby, if the proportion of the common nodes calculated in each iteration of the procedure is greater than or equal to a fixed value, said proportion of common nodes is compared to the proportion of common nodes calculated in the previous iteration of the procedure, and if the compared value is greater than or equal to a fixed value, the nodes that were provisionally determined in that iteration of the procedure to be difference nodes are determined finally to be difference nodes, and said nodes are extracted as difference nodes.
  - 7. A word use difference information acquisition program as described in claim 5, wherein the frequency value itself is used instead of the frequency ratio.
  - 8. A word use difference information acquisition program as described in claim 4, wherein the computer is caused to execute processing in which in the difference extracting step, the extracted difference nodes are subjected to a further extraction process wherein a predetermined number thereof are again extracted in the order starting from the extracted difference node of which the weighting based on frequency is greatest, and in the difference outputting step, the predetermined number of extracted nodes is outputted as the information relating to the difference in usage of the target words.
  - 9. A word use difference information acquisition program as described in claim 1, wherein the computer is caused to execute processing in which in the difference outputting step, the top node of the difference nodes is outputted as the information relating to the difference in usage of the target words.
  - 10. A word use difference information acquisition program as described in claim 1, wherein the computer is caused to execute processing in which in the difference outputting step, in addition to, or instead of, the top node of the difference nodes, the bottom node of the common nodes is outputted as the information relating to the difference in usage of the target words.
  - 11. A word use difference information acquisition program as described in claim 1, wherein the computer is caused to execute processing in which in the target word inputting step, the part of speech of the target words is restricted to adjective or verb.
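The frequency weighting and ratio test described in claims 4 and 5 can be sketched roughly as follows. This is only one possible reading: it assumes a tree-shaped toy thesaurus, sums each noun's co-occurrence count into all of its ancestor categories, normalizes by the total count, and marks a node shared by both graphs as distinctive when the ratio of its two normalized weights meets a threshold. The data, the threshold value, and the names `PARENT`, `weighted_graph`, and `distinctive_nodes` are hypothetical.

```python
from collections import Counter

# Toy thesaurus (hypothetical data): concept -> higher-ranking concept.
PARENT = {
    "coffee": "beverage",
    "tea": "beverage",
    "engine": "machine",
    "beverage": "entity",
    "machine": "entity",
}


def weighted_graph(noun_counts, parent=PARENT):
    """Map each node to its normalized frequency (share of all co-occurrences)."""
    freq = Counter()
    for noun, count in noun_counts.items():
        node = noun
        freq[node] += count
        # Add the noun's count to every higher-ranking category above it.
        while node in parent:
            node = parent[node]
            freq[node] += count
    total = sum(noun_counts.values())
    return {node: value / total for node, value in freq.items()}


def distinctive_nodes(graph_a, graph_b, threshold):
    """Nodes present in both graphs whose weight ratio a/b meets the threshold."""
    shared = graph_a.keys() & graph_b.keys()
    return {node for node in shared if graph_a[node] / graph_b[node] >= threshold}


# Hypothetical co-occurrence counts for two near-synonymous target words.
word_a = weighted_graph({"coffee": 6, "tea": 2, "engine": 2})
word_b = weighted_graph({"coffee": 1, "engine": 9})

result = distinctive_nodes(word_a, word_b, threshold=2.0)
print(sorted(result))
```

Nodes that appear in only one graph are not handled by `distinctive_nodes` in this sketch; they would need separate treatment, for example the plain set difference used for unweighted graphs.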

12. A word use difference information acquisition device configured by a computer that is operated according to a program and which extracts and outputs information relating to the difference in usage for a plurality of target terms having the same or similar meaning, said computer being searchably provided with, or connected to, a corpus, which is a usage example database containing example sentences for a plurality of target vocabulary terms having the same or similar meaning, and a thesaurus, which is a database describing the word-to-word relationship between one word and another within a conceptual hierarchy, and comprising:
- a target word inputting means for receiving the input of a plurality of target words,
- a sentence extracting means for accessing the corpus, searching the corpus for each target word for which input thereof has been received by the target word inputting means, and extracting from the corpus each sentence data containing any of said target words,
- a noun extracting means for analyzing the structure of each sentence data extracted by the sentence extracting means, and extracting from each sentence data nouns that exist in a grammatical relationship with the target word contained therein,
- a directional graph forming means for accessing the thesaurus, searching the thesaurus for the nouns extracted by the noun extracting means, extracting the node representing each of said nouns and the node representing the higher ranking conceptual category with respect to each said noun, and forming a directional graph constructed from the thus extracted nodes and links that connect respective higher and lower ranking nodes and show the relationship therebetween with respect to the conceptual hierarchy, for each corresponding target word,
- a difference extracting means for comparing each of the directional graphs formed in the directional graph forming means, and extracting the difference nodes between the directional graphs of different target words, and
- a difference outputting means for outputting the difference between the directional graphs extracted in the difference extracting means as information representing difference in usage between the target words.
- Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20, 21, 22
  - 13. A word use difference information acquisition device as described in claim 12, wherein the difference extracting means extracts the difference nodes by demarcating as shared the common nodes or portions linked to the common nodes occurring in each of the directional graphs, and superimposing each of said directional graphs.
  - 14. A word use difference information acquisition device as described in claim 12, wherein if the number of target words received by the target word inputting means is three or more, the difference extracting means executes processing wherein each of the directional graphs formed for a plurality of target words aside from a single specific target word are combined to form a common directional graph, and said common directional graph is compared with the directional graph of said single specific target word, and the difference nodes between said directional graphs are extracted, and the procedure of extracting the difference nodes between a common directional graph and a directional graph of a single specific target word is repeated for each target word of said plurality of target words utilizing each of said target words as said single specific target word.
  - 15. A word use difference information acquisition device as described in claim 12, wherein the difference extracting means executes processing in which in the noun extracting step, the nouns contained in each sentence data and existing therein in a grammatical relationship with the target word are extracted together with data relating to the frequency with which said each noun co-occurs with the target word in the sentence data, the directional graph forming means executes processing wherein each node of the directional graphs to be formed is weighted with the data relating to the frequency, and the difference extracting means compares each of the weighted directional graphs, utilizing the weighted directional graphs formed by the directional graph forming means, and extracts the difference nodes between the weighted directional graphs of different target words.
  - 16. A word use difference information acquisition device as described in claim 15, wherein a frequency ratio representing a ratio of a rate of a frequency of co-occurrence with a corresponding target word occupied by each of the nouns extracted for said corresponding target word to a total rate of the frequency of co-occurrence with the corresponding target word for all of said nouns combined is used as the data relating to frequency by the noun extracting means, and the directional graph forming means executes processing wherein the directional graphs are weighted based on said frequency ratio by appending the frequency to the nodes corresponding to the nouns in the directional graphs that are to be formed, appending the total value of the combined frequencies of said nodes representing the nouns to the node representing the higher conceptual category thereof, and appending to all nodes a frequency ratio that is a normalization of each of the individual frequencies, the difference extracting means executes processing wherein the ratio of the frequency rates of a same node occurring in each of two weighted directional graphs formed in the directional graph forming step and which are subjects for comparison is calculated for each same node, and the nodes for which the ratio calculated thereof is greater than or equal to a predetermined value are incorporated into the difference nodes, which are the distinctive nodes, and said difference nodes are extracted.
  - 17. A word use difference information acquisition device as described in claim 16, wherein the difference extracting means executes processing wherein a procedure in which a procedure wherein a ratio of the frequency rate of a same node occurring in two directional graphs that are subjects for comparison is calculated for each of said same nodes, if the calculated ratio is greater than or equal to a predetermined value, said same node is provisionally incorporated into the difference portion of the directional graph as a difference node, a predetermined number of the top nodes within said difference portion are extracted for each target word in the order starting with the top node having the largest frequency and the proportion of the common nodes among the extracted nodes is calculated, and the procedure of calculating the proportion of common nodes among the extracted nodes is repeated while the frequency ratio is gradually diminished, whereby, if the proportion of the common nodes calculated in each iteration of the procedure is greater than or equal to a fixed value, said proportion of common nodes is compared to the proportion of common nodes calculated in the previous iteration of the procedure, and if the compared value is greater than or equal to a fixed value, the nodes that were provisionally determined in that iteration of the procedure to be difference nodes are determined finally to be difference nodes, and said nodes are extracted as difference nodes.
  - 18. A word use difference information acquisition device as described in claim 16, wherein the frequency value itself is used instead of the frequency ratio.
  - 19. A word use difference information acquisition device as described in claim 15, wherein the difference extracting means executes processing wherein the extracted difference nodes are subjected to a further extraction process in which a predetermined number thereof are again extracted in the order starting from the extracted difference node of which the weighting based on frequency is greatest, and the difference outputting means outputs the predetermined number of extracted nodes as the information relating to the difference in usage of the target words.
  - 20. A word use difference information acquisition device as described in claim 12, wherein the difference extracting means outputs the top node among the difference nodes as the information relating to the difference in usage of the target words.
  - 21. A word use difference information acquisition device as described in claim 12, wherein the difference outputting means outputs, in addition to, or instead of, the top node of the difference nodes, the bottom node of the common nodes as the information relating to the difference in usage of the target words.
  - 22. A word use difference information acquisition device as described in claim 12, wherein the target word inputting means restricts the part of speech of the target words for which the input thereof is to be received to adjective or verb.
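For three or more target words, claims 3 and 14 describe comparing each word's graph against a common graph built from all the other words. A minimal sketch of that loop is given below. It assumes each graph is reduced to a set of node labels and interprets "combined" as a set union, which is an assumption rather than something the claims state. The function name `one_vs_rest` and the sample node sets are hypothetical.

```python
def one_vs_rest(graphs):
    """For each target word, return the nodes absent from the union of all other graphs.

    graphs: dict mapping a target word to the set of nodes in its directional graph.
    """
    result = {}
    for word, nodes in graphs.items():
        # Combine every other word's graph into one common node set (assumed: union).
        rest = set().union(*(g for w, g in graphs.items() if w != word))
        result[word] = nodes - rest
    return result


# Hypothetical node sets for three near-synonymous target words.
graphs = {
    "word_a": {"beverage", "substance", "entity"},
    "word_b": {"machine", "substance", "entity"},
    "word_c": {"emotion", "entity"},
}

unique = one_vs_rest(graphs)
print(unique)
```

Each target word takes one turn as the single specific word, so the loop runs once per word and reports what sets that word apart from all the others together.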

Current Assignee
National Institute of Information and Communications Technology
Original Assignee
National Institute of Information and Communications Technology
Inventors
Shindo, Mika; Isahara, Hitoshi; Uchimoto, Kiyotaka

Granted Patent

US 8,010,342 B2
US Class Current

704/9
CPC Class Codes

G06F 40/247 Thesauruses; Synonyms

G06F 40/284 Lexical analysis, e.g. toke...
