Building and updating of co-occurrence dictionary and analyzing of co-occurrence and meaning
First Claim
1. A computer implemented method, implemented by a programmed computer, of building a co-occurrence dictionary describing whether phrases co-occur in one sentence, the phrases belonging to first and second categories in a dictionary containing phrases of a natural language which is an object, said method comprising using the computer to build the co-occurrence dictionary by implementing the steps of:
selecting, as a first sub-group of phrases (11), phrases from a first group of phrases (1) comprising all phrases belonging to said first category in said dictionary;
selecting, as a second sub-group of phrases (21), phrases from a second group of phrases (2) comprising all phrases belonging to said second category in the dictionary;
preparing first co-occurrence information describing whether each phrase belonging to the first sub-group (11) and each phrase belonging to the second sub-group (21) co-occur in one sentence of the object language;
preparing second co-occurrence information describing whether each phrase belonging to a third sub-group of phrases (12), comprising all the phrases in the first group (1) which do not belong to the first sub-group (11) and each phrase belonging to the second sub-group (21), co-occur in one sentence of the object language;
preparing third co-occurrence information describing whether each phrase belonging to a fourth sub-group of phrases (22), comprising all the phrases in the second group (2) which do not belong to the second sub-group (21) and each phrase belonging to the first sub-group (11) co-occur in one sentence of the object language;
arranging the first co-occurrence information such that each phrase belonging to the first sub-group (11) corresponds to a real number vector with a dimension below a common maximum dimension and each phrase belonging to the second sub-group (21) corresponds to a real number vector with a dimension below the common maximum dimension;
calculating a value of the real number vector corresponding to each phrase in the first sub-group (11) and a value of the real number vector corresponding to each phrase in the second sub-group (21) on the basis of the first co-occurrence information so that the number of sets of two phrases, wherein;
a value of an inner product of the real number vector corresponding to a first phrase and the real number vector corresponding to a second phrase becomes positive when describing, in the first co-occurrence information, that a first phrase belonging to said first sub-group (11) and a second phrase belonging to said second sub-group (21) co-occur in one sentence, and the value of an inner product of the real number vector corresponding to said first phrase and the real number vector corresponding to said second phrase becomes negative when describing, in said first co-occurrence information, that said first phrase belonging to said first sub-group (11) and said second phrase belonging to said second sub-group (21) do not co-occur in one sentence, becomes the greatest of all the numbers of sets each comprising phrases belonging to said first sub-group (11) and phrases belonging to the second sub-group (21);
arranging said second co-occurrence information such that each phrase belonging to said third sub-group (12) corresponds to a real number vector with a dimension below the maximum dimension;
calculating a value of the real number vector corresponding to each phrase in said third sub-group (12) on the basis of said second co-occurrence information so that the number of sets of two phrases, wherein;
a value of the inner product of the real number vector corresponding to a third phrase belonging to said third sub-group (12) and the real number vector corresponding to a fourth phrase belonging to said second sub-group (21) and calculated on the basis of said first co-occurrence information becomes positive when describing, in said second co-occurrence information, that the third phrase and the fourth phrase co-occur in one sentence, and a value of an inner product of the real number vector corresponding to the third phrase and the real number vector corresponding to the fourth phrase becomes negative when describing, in said second co-occurrence information, that the third phrase and the fourth phrase do not co-occur in one sentence, becomes the largest of all the numbers of sets each comprising a phrase belonging to said third sub-group (12) and a phrase belonging to said second sub-group (21);
arranging said third co-occurrence information such that each phrase belonging to the fourth sub-group (22) corresponds to a real number vector with a dimension below the maximum dimension; and
calculating a value of the real number vector corresponding to each phrase in the fourth sub-group (22) on the basis of said third co-occurrence information so that the number of sets of two phrases, wherein;
the inner product of the real number vector corresponding to a fifth phrase belonging to said first sub-group (11) and calculated on the basis of said first co-occurrence information and the real number vector corresponding to a sixth phrase belonging to the fourth sub-group (22) becomes positive when describing, in the third co-occurrence information, that the fifth phrase and the sixth phrase co-occur in one sentence and, on the other hand, the inner product of the real number vector corresponding to the fifth phrase calculated on the basis of the first co-occurrence information and the real number vector corresponding to the sixth phrase becomes negative when describing, in the third co-occurrence information, that the fifth phrase and the sixth phrase do not co-occur in one sentence, becomes the greatest of all the numbers of sets each comprising a phrase belonging to said first sub-group (11) and a phrase belonging to said fourth sub-group (22).
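The three calculating steps of claim 1 can be illustrated with a minimal sketch. The claim states the goal (maximize the number of phrase pairs whose inner-product sign agrees with the co-occurrence information) but does not name an optimization procedure, so the perceptron-style update below is an assumption chosen only for brevity and is not guaranteed to reach the maximum. The phrases, the fixed dimension `DIM`, and the function names `fit`, `agreements`, and `init_vectors` are hypothetical; the claim only requires dimensions below a common maximum.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def agreements(table, left, right):
    """Count pairs whose inner-product sign matches the co-occurrence table."""
    hits = 0
    for (p, q), co_occurs in table.items():
        s = dot(left[p], right[q])
        if (co_occurs and s > 0) or (not co_occurs and s < 0):
            hits += 1
    return hits

def init_vectors(phrases, dim, rng):
    return {p: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for p in phrases}

def fit(table, left, right, train_left, train_right,
        epochs=200, lr=0.1, margin=0.1):
    """Perceptron-style heuristic (an assumption, not the claimed procedure).

    Only the side(s) flagged for training are modified; the other side
    keeps the vectors computed in an earlier stage.
    """
    for _ in range(epochs):
        for (p, q), co_occurs in table.items():
            target = 1.0 if co_occurs else -1.0
            u, v = left[p], right[q]
            if target * dot(u, v) > margin:
                continue  # sign already agrees with the table
            if train_left:
                left[p] = [a + lr * target * b for a, b in zip(u, v)]
            if train_right:
                right[q] = [b + lr * target * a for a, b in zip(u, v)]

# Hypothetical toy data: first category = nouns, second category = verbs.
sub11, sub12 = ["dog", "bread"], ["cat"]   # sub-groups (11) and (12)
sub21, sub22 = ["bark", "bake"], ["eat"]   # sub-groups (21) and (22)

first_info = {("dog", "bark"): True, ("dog", "bake"): False,
              ("bread", "bark"): False, ("bread", "bake"): True}
second_info = {("cat", "bark"): False, ("cat", "bake"): False}
third_info = {("dog", "eat"): True, ("bread", "eat"): False}

DIM = 3  # assumed fixed dimension
rng = random.Random(0)
vec1 = init_vectors(sub11 + sub12, DIM, rng)
vec2 = init_vectors(sub21 + sub22, DIM, rng)

# Stage 1: fit (11) and (21) jointly on the first co-occurrence information.
fit(first_info, vec1, vec2, train_left=True, train_right=True)
frozen21 = {q: list(vec2[q]) for q in sub21}
frozen11 = {p: list(vec1[p]) for p in sub11}

# Stage 2: fit (12) against the fixed (21) vectors.
fit(second_info, vec1, vec2, train_left=True, train_right=False)

# Stage 3: fit (22) against the fixed (11) vectors.
fit(third_info, vec1, vec2, train_left=False, train_right=True)

print(agreements(first_info, vec1, vec2),
      agreements(second_info, vec1, vec2),
      agreements(third_info, vec1, vec2))
```

Splitting the work this way means only the (11) x (21) block is fitted jointly; the remaining phrases are each fitted against vectors that are already fixed, which is the staging the claim describes.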
Abstract
A co-occurrence dictionary is built through a process for calculating three kinds of co-occurrence information and a real number vector corresponding to each category. The co-occurrence dictionary is updated through a process for selecting the opposite phrase of the co-occurrence for the additional co-occurrence information and a process for calculating a real number vector corresponding to an additional word on the basis of the additional co-occurrence information. A co-occurrence analysis is effected through a process for calculating in real number the degree of the co-occurrence on the basis of the real number vectors corresponding to two categories to be checked in the co-occurrence relation, and a semantic analysis is effected through a process for indicating, by a numerical value, the propriety of the interpretation on the basis of the degree of each co-occurrence.
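The analysis side of the abstract can be sketched as follows. The abstract says only that a real-valued degree of co-occurrence is computed from the two vectors and that the propriety of an interpretation is expressed numerically from those degrees. Taking the inner product as the degree (consistent with the sign convention of claim 1) and summing the degrees per interpretation are both assumptions made here for illustration. The vectors, phrase labels, and function names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def co_occurrence_degree(vec_a, vec_b):
    """Assumed degree of co-occurrence: the inner product of the two vectors."""
    return dot(vec_a, vec_b)

def interpretation_score(pairs, vectors1, vectors2):
    """Assumed propriety of an interpretation: sum of its pairwise degrees."""
    return sum(co_occurrence_degree(vectors1[p], vectors2[q]) for p, q in pairs)

# Hypothetical vectors for two senses of an ambiguous noun and two verbs.
vectors1 = {"bank(river)": [1.0, 0.0], "bank(finance)": [0.0, 1.0]}
vectors2 = {"flood": [0.8, -0.5], "lend": [-0.6, 0.9]}

# Two candidate readings of a sentence that pairs "bank" with "flood".
reading_a = [("bank(river)", "flood")]
reading_b = [("bank(finance)", "flood")]

score_a = interpretation_score(reading_a, vectors1, vectors2)
score_b = interpretation_score(reading_b, vectors1, vectors2)
best = reading_a if score_a > score_b else reading_b
print(score_a, score_b, best)
```

Under these assumptions the reading with the higher score is preferred; other ways of combining the degrees (for example a minimum or a product) would fit the abstract equally well.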
78 Citations
9 Claims
Specification