Information processing system, an information processing method and a computer readable storage medium
First Claim
1. An information processing system comprising:
a memory storing instructions; and
one or more processors configured to execute the instructions to:
store distances between any two terms of a plurality of terms, wherein a distance of the stored distances becomes smaller as two terms are semantically more similar and if the two terms tend to occur in texts that belong to a same class;
adjust a weight of each term of the plurality of terms in a weight vector including weights of the plurality of terms and representing a text, on the basis of distances between each term and other terms in the weight vector and weights of the other terms; and
classify the text using the adjusted weight of each term of the plurality of terms in the weight vector,
wherein the weight of each term in the weight vector is adjusted by estimating a latent weight of each term of the plurality of terms in the weight vector by a MAP (maximum-a-posteriori) estimate with a posterior probability of the latent weight when observed weights of the plurality of terms in the weight vector are given, assuming the observed weights are generated from latent weights with Gaussian noise, and
wherein the posterior probability is approximated with a subset of the observed weights of the plurality of terms, wherein the subset corresponds to a subset of the plurality of terms in the weight vector selected based on the distances between each term and the other terms.
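The MAP step recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes a zero-mean Gaussian prior over the latent weights whose covariance between two terms is exp(-distance), a choice the claim does not specify, and it approximates the posterior for each term using only that term's k nearest terms (the term itself included). Under these assumptions the posterior is Gaussian, so the MAP estimate equals the posterior mean. The names `solve`, `covariance`, `map_latent_weight`, `noise_var`, and `k`, as well as the example distances, are hypothetical.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def covariance(dist, a, b):
    """Assumed prior covariance between latent weights: larger for closer terms."""
    return math.exp(-dist[a][b])

def map_latent_weight(i, observed, dist, noise_var=0.5, k=2):
    """Approximate MAP estimate of the latent weight of term i.

    Only the k terms nearest to term i (by stored distance) are used,
    which mirrors the subset approximation of the posterior.
    """
    subset = sorted(range(len(observed)), key=lambda j: dist[i][j])[:k]
    # Covariance of the observed weights in the subset: prior covariance
    # plus Gaussian observation noise on the diagonal.
    gram = [[covariance(dist, a, b) + (noise_var if a == b else 0.0)
             for b in subset] for a in subset]
    alpha = solve(gram, [observed[j] for j in subset])
    # Posterior mean of the latent weight of term i (equals the MAP estimate).
    return sum(covariance(dist, i, j) * a_j for j, a_j in zip(subset, alpha))

# Hypothetical example: terms 0 = "car", 1 = "vehicle", 2 = "banana".
dist = [[0.0, 0.2, 3.0],
        [0.2, 0.0, 3.0],
        [3.0, 3.0, 0.0]]
observed = [1.0, 0.0, 0.0]  # only "car" occurs in the text
latent = [map_latent_weight(i, observed, dist) for i in range(len(observed))]
```

In this toy setting, "vehicle" does not occur in the text but receives a larger estimated latent weight than "banana" because it is closer to "car". Restricting each estimate to k terms keeps the linear system at size k rather than the full vocabulary size.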
Abstract
An information processing system is provided that increases, in a weight vector representing a text, the weights of words that are related to the text but do not explicitly occur in it. An adjusting system (100) includes a distance storing unit (110) and an adjusting unit (120). The distance storing unit (110) stores distances between any two terms of a plurality of terms. The distance between two terms becomes smaller as the two terms are semantically more similar. The adjusting unit (120) adjusts a weight of each term of the plurality of terms in a weight vector including weights of the plurality of terms and representing a text, on the basis of distances between each term and the other terms in the weight vector and weights of the other terms.
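The two units described in the abstract can be pictured with a short Python sketch. This is an illustration only: the distance table, the term names, and the similarity exp(-distance) used for averaging are assumptions made for the example, and the averaging rule is a simple stand-in for the adjustment, not the MAP estimate recited in the claims.

```python
import math

# Hypothetical distance store: smaller distance means more similar terms.
DISTANCES = {
    ("car", "vehicle"): 0.2,
    ("car", "banana"): 3.0,
    ("vehicle", "banana"): 3.0,
}

def distance(a, b):
    """Look up the stored distance between two terms (0 for identical terms)."""
    if a == b:
        return 0.0
    if (a, b) in DISTANCES:
        return DISTANCES[(a, b)]
    return DISTANCES[(b, a)]

def adjust_weights(weights):
    """Return a new weight vector in which each term's weight is a
    similarity-weighted average of the weights of all terms.

    The similarity exp(-distance) is an illustrative choice.
    """
    adjusted = {}
    for term in weights:
        sims = {other: math.exp(-distance(term, other)) for other in weights}
        total = sum(sims.values())
        adjusted[term] = sum(sims[o] * weights[o] for o in weights) / total
    return adjusted

# Weight vector of a text that mentions "car" only.
observed = {"car": 1.0, "vehicle": 0.0, "banana": 0.0}
adjusted = adjust_weights(observed)
```

Here the weight of "vehicle" rises well above that of "banana" even though neither word occurs in the text, which is the behavior the abstract describes for related but absent words.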
10 Claims
1. An information processing system comprising:
a memory storing instructions; and
one or more processors configured to execute the instructions to:
store distances between any two terms of a plurality of terms, wherein a distance of the stored distances becomes smaller as two terms are semantically more similar and if the two terms tend to occur in texts that belong to a same class;
adjust a weight of each term of the plurality of terms in a weight vector including weights of the plurality of terms and representing a text, on the basis of distances between each term and other terms in the weight vector and weights of the other terms; and
classify the text using the adjusted weight of each term of the plurality of terms in the weight vector,
wherein the weight of each term in the weight vector is adjusted by estimating a latent weight of each term of the plurality of terms in the weight vector by a MAP (maximum-a-posteriori) estimate with a posterior probability of the latent weight when observed weights of the plurality of terms in the weight vector are given, assuming the observed weights are generated from latent weights with Gaussian noise, and
wherein the posterior probability is approximated with a subset of the observed weights of the plurality of terms, wherein the subset corresponds to a subset of the plurality of terms in the weight vector selected based on the distances between each term and the other terms.
Dependent claims: 2, 3, 4, 5
6. An information processing method comprising:
reading out distances between each term of a plurality of terms in a weight vector and other terms in the weight vector from a distance storage which stores distances between any two terms of the plurality of terms, wherein a distance of the stored distances becomes smaller as two terms are semantically more similar and if the two terms tend to occur in texts that belong to a same class, the weight vector including weights of the plurality of terms and representing a text;
adjusting a weight of each term of the plurality of terms in the weight vector on the basis of the distances between each term and other terms in the weight vector and weights of the other terms; and
classifying the text using the adjusted weight of each term of the plurality of terms in the weight vector,
wherein the weight of each term in the weight vector is adjusted by estimating a latent weight of each term of the plurality of terms in the weight vector by a MAP (maximum-a-posteriori) estimate with a posterior probability of the latent weight when observed weights of the plurality of terms in the weight vector are given, assuming the observed weights are generated from latent weights with Gaussian noise, and
wherein the posterior probability is approximated with a subset of the observed weights of the plurality of terms, wherein the subset corresponds to a subset of the plurality of terms in the weight vector selected based on the distances between each term and the other terms.
Dependent claims: 7, 8, 9
10. A non-transitory computer readable storage medium recording thereon a program, causing a computer to perform a method comprising:
reading out distances between each term of a plurality of terms in a weight vector and other terms in the weight vector from a distance storage which stores distances between any two terms of the plurality of terms, wherein a distance of the stored distances becomes smaller as two terms are semantically more similar and if the two terms tend to occur in texts that belong to a same class, the weight vector including weights of the plurality of terms and representing a text;
adjusting a weight of each term of the plurality of terms in the weight vector on the basis of the distances between each term and other terms in the weight vector and weights of the other terms; and
classifying the text using the adjusted weight of each term of the plurality of terms in the weight vector,
wherein the weight of each term in the weight vector is adjusted by estimating a latent weight of each term of the plurality of terms in the weight vector by a MAP (maximum-a-posteriori) estimate with a posterior probability of the latent weight when observed weights of the plurality of terms in the weight vector are given, assuming the observed weights are generated from latent weights with Gaussian noise, and
wherein the posterior probability is approximated with a subset of the observed weights of the plurality of terms, wherein the subset corresponds to a subset of the plurality of terms in the weight vector selected based on the distances between each term and the other terms.
Specification