Information processing system, an information processing method and a computer readable storage medium

US 10,354,010 B2
Filed: 04/24/2015
Issued: 07/16/2019
Est. Priority Date: 04/24/2015
Status: Active Grant

Abstract

An information processing system to increase weights of words that are related to a text, but that do not explicitly occur in the text, in a weight vector representing the text, is provided. An adjusting system (100) includes a distance storing unit (110) and an adjusting unit (120). The distance storing unit (110) stores distances between any two terms of a plurality of terms. The distance between two terms becomes smaller as the two terms are semantically more similar. The adjusting unit (120) adjusts a weight of each term of the plurality of terms in a weight vector including weights of the plurality of terms and representing a text, on the basis of a distance between each term and other term in the weight vector and a weight of the other term.

37 Citations

10 Claims

1. An information processing system comprising:
- a memory storing instructions; and
  
  one or more processors configured to execute the instructions to:
  
  store distances between any two terms of a plurality of terms, wherein a distance of the stored distances becomes smaller as two terms are semantically more similar and if the two terms tend to occur in texts that belong to a same class;
  
  adjust a weight of each term of the plurality of terms in a weight vector including weights of the plurality of terms and representing a text, on the basis of distances between each term and other terms in the weight vector and weights of the other terms; and
  
  classify the text using the adjusted weight of each term of the plurality of terms in the weight vector,
  
  wherein the weight of each term in the weight vector is adjusted by estimating a latent weight of each term of the plurality of terms in the weight vector by a MAP (maximum-a-posteriori) estimate with a posterior probability of the latent weight when observed weights of the plurality of terms in the weight vector are given, assuming the observed weights are generated from latent weights with Gaussian noise, and
  
  wherein the posterior probability is approximated with a subset of the observed weights of the plurality of terms, wherein the subset corresponds to a subset of the plurality of terms in the weight vector selected based on the distances between each term and the other terms.
- - 2. The information processing system according to claim 1, wherein the one or more processors configured to further execute the instructions to:
    - calculate the distances between any two terms of the plurality of terms on the basis of a distance between feature vectors of the two terms, the feature vector including at least one of information about a local-word window context and information about a topical context.
  - 3. The information processing system according to claim 2, wherein the distances are corrected in such a way that the distance between two terms becomes smaller if the two terms tend to occur in texts that belong to the same class.
  - 4. The information processing system according to claim 1, wherein the weight of each term in the weight vector is adjusted by calculating a covariance matrix for the plurality of terms from the distances between each term and the other terms, and estimating the latent weight of each term of the plurality of terms from the weight vector on the basis of the MAP estimate using the calculated covariance matrix.
  - 5. The information processing system according to claim 4, wherein the latent weight of each term of the plurality of terms is estimated using the calculated covariance matrix restricted to the subset of the plurality of terms, wherein distances between each term and the subset of the plurality of terms are smaller than a distance between each term and a term not in the subset.
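The covariance-based MAP estimate of claims 4 and 5 can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent text: the squared-exponential mapping from distance to covariance, the zero-mean Gaussian prior on the latent weights, the parameter names `length_scale`, `noise_var` and `k`, and the helper names themselves. Under those assumptions the MAP estimate of a latent weight equals the Gaussian posterior mean, computed only from the `k` terms closest to the term being adjusted.

```python
import math


def kernel(distance, length_scale=1.0):
    """Assumed squared-exponential covariance: smaller distance, larger covariance."""
    return math.exp(-(distance ** 2) / (2.0 * length_scale ** 2))


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def map_latent_weight(i, observed, dist, k=2, noise_var=0.5, length_scale=1.0):
    """MAP estimate of the latent weight of term i from its k closest terms.

    observed : observed weights, one per term (the weight vector of a text)
    dist     : symmetric matrix of term-to-term distances with a zero diagonal
    """
    # Subset selection by distance; term i itself is included because d(i, i) = 0.
    subset = sorted(range(len(observed)), key=lambda j: dist[i][j])[:k]
    # Covariance restricted to the subset, plus Gaussian noise on the diagonal.
    cov_ss = [[kernel(dist[a][b], length_scale) + (noise_var if a == b else 0.0)
               for b in subset] for a in subset]
    y_s = [observed[j] for j in subset]
    alpha = solve(cov_ss, y_s)
    # Posterior mean: covariance between term i and the subset, times alpha.
    return sum(kernel(dist[i][j], length_scale) * alpha[p]
               for p, j in enumerate(subset))


if __name__ == "__main__":
    terms = ["tennis", "racket", "stock"]  # hypothetical vocabulary
    dist = [[0.0, 0.5, 3.0], [0.5, 0.0, 3.0], [3.0, 3.0, 0.0]]
    observed = [1.0, 0.0, 0.0]  # only "tennis" occurs in the text
    for i, term in enumerate(terms):
        print(term, round(map_latent_weight(i, observed, dist), 4))
```

In this toy setup the term "racket" does not occur in the text but lies close to "tennis", so its adjusted weight rises well above that of the distant term "stock". The squared-exponential form is only guaranteed to give a valid covariance matrix when the distances are Euclidean; other distance measures may need a different mapping.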

6. An information processing method comprising:
- reading out distances between each term of a plurality of terms in a weight vector and other terms in the weight vector from a distance storage which stores distances between any two terms of the plurality of terms, wherein a distance of the stored distances becomes smaller as two terms are semantically more similar and if the two terms tend to occur in texts that belong to a same class, the weight vector including weights of the plurality of terms and representing a text;
  
  adjusting a weight of each term of the plurality of terms in the weight vector on the basis of the distances between each term and other terms in the weight vector and weights of the other terms; and
  
  classifying the text using the adjusted weight of each term of the plurality of terms in the weight vector,
  
  wherein the weight of each term in the weight vector is adjusted by estimating a latent weight of each term of the plurality of terms in the weight vector by a MAP (maximum-a-posteriori) estimate with a posterior probability of the latent weight when observed weights of the plurality of terms in the weight vector are given, assuming the observed weights are generated from latent weights with Gaussian noise, and
  
  wherein the posterior probability is approximated with a subset of the observed weights of the plurality of terms, wherein the subset corresponds to a subset of the plurality of terms in the weight vector selected based on the distances between each term and the other terms.
- - 7. The information processing method according to claim 6, further comprising calculating the distances between any two terms of the plurality of terms on the basis of a distance between feature vectors of the two terms, the feature vector including at least one of information about a local-word window context and information about a topical context.
  - 8. The information processing method according to claim 7, wherein the calculating corrects the distances in such a way that the distance between two terms becomes smaller if the two terms tend to occur in texts that belong to the same class.
  - 9. The information processing method according to claim 6, wherein the adjusting adjusts the weight of each term in the weight vector by calculating a covariance matrix for the plurality of terms from the distances between each term and the other terms, and estimating the latent weight of each term of the plurality of terms from the weight vector on the basis of the MAP estimate using the calculated covariance matrix.
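The distance calculation of claims 7 and 8 could look roughly like the following sketch. The claims do not fix a particular metric or correction rule, so the choices below are assumptions for illustration only: a feature vector formed by concatenating local-window co-occurrence counts with topic proportions, Euclidean distance between feature vectors, and a hypothetical correction that shrinks the distance in proportion to the cosine similarity of the two terms' per-class occurrence counts. The names `term_distance` and `strength` are likewise invented here.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def term_distance(feat_u, feat_v, class_counts_u, class_counts_v, strength=0.5):
    """Feature-vector distance, reduced when both terms favour the same classes.

    feat_*         : window-context counts concatenated with topic proportions
    class_counts_* : how often the term occurs in texts of each class
    strength       : assumed correction factor in [0, 1); 0 disables the correction
    """
    base = euclidean(feat_u, feat_v)
    # For non-negative counts the agreement lies in [0, 1].
    agreement = cosine(class_counts_u, class_counts_v)
    return base * (1.0 - strength * agreement)


if __name__ == "__main__":
    # Hypothetical data: 4 window-context counts followed by 2 topic proportions.
    tennis = [3, 1, 0, 2] + [0.8, 0.2]
    racket = [2, 1, 0, 3] + [0.7, 0.3]
    stock = [0, 0, 4, 1] + [0.1, 0.9]
    # Occurrence counts per class, in the order (sports, finance).
    print(term_distance(tennis, racket, [9, 1], [8, 2]))
    print(term_distance(tennis, stock, [9, 1], [1, 9]))
```

Keeping `strength` below 1 keeps every corrected distance non-negative and leaves the distance of a term to itself at zero.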

10. A non-transitory computer readable storage medium recording thereon a program, causing a computer to perform a method comprising:
- reading out distances between each term of a plurality of terms in a weight vector and other terms in the weight vector from a distance storage which stores distances between any two terms of the plurality of terms, wherein a distance of the stored distances becomes smaller as two terms are semantically more similar and if the two terms tend to occur in texts that belong to a same class, the weight vector including weights of the plurality of terms and representing a text;
  
  adjusting a weight of each term of the plurality of terms in the weight vector on the basis of the distances between each term and other terms in the weight vector and weights of the other terms; and
  
  classifying the text using the adjusted weight of each term of the plurality of terms in the weight vector,
  
  wherein the weight of each term in the weight vector is adjusted by estimating a latent weight of each term of the plurality of terms in the weight vector by a MAP (maximum-a-posteriori) estimate with a posterior probability of the latent weight when observed weights of the plurality of terms in the weight vector are given, assuming the observed weights are generated from latent weights with Gaussian noise, and
  
  wherein the posterior probability is approximated with a subset of the observed weights of the plurality of terms, wherein the subset corresponds to a subset of the plurality of terms in the weight vector selected based on the distances between each term and the other terms.
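
The MAP estimate recited in the independent claims can be written out under one common set of modelling assumptions. The symbols below are introduced here for illustration and do not appear in the claims: $y$ is the vector of observed weights, $f$ the vector of latent weights, $\sigma^2$ the variance of the Gaussian noise, $K$ a covariance matrix whose entry $K_{ij}$ decreases as the distance $d_{ij}$ between terms $i$ and $j$ grows, and $S_i$ the subset of terms closest to term $i$. The zero-mean Gaussian prior on $f$ is an assumption; the claims only state that the observed weights arise from the latent weights with Gaussian noise.

```latex
% Generative assumption: observed weights are latent weights plus Gaussian noise.
y = f + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2 I), \qquad f \sim \mathcal{N}(0, K)

% Posterior and MAP estimate (for a Gaussian posterior the mode equals the mean).
p(f \mid y) \propto p(y \mid f)\, p(f), \qquad
\hat{f} = \arg\max_{f}\, p(f \mid y) = K \left( K + \sigma^2 I \right)^{-1} y

% Approximation using only the observed weights of the terms in S_i.
\hat{f}_i \approx K_{i, S_i} \left( K_{S_i, S_i} + \sigma^2 I \right)^{-1} y_{S_i}
```

Restricting the computation to $S_i$ replaces one linear system over the full vocabulary with a small system of size $|S_i|$ per term, which is the practical reason for approximating the posterior with a subset of nearby terms.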

Current Assignee: NEC Corporation
Original Assignee: NEC Corporation
Inventors: Andrade Silva, Daniel Georg; Tamura, Akihiro; Tsuchida, Masaaki
Primary Examiner(s): Desir, Pierre Louis
Assistant Examiner(s): Kim, Jonathan C
Application Number: US15/567,630
Publication Number: US 20180137100A1
Time in Patent Office: 1,544 Days
Field of Search: None

CPC Class Codes
G06F 16/3347: using vector based model
G06F 16/35: Clustering; Classification
G06F 16/353: into predefined classes
G06F 40/30: Semantic analysis
