Generating distributed word embeddings using structured information

US 9,898,458 B2
Filed: 05/08/2015
Issued: 02/20/2018
Est. Priority Date: 05/08/2015
Status: Active Grant

Abstract

A computer program that uses structured information, such as syntactic and semantic information, as context for representing words and/or phrases as vectors, by performing the following steps: (i) receiving a first set of natural language text and a set of information pertaining to the first set of natural language text, where the information includes metadata and corresponding contextual information indicating a relationship between the metadata and the first set of natural language text; and (ii) generating a first vector representation for the first set of natural language text utilizing the metadata and its corresponding contextual information.
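The inputs named in the abstract can be pictured as a small data structure. The following is a minimal sketch only: the class names `Annotation` and `AnnotatedText`, the `relationship:metadata` feature format, and the sample labels (`nsubj`, `subject_entity_type`) are hypothetical choices for illustration and do not come from the patent.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Annotation:
    """One piece of metadata plus the contextual information tying it to the text."""
    metadata: str       # e.g. a dependent word or a named entity type
    relationship: str   # how the metadata relates to the text (hypothetical labels)


@dataclass
class AnnotatedText:
    """A set of natural language text together with its structured information."""
    text: str
    annotations: list[Annotation] = field(default_factory=list)

    def context_features(self) -> list[str]:
        # Encode each annotation as a single "relationship:metadata" feature string.
        return [f"{a.relationship}:{a.metadata}" for a in self.annotations]


sample = AnnotatedText(
    "acquired",
    [
        Annotation("IBM", "nsubj"),
        Annotation("ORGANIZATION", "subject_entity_type"),
    ],
)
features = sample.context_features()
```

Here `features` holds one string per annotation, which a downstream embedding step could treat as context for the word `acquired`.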

12 Claims

1. A computer program product for generating a vector representation of a set of natural language text in a natural language processing system, the computer program product comprising a computer readable storage medium having stored thereon:
- program instructions programmed to receive, by the natural language processing system, a first set of natural language text and a set of information pertaining to the first set of natural language text, where the information includes metadata and corresponding contextual information indicating a relationship between the metadata and the first set of natural language text;
  
  program instructions programmed to determine, by the natural language processing system, a substitute set of natural language text, wherein the substitute set of natural language text includes the first set of natural language text, the metadata, and the corresponding contextual information indicating the relationship between the metadata and the first set of natural language text;
  
  program instructions programmed to generate, by the natural language processing system, a first vector representation of the substitute set of natural language text; and
  
  program instructions programmed to compare, by the natural language processing system, the generated first vector representation to a second vector representation to determine, in the natural language processing system, an amount of similarity between the first set of natural language text and a second set of natural language text represented by the second vector representation, wherein:
  
  the first set of natural language text is a verb,
  
  the contextual information corresponding to the metadata includes a dependency parse tree,
  
  the dependency parse tree includes a root node and a plurality of nodes that depend from the root node,
  
  the root node represents the first set of natural language text,
  
  the plurality of nodes that depend from the root node represent context features of the first set of natural language text, and
  
  the generating of the first vector representation of the substitute set of natural language text includes adding vector representations for the context features represented by the plurality of nodes that depend from the root node.
  - 2. The computer program product of claim 1, wherein the contextual information corresponding to the metadata further indicates a semantic relationship between the metadata and the first set of natural language text.
  - 3. The computer program product of claim 2, wherein the metadata includes a named entity type of at least a portion of the first set of natural language text.
  - 4. The computer program product of claim 1, wherein the program instructions programmed to generate the first vector representation of the substitute set of natural language text further include:
    - program instructions programmed to generate, by the natural language processing system, a first initial vector representation from the first set of natural language text;
      
      program instructions programmed to generate, by the natural language processing system, a second initial vector representation from the substitute set of natural language text; and
      
      program instructions programmed to add, by the natural language processing system, the first initial vector representation and the second initial vector representation to generate the first vector representation.
  - 5. The computer program product of claim 1, wherein the program instructions programmed to generate the first vector representation of the substitute set of natural language text further include:
    - program instructions programmed to provide, by the natural language processing system, the substitute set of natural language text as input into an artificial neural network trained to generate vector representations; and
      
      program instructions programmed to receive, by the natural language processing system, the first vector representation as output of the artificial neural network.
  - 6. The computer program product of claim 1, wherein the program instructions programmed to determine the substitute set of natural language text, wherein the substitute set of natural language text includes the first set of natural language text, the metadata, and the corresponding contextual information indicating the relationship between the metadata and the first set of natural language text, comprise program instructions to append the metadata and the corresponding contextual information to the first set of natural language text.
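Claims 4 and 6 can be illustrated together: the substitute text is formed by appending metadata and contextual information to the original text, and the final vector is the sum of one vector computed from the original text and one computed from the substitute text. The sketch below is an assumption-laden illustration, not the patented implementation: `append_context`, `embed`, the `relationship:metadata` token format, and the values in `TOY_VECTORS` are all hypothetical, and the lookup table merely stands in for whatever trained model (such as the neural network of claim 5) would produce the vectors.

```python
def append_context(text, annotations):
    """Form a substitute text by appending "relationship:metadata" tokens (claim 6)."""
    parts = [text]
    for relationship, metadata in annotations:
        parts.append(f"{relationship}:{metadata}")
    return " ".join(parts)


def embed(text, table, dim):
    """Toy stand-in for a trained model: sum the table vectors of known tokens."""
    vec = [0.0] * dim
    for token in text.split():
        # Tokens missing from the table contribute nothing.
        for i, value in enumerate(table.get(token, ())):
            vec[i] += value
    return vec


# Hypothetical 3-dimensional vectors, chosen only to make the arithmetic visible.
TOY_VECTORS = {
    "acquired": [1.0, 0.0, 0.5],
    "nsubj:IBM": [0.0, 2.0, 0.25],
}

substitute = append_context("acquired", [("nsubj", "IBM")])
first_initial = embed("acquired", TOY_VECTORS, 3)    # from the original text
second_initial = embed(substitute, TOY_VECTORS, 3)   # from the substitute text
# Claim 4: add the two initial vectors element by element.
first_vector = [a + b for a, b in zip(first_initial, second_initial)]
```

With these toy values, `substitute` is `"acquired nsubj:IBM"` and `first_vector` is `[2.0, 2.0, 1.25]`.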

7. A computer system for generating a vector representation of a set of natural language text in a natural language processing system, the computer system comprising:
- a processor(s) set; and
  
  a computer readable storage medium;
  
  wherein:
  
  the processor set is structured, located, connected and/or programmed to run program instructions stored on the computer readable storage medium; and
  
  the program instructions include:
  
  program instructions programmed to receive, by the natural language processing system, a first set of natural language text and a set of information pertaining to the first set of natural language text, where the information includes metadata and corresponding contextual information indicating a relationship between the metadata and the first set of natural language text;
  
  program instructions programmed to determine, by the natural language processing system, a substitute set of natural language text, wherein the substitute set of natural language text includes the first set of natural language text, the metadata, and the corresponding contextual information indicating the relationship between the metadata and the first set of natural language text;
  
  program instructions programmed to generate, by the natural language processing system, a first vector representation of the substitute set of natural language text; and
  
  program instructions programmed to compare, by the natural language processing system, the generated first vector representation to a second vector representation to determine, in the natural language processing system, an amount of similarity between the first set of natural language text and a second set of natural language text represented by the second vector representation, wherein:
  
  the first set of natural language text is a verb,
  
  the contextual information corresponding to the metadata includes a dependency parse tree,
  
  the dependency parse tree includes a root node and a plurality of nodes that depend from the root node,
  
  the root node represents the first set of natural language text,
  
  the plurality of nodes that depend from the root node represent context features of the first set of natural language text, and
  
  the generating of the first vector representation of the substitute set of natural language text includes adding vector representations for the context features represented by the plurality of nodes that depend from the root node.
  - 8. The computer system of claim 7, wherein the contextual information corresponding to the metadata further indicates a semantic relationship between the metadata and the first set of natural language text.
  - 9. The computer system of claim 8, wherein the metadata includes a named entity type of at least a portion of the first set of natural language text.
  - 10. The computer system of claim 7, wherein the program instructions programmed to generate the first vector representation of the substitute set of natural language text further include:
    - program instructions programmed to generate, by the natural language processing system, a first initial vector representation from the first set of natural language text;
      
      program instructions programmed to generate, by the natural language processing system, a second initial vector representation from the substitute set of natural language text; and
      
      program instructions programmed to add, by the natural language processing system, the first initial vector representation and the second initial vector representation to generate the first vector representation.
  - 11. The computer system of claim 7, wherein the program instructions programmed to generate the first vector representation of the substitute set of natural language text further include:
    - program instructions programmed to provide, by the natural language processing system, the substitute set of natural language text as input into an artificial neural network trained to generate vector representations; and
      
      program instructions programmed to receive, by the natural language processing system, the first vector representation as output of the artificial neural network.
  - 12. The computer system of claim 7, wherein the program instructions programmed to determine the substitute set of natural language text, wherein the substitute set of natural language text includes the first set of natural language text, the metadata, and the corresponding contextual information indicating the relationship between the metadata and the first set of natural language text, comprise program instructions to append the metadata and the corresponding contextual information to the first set of natural language text.
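The closing limitations of the independent claims (a verb at the root of a dependency parse tree, dependent nodes acting as context features, their vectors added together, and the result compared with a second vector) can be sketched as follows. Everything named here is an assumption made for illustration: the `Node` class, the `relation:word` keys, the toy vector values, the choice to use only direct children of the root, and the use of cosine similarity as the "amount of similarity" (the claims do not name a specific measure).

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """A dependency parse tree node; the root holds the verb being represented."""
    word: str
    relation: str = "root"
    children: list["Node"] = field(default_factory=list)


def context_vector(root, table, dim):
    """Add the vectors of the context features that depend from the root."""
    total = [0.0] * dim
    for child in root.children:
        key = f"{child.relation}:{child.word}"
        for i, value in enumerate(table.get(key, ())):
            total[i] += value
    return total


def cosine(a, b):
    """Cosine similarity, used here as one possible similarity measure."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical 2-dimensional feature vectors.
TOY_FEATURES = {
    "nsubj:company": [1.0, 0.0],
    "dobj:startup": [0.0, 1.0],
    "dobj:rival": [1.0, 1.0],
}

bought = Node("bought", children=[Node("company", "nsubj"), Node("startup", "dobj")])
purchased = Node("purchased", children=[Node("company", "nsubj"), Node("rival", "dobj")])

v_bought = context_vector(bought, TOY_FEATURES, 2)        # [1.0, 1.0]
v_purchased = context_vector(purchased, TOY_FEATURES, 2)  # [2.0, 1.0]
similarity = cosine(v_bought, v_purchased)                # 3 / sqrt(10)
```

Because the two verbs share a subject feature and have related object features, their summed context vectors point in similar directions and the similarity is close to 1.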

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Cross, III, James H.; Fan, James J.; Xiang, Bing; Zhou, Bowen
Primary Examiner(s): Ky, Kevin
Application Number: US14/707,885
Publication Number: US 20160328383A1
Time in Patent Office: 1,019 Days
Field of Search: None
CPC Class Codes:
G06F 16/36   Creation of semantic tools,...
G06F 40/211   Syntactic parsing, e.g. bas...
G06F 40/30   Semantic analysis
