Generating distributed word embeddings using structured information

US 9,922,025 B2
Filed: 08/08/2017
Issued: 03/20/2018
Est. Priority Date: 05/08/2015
Status: Active Grant

Abstract

A computer program that generates a vector representation of a set of natural language text in a natural language processing system by: (i) receiving a first set of natural language text and a set of information pertaining to the first set of natural language text, where the information includes a dependency parse tree including a root node and a plurality of nodes that depend from the root node, where the root node represents the first set of natural language text, and where the plurality of nodes that depend from the root node represent context features of the first set of natural language text; and (ii) generating, by the natural language processing system, a first vector representation of the first set of natural language text, wherein the generating includes adding vector representations for the context features represented by the plurality of nodes that depend from the root node.
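The generation and comparison steps can be illustrated with a minimal sketch. It assumes a toy embedding table (`FEATURE_VECTORS`), a hypothetical `Node` type for the dependency parse tree, and cosine similarity as the measure of "amount of similarity"; none of these names or choices come from the patent text, which only requires adding the context-feature vectors and comparing the result to a second vector.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """Hypothetical parse-tree node: the root is the target text, children are context features."""
    label: str
    children: List["Node"] = field(default_factory=list)


# Assumed toy vectors for context features; a real system would learn these.
FEATURE_VECTORS: Dict[str, List[float]] = {
    "dog": [1.0, 0.0, 0.0],
    "ball": [0.0, 1.0, 0.0],
    "cat": [1.0, 0.0, 1.0],
    "toy": [0.0, 1.0, 0.0],
}


def text_vector(root: Node, table: Dict[str, List[float]]) -> List[float]:
    """Add the vectors of the nodes that depend from the root."""
    dim = len(next(iter(table.values())))
    total = [0.0] * dim
    for child in root.children:
        for i, value in enumerate(table[child.label]):
            total[i] += value
    return total


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; an assumed choice of similarity measure."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Two target texts, each the root of its own small dependency tree.
chases = Node("chases", [Node("dog"), Node("ball")])
pursues = Node("pursues", [Node("cat"), Node("toy")])

v1 = text_vector(chases, FEATURE_VECTORS)
v2 = text_vector(pursues, FEATURE_VECTORS)
similarity = cosine(v1, v2)
```

In this sketch the two verbs end up with similar vectors because their context features have similar vectors, even though the verbs themselves never appear in the embedding table.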

20 Claims

1. A method for generating a vector representation of a set of natural language text in a natural language processing system, the method comprising:
- receiving, by the natural language processing system, a first set of natural language text and a set of information pertaining to the first set of natural language text, where the information includes a dependency parse tree including a root node and a plurality of nodes that depend from the root node, where the root node represents the first set of natural language text, and where the plurality of nodes that depend from the root node represent context features of the first set of natural language text;
  
  generating, by the natural language processing system, a first vector representation of the first set of natural language text, wherein the generating includes adding vector representations for the context features represented by the plurality of nodes that depend from the root node; and
  
  comparing, by the natural language processing system, the generated first vector representation to a second vector representation to determine, in the natural language processing system, an amount of similarity between the first set of natural language text and a second set of natural language text represented by the second vector representation.
- Dependent Claims (2, 3, 4, 5, 6, 7)
- - 2. The method of claim 1, wherein:
    - the first set of natural language text is part of an input sentence; and
      
      the context features of the first set of natural language text represented by the plurality of nodes that depend from the root node correspond to words or phrases from the input sentence other than the first set of natural language text.
  - 3. The method of claim 2, wherein:
    - the context features of the first set of natural language text represented by the plurality of nodes that depend from the root node include:
      
      (i) the respective words or phrases to which the context features correspond, and (ii) contextual information indicating a relationship between the respective words or phrases and the first set of natural language text.
  - 4. The method of claim 3, wherein:
    - the first set of natural language text is a verb.
  - 5. The method of claim 4, wherein:
    - a first word or phrase corresponding to and included in a first context feature of the first set of natural language text is a subject of the verb; and
      
      the contextual information included in the first context feature indicates that the first word or phrase is a subject of the verb.
  - 6. The method of claim 4, wherein:
    - a first word or phrase corresponding to and included in a first context feature of the first set of natural language text is an object of the verb; and
      
      the contextual information included in the first context feature indicates that the first word or phrase is an object of the verb.
  - 7. The method of claim 4, wherein:
    - a first word or phrase corresponding to and included in a first context feature of the first set of natural language text is a prepositional phrase that modifies the verb; and
      
      the contextual information included in the first context feature indicates that the first word or phrase is a prepositional phrase that modifies the verb.
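Claims 2 through 7 describe context features that pair a word or phrase from the input sentence with its relationship to the target verb. The following is a minimal sketch of one way such features might be encoded, assuming dependencies are given as `(head, relation, dependent)` triples and that a feature is a `relation:phrase` string. The triple layout, the relation labels, and the function name `context_features` are illustrative assumptions, not taken from the patent.

```python
from typing import List, Tuple

# Assumed representation of one dependency edge: (head, relation, dependent).
Dependency = Tuple[str, str, str]


def context_features(target: str, dependencies: List[Dependency]) -> List[str]:
    """Collect features for the nodes that depend directly on the target text.

    Each feature keeps both the dependent word or phrase and the
    relationship that ties it to the target.
    """
    return [
        f"{relation}:{dependent}"
        for head, relation, dependent in dependencies
        if head == target
    ]


# Hypothetical parse of "The dog chased the ball across the yard".
parse: List[Dependency] = [
    ("chased", "subject", "dog"),
    ("chased", "object", "ball"),
    ("chased", "prep_phrase", "across the yard"),
    ("dog", "det", "the"),  # depends on "dog", so not a feature of "chased"
]

features = context_features("chased", parse)
```

Keeping the relation in the feature means that "dog" as a subject and "dog" as an object are distinct features and could receive distinct vectors.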

8. A computer program product for generating a vector representation of a set of natural language text in a natural language processing system, the computer program product comprising a computer readable storage medium having stored thereon:
- program instructions to receive, by the natural language processing system, a first set of natural language text and a set of information pertaining to the first set of natural language text, where the information includes a dependency parse tree including a root node and a plurality of nodes that depend from the root node, where the root node represents the first set of natural language text, and where the plurality of nodes that depend from the root node represent context features of the first set of natural language text;
  
  program instructions to generate, by the natural language processing system, a first vector representation of the first set of natural language text, wherein the generating includes adding vector representations for the context features represented by the plurality of nodes that depend from the root node; and
  
  program instructions to compare, by the natural language processing system, the generated first vector representation to a second vector representation to determine, in the natural language processing system, an amount of similarity between the first set of natural language text and a second set of natural language text represented by the second vector representation.
- Dependent Claims (9, 10, 11, 12, 13, 14)
- - 9. The computer program product of claim 8, wherein:
    - the first set of natural language text is part of an input sentence; and
      
      the context features of the first set of natural language text represented by the plurality of nodes that depend from the root node correspond to words or phrases from the input sentence other than the first set of natural language text.
  - 10. The computer program product of claim 9, wherein:
    - the context features of the first set of natural language text represented by the plurality of nodes that depend from the root node include:
      
      (i) the respective words or phrases to which the context features correspond, and (ii) contextual information indicating a relationship between the respective words or phrases and the first set of natural language text.
  - 11. The computer program product of claim 10, wherein:
    - the first set of natural language text is a verb.
  - 12. The computer program product of claim 11, wherein:
    - a first word or phrase corresponding to and included in a first context feature of the first set of natural language text is a subject of the verb; and
      
      the contextual information included in the first context feature indicates that the first word or phrase is a subject of the verb.
  - 13. The computer program product of claim 11, wherein:
    - a first word or phrase corresponding to and included in a first context feature of the first set of natural language text is an object of the verb; and
      
      the contextual information included in the first context feature indicates that the first word or phrase is an object of the verb.
  - 14. The computer program product of claim 11, wherein:
    - a first word or phrase corresponding to and included in a first context feature of the first set of natural language text is a prepositional phrase that modifies the verb; and
      
      the contextual information included in the first context feature indicates that the first word or phrase is a prepositional phrase that modifies the verb.

15. A computer system for generating a vector representation of a set of natural language text in a natural language processing system, the computer system comprising:
- a processor(s) set; and
  
  a computer readable storage medium;
  
  wherein:
  
  the processor set is structured, located, connected and/or programmed to run program instructions stored on the computer readable storage medium; and
  
  the program instructions include:
  
  program instructions to receive, by the natural language processing system, a first set of natural language text and a set of information pertaining to the first set of natural language text, where the information includes a dependency parse tree including a root node and a plurality of nodes that depend from the root node, where the root node represents the first set of natural language text, and where the plurality of nodes that depend from the root node represent context features of the first set of natural language text;
  
  program instructions to generate, by the natural language processing system, a first vector representation of the first set of natural language text, wherein the generating includes adding vector representations for the context features represented by the plurality of nodes that depend from the root node; and
  
  program instructions to compare, by the natural language processing system, the generated first vector representation to a second vector representation to determine, in the natural language processing system, an amount of similarity between the first set of natural language text and a second set of natural language text represented by the second vector representation.
- Dependent Claims (16, 17, 18, 19, 20)
- - 16. The computer system of claim 15, wherein:
    - the first set of natural language text is part of an input sentence; and
      
      the context features of the first set of natural language text represented by the plurality of nodes that depend from the root node correspond to words or phrases from the input sentence other than the first set of natural language text.
  - 17. The computer system of claim 16, wherein:
    - the context features of the first set of natural language text represented by the plurality of nodes that depend from the root node include:
      
      (i) the respective words or phrases to which the context features correspond, and (ii) contextual information indicating a relationship between the respective words or phrases and the first set of natural language text.
  - 18. The computer system of claim 17, wherein:
    - the first set of natural language text is a verb;
      
      a first word or phrase corresponding to and included in a first context feature of the first set of natural language text is a subject of the verb; and
      
      the contextual information included in the first context feature indicates that the first word or phrase is a subject of the verb.
  - 19. The computer system of claim 17, wherein:
    - the first set of natural language text is a verb;
      
      a first word or phrase corresponding to and included in a first context feature of the first set of natural language text is an object of the verb; and
      
      the contextual information included in the first context feature indicates that the first word or phrase is an object of the verb.
  - 20. The computer system of claim 17, wherein:
    - the first set of natural language text is a verb;
      
      a first word or phrase corresponding to and included in a first context feature of the first set of natural language text is a prepositional phrase that modifies the verb; and
      
      the contextual information included in the first context feature indicates that the first word or phrase is a prepositional phrase that modifies the verb.

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Cross, III, James H.; Fan, James J.; Xiang, Bing; Zhou, Bowen
Primary Examiner(s): Ky, Kevin
Application Number: US15/671,303
Publication Number: US 20170337183A1
Time in Patent Office: 224 Days
Field of Search: None

CPC Class Codes

G06F 16/36   Creation of semantic tools,...

G06F 40/211   Syntactic parsing, e.g. bas...

G06F 40/30   Semantic analysis
