Human language analyzer for detecting clauses, clause types, and clause relationships

US 10,467,344 B1
Filed: 04/10/2019
Issued: 11/05/2019
Est. Priority Date: 08/02/2018
Status: Active Grant

Abstract

A human language analyzer receives, at the human language analyzer, text data representing information in a human language. The human language analyzer receives a computer command for identifying a text data component of the text data. The computer command comprises at least two requirements for the text data component. The human language analyzer, responsive to identifying that the first requirement and the second requirement is met, locates the text data component from one of two clauses. A clause analyzer receives a clause request to locate clauses within text data representing information in a human language. The clause analyzer receives, responsive to a dependency request, token information in a token data set. The clause analyzer determines a location for each clause of the sentence portion in a hierarchy of clauses. The clause analyzer generates and outputs a new data set based on the token data set and the hierarchy of clauses.
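
The clause analyzer flow summarized above (token data set in, clause hierarchy determined, new data set out) can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Token` fields, the Universal Dependencies style labels in `CLAUSE_LABELS`, and the rule "a token with one of these labels heads a clause" are hypothetical choices, not something the patent text specifies. The parse is supplied by hand here; in the described system it would come back from the dependency parser.

```python
from dataclasses import dataclass

# Hypothetical token record; the field names are assumptions for illustration.
@dataclass
class Token:
    idx: int     # token identifier within the sentence portion
    text: str
    head: int    # idx of the token this one depends on; -1 for the sentence head
    deprel: str  # dependency label (Universal Dependencies style, assumed)

# Assumed labels whose token is treated as heading a clause.
CLAUSE_LABELS = {"root", "advcl", "ccomp", "xcomp", "acl", "csubj"}

def clause_head_of(tok, tokens):
    """Walk up the dependency chain to the nearest clause-heading token."""
    cur = tok
    while cur.deprel not in CLAUSE_LABELS and cur.head != -1:
        cur = tokens[cur.head]
    return cur.idx

def clause_level(head_idx, tokens):
    """Level 0 is the main clause; each clause-heading ancestor adds one level."""
    level = 0
    cur = tokens[head_idx]
    while cur.head != -1:
        cur = tokens[cur.head]
        if cur.deprel in CLAUSE_LABELS:
            level += 1
    return level

def analyze(tokens):
    """Return the token data set with a clause head and a level appended per token."""
    rows = []
    for tok in tokens:
        ch = clause_head_of(tok, tokens)
        rows.append({"idx": tok.idx, "text": tok.text, "head": tok.head,
                     "deprel": tok.deprel, "clause": ch,
                     "level": clause_level(ch, tokens)})
    return rows

# Hand-written parse of "She left because it rained" (assumed parser output).
SENTENCE = [
    Token(0, "She", 1, "nsubj"),
    Token(1, "left", -1, "root"),
    Token(2, "because", 4, "mark"),
    Token(3, "it", 4, "nsubj"),
    Token(4, "rained", 1, "advcl"),
]

for row in analyze(SENTENCE):
    print(row["text"], row["clause"], row["level"])
```

Here the two appended columns (`clause`, `level`) play the role of location identifiers: tokens of "She left" land at level 0 and tokens of "because it rained" at level 1.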

54 Citations

30 Claims

1. A computer-program product tangibly embodied in a non-transitory machine-readable storage medium, the computer-program product including instructions operable to cause a clause analyzer to:
- receive a clause request, at the clause analyzer, to locate clauses within text data representing information in a human language, wherein the text data comprises a plurality of clauses of a sentence portion;
  
  generate, responsive to the clause request, a dependency request to a dependency parser to determine dependency information for a plurality of tokens of the plurality of clauses of the sentence portion, wherein each token of the plurality of tokens comprises one or more characters of the text data, and the dependency information indicates a dependency of a respective token of the sentence portion on at least one other token of the sentence portion;
  
  receive, responsive to the dependency request, token information in a token data set, wherein the token information comprises one or more token identifiers of identified tokens in the sentence portion of the text data, and wherein the token information comprises dependency information indicating a dependency of a respective token of the identified tokens on at least one other token of the sentence portion;
  
  determine a location for each of the plurality of clauses of the sentence portion in a hierarchy of clauses by associating to each clause in the sentence portion:
  
  one of a plurality of levels of the hierarchy of clauses; and
  
  any respective connection to a clause associated to a different level of the plurality of levels;
  
  generate a new data set based on the token data set and the hierarchy of clauses, wherein the new data set comprises information representing the token information, and the new data set comprises one or more location identifiers indicating a location of a respective token of the token information according to a location, in the hierarchy of clauses, of a clause comprising the respective token of the token information; and
  
  output the new data set.
  - 2. The computer-program product of claim 1, wherein the instructions are operable to cause the clause analyzer to:
    - detect that a clause of the plurality of clauses of the sentence portion is an interrupted clause with all or a portion of words of an embedded clause of the plurality of clauses between words of the interrupted clause in the text data; and
      
      determine a location for the embedded clause and the interrupted clause in the hierarchy of clauses by associating different levels of the plurality of levels of the hierarchy of clauses to the embedded clause and the interrupted clause.
  - 3. The computer-program product of claim 2, wherein the instructions are operable to cause the clause analyzer to determine the location of the embedded clause based on a clause type of the embedded clause;
    - and wherein the clause type is one of a main clause, a balanced subordinate clause, and a de-ranked subordinate clause.
  - 4. The computer-program product of claim 1, wherein the token information comprises a same indication of a coordination token for any type of coordinating conjunction word in the sentence portion;
    - and wherein the instructions are operable to cause the clause analyzer to:
      
      classify at most one token in each respective clause of the plurality of clauses as a head of the respective clause; and
      
      classify a given token as a head of a first clause in the hierarchy of clauses when a given coordination token indicates a relationship between the first clause and a second clause and the second clause contains a token identified as the head of the second clause.
  - 5. The computer-program product of claim 1, wherein the instructions are operable to cause the clause analyzer to obtain a dependency tree of the sentence portion based on the dependency information, wherein the dependency tree comprises a sentence head that is a main predicate in the sentence portion and the dependency tree indicates a syntactic relationship of each token in the sentence portion from the sentence head;
    - and wherein the associating the any respective connection comprises identifying a main clause of the hierarchy of clauses, using the dependency tree, and a connection between the main clause and another clause in the sentence portion at an adjacent level.
  - 6. The computer-program product of claim 1, wherein the instructions are operable to cause the clause analyzer to generate the new data set by appending the one or more location identifiers to the token data set.
  - 7. The computer-program product of claim 1, wherein the token information further comprises in addition to the dependency information:
    - morphological information of the respective token, wherein the morphological information comprises a root form of a token identified in the sentence portion of the text data; and
      
      part of speech information of the respective token.
  - 8. The computer-program product of claim 1, wherein the instructions are operable to cause the clause analyzer to generate the new data set by generating, for each token represented in the token information, a main clause identifier indicating whether a clause comprising a respective token is a main clause or a subordinate clause.
  - 9. The computer-program product of claim 8, wherein the instructions are operable to cause the clause analyzer to:
    - identify a subordinate clause in the plurality of clauses; and
      
      generate a zero for a given main clause identifier indicating the clause is a subordinate clause.
  - 10. The computer-program product of claim 8, wherein the instructions are operable to cause the clause analyzer to:
    - identify a detection ordering of main clauses of the plurality of clauses of the sentence portion; and
      
      generate a numeral for a given main clause identifier uniquely representing an identified position in the detection ordering.
  - 11. The computer-program product of claim 8, wherein the instructions are operable to cause the clause analyzer to:
    - identify whether a first clause of the plurality of clauses of the sentence portion is a main clause, balanced subordinate clause, or de-ranked subordinate clause; and
      
      generate an explicit indication for a main clause identifier for the first clause explicitly indicating whether the first clause is a main clause, balanced subordinate clause, or de-ranked subordinate clause.
  - 12. The computer-program product of claim 1, wherein the instructions are operable to cause the clause analyzer to:
    - determine a first location of a first clause of the plurality of clauses of the sentence portion by associating a first level of the hierarchy of clauses to the first clause;
      
      determine a second location of a second clause of the plurality of clauses of the sentence portion by associating a second level of the hierarchy of clauses to the second clause, wherein the second level is different from the first level;
      
      determine a path from the determined first location of the first clause in the hierarchy of clauses to the determined second location of the second clause in the hierarchy of clauses; and
      
      generate the new data set by generating, for respective tokens of the first clause, a clause hierarchy identifier indicating the path from the first location of the first clause to the second location of the second clause.
  - 13. The computer-program product of claim 12, wherein the instructions are operable to cause the clause analyzer to:
    - determine respective locations for one or more clauses with levels intermediate to the first level and the second level of the hierarchy of clauses;
      
      obtain the path from the first location to the second location with the determined respective locations for one or more clauses with levels intermediate to the first level and the second level; and
      
      generate the new data set by generating the clause hierarchy identifier indicating the one or more clauses with locations intermediate between the first clause and the second clause.
  - 14. The computer-program product of claim 1, wherein the instructions are operable to cause the clause analyzer to:
    - classify each token in each respective clause of the plurality of clauses as a head of the respective clause or a member of the respective clause; and
      
      generate the new data set by generating, for each token represented in the token information, a clause head indicator indicating whether a respective token is classified by the clause analyzer as a head of the respective clause or as a non-head member of the respective clause.
  - 15. The computer-program product of claim 1, wherein the instructions are operable to cause the clause analyzer to:
    - classify by the clause analyzer a token represented in the token information as a token that heads a de-ranked subordinate clause of the plurality of clauses; and
      
      determine the location of the de-ranked subordinate clause in the hierarchy of clauses based on the classified token that heads the de-ranked subordinate clause.
  - 16. The computer-program product of claim 1, wherein the instructions are operable to cause the clause analyzer to augment a location of a given clause in the hierarchy of clauses by associating with the given clause a new level in the plurality of levels that is different than a previous level associated to given clause in the hierarchy of clauses;
    - and wherein the given clause is a de-ranked subordinate clause or a balanced subordinate clause.
  - 17. The computer-program product of claim 1, wherein the one or more location identifiers indicate a balanced subordinate clause or promoted de-ranked subordinate clause is at a child location relative to a respective parent clause in the hierarchy of clauses.
  - 18. The computer-program product of claim 1, wherein the one or more location identifiers indicate a de-ranked subordinate clause or demoted subordinate clause is a same level as a respective parent clause in the hierarchy of clauses.
  - 19. The computer-program product of claim 1, wherein the text data comprises English language text data.
  - 20. The computer-program product of claim 1, wherein the instructions are operable to cause the clause analyzer to receive token information comprises identification of each token of the sentence portion of the text data in a matrix that comprises one or more of:
    - a unique identifier for each token that uniquely identifies within the sentence portion a given string of characters as a token; and
      
      an indication of a length, start, or end of characters forming a token within the sentence portion.
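
The per-token identifiers recited in claims 8 to 14 (a main clause identifier, a clause hierarchy identifier describing a path, and a clause head indicator) might be produced along these lines. This is a sketch under assumptions: the `Clause` record, the `>`-joined path string, and the choice to run each path up to the main clause are hypothetical, and the clause membership is given by hand rather than derived from a parse.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical clause record; names are illustrative only.
@dataclass
class Clause:
    cid: str               # clause label
    parent: Optional[str]  # connected clause one level up; None for a main clause
    head_token: int        # token identifier of the clause head
    tokens: List[int]      # token identifiers belonging to the clause

def main_clause_ids(clauses):
    """0 for subordinate clauses; 1, 2, ... for main clauses in detection order."""
    ids, counter = {}, 0
    for c in clauses:
        if c.parent is None:
            counter += 1
            ids[c.cid] = counter
        else:
            ids[c.cid] = 0
    return ids

def path_to_main(cid, by_id):
    """Clause labels from `cid` up to its main clause, intermediates included."""
    path = [cid]
    while by_id[path[-1]].parent is not None:
        path.append(by_id[path[-1]].parent)
    return path

def annotate(clauses):
    """Build one row per token carrying the three identifiers."""
    by_id = {c.cid: c for c in clauses}
    mids = main_clause_ids(clauses)
    rows = []
    for c in clauses:
        path = ">".join(path_to_main(c.cid, by_id))
        for t in c.tokens:
            rows.append({"token": t,
                         "main_clause_id": mids[c.cid],
                         "hierarchy_path": path,
                         "is_head": t == c.head_token})
    return sorted(rows, key=lambda r: r["token"])

# Assumed clause analysis of "I know that she left when it rained and he stayed".
EXAMPLE = [
    Clause("A", None, 1, [0, 1]),       # I know
    Clause("B", "A", 4, [2, 3, 4]),     # that she left
    Clause("C", "B", 7, [5, 6, 7]),     # when it rained
    Clause("D", None, 10, [8, 9, 10]),  # and he stayed
]

for row in annotate(EXAMPLE):
    print(row)
```

In this example the tokens of clause C carry the path `C>B>A`, so the intermediate clause B is visible in the identifier, and tokens of the second main clause D receive the numeral 2.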

21. A computer-implemented method comprising:
- receiving a clause request, at a clause analyzer, to locate clauses within text data representing information in a human language, wherein the text data comprises a plurality of clauses of a sentence portion;
  
  generating, responsive to the clause request, a dependency request to a dependency parser to determine dependency information for a plurality of tokens of the plurality of clauses of the sentence portion, wherein each token of the plurality of tokens comprises one or more characters of the text data, and the dependency information indicates a dependency of a respective token of the sentence portion on at least one other token of the sentence portion;
  
  receiving, responsive to the dependency request, token information in a token data set, wherein the token information comprises one or more token identifiers of identified tokens in the sentence portion of the text data, and wherein the token information comprises dependency information indicating a dependency of a respective token of the identified tokens on at least one other token of the sentence portion;
  
  determining a location for each of the plurality of clauses of the sentence portion in a hierarchy of clauses by associating to each clause in the sentence portion:
  
  one of a plurality of levels of the hierarchy of clauses; and
  
  any respective connection to a clause associated to a different level of the plurality of levels;
  
  generating a new data set based on the token data set and the hierarchy of clauses, wherein the new data set comprises information representing the token information, and the new data set comprises one or more location identifiers indicating a location of a respective token of the token information according to a location, in the hierarchy of clauses, of a clause comprising the respective token of the token information; and
  
  outputting the new data set.
  - 22. The computer-implemented method of claim 21, further comprising:
    - detecting that a clause of the plurality of clauses of the sentence portion is an interrupted clause with all or a portion of words of an embedded clause of the plurality of clauses between words of the interrupted clause in the text data; and
      
      determining a location for the embedded clause and the interrupted clause in the hierarchy of clauses by associating different levels of the plurality of levels of the hierarchy of clauses to the embedded clause and the interrupted clause.
  - 23. The computer-implemented method of claim 21, wherein the token information comprises a same indication of a coordination token for any type of coordinating conjunction word in the sentence portion;
    - and the method further comprises:
      
      classifying at most one token in each respective clause of the plurality of clauses as a head of the respective clause; and
      
      classifying a given token as a head of a first clause in the hierarchy of clauses when a given coordination token indicates a relationship between the first clause and a second clause and the second clause contains a token identified as the head of the second clause.
  - 24. The computer-implemented method of claim 21, wherein the method further comprises obtaining a dependency tree of the sentence portion based on the dependency information, wherein the dependency tree comprises a sentence head that is a main predicate in the sentence portion and the dependency tree indicates a syntactic relationship of each token in the sentence portion from the sentence head;
    - and wherein the associating the any respective connection comprises identifying a main clause of the hierarchy of clauses, using the dependency tree, and a connection between the main clause and another clause in the sentence portion at an adjacent level.
  - 25. The computer-implemented method of claim 21, wherein the generating the new data set comprises appending the one or more location identifiers to the token data set.
  - 26. The computer-implemented method of claim 21, wherein the token information further comprises in addition to the dependency information:
    - morphological information of the respective token, wherein the morphological information comprises a root form of a token identified in the sentence portion of the text data; and
      
      part of speech information of the respective token.
  - 27. The computer-implemented method of claim 21, wherein the generating the new data set comprises generating, for each token represented in the token information, a main clause identifier indicating whether a clause comprising a respective token is a main clause or a subordinate clause.
  - 28. The computer-implemented method of claim 21, wherein the method further comprises:
    - determining a first location of a first clause of the plurality of clauses of the sentence portion by associating a first level of the hierarchy of clauses to the first clause;
      
      determining a second location of a second clause of the plurality of clauses of the sentence portion by associating a second level of the hierarchy of clauses to the second clause, wherein the second level is different than the first level;
      
      determining a path from the determined first location of the first clause in the hierarchy of clauses to the determined second location of the second clause in the hierarchy of clauses; and
      
      wherein the generating the new data set comprises generating, for respective tokens of the first clause, a clause hierarchy identifier indicating the determined path from the first location of the first clause to the second location of the second clause.
  - 29. The computer-implemented method of claim 21, wherein the method further comprises classifying each token in each respective clause of the plurality of clauses as a head of the respective clause or a member of the respective clause;
    - and wherein the generating the new data set comprises generating, for each token represented in the token information, a clause head indicator indicating whether a respective token is classified by the clause analyzer as a head of the respective clause or as a non-head member of the respective clause.
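
The interrupted and embedded clause detection of claims 2 and 22 can be illustrated with a small positional check: a clause is treated as interrupted when tokens of another clause sit between its first and last token in text order. This is a hypothetical sketch, not the claimed method; it assumes clause membership per token is already known, and the function names and the "depth equals number of enclosing interrupted clauses" rule are assumptions made for the example.

```python
def find_interruptions(clause_of):
    """clause_of[i] is the clause label of the i-th token in text order.

    Returns sorted (interrupted, embedded) pairs: the embedded clause has at
    least one token strictly between the first and last token of the
    interrupted clause.
    """
    first, last = {}, {}
    for i, c in enumerate(clause_of):
        first.setdefault(c, i)
        last[c] = i
    pairs = set()
    for i, c in enumerate(clause_of):
        for outer in first:
            if outer != c and first[outer] < i < last[outer]:
                pairs.add((outer, c))
    return sorted(pairs)

def embedding_depth(clause_of):
    """Give each clause a level equal to the number of clauses it is embedded in."""
    depth = {c: 0 for c in clause_of}
    for _, inner in find_interruptions(clause_of):
        depth[inner] += 1
    return depth

# "The man who called left": main clause M is interrupted by relative clause R.
labels = ["M", "M", "R", "R", "M"]
print(find_interruptions(labels))
print(embedding_depth(labels))
```

The interrupted clause and the embedded clause end up on different levels, which is the outcome the claim language describes; a clause that merely follows another one (for example `["A", "A", "B", "B"]`) produces no pair.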

30. A clause analyzer comprising processor and memory, the memory containing instructions executable by the processor wherein the clause analyzer is configured to:
- receive a clause request, at the clause analyzer, to locate clauses within text data representing information in a human language, wherein the text data comprises a plurality of clauses of a sentence portion;
  
  generate, responsive to the clause request, a dependency request to a dependency parser to determine dependency information for a plurality of tokens of the plurality of clauses of the sentence portion, wherein each token of the plurality of tokens comprises one or more characters of the text data, and the dependency information indicates a dependency of a respective token of the sentence portion on at least one other token of the sentence portion;
  
  receive, responsive to the dependency request, token information in a token data set, wherein the token information comprises one or more token identifiers of identified tokens in the sentence portion of the text data, and wherein the token information comprises dependency information indicating a dependency of a respective token of the identified tokens on at least one other token of the sentence portion;
  
  determine a location for each of the plurality of clauses of the sentence portion in a hierarchy of clauses by associating to each clause in the sentence portion:
  
  one of a plurality of levels of the hierarchy of clauses; and
  
  any respective connection to a clause associated to a different level of the plurality of levels;
  
  generate a new data set based on the token data set and the hierarchy of clauses, wherein the new data set comprises information representing the token information, and the new data set comprises one or more location identifiers indicating a location of a respective token of the token information according to a location, in the hierarchy of clauses, of a clause comprising the respective token of the token information; and
  
  output the new data set.

Current Assignee: SAS Institute Incorporated
Original Assignee: SAS Institute Incorporated
Inventors: Jade, Teresa S.; Chiang, Wei-shan; Arthur, Aaron Douglas; Lee, Seng; Yang, Qin; Yang, Xu
Primary Examiner(s): Singh, Satwant K
Application Number: US16/380,353
Time in Patent Office: 209 Days
Field of Search: None
CPC Class Codes:
G06F 40/205   Parsing
G06F 40/211   Syntactic parsing, e.g. bas...
G06F 40/284   Lexical analysis, e.g. toke...
G06F 40/30   Semantic analysis
