Semantic textual analysis

US 10,296,584 B2
Filed: 01/27/2011
Issued: 05/21/2019
Est. Priority Date: 01/29/2010
Status: Active Grant



Abstract

A method of comparing the semantic similarity of two different text phrases in which the grammatical structure of the two different text phrases is analyzed and a keyword set for each of the different text phrases is derived. The semantic similarity of the phrases can be determined in accordance with the grammatical structure of the two different text phrases and the contents of the two keyword sets.
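As a rough illustration of how a grammatical-structure score and a keyword-set score might be combined into one similarity value, consider the Python sketch below. The record does not give a formula, so everything here is an assumption made for illustration: the weighted average, the use of `difflib.SequenceMatcher` over hand-written part-of-speech tag sequences, the Jaccard overlap for keyword sets, and all function and parameter names.

```python
from difflib import SequenceMatcher


def structure_similarity(tags_a, tags_b):
    """Similarity of two part-of-speech tag sequences, in [0, 1].

    Assumption: tag-sequence matching stands in for a comparison of
    grammatical structure.
    """
    return SequenceMatcher(None, tags_a, tags_b).ratio()


def keyword_similarity(keys_a, keys_b):
    """Jaccard overlap of two keyword sets, in [0, 1]."""
    union = keys_a | keys_b
    if not union:
        return 1.0  # two empty sets are treated as identical
    return len(keys_a & keys_b) / len(union)


def combined_similarity(tags_a, tags_b, keys_a, keys_b, structure_weight=0.5):
    """Weighted average of the two scores (hypothetical aggregation)."""
    s = structure_similarity(tags_a, tags_b)
    k = keyword_similarity(keys_a, keys_b)
    return structure_weight * s + (1.0 - structure_weight) * k


# Hand-written tags for "How do I reset the router" and
# "How do I restart the router" (illustrative only).
tags = ["WRB", "VBP", "PRP", "VB", "DT", "NN"]
score = combined_similarity(tags, tags, {"reset", "router"}, {"restart", "router"})
print(round(score, 3))
```

With identical tag sequences the structure score is 1.0, and the keyword sets share one of three distinct keywords, so the two parts pull the combined value in different directions. A real system would need a parser or tagger to produce the tag sequences.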

13 Citations


20 Claims

1. A computer-implemented natural language processing method of determining a degree of semantic similarity between a first unstructured natural language text phrase and one or more second unstructured natural language text phrases, the first text phrase representing a question being asked by a user as received via a user interface, each said second text phrase representing a respective answered question, the method comprising:
- (a) analyzing the grammatical structure of the first unstructured natural language text phrase and each of the second unstructured natural language text phrases;
  
  (b) transforming the first unstructured natural language text phrase into a first keyword set by executing a first set of predefined program logic sequences on the first unstructured natural language text phrase;
  
  (c) transforming each said second unstructured natural language text phrase into a respective second keyword set by executing a second set of the predefined program logic sequences on each said second unstructured natural language text phrase;
  
  (d) calculating, automatically and programmatically, a passage semantic similarity measure (PSSM) between the first text phrase and each of the second text phrases by selectively aggregating outputs from the execution of the first and second sets of predefined program logic sequences, and based on (I) the similarities between the grammatical structure of the first text phrase and the respective second text phrase, and (II) the similarities between the first keyword set and the respective second keyword set, wherein PSSM calculations are indicative of degrees of semantic similarity between two different phrases despite lexical differences between those two different phrases;
  
  (e) based on the calculated PSSM(s), matching the first text phrase with at least one of the one or more second text phrases; and
  
  (f) responding to the question being asked by the user via the user interface as represented by first text phrase, with an answer associated with the at least one matching second text phrase.
- Dependent Claims (2, 3, 4, 5, 6, 7, 10)
  - 2. The method according to claim 1, further comprising expanding any contractions detected in (a) using at least one of the predefined program logic sequences.
  - 3. The method according to claim 1, further comprising, for each idiomatic expression detected in (a), identifying one or more alternative expressions with a meaning similar to the respective detected idiomatic expression, using at least one of the predefined program logic sequences.
  - 4. The method according to claim 1, further comprising, as a part of (a), parsing the grammatical structure of each text phrase and inserting part of speech tags in accordance with results of the parsing, using at least one of the predefined program logic sequences.
  - 5. The method according to claim 1, further comprising, in (b) and (c), generating the first and second keyword sets by removing one or more stopwords from the respective text phrases, using at least one of the predefined program logic sequences.
  - 6. The method according to claim 1, further comprising, in (b) and (c), generating the first and second keyword sets by stemming words comprising the respective text phrases, using at least one of the predefined program logic sequences.
  - 7. The method according to claim 1, further comprising, in (b) and (c), generating the first and second keyword sets by extracting one or more keywords from the respective text phrases, using at least one of the predefined program logic sequences.
  - 10. The method according to claim 1, wherein the semantic similarity between the first and second text phrases calculated in (d) is used to provide one or more answers in response to a received question.
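The keyword-set steps named in claims 2, 5, 6 and 7 (contraction expansion, stopword removal, stemming, keyword extraction) and the question-matching steps (e) and (f) can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the patented method: the contraction table, the stopword list, the suffix-stripping stemmer and all function names are hypothetical, and the score uses keyword overlap only, leaving out the grammatical-structure component that step (d) also requires.

```python
import re

# Tiny illustrative tables; real systems would use much larger resources.
CONTRACTIONS = {"what's": "what is", "can't": "cannot", "don't": "do not"}
STOPWORDS = {"a", "an", "the", "is", "are", "do", "how", "what",
             "i", "my", "to", "of", "not", "cannot"}
SUFFIXES = ("ing", "ed", "es", "s")


def expand_contractions(text):
    """Replace known contractions with their expanded forms."""
    return " ".join(CONTRACTIONS.get(w, w) for w in text.lower().split())


def crude_stem(word):
    """Strip one common suffix (a crude stand-in for a real stemmer)."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def keyword_set(text):
    """Expand contractions, drop stopwords, and stem what remains."""
    tokens = re.findall(r"[a-z0-9]+", expand_contractions(text))
    return {crude_stem(t) for t in tokens if t not in STOPWORDS}


def best_answer(question, answered):
    """Return (answer, score) for the closest answered question.

    `answered` maps each answered question to its answer and must be
    non-empty. The score is the Jaccard overlap of the keyword sets.
    """
    q_keys = keyword_set(question)

    def score(candidate):
        c_keys = keyword_set(candidate)
        union = q_keys | c_keys
        return len(q_keys & c_keys) / len(union) if union else 0.0

    best = max(answered, key=score)
    return answered[best], score(best)


faq = {
    "How do I restart my router?": "Hold the power button for ten seconds.",
    "What is my billing date?": "Billing dates are shown on your invoice.",
}
print(best_answer("What's the way of restarting routers?", faq))
```

The user question and the first answered question share no surface form for "restarting routers" versus "restart my router", yet stemming maps both to the keywords `restart` and `router`, which is why the router answer is selected.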

8. A computer-implemented natural language processing method of determining a degree of semantic similarity between a first unstructured natural language text phrase and a second unstructured natural language text phrase, the method comprising:
- (a) analyzing the grammatical structure of the first unstructured natural language text phrase and the second unstructured natural language text phrase;
  
  (b) transforming the first unstructured natural language text phrase into a first keyword set by executing a first set of predefined program logic sequences on the first unstructured natural language text phrase;
  
  (c) transforming the second unstructured natural language text phrase into a second keyword set by executing a second set of the predefined program logic sequences on the second unstructured natural language text phrase;
  
  (d) calculating, automatically and programmatically, a passage semantic similarity measure (PSSM) between the first text phrase and the second text phrase by selectively aggregating outputs from the execution of the first and second sets of predefined program logic sequences, and based on (I) the similarities between the grammatical structure of the first text phrase and the second text phrase, and (II) the similarities between the first keyword set and the second keyword set, wherein PSSM calculations are indicative of degrees of semantic similarity between two different phrases despite lexical differences between those two different phrases;
  
  (e) based on the calculated PSSM, determining the similarity of a first document including the first unstructured natural language text phrase, to a second document including the second unstructured natural language text phrase; and
  
  (f) generating a response to a user input query involving the determination of the degree of semantic similarity between the first unstructured natural language text phrase and the second unstructured natural language text phrase, based on the determined similarity of the first document to the second document.
- Dependent Claims (9)
  - 9. The method according to claim 8, further comprising retrieving one or more documents in accordance with the semantic similarity between the first and second text phrases calculated in (d) and based on the determination in (e).
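Claims 8 and 9 move from phrase-level similarity to document-level similarity and retrieval. One way this could look is sketched below in Python. The aggregation rule (averaging each phrase's best match in the other document), the `word_overlap` stand-in for a phrase score, and every name in the block are assumptions made for illustration; the claims do not specify how phrase scores are lifted to documents.

```python
def word_overlap(phrase_a, phrase_b):
    """Stand-in phrase score: Jaccard overlap of lowercase word sets."""
    words_a = set(phrase_a.lower().split())
    words_b = set(phrase_b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def document_similarity(doc_a, doc_b, phrase_similarity):
    """Average, over phrases in doc_a, of the best match found in doc_b.

    Each document is a list of phrases. Note that this hypothetical
    measure is not symmetric in its two arguments.
    """
    if not doc_a or not doc_b:
        return 0.0
    best = [max(phrase_similarity(p, q) for q in doc_b) for p in doc_a]
    return sum(best) / len(best)


def retrieve(query_doc, corpus, phrase_similarity, top_k=1):
    """Return the names of the top_k corpus documents closest to query_doc."""
    ranked = sorted(
        corpus,
        key=lambda name: document_similarity(query_doc, corpus[name], phrase_similarity),
        reverse=True,
    )
    return ranked[:top_k]


corpus = {
    "router_guide": ["restart the router", "check the cable"],
    "billing_faq": ["view your invoice", "change payment date"],
}
query = ["restart the router now"]
print(retrieve(query, corpus, word_overlap))
```

Any phrase-level score with the same two-argument shape could be passed in place of `word_overlap`, including one that also accounts for grammatical structure.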

11. An apparatus comprising a central processing unit, volatile data storage and non-volatile data storage, the central processing unit being configured to control the apparatus to perform a natural language processing program to determine a degree of semantic similarity between a first unstructured natural language text phrase and one or more second unstructured natural language text phrases in which the first text phrase represents a question being asked by a user as received via a user interface and in which each said second text phrase represents a respective answered question, by at least:
- (a) analyzing the grammatical structure of the first unstructured natural language text phrase and each of the second unstructured natural language text phrases;
  
  (b) transforming the first unstructured natural language text phrase into a first keyword set by executing a first set of predefined program logic sequences on the first unstructured natural language text phrase;
  
  (c) transforming each said second text phrase into a respective second keyword set by executing a second set of the predefined program logic sequences on each said second unstructured natural language text phrase;
  
  (d) calculating, automatically and programmatically, a passage semantic similarity measure (PSSM) between the first text phrase and each of the second text phrases by selectively aggregating outputs from the execution of the first and second sets of predefined program logic sequences, and based on (I) the similarities between the grammatical structure of the first text phrase and the second respective text phrase, and (II) the similarities between the first keyword set and the respective second keyword set, wherein PSSM calculations are indicative of degrees of semantic similarity between two different phrases despite lexical differences between those two different phrases;
  
  (e) based on the calculated PSSM(s), matching the first text phrase with at least one of the one or more second text phrases; and
  
  (f) responding to the question being asked by the user via the user interface as represented by first text phrase, with an answer associated with the at least one matching second text phrase.
- Dependent Claims (13, 14, 15, 16, 17, 18, 19, 20)
  - 13. The apparatus according to claim 11, wherein at least one of the predefined program logic sequences is configured to expand any contractions detected in (a).
  - 14. The apparatus according to claim 11, wherein at least one of the predefined program logic sequences is configured to identify, for each idiomatic expression detected in (a), one or more alternative expressions with a similar meaning.
  - 15. The apparatus according to claim 11, wherein at least one of the predefined program logic sequences is configured, as a part of (a), to parse the grammatical structure of each text phrase and insert part of speech tags in accordance with results of the parsing.
  - 16. The apparatus according to claim 11, wherein at least one of the predefined program logic sequences is configured, in (b) and (c) to generate said first and second keyword sets by removing one or more stopwords from the respective text phrases.
  - 17. The apparatus according to claim 11, wherein at least one of the predefined program logic sequences is configured, in (b) and (c), to generate said first and second keyword sets by stemming words comprising the respective text phrases.
  - 18. The apparatus according to claim 11, wherein at least one of the predefined program logic sequences is configured, in (b) and (c), to generate said first and second keyword sets by extracting one or more keywords from the respective text phrases.
  - 19. The apparatus according to claim 11, wherein the semantic similarity between the first and second text phrases calculated in (d) is used to determine the similarity of a first document to a second document.
  - 20. The apparatus according to claim 11, wherein the semantic similarity between the first and second text phrases calculated in (d) is used to provide one or more answers in response to a received question.

12. A non-transitory storage medium storing computer executable code executable by a computer to perform natural language processing functionality to determine a degree of semantic similarity between a first unstructured natural language text phrase and one or more second unstructured natural language text phrases, the first text phrase representing a question being asked by a user as received via a user interface, each said second text phrase representing a respective answered question, the functionality comprising:
- (a) analyzing the grammatical structure of the first unstructured natural language text phrase and each of the second unstructured natural language text phrases;
  
  (b) transforming the first unstructured natural language text phrase into a first keyword set by executing a first set of predefined program logic sequences on the first unstructured natural language text phrase;
  
  (c) transforming each said second unstructured natural language text phrase into a respective second keyword set by executing a second set of the predefined program logic sequences on each said second unstructured natural language text phrase;
  
  (d) calculating, automatically and programmatically, a passage semantic similarity measure (PSSM) between the first text phrase and each of the second text phrases by selectively aggregating outputs from the execution of the first and second sets of predefined program logic sequences, and based on (I) the similarities between the grammatical structure of the first text phrase and the respective second text phrase, and (II) the similarities between the first keyword set and the respective second keyword set, wherein PSSM calculations are indicative of degrees of semantic similarity between two different phrases despite lexical differences between those two different phrases;
  
  (e) based on the calculated PSSM(s), matching the first text phrase with at least one of the one or more second text phrases; and
  
  (f) responding to the question being asked by the user via the user interface as represented by first text phrase, with an answer associated with the at least one matching second text phrase.


Current Assignee: British Telecommunications (BT Group Plc) (BT Group PLC)
Original Assignee: British Telecommunications PLC (BT Group PLC)
Inventors: Ducatel, Gery M; Thompson, Simon G; Thint, Marcus
Primary Examiner(s): Opsasnick, Michael N
Application Number: US13/576,076
Publication Number: US 20120303358A1
Time in Patent Office: 3,036 Days
Field of Search: 704275
US Class Current
CPC Class Codes

G06F 16/30   of unstructured textual dat...

G06F 40/194   Calculation of difference b...

G06F 40/253   Grammatical analysis; Style...

G06F 40/30   Semantic analysis
