Systems and methods for identifying similarities using unstructured text analysis

US 10,176,251 B2
Filed: 08/31/2015
Issued: 01/08/2019
Est. Priority Date: 08/31/2015
Status: Active Grant

Abstract

Generally discussed herein are systems, devices, and methods for unstructured text analysis. A method can include deconstructing structured data to create unstructured text, creating a first word cloud using the unstructured text, creating a query based on the first word cloud, receiving data corresponding to contents of a specified number of records determined to include data most similar to the first word cloud in a database of records, creating a second word cloud for each of the specified number of records using the data from the specified number of records, determining similarity values indicating how similar the first word cloud is to each of the second word clouds, and providing a similarity indicator for each record of the specified number of records to a user interface, the similarity indicator representing a relative magnitude of the determined similarity values of the specified number of records.
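
The abstract does not say how a word cloud is represented or which similarity measure produces the similarity values. The Python sketch below makes three assumptions:

- A word cloud is a term-frequency map (word to count).
- The similarity value is the cosine similarity of two such maps.
- Each record's similarity indicator is its similarity divided by the best similarity among the returned records.

The stop-word list and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the patent does not specify one.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "on", "is", "was"}


def word_cloud(text: str) -> Counter:
    """Build a word cloud as a term-frequency map (word -> count)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-frequency vectors (0.0 to 1.0)."""
    dot = sum(count * b[word] for word, count in a.items())  # missing words count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_indicators(query_text: str, records: dict[str, str]) -> dict[str, float]:
    """Compare the first word cloud with one second word cloud per record,
    then scale so the most similar record gets 1.0 (a relative magnitude)."""
    first = word_cloud(query_text)
    sims = {rid: cosine_similarity(first, word_cloud(body)) for rid, body in records.items()}
    best = max(sims.values(), default=0.0)
    return {rid: (s / best if best > 0 else 0.0) for rid, s in sims.items()}
```

Dividing by the maximum is only one way to express a "relative magnitude". A rank or a bar length scaled the same way would satisfy the same description.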

76 Citations

18 Claims

1. A system to perform unstructured text analysis comprising:
- processing circuitry communicatively coupled to a memory, the memory including instructions stored thereon which, when executed by processing circuitry, cause the processing circuitry to implement modules comprising:
  
  a user interface module to receive, through a user interface, structured data and deconstruct the structured data to create unstructured text, the structured data comprising user responses input into respective predefined prompts on the user interface and metadata for each response of the user responses, the unstructured text comprising only text corresponding to the user responses input into the respective predefined prompts;
  
  an application server module to receive the unstructured text, create a first word cloud using only the unstructured text, create a query based on the first word cloud, and create a boosted query by increasing a weight of a word in the query based on the metadata;
  
  a search platform module to execute the boosted query on a database, analyze a specified number of records returned from executing the boosted query, and provide data from the specified number of records determined to include data most similar to the first word cloud based on the boosted query; and
  
  wherein the application server module is further to create a plurality of second word clouds, one second word cloud for each of the specified number of records, determine a similarity value indicating how similar the first word cloud is to each of the second word clouds, and provide a similarity indicator to the user interface that indicates how similar a particular record of the specified number of records is to the text received at the user interface based on the determined similarity value.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
  - 2. The system of claim 1, wherein the user interface module provides a user a view of a first record to be populated using a user's respective responses to the predefined prompts, the unstructured text consists of one or more entered respective responses, and the records include records previously populated using the user interface module.
  - 3. The system of claim 2, wherein:
    - the user interface module is to deconstruct the structured data in response to the user completing a response to a predefined prompt of the respective predefined prompts and provide the unstructured text in response to deconstructing the structured data;
      
      the application server module is further to update the first word cloud based on the user response to the predefined prompt and update the query based on the updated first word cloud;
      
      the search platform module is to execute the updated query and provide first data from second records in the database determined to include second data most similar to the updated first word cloud based on the executed updated query; and
      
      the application server module is further to create updated second word clouds using the first data from the second records, determine a similarity value indicating how similar the updated first word cloud is to each of the updated second word clouds, and provide a similarity indicator to the user interface that indicates how similar a particular record of the second records is to the data received at the user interface module.
  - 4. The system of claim 3, wherein the structured data includes data corresponding to user responses to a plurality of the predefined prompts and the unstructured text includes only data corresponding to the user's responses to the plurality of prompts, wherein the predefined prompts include a resolution of an issue and one or more of a description of a product defect, a manufacturing error, a part defect, and a customer complaint.
  - 5. The system of claim 4, wherein the records previously filled out through the user interface module include records of the same type as the first record and records of a different type as the first record.
  - 6. The system of claim 1, wherein the first and second word clouds include a term frequency tag for each word of a subset of the words in the unstructured text that indicates the number of times the word appears in the unstructured text, a value of a weight of a word is higher for a word that appears less frequently than for a word that appears more frequently.
  - 7. The system of claim 6, wherein the search platform module is to determine a score for each record and the specified number of records include the records determined to include highest relative scores.
  - 8. The system of claim 7, wherein the search platform module is to determine the score for each record based on an inverse document frequency and one or more of a coordination factor, a length normalization, and a query clause boost factor provided by the user.
  - 9. The system of claim 8, wherein the search platform module is to set the query clause boost factor associated with a word in the query based on the value of the term frequency tag relative to values of term frequency tags associated with other words in the first word cloud.
  - 10. The system of claim 1, wherein the structured data corresponds to Hypertext Markup Language (HTML) data from the user interface module and the unstructured data includes text of an HTML element without one or more associated HTML tags.
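
Claim 10 describes the unstructured text as the text of an HTML element with its tags removed. Below is a minimal sketch using Python's standard `html.parser` module. It rests on two hypothetical conventions:

- Each user response sits in a `<textarea>` element.
- Each response's metadata travels as `data-*` attributes on that element.

Prompt labels and markup are discarded, so only response text remains, as claim 1 requires.

```python
from html.parser import HTMLParser


class ResponseExtractor(HTMLParser):
    """Collect text typed into <textarea> elements plus their data-* attributes.

    Everything outside a <textarea> (prompt labels, other markup) is ignored.
    """

    def __init__(self) -> None:
        super().__init__()
        self.responses: list[str] = []
        self.metadata: list[dict[str, str]] = []
        self._in_textarea = False
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "textarea":
            self._in_textarea = True
            self._buffer = []
            # Hypothetical convention: metadata is carried in data-* attributes.
            self.metadata.append({k[5:]: (v or "") for k, v in attrs if k.startswith("data-")})

    def handle_data(self, data):
        if self._in_textarea:
            self._buffer.append(data)  # text may arrive in several chunks

    def handle_endtag(self, tag):
        if tag == "textarea" and self._in_textarea:
            self.responses.append("".join(self._buffer).strip())
            self._in_textarea = False


def deconstruct(html: str) -> tuple[str, list[dict[str, str]]]:
    """Return (unstructured response text, per-response metadata)."""
    parser = ResponseExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(r for r in parser.responses if r), parser.metadata
```

The metadata list stays aligned with the order of the responses. A later step can therefore use it to boost query words, as in claims 1, 11, and 15.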

11. A method for unstructured text analysis comprising:
- deconstructing structured data provided through a user interface to create unstructured text, the structured data comprising user responses input into respective predefined prompts on the user interface and metadata for each response of the user responses, the unstructured text comprising only text corresponding to the user responses input into the respective predefined prompts;
  
  creating, using an application server, a first word cloud using only the unstructured text;
  
  creating, using the application server, a query based on the first word cloud;
  
  creating a boosted query by increasing a weight of a word in the query based on the metadata;
  
  receiving, in response to issuing the boosted query, data corresponding to contents of a specified number of records determined to include data most similar to the first word cloud in a database of records, the database of records including records of product defects, customer complaints, and part defects;
  
  creating a second word cloud for each of the specified number of records using the data from the specified number of records;
  
  determining similarity values indicating how similar the first word cloud is to each of the second word clouds; and
  
  providing a similarity indicator for each record of the specified number of records to a user interface, the similarity indicator representing a relative magnitude of the determined similarity values of the specified number of records.
- Dependent Claims (12, 13, 14)
  - 12. The method of claim 11, further comprising receiving, at a user interface communicatively coupled to the application server, a response to a predefined prompt in response to a user completing the response, wherein the unstructured text includes text representative of the response:
    - updating the first word cloud based on the received response;
      
      updating the query based on the updated first word cloud;
      
      executing the updated query;
      
      providing data from second records in the database determined to include data most similar to the updated first word cloud based on the executed updated query;
      
      creating updated second word clouds for each of the second records using the contents of the second records;
      
      determining a similarity value indicating how similar the updated first word cloud is to the updated second word cloud; and
      
      providing updated similarity indicators to the user interface that indicates how similar a particular record of the second records is to the text received at the user interface.
  - 13. The method of claim 11, further comprising determining a score for each of the records returned from the boosted query and providing the specified number of records determined to include highest relative scores.
  - 14. The method of claim 13, further comprising setting a query clause boost factor associated with a word in the query based on a relative value of a term frequency tag to values of term frequency tags associated with other words in the first word cloud, a value of the query clause boost factor is higher for a word that appears less frequently than for a word that appears more frequently.
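
Claims 7, 8, and 13 score each returned record and keep the records with the highest relative scores. Claim 8 names four factors: inverse document frequency, a coordination factor, length normalization, and a query clause boost. The same four appear in Apache Lucene's classic TF-IDF scoring.

The sketch below is modeled loosely on that formula and omits Lucene's query normalization. It is an illustration under those assumptions, not the patented search platform. Records are assumed to be already tokenized into term-frequency maps, and the function names are hypothetical.

```python
import math
from collections import Counter


def idf(term: str, docs: list[Counter]) -> float:
    """Lucene-classic style inverse document frequency: 1 + ln(N / (df + 1))."""
    df = sum(1 for d in docs if term in d)
    return 1.0 + math.log(len(docs) / (df + 1))


def score(query_boosts: dict[str, float], doc: Counter, docs: list[Counter]) -> float:
    """Score one record as coord * sum(tf * idf^2 * boost * length_norm)."""
    matched = [t for t in query_boosts if t in doc]
    if not matched:
        return 0.0
    coord = len(matched) / len(query_boosts)          # coordination factor
    length_norm = 1.0 / math.sqrt(sum(doc.values()))  # shorter records weigh more
    total = 0.0
    for t in matched:
        tf = math.sqrt(doc[t])                        # dampened term frequency
        total += tf * idf(t, docs) ** 2 * query_boosts[t] * length_norm
    return coord * total


def top_records(query_boosts: dict[str, float], records: dict[str, Counter], n: int) -> list[tuple[str, float]]:
    """Return the n (record id, score) pairs with the highest scores."""
    docs = list(records.values())
    scored = [(rid, score(query_boosts, d, docs)) for rid, d in records.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]
```

The coordination factor rewards records that match more of the query words, so a record that matches only one word ranks low even if that word is rare.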

15. A machine-readable storage device including instructions stored thereon which, when executed by a machine, cause the machine to perform operations for unstructured text analysis, the operations comprising:
- creating a first word cloud using only unstructured text of structured data provided by a user through a user interface, the structured data comprising user responses input into respective predefined prompts on the user interface and metadata for each response of the user responses, the unstructured text comprising only text corresponding to the user responses input into the respective predefined prompts;
  
  creating a query based on the first word cloud;
  
  creating a boosted query by increasing a weight of a word in the query based on the metadata;
  
  receiving, in response to issuing the boosted query, data corresponding to the contents of a specified number of records determined to include data most similar to the first word cloud in a database of records;
  
  creating second word clouds using the data from the specified number of records;
  
  determining similarity values indicating how similar the first word cloud is to each of the second word clouds; and
  
  providing a similarity indicator for each record of the specified number of records to a user interface, the similarity indicator representing a relative magnitude of the determined similarity values of the specified number of records.
- Dependent Claims (16, 17, 18)
  - 16. The machine-readable storage device of claim 15, further comprising instructions which, when executed by the machine, cause the machine to perform operations comprising receiving a response to a predefined prompt of the predefined prompts in response to a user completing the response:
    - updating the first word cloud based on the received response;
      
      updating the query based on the updated first word cloud;
      
      executing the updated query;
      
      receiving data from second records in the database determined to include data most similar to the updated first word cloud based on the executed updated query;
      
      creating an updated second word cloud using the data from the second records;
      
      determining a similarity value indicating how similar the updated first word cloud is to the updated second word cloud; and
      
      providing an updated similarity indicator to the user interface that indicates how similar a particular record of the second records is to the text received at the user interface.
  - 17. The machine-readable storage device of claim 15, further comprising instructions stored thereon which, when executed by the machine, cause the machine to perform operations comprising determining a score for each record returned from the boosted query and providing the specified number of records determined to include highest relative scores.
  - 18. The machine-readable storage device of claim 17, further comprising instructions stored thereon which, when executed by the machine, cause the machine to perform operations comprising setting a query clause boost factor associated with a word in the query based on a value of a term frequency tag of the word in the first word cloud relative to values of term frequency tags associated with other words in the first word cloud, a value of the query clause boost factor is higher for a word that appears less frequently than for a word that appears more frequently.
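
Claims 15 and 18 (and claim 14 for the method) describe a boosted query with two properties:

- Words that appear less often in the first word cloud receive larger boost factors.
- The metadata can increase a word's weight.

The claims give no formulas, so the sketch below makes two assumptions:

- The base boost is the highest term frequency in the cloud divided by the word's own term frequency.
- A word that also appears in any metadata value is multiplied by a hypothetical fixed `metadata_boost`.

The result is rendered in Lucene's `term^boost` query-parser syntax.

```python
from collections import Counter


def boosted_query(first_cloud: Counter, metadata: dict[str, str], metadata_boost: float = 2.0) -> dict[str, float]:
    """Map each word in the first word cloud to a query clause boost factor.

    Assumptions (not specified in the claims):
      * base boost = highest term frequency / this word's term frequency,
        so rarer words in the cloud get larger boosts;
      * a word that also appears in a metadata value is multiplied by
        metadata_boost (one possible reading of "based on the metadata").
    """
    if not first_cloud:
        return {}
    max_tf = max(first_cloud.values())
    meta_words = {w for v in metadata.values() for w in v.lower().split()}
    boosts = {}
    for word, tf in first_cloud.items():
        boost = max_tf / tf
        if word in meta_words:
            boost *= metadata_boost
        boosts[word] = boost
    return boosts


def to_lucene_query(boosts: dict[str, float]) -> str:
    """Render boosts as Lucene query-parser syntax, e.g. 'housing^1.0 pump^4.0'."""
    return " ".join(f"{word}^{boost:.1f}" for word, boost in sorted(boosts.items()))
```

Because the query is rebuilt from the current word cloud on each call, the same functions can be rerun whenever a user completes another response, as in the update steps of claims 12 and 16.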

Current Assignee: Raytheon Company (RTX Corporation)
Original Assignee: Raytheon Company (RTX Corporation)
Inventors: Boule, Blaine K.; Barrett, Nicholas Wayne
Primary Examiner(s): Cao, Phuong Thao

Application Number: US14/840,192
Publication Number: US 20170060995A1
Time in Patent Office: 1,226 Days
Field of Search: 707728

CPC Class Codes:
G06F 16/33   Querying
G06F 16/3344   using natural language anal...
G06F 16/338   Presentation of query results
