Customizing information by combining pair of annotations from at least two different documents

US 8,977,953 B1
Filed: 01/26/2007
Issued: 03/10/2015
Est. Priority Date: 01/27/2006
Status: Active Grant

Abstract

A system and method for obtaining information embedded in unstructured text are provided. The method comprises generating computer-readable annotations based on the unstructured text, at least one of the computer-readable annotations comprising an indication of a linguistic feature. A pair of the computer-readable annotations may be used to generate at least one computer-readable relation between the pair. The annotations and/or relations may be stored as characteristic data structures in a database. A query comprising at least one criterion may be received. In response to the query, an information result may be generated based on at least one of the characteristic data structures stored in the database.
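
The flow described in the abstract can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions, not the patented implementation: the names `Annotation`, `Relation`, `CharacteristicStore`, `annotate`, and `relate`, the `PROPER_NOUN` label, and the capitalized-token heuristic are all hypothetical, and an in-memory list stands in for the database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    doc_id: str   # which electronic document the text came from
    text: str     # the annotated span
    feature: str  # indication of a linguistic feature (hypothetical label)


@dataclass(frozen=True)
class Relation:
    kind: str
    first: Annotation
    second: Annotation


def annotate(doc_id, text):
    """Toy annotator (assumption): tag capitalized tokens as PROPER_NOUN."""
    tokens = (tok.strip(".,;") for tok in text.split())
    return [Annotation(doc_id, tok, "PROPER_NOUN") for tok in tokens if tok[:1].isupper()]


def relate(first_doc, second_doc):
    """Relate annotations from different documents whose contents match."""
    return [
        Relation("same_entity", a, b)
        for a in first_doc
        for b in second_doc
        if a.doc_id != b.doc_id and a.text.lower() == b.text.lower()
    ]


class CharacteristicStore:
    """In-memory stand-in for the database of characteristic data structures."""

    def __init__(self):
        self._relations = []

    def store(self, relations):
        self._relations.extend(relations)

    def query(self, feature):
        """Return relations in which either annotation carries the feature."""
        return [
            r for r in self._relations
            if feature in (r.first.feature, r.second.feature)
        ]


doc1 = annotate("doc1", "Acme acquired Bolt.")
doc2 = annotate("doc2", "Regulators approved the Acme deal.")
db = CharacteristicStore()
db.store(relate(doc1, doc2))
results = db.query("PROPER_NOUN")  # the query criterion is a linguistic feature
```

Here the only stored relation links the "Acme" annotation in `doc1` to the "Acme" annotation in `doc2`, so the query returns a pair of annotations from two different documents together with the relation between them.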

177 Citations

55 Claims

1. A method for obtaining information embedded in unstructured text, comprising:
- generating computer-readable annotations based on the unstructured text, at least one of the computer-readable annotations comprising an indication of a linguistic feature;
  
  generating at least one computer-readable relation between at least one pair of the computer-readable annotations;
  
  wherein the unstructured text is from two or more different electronic documents, and the relation relates a first annotation from a first one of the electronic documents to a second annotation from a different one of the electronic documents;
  
  storing characteristic data structures in a database, the characteristic data structures comprising the at least one pair of the computer-readable annotations and the at least one computer-readable relation;
  
  receiving a query comprising at least one criterion;
  
  returning results from the database, wherein the results comprise the at least one pair of the annotations and the at least one computer-readable relation;
  
  generating an information result based on the results that are returned from the database and that comprise the at least one pair of the annotations and the at least one computer-readable relation;
  
  wherein the relation relates the first annotation to the second annotation based on one or more similarities between contents of the first annotation and the second annotation, and wherein the relation represents a semantic relationship between the contents;
  
  wherein the information result comprises a grammatical unit not present in its entirety in any of the at least one pair of the annotations and the at least one computer-readable relation returned from the database;
  
  wherein generating the information result in response to the query includes generating the grammatical unit by applying a transformation to the at least one pair of the annotations and the at least one computer-readable relation returned from the database by combining the at least one pair of the annotations from at least two different sentences from the unstructured text according to the at least one relation to generate the grammatical unit.
- Dependent claims (2 through 25):
  - 2. The method of claim 1, wherein the at least one criterion comprises an indication of a linguistic feature.
  - 3. The method as recited in claim 1, further comprising resolving the unstructured text associated with at least one computer-readable annotation to a normalized form.
  - 4. The method as recited in claim 3, wherein resolving unstructured text is based on word sense disambiguation.
  - 5. The method as recited in claim 3, wherein resolving unstructured text is based on timestamping.
  - 6. The method as recited in claim 3, wherein resolving unstructured text is based on toponym resolution.
  - 7. The method as recited in claim 3, wherein resolving unstructured text is based on co-reference resolution.
  - 8. The method as recited in claim 1, wherein the at least one computer-readable relation comprises an indication of a linguistic feature.
  - 9. The method as recited in claim 1, further comprising generating at least one additional characteristic data structure based on the information result.
  - 10. The method as recited in claim 1, further comprising transforming the information result into a computer readable form for another application.
  - 11. The method as recited in claim 10, wherein transforming the information result is based on data mining.
  - 12. The method as recited in claim 1, further comprising transforming the information result to a human-readable information product comprising any one of a query result page configured for a display screen, a time and location briefing configured for a display screen, a competitive intelligence briefing configured for a display screen, a product comparison configured for a display screen, and a product summary configured for a display screen.
  - 13. The method recited in claim 12, wherein the human-readable information product is based on ranking of information results.
  - 14. The method recited in claim 12, wherein the human-readable information product is based on scoring of information results.
  - 15. The method recited in claim 12, wherein the human-readable information product is a briefing based on the characteristic data structures.
  - 16. The method as recited in claim 12, wherein transforming the information result to a human-readable information product is based on natural language generation by transforming annotations comprising linguistic data structures with a natural language generating engine into one or more complete sentences.
  - 17. The method as recited in claim 12, wherein transforming the information result to a human-readable information product is based on automatic text summarization by one or more of:
    - transforming annotations comprising pronouns into preferred names based on entities that the pronouns refer to;
      
      applying machine translation to annotations;
      
      transforming annotations into synonyms for concepts represented in the annotations;
      
      transforming technical jargon into common terminology;
      
      or combining multiple annotations into a single natural language sentence.
  - 18. The method as recited in claim 12, wherein the human readable information product comprises human readable text, the human readable text distinguishable from the unstructured text.
  - 19. The method as recited in claim 1, further comprising accessing unstructured text from a plurality of sources.
  - 20. The method as recited in claim 1, wherein generating the computer-readable annotations is based on a lexical feature.
  - 21. The method as recited in claim 1, wherein generating the characteristic data structures is based on a syntactic feature.
  - 22. The method as recited in claim 1, wherein generating the characteristic data structures is based on a semantic feature.
  - 23. The method as recited in claim 1, wherein generating the characteristic data structures is based on a discourse unit feature.
  - 24. The method as recited in claim 1, further comprising converting non-text sources into unstructured text.
  - 25. The method as recited in claim 1, wherein generating the information result is based on a viewpoint.
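
The final element of claim 1 (combining a pair of related annotations from different sentences into a grammatical unit that appears in neither) can be illustrated with a small Python sketch. This is a hypothetical example, not the method of the patent: `ClauseAnnotation`, `combine`, the subject/predicate split, and the relative-clause template are assumptions made for illustration, and the relation is reduced to a shared subject.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClauseAnnotation:
    doc_id: str     # source document of the sentence
    subject: str    # annotated subject of the clause
    predicate: str  # annotated predicate of the clause


def combine(first, second):
    """Merge two clauses that share a subject into one new sentence.

    The shared subject plays the role of the relation between the pair;
    the second clause is embedded in the first as a relative clause.
    """
    if first.subject.lower() != second.subject.lower():
        raise ValueError("annotations are not related by a shared subject")
    return f"{first.subject}, which {second.predicate}, {first.predicate}."


a = ClauseAnnotation("doc1", "Acme", "acquired Bolt")
b = ClauseAnnotation("doc2", "Acme", "is based in Denver")
sentence = combine(a, b)
```

The resulting sentence, "Acme, which is based in Denver, acquired Bolt.", is not present in its entirety in either source annotation, which is the property the claim requires of the generated grammatical unit.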

26. A non-transitory computer readable medium having stored thereon a program, the program executable by a processor for performing a method for obtaining information embedded in unstructured text, the method comprising:
- generating computer-readable annotations based on the unstructured text, at least one of the computer-readable annotations comprising an indication of a linguistic feature;
  
  generating at least one computer-readable relation between at least one pair of the computer-readable annotations;
  
  wherein the unstructured text is from two or more different electronic documents, and the relation relates a first annotation from a first one of the electronic documents to a second annotation from a different one of the electronic documents;
  
  storing characteristic data structures in a database, the characteristic data structures comprising the at least one pair of the computer-readable annotations and the at least one computer-readable relation;
  
  receiving a query comprising at least one criterion;
  
  returning results from the database, wherein the results comprise the at least one pair of the annotations and the at least one computer-readable relation;
  
  generating an information result based on the results that are returned from the database and that comprise the at least one pair of the annotations and the at least one computer-readable relation;
  
  wherein the relation relates the first annotation to the second annotation based on one or more similarities between contents of the first annotation and the second annotation, and wherein the relation represents a semantic relationship between the contents;
  
  wherein the information result comprises a grammatical unit not present in its entirety in any of the at least one pair of the annotations and the at least one computer-readable relation returned from the database;
  
  wherein generating the information result in response to the query includes generating the grammatical unit by applying a transformation to the at least one pair of the annotations and the at least one computer-readable relation returned from the database by combining the at least one pair of the annotations from at least two different sentences from the unstructured text according to the at least one relation to generate the grammatical unit.

27. A method for obtaining information embedded in unstructured text, comprising:
- prior to receiving a query:
  
  generating computer-readable annotations based on the unstructured text, at least one of the computer-readable annotations comprising an indication of a linguistic feature;
  
  generating at least one computer-readable relation between at least one pair of the computer-readable annotations;
  
  wherein a relation, from the at least one computer-readable relation, relates a first annotation from a first electronic document to a second annotation from a second electronic document, which is different than the first electronic document;
  
  storing characteristic data structures in a database, the characteristic data structures comprising the at least one pair of the computer-readable annotations and the at least one computer-readable relation;
  
  receiving the query comprising at least one criterion; and
  
  in response to receiving the query:
  
  generating an information result in response to the query based on at least one of the characteristic data structures stored in the database by transforming, into a different format, each annotation associated with results from the database based on the query, and scoring and ranking each transformed annotation;
  
  wherein transforming each annotation comprises performing:
  
  transforming annotations comprising pronouns into preferred names based on entities that the pronouns refer to, transforming annotations into synonyms for concepts represented in the annotations, and combining multiple annotations, according to the at least one relation, to generate a single natural language sentence;
  
  wherein the at least one relation relates a first annotation from the multiple annotations to a second annotation from the multiple annotations based on one or more similarities between contents of the first annotation and the second annotation, and wherein the at least one relation represents a semantic relationship between the contents.
- Dependent claims (28 through 40):
  - 28. The method of claim 27, wherein the at least one criterion comprises an indication of a linguistic feature.
  - 29. The method as recited in claim 27, further comprising resolving the unstructured text associated with at least one computer-readable annotation to a normalized form.
  - 30. The method as recited in claim 27, wherein the at least one computer-readable relation comprises an indication of a linguistic feature.
  - 31. The method as recited in claim 27, further comprising generating at least one additional characteristic data structure based on the information result.
  - 32. The method as recited in claim 27, further comprising transforming the information result into a computer readable form for another application.
  - 33. The method as recited in claim 27, further comprising transforming the information result to a human-readable information product comprising any one of a query result page configured for a display screen, a time and location briefing configured for a display screen, a competitive intelligence briefing configured for a display screen, a product comparison configured for a display screen, and a product summary configured for a display screen.
  - 34. The method as recited in claim 27, further comprising accessing unstructured text from a plurality of sources.
  - 35. The method as recited in claim 27, wherein generating the computer-readable annotations is based on a lexical feature.
  - 36. The method as recited in claim 27, wherein generating the characteristic data structures is based on a syntactic feature.
  - 37. The method as recited in claim 27, wherein generating the characteristic data structures is based on a semantic feature.
  - 38. The method as recited in claim 27, wherein generating the characteristic data structures is based on a discourse unit feature.
  - 39. The method as recited in claim 27, further comprising converting non-text sources into unstructured text.
  - 40. The method as recited in claim 27, wherein generating the information result is based on a viewpoint.
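
Two of the transformations recited in claim 27 (pronouns into preferred names, terms into synonyms) and the scoring and ranking step can be sketched in Python as follows. This is a rough illustration under assumptions, not the claimed implementation: the function names, the dictionary-based lookups, and the term-overlap score are all hypothetical, and the step of combining annotations into a single sentence is left out.

```python
def resolve_pronouns(tokens, referents):
    """Replace pronouns with the preferred name of the entity they refer to."""
    return [referents.get(tok.lower(), tok) for tok in tokens]


def apply_synonyms(tokens, synonyms):
    """Replace tokens with a preferred synonym where one is listed."""
    return [synonyms.get(tok.lower(), tok) for tok in tokens]


def score(tokens, query_terms):
    """Toy score (assumption): number of tokens that match a query term."""
    return sum(1 for tok in tokens if tok.lower() in query_terms)


def transform_and_rank(annotations, referents, synonyms, query_terms):
    """Transform each annotation, then order the results by descending score."""
    transformed = [
        apply_synonyms(resolve_pronouns(tokens, referents), synonyms)
        for tokens in annotations
    ]
    ranked = sorted(transformed, key=lambda toks: score(toks, query_terms), reverse=True)
    return [" ".join(toks) for toks in ranked]


ranked = transform_and_rank(
    annotations=[["Bolt", "makes", "fasteners"], ["It", "purchased", "Bolt"]],
    referents={"it": "Acme"},            # hypothetical co-reference output
    synonyms={"purchased": "acquired"},  # hypothetical synonym table
    query_terms={"acme", "acquired"},
)
```

With these inputs the second annotation becomes "Acme acquired Bolt", matches both query terms, and is ranked ahead of "Bolt makes fasteners".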

41. A non-transitory computer readable medium having stored thereon a program, the program executable by a processor for performing a method for obtaining information embedded in unstructured text, the method comprising:
- prior to receiving a query:
  
  generating computer-readable annotations based on the unstructured text, at least one of the computer-readable annotations comprising an indication of a linguistic feature;
  
  generating at least one computer-readable relation between at least one pair of the computer-readable annotations;
  
  wherein a relation, from the at least one computer-readable relation, relates a first annotation from a first electronic document to a second annotation from a second electronic document, which is different than the first electronic document;
  
  storing characteristic data structures in a database, the characteristic data structures comprising the at least one pair of the computer-readable annotations and the at least one computer-readable relation;
  
  receiving the query comprising at least one criterion; and
  
  in response to receiving the query:
  
  generating an information result in response to the query based on at least one of the characteristic data structures stored in the database by transforming, into a different format, each annotation associated with results from the database based on the query, and scoring and ranking each transformed annotation;
  
  wherein transforming each annotation comprises performing:
  
  transforming annotations comprising pronouns into preferred names based on entities that the pronouns refer to, transforming annotations into synonyms for concepts represented in the annotations, and combining multiple annotations, according to the at least one relation, to generate a single natural language sentence;
  
  wherein the at least one relation relates a first annotation from the multiple annotations to a second annotation from the multiple annotations based on one or more similarities between contents of the first annotation and the second annotation, and wherein the at least one relation represents a semantic relationship between the contents.
- Dependent claims (42 through 55):
  - 42. The computer-readable medium of claim 41, wherein the at least one criterion comprises an indication of a linguistic feature.
  - 43. The computer-readable medium as recited in claim 41, further comprising instructions which when executed cause resolving the unstructured text associated with at least one computer-readable annotation to a normalized form.
  - 44. The computer-readable medium as recited in claim 41, wherein the at least one computer-readable relation comprises an indication of a linguistic feature.
  - 45. The computer-readable medium as recited in claim 41, further comprising instructions which when executed cause generating at least one additional characteristic data structure based on the information result.
  - 46. The computer-readable medium as recited in claim 41, further comprising instructions which when executed cause transforming the information result into a computer readable form for another application.
  - 47. The computer-readable medium as recited in claim 41, further comprising instructions which when executed cause transforming the information result to a human-readable information product comprising any one of a query result page configured for a display screen, a time and location briefing configured for a display screen, a competitive intelligence briefing configured for a display screen, a product comparison configured for a display screen, and a product summary configured for a display screen.
  - 48. The computer-readable medium as recited in claim 41, further comprising instructions which when executed cause accessing unstructured text from a plurality of sources.
  - 49. The computer-readable medium as recited in claim 41, wherein the instructions cause generating the computer-readable annotations based on a lexical feature.
  - 50. The computer-readable medium as recited in claim 41, wherein the instructions cause generating the characteristic data structures based on a syntactic feature.
  - 51. The computer-readable medium as recited in claim 41, wherein the instructions cause generating the characteristic data structures based on a semantic feature.
  - 52. The computer-readable medium as recited in claim 41, wherein the instructions cause generating the characteristic data structures based on a discourse unit feature.
  - 53. The computer-readable medium as recited in claim 41, further comprising instructions which when executed cause converting non-text sources into unstructured text.
  - 54. The computer-readable medium as recited in claim 41, wherein the instructions cause generating the information result based on a viewpoint.
  - 55. The method of claim 41, wherein the grammatical unit is one of:
    - a phrase, a clause, or a sentence.

Current Assignee: Linguastat, Inc.
Original Assignee: Linguastat, Inc.
Inventors: Pierre, John M.; Butler, Mark H.
Primary Examiner(s): Paula, Cesar
Assistant Examiner(s): Blackwell, James H
Application Number: US11/698,444
Time in Patent Office: 2,965 Days
Field of Search: 715/200-203, 715/205, 715/209, 715/229-234, 715/254, 715/255, 704/9, 709/202-207, 709/217-219, 707/706-757, 707/912, 707/999.2, 707/999.203, 706/45
US Class Current: 715/230

CPC Class Codes

G06F 16/38   Retrieval characterised by ...
G06F 16/9577   Optimising the visualizatio...
G06F 16/958   Organisation or management ...
G06F 40/169   Annotation, e.g. comment da...
G06F 40/284   Lexical analysis, e.g. toke...
G06F 40/289   Phrasal analysis, e.g. fini...
G06F 40/30   Semantic analysis
