Article and method of automatically filtering information retrieval results using test genre

US 6,505,150 B2
Filed: 06/18/1998
Issued: 01/07/2003
Est. Priority Date: 07/02/1997
Status: Expired due to Term

Abstract

A method of filtering according to text genre the results of a topic search of a heterogeneous corpus of untagged, machine-readable texts. Because each text of the corpus has a topic and a text genre, the corpus includes multiple text genres and covers multiple topics. According to the method, a processor first searches the corpus for a first multiplicity of texts that have a first topic. Next, the processor identifies a first set of texts of the first multiplicity that are instances of a first text genre and identifies a second set of texts of the first multiplicity that are instances of a second text genre. Finally, the processor identifies to a computer user the first multiplicity of texts in an order based upon the first text genre and second text genre.
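The flow described in the abstract (topic search, genre identification, genre-based ordering) can be sketched roughly as follows. This is only an illustration under stated assumptions: `Text`, `topic_search`, `classify_genre`, and `order_by_genre` are hypothetical names, the topic search is a plain substring match, and the genre classifier is a toy rule standing in for whatever classifier an implementation would actually use.

```python
from dataclasses import dataclass


@dataclass
class Text:
    doc_id: str
    body: str


def topic_search(corpus, topic):
    """Hypothetical topic search: keep texts whose body mentions the topic."""
    return [t for t in corpus if topic.lower() in t.body.lower()]


def classify_genre(text):
    """Toy stand-in for a genre classifier (an assumption, not the patent's method)."""
    return "editorial" if "should" in text.body.lower() else "press_report"


def order_by_genre(hits, genre_order):
    """Order topic hits so that genres listed earlier in genre_order come first.

    sorted() is stable, so texts of the same genre keep their search order.
    Genres missing from genre_order are placed last.
    """
    rank = {genre: i for i, genre in enumerate(genre_order)}
    return sorted(hits, key=lambda t: rank.get(classify_genre(t), len(rank)))
```

Calling `order_by_genre(topic_search(corpus, "markets"), ["press_report", "editorial"])` would present press reports on the topic ahead of editorials, without dropping either set.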

22 Claims

1. A processor implemented method of searching a heterogeneous corpus of untagged machine-readable texts, each text of the corpus having a text genre and a topic, the corpus including at least a first text genre and a second text genre, the corpus including a multiplicity of topics, the processor implemented method comprising the steps of:
- a) searching the corpus for a first multiplicity of untagged texts that have a first topic;
  
  b) identifying a first set of texts of the first multiplicity of untagged texts that are instances of the first text genre;
  
  c) identifying a second set of texts of the first multiplicity of untagged texts that are instances of the second text genre;
  
  d) identifying the first multiplicity of untagged texts to a computer user in an order based upon at least a first type and a second type of text genre.
2. The method of claim 1 wherein step b) comprises the steps of:
3. The method of claim 1 wherein step b) comprises the steps of:
- 1) for each text of the first multiplicity, generating a cue vector from the text, the cue vector representing occurrences in the text of a set of nonstructural surface cues;
  
  2) for each text of the first multiplicity, determining a relevancy to the text of each facet of a set of facets using the cue vector and a weighting vector associated with the facet; and
  
  3) for each text of the first multiplicity, determining whether the text is an instance of the first text genre based upon the facets relevant to the text.
4. The method of claim 2 wherein the set of nonstructural, surface cues includes a punctuational cue.
5. The method of claim 4 wherein the set of cues further includes at least a one of a lexical cue, a string recognizable constructional cue, a formulae cue and a deviation cue.
6. The method of claim 3 wherein the set of nonstructural surface cues includes a punctuational cue.
7. The method of claim 6 wherein the set of nonstructural surface cues further includes at least a one of a lexical cue, a string recognizable constructional cue, a formulae cue and a deviation cue.
8. The method of claim 6 wherein the set of facets includes at least a one of a date facet, a narrative facet, a suasive facet, a fiction facet, a legal fact, a science and technical facet, and an author facet.
9. The method of claim 2 wherein the first text genre is a one of a press report genre, an Email genre, an editorial opinion genre, and a market analysis genre.
10. The method of claim 3 wherein the first text genre is a one of a press report genre, an Email genre, an editorial opinion genre, and a market analysis genre.
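The three substeps recited in claim 3 (cue vector, facet relevancy from a per-facet weighting vector, genre decision from the relevant facets) might be sketched as below. Everything concrete here is an assumption made for illustration: the cue list `CUES`, the weights in `FACET_WEIGHTS`, the bias, the logistic scoring function, and the 0.5 threshold do not come from the claims, which leave those details open.

```python
import math

# Hypothetical surface cues: three punctuational cues and two lexical cues.
CUES = ["?", "!", ";", "should", "said"]

# Hypothetical weighting vectors, one per facet, aligned with CUES.
FACET_WEIGHTS = {
    "narrative": [0.0, 0.0, 0.0, -1.0, 2.0],
    "suasive": [0.5, 1.0, 0.0, 2.0, -1.0],
}
BIAS = -1.0  # assumed constant offset shared by all facets


def cue_vector(text):
    """Count occurrences of each cue in the text (substep 1)."""
    lowered = text.lower()
    return [lowered.count(cue) for cue in CUES]


def facet_relevance(cues, weights):
    """Score a facet as a logistic function of the weighted cue sum (substep 2)."""
    score = sum(c * w for c, w in zip(cues, weights)) + BIAS
    return 1.0 / (1.0 + math.exp(-score))


def relevant_facets(text, threshold=0.5):
    """Return the facets whose relevance exceeds the threshold."""
    cues = cue_vector(text)
    return {
        facet
        for facet, weights in FACET_WEIGHTS.items()
        if facet_relevance(cues, weights) > threshold
    }


def is_genre(text, required_facets):
    """Decide genre membership from the relevant facets (substep 3)."""
    return required_facets <= relevant_facets(text)
```

In this sketch a genre is defined by a set of facets that must all be relevant, for example `is_genre(text, {"suasive"})` for an editorial-like genre. A real system would presumably learn the weights from labeled texts rather than fix them by hand.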

11. An article of manufacture comprising:
- a) a memory; and
  
  b) instructions stored in the memory for a method of searching a heterogeneous corpus of untagged machine-readable texts, each text of the corpus having a text genre and a topic, the corpus including at least a first text genre and a second text genre, the corpus including a multiplicity of topics, the method being implemented by a processor coupled to the memory, the method comprising the steps of;
  
  1) searching the corpus for a first multiplicity of texts that have a first topic;
  
  2) identifying a first set of texts of the first multiplicity of untagged texts that are instances of the first text genre;
  
  3) identifying the first set of texts to a computer user.

12. A processor implemented method of searching a heterogeneous corpus of untagged machine-readable texts, each text of the corpus having a text genre and a topic, the corpus including a first multiplicity of text genres and a second multiplicity of topics, the processor implemented method comprising the steps of:
- a) receiving from a computer user a search request for texts having a first topic and a first text genre, the search request also identifying a second text genre to be excluded;
  
  b) identifying a third multiplicity of untagged texts of the corpus having the first topic;
  
  c) determining a text genre of each text of the third multiplicity of untagged texts; and
  
  d) identifying to the computer user those texts of the third multiplicity that are instances of the first text genre and not identifying any text of the third multiplicity that are instances of the second text genre.
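Step d) of claim 12 amounts to an include/exclude filter over the topic hits. A minimal sketch, assuming a hypothetical `genre_of` callable that returns the set of genres determined for a text (modeling genres as a set, so that one text can match more than one genre, is itself an assumption):

```python
def filter_by_genre(hits, genre_of, include, exclude):
    """Keep hits that are instances of `include` and not instances of `exclude`.

    hits:     texts already matched on the requested topic
    genre_of: hypothetical callable mapping a text to a set of genre labels
    """
    results = []
    for text in hits:
        genres = genre_of(text) or set()
        if include in genres and exclude not in genres:
            results.append(text)
    return results
```

With genres stored in a dictionary keyed by document identifier, `filter_by_genre(ids, genres.get, include="press_report", exclude="editorial")` would return only the press reports that were not also judged to be editorials.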

13. An article of manufacture comprising:
- a) a memory; and
  
  b) instructions stored in the memory for a method of searching a heterogeneous corpus of untagged machine-readable texts, each text of the corpus having a text genre and a topic, the corpus including a first multiplicity of text genres and a second multiplicity of topics, the method being implemented by a processor coupled to the memory, the method comprising the steps of;
  
  1) receiving from a computer user a search request for texts having a first topic and a first text genre, the search request also identifying a second text genre to be excluded;
  
  2) identifying a third multiplicity of untagged texts of the corpus having the first topic;
  
  3) determining a text genre of each text of the third multiplicity of untagged texts; and
  
  4) identifying to the computer user those texts of the third multiplicity of untagged texts that are instances of the first text genre and not identifying any text of the third multiplicity of untagged texts that are instances of the second text genre.
14. The article of claim 13, wherein the step b3) comprises the substeps of:
15. The article of claim 13 wherein step b3) comprises the substeps of:
- A) for each text of the third multiplicity of untagged texts generating a cue vector from the text, the cue vector representing occurrences in the text of a first set of nonstructural, surface cues;
  
  B) for each text of the third multiplicity of untagged texts determining a relevancy to the text of each face of a second set of facets using the cue vector and a weighting vector associated with each facet, and
  
  C) for each text of the third multiplicity of untagged texts identifying relevant text genres from a third set of text genres based upon the facets relevant to the text.
16. The article of claim 14 wherein the first set of cues includes at least a one of either a punctuational cue, a lexical cue, a string recognizable constructional cue, a formulae cue and a deviation cue.
17. The article of claim 15 wherein the second set of facets includes at least a one of either a date facet, a narrative facet, a suasive facet, a fiction facet, a legal fact, a science and technical facet, and an author facet.
18. The article of claim 13 wherein the third set of text genres includes at least a one of either a press report genre, an Email genre, an editorial opinion genre, and a market analysis genre.

19. An article of manufacture comprising:
- a) a memory; and
  
  b) instructions stored in the memory for a method of searching a heterogeneous corpus of untagged machine-readable texts, each text of the corpus having a text genre and a topic, the corpus including a first multiplicity of text genres and a second multiplicity of topics, the method being implemented by a processor coupled to the memory, the method comprising the steps of;
  
  1) receiving from a computer user a search request for texts having a first topic and a first text genre to be excluded;
  
  2) identifying a third multiplicity of untagged texts of the corpus having the first topic;
  
  3) determining a text genre of each text of the third multiplicity of untagged texts; and
  
  4) identifying to the computer user those texts of the third multiplicity of untagged texts that have a text genre other than the first text genre.

20. An article of manufacture comprising:
- a) a memory; and
  
  b) instructions stored in the memory for a method of searching a heterogeneous corpus of untagged machine-readable texts, each text of the corpus having a topic and a facet value for each facet of a first multiplicity of facets, the corpus including a second multiplicity of topics, the method being implemented by a processor coupled to the memory, the method comprising the steps of;
  
  1) receiving from a computer user a search request for texts having a first topic and a first value of a first facet of the first multiplicity of facets;
  
  2) identifying a third multiplicity of untagged texts of the corpus having the first topic;
  
  3) for each text of the third multiplicity of untagged texts determining for a value of the first facet; and
  
  4) identifying to the computer user those texts of the third multiplicity of untagged texts that have the first value of the first facet.
21. The article of claim 20 wherein the request of step b1) further includes requesting exclusion of texts having a second value of a second facet of the first multiplicity of facets.
22. The article of claim 20 wherein the first multiplicity of facets includes at least a one of a date facet, a narrative facet, a suasive facet, a fiction facet, a legal fact, a science and technical facet, and an author facet.

Current Assignee: Xerox Corporation (Xerox Holdings Corp.)
Original Assignee: Xerox Corporation (Xerox Holdings Corp.)
Inventors: Nunberg, Geoffrey D.; Kessler, Brett L.; Pedersen, Jan O.; Schuetze, Hinrich
Primary Examiner(s): EDOUARD, PATRICK NESTOR
Application Number: US09/100,201
Publication Number: US 20020002450A1
Time in Patent Office: 1,664 Days
Field of Search: 704/1, 704/8, 704/9, 704/10, 707/2, 707/3, 707/4, 707/5, 707/6, 707/7, 707/104, 707/531, 707/532, 707/533, 707/536
US Class Current: 704/1
CPC Class Codes:
G06F 16/35   Clustering; Classification
G06F 16/353   into predefined classes
G06F 40/211   Syntactic parsing, e.g. bas...
G06F 40/284   Lexical analysis, e.g. toke...
G06F 40/289   Phrasal analysis, e.g. fini...
G06F 40/30   Semantic analysis
