Method for extracting referential keys from a document

US 8,060,511 B2
Filed: 07/12/2010
Issued: 11/15/2011
Est. Priority Date: 04/30/2004
Status: Active Grant

First Claim

1. A method of information searching in a document image derived from a scanner, the method comprising:

defining a key type of a referential key based on at least one type of contextual indicator of a plurality of contextual indicator types that is present in the document image;

parsing successive portions of the document image to locate a first type of contextual indicator of the plurality of contextual indicator types, wherein locating the first type of contextual indicator identifies a referential key within the document image;

identifying at least one portion of the document image that includes the located first type of contextual indicator;

determining if the located first type of contextual indicator is determinative of the defined key type of the referential key, without knowledge of text contained within the portion of the document image that includes the located first type of contextual indicator;

extracting characters from the referential key if the located first type of contextual indicator is determinative of the defined key type of the referential key;

parsing the portion of the document image that includes the located first type of contextual indicator to locate a second type of contextual indicator of the plurality of contextual indicator types if the located first type of contextual indicator is not determinative of the defined key type of the referential key;

determining that a combination of the located first type of contextual indicator and the located second type of contextual indicator located in the document image is determinative of the defined key type of the referential key; and

extracting characters from the referential key in response to the determining that the combination of the located first type of contextual indicator and the located second type of contextual indicator located in the document image is determinative of the defined key type of the referential key.
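
The decision flow recited in claim 1 can be sketched in Python. This is only an illustration under stated assumptions, not the patented implementation: the `Region` record, the thresholds, and the choice of placement as the first indicator type and font as the second are all hypothetical, and `ocr` stands in for whatever character-recognition routine is available. The point it tries to show is that the key type is decided from layout features alone, and characters are extracted only after that decision.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Region:
    """A portion of a scanned page, described only by layout features."""
    page: int
    y_frac: float    # vertical centre: 0.0 = top of page, 1.0 = bottom
    font_pt: float   # estimated glyph height in points
    bold: bool
    pixels: bytes    # raw image data, not yet character-recognised


# Hypothetical rules for a single key type ("title").
def has_placement_indicator(r: Region) -> bool:
    """First indicator type: the region sits near the top of a page."""
    return r.y_frac < 0.15


def placement_is_determinative(r: Region) -> bool:
    """Assumed rule: the very top of page 1 is enough on its own."""
    return r.page == 1 and r.y_frac < 0.05


def has_font_indicator(r: Region) -> bool:
    """Second indicator type: large bold text."""
    return r.bold and r.font_pt >= 14.0


def extract_key(r: Region, ocr: Callable[[bytes], str]) -> Optional[str]:
    """Return the key's characters, or None if the region is not a title key."""
    if not has_placement_indicator(r):
        return None                  # first indicator not located
    if placement_is_determinative(r):
        return ocr(r.pixels)         # first indicator alone decides
    if has_font_indicator(r):
        return ocr(r.pixels)         # combination of both indicators decides
    return None


def scan(regions: Iterable[Region],
         ocr: Callable[[bytes], str]) -> List[Tuple[int, str]]:
    """Parse successive regions and collect (page, key characters) pairs."""
    found = []
    for r in regions:
        text = extract_key(r, ocr)
        if text is not None:
            found.append((r.page, text))
    return found
```

In this sketch `ocr` is never called on a region that fails the indicator checks, which mirrors the claim's requirement that the determination is made without knowledge of the text in that portion of the image.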

Abstract

Methods, computer-readable media, and systems for extracting referential keys from a document are provided. A document is parsed to identify at least one key, the key being identified from at least one contextual indication. The key is classified according to a key type, the key type being identified from the contextual indication. The key is extracted and then stored in a location in a structured shell with the location corresponding to the key type. As a result, the key can be found by a search seeking one of the key and the key type allowing a searcher to identify the document from which the key was extracted.
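
As a rough illustration of the storage and lookup idea in the abstract, the sketch below keeps one slot per key type and answers searches by key or by key type. The abstract does not define the "structured shell", so the `KeyShell` class, its method names, and the dictionary layout are assumptions made for this example only.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class KeyShell:
    """Hypothetical 'structured shell': one storage location per key type."""

    def __init__(self) -> None:
        # key_type -> list of (key characters, document id)
        self._slots: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def store(self, doc_id: str, key_type: str, key: str) -> None:
        """Place an extracted key in the location for its key type."""
        self._slots[key_type].append((key, doc_id))

    def documents_with_type(self, key_type: str) -> List[str]:
        """Documents from which any key of the given type was extracted."""
        return sorted({doc for _, doc in self._slots.get(key_type, [])})

    def documents_with_key(self, key: str) -> List[str]:
        """Documents from which this exact key was extracted."""
        return sorted({doc
                       for entries in self._slots.values()
                       for k, doc in entries
                       if k == key})
```

A search for either the key or its type then leads back to the source document, for example `shell.documents_with_key("D6-12345")` after that key has been stored (the identifier here is made up).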

11 Claims

2. The method of claim 1, wherein the defined key type of the referential key comprises at least one of a title, a header, a footer, a document type, a document identifier, a subject identifier, a section identifier, a chapter identifier and a page number.

3. The method of claim 1, wherein the plurality of contextual indicator types comprises at least one of a placement indicator, a format indicator and a font indicator.

4. The method of claim 3, wherein the placement indicator includes a page position.

5. The method of claim 3, wherein the format indicator comprises a character pattern that includes at least one of a pattern of digits and a pattern of separators.

6. The method of claim 3, wherein the font indicator comprises at least one of a typeface, a boldface, an underlined text portion and a font size.

7. The method of claim 1, wherein the extracting the characters from the referential key comprises extracting the characters using an optical character recognition routine to scan the referential key after at least one of locating the first type of contextual indicator, and locating the combination of the located first type of contextual indicator and the located second type of contextual indicator, is determined to be determinative of the defined key type of the referential key.

8. The method of claim 1 further comprising storing the defined key type of referential key, a key location of the referential key within the document image, and the characters extracted from the referential key in a structured format according to the defined key type, the characters extracted, and the key location.

9. The method of claim 8, wherein the storing in the structured format comprises storing in a format that allows navigation to, from, and within the document image.

10. The method of claim 8, wherein the storing in the structured format comprises storing in an extensible markup language (XML) document.

11. The method of claim 1, wherein the defining the key type of the referential key based on the at least one type of contextual indicator comprises defining the key type of the referential key based on the at least one type of contextual indicator derived from knowledge including at least one of a general knowledge, a tribal knowledge, and a specific document format rule.
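
Claims 8 and 10 describe storing the key type, the key location, and the extracted characters in an XML document. One possible shape for such a record is sketched below using Python's standard `xml.etree.ElementTree` module. The element and attribute names (`document`, `key`, `type`, `page`, `position`) are invented for illustration; the claims do not prescribe a schema.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List, Tuple

# (key type, page number, position on page, extracted characters)
KeyRecord = Tuple[str, int, str, str]


def keys_to_xml(doc_id: str, records: Iterable[KeyRecord]) -> str:
    """Serialise extracted keys for one document image as an XML string."""
    root = ET.Element("document", {"id": doc_id})
    for key_type, page, position, text in records:
        key_el = ET.SubElement(
            root,
            "key",
            {"type": key_type, "page": str(page), "position": position},
        )
        key_el.text = text
    return ET.tostring(root, encoding="unicode")


def keys_of_type(xml_text: str, key_type: str) -> List[Tuple[int, str]]:
    """Read the XML back and list (page, characters) for one key type."""
    root = ET.fromstring(xml_text)
    return [(int(el.get("page")), el.text)
            for el in root.iter("key")
            if el.get("type") == key_type]
```

Keeping the page and position next to each key is one way a viewer could jump from a search hit to the matching place in the image, in the spirit of the navigation described in claim 9.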

Current Assignee: The Boeing Co.
Original Assignee: The Boeing Co.
Inventors: Hadley, Brent L.; Chew, Susan C.; Eames, Patrick J.
Primary Examiner(s): Pham, Hung Q
Assistant Examiner(s): Cheung, Hubert G
Application Number: US12/834,530
Publication Number: US 20100316301A1
Time in Patent Office: 491 Days
Field of Search: None
US Class Current: 707/736
CPC Class Codes:
G06F 16/35   Clustering; Classification
G06F 16/355   Class or cluster creation o...
G06F 16/93   Document management systems
