Determining section information of a digital volume

  • US 8,549,008 B1
  • Filed: 11/12/2008
  • Issued: 10/01/2013
  • Est. Priority Date: 11/13/2007
  • Status: Active Grant
First Claim

1. A computer-implemented method for determining section information of a digital volume, the method comprising:

  • determining pages of the digital volume containing a table of contents by applying a classifier to the digital volume, the classifier adapted to use machine learning to recognize references to sections in a body of the digital volume and identify pages of the digital volume containing the section references as the pages containing the table of contents;

    and generating a score estimating an accuracy of a classification of a page as containing the table of contents;

    extracting phrases from the table of contents of the digital volume;

    identifying matching phrases in the body of the digital volume, the matching phrases at least approximately matching the extracted phrases;

    determining best matching phrases from the identified matching phrases, the best matching phrases comprising a matching phrase corresponding to each extracted phrase, the determining based at least in part on the ordering of the extracted phrases and the identified matching phrases;

    generating section information, the section information comprising section headings and section start locations, the section headings comprising the best matching phrases, and the section start locations indicating starting locations of the sections in the digital volume, the section start locations comprising the locations of the best matching phrases in the digital volume; and

    storing the section information.
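
The phrase-matching steps of the claim (extracting phrases, finding approximate matches in the body, and choosing best matches using their ordering) can be illustrated with a minimal sketch. This is not the patented implementation: the function names, the 0.8 similarity threshold, the line-based notion of "location", and the use of Python's `difflib.SequenceMatcher` for approximate matching are all assumptions made for illustration. The table-of-contents classifier and the storing step are omitted.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    # Case-insensitive similarity in [0, 1]; stands in for "approximately matching".
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_candidates(toc_phrases, body_lines, threshold=0.8):
    # For each extracted phrase, collect (line_index, score) for body lines
    # whose similarity meets the (assumed) threshold.
    candidates = []
    for phrase in toc_phrases:
        scored = ((i, similarity(phrase, line)) for i, line in enumerate(body_lines))
        candidates.append([(i, s) for i, s in scored if s >= threshold])
    return candidates


def best_ordered_matches(candidates):
    # Choose one candidate per phrase so that positions strictly increase
    # (same order as the table of contents) and the total score is maximal.
    # Returns a list of line indexes, or None if no consistent choice exists.
    if not candidates:
        return []
    best = []  # best[k][j] = (total score, back pointer) or None if unreachable
    for k, cands in enumerate(candidates):
        row = []
        for pos, score in cands:
            if k == 0:
                row.append((score, None))
                continue
            prev = [
                (best[k - 1][j][0], j)
                for j, (ppos, _) in enumerate(candidates[k - 1])
                if ppos < pos and best[k - 1][j] is not None
            ]
            if prev:
                total, j = max(prev)
                row.append((total + score, j))
            else:
                row.append(None)
        best.append(row)
    last = [(entry[0], j) for j, entry in enumerate(best[-1]) if entry is not None]
    if not last:
        return None
    _, j = max(last)
    picks = []
    for k in range(len(candidates) - 1, -1, -1):
        picks.append(candidates[k][j][0])
        j = best[k][j][1]
    return picks[::-1]


def section_info(toc_phrases, body_lines):
    # Section heading = best matching phrase in the body; start = its line index.
    picks = best_ordered_matches(find_candidates(toc_phrases, body_lines))
    if picks is None:
        return None
    return [{"heading": body_lines[i], "start": i} for i in picks]


toc = ["Chapter 1 Introduction", "Chapter 2 Methods"]
body = [
    "Chapter 1. Introduction",              # approximate match, correct position
    "Some opening text.",
    "Chapter 2 Methods",
    "More text.",
    "Chapter 1 Introduction",               # exact match, but out of order
]
sections = section_info(toc, body)
```

In this example the exact match on the last line scores higher than the first line taken alone, but selecting it would leave no later position for the second phrase. The ordering constraint therefore selects the first and third lines as the section starts.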
