Systems and methods of semantically annotating documents of different structures
Abstract
A computer retrieves a document from a data source, wherein the document has a structure type. The computer generates a customized data model for the document in accordance with its structure type. The computer identifies one or more candidate chunks within the customized data model in accordance with a set of heuristic rules associated with the structure type.
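As an illustration of the abstract, the following minimal Python sketch builds a data model whose shape depends on a document's structure type and then applies a rule set keyed to that type to find candidate chunks. The structure types `plain` and `delimited`, the `ModelNode` class, and both rules are hypothetical and chosen only for illustration; the patent text above does not specify them.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ModelNode:
    """One node of a hypothetical customized data model."""
    tag: str
    text: str = ""
    children: List["ModelNode"] = field(default_factory=list)


def build_model(structure_type: str, raw: str) -> ModelNode:
    """Build a small tree whose shape depends on the structure type."""
    root = ModelNode(tag="document")
    if structure_type == "plain":
        # Assumption: blank lines separate paragraphs in plain text.
        for block in raw.split("\n\n"):
            if block.strip():
                root.children.append(ModelNode("paragraph", block.strip()))
    elif structure_type == "delimited":
        # Assumption: each non-empty line is a record of comma-separated fields.
        for line in raw.splitlines():
            if line.strip():
                fields = [ModelNode("field", f.strip()) for f in line.split(",")]
                root.children.append(ModelNode("record", children=fields))
    else:
        raise ValueError(f"unsupported structure type: {structure_type!r}")
    return root


def _plain_rule(model: ModelNode) -> List[str]:
    # Assumed heuristic: every paragraph is one candidate chunk.
    return [n.text for n in model.children if n.tag == "paragraph"]


def _delimited_rule(model: ModelNode) -> List[str]:
    # Assumed heuristic: every record, with its fields joined, is one candidate chunk.
    return [" | ".join(c.text for c in n.children)
            for n in model.children if n.tag == "record"]


# One rule set per structure type (illustrative only).
HEURISTIC_RULES: Dict[str, Callable[[ModelNode], List[str]]] = {
    "plain": _plain_rule,
    "delimited": _delimited_rule,
}


def candidate_chunks(structure_type: str, raw: str) -> List[str]:
    """Generate the model, then apply the rules associated with its type."""
    model = build_model(structure_type, raw)
    return HEURISTIC_RULES[structure_type](model)
```

Keeping the rules in a table keyed by structure type means a new document structure can be supported by adding one model builder branch and one rule, without touching the chunking entry point.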
33 Claims
1. A computer-implemented method, comprising:
at a computer having memory and one or more processors;
receiving one or more search keywords from a user;
selecting a plurality of candidate document identifiers in accordance with the one or more search keywords, each candidate document identifier corresponding to a respective document at a respective data source;
for a respective candidate document identifier of the plurality of candidate document identifiers;
retrieving a document corresponding to the respective candidate document identifier from a data source, wherein the document has a structure type;
converting the document into a node stream, wherein the document conversion is initiated immediately after retrieving a portion of the document;
generating a customized data model for the document using the node stream in accordance with the structure type of the document;
identifying one or more candidate chunks within the customized data model in accordance with a set of heuristic rules associated with the structure type; and
selecting one or more chunks of the candidate chunks that satisfy the one or more search keywords; and
providing at least one of the selected one or more chunks for display to the user.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A computer system, comprising:
memory;
one or more processors;
one or more programs stored in the memory and configured for execution by the one or more processors, the one or more programs including instructions for;
receiving one or more search keywords from a user;
selecting a plurality of candidate document identifiers in accordance with the one or more search keywords, each candidate document identifier corresponding to a respective document at a respective data source;
for a respective candidate document identifier of the plurality of candidate document identifiers;
retrieving a document corresponding to the respective candidate document identifier from a data source, wherein the document has a structure type;
converting the document into a node stream, wherein the document conversion is initiated immediately after retrieving a portion of the document;
generating a customized data model for the document using the node stream in accordance with the structure type of the document;
identifying one or more candidate chunks within the customized data model in accordance with a set of heuristic rules associated with the structure type; and
selecting one or more chunks of the candidate chunks that satisfy the one or more search keywords; and
providing at least one of the selected one or more chunks for display to the user.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20, 21, 22
23. A non-transitory computer readable storage medium having stored therein instructions, which when executed by a computer system cause the computer system to:
receive one or more search keywords from a user;
select a plurality of candidate document identifiers in accordance with the one or more search keywords, each candidate document identifier corresponding to a respective document at a respective data source;
for a respective candidate document identifier of the plurality of candidate document identifiers;
retrieve a document corresponding to the respective candidate document identifier from a data source, wherein the document has a structure type;
convert the document into a node stream, wherein the document conversion is initiated immediately after retrieving a predefined portion of the document;
generate a customized data model for the document using the node stream in accordance with the structure type of the document;
identify one or more candidate chunks within the customized data model in accordance with a set of heuristic rules associated with the structure type; and
select one or more chunks of the candidate chunks that satisfy the one or more search keywords; and
provide at least one of the selected one or more chunks for display to the user.
Dependent claims: 24, 25, 26, 27, 28, 29, 30, 31, 32, 33
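The independent claims recite converting a document into a node stream as soon as a portion of it has been retrieved, then selecting chunks that satisfy the search keywords. The sketch below shows one way this could look for an HTML-like document, using the incremental `feed()` method of Python's standard `html.parser.HTMLParser`. The node representation, the rule that each `<p>` element is one candidate chunk, and the reading of "satisfy" as "contains every keyword, case-insensitively" are all assumptions made for illustration and are not taken from the patent.

```python
from html.parser import HTMLParser
from typing import Iterable, Iterator, List, Tuple

Node = Tuple[str, str]  # (kind, value), where kind is "start", "end", or "text"


class _NodeCollector(HTMLParser):
    """Collects parse events as nodes until the caller drains them."""

    def __init__(self) -> None:
        super().__init__()
        self.pending: List[Node] = []

    def handle_starttag(self, tag, attrs):
        self.pending.append(("start", tag))

    def handle_endtag(self, tag):
        self.pending.append(("end", tag))

    def handle_data(self, data):
        if data.strip():
            self.pending.append(("text", data.strip()))


def node_stream(portions: Iterable[str]) -> Iterator[Node]:
    """Yield nodes as each portion arrives, without waiting for the whole document.

    Text that straddles two portions may arrive as separate text nodes.
    """
    parser = _NodeCollector()
    for portion in portions:
        parser.feed(portion)       # parse only what has been retrieved so far
        yield from parser.pending  # hand nodes downstream immediately
        parser.pending.clear()
    parser.close()                 # flush anything the parser was still buffering
    yield from parser.pending
    parser.pending.clear()


def paragraph_chunks(nodes: Iterable[Node]) -> Iterator[str]:
    """Assumed heuristic: the text inside each <p> element is one candidate chunk."""
    buffer: List[str] = []
    inside = False
    for kind, value in nodes:
        if kind == "start" and value == "p":
            inside, buffer = True, []
        elif kind == "end" and value == "p" and inside:
            inside = False
            if buffer:
                yield " ".join(buffer)
        elif kind == "text" and inside:
            buffer.append(value)


def select_chunks(chunks: Iterable[str], keywords: List[str]) -> List[str]:
    """Assumed matching rule: keep chunks that contain every keyword."""
    wanted = [k.lower() for k in keywords]
    return [c for c in chunks if all(k in c.lower() for k in wanted)]
```

Because `node_stream` is a generator, downstream steps can start working on the first nodes while later portions of the document are still being retrieved.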
Specification