SEMANTIC CONTENT SEARCHING
First Claim
1. A computer-based method for document searching by semantic content, comprising:
receiving an end user selection of a desired first portion of an initial document from a database comprising potential target documents, the initial document comprising metadata labels that describe attributes of components of the initial document, the selected first portion comprising components of the initial document that have desired semantic content;
running the initial document, comprising the selected first portion, through one or more trained classifiers, using a computer-based processor, to identify a first potential target document from the database comprising a second portion that has a same semantic content as the first portion; and
if the second portion does not have the same semantic content as the first portion:
receiving an end user selection of a third portion of the first potential target document that comprises the same semantic content as the first portion; and
running the first potential target document, comprising the selected third portion, through the one or more trained classifiers to identify a second potential target document from the database comprising a fourth portion that has the same semantic content as the third portion.
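The two rounds recited in the claim (match a selected portion, then re-run from a user-corrected portion of the returned document) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the claim calls for trained classifiers, whereas this sketch substitutes a plain Jaccard word-overlap score, and the names `tokens`, `jaccard`, `best_match`, and the toy `database` are hypothetical.

```python
import re

def tokens(text):
    """Lowercased word set; a crude stand-in for semantic features."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Overlap between two token sets, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def best_match(selected_portion, database, exclude=()):
    """Return (doc_id, portion_index, score) for the most similar portion.

    Stand-in for "running the document through trained classifiers":
    here the score is plain word overlap, not a learned model.
    """
    query = tokens(selected_portion)
    best = (None, None, -1.0)
    for doc_id, portions in database.items():
        if doc_id in exclude:
            continue
        for i, portion in enumerate(portions):
            score = jaccard(query, tokens(portion))
            if score > best[2]:
                best = (doc_id, i, score)
    return best

# Hypothetical database: each document is a list of portions.
database = {
    "doc_a": ["The tenant shall pay rent monthly.",
              "Either party may terminate with notice."],
    "doc_b": ["Payment of rent is due each month by the tenant.",
              "This lease is governed by state law."],
    "doc_c": ["The recipe calls for two eggs.",
              "Bake for thirty minutes."],
}

# Round 1: the user selects a first portion of the initial document.
first_portion = database["doc_a"][0]
doc_id, idx, score = best_match(first_portion, database, exclude={"doc_a"})

# Round 2: the user selects a (third) portion of the returned document,
# and the search is re-run from that document.
third_portion = database[doc_id][0]
next_doc, next_idx, _ = best_match(third_portion, database,
                                   exclude={"doc_a", doc_id})
```

In this sketch the second round simply excludes documents already seen; how a real implementation would track visited documents is not specified in the claim.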
Abstract
One or more techniques and/or systems are disclosed that provide for document retrieval where a user can identify key attributes of potential target documents that are desirable (e.g., have a particular semantic content for the user). Further, relevant documents that comprise the desired semantic content can be retrieved. Additionally, the user can provide feedback on the retrieved documents, for example, based on key semantic concepts found in the documents, and the input can be used to update the classification. For example, this process can be iterated to improve the retrieval and precision of documents found through machine learning techniques.
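The feedback loop described in the abstract, where user judgments on retrieved documents update the classification, might look roughly like the following. This is a sketch under stated assumptions: the abstract does not name a learning algorithm, so a perceptron-style update over word features is used purely as an example, and `FeedbackClassifier`, `features`, and the sample feedback are hypothetical.

```python
import re
from collections import defaultdict

def features(text):
    """Lowercased word set used as features (an illustrative choice)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

class FeedbackClassifier:
    """Perceptron-style scorer over word features, updated from user feedback."""

    def __init__(self):
        self.weights = defaultdict(float)

    def score(self, text):
        return sum(self.weights[w] for w in features(text))

    def predict(self, text):
        """True if the text is predicted to carry the desired content."""
        return self.score(text) > 0

    def update(self, text, relevant):
        # Adjust weights only when the prediction disagrees with the user.
        if self.predict(text) != relevant:
            step = 1.0 if relevant else -1.0
            for w in features(text):
                self.weights[w] += step

# Hypothetical user feedback: (retrieved text, user says it is relevant).
feedback = [
    ("tenant shall pay rent monthly", True),
    ("the tenant baked a cake", False),
    ("rent is due from the tenant", True),
]

clf = FeedbackClassifier()
for _ in range(3):  # iterate over the feedback a few times
    for text, relevant in feedback:
        clf.update(text, relevant)

is_relevant = clf.predict("rent due monthly")
is_irrelevant = clf.predict("a cake was baked")
```

Iterating over the feedback several times mirrors the abstract's point that the process can be repeated to improve precision; on this toy data the weights stop changing after the second pass.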
201 Citations
20 Claims
1. A computer-based method for document searching by semantic content, comprising:
receiving an end user selection of a desired first portion of an initial document from a database comprising potential target documents, the initial document comprising metadata labels that describe attributes of components of the initial document, the selected first portion comprising components of the initial document that have desired semantic content;
running the initial document, comprising the selected first portion, through one or more trained classifiers, using a computer-based processor, to identify a first potential target document from the database comprising a second portion that has a same semantic content as the first portion; and
if the second portion does not have the same semantic content as the first portion:
receiving an end user selection of a third portion of the first potential target document that comprises the same semantic content as the first portion; and
running the first potential target document, comprising the selected third portion, through the one or more trained classifiers to identify a second potential target document from the database comprising a fourth portion that has the same semantic content as the third portion.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
14. A system for document searching by semantic content, comprising:
a memory component configured to store a database comprising a plurality of potential target documents;
a processor component operably coupled with the memory component and configured to execute instructions for one or more classifiers;
an end-user input receiving component configured to receive end-user input for a document, the input comprising one or more of:
  end user selection of a desired portion of a first document from the database, the selected portion comprising document components of the initial document that have desired semantic content;
  end user indication that a second document retrieved from the database comprises a same semantic content as the selected desired portion of the first document; and
  end user indication that a second document retrieved from the database does not comprise a same semantic content as the selected desired portion of the first document;
one or more classifier components operably coupled with the processor component and memory component, and configured to identify a second document from the database comprising a target portion that has a same semantic content as the selected desired portion of the first document; and
a classification updating component operably coupled with the end-user input receiving component and configured to utilize the end-user input to update the one or more classifier components to identify desired semantic content.
Dependent claims: 15, 16, 17, 18, 19
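One way the components of claim 14 could be wired together is sketched below. This is a structural illustration only: the class and function names (`InputKind`, `EndUserInput`, `SearchSystem`, `word_overlap`, `length_similarity`) are hypothetical, an in-memory dictionary stands in for the memory component, and the update rule (scaling a per-classifier weight up or down depending on whether that classifier agreed with the user) is an assumed placeholder, not a rule taken from the claim.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable, Dict, List

class InputKind(Enum):
    SELECT_PORTION = auto()  # user selects a desired portion of a first document
    CONFIRM_MATCH = auto()   # retrieved document has the same semantic content
    REJECT_MATCH = auto()    # retrieved document does not

@dataclass
class EndUserInput:
    kind: InputKind
    doc_id: str
    portion: str = ""

def word_overlap(query: str, doc: str) -> float:
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def length_similarity(query: str, doc: str) -> float:
    a, b = len(query.split()), len(doc.split())
    return min(a, b) / max(a, b) if max(a, b) else 0.0

@dataclass
class SearchSystem:
    database: Dict[str, str]                         # stands in for the memory component
    classifiers: List[Callable[[str, str], float]]   # classifier components
    weights: List[float] = field(default_factory=list)
    query: str = ""

    def __post_init__(self):
        if not self.weights:
            self.weights = [1.0] * len(self.classifiers)

    def retrieve(self, exclude=()):
        """Return the id of the document with the highest weighted score."""
        def total(doc_id):
            text = self.database[doc_id]
            return sum(w * c(self.query, text)
                       for w, c in zip(self.weights, self.classifiers))
        return max((d for d in self.database if d not in exclude), key=total)

    def receive(self, user_input: EndUserInput):
        """Input receiving component plus an assumed updating rule."""
        if user_input.kind is InputKind.SELECT_PORTION:
            self.query = user_input.portion
            return
        confirmed = user_input.kind is InputKind.CONFIRM_MATCH
        text = self.database[user_input.doc_id]
        for i, clf in enumerate(self.classifiers):
            agreed = (clf(self.query, text) >= 0.5) == confirmed
            self.weights[i] *= 1.5 if agreed else 0.5

system = SearchSystem(
    database={
        "lease": "rent is owed by the tenant every month and late fees "
                 "apply after the fifth day",
        "recipe": "bake cake thirty minutes",
    },
    classifiers=[word_overlap, length_similarity],
)
system.receive(EndUserInput(InputKind.SELECT_PORTION, "query_doc",
                            "tenant shall pay rent"))
first = system.retrieve()                                  # before feedback
system.receive(EndUserInput(InputKind.REJECT_MATCH, first))
second = system.retrieve()                                 # after feedback
```

In this toy run the length-based classifier initially dominates and an unrelated document is returned; after the user rejects it, that classifier is down-weighted and the next retrieval favors the word-overlap classifier.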
20. A method for document searching by semantic content, comprising:
populating a database with potential target documents by performing a keyword search on a collection of documents;
receiving an end user selection of a desired first portion of an initial document from the database comprising potential target documents, the initial document comprising metadata labels that describe attributes of components of the initial document, the selected first portion comprising components of the initial document that have desired semantic content;
running the initial document, comprising the selected first portion, through one or more trained classifiers, using a computer-based processor, to identify a first potential target document from the database comprising a second portion that has a same semantic content as the first portion;
if the second portion does not have the same semantic content as the first portion:
  receiving an end user selection of a third portion of the first potential target document that comprises the same semantic content as the first portion; and
  running the first potential target document, comprising the selected third portion, through the one or more trained classifiers to identify a second potential target document from the database comprising a fourth portion that has the same semantic content as the third portion;
if the second portion has the same semantic content as the first portion:
  receiving an end user indication that the second portion is a correct selection; and
  running the one or more classifiers over the database to select a third potential target document;
if the second portion does not have the same semantic content as the first portion and the first potential target document does not comprise content that has the same semantic content of the first portion:
  receiving an end user indication that the first potential target document does not comprise content that has the same semantic content of the first portion; and
  running the one or more classifiers over the database to select a third potential target document;
utilizing user input for potential target documents returned by the one or more classifiers to update the one or more classifiers; and
running the plurality of documents through the one or more classifiers until a desired threshold is reached comprising one of:
  running the plurality of documents through the one or more classifiers until a desired document selection accuracy is reached for desired semantic content;
  running the plurality of documents through the one or more classifiers until a desired number of correct documents are retrieved without an incorrect document being retrieved; and
  running the plurality of documents through the one or more classifiers until a desired number of documents have been retrieved from the database;
identifying which of the one or more classifiers has a desired accuracy rate for retrieving potential target documents for a desired semantic content; and
utilizing the identified classifier to retrieve documents from the database for the desired semantic content.
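The three stopping thresholds and the final classifier-selection step of claim 20 can be sketched as simple checks over a record of user judgments. The sketch below makes several assumptions: each classifier's history is modeled as a list of booleans (True when the user marked a retrieved document correct), "a desired number of correct documents ... without an incorrect document" is read as a trailing run of correct retrievals, and the names `accuracy`, `correct_streak`, `threshold_reached`, and the sample histories are hypothetical.

```python
def accuracy(history):
    """Fraction of retrievals the user marked correct."""
    return sum(history) / len(history) if history else 0.0

def correct_streak(history):
    """Number of most recent retrievals marked correct with no miss between."""
    streak = 0
    for correct in reversed(history):
        if not correct:
            break
        streak += 1
    return streak

def threshold_reached(history, min_accuracy=None, min_streak=None,
                      max_retrieved=None):
    """True if any one of the configured stopping conditions holds."""
    if min_accuracy is not None and history and accuracy(history) >= min_accuracy:
        return True
    if min_streak is not None and correct_streak(history) >= min_streak:
        return True
    if max_retrieved is not None and len(history) >= max_retrieved:
        return True
    return False

# Hypothetical per-classifier feedback histories.
histories = {
    "clf_a": [True, False, True, True, True],
    "clf_b": [False, False, True, False],
}

# Identify the classifier with the best accuracy for this semantic content.
best_classifier = max(histories, key=lambda name: accuracy(histories[name]))

stop_on_streak = threshold_reached(histories["clf_a"], min_streak=3)
stop_on_accuracy = threshold_reached(histories["clf_a"], min_accuracy=0.9)
stop_on_count = threshold_reached(histories["clf_a"], max_retrieved=5)
```

Here `clf_a` would be the classifier used for later retrieval, since its accuracy (4 of 5) exceeds that of `clf_b` (1 of 4).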
Specification