Systems and methods for searching and indexing documents comprising chemical information
Abstract
Described herein are systems and methods that efficiently search for documents related to chemical structures of interest to a user. In certain embodiments, text data and chemical structure data provided in a user query are simultaneously searched with a text-based search method to efficiently produce search results. Subsequent structure-based searching on the results of the text-based search produces precise results for a particular user query. This approach increases the speed of the structure-based search by reducing the amount of data the structure-based search searches over. Additionally described herein are systems and methods for indexing document data in order to facilitate this efficient searching.
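The two-stage flow described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `Doc` record, the `EL…` tag strings, and the bond-counting `structure_match` function are hypothetical stand-ins, and a real structure-based search would use a proper substructure matcher. The point is only that the cheap text-based stage shrinks the candidate set before the more expensive structure-based stage runs.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    text: str
    tags: frozenset  # string tags assumed to have been generated at indexing time
    bonds: tuple     # one (element, element) pair per bond; hypothetical layout


def text_prefilter(docs, query_terms, query_tags):
    """Stage 1: keep documents whose tokens contain every query term and tag."""
    hits = []
    for doc in docs:
        tokens = set(doc.text.lower().split()) | doc.tags
        if query_terms <= tokens and query_tags <= doc.tags:
            hits.append(doc)
    return hits


def bond_multiset(bonds):
    """Count bonds by unordered element pair, so (O, C) and (C, O) coincide."""
    return Counter(tuple(sorted(b)) for b in bonds)


def structure_match(query_bonds, doc_bonds):
    """Stage 2 stand-in: each query bond type occurs at least as often in the doc."""
    need, have = bond_multiset(query_bonds), bond_multiset(doc_bonds)
    return all(have[pair] >= count for pair, count in need.items())


def search(docs, query_terms, query_tags, query_bonds):
    """Run the text stage, then the structure stage on the survivors only."""
    candidates = text_prefilter(docs, query_terms, query_tags)
    results = [d.doc_id for d in candidates if structure_match(query_bonds, d.bonds)]
    return results, len(candidates)


DOCS = [
    Doc("d1", "synthesis of ethanol", frozenset({"ELC", "ELO"}),
        (("C", "C"), ("C", "O"))),
    Doc("d2", "synthesis of dimethyl ether", frozenset({"ELC", "ELO"}),
        (("C", "O"), ("C", "O"))),
    Doc("d3", "synthesis of ethane", frozenset({"ELC"}),
        (("C", "C"),)),
]

results, n_candidates = search(
    DOCS, {"synthesis"}, {"ELC", "ELO"}, (("C", "C"), ("O", "C"))
)
```

In this toy data set the text stage passes two of the three documents to the structure stage, which then keeps only `d1`.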
21 Claims
1. A method for searching a set of documents comprising chemical information, the method comprising:
(a) receiving, by a processor of a computing device, a user query comprising user-input chemical structure data and text data, wherein the text data and the user-input chemical structure data correspond to at least one chemical structure;
(b) identifying, by the processor, bit-screening data and connection data from the user-input chemical structure data, wherein the bit-screening data correspond to one or more constituent elements of the at least one chemical structure, and the connection data correspond to one or more connections between a plurality of the one or more constituent elements;
(c) augmenting, by the processor, the user query by generating one or more string tags based on at least a portion of the bit-screening data, such that the augmented user query comprises the one or more string tags, wherein the one or more string tags comprise a sequence of alphanumeric characters for describing the at least one chemical structure;
(d) querying, using a text-based search method, by the processor, a database comprising document data corresponding to the set of documents, wherein the set of documents comprises at least one graphical representation of the at least one chemical structure, the document data comprises at least one string tag that is generated based on the at least one graphical representation of the at least one chemical structure in the set of documents, and the querying of the database comprises correlating at least a portion of the one or more string tags of the augmented user query with the at least one string tag of the document data to generate one or more text-based search results; and
(e) outputting, by the processor, the one or more text-based search results.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
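Illustrative sketch (not part of the claims): steps (b) through (d) of claim 1 might look roughly like the following. The screen layout `SCREENED_ELEMENTS`, the `BS` tag prefix, and the token-set index are assumptions made for illustration; the claim does not specify any of them. The sketch also requires every query token to match, which is stricter than the claim's "at least a portion".

```python
# Hypothetical screen layout: one bit per element of interest.
SCREENED_ELEMENTS = ("C", "N", "O", "S", "Cl")


def bit_screen(atoms):
    """Step (b): set bit i to 1 if the structure contains SCREENED_ELEMENTS[i]."""
    present = set(atoms)
    return [1 if el in present else 0 for el in SCREENED_ELEMENTS]


def string_tags(bits):
    """Step (c): turn each set bit into an alphanumeric tag such as 'BSC'."""
    return ["BS" + el for el, bit in zip(SCREENED_ELEMENTS, bits) if bit]


def augment_query(text, atoms):
    """Step (c): the augmented query is the text terms plus the string tags."""
    return text.lower().split() + string_tags(bit_screen(atoms))


def text_search(index, tokens):
    """Step (d): plain token matching; `index` maps doc_id -> set of tokens."""
    wanted = set(tokens)
    return sorted(doc_id for doc_id, toks in index.items() if wanted <= toks)


# Acetamide-like atom list as the user-drawn structure (hydrogens omitted).
query_tokens = augment_query("amide solvent", ["C", "C", "O", "N"])

# Document tokens: lowercased words plus tags assumed to be generated at indexing.
index = {
    "doc1": {"amide", "solvent", "screening", "BSC", "BSN", "BSO"},
    "doc2": {"amide", "solvent", "BSC", "BSO"},
}
hits = text_search(index, query_tokens)
```

Because text terms are lowercased and the tags keep an uppercase prefix, a tag cannot collide with an ordinary word in this sketch, so a single text index can hold both.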
13. A method for searching a set of documents comprising chemical information, the method comprising:
(a) receiving, by a processor of a computing device, a user query comprising user-input chemical structure data, wherein the user-input chemical structure data correspond to at least one chemical structure;
(b) identifying, by the processor, bit-screening data and connection data from the user-input chemical structure data, wherein the bit-screening data correspond to one or more constituent elements of the at least one chemical structure, and the connection data correspond to one or more connections between a plurality of the one or more constituent elements;
(c) augmenting, by the processor, the user query by generating one or more string tags based on at least a portion of the bit-screening data and, optionally, generating one or more encoded strings based on at least a portion of the connection data, such that the augmented user query comprises the one or more string tags, wherein the one or more string tags comprise a sequence of alphanumeric characters for describing the at least one chemical structure;
(d) querying, using a text-based search method, by the processor, a database comprising document data corresponding to the set of documents, wherein the set of documents comprises at least one graphical representation of the at least one chemical structure, the document data corresponding to the set of documents and comprising at least one string tag generated based on the at least one graphical representation of the at least one chemical structure in the set of documents, and the querying comprises correlating at least a portion of the one or more string tags of the augmented user query with the at least one string tag of the document data to generate one or more text-based search results; and
(e) outputting, by the processor, the one or more text-based search results.
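Illustrative sketch (not part of the claims): the optional encoded strings of step (c) in claim 13 are derived from connection data. One conceivable encoding, assumed here purely for illustration, writes each bond as a prefix, the two element symbols in sorted order, and the bond order between them. The `CX` prefix and the `(element, element, order)` bond tuples are hypothetical.

```python
def encode_connection(a, b, order=1):
    """Encode one bond as an alphanumeric string, independent of atom order."""
    lo, hi = sorted((a, b))
    return f"CX{lo}{order}{hi}"


def encoded_strings(bonds):
    """Return the distinct encoded strings for a list of (a, b, order) bonds."""
    return sorted({encode_connection(a, b, order) for a, b, order in bonds})


# Heavy-atom bonds of an acetic-acid-like structure (hypothetical input format).
bonds = [("C", "C", 1), ("C", "O", 2), ("O", "C", 1)]
tags = encoded_strings(bonds)
```

Sorting the element symbols means that a bond drawn from oxygen to carbon and one drawn from carbon to oxygen produce the same string, so the encoding does not depend on drawing direction.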
14. A method for text-based searching a set of indexed documents comprising chemical information, the method comprising the steps of:
(a) receiving, by a processor of a computing device, a user query comprising text data, wherein the text data comprise a sequence of alphanumeric characters that describe at least one chemical structure;
(b) querying, using a text-based search method, by the processor, a database comprising document data corresponding to the set of indexed documents, the document data having been augmented to include one or more index string tags, wherein the set of indexed documents comprises at least one graphical representation of the at least one chemical structure, the one or more index string tags comprise at least one index string tag that is generated based on the at least one graphical representation of the at least one chemical structure in the set of indexed documents, the at least one index string tag comprises a sequence of alphanumeric characters for describing the at least one graphical representation of the at least one chemical structure, and the querying comprises correlating at least a portion of the text data of the user query with the one or more index string tags to generate one or more text-based search results; and
(c) outputting, by the processor, the one or more text-based search results.
Dependent claims: 15, 16, 17
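Illustrative sketch (not part of the claims): claim 14 presupposes document data that has already been augmented with index string tags. A minimal way to picture this is an inverted index built at indexing time. The sketch assumes that some upstream recognizer has already converted each graphical representation into a list of atom symbols; that recognizer, the `BS` tag prefix, and the input layout are all hypothetical.

```python
from collections import defaultdict


def index_tags(atoms):
    """One alphanumeric index string tag per distinct element in a structure."""
    return {"BS" + el for el in set(atoms)}


def build_index(documents):
    """documents: doc_id -> (text, list of atom lists recognized from figures)."""
    inverted = defaultdict(set)
    for doc_id, (text, structures) in documents.items():
        tokens = set(text.lower().split())
        for atoms in structures:
            tokens |= index_tags(atoms)  # augment the document data with tags
        for tok in tokens:
            inverted[tok].add(doc_id)
    return inverted


def query(inverted, text_data):
    """Return ids of documents containing every whitespace-separated token.

    Words must be given in lowercase; tags are matched case-sensitively.
    """
    tokens = text_data.split()
    if not tokens:
        return []
    hits = set.intersection(*(inverted.get(t, set()) for t in tokens))
    return sorted(hits)


documents = {
    "p1": ("Preparation of amides", [["C", "C", "O", "N"]]),
    "p2": ("Preparation of ethers", [["C", "O", "C"]]),
}
inverted = build_index(documents)
```

With this index, `query(inverted, "BSN BSO")` finds only `p1`, while `query(inverted, "preparation BSO")` finds both documents, using nothing but token lookups.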
18. A system for searching a set of documents comprising chemical information, the system comprising:
a processor; and
a non-transitory computer readable medium having instructions stored thereon, wherein the instructions, when executed by the processor, cause the processor to:
(a) receive, by the processor, a user query comprising user-input chemical structure data and text data, wherein the text data and the user-input chemical structure data correspond to at least one chemical structure;
(b) identify, by the processor, bit-screening data and connection data from the user-input chemical structure data, wherein the bit-screening data correspond to one or more constituent elements of the at least one chemical structure, and the connection data correspond to one or more connections between a plurality of the one or more constituent elements;
(c) augment, by the processor, the user query by generating one or more string tags based on at least a portion of the bit-screening data, such that the augmented user query comprises the one or more string tags, wherein the one or more string tags comprise a sequence of alphanumeric characters for describing the at least one chemical structure;
(d) query, using a text-based search method, by the processor, a database comprising document data corresponding to the set of documents, wherein the set of documents comprises at least one graphical representation of the at least one chemical structure, the document data corresponding to the set of documents and comprising at least one string tag that is generated based on the at least one graphical representation of the at least one chemical structure in the set of documents, and the querying comprises correlating at least a portion of the one or more string tags of the augmented user query with the at least one string tag of the document data to generate one or more text-based search results; and
(e) output, by the processor, the one or more text-based search results.
Dependent claims: 19
20. A system for searching a set of documents comprising chemical information, the system comprising:
a processor; and
a non-transitory computer readable medium having instructions stored thereon, wherein the instructions, when executed by the processor, cause the processor to:
(a) receive, by the processor, a user query comprising user-input chemical structure data, wherein the user-input chemical structure data correspond to at least one chemical structure;
(b) identify, by the processor, bit-screening data and connection data from the user-input chemical structure data, wherein the bit-screening data correspond to one or more constituent elements of the at least one chemical structure, and the connection data correspond to one or more connections between a plurality of the one or more constituent elements;
(c) augment, by the processor, the user query by generating one or more string tags based on at least a portion of the bit-screening data and, generating one or more encoded strings based on at least a portion of the connection data, such that the augmented user query comprises the one or more string tags, wherein the one or more string tags comprise a sequence of alphanumeric characters for describing the at least one chemical structure;
(d) query, using a text-based search method, by the processor, a database comprising document data corresponding to the set of documents, wherein the set of documents comprises at least one graphical representation of the at least one chemical structure, the document data corresponding to the set of documents and comprising at least one string that is generated based on the at least one graphical representation of the at least one chemical structure in the set of documents, and the querying comprises correlating at least a portion of the augmented user query with the at least one string tag of the document data to generate one or more text-based search results; and
(e) output, by the processor, the one or more text-based search results.
Dependent claims: 21
Specification