Automated search
First Claim
1. A computing device comprising:
one or more processors; and
a non-transitory, computer-readable medium storing programming that is executable by the one or more processors, the programming comprising instructions to:
receive an input data set comprising a document;
determine at least one focus in the input data set, wherein the focus is at least one of a grammatical part of speech or a functional descriptor, and wherein the focus is a portion of the input data set less than the input data set;
form a term unit matrix from the input data set, the term unit matrix comprising a plurality of term units represented as a plurality of numeric integer values, wherein the term unit matrix is a substantially canonical representation of the input data set;
filter the plurality of term units by removing one or more term units from the plurality of term units based on the focus;
for term units that remain after filtering, form a group of remaining term units based on an underlying grammatical rule of the input data set, wherein for each term unit of the group of remaining term units, the underlying grammatical rule is numerically encoded in respective numeric integer values of the remaining term units;
identify at least one root term unit of the group of remaining term units, the at least one root term unit having a plurality of tail term units associated therewith;
search a data repository that is different from the input data set using the at least one root term unit and the plurality of tail term units;
organize search results based on the focus indicating presence of the at least one root term unit; and
display the organized search results.
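The steps recited in this claim can be illustrated with a minimal sketch. Nothing here is taken from the specification: the part-of-speech codes, the `vocab_id * 10 + pos_code` integer packing, the tiny `LEXICON`, the "adjectives attach to the next noun" grouping rule, and the scoring are all hypothetical choices made only to show how a focus, integer-encoded term units, root and tail term units, and a repository search might fit together.

```python
# Hypothetical part-of-speech codes; the claim does not prescribe a scheme.
POS = {"DET": 0, "ADJ": 1, "NOUN": 2, "VERB": 3}

# Hypothetical toy lexicon standing in for a real part-of-speech tagger.
LEXICON = {
    "the": "DET", "thin": "ADJ", "solar": "ADJ",
    "panel": "NOUN", "converts": "VERB", "light": "NOUN",
}


def encode(tokens):
    """Encode each token as one integer: vocab_id * 10 + pos_code."""
    vocab, units = [], []
    for tok in tokens:
        if tok not in vocab:
            vocab.append(tok)
        units.append(vocab.index(tok) * 10 + POS[LEXICON[tok]])
    return units, vocab


def pos_of(unit):
    """Recover the grammatical code stored in the low digit."""
    return unit % 10


def filter_by_focus(units, focus, modifiers=("ADJ",)):
    """Drop term units that are neither the focus nor one of its modifiers."""
    keep = {POS[focus]} | {POS[m] for m in modifiers}
    return [u for u in units if pos_of(u) in keep]


def group_roots(units, focus):
    """Assumed rule: modifiers attach, as tails, to the next focus term (the root)."""
    groups, tails = [], []
    for u in units:
        if pos_of(u) == POS[focus]:
            groups.append((u, tails))
            tails = []
        else:
            tails.append(u)
    return groups


def search(repository, root, tails):
    """Return documents containing the root, ordered by how many tails also match."""
    hits = []
    for doc in repository:
        words = doc.lower().split()
        if root in words:
            hits.append((1 + sum(t in words for t in tails), doc))
    hits.sort(key=lambda h: -h[0])
    return [doc for _, doc in hits]


tokens = "the thin solar panel converts light".split()
units, vocab = encode(tokens)
remaining = filter_by_focus(units, "NOUN")
root, tails = group_roots(remaining, "NOUN")[0]
root_word = vocab[root // 10]
tail_words = [vocab[t // 10] for t in tails]

repository = [
    "Solar panel installation guide",
    "A thin solar panel for roofs",
    "Light bulbs",
]
results = search(repository, root_word, tail_words)
```

In this toy run the focus is the noun, the root term unit decodes to "panel" with tails "thin" and "solar", and the repository document matching both tails is listed first.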
Abstract
Embodiments described herein automatically generate a list of searchable terms from any text set, such as text found in a repository of information. The list can then be used in a variety of applications, from providing search results, to analyzing data sets, to building text generation tools such as messaging and email.
49 Citations
20 Claims
11. A method comprising:
receiving, by a computing device, an input data set comprising a document;
determining, by the computing device, at least one focus in the input data set, wherein the focus is at least one of a grammatical part of speech or a functional descriptor, and the focus is a portion of the input data set less than the input data set;
forming, by the computing device, a term unit matrix from the input data set, the term unit matrix comprising a plurality of numeric integer values, the plurality of numeric integer values corresponding to a plurality of term units of the input data set, wherein the plurality of numeric integer values is a substantially lossless representation of the input data set;
filtering, by the computing device, the plurality of term units by removing one or more term units from the plurality of term units based on the focus;
forming, by the computing device, a group of combinations of term units that remain after filtering and that are based on an underlying grammatical rule of the input data set, wherein for each term unit of the group of combinations of term units, the underlying grammatical rule is numerically encoded in respective numeric integer values of the group of combinations of term units;
identifying, by the computing device, at least one root term unit of the group of combinations of term units that remain after filtering, the at least one root term unit having a plurality of tail term units associated therewith;
searching, by the computing device, a data repository that is different from the input data set using the at least one root term unit and the plurality of tail term units;
organizing, by the computing device, search results based on the focus indicating presence of the at least one root term unit; and
providing, by the computing device, the organized search results.
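The phrase "substantially lossless representation" in claim 11 can be made concrete with a small round-trip example. This is only a sketch under stated assumptions: the row layout `[position, term_id, kind_code]`, the `KIND` codes, and the tokenizing regular expression are hypothetical and are not drawn from the patent. The point shown is that the text can be rebuilt exactly from the integer rows plus the vocabulary table.

```python
import re

# Assumed tokenizer: words, single punctuation marks, and whitespace runs.
# Every character falls into exactly one alternative, so no text is dropped.
TOKEN = re.compile(r"\w+|[^\w\s]|\s+")

# Hypothetical integer codes for the kind of each term unit.
KIND = {"word": 0, "punct": 1, "space": 2}


def kind_of(tok):
    """Classify a token produced by TOKEN."""
    if tok.isspace():
        return "space"
    if re.fullmatch(r"\w+", tok):
        return "word"
    return "punct"


def to_matrix(text):
    """Build rows of integers [position, term_id, kind_code] plus the vocabulary."""
    vocab = {}
    rows = []
    for pos, tok in enumerate(TOKEN.findall(text)):
        term_id = vocab.setdefault(tok, len(vocab))
        rows.append([pos, term_id, KIND[kind_of(tok)]])
    return rows, vocab


def from_matrix(rows, vocab):
    """Rebuild the original text from the integer rows and the vocabulary."""
    by_id = {term_id: tok for tok, term_id in vocab.items()}
    return "".join(by_id[row[1]] for row in sorted(rows))


text = "The panel converts light; the panel is thin."
rows, vocab = to_matrix(text)
restored = from_matrix(rows, vocab)
```

Here `restored` equals `text` character for character, and both occurrences of "panel" share one `term_id`, while "The" and "the" keep distinct identifiers so that case is preserved.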
20. A method comprising:
receiving, by a computing device, an input data set comprising human language text;
determining, by the computing device, a focus of the input data set, wherein the focus is at least one of a grammatical part of speech or a functional descriptor;
forming, by the computing device, a term unit matrix from the input data set, the term unit matrix comprising a plurality of term units of the input data set, wherein the term unit matrix is represented as a plurality of numeric integer values, and the plurality of numeric integer values is a substantially lossless representation of the input data set;
filtering, by the computing device, the plurality of term units by removing one or more term units from the plurality of term units based on the focus and by using a topical filter based on a plurality of topics;
forming, by the computing device, a group of combinations of term units that remain after filtering that are based on an underlying grammatical rule of the input data set, the underlying grammatical rule based on a human language represented in at least a portion of the input data set;
identifying, by the computing device, at least one root term unit of the group of combinations of term units that remain after filtering, the at least one root term unit having a plurality of associated term units;
searching, by the computing device, a data repository using the at least one root term unit and the plurality of associated term units;
organizing, by the computing device, search results based on the focus indicating presence of the at least one root term unit; and
displaying, by the computing device, the organized search results on a human-machine interface, wherein an ontology of organization comprises a visual spectrum indicating relevance.
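Claim 20 adds two elements to the method: a topical filter and a display in which a visual spectrum indicates relevance. The sketch below shows one way these might look. It is an assumption throughout: the `TOPICS` table, the function names, and the red-to-green color mapping are hypothetical, and the claim does not say how relevance is computed or which colors are used.

```python
import colorsys

# Hypothetical topic table mapping each topic to the terms it admits.
TOPICS = {
    "energy": {"solar", "panel", "light"},
    "cooking": {"oven", "recipe"},
}


def topical_filter(terms, topics):
    """Keep only terms associated with at least one of the selected topics."""
    allowed = set().union(*(TOPICS[t] for t in topics))
    return [t for t in terms if t in allowed]


def spectrum_color(relevance):
    """Map relevance in [0, 1] to a hex color from red (low) to green (high)."""
    r = min(max(relevance, 0.0), 1.0)
    # Hue 0 is red and hue 1/3 is green in the HSV color model.
    red, green, blue = colorsys.hsv_to_rgb(r / 3.0, 1.0, 1.0)
    return "#{:02x}{:02x}{:02x}".format(
        round(red * 255), round(green * 255), round(blue * 255)
    )


kept = topical_filter(["solar", "oven", "panel", "the"], ["energy"])
low = spectrum_color(0.0)
high = spectrum_color(1.0)
```

A user interface could then color each organized search result with `spectrum_color(score)`, so that more relevant results sit closer to the green end of the spectrum.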
Specification