Automated search

US 9,727,619 B1
Filed: 05/02/2014
Issued: 08/08/2017
Est. Priority Date: 05/02/2013
Status: Expired due to Fees

Abstract

Embodiments described herein are used to automatically generate a list of searchable terms from any text set, such as text found in a repository of information. The list can then be used in a variety of applications, from providing search results, to analyzing data sets, to building a variety of text generation tools, such as those for messaging and email.

49 Citations

20 Claims

1. A computing device comprising:
- one or more processors; and
  
  a non-transitory, computer-readable medium storing programming that is executable by the one or more processors, the programming comprising instructions to:
  
  receive an input data set comprising a document;
  
  determine at least one focus in the input data set, wherein the focus is at least one of a grammatical part of speech or a functional descriptor, and wherein the focus is a portion of the input data set less than the input data set;
  
  form a term unit matrix from the input data set, the term unit matrix comprising a plurality of term units represented as a plurality of numeric integer values, wherein the term unit matrix is a substantially canonical representation of the input data set;
  
  filter the plurality of term units by removing one or more term units from the plurality of term units based on the focus;
  
  for term units that remain after filtering, form a group of remaining term units based on an underlying grammatical rule of the input data set, wherein for each term unit of the group of remaining term units, the underlying grammatical rule is numerically encoded in respective numeric integer values of the remaining term units;
  
  identify at least one root term unit of the group of remaining term units, the at least one root term unit having a plurality of tail term units associated therewith;
  
  search a data repository that is different from the input data set using the at least one root term unit and the plurality of tail term units;
  
  organize search results based on the focus indicating presence of the at least one root term unit; and
  
  display the organized search results.
  - 2. The computing device of claim 1, wherein the input data set is a text set comprising at least one portion of at least one message, at least one portion of at least one document, at least one portion of at least one email, at least one portion of at least one file, or combinations thereof.
  - 3. The computing device of claim 1, further comprising programming instructions to determine the focus from the plurality of term units using at least one of a characteristic of the input data set or a user selection.
  - 4. The computing device of claim 1, wherein the computing device comprises at least one of a computer, a laptop computer, a personal computer, a personal data assistant, a camera, a phone, a cell phone, mobile phone, a computer server, a media server, a music player, a game box, a smart phone, a data storage device, a measuring device, a handheld scanner, a scanning device, a barcode reader, a point-of-sale device, a digital assistant, a desk phone, an IP phone, a solid-state memory device, a tablet, or a memory card.
  - 5. The computing device of claim 1, wherein filtering the term units comprises using a topical filter based on a plurality of topics of a data repository.
  - 6. The computing device of claim 1, wherein steps of receiving, determining, parsing, filtering, forming, identifying, searching, organizing, and displaying are performed automatically.
  - 7. The computing device of claim 1, wherein displaying comprises providing organized search results to a human-machine interface or a program application.
  - 8. The computing device of claim 1, wherein the group of remaining term units are constrained by grammatical function within the input data set.
  - 9. The computing device of claim 1, wherein the underlying grammatical rule of the input data set is based on a human language.
  - 10. The computing device of claim 1, wherein search results are organized by at least one of term unit frequency, term unit positive mentions, term unit sentiment analysis, or term unit grammatical weight.
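
The encoding, filtering, and grouping steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the claims do not specify an integer encoding, a tagger, or a grouping rule, so everything named here (`POS_CODES`, `LEXICON`, the `POS code * 1000 + vocabulary index` packing, and the "adjectives attach to the next noun" rule) is a hypothetical choice made only for illustration.

```python
from collections import defaultdict

# Hypothetical part-of-speech codes; the claims do not specify an encoding.
POS_CODES = {"NOUN": 1, "ADJ": 2, "VERB": 3, "OTHER": 9}

# Toy lexicon standing in for a real tagger (an assumption for illustration).
LEXICON = {
    "battery": "NOUN", "screen": "NOUN", "phone": "NOUN",
    "long": "ADJ", "bright": "ADJ", "cheap": "ADJ",
    "has": "VERB", "lasts": "VERB",
}


def build_term_units(text):
    """Encode each token as one integer: POS code * 1000 + vocabulary index.

    The packing only works for vocabularies under 1000 words, which is
    enough for a sketch.
    """
    vocab = {}
    units = []
    for token in text.lower().split():
        word = token.strip(".,;:!?")
        if not word:
            continue
        word_id = vocab.setdefault(word, len(vocab))
        pos = LEXICON.get(word, "OTHER")
        units.append(POS_CODES[pos] * 1000 + word_id)
    return units, vocab


def pos_of(unit):
    """Recover the part-of-speech code packed into a term unit."""
    return unit // 1000


def filter_by_focus(units, focus_codes):
    """Keep only the term units whose part of speech is in the focus."""
    return [u for u in units if pos_of(u) in focus_codes]


def group_roots_and_tails(units):
    """Treat each noun as a root and attach the adjectives before it as tails."""
    groups = defaultdict(list)
    pending = []
    for u in units:
        if pos_of(u) == POS_CODES["ADJ"]:
            pending.append(u)
        elif pos_of(u) == POS_CODES["NOUN"]:
            groups[u].extend(pending)
            pending = []
    return dict(groups)


units, vocab = build_term_units("The cheap phone has a long bright battery.")
kept = filter_by_focus(units, {POS_CODES["NOUN"], POS_CODES["ADJ"]})
print(group_roots_and_tails(kept))
```

Because the part of speech is packed into each integer, the grouping step can read the grammatical role with integer division alone, without going back to the text. That is one possible reading of a grammatical rule being "numerically encoded" in the term units.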

11. A method comprising:
- receiving, by a computing device, an input data set comprising a document;
  
  determining, by the computing device, at least one focus in the input data set, wherein the focus is at least one of a grammatical part of speech or a functional descriptor, and the focus is a portion of the input data set less than the input data set;
  
  forming, by the computing device, a term unit matrix from the input data set, the term unit matrix comprising a plurality of numeric integer values, the plurality of numeric integer values corresponding to a plurality of term units of the input data set, wherein the plurality of numeric integer values is a substantially lossless representation of the input data set;
  
  filtering, by the computing device, the plurality of term units by removing one or more term units from the plurality of term units based on the focus;
  
  forming, by the computing device, a group of combinations of term units that remain after filtering and that are based on an underlying grammatical rule of the input data set, wherein for each term unit of the group of combinations of term units, the underlying grammatical rule is numerically encoded in respective numeric integer values of the group of combinations of term units;
  
  identifying, by the computing device, at least one root term unit of the group of combinations of term units that remain after filtering, the at least one root term unit having a plurality of tail term units associated therewith;
  
  searching, by the computing device, a data repository that is different from the input data set using the at least one root term unit and the plurality of tail term units;
  
  organizing, by the computing device, search results based on the focus indicating presence of the at least one root term unit; and
  
  providing, by the computing device, the organized search results.
  - 12. The method of claim 11, wherein the input data set is a textual data set comprising at least a portion of at least one human language message, at least a portion of at least one human language document, at least a portion of at least one human language email, at least a portion of at least one file comprising human language text, or combinations thereof.
  - 13. The method of claim 12, wherein the group of combinations of term units that remain after filtering is constrained by grammatical function within the input data set, and the underlying grammatical rule of the input data set is based on a human language.
  - 14. The method of claim 13, wherein search results are organized by at least one of term unit frequency, term unit positive mentions, term unit sentiment analysis, or term unit grammatical weight.
  - 15. The method of claim 11, further comprising determining, by the computing device, the focus from the plurality of term units using at least one of a characteristic of the input data set or a user selection.
  - 16. The method of claim 15, wherein filtering the term units comprises using a topical filter based on a plurality of topics of a data repository.
  - 17. The method of claim 16, wherein steps of receiving, determining, parsing, filtering, forming, identifying, searching, organizing, and providing are performed substantially automatically.
  - 18. The method of claim 11, wherein providing comprises displaying organized search results to a human-machine interface.
  - 19. The method of claim 11, wherein the computing device comprises at least one of a computer, a laptop computer, a personal computer, a personal data assistant, a camera, a phone, a cell phone, mobile phone, a computer server, a media server, a music player, a game box, a smart phone, a data storage device, a measuring device, a handheld scanner, a scanning device, a barcode reader, a point-of-sale device, a digital assistant, a desk phone, an IP phone, a solid-state memory device, a tablet, or a memory card.
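
The searching and organizing steps of claim 11, with results ordered by term unit frequency as in claim 14, might look like the following Python sketch. The claims do not define how a match or a score is computed, so the repository layout, the `search_repository` function, and the "root count plus tail counts" score are assumptions for illustration only.

```python
from collections import Counter


def search_repository(repository, root, tails):
    """Return documents containing the root term, ordered by term frequency.

    repository: dict mapping a document id to its text (hypothetical layout).
    root: the root term; a document must contain it to be returned.
    tails: tail terms associated with the root; they only add to the score.
    """
    results = []
    for doc_id, text in repository.items():
        counts = Counter(w.strip(".,;:!?") for w in text.lower().split())
        if counts[root] == 0:
            continue  # organize around presence of the root term
        tail_hits = {t: counts[t] for t in tails if counts[t] > 0}
        results.append({
            "doc": doc_id,
            "root_count": counts[root],
            "tail_hits": tail_hits,
            "score": counts[root] + sum(tail_hits.values()),
        })
    # Highest frequency first; document id breaks ties deterministically.
    results.sort(key=lambda r: (-r["score"], r["doc"]))
    return results


repository = {
    "review-1": "The battery is long lasting. Battery life is great.",
    "review-2": "Bright screen, average battery.",
    "review-3": "The screen is bright.",
}
for hit in search_repository(repository, "battery", ["long", "bright"]):
    print(hit["doc"], hit["score"], hit["tail_hits"])
```

Here the repository is separate from the input data set that produced the root and tail terms, matching the claim's requirement that the searched repository differ from the input.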

20. A method comprising:
- receiving, by a computing device, an input data set comprising human language text;
  
  determining, by the computing device, a focus of the input data set, wherein the focus is at least one of a grammatical part of speech or a functional descriptor;
  
  forming, by the computing device, a term unit matrix from the input data set, the term unit matrix comprising a plurality of term units of the input data set, wherein the term unit matrix is represented as a plurality of numeric integer values, and the plurality of numeric integer values is a substantially lossless representation of the input data set;
  
  filtering, by the computing device, the plurality of term units by removing one or more term units from the plurality of term units based on the focus and by using a topical filter based on a plurality of topics;
  
  forming, by the computing device, a group of combinations of term units that remain after filtering that are based on an underlying grammatical rule of the input data set, the underlying grammatical rule based on a human language represented in at least a portion of the input data set;
  
  identifying, by the computing device, at least one root term unit of the group of combinations of term units that remain after filtering, the at least one root term unit having a plurality of associated term units;
  
  searching, by the computing device, a data repository using the at least one root term unit and the plurality of associated term units;
  
  organizing, by the computing device, search results based on the focus indicating presence of the at least one root term unit; and
  
  displaying, by the computing device, the organized search results on a human-machine interface, wherein an ontology of organization comprises a visual spectrum indicating relevance.
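
Claim 20 ends with a display in which "a visual spectrum" indicates relevance. The claim does not say which spectrum or how relevance is scaled, so the Python sketch below assumes a linear red-to-green ramp over the range of scores in the result set. The names `relevance_color` and `annotate_with_colors` are hypothetical.

```python
def relevance_color(score, low, high):
    """Map a score to a hex color: red at the lowest score, green at the highest."""
    if high == low:
        t = 1.0  # a single distinct score is shown as fully relevant
    else:
        t = (score - low) / (high - low)
    t = max(0.0, min(1.0, t))  # clamp scores that fall outside [low, high]
    red = round(255 * (1 - t))
    green = round(255 * t)
    return f"#{red:02x}{green:02x}00"


def annotate_with_colors(results):
    """Return copies of the results, each with a 'color' key added."""
    if not results:
        return []
    scores = [r["score"] for r in results]
    low, high = min(scores), max(scores)
    return [dict(r, color=relevance_color(r["score"], low, high)) for r in results]


ranked = [{"doc": "review-1", "score": 3}, {"doc": "review-2", "score": 1}]
for row in annotate_with_colors(ranked):
    print(row["doc"], row["color"])
```

Scaling against the minimum and maximum of the current result set keeps the full color range in use for every query. An absolute scale would be an equally valid reading of the claim.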

Current Assignee: Intelligent Language, LLC
Original Assignee: Intelligent Language, LLC
Inventors: Smyros, Athena Ann; Smyros, Constantine John
Primary Examiner(s): Spooner, Lamont
Application Number: US14/268,983
Time in Patent Office: 1,194 Days
Field of Search: 704/1, 704/9, 704/10, 707/706-711
CPC Class Codes:
G06F 16/24578   using ranking
G06F 40/205   Parsing
G06F 40/211   Syntactic parsing, e.g. bas...
G06F 40/253   Grammatical analysis; Style...
G06F 40/279   Recognition of textual enti...
G06F 40/284   Lexical analysis, e.g. toke...
G06F 40/289   Phrasal analysis, e.g. fini...
