Intelligent query system for automatically indexing information in a database and automatically categorizing users

US 5,974,412 A
Filed: 09/24/1997
Issued: 10/26/1999
Est. Priority Date: 09/24/1997
Status: Expired due to Term

5 Assignments

0 Petitions

Abstract

An Intelligent Query Engine (IQE) system automatically develops multiple information spaces in which different types of real-world objects (e.g., documents, users, products) can be represented. Machine learning techniques are used to facilitate the automated emergence of information spaces in which objects are represented as vectors of real numbers. The system then delivers information to users based upon similarity measures applied to the representations of the objects in these information spaces. The system simultaneously classifies documents, users, products, and other objects. Documents are managed by collators that act as classifiers of overlapping portions of the database of documents. Collators evolve to meet the demands for information delivery expressed by user feedback. Liaisons act on behalf of users to elicit information from the population of collators. This information is then presented to users when they log into the system via the Internet or another communication channel. Mites handle incoming documents from multiple information sources (e.g., in-house editorial staff, third-party news feeds, large databases, World Wide Web spiders) and feed documents to those collators that provide a good fit for the new documents.
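
The abstract's core retrieval idea, objects represented as real-valued vectors and delivered to users by a similarity measure, can be sketched as follows. This is a minimal illustration, not the patented IQE. It assumes cosine similarity as the measure (the abstract does not name one) and uses hypothetical names (`cosine`, `rank_by_similarity`) and toy 3-dimensional vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)


def rank_by_similarity(query_vec, objects, top_k=3):
    """Return (object_id, score) pairs for the top_k objects most similar to query_vec."""
    scored = [(oid, cosine(query_vec, vec)) for oid, vec in objects.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical shared space: documents and a user are points in the same 3-D space.
docs = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.7, 0.7, 0.0],
    "doc-c": [0.0, 0.0, 1.0],
}
user = [1.0, 0.2, 0.0]
print(rank_by_similarity(user, docs, top_k=2))
```

Because users and documents live in the same space, the same function could rank users against a document, which mirrors the abstract's point that one representation classifies several object types.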

694 Citations

48 Claims

1. An evolutionary system for identifying information, comprising:
- multiple information sets each representing a portion of the information;
  
  multiple collators each independently deriving vector spaces from associated information sets and identifying concepts in the vector spaces; and
  
  the multiple collators independently identifying information in the associated information sets according to the identified concepts in the vector spaces and competing against each other to identify relevant information in response to information queries.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)
- - 2. A system according to claim 1 wherein the multiple collators each independently modify the associated information sets according to the relevancy of the identified information to the queries.
  - 3. A system according to claim 1 wherein the multiple collators each include an associated goodness table ranking the similarity of information in the associated information sets to the concepts identified in the vector spaces.
  - 4. A system according to claim 3 including a feedback event table used in combination with the goodness table to selectively terminate individual collators determined to be poor providers of relevant information in response to information queries.
  - 5. A system according to claim 1 including a feedback event table that tracks user feedback on identified information.
  - 6. A system according to claim 1 wherein the collators each have the following stages:
    - a birth stage where the vector spaces for collators are created immaculately from a new set of information or created as offspring of existing collators;
      
      an adolescence stage where the collators respond to queries; and
      
      a maturity stage where the collators are evaluated as either fit for producing offspring by reverting back to the birth stage with a portion of the parenting collator's information set, or evaluated as unfit for survival wherein the associated information set is released for use by other collators.
  - 7. A system according to claim 1 wherein the collators each include an associated centroid space identifying the concepts in the vector space.
  - 8. A system according to claim 1 wherein the collators each include an associated goodness space identifying the similarity of information in the associated information sets to the concepts identified in the vector space.
  - 9. A system according to claim 1 including a liaison that conducts the queries to the multiple collators on behalf of a user.
  - 10. A system according to claim 9 wherein the liaison conducts any one of the following queries:
    - a manual query wherein text from a user is converted into an index and broadcast to the collators, the collators then mapping the index into the associated vector spaces;
      
      a knowledge-based query wherein a knowledge-based system is used to convert queries from the user into an expert recommendations list that is mapped into the collators' associated vector spaces;
      
      a user query wherein a feedback event table associated with the user is mapped into the associated vector spaces of the collators;
      
      a type 1 social query wherein a feedback event table associated with the user is mapped into the associated vector spaces of the collators, the collators identifying a selected number of most similar feedback event tables for other users and then merging the user feedback event tables for the identified other users into the final information recommendations list;
      
      or a type 2 social query wherein profile data associated with the user is compared with profile data from other users and a set of feedback event tables for the other users with similar profile data is merged together forming a final information recommendations list.
  - 11. A system according to claim 1 including multiple grinders converting the information into indices stored in the multiple information sets.
  - 12. A system according to claim 11 including multiple slurpees each associated with a different source of information, the multiple slurpees converting each different source of information into a common format for the grinders.
  - 13. A system according to claim 1 including multiple mites each selectively transporting information to the different information sets according to the similarity of the information to the concepts identified in the vector spaces of the collators.
  - 14. A system according to claim 13 wherein the mites transport the information to the collators according to a goodness score generated by the collator indicating similarity of the information to the concepts in the associated vector space.
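
Claims 13 and 14 describe mites that route incoming information to collators according to a goodness score generated by each collator. A minimal sketch follows. It rests on several hypothetical choices: a collator's concepts are stored as centroid vectors, goodness is the cosine similarity to the closest centroid, and a mite delivers a document to every collator whose score meets a fixed threshold. The claims fix none of these. `Collator` and `mite_deliver` are illustrative names.

```python
import math
from dataclasses import dataclass, field


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)


@dataclass
class Collator:
    name: str
    centroids: list                      # hypothetical: one vector per central concept
    documents: list = field(default_factory=list)

    def goodness(self, doc_vec):
        """Similarity of a document to this collator's closest central concept."""
        return max(cosine(doc_vec, c) for c in self.centroids)


def mite_deliver(doc_id, doc_vec, collators, threshold=0.5):
    """Offer one document to every collator. Each collator whose goodness score
    meets the threshold adds it to its information set. Returns the accepting names."""
    accepted = []
    for collator in collators:
        if collator.goodness(doc_vec) >= threshold:
            collator.documents.append(doc_id)
            accepted.append(collator.name)
    return accepted


collators = [
    Collator("collator-1", centroids=[[1.0, 0.0]]),
    Collator("collator-2", centroids=[[0.0, 1.0]]),
]
print(mite_deliver("doc-x", [0.9, 0.1], collators))  # accepted by collator-1 only
```

Delivering to every collator above the threshold, rather than only the single best one, reflects the abstract's description of collators classifying overlapping portions of the database.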

15. A method for identifying relevant information in an information source, comprising:
- converting different sets of information into different vector spaces;
  
  converting the vector spaces into associated centroid spaces that identify central concepts for the sets of information that comprise the vector spaces;
  
  independently identifying in each of the different centroid spaces the information clustered around the identified central concepts; and
  
  controlling genetic evolution for each of the vector spaces according to the similarity of the identified information to the central concepts.
- Dependent Claims (16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29)
- - 16. A method according to claim 15 including generating an associated goodness space from each of the multiple centroid spaces that identifies how closely the information in the vector space matches the central concepts.
  - 17. A method according to claim 15 wherein controlling the vector spaces includes automatically deleting vector spaces that are unsuccessful over time in identifying information similar to the central concepts.
  - 18. A method according to claim 17 wherein controlling the vector spaces includes generating offspring for selected vector spaces that are successful over time in identifying information similar to the central concepts, the offspring comprising a subset of information in the selected vector spaces that most closely relate to the concepts identified in the associated vector spaces.
  - 19. A method according to claim 15 wherein identifying the information includes the following:
    - receiving information queries;
      
      mapping the information queries into the different vector spaces;
      
      identifying which concepts in the vector spaces map closest to the information queries;
      
      identifying the information closest to the identified concepts; and
      
      supplying the identified information to a user.
  - 20. A method according to claim 15 including selectively populating information into the vector spaces according to how similar the information is to the concepts identified in the associated vector spaces.
  - 21. A method according to claim 15 including conducting a manual query as follows:
    - converting text from a user into a query index;
      
      broadcasting the query index to the different vector spaces;
      
      mapping the query index into the different vector spaces;
      
      identifying a predetermined number of central concepts in each of the vector spaces most similar to the query index;
      
      identifying the information in each of the vector spaces within a predetermined distance of the identified closest central concepts;
      
      determining relevance scores representing semantic distances of the identified information from the query index;
      
      generating a query goodness score identifying how closely the query index relates to the central concepts for each of the vector spaces; and
      
      returning recommendations lists from each of the vector spaces containing the identified information, the relevance scores for the identified information, and the associated goodness score of the query index.
  - 22. A method according to claim 21 including merging the recommendations lists from each of the collators by weighting each of the relevance scores by the associated goodness score and averaging the weighted relevance scores for the same identified information from different vector spaces.
  - 23. A method according to claim 15 including conducting a knowledge-based query according to the following:
    - retrieving user profile data;
      
      creating an expert recommendations list from the profile data containing facts relevant to the user weighted by confidence levels for each fact;
      
      broadcasting an identifier for each fact to the different vector spaces;
      
      mapping the fact into the vector spaces; and
      
      identifying information in each of the vector spaces similar to the fact.
  - 24. A method according to claim 23 including:
    - deriving a starting set of facts from the user profile;
      
      generating inferred facts from the set of facts according to a set of rules;
      
      generating a confidence level for each of the inferred facts according to a knowledge tree; and
      
      generating an expert recommendations list by identifying only the inferred facts above a predetermined confidence level.
  - 25. A method according to claim 15 including:
    - generating a feedback event table that rates a set of information according to user feedback on the identified information;
      
      mapping the set of information in the feedback event table into each vector space; and
      
      locating a feedback event table vector in each vector space according to the mapped set of information and the ratings associated with the information.
  - 26. A method according to claim 25 including conducting a user query as follows:
    - identifying a feedback event table associated with the user;
      
      broadcasting the identified feedback event table to the vector spaces;
      
      recalling feedback event table vectors in the vector spaces for the identified feedback event table; and
      
      identifying information in each vector space similar to the feedback event table vectors.
  - 27. A method according to claim 25 including conducting a type 1 social query as follows:
    - identifying a feedback event table for the user;
      
      broadcasting the identified feedback event table to the vector spaces;
      
      mapping the feedback event table vector into the vector spaces;
      
      locating other similar feedback event table vectors representing reading interests of other users;
      
      generating a goodness score from each of the vector spaces indicating how closely related the feedback event table vector is to the central concepts of the vector spaces; and
      
      generating a recommendations list from each one of the vector spaces listing the other users with the most similar feedback event table vectors and the goodness score.
  - 28. A method according to claim 15 including conducting a type 2 social query as follows:
    - calling up a knowledge-based system to look up facts about the user;
      
      creating an expert recommendations list containing facts relevant to the user weighted by confidence levels for each fact;
      
      identifying a subset of key facts in the expert recommendations list;
      
      locating other users according to similarity of the key facts and the confidence levels of the similar key facts;
      
      generating a recommendations list by the knowledge-based system of the identified similar users.
  - 29. A method according to claim 28 including selecting feedback event tables of the most similar identified users and merging the information identified in the feedback event tables together to form a recommendations list.
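
Claim 22 gives a merging rule: each relevance score is weighted by its vector space's goodness score, and the weighted scores for the same item are then averaged across spaces. That rule can be sketched as follows. The input layout and the name `merge_recommendations` are assumptions. Another assumption is taking a plain mean over only the spaces that returned the item. A mean normalized by the summed goodness scores would be an equally plausible reading of the claim.

```python
from collections import defaultdict


def merge_recommendations(space_lists):
    """space_lists: one (goodness, {item_id: relevance}) pair per vector space.
    Each relevance is multiplied by its space's goodness score, and the weighted
    scores for the same item are averaged over the spaces that returned it."""
    weighted = defaultdict(list)
    for goodness, recs in space_lists:
        for item_id, relevance in recs.items():
            weighted[item_id].append(relevance * goodness)
    merged = {item: sum(scores) / len(scores) for item, scores in weighted.items()}
    # Best-scoring items first.
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)


space_lists = [
    (0.9, {"doc-a": 0.8, "doc-b": 0.4}),   # hypothetical space with goodness 0.9
    (0.5, {"doc-a": 0.6, "doc-c": 0.9}),   # hypothetical space with goodness 0.5
]
merged = merge_recommendations(space_lists)
print(merged)
```

For this toy input, doc-a scores (0.8 × 0.9 + 0.6 × 0.5) / 2 = 0.51. That puts it ahead of doc-c (0.45) and doc-b (0.36).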

30. A genetic system for information retrieval and information categorization, comprising:
- a corpus of information;
  
  a multidimensional vector space derived from the corpus of information, the vector space comprising a set of axes that locate contextual relationships in the corpus of information;
  
  a centroid space that locates central concepts in the vector space; and
  
  a collator that automatically controls evolution of the vector space over time according to the relevancy of the central concepts to information queries.
- Dependent Claims (31, 32, 33, 34, 35, 36, 37, 38)
- - 31. A system according to claim 30 including a goodness space derived from the centroid space that identifies how closely information in the corpus relates to the central concepts identified in the vector space.
  - 32. A system according to claim 30 wherein the collator is terminated when the central concepts are not identified as relevant to the information queries for a given period of time.
  - 33. A system according to claim 30 wherein the corpus includes profile data from multiple users and the vector space derived from the information corpus identifies information relevant to the multiple users.
  - 34. A system according to claim 30 wherein the centroid space classifies the multiple users into groups having similar profile characteristics and informational interests.
  - 35. A system according to claim 30 including one or more mites that selectively transport information to the corpus from different information sources according to the similarity of the information to the central concepts in the vector space.
  - 36. A system according to claim 30 including a liaison that continuously queries the collator for information and maintains a user feedback event table ranking the relevancy of the information suggested by the collator in response to the queries.
  - 37. A system according to claim 30 wherein the collator comprises the following:
    - a birth stage where the collator forms the corpus of information and generates a vector space that identifies central concepts in the information corpus;
      
      an adolescence phase where the collator identifies information in the information corpus in response to queries and selectively searches and adds new information to the information corpus according to similarity of the new information to the information corpus, the collator identifying information during queries according to the similarity of the query to the central concepts; and
      
      a maturity phase where the collator either dies, checking the information corpus back into an information source, or reproduces, maintaining a first portion of the information corpus most similar to the central concepts and discarding a second portion of the information corpus least similar to the central concepts.
  - 38. A system according to claim 30 wherein the collator automatically spawns an offspring vector space when the collator is successful over time in identifying information relevant to the information queries, the offspring comprising a subset of information in the vector spaces that most closely relate to the central concepts identified in the associated vector spaces.
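
Claims 37 and 38 describe a reproduction step: the offspring keeps the portion of the corpus most similar to the central concepts, and the rest is released. That step could look roughly like this. The goodness measure (cosine similarity to the nearest centroid), the `keep_fraction` parameter, and the function name `reproduce` are hypothetical. The claims do not say how large the retained portion is.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)


def reproduce(corpus, centroids, keep_fraction=0.5):
    """corpus: {doc_id: vector}. Ranks documents by goodness (similarity to the
    nearest central concept). The offspring keeps the top keep_fraction; the
    remaining ids are returned as released back to the information source."""
    goodness = {d: max(cosine(v, c) for c in centroids) for d, v in corpus.items()}
    ranked = sorted(corpus, key=lambda d: goodness[d], reverse=True)
    n_keep = max(1, math.ceil(len(ranked) * keep_fraction))
    offspring = {d: corpus[d] for d in ranked[:n_keep]}
    released = ranked[n_keep:]
    return offspring, released


corpus = {"d1": [1.0, 0.0], "d2": [1.0, 1.0], "d3": [0.0, 1.0], "d4": [1.0, 0.1]}
offspring, released = reproduce(corpus, centroids=[[1.0, 0.0]])
print(sorted(offspring), released)
```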

39. A method for processing queries in an information retrieval system, comprising:
- initiating selectable query modes;
  
  generating a query according to the query modes selected;
  
  identifying the concepts in the information set most similar to the query;
  
  identifying the information in the information set most closely clustered around the identified concepts;
  
  generating a goodness score indicating how closely the query relates to the identified concepts;
  
  combining the identified information and the goodness score into a recommendations list;
  
  wherein one of the query modes comprises a knowledge-based query including:
  
  retrieving user profile data;
  
  creating an expert recommendations list from the profile data containing facts relevant to the user weighted by confidence levels;
  
  broadcasting an identifier for each fact separately to the collator;
  
  recalling stored topic vectors representing the fact identifiers in the collator; and
  
  identifying information in the collator similar to each of the topic vectors.
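
The knowledge-based query in claim 39 starts from a confidence-weighted expert recommendations list, and claim 24 describes how such a list is produced by inference. A minimal forward-chaining sketch follows. The rule format, the fact names, and the confidence combination (rule confidence times the weakest antecedent confidence) are assumptions for illustration. The knowledge tree mentioned in claim 24 is not modeled.

```python
def infer_facts(initial, rules, max_rounds=10):
    """initial: {fact: confidence}; rules: [(antecedents, consequent, rule_conf)].
    Forward-chains until no confidence improves. Combination rule (assumption):
    conf(consequent) = rule_conf * min(conf(antecedent) for each antecedent)."""
    facts = dict(initial)
    for _ in range(max_rounds):
        changed = False
        for antecedents, consequent, rule_conf in rules:
            if all(a in facts for a in antecedents):
                conf = rule_conf * min(facts[a] for a in antecedents)
                if conf > facts.get(consequent, 0.0):
                    facts[consequent] = conf
                    changed = True
        if not changed:
            break
    return facts


def expert_recommendations(facts, threshold=0.6):
    """Keep facts whose confidence is above the threshold, most confident first."""
    kept = [(f, c) for f, c in facts.items() if c > threshold]
    return sorted(kept, key=lambda fc: fc[1], reverse=True)


# Hypothetical starting facts from a user profile and hypothetical rules.
initial = {"age_over_50": 1.0, "smoker": 0.9}
rules = [
    (("age_over_50", "smoker"), "cardio_risk", 0.8),
    (("cardio_risk",), "wants_heart_news", 0.7),
]
facts = infer_facts(initial, rules)
expert = expert_recommendations(facts)
print(expert)
```

Each surviving fact identifier would then be broadcast to the collator and matched against its stored topic vectors, as the claim describes. That mapping step is outside this sketch.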

40. A method for processing queries in an information retrieval system, comprising:
- initiating selectable query modes;
  
  generating a query according to the query modes selected;
  
  identifying the concepts in the information set most similar to the query;
  
  identifying the information in the information set most closely clustered around the identified concepts;
  
  generating a goodness score indicating how closely the query relates to the identified concepts;
  
  combining the identified information and the goodness score into a recommendations list;
  
  wherein one of the query modes comprises a user query including:
  
  identifying a feedback event table for a user;
  
  broadcasting the identified feedback event table to the collator;
  
  recalling a feedback event table vector in the collator for the identified feedback event table; and
  
  identifying information in the collator similar to the feedback event vector.
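
Claims 25 and 40 locate a feedback event table vector from the rated information and its ratings, then look for information similar to that vector. One simple way to do this is assumed here rather than taken from the patent: compute a rating-weighted mean of the rated documents' vectors, then rank unrated documents by cosine similarity. `fet_vector` and `user_query` are hypothetical names.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)


def fet_vector(feedback, doc_vectors):
    """feedback: {doc_id: rating}. Returns the rating-weighted mean of the rated
    documents' vectors; a negative rating pulls the vector away from a document."""
    dim = len(next(iter(doc_vectors.values())))
    total = [0.0] * dim
    norm = 0.0
    for doc_id, rating in feedback.items():
        for i, x in enumerate(doc_vectors[doc_id]):
            total[i] += rating * x
        norm += abs(rating)
    return [t / norm for t in total] if norm else total


def user_query(feedback, doc_vectors, top_k=2):
    """Rank documents the user has not yet rated by similarity to the FET vector."""
    query = fet_vector(feedback, doc_vectors)
    candidates = [(d, cosine(query, v)) for d, v in doc_vectors.items() if d not in feedback]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:top_k]


doc_vectors = {"d1": [1.0, 0.0], "d2": [0.0, 1.0], "d3": [0.9, 0.1], "d4": [0.1, 0.9]}
feedback = {"d1": 2.0, "d2": 1.0}   # hypothetical ratings
print(user_query(feedback, doc_vectors))
```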

41. A method for processing queries in an information retrieval system, comprising:
- initiating selectable query modes;
  
  generating a query according to the query modes selected;
  
  identifying the concepts in the information set most similar to the query;
  
  identifying the information in the information set most closely clustered around the identified concepts;
  
  generating a goodness score indicating how closely the query relates to the identified concepts;
  
  combining the identified information and the goodness score into a recommendations list;
  
  wherein one of the query modes comprises a type 1 social query including:
  
  generating a feedback event table rating information according to its relevancy to previous queries;
  
  mapping the information in the feedback event table into the collator;
  
  identifying a feedback event table vector in the collator according to the mapped set of information and the rating associated with the information;
  
  locating in the collator other similar feedback event table vectors representing reading interests of other users;
  
  generating a goodness score for the collator indicating how closely the feedback event table vector for the user relates to the central concepts of the collator; and
  
  generating a recommendations list for the user listing the feedback event tables for the most similar other users and the goodness score.
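
The type 1 social query in claim 41 returns two things: the other users whose feedback event table vectors lie closest to the target user's vector, and a goodness score for the collator. It could be sketched as below. Several details are assumptions: cosine similarity is the distance measure, goodness is the similarity to the nearest central concept, and the user vectors and function name are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)


def type1_social_query(user_vec, other_users, centroids, top_k=2):
    """user_vec: the target user's FET vector; other_users: {user_id: FET vector}.
    Returns the collator's goodness for this user plus the top_k nearest users."""
    goodness = max(cosine(user_vec, c) for c in centroids)
    neighbors = sorted(
        ((u, cosine(user_vec, v)) for u, v in other_users.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return {"goodness": goodness, "similar_users": neighbors[:top_k]}


others = {"user-a": [0.9, 0.1], "user-b": [0.0, 1.0], "user-c": [0.7, 0.7]}
result = type1_social_query([1.0, 0.0], others, centroids=[[1.0, 0.0], [0.0, 1.0]])
print(result)
```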

42. A method for processing queries in an information retrieval system, comprising:
- initiating selectable query modes;
  
  generating a query according to the query modes selected;
  
  identifying the concepts in the information set most similar to the query;
  
  identifying the information in the information set most closely clustered around the identified concepts;
  
  generating a goodness score indicating how closely the query relates to the identified concepts;
  
  combining the identified information and the goodness score into a recommendations list;
  
  wherein one of the query modes comprises a type 2 social query as follows:
  
  using a knowledge-based system to look up facts about the user;
  
  creating an expert recommendations list containing facts relevant to the user weighted by confidence levels;
  
  identifying a key set of facts in the expert recommendations list;
  
  locating other users according to similarity of the key facts and the confidence levels of the similar key facts; and
  
  returning a recommendations list by the knowledge-based system of the identified similar users.
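
The type 2 social query in claim 42 compares users by their key facts and the confidence levels attached to them. The claim does not name a similarity measure. The sketch below assumes a weighted Jaccard similarity over confidence-weighted key facts and a fixed key-fact cut-off. Both choices are hypothetical, as are the fact names.

```python
def key_facts(expert_list, threshold=0.6):
    """Keep only facts whose confidence meets the (hypothetical) cut-off."""
    return {fact: conf for fact, conf in expert_list.items() if conf >= threshold}


def weighted_jaccard(a, b):
    """Sum of per-fact minimum confidences divided by sum of per-fact maximums."""
    facts = set(a) | set(b)
    num = sum(min(a.get(f, 0.0), b.get(f, 0.0)) for f in facts)
    den = sum(max(a.get(f, 0.0), b.get(f, 0.0)) for f in facts)
    return num / den if den else 0.0


def type2_social_query(user_facts, other_users, top_k=1, threshold=0.6):
    """Rank other users by similarity of their key facts to the target user's."""
    mine = key_facts(user_facts, threshold)
    scored = [
        (user, weighted_jaccard(mine, key_facts(facts, threshold)))
        for user, facts in other_users.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


target = {"smoker": 0.9, "cardio_risk": 0.72, "runner": 0.3}
others = {
    "user-a": {"smoker": 0.8, "cardio_risk": 0.7},
    "user-b": {"runner": 0.9, "vegetarian": 0.8},
}
similar = type2_social_query(target, others)
print(similar)
```

Here "runner" falls below the cut-off for the target user, so user-b shares no key facts with the target and scores 0.0.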

43. A method for categorizing users in an information retrieval system, comprising:
- mapping reading histories for multiple users into multiple vector spaces;
  
  identifying central concepts in the vector spaces;
  
  mapping a reading history for a target user into the multiple vector spaces;
  
  identifying which central concepts are most relevant to the reading history of the target user;
  
  generating a recommendations list identifying the users most closely clustered; and
  
  wherein mapping reading histories of multiple users includes:
  
  maintaining a feedback event table identifying information supplied to the users during previous queries;
  
  ranking the information in the feedback event table according to the relevance of the information to the previous queries;
  
  mapping the information into the vector spaces;
  
  generating a feedback event table vector that is located in the vector spaces according to the mapped information and the ratings associated with the mapped information;
  
  locating similar feedback event table vectors in the vector spaces for other users; and
  
  generating a recommendations list identifying the similar users.
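
Claim 43 categorizes users by mapping their reading histories into a vector space and finding which central concept each history is closest to. A minimal sketch follows. Each user is represented by a single vector that stands in for their feedback event table vector, and users are grouped with the target when they share its closest concept. The vectors, centroids, and function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)


def categorize(user_vectors, centroids):
    """Assign each user's reading-history vector to the index of its most
    similar central concept (ties go to the lower index)."""
    groups = {}
    for user, vec in user_vectors.items():
        scores = [cosine(vec, c) for c in centroids]
        groups[user] = scores.index(max(scores))
    return groups


def same_category_users(target, user_vectors, centroids):
    """Users who share the target user's closest central concept."""
    groups = categorize(user_vectors, centroids)
    return sorted(u for u, g in groups.items() if g == groups[target] and u != target)


users = {
    "target": [0.8, 0.2],
    "user-a": [0.9, 0.3],
    "user-b": [0.1, 1.0],
    "user-c": [0.6, 0.4],
}
print(same_category_users("target", users, centroids=[[1.0, 0.0], [0.0, 1.0]]))
```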

44. A method for adapting a semantic space comprising:
- generating the semantic space from a resident set of information;
  
  continuously checking for new information that becomes available in the information source;
  
  computing a goodness value that characterizes the closeness of the new information to concepts in the semantic space for the resident set of information; and
  
  automatically adding the new information to the resident set of information when the goodness value meets a given threshold.
- Dependent Claims (45, 46, 47)
- - 45. A method according to claim 44 including providing a mite that checks information in and out of the information source through a queue and selectively submits the information to the semantic space according to the goodness value generated by the semantic space.
  - 46. A method according to claim 44 including automatically terminating the semantic space according to both similarity of the information in the semantic space to central concepts identified in the semantic space and according to user feedback event tables that identify how closely the information in the semantic space relates to previous queries.
  - 47. A method according to claim 44 including the following steps:
    - at periodic times automatically removing information in the resident set of information that ranks in a lowest percentile of goodness scores; and
      
      regenerating the semantic space with the remaining resident set of information.
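
Claims 44 and 47 describe two operations on a semantic space: admit new information when its goodness value meets a threshold, and periodically drop the lowest-scoring percentile before regenerating the space. Both can be sketched with a toy stand-in. That stand-in represents the semantic space as a single centroid over the resident vectors, which is far simpler than the neural-network space the patent describes. The class, method names, threshold, and percentile are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)


class SemanticSpace:
    """Toy stand-in: the 'space' is one centroid over the resident vectors."""

    def __init__(self, resident):
        self.resident = dict(resident)   # {doc_id: vector}; must be non-empty
        self.regenerate()

    def regenerate(self):
        """Recompute the centroid from the current resident set."""
        vecs = list(self.resident.values())
        self.centroid = [sum(col) / len(vecs) for col in zip(*vecs)]

    def goodness(self, vec):
        return cosine(vec, self.centroid)

    def offer(self, doc_id, vec, threshold=0.8):
        """Claim 44: add new information when its goodness meets the threshold."""
        if self.goodness(vec) >= threshold:
            self.resident[doc_id] = vec
            return True
        return False

    def prune(self, percentile=40):
        """Claim 47: drop the lowest-scoring percentile, then regenerate.
        At least one document is always kept so the centroid stays defined."""
        ranked = sorted(self.resident, key=lambda d: self.goodness(self.resident[d]))
        n_drop = min(math.floor(len(ranked) * percentile / 100), len(ranked) - 1)
        dropped = ranked[:n_drop]
        for doc_id in dropped:
            del self.resident[doc_id]
        self.regenerate()
        return dropped


space = SemanticSpace({"d1": [1.0, 0.0], "d2": [1.0, 1.0]})
print(space.offer("d3", [1.0, 0.4]))   # True
print(space.offer("d4", [0.0, 1.0]))   # False
print(space.prune(percentile=40))      # ['d1']
```

The sketch deliberately leaves the centroid unchanged when a document is admitted and recomputes it only in `prune`. That matches claim 47's regeneration "at periodic times" rather than on every insertion.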

48. A system for classifying information, comprising:
- a knowledge-based system that includes facts and sets of rules over the facts, the knowledge-based system inferring facts from initial information, assigning confidence levels for each of the inferred facts, and identifying key facts according to the assigned confidence levels;
  
  an artificial neural network that converts a corpus of information into a multidimensional vector space having a set of axes that locate contextual relationships in the corpus of information, the neural network receiving a key fact from the knowledge-based system, mapping the key fact into the vector space, and identifying information in the vector space similar to the key fact; and
  
  an information processor for representing, storing, and incrementally improving the representations of facts from the knowledge-based system within the vector space of the neural network.

Current Assignee: WebMD Health Corporation
Original Assignee: Sapient Health Network, Inc. (WebMD Health Corporation)
Inventors: Burke, Scott M.; Hazlehurst, Brian L.; Nybakken, Kristopher E.
Primary Examiner(s): Amsbury, Wayne
Assistant Examiner(s): PARDO, THUY N
Application Number: US08/936,354
Time in Patent Office: 762 Days
Field of Search: 707/2, 707/10, 707/102, 707/532, 707/3, 704/9, 704/241, 382/159, 345/440
US Class Current: 1/1

CPC Class Codes

G06F 16/3332   Query translation

Y10S 707/99933   Query processing, i.e. sear...

Y10S 707/99942   Manipulating data structure...

Y10S 707/99943   Generating database or data...
