Performance and scalability in an intelligent data operating layer system

US 9,323,767 B2
Filed: 10/01/2012
Issued: 04/26/2016
Est. Priority Date: 10/01/2012
Status: Active Grant

Abstract

Systems and methods provide an intelligence platform for distributed processing of big data sets, including both structured and unstructured data types, across two or more intelligent data operation engine servers. The intelligent data operation engine servers can form a conceptual understanding of content in each electronic file and then cooperate with a distributed index handler to index the conceptual understanding of the electronic file. A query pipeline and the distributed index handler in the intelligence platform cooperate with the two or more intelligent data operation engine servers to improve scalability and performance on the big data sets containing both structured and unstructured electronic files represented in the common index.

20 Claims

1. A system comprising:
- an index handler;
  
  data servers configured for distributed processing of structured and unstructured data, where a first of the data servers is configured to:
  
  - use a statistical algorithm to assign weights to terms found in content of electronic files,
  - use idea distancing between similar concepts to find different words in the electronic files that describe a same idea,
  - form a conceptual understanding of content in each of the electronic files using the statistical algorithm and the idea distancing, and
  - cooperate with the index handler to form a common index of the conceptual understanding of each of the electronic files, where the common index includes at least a conceptual understanding of a first of the electronic files that contains structured data and a conceptual understanding of a second of the electronic files that contains unstructured data, and
  
  a query pipeline that includes an action handler that is configured to support a distribution of command actions in a common protocol to the data servers,
  
  wherein the data servers are configurable to selectively run in one of a mirror mode and a non-mirror mode, wherein the data servers in the mirror mode have a same configuration and contain same data, and wherein the data servers in the non-mirror mode are configured differently and contain different data, and
  
  wherein the index handler is configurable to selectively run in one of the mirror mode and the non-mirror mode, and wherein the index handler in the mirror mode is configured to distribute same index data of the common index to the data servers, and the index handler in the non-mirror mode is configured to distribute different portions of the common index to the corresponding data servers.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 17)
  - 2. The system of claim 1, wherein a first instance of the action handler is implemented in a distribution server configured to support and convey distribution of the command actions in the common protocol both to and between the data servers.
  - 3. The system of claim 1, wherein the first data server is configured to use the idea distancing to form clusters of the electronic files, each of the clusters conveying a similar concept based on the formed conceptual understanding of the electronic files in the respective cluster fitting into a respective category.
  - 4. The system of claim 1, wherein the index handler is configured to split and index data into the data servers, optimize performance by batching data for each data server to process, and replicate index commands between the data servers.
  - 5. The system of claim 1, wherein the query pipeline is configured to analyze a statistically most rare occurrence of a term present in search terms of the query, and to determine, based on the analyzing, a most relevant portion of structured data and unstructured data to begin a search for the query.
  - 6. The system of claim 1, wherein the query pipeline is configured to focus a majority of total processing power of the data servers to find electronic files most relevant towards a term in the query that has a most rare occurrence, and wherein the query pipeline is configured to start a search in response to the query that is weighted more heavily on the term having the most rare occurrence.
  - 7. The system of claim 1, further comprising:
    - repositories storing the electronic files that are organized by subject matter, and wherein the repositories include both repositories for the structured data and repositories for the unstructured data, and wherein the conceptual understandings are represented in an XML format.
  - 8. The system of claim 1, wherein the first data server includes an executable dynamic reasoning engine that is configured to interact with one or more processors implementing multi-threaded processes of the first data server to generate the conceptual understandings of the electronic files, and execute queries on the electronic files.
  - 9. The system of claim 1, wherein conceptual understandings of the electronic files are indexed and organized within the common index, and each conceptual understanding within the common index has one or more pointers to the respective electronic file.
  - 10. The system of claim 1, wherein the action handler is further configured to propagate query actions to the data servers to search the common index, where the action handler is further configured to:
    - use the data servers as a pool of servers,
    - automatically select a primary data server in the pool, and
    - switch to a secondary data server in the pool when the primary data server fails.
  - 11. The system of claim 1, wherein the system increases performance of both the index handler and the query pipeline by using database statistics and life cycle management to determine relevant electronic files and to place the relevant electronic files into respective categories for indexing.
  - 12. The system of claim 11, wherein the first data server is configured to make a determination of which electronic files are most relevant by considering an amount of relevant documents returning to key terms of a query, a strength rating or percentage relevance of those returned documents to the query, and tracked historic data of a most relevant indexed categories to search for previous similar queries.
  - 13. The system of claim 1, where the action handler is configured to distribute the command actions when a lack of feedback from a particular data server occurs to ensure uninterrupted service when any of the data servers should fail.
  - 14. The system of claim 1, wherein the first data server is configured to perform adaptive probabilistic concept caching in which frequently-used concepts are maintained in a memory of the first data server and query results are returned using the frequently-used concepts maintained in the memory.
  - 15. The system of claim 1, wherein the first data server is configured to form the conceptual understanding of a given electronic file by:
    - eliminating inconsequential information from a content in the given electronic file;
      
      generating a set of key terms, wherein the set of key terms include one or more of singular terms, higher order terms, noun phrases, proper names, and any combination of these; and
      
      assigning a frequency of occurrence weight to each of the key terms.
  - 17. The system of claim 1, wherein the data servers are configured to run in the mirror mode and the index handler is configured to run in the mirror mode for processing queries under a first condition, and wherein the data servers are configured to run in the non-mirror mode and the index handler is configured to run in the non-mirror mode for processing queries under a second condition.
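
The failover behavior recited in claims 10 and 13 (treat the data servers as a pool, start with a primary, and switch to a secondary when the primary fails or gives no feedback) can be illustrated with a minimal Python sketch. Everything named here is hypothetical: `run_with_failover`, `AllServersFailed`, the use of plain callables to stand in for data servers, and the use of `ConnectionError` to signal a failed server are assumptions made for illustration, not details taken from the patent.

```python
from typing import Callable, List, Sequence


class AllServersFailed(Exception):
    """Raised when no server in the pool could handle the action (hypothetical)."""


def run_with_failover(action: str, pool: Sequence[Callable[[str], str]]) -> str:
    """Send `action` to the first server in the pool, falling back in order.

    Each server is modeled as a callable that returns a result string or
    raises ConnectionError when it is down. Treating the first pool entry as
    the primary is an assumption; the claims do not say how it is selected.
    """
    errors: List[Exception] = []
    for server in pool:
        try:
            return server(action)
        except ConnectionError as exc:
            # This server failed; remember why and try the next one.
            errors.append(exc)
    raise AllServersFailed(f"{len(errors)} server(s) failed for action {action!r}")


def failed_primary(action: str) -> str:
    # Stand-in for a data server that is not responding.
    raise ConnectionError("primary is down")


def healthy_secondary(action: str) -> str:
    # Stand-in for a data server that answers normally.
    return f"handled:{action}"


result = run_with_failover("query", [failed_primary, healthy_secondary])
```

In this sketch the caller never sees the primary's failure as long as at least one server in the pool responds, which is one simple way to read the "uninterrupted service" language of claim 13.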

16. A method comprising:
- assigning weights to terms found in content of electronic files;
  
  using idea distancing between similar concepts to find different words in the electronic files that describe a same idea;
  
  using both the assigned weights and the idea distancing to form a conceptual understanding of the content in each of the electronic files;
  
  cooperating with an index handler to form a common index of the conceptual understanding of each of the electronic files, where the common index includes at least a conceptual understanding of a first of the electronic files that contains structured data and a conceptual understanding of a second of the electronic files that contains unstructured data;
  
  distributing, by a query pipeline, command actions in a common protocol to data servers;
  
  selectively configuring the data servers to selectively run in one of a mirror mode and a non-mirror mode, wherein the data servers in the mirror mode have a same configuration and contain same data, and wherein the data servers in the non-mirror mode are configured differently and contain different data; and
  
  selectively configuring the index handler to selectively run in one of the mirror mode and the non-mirror mode, wherein the index handler in the mirror mode is configured to distribute same index data of the common index to the data servers, and the index handler in the non-mirror mode is configured to distribute different portions of the common index to corresponding data servers.
- Dependent claims (18)
  - 18. The method of claim 16, wherein the data servers are configured to run in the mirror mode and the index handler is configured to run in the mirror mode for processing queries under a first condition, and wherein the data servers are configured to run in the non-mirror mode and the index handler is configured to run in the non-mirror mode for processing queries under a second condition.
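
The first step of the method (assigning weights to terms) and the rare-term emphasis of claims 5 and 6 can be sketched as follows. The claims only say "a statistical algorithm", so the smoothed inverse document frequency used here is a swapped-in, well-known stand-in and not the patented method. The function names `term_weights` and `rarest_query_term`, and the representation of each file as a list of tokens, are likewise assumptions.

```python
import math
from collections import Counter
from typing import Dict, List


def term_weights(docs: List[List[str]]) -> Dict[str, float]:
    """Weight each term by a smoothed inverse document frequency.

    Terms that appear in fewer documents receive larger weights. This is an
    assumed weighting scheme chosen for illustration.
    """
    n_docs = len(docs)
    doc_freq: Counter = Counter()
    for tokens in docs:
        # Count each term at most once per document.
        doc_freq.update(set(tokens))
    return {
        term: math.log((1 + n_docs) / (1 + count)) + 1.0
        for term, count in doc_freq.items()
    }


def rarest_query_term(query: List[str], weights: Dict[str, float]) -> str:
    """Return the query term with the highest weight, i.e. the rarest known term."""
    known = [term for term in query if term in weights]
    if not known:
        raise ValueError("no query term appears in the indexed content")
    return max(known, key=lambda term: weights[term])


docs = [
    ["index", "server", "mirror"],
    ["index", "server"],
    ["index", "query"],
]
weights = term_weights(docs)
start_term = rarest_query_term(["index", "server", "mirror"], weights)
```

Here `start_term` is the term a query pipeline in the spirit of claim 6 would weight most heavily when starting a search, because it occurs in the fewest documents.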

19. A non-transitory computer-readable storage medium storing instructions that upon execution cause a system to:
- assign weights to terms found in content of electronic files;
  
  use idea distancing between similar concepts to find different words in the electronic files that describe a same idea;
  
  form a conceptual understanding of the content in each of the electronic files using both the assigned weights and the idea distancing;
  
  cooperate with an index handler to form a common index of the conceptual understanding of each of the electronic files, where the common index includes at least a conceptual understanding of a first of the electronic files that contains structured data and a conceptual understanding of a second of the electronic files that contains unstructured data;
  
  distribute, by a query pipeline, command actions in a common protocol to data servers;
  
  selectively configure the data servers to selectively run in one of a mirror mode and a non-mirror mode, wherein the data servers in the mirror mode have a same configuration and contain same data, and wherein the data servers in the non-mirror mode are configured differently and contain different data; and
  
  selectively configure the index handler to selectively run in one of the mirror mode and the non-mirror mode, wherein the index handler in the mirror mode is configured to distribute same index data of the common index to the data servers, and the index handler in the non-mirror mode is configured to distribute different portions of the common index to corresponding data servers.
- Dependent claims (20)
  - 20. The non-transitory computer-readable storage medium of claim 19, wherein the data servers are configured to run in the mirror mode and the index handler is configured to run in the mirror mode for processing queries under a first condition, and wherein the data servers are configured to run in the non-mirror mode and the index handler is configured to run in the non-mirror mode for processing queries under a second condition.
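
The mirror and non-mirror modes of the index handler, as recited in the independent claims, can be contrasted with a short Python sketch. In mirror mode every data server receives the same index data; in non-mirror mode each server receives a different portion. The function name `distribute_index`, the dictionary representation of the common index, and the round-robin split over sorted document ids are all assumptions; the claims do not say how portions are chosen.

```python
from typing import Dict, List


def distribute_index(
    entries: Dict[str, dict], servers: List[str], mirror: bool
) -> Dict[str, Dict[str, dict]]:
    """Return a mapping of server name to the index entries it receives."""
    if not servers:
        raise ValueError("at least one data server is required")
    if mirror:
        # Mirror mode: every server gets a copy of the same index data.
        return {server: dict(entries) for server in servers}
    # Non-mirror mode: each server gets a different portion of the index.
    # Round-robin assignment is an illustrative assumption.
    assignment: Dict[str, Dict[str, dict]] = {server: {} for server in servers}
    for position, doc_id in enumerate(sorted(entries)):
        target = servers[position % len(servers)]
        assignment[target][doc_id] = entries[doc_id]
    return assignment


common_index = {
    "doc-a": {"concepts": ["indexing"]},
    "doc-b": {"concepts": ["queries"]},
    "doc-c": {"concepts": ["failover"]},
}
mirrored = distribute_index(common_index, ["s1", "s2"], mirror=True)
split = distribute_index(common_index, ["s1", "s2"], mirror=False)
```

With mirroring, any server can answer any query, which favors availability; with the split, each server holds less data, which favors capacity. The claims leave the "first condition" and "second condition" that select between the modes unspecified, and this sketch does not attempt to model them.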

Current Assignee
Longsand Limited (Open Text Corporation)
Original Assignee
Longsand Limited (Open Text Corporation)
Inventors
Blanchflower, Sean Mark; Gallagher, Darren John
Primary Examiner(s)
Vital, Pierre
Assistant Examiner(s)
Sultana, Nargis

Application Number
US13/632,825
Publication Number
US 20140095505A1
Time in Patent Office
1,303 Days
Field of Search
707/655, 707/659, 707/728, 707/731, 707/737, 707/830
US Class Current
1/1
CPC Class Codes
G06F 16/13   File access structures, e.g...
G06F 16/134   Distributed indices
G06F 16/182   Distributed file systems
G06F 16/93   Document management systems
