COMPUTERIZED SEARCHABLE DOCUMENT REPOSITORY USING SEPARATE METADATA AND CONTENT STORES AND FULL TEXT INDEXES
Abstract
A computerized searchable repository stores documents as structured metadata parts and unstructured content parts using single instancing. A full text index used for keyword searching includes a metadata index and a content index. A linking structure includes metadata-to-content (MD to CT) links and content-to-metadata (CT to MD) linking entries, with each MD to CT link linking a metadata part of a document to each content part of the document, and each CT to MD linking entry having one or more CT to MD links collectively linking a content part to the metadata parts of the documents that include the content part. Indexing includes metadata indexing a metadata part, conditionally content indexing a content part, and updating the linking structure. Content indexing is performed only if the content part does not match a content part already stored and indexed. Index entries each associate a key word or key value with corresponding metadata or content parts containing the key word or key value. Updating the linking structure includes generating new MD to CT and CT to MD links between the metadata part and either the new content part or an existing matching content part if present.
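The indexing flow summarized in the abstract can be illustrated with a small in-memory model. This is a minimal sketch under stated assumptions, not the patented implementation: the class and attribute names are hypothetical, identical content parts are detected by comparing SHA-256 digests (the text does not say how matching is determined), and tokenization is a plain whitespace split.

```python
import hashlib
from collections import defaultdict


class Repository:
    """Toy model: separate metadata/content stores, two indexes, two link maps."""

    def __init__(self):
        self.metadata_store = {}                # doc_id -> metadata dict
        self.content_store = {}                 # content_id -> text, one copy per distinct part
        self.metadata_index = defaultdict(set)  # key value -> doc_ids
        self.content_index = defaultdict(set)   # key word -> content_ids
        self.md_to_ct = defaultdict(list)       # MD to CT links: doc_id -> content_ids
        self.ct_to_md = defaultdict(set)        # CT to MD entries: content_id -> doc_ids
        self.content_index_runs = 0             # how often content indexing actually ran

    def index_document(self, doc_id, metadata, content_parts):
        # Metadata indexing is unconditional.
        self.metadata_store[doc_id] = metadata
        for value in metadata.values():
            for token in str(value).lower().split():
                self.metadata_index[token].add(doc_id)

        for text in content_parts:
            # Assumption: equal digests stand in for "matching" content parts.
            content_id = hashlib.sha256(text.encode("utf-8")).hexdigest()

            # Content indexing is conditional: only new content is stored and indexed.
            if content_id not in self.content_store:
                self.content_store[content_id] = text
                for word in text.lower().split():
                    self.content_index[word].add(content_id)
                self.content_index_runs += 1

            # The linking structure is updated for new and existing content alike.
            if content_id not in self.md_to_ct[doc_id]:
                self.md_to_ct[doc_id].append(content_id)
            self.ct_to_md[content_id].add(doc_id)


repo = Repository()
repo.index_document("d1", {"author": "alice", "title": "Q3 report"},
                    ["quarterly revenue figures"])
repo.index_document("d2", {"author": "bob", "title": "Fwd report"},
                    ["quarterly revenue figures", "see attached"])
```

In this sketch the second document reuses the content part already stored for the first, so only two content parts are stored and indexed for three references, while the CT to MD entry for the shared part lists both documents.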
22 Claims
1. A computerized searchable repository for documents, each document having a structured metadata part and one or more unstructured content parts, comprising:
- a storage sub-system operative to store the documents, a full text index and a linking structure, the content parts of the documents being stored in a single-instanced manner avoiding duplication of identical content parts, the full text index being usable for keyword searching of the documents and including a metadata index and a content index of the metadata and content parts respectively of the documents, the linking structure including metadata-to-content links and content-to-metadata linking entries, each metadata-to-content link linking a metadata part of a respective document to each content part of the document, each content-to-metadata linking entry having one or more content-to-metadata links collectively linking a respective content part to the metadata parts of a group of documents that each include the content part; and
- processing circuitry operative to perform full text indexing of the documents in the storage sub-system, the full text indexing of each document including metadata indexing a metadata part, conditionally content indexing a content part, and updating the linking structure, the content indexing being performed only if the content part is a new content part not matching any of at least a set of content parts already stored in a content store and indexed in the content index, each of the metadata indexing and content indexing including generating new index entries in the metadata or content index respectively for the metadata or content part respectively, each index entry associating a respective key word or key value with a corresponding one or more metadata or content parts containing the key word or key value, and the updating of the linking structure including generating new metadata-to-content and content-to-metadata links between the metadata part and either the new content part or an existing matching content part if present.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A method of operating a computerized searchable repository for documents, each document having a structured metadata part and one or more unstructured content parts, comprising:
- storing the documents, a full text index and a linking structure in a storage sub-system of the computerized searchable repository, the content parts of the documents being stored in a single-instanced manner avoiding duplication of identical content parts, the full text index being usable for keyword searching of the documents and including a metadata index and a content index of the metadata and content parts respectively of the documents, the linking structure including metadata-to-content links and content-to-metadata linking entries, each metadata-to-content link linking a metadata part of a respective document to each content part of the document, each content-to-metadata linking entry having one or more content-to-metadata links collectively linking a respective content part to the metadata parts of a group of documents that each include the content part; and
- operating processing circuitry of the computerized searchable repository to perform full text indexing of the documents in the storage sub-system, the full text indexing of each document including metadata indexing a metadata part, conditionally content indexing a content part, and updating the linking structure, the content indexing being performed only if the content part is a new content part not matching any of at least a set of content parts already stored in a content store and indexed in the content index, each of the metadata indexing and content indexing including generating new index entries in the metadata or content index respectively for the metadata or content part respectively, each index entry associating a respective key word or key value with a corresponding one or more metadata or content parts containing the key word or key value, and the updating of the linking structure including generating new metadata-to-content and content-to-metadata links between the metadata part and either the new content part or an existing matching content part if present.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20, 21, 22
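The independent claims state that the full text index is usable for keyword searching but do not spell out a query procedure. As one hedged illustration of how the claimed structures could serve a query, the sketch below looks a keyword up in both indexes and resolves each content hit to documents through the content-to-metadata linking entries. The function name, the dictionary layout, and the sample identifiers (`d1`, `c1`, and so on) are assumptions made for this example only.

```python
def search(keyword, metadata_index, content_index, ct_to_md):
    """Return the ids of documents whose metadata or content contains keyword."""
    key = keyword.lower()

    # Metadata hits identify documents directly.
    hits = set(metadata_index.get(key, ()))

    # Content hits identify single-instanced content parts; each one is
    # expanded to every document that includes it via its CT-to-MD entry.
    for content_id in content_index.get(key, ()):
        hits |= ct_to_md.get(content_id, set())
    return hits


# Hand-built sample data: content part c1 is shared by documents d1 and d2.
metadata_index = {"alice": {"d1"}, "bob": {"d2"}}
content_index = {"revenue": {"c1"}, "attached": {"c2"}}
ct_to_md = {"c1": {"d1", "d2"}, "c2": {"d2"}}

print(sorted(search("revenue", metadata_index, content_index, ct_to_md)))  # ['d1', 'd2']
```

Because a shared content part appears once in the content index, a single index entry can return every document that includes it, which is the practical benefit of keeping the content-to-metadata entries alongside the index.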
Specification