Structured-text cataloging method, structured-text searching method, and portable medium used in the methods
Abstract
A text cataloging method includes a step of cataloging, in a text database, already-analyzed-text data obtained from an analysis of a logical structure of a text to be cataloged; a step of creating a structure index by sequentially superposing logical structures of texts to be cataloged, wherein a single metaelement is used for representing a group of elements in the texts having the same position of appearance in one of the texts and the same element type, a single piece of meta-character-string data is used for representing a group of pieces of character-string data in the texts having the same position of appearance in one of the texts, and a context identifier is assigned to each metanode composing a tree-like structure of the structure index for uniquely identifying the metanode; a step of generating structured-full-text data composed of definitions of associative relations between all pieces of character-string data included in the already-analyzed-text data of each text to be cataloged and the context identifiers of the pieces of meta-character-string data in the structure index used for representing those pieces of character-string data; and a character-string-index updating step, including the sub-steps of extracting partial character strings, generating structured-character-position information, and updating a character-string index.
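The superposition described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. It assumes each analyzed text is a nested `(element_type, children)` tuple with plain strings as character-string data, that "position of appearance" means the child's ordinal under its parent, and that context identifiers are handed out sequentially. The names `MetaNode`, `StructureIndex`, and `superpose` are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class MetaNode:
    context_id: int   # uniquely identifies this metanode in the index
    label: str        # element type, or "#text" for character-string data
    children: dict = field(default_factory=dict)  # (position, label) -> MetaNode


class StructureIndex:
    def __init__(self):
        self._ids = count(1)  # assumption: sequential context identifiers
        self.root = MetaNode(next(self._ids), "#root")

    def superpose(self, node, parent=None, position=0):
        """Merge one analyzed text into the index.

        Returns (context_id, string) pairs, i.e. a simple form of the
        structured-full-text data for that text.
        """
        parent = self.root if parent is None else parent
        label = "#text" if isinstance(node, str) else node[0]
        key = (position, label)  # same position + same type -> same metanode
        meta = parent.children.get(key)
        if meta is None:
            meta = MetaNode(next(self._ids), label)
            parent.children[key] = meta
        if isinstance(node, str):
            return [(meta.context_id, node)]
        pairs = []
        for i, child in enumerate(node[1]):
            pairs.extend(self.superpose(child, meta, i))
        return pairs


index = StructureIndex()
first = index.superpose(("doc", [("title", ["Indexing"]), ("body", ["n-grams"])]))
second = index.superpose(("doc", [("title", ["Searching"]), ("body", [("p", ["trees"])])]))
```

In this sketch both titles land on the same metanode and therefore share one context identifier, while the `p` element that appears only in the second text gets a new metanode.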
10 Claims
1. A structured-text cataloging/searching method for a text searching system, in which a set of texts is searched for specific text contents, comprising the following steps:
an already-analyzed-text data generating/cataloging step of cataloging, in a text database, already-analyzed-text data obtained from an analysis of a logical structure of a text to be cataloged, said already-analyzed-text data generating/cataloging step being performed for a plurality of texts to be cataloged;
a structure-index creating step of creating a structure index, by sequentially superposing logical structures of said plurality of texts cataloged in said already-analyzed-text data generating/cataloging step;
wherein said structure index has a tree-like structure composed of a plurality of metanodes;
wherein a context identifier that uniquely identifies one of said metanodes is assigned to each metanode of said structure index; and
wherein a group of structure elements having the same position of appearance and the same element type for a plurality of texts are represented by a single metanode;
a character-string-index updating step comprising the sub-steps of:
extracting partial character strings each having a predetermined character count from each of a plurality of texts to be cataloged; and
updating a character string index by cataloging an associative relation between each of said partial character strings and structured character position information of that partial character string in said character string index;
a structure-condition judging step of searching the structure index for a set of context identifiers satisfying a specific structure condition;
a structured-character-position-information extracting step of extracting partial character strings from a search term, each extracted partial character string having a predetermined character count, and searching the character string index for a set of pieces of structured-character-position information matching said extracted partial character strings; and
an index searching step of searching said set of pieces of structured-character-position information for specific pieces of structured-character-position information that have context identifiers found at said structure-condition judging step, and that have a positional relation among said specific pieces of structured-character-position information matching an order of arrangements of said partial character strings in said search term.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
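The character-string-index updating step and the index searching step of claim 1 can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the predetermined character count is fixed at 2, structured-character-position information is modeled as a `(text_id, context_id, position)` tuple, positions are character offsets counted across the strings of one text in document order, and the set of context identifiers produced by the structure-condition judging step is passed in directly. `CharStringIndex`, `catalog`, and `search` are hypothetical names.

```python
from collections import defaultdict

N = 2  # assumed "predetermined character count"


def ngrams(s, n=N):
    """Return (offset, partial character string) pairs of length n."""
    return [(i, s[i:i + n]) for i in range(len(s) - n + 1)]


class CharStringIndex:
    def __init__(self):
        # partial string -> set of (text_id, context_id, position)
        self.postings = defaultdict(set)

    def catalog(self, text_id, structured_full_text):
        """structured_full_text: (context_id, string) pairs in document order."""
        base = 0  # running character offset within this text
        for context_id, string in structured_full_text:
            for offset, gram in ngrams(string):
                self.postings[gram].add((text_id, context_id, base + offset))
            base += len(string)

    def search(self, term, allowed_context_ids):
        """Find occurrences of term inside the allowed contexts."""
        grams = ngrams(term)
        if not grams:
            return []  # term shorter than N: not handled in this sketch
        hits = []
        for text_id, ctx, pos in self.postings.get(grams[0][1], ()):
            if ctx not in allowed_context_ids:
                continue
            # Every later partial string must sit at the matching offset
            # in the same text and the same context.
            if all((text_id, ctx, pos + k) in self.postings.get(g, ())
                   for k, g in grams):
                hits.append((text_id, ctx, pos))
        return sorted(hits)


index = CharStringIndex()
index.catalog("t1", [(4, "text search"), (6, "search index")])
index.catalog("t2", [(4, "research")])
title_hits = index.search("search", {4})
```

Filtering on the context identifier before checking adjacency keeps the structure condition and the string condition in a single pass over the postings of the first partial string.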
9. A text searching system, comprising:
a text cataloging sub-system in which a plurality of texts are cataloged for use as text search objects in a text search operation;
a text searching server that causes a text database to be searched in response to a text search request;
a text searching client from which a text search request is issued to the text searching server; and
a network connecting the text cataloging sub-system, the text searching server, and the text searching client;
wherein the text cataloging sub-system creates a structure index based on the cataloging of the plurality of texts, the structure index containing context identifiers of character-string data derived from the cataloged texts;
wherein said structure index has a tree-like structure composed of a plurality of metanodes;
wherein a context identifier that uniquely identifies one of said metanodes is assigned to each metanode of said structure index;
wherein a group of structure elements having the same position of appearance and the same element type for a plurality of texts are represented by a single metanode;
wherein the text search request issued by the text searching client includes a search condition that is translated into a condition specification by the text searching server, from which condition specification the text searching server causes the structure index to be searched for agreement between the context identifiers and the search condition; and
wherein the text searching server transmits a text search result to the text searching client upon completion of the search of the structure index.
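How a server might translate a search condition into a condition specification and match it against context identifiers can be sketched briefly. The claim does not define the condition syntax, so everything here is an assumption: the condition is a slash-separated path of element types such as `doc/title`, the structure index is flattened into a mapping from context identifier to the path of element types leading to that metanode, and a metanode satisfies the condition when the specification occurs as a contiguous run inside its path. The function names are hypothetical.

```python
def translate(condition):
    """Turn a path-like search condition into a condition specification."""
    return tuple(part for part in condition.strip("/").split("/") if part)


def matches(path, spec):
    """True if spec occurs as a contiguous run of element types in path."""
    n = len(spec)
    return any(path[i:i + n] == spec for i in range(len(path) - n + 1))


def judge_structure_condition(context_paths, condition):
    """Return the context identifiers whose metanode path satisfies condition."""
    spec = translate(condition)
    return {cid for cid, path in context_paths.items() if matches(path, spec)}


# Flattened view of a small structure index: context_id -> element-type path.
context_paths = {
    4: ("doc", "title"),
    6: ("doc", "body"),
    8: ("doc", "body", "p"),
}
body_contexts = judge_structure_condition(context_paths, "body")
```

The resulting set of context identifiers is what a subsequent character-string lookup would use to restrict matches to the requested part of the document structure.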
10. A text searching system, comprising:
a text cataloging sub-system in which a plurality of texts are cataloged for use as text search objects in a text search operation;
a text searching server that causes a text database to be searched in response to a text search request;
a text searching client from which a text search request is issued to the text searching server; and
a network connecting the text cataloging sub-system, the text searching server, and the text searching client;
wherein the text cataloging sub-system stores and updates a character-string index from texts input to be cataloged;
wherein the character-string index includes:
partial character strings extracted from the input texts, each partial character string having a predetermined character count;
character-position information of said partial character strings;
a text identifier for uniquely identifying the text in the text database; and
a context identifier of a metanode representing character-string data including the partial character strings in a structure index; and
wherein the text cataloging sub-system updates the stored character-string index by generating structured-character-position information that includes the character-position information, the text identifier, and the context identifier, and by cataloging an associative relation between each of the partial character strings and the structured-character-position information in the stored character-string index.
Specification