Full text indexing in a database system
First Claim
1. A processor-implemented method for indexing with redundant information, the method comprising:
- identifying, by a processor, a plurality of unknown code points for a document in response to an indexing request for the document, wherein a code point is a numerical value that is assigned to a character within a codepage, wherein the codepage is a table of numerical values that defines a character set, wherein identifying the plurality of unknown code points for the document further includes;
receiving, by the processor, the document, wherein the document includes one or more attachments;
parsing, utilizing multiple codepages, each of the one or more attachments for the plurality of unknown code points, wherein the unknown code points are within at least one of the one or more attachments;
converting the identified plurality of unknown code points into a plurality of converted code points, wherein each of the plurality of converted code points uses a different codepage, wherein the converting the identified plurality of unknown code points into the plurality of converted code points further comprises;
converting the identified plurality of unknown code points into a first set of converted code points with a first codepage; and
converting the identified plurality of unknown code points into a second set of converted code points with a second codepage;
identifying sets of same code points and sets of redundant code points from the plurality of converted code points;
building an index based on the identified sets of same code points and the identified sets of redundant code points; and
retaining the sets of same code points and the sets of redundant code points, wherein the retaining the sets of same code points further comprises;
retaining a first set of same code points from the first set of converted code points; and
retaining a second set of same code points from the second set of converted code points.
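The converting and identifying steps above can be sketched in Python as follows. This is only one possible reading of the claim, not the patented implementation: it assumes that "same" code points are positions where every codepage yields the same character, and that "redundant" code points are positions where the codepages disagree, so each alternative is kept. The function names and the choice of `latin-1` and `cp1251` as the first and second codepage are illustrative assumptions.

```python
def convert_with_codepages(raw, codepages):
    """Decode the same unknown bytes once per candidate codepage."""
    # errors="replace" keeps positions aligned even for undefined bytes.
    return {cp: raw.decode(cp, errors="replace") for cp in codepages}


def split_same_and_redundant(converted):
    """Split positions into 'same' (all codepages agree) and 'redundant'.

    Assumes single-byte codepages, so character positions line up
    across the decoded strings (an assumption of this sketch).
    """
    same = []
    redundant = {cp: [] for cp in converted}
    for pos, chars in enumerate(zip(*converted.values())):
        if len(set(chars)) == 1:
            same.append((pos, chars[0]))
        else:
            # Keep every alternative: one entry per codepage.
            for cp, ch in zip(converted, chars):
                redundant[cp].append((pos, ch))
    return same, redundant


raw = b"caf\xe9"  # bytes whose codepage is not known in advance
converted = convert_with_codepages(raw, ["latin-1", "cp1251"])
same, redundant = split_same_and_redundant(converted)
print(same)
print(redundant)
```

In this example the ASCII bytes decode identically under both codepages, while byte 0xE9 becomes a different character in each, so both alternatives are retained.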
Abstract
A method for indexing with redundant information. The method may identify unknown code points for a document in response to an indexing request for the document. The method may further convert the identified unknown code points into a plurality of converted code points. Each set of converted code points of the plurality uses a different codepage. The method may further identify sets of same code points and sets of redundant code points from the plurality of converted code points. The method may build an index based on the sets of same code points and the sets of redundant code points.
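A minimal sketch of the index-building idea in the abstract, assuming a simple inverted index keyed by whitespace-separated tokens. The function name `build_index`, the tokenization, and the two candidate codepages are hypothetical choices for illustration; the source does not specify them. Each document is decoded under every candidate codepage and all resulting tokens are indexed, so a query can match whichever decoding turns out to be correct.

```python
from collections import defaultdict

# Candidate codepages to try for every document (assumed for this sketch).
CODEPAGES = ("latin-1", "cp1251")


def build_index(docs):
    """Map each token to the set of document ids that contain it.

    docs: dict of doc_id -> raw bytes with unknown encoding.
    Tokens that decode identically under all codepages land in the
    index once; tokens that differ are indexed once per decoding.
    """
    index = defaultdict(set)
    for doc_id, raw in docs.items():
        for cp in CODEPAGES:
            text = raw.decode(cp, errors="replace")
            for token in text.lower().split():
                index[token].add(doc_id)
    return index


docs = {
    1: "café menu".encode("latin-1"),
    2: "привет menu".encode("cp1251"),
}
index = build_index(docs)
print(sorted(index["menu"]))
print(sorted(index["привет"]))
```

The cost of this design is a larger index, since each ambiguous token is stored once per codepage; the benefit is that no guess about the true encoding has to be made at indexing time.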
5 Claims
1. A processor-implemented method for indexing with redundant information, the method comprising:
- identifying, by a processor, a plurality of unknown code points for a document in response to an indexing request for the document, wherein a code point is a numerical value that is assigned to a character within a codepage, wherein the codepage is a table of numerical values that defines a character set, wherein identifying the plurality of unknown code points for the document further includes;
receiving, by the processor, the document, wherein the document includes one or more attachments;
parsing, utilizing multiple codepages, each of the one or more attachments for the plurality of unknown code points, wherein the unknown code points are within at least one of the one or more attachments;
converting the identified plurality of unknown code points into a plurality of converted code points, wherein each of the plurality of converted code points uses a different codepage, wherein the converting the identified plurality of unknown code points into the plurality of converted code points further comprises;
converting the identified plurality of unknown code points into a first set of converted code points with a first codepage; and
converting the identified plurality of unknown code points into a second set of converted code points with a second codepage;
identifying sets of same code points and sets of redundant code points from the plurality of converted code points;
building an index based on the identified sets of same code points and the identified sets of redundant code points; and
retaining the sets of same code points and the sets of redundant code points, wherein the retaining the sets of same code points further comprises;
retaining a first set of same code points from the first set of converted code points; and
retaining a second set of same code points from the second set of converted code points.
Dependent claims: 2, 3, 4
5. A processor-implemented method for indexing with redundant information, the method comprising:
- identifying, by a processor, a plurality of unknown code points for a document in response to an indexing request for the document, wherein a code point is a numerical value that is assigned to a character within a codepage, wherein the codepage is a table of numerical values that defines a character set, wherein identifying the plurality of unknown code points for the document further includes;
receiving, by the processor, the document, wherein the document includes one or more attachments;
parsing, utilizing multiple codepages, each of the one or more attachments for the plurality of unknown code points, wherein the unknown code points are within at least one of the one or more attachments;
converting the identified plurality of unknown code points into a plurality of converted code points, wherein each of the plurality of converted code points uses a different codepage, wherein the converting the identified plurality of unknown code points into the plurality of converted code points further comprises;
converting the identified plurality of unknown code points into a first set of converted code points with a first codepage; and
converting the identified plurality of unknown code points into a second set of converted code points with a second codepage;
identifying sets of same code points and sets of redundant code points from the plurality of converted code points;
building an index based on the identified sets of same code points and the identified sets of redundant code points; and
assigning a weight to the first set of converted code points and the second set of converted code points;
wherein, in the built index, a redundant index is constructed from the first set of redundant code points and the second set of redundant code points, and is assigned with a same weight as the first set of converted code points and the second set of converted code points.
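The weighting step in claim 5 could be illustrated along the following lines. This is a hedged sketch, not the claimed implementation: it assumes that a weight is a single number stored with each index entry, and that "a same weight" means every entry derived from either codepage receives that identical value. The name `build_weighted_index`, the entry layout, and the default weight of 1.0 are all hypothetical.

```python
def build_weighted_index(docs, codepages=("latin-1", "cp1251"), weight=1.0):
    """Build token -> {(doc_id, codepage): weight}.

    Every entry, whichever codepage produced it, gets the same weight,
    so no decoding is ranked above another (assumption of this sketch).
    """
    index = {}
    for doc_id, raw in docs.items():
        for cp in codepages:
            text = raw.decode(cp, errors="replace")
            for token in text.lower().split():
                index.setdefault(token, {})[(doc_id, cp)] = weight
    return index


index = build_weighted_index({1: b"caf\xe9"})
for token, entries in index.items():
    print(token, entries)
```

Giving the alternatives equal weight keeps ranking neutral when the true encoding is unknown; a system with extra evidence about the encoding could choose differently, but the source does not describe that.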
Specification