Composite storage
Abstract
The present application relates in general to information retrieval and repository management systems and, in particular, to composite storage that efficiently and compactly stores all grammatical information, including text and non-text information, about a document or a set of documents, as well as various measures based on the language used within such documents, so that a device of any size can manage the information it requires to perform its functions.
13 Claims
1. A method of storing data, including natural language information, the method comprising:
- receiving at a computer, input data as an input text stream;
- parsing, by the computer, the input data to form a term unit matrix, the term unit matrix comprising a plurality of term units, wherein at least one of the plurality of term units comprises a linguistic word, the linguistic word comprises text content of the input text stream, the linguistic word comprises plural characters, each term unit is one of a functional term unit or a content term unit, and the term unit matrix is a substantially lossless representation of the input data;
- forming, based on processing the term unit matrix to transform the data, by the computer, a Windex, the Windex having a plurality of integer values corresponding to the plurality of term units of the term unit matrix, wherein the Windex is an ordered index with each entry mapped to respective ones of a plurality of ranges of numbers based on grammar-embedded rules, wherein at least one of the plurality of integer values numerically encodes an entirety of the linguistic word as a single integer value, wherein a first range of the numbers is related with functional term units, and a second range of the numbers is related with content term units;
- transforming, based on the term unit matrix, by the computer, the input data to form an ISetM index wherein the ISetM index is a listing of the plurality of integer values;
- forming based on the Windex, by the computer, a Windex Block Index from the transformed input data, wherein the Windex Block Index is a listing of the plurality of integer values and their respective locations in the input data;
- storing, by the computer, the Windex, the ISetM Index, and the Windex Block Index in a memory of the computer, wherein the Windex is different from the ISetM Index, the ISetM Index is different from the Windex Block Index, and the Windex is different from the Windex Block Index;
- processing, by the computer, the Windex, the ISetM Index, and the Windex Block Index from the memory to form a composite storage of the input data as a model for retrieving natural language information and other types of information found in any type of documentation, wherein the composite storage includes two sections; a number of extra grammatical length or lengths that contain other kinds of information outside the range of language and a ISetM length, which can be divided into sub lengths, that contains most or all the grammar elements from a set of at least one language; and
- performing, by the computer, a search on the input data using the composite storage of the input data to retrieve, based on classifications, a portion of the input data in response to user's input without requiring a full grammatical interpretation of the user's input, wherein the retrieving further includes the filtering the ISetM to determine whether the ISetM is responsive to the user's input, if the ISetM does not contain a value within the first range, regardless of the alphanumeric order of the terms, the ISetM is not responsive to the user's input.
Dependent claims: 2, 3
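The index-building steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the functional-word list, the two numeric ranges, the lowercase normalization, and the names `parse_term_units` and `build_indexes` are all assumptions made for illustration. The claims only require that functional and content term units map to integers in two distinct ranges, and that three different structures (Windex, ISetM Index, Windex Block Index) are produced.

```python
import re
from collections import defaultdict

# Assumed for illustration: a tiny functional-word list and two integer ranges.
FUNCTIONAL_WORDS = {"the", "a", "an", "of", "in", "on", "and", "is", "to"}
FUNCTIONAL_RANGE = range(1, 1000)        # assumed "first range"
CONTENT_RANGE = range(1000, 1_000_000)   # assumed "second range"


def parse_term_units(text):
    """Split text into term units, keeping punctuation and whitespace so that
    joining the units reproduces the input exactly (lossless)."""
    return re.findall(r"\w+|\W+", text)


def build_indexes(text):
    """Return (term_units, windex, isetm, block_index) for the input text."""
    units = parse_term_units(text)
    windex = {}                       # word -> single integer value
    next_functional = FUNCTIONAL_RANGE.start
    next_content = CONTENT_RANGE.start
    isetm = []                        # listing of integer values, in input order
    block_index = defaultdict(list)   # integer value -> character offsets
    offset = 0
    for unit in units:
        if re.fullmatch(r"\w+", unit):        # a linguistic word
            key = unit.lower()                # assumed normalization
            if key not in windex:
                if key in FUNCTIONAL_WORDS:
                    windex[key] = next_functional
                    next_functional += 1
                else:
                    windex[key] = next_content
                    next_content += 1
            code = windex[key]
            isetm.append(code)
            block_index[code].append(offset)
        offset += len(unit)
    return units, windex, isetm, dict(block_index)


if __name__ == "__main__":
    units, windex, isetm, block_index = build_indexes("The cat sat on the mat.")
    print(windex)
    print(isetm)
    print(block_index)
```

In this sketch each whole word is encoded as one integer, the term units alone are enough to rebuild the input, and the three indexes are distinct structures: a word-to-integer mapping, a listing of integers, and a listing of integers with their locations.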
4. A computing device comprising:
- one or more processors; and
- a non-transitory computer-readable medium storing programming instructions that are executed by the one or more processors to:
- receive at a computer input data comprising at least one document;
- parse the input data to form a term unit matrix, the term unit matrix comprising a plurality of term units, wherein at least one of the plurality of term units comprises a linguistic word, the linguistic word comprises plural characters, each term unit is one of a functional term unit or a content term unit, and the term unit matrix is a substantially lossless representation of the input data;
- form, based on processing the term unit matrix to transform the input data, a Windex, wherein the Windex comprises a plurality of integer values that are determined based on grammar-embedded rules, wherein at least one of the plurality of integer values numerically encodes an entirety of the linguistic word as a single integer value, wherein the Windex is an ordered index with each entry mapped to one of a plurality of ranges of numbers based on determined grammar function, wherein a first range of the numbers is related with functional term units, and a second range of the numbers is related with content term units;
- form, based on processing the term unit matrix to transform the input data, an ISetM Index, wherein the ISetM Index is a listing of integer value entries of the Windex;
- form, based on the Windex, a Windex Block Index from the transformed input data, wherein the Windex Block Index comprises a listing of integer value entries of the Windex and their respective locations in the input data;
- store the Windex, the ISetM Index, and the Windex Block Index in a memory of the computer;
- process the Windex, the ISetM Index, and the Windex Block Index from the memory to form a composite storage of the input data as a model for retrieving natural language information and other types of information found in any type of documentation, wherein the composite storage is comprised of data in at least two basic sections; a number of extra grammatical length or lengths that contain other kinds of information outside the range of language and a ISetM length, which can be divided into sub lengths, that contains most or all the grammar elements from a set of at least one language, wherein the Windex is different from the ISetM Index, the ISetM Index is different from the Windex Block Index, and the Windex is different from the Windex Block Index; and
- perform a search on the input data using the composite storage of the input data to retrieve, based on classifications, a portion of the input data in response to user's input without requiring a full grammatical interpretation of the user's input, wherein the retrieving further includes the filtering the ISetM to determine whether the ISetM is responsive to the user's input, if the ISetM does not contain a value within the first range, regardless of the alphanumeric order of the terms, the ISetM is not responsive to the user's input.
Dependent claims: 5, 6
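The filtering condition at the end of the search step can be sketched as follows. This is a hedged illustration, not the patented method: it assumes each ISetM is a listing of integer codes for one unit of text (for example, a sentence), that the first range is 1 to 999, and that a query has already been converted to integer codes. The second test (all query codes present) is an added assumption; the claim itself only states the first-range rejection rule. The names `is_responsive` and `search` are hypothetical.

```python
FUNCTIONAL_RANGE = range(1, 1000)  # assumed "first range" (functional term units)


def is_responsive(isetm, query_codes, functional_range=FUNCTIONAL_RANGE):
    """Return False if the ISetM has no value in the first range; otherwise
    check, ignoring term order, that every query code appears in it."""
    if not any(code in functional_range for code in isetm):
        return False                        # rejection rule stated in the claim
    # Assumed matching rule: set containment, so term order does not matter.
    return set(query_codes).issubset(isetm)


def search(isetms, query_codes):
    """Return the positions of the ISetMs that pass the filter."""
    return [i for i, isetm in enumerate(isetms) if is_responsive(isetm, query_codes)]


if __name__ == "__main__":
    isetms = [
        [1, 1000, 1001],   # has a functional code and both query codes
        [1000, 1001],      # no value in the first range: rejected
        [2, 1001, 1000],   # same terms in a different order: accepted
    ]
    print(search(isetms, [1001, 1000]))
```

Because the check works on integer ranges and set membership, no grammatical parse of the query is needed, which matches the claim language about not requiring a full grammatical interpretation of the user's input.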
7. A non-transitory computer-readable medium storing programming instructions that are executed by a processor of a computer to:
- receive at the computer input data comprising at least one document;
- parse the input data to form a term unit matrix, the term unit matrix comprising a plurality of term units, wherein at least one of the plurality of term units comprises a linguistic word, the linguistic word comprises plural characters, each term unit is one of a functional term unit or a content term unit, and the term unit matrix is a substantially lossless representation of the input data;
- form, based on processing the term unit matrix to transform the input data, a Windex, wherein the Windex comprises a plurality of integer values that are determined based on grammar-embedded rules, wherein at least one of the plurality of integer values numerically encodes an entirety of the linguistic word as a single integer value, wherein the Windex is an ordered index with each entry mapped to one of a plurality of ranges of numbers based on determined grammar function, wherein a first range of the numbers is related with functional term units, and a second range of the numbers is related with content term units;
- form, based on the processing the term unit matrix to transform the input data, an ISetM Index, wherein the ISetM Index is a listing of integer value entries of the Windex;
- form, based on the Windex, a Windex Block Index from the transformed data, wherein the Windex Block Index comprises a listing of integer value entries of the Windex and their respective locations in the input data;
- store the Windex, the ISetM Index, and the Windex Block Index in a memory of the computer;
- process the Windex, the ISetM Index, and the Windex Block Index from the memory to form a composite storage of the input data as a model for retrieving natural language information and other types of information found in any type of documentation, wherein the composite storage is comprised of data in at least two basic sections; a number of extra grammatical length or lengths that contain other kinds of information outside the range of language and a ISetM length, which can be divided into sub lengths, that contains most or all the grammar elements from a set of at least one language, wherein the Windex is different from the ISetM Index, the ISetM Index is different from the Windex Block Index, and the Windex is different from the Windex Block Index; and
- perform a search on the input data using the composite storage of the input data to retrieve, based on classifications, a portion of the input data in response to user's input without requiring a full grammatical interpretation of the user's input, wherein the retrieving further includes the filtering the ISetM to determine whether the ISetM is responsive to the user's input, if the ISetM does not contain a value within the first range, regardless of the alphanumeric order of the terms, the ISetM is not responsive to the user's input.
Dependent claims: 8, 9, 10, 11, 12, 13
Specification