High accuracy document information-element vector encoding server
Abstract
Some embodiments of a high-accuracy document information element-vector (IE-vector) encoding server are presented. In one embodiment, the server applies a finite state automaton (FSA) to parse a document and identify one or more information elements (IEs) in it. A DNA sequence of the document is then derived from those IEs. The concept of a document's DNA sequence is powerful: it can be used to build automated tools, such as computer-based processes that reason about and search for similarity, dissimilarity, equivalence, and other relationships among structured, semi-structured, and unstructured data and information. The DNA sequence of a document thus provides a powerful paradigm for building sophisticated information and data search and retrieval techniques and tools.
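The two steps the abstract names (FSA parsing into IEs, then deriving a symbolic sequence) can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the patent: the two IE types (`WORD`, `NUMBER`), the one-letter symbols, and the function names `parse_ies` and `derive_sequence` are all hypothetical. The sketch also makes no attempt to guarantee that the resulting sequence is unique to a document.

```python
from typing import Dict, List, Tuple

# Hypothetical IE types and symbols, chosen only for illustration.
SYMBOLS: Dict[str, str] = {"WORD": "W", "NUMBER": "N"}


def classify(ch: str) -> str:
    """Map a character to one of the automaton's input classes."""
    if ch.isalpha():
        return "alpha"
    if ch.isdigit():
        return "digit"
    return "other"


# Transition table of the automaton: (state, input class) -> next state.
NEXT_BY_CLASS = {"alpha": "WORD", "digit": "NUMBER", "other": "START"}
TRANSITIONS: Dict[Tuple[str, str], str] = {
    (state, cls): target
    for state in ("START", "WORD", "NUMBER")
    for cls, target in NEXT_BY_CLASS.items()
}


def parse_ies(text: str) -> List[Tuple[str, str]]:
    """Run the automaton over text and return (IE type, value) pairs."""
    ies: List[Tuple[str, str]] = []
    state = "START"
    buf: List[str] = []
    for ch in text + " ":  # trailing space acts as a sentinel to flush the last IE
        nxt = TRANSITIONS[(state, classify(ch))]
        if nxt != state:
            if state != "START":
                ies.append((state, "".join(buf)))  # leaving an IE state emits it
            buf = []
        if nxt != "START":
            buf.append(ch)
        state = nxt
    return ies


def derive_sequence(ies: List[Tuple[str, str]]) -> str:
    """Concatenate one symbol per IE, in document order."""
    return "".join(SYMBOLS[kind] for kind, _ in ies)


print(derive_sequence(parse_ies("Order 66 shipped")))  # WNW
```

Two documents with the same pattern of IE types produce the same string here, so a real encoding would need to fold in more of each IE's content; the sketch only shows the shape of the pipeline.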
20 Claims
1. A computer-implemented method comprising:
applying a finite state automaton (FSA) to parse a document to identify one or more information elements (IEs) in the document; and
deriving a unique symbolic sequence particular to the document based on the one or more IEs contained in the document, such unique symbolic sequence being analogous to the DeoxyriboNucleic Acid (DNA) sequence in animals and/or plants.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A machine-readable medium that provides information encoding comprising:
a plurality of information element (IE) definitions, wherein each of the plurality of IE definitions corresponds to a basic building block of documents that encapsulates a predetermined type of information; and
a plurality of data structures, each of the plurality of data structures being associated with a distinct one of the plurality of IE definitions.
Dependent claims: 11, 12, 13
14. An apparatus comprising:
a finite state machine to parse a document to identify one or more information elements (IEs) in the document; and
a DeoxyriboNucleic Acid (DNA) generator coupled to the finite state machine to derive a DNA sequence of the document based on the one or more IEs.
Dependent claims: 15, 16, 17, 18, 19, 20
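The arrangement recited in claim 10, a set of IE definitions with each definition tied to its own data structure, might be modeled roughly as follows. This is only a sketch under stated assumptions: the class names `IEDefinition`, `IERecord`, and `IERegistry`, the `NUMBER` example type, and the use of a plain list as the per-definition data structure are all hypothetical and do not come from the claims.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class IEDefinition:
    """A basic building block of documents holding one predetermined type of information."""
    name: str
    description: str


@dataclass
class IERecord:
    """The data structure associated with exactly one IE definition."""
    definition: IEDefinition
    values: List[str] = field(default_factory=list)


class IERegistry:
    """Keeps one IERecord per IE definition, keyed by the definition's name."""

    def __init__(self) -> None:
        self._records: Dict[str, IERecord] = {}

    def define(self, name: str, description: str) -> IERecord:
        # Each data structure belongs to a distinct definition, so reject duplicates.
        if name in self._records:
            raise ValueError(f"IE definition already exists: {name}")
        record = IERecord(IEDefinition(name, description))
        self._records[name] = record
        return record

    def add(self, name: str, value: str) -> None:
        """Store a value found in a document under its IE definition."""
        self._records[name].values.append(value)

    def values(self, name: str) -> List[str]:
        """Return a copy of the values stored for one IE definition."""
        return list(self._records[name].values)


# Usage with a hypothetical NUMBER definition.
registry = IERegistry()
registry.define("NUMBER", "a run of decimal digits")
registry.add("NUMBER", "66")
print(registry.values("NUMBER"))  # ['66']
```

Rejecting a second definition with the same name is one simple way to keep the one-to-one pairing between definitions and data structures that the claim describes.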
Specification