Recognizer of text-based work
First Claim
1. A computer-implemented method for hashing a body of text, the method comprising:
obtaining a body of text;
deriving a hash value representative of content of the body of text, perceptually distinct bodies of text having hash values that are substantially independent of each other.
Abstract
Described herein is a technology for recognizing the content of text documents. The technology determines one or more hash values for the content of a text document. Alternatively, the technology may generate a “sifted text” version of a document. In one implementation described herein, document recognition is used to determine whether the content of one document is copied (i.e., plagiarized) from another document. This is done by comparing hash values of documents (or alternatively their sifted text). In another implementation described herein, document recognition is used to categorize the content of a document so that it may be grouped with other documents in the same category. This abstract itself is not intended to limit the scope of this patent. The scope of the present invention is pointed out in the appended claims.
66 Claims
1. A computer-implemented method for hashing a body of text, the method comprising:
obtaining a body of text;
deriving a hash value representative of content of the body of text, perceptually distinct bodies of text having hash values that are substantially independent of each other.
Dependent claims: 2, 3, 4, 5, 6, 7
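As a rough illustration of claim 1, the sketch below hashes the normalized content of a text so that formatting-only variants collide while texts with different words get unrelated digests. The normalization rule (lowercase, alphanumeric words only) and the choice of SHA-256 are assumptions made for this example; the claim itself names neither.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Reduce a text to lowercase alphanumeric words joined by single spaces.

    This is an assumed stand-in for "perceptually the same"; the claim
    does not define the notion.
    """
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def content_hash(text: str) -> str:
    """Hash the normalized content with SHA-256 (an assumed hash function)."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions at which two equal-length hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


a = content_hash("The quick brown fox.")
b = content_hash("the  QUICK brown fox")   # same words, different formatting
c = content_hash("The quick brown dog.")   # one word changed

print(a == b)                # formatting-only variants share a hash
print(differing_bits(a, c))  # distinct texts differ in many of the 256 bits
```

A cryptographic hash gives the "substantially independent" behavior for distinct inputs, but it offers no notion of closeness between similar texts.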
8. A method for facilitating recognition of content of a body of text, the method comprising:
filtering the content of a body of text to remove elements of the content;
determining a recognition representation of the content of such body based upon the filtered subtext.
Dependent claims: 9, 10, 11, 12, 13, 14, 16, 17, 18, 19
15. A computer-implemented method for hashing a body of text, the method comprising:
obtaining a body of text;
deriving a hash value representative of the body of text, perceptually similar bodies of text having proximally similar hash values.
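One way to picture "proximally similar hash values" in claim 15 is a locality-sensitive fingerprint. The sketch below uses SimHash, a well-known technique swapped in for illustration only; the claim does not name it, and the tokenization and 64-bit width are assumptions of this example.

```python
import hashlib
import re

BITS = 64  # fingerprint width, chosen for this sketch


def simhash(text: str) -> int:
    """Compute a 64-bit SimHash fingerprint over lowercase word tokens."""
    counts = [0] * BITS
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # Take the first 8 bytes of SHA-256 as a 64-bit token hash.
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[:8], "big")
        # Each token votes +1 or -1 on every bit position.
        for i in range(BITS):
            counts[i] += 1 if (token_hash >> i) & 1 else -1
    fingerprint = 0
    for i, count in enumerate(counts):
        if count > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a: int, b: int) -> int:
    """Number of bit positions at which two fingerprints differ."""
    return bin(a ^ b).count("1")


base = "the committee approved the annual budget after a long and careful debate"
near = "the committee approved the annual budget after a long and heated debate"
far = "penguins huddle together on antarctic ice during harsh winter storms"

print(hamming(simhash(base), simhash(near)))  # usually a small distance
print(hamming(simhash(base), simhash(far)))   # usually a much larger distance
```

Because each token only nudges the per-bit vote totals, changing one word tends to flip few bits, so Hamming distance serves as the proximity measure.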
20. A method for facilitating recognition of content of a body of text, the method comprising:
obtaining a body of text;
determining a self-synchronized recognition representation of the content of such body.
Dependent claims: 21, 22, 23, 24, 25, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 46, 47, 48, 49, 50
26. A method for facilitating recognition of content of a body of text, the method comprising:
filtering the content of a body of text to select a subset of content of such body;
determining a recognition representation of the content of such body based upon the selected subtext.
45. A method for facilitating detection of textual similarity, the method comprising:
comparing recognition representations of text of at least two bodies of text, wherein such recognition representations are computed by:
text sifting text of the bodies of text to select a subset of text for each body;
determining such recognition representation of the text for each body based upon the selected subtext of each body;
indicating a match if recognition representations of the text of at least two of the bodies substantially match.
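The three steps of claim 45 (sift, represent, compare) can be sketched as follows. Everything concrete here is an assumption for illustration: the sifting rule (drop punctuation and a small stop-word list), the representation (a set of hashed three-word shingles), and the 0.5 Jaccard threshold standing in for "substantially match".

```python
import hashlib
import re

# Hypothetical stop-word list; the claim does not specify what sifting removes.
STOP_WORDS = frozenset({"a", "an", "and", "in", "is", "of", "on", "the", "to"})


def sift(text: str) -> list[str]:
    """Select a subset of the text: lowercase words minus stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def representation(text: str, k: int = 3) -> set[str]:
    """Build a set of hashed k-word shingles from the sifted subtext."""
    words = sift(text)
    shingles = (" ".join(words[i:i + k]) for i in range(len(words) - k + 1))
    return {hashlib.sha256(s.encode("utf-8")).hexdigest()[:16] for s in shingles}


def substantially_match(a: str, b: str, threshold: float = 0.5) -> bool:
    """Indicate a match when the Jaccard overlap reaches the threshold."""
    rep_a, rep_b = representation(a), representation(b)
    if not rep_a or not rep_b:
        return False  # too little sifted text to compare
    jaccard = len(rep_a & rep_b) / len(rep_a | rep_b)
    return jaccard >= threshold


original = "The cat sat on the mat in the warm afternoon sun."
copied = "the cat sat on a mat, in the warm afternoon sun!"
unrelated = "Stock markets fell sharply after the announcement."

print(substantially_match(original, copied))     # True
print(substantially_match(original, unrelated))  # False
```

Comparing sets of shingles rather than one whole-document hash lets a partial copy still register as a match, at the cost of a tunable threshold.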
51. A method of manipulating content of a source body of text, the method comprising:
obtaining a source body of text;
generating content of a target body of text by deriving the content of the target body from the source body;
wherein the content of the target body has a self-synchronized recognition representation that does not substantially match a self-synchronized recognition representation of the content of the source body.
Dependent claims: 52, 53
54. A text recognition system, comprising:
a text retriever for obtaining a body of text;
a text sifter for selecting a subset of text of such body;
a recognition representation determiner for determining a recognition representation of the text of such body based upon the selected subtext.
Dependent claims: 55, 56, 57, 58, 59, 60, 61, 62
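The three components of claim 54 map naturally onto a small pipeline. The sketch below is one hypothetical arrangement: the class name, the function names, the "keep words of four or more letters" sifting rule, and the SHA-256 determiner are all assumptions introduced for this example.

```python
import hashlib
import re
from dataclasses import dataclass
from typing import Callable


def retrieve_from_string(source: str) -> str:
    """Text retriever: in this sketch the source already is the body of text."""
    return source


def sift_long_words(text: str) -> str:
    """Text sifter: keep only runs of four or more letters (an arbitrary rule)."""
    return " ".join(re.findall(r"[a-z]{4,}", text.lower()))


def sha256_hex(subtext: str) -> str:
    """Recognition representation determiner: SHA-256 of the selected subtext."""
    return hashlib.sha256(subtext.encode("utf-8")).hexdigest()


@dataclass
class TextRecognitionSystem:
    """Wires a retriever, a sifter, and a determiner into one pipeline."""

    retriever: Callable[[str], str]
    sifter: Callable[[str], str]
    determiner: Callable[[str], str]

    def recognize(self, source: str) -> str:
        body = self.retriever(source)
        subtext = self.sifter(body)
        return self.determiner(subtext)


system = TextRecognitionSystem(retrieve_from_string, sift_long_words, sha256_hex)
print(system.recognize("Hashing text, quickly!"))
print(system.recognize("HASHING the text... quickly"))  # same sifted subtext
```

Keeping each component as a plain callable means a different retriever (a file reader, for instance) or a different determiner can be substituted without touching the pipeline.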
63. A computer-readable medium having stored thereon a data structure, comprising:
a first data field containing a body of text;
a second data field derived from the first field by text sifting the text of such body to select a subset of text of such body and determining a recognition representation of the text of such body based upon the selected subtext;
a third data field functioning to delimit the end of the data structure.
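A minimal sketch of the three-field data structure in claim 63 might look like the following. The field names, the sifting rule, the SHA-256 representation, and the use of ASCII separator characters as delimiters are assumptions of this example, not details taken from the claim.

```python
import hashlib
import re
from dataclasses import dataclass

FIELD_SEPARATOR = "\x1f"  # ASCII unit separator between fields (assumed)
END_MARKER = "\x1e"       # ASCII record separator marking the end (assumed)


@dataclass(frozen=True)
class RecognizedText:
    body: str                # first field: the body of text
    representation: str      # second field: derived from the first field
    end: str = END_MARKER    # third field: delimits the end of the structure

    @classmethod
    def from_body(cls, body: str) -> "RecognizedText":
        """Derive the second field by sifting the body and hashing the result."""
        sifted = " ".join(re.findall(r"[a-z0-9]+", body.lower()))
        digest = hashlib.sha256(sifted.encode("utf-8")).hexdigest()
        return cls(body=body, representation=digest)

    def serialize(self) -> str:
        """Lay the three fields out in order as a single string."""
        return f"{self.body}{FIELD_SEPARATOR}{self.representation}{self.end}"


record = RecognizedText.from_body("Hello, World!")
print(record.representation)
print(repr(record.serialize()[-1]))  # the end-of-structure marker
```

Storing the representation next to the body lets a reader compare documents without re-sifting them.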
64. A computer-readable medium having computer-executable instructions that, when executed by a computer, performs the method comprising:
obtaining a body of text;
deriving a hash value representative of content of the body of text, perceptually distinct bodies of text having hash values that are substantially independent of each other.
65. A computer-readable medium having computer-executable instructions that, when executed by a computer, performs the method comprising:
obtaining a body of text;
deriving a hash value representative of the body of text, perceptually similar bodies of text having proximally similar hash values.
66. A computer-readable medium having computer-executable instructions that, when executed by a computer, performs the method comprising:
obtaining a body of text;
text sifting the text of such body to select a subset of text of such body;
determining a recognition representation of the text of such body based upon the selected subtext.
Specification