Method and system for assessing copyright fees based on the content being copied
Abstract
The described system makes it possible to charge copy fees related to the amount of copyrighted material being copied and to provide those fees to the appropriate copyright holder. The scanned information is passed through an OCR filter that produces a stream of text, which is then passed to a full-text search service that identifies matching passages in its index. Sufficiently long passages in the copied document that match previously indexed documents held by the service are treated as copyrighted material. In addition, the scanned image may be processed to identify instances of copyrighted images present in the scan.
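The flow summarized in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the in-memory window index, the names `build_index` and `assess_fees`, the 8-word threshold for a "sufficiently long" passage, and the flat per-word rate are all assumptions chosen for illustration, and the OCR step is taken as already done.

```python
MIN_PASSAGE_WORDS = 8   # assumed threshold for a "sufficiently long" passage
FEE_PER_WORD = 0.01     # assumed flat rate, purely illustrative


def build_index(documents, n=MIN_PASSAGE_WORDS):
    """Map every n-word window of each indexed document to its copyright holder.

    `documents` is a hypothetical {holder: full_text} mapping standing in for
    the full-text search service's index.
    """
    index = {}
    for holder, text in documents.items():
        words = text.lower().split()
        for i in range(len(words) - n + 1):
            index[tuple(words[i:i + n])] = holder
    return index


def assess_fees(ocr_text, index, n=MIN_PASSAGE_WORDS, rate=FEE_PER_WORD):
    """Return {holder: fee}, charging per word covered by a matching window."""
    words = ocr_text.lower().split()
    covered = {}  # word position -> holder of the first window covering it
    for i in range(len(words) - n + 1):
        holder = index.get(tuple(words[i:i + n]))
        if holder is not None:
            for j in range(i, i + n):
                covered.setdefault(j, holder)
    counts = {}
    for holder in covered.values():
        counts[holder] = counts.get(holder, 0) + 1
    return {holder: round(count * rate, 2) for holder, count in counts.items()}
```

Because only windows of at least `MIN_PASSAGE_WORDS` words can match, short coincidental overlaps are not charged, and the fee grows with the number of copied words that fall inside matching passages.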
30 Claims
1. A system for assessing copyright fees based on the content being copied, comprising:
a processor;
a scanning module operable to scan a document comprising at least one page;
a content identifying module operable to identify a content on each scanned page of the document and comprising an Optical Character Recognition (OCR) engine operable to extract a stream of text from each scanned page of the document; and
a copyright holder identifying module operable to identify a copyright holder of the identified content;
wherein the identifying a copyright holder of the identified content comprises:
processing the stream of text into contiguous text segments;
forming a separate query for each of the contiguous text segments; and
searching a copyrighted content database for matching copyrighted content based on the query;
wherein the processing the stream of text into contiguous text segments is based on textual coherence determined in accordance with linguistic analysis of the scanned text.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
16. A method for assessing copyright fees based on the content being copied, comprising:
a. scanning a document comprising at least one page;
b. identifying a content on each scanned page of the document by performing an Optical Character Recognition (OCR) to extract a stream of text from each scanned page of the document; and
c. utilizing a processor to execute a process for identifying a copyright holder of the identified content;
wherein the process for identifying a copyright holder of the identified content comprises:
processing the stream of text into contiguous text segments;
forming a separate query for each of the contiguous text segments; and
searching a copyrighted content database for matching copyrighted content based on the query;
wherein the processing the stream of text into contiguous text segments is based on textual coherence determined in accordance with linguistic analysis of the scanned text.
Dependent claims: 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29
30. A computer programming product embodied on a non-transitory computer readable medium for assessing copyright fees based on the content being copied, comprising:
a. Code for scanning a document comprising at least one page;
b. Code for identifying a content on each scanned page of the document by performing an Optical Character Recognition (OCR) to extract a stream of text from each scanned page of the document; and
c. Code for identifying a copyright holder of the identified content;
wherein the identifying a copyright holder of the identified content comprises:
processing the stream of text into contiguous text segments;
forming a separate query for each of the contiguous text segments; and
searching a copyrighted content database for matching copyrighted content based on the query;
wherein the processing the stream of text into contiguous text segments is based on textual coherence determined in accordance with linguistic analysis of the scanned text.
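The identifying steps shared by the independent claims (segmenting the text stream by textual coherence, forming one query per segment, and searching a copyrighted content database) can be sketched as follows. The claims do not say which linguistic analysis is used, so this sketch substitutes a simple stand-in: adjacent sentences stay in the same segment when their content words overlap enough (Jaccard similarity). The stopword list, the threshold, the names `segment` and `search`, and the dictionary used as a database are all hypothetical.

```python
import re

# Assumed, deliberately tiny stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "on"}


def content_words(sentence):
    """Lowercased alphabetic tokens of a sentence, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS}


def segment(stream, threshold=0.1):
    """Group consecutive sentences into contiguous, lexically coherent segments."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", stream) if s.strip()]
    segments = []
    for sentence in sentences:
        if segments:
            prev = content_words(segments[-1][-1])
            cur = content_words(sentence)
            union = prev | cur
            overlap = len(prev & cur) / len(union) if union else 0.0
            if overlap >= threshold:
                # Coherent with the previous sentence: extend the open segment.
                segments[-1].append(sentence)
                continue
        # Low overlap (or first sentence): start a new segment.
        segments.append([sentence])
    return [" ".join(seg) for seg in segments]


def search(segments, database):
    """Form one query per segment and look it up in {holder: text}.

    Returns a list of (segment, holder or None) pairs.
    """
    results = []
    for seg in segments:
        query = seg.lower()  # the separate query formed for this segment
        holder = next(
            (h for h, text in database.items() if query in text.lower()), None
        )
        results.append((seg, holder))
    return results
```

Querying per segment rather than per page means a topic shift in the middle of a page yields separate lookups, so text from two different sources on one page can be attributed to different holders. The exact-substring lookup is the simplest possible matcher; OCR noise would call for approximate matching in practice.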
Specification