METHOD AND APPARATUS FOR DETECTING SENSITIVE CONTENT IN A DOCUMENT
Abstract
One embodiment of the present invention provides a system that detects sensitive content in a document. In doing so, the system receives a document, identifies a set of terms in the document that are candidate sensitive terms, and generates a combination of terms based on the identified terms that is associated with a semantic meaning. Next, the system performs searches through a corpus based on the combination of terms and determines hit counts returned for each term in the combination and for the combination. The system then determines whether the combination of terms is sensitive based on the hit count for the combination and the hit counts for the individual terms in the combination, and generates a result that indicates portions of the document which contain sensitive combinations.
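The hit-count comparison described in the abstract can be illustrated with a minimal Python sketch. The source does not state the exact decision rule, so the ratio test below (combination hits divided by the hit count of the rarest individual term, compared against a threshold) is an assumption made for illustration only. The toy `CORPUS`, the function names `hit_count`, `is_sensitive`, and `sensitive_combinations`, the use of term pairs, and the `threshold` value are likewise hypothetical and not taken from the patent.

```python
from itertools import combinations

# Hypothetical toy corpus standing in for whatever corpus the searches run against.
CORPUS = [
    "acme corp announces merger with globex",
    "acme corp quarterly results",
    "globex opens new office",
    "merger rumors around acme and globex",
    "weather report for tuesday",
]


def hit_count(terms, corpus=CORPUS):
    """Count corpus documents that contain every term in `terms`."""
    return sum(all(t in doc.split() for t in terms) for doc in corpus)


def is_sensitive(combo, threshold=0.5, corpus=CORPUS):
    """Assumed rule: flag a combination when its joint hit count is a large
    fraction of the hit count of its rarest individual term."""
    combo_hits = hit_count(combo, corpus)
    rarest = min(hit_count([t], corpus) for t in combo)
    if rarest == 0:
        return False  # a term with no hits gives no evidence either way
    return combo_hits / rarest >= threshold


def sensitive_combinations(candidate_terms, size=2, threshold=0.5, corpus=CORPUS):
    """Test every `size`-term combination of the candidate terms."""
    return [
        combo
        for combo in combinations(candidate_terms, size)
        if is_sensitive(combo, threshold, corpus)
    ]


flagged = sensitive_combinations(["acme", "globex", "merger", "weather"])
print(flagged)
```

In this sketch a pair such as ("acme", "weather") is not flagged because the two terms never co-occur in the toy corpus, while pairs that frequently appear together are. A real system would replace `hit_count` with queries to an actual search backend and would choose the combination size and threshold to suit the corpus.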
21 Claims
1. A computer-executed method for detecting sensitive content in a document, the method comprising:
- receiving a document;
- identifying a set of terms in the document that are candidate sensitive terms;
- generating a combination of terms, based on the identified terms, that is associated with a semantic meaning;
- performing searches through a corpus based on the combination of terms and determining hit counts returned for each term in the combination and for the combination;
- determining whether the combination of terms is sensitive based on the hit count for the combination and the hit counts for the individual terms in the combination; and
- generating a result that indicates portions of the document which contain sensitive combinations.
Dependent claims: 2, 3, 4, 5, 6, 7

8. A computer-readable storage medium storing instructions that when executed by a computer cause the computer to perform a method for detecting sensitive content in a document, the method comprising:
- receiving a document;
- identifying a set of terms in the document that are candidate sensitive terms;
- generating a combination of terms, based on the identified terms, that is associated with a semantic meaning;
- performing searches through a corpus based on the combination of terms and determining hit counts returned for each term in the combination and for the combination;
- determining whether the combination of terms is sensitive based on the hit count for the combination and the hit counts for the individual terms in the combination; and
- generating a result that indicates portions of the document which contain sensitive combinations.
Dependent claims: 9, 10, 11, 12, 13, 14

15. An apparatus for detecting sensitive content in a document, comprising:
- a receiving mechanism configured to receive a document;
- an analysis mechanism configured to:
  - identify a set of terms in the document that are candidate sensitive terms; and
  - generate a combination of terms, based on the identified terms, that is associated with a semantic meaning; and
- a search engine interface configured to perform searches through a corpus based on the combination of terms and determining hit counts returned for each term in the combination and for the combination;
- wherein the analysis mechanism is further configured to:
  - determine whether the combination of terms is sensitive based on the hit count for the combination and the hit counts for the individual terms in the combination; and
  - generate a result that indicates portions of the document which contain sensitive combinations.
Dependent claims: 16, 17, 18, 19, 20, 21
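
One way to picture how the three claimed components could fit together is the Python sketch below. It is illustrative only: the class names `InMemorySearch`, `AnalysisMechanism`, and `Apparatus`, the fixed watch list used to pick candidate terms, the per-sentence granularity of the result, and the ratio-based decision rule with its threshold are all assumptions, none of which are specified in the claim text.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Protocol, Sequence


class SearchEngineInterface(Protocol):
    """Anything that can report a hit count for a set of terms."""

    def hit_count(self, terms: Sequence[str]) -> int: ...


class InMemorySearch:
    """Hypothetical stand-in for a real search backend."""

    def __init__(self, corpus: Sequence[str]) -> None:
        self._docs = [set(doc.lower().split()) for doc in corpus]

    def hit_count(self, terms: Sequence[str]) -> int:
        return sum(all(t in doc for t in terms) for doc in self._docs)


@dataclass
class AnalysisMechanism:
    search: SearchEngineInterface
    watch_terms: frozenset   # assumption: candidates come from a fixed watch list
    threshold: float = 0.5   # assumption: ratio threshold for flagging

    def candidates(self, text: str) -> list:
        words = {w.strip(".,;:") for w in text.lower().split()}
        return sorted(words & self.watch_terms)

    def is_sensitive(self, combo: Sequence[str]) -> bool:
        combo_hits = self.search.hit_count(combo)
        rarest = min(self.search.hit_count([t]) for t in combo)
        return rarest > 0 and combo_hits / rarest >= self.threshold

    def analyze(self, document: str) -> list:
        """Return (sentence, combination) pairs for each flagged term pair."""
        result = []
        for sentence in document.split("."):
            sentence = sentence.strip()
            if not sentence:
                continue
            for combo in combinations(self.candidates(sentence), 2):
                if self.is_sensitive(combo):
                    result.append((sentence, combo))
        return result


@dataclass
class Apparatus:
    analysis: AnalysisMechanism

    def receive(self, document: str) -> list:
        """Receiving mechanism: accept a document and hand it to the analysis."""
        return self.analysis.analyze(document)


search = InMemorySearch([
    "acme merger with globex",
    "acme results",
    "globex office",
    "acme globex merger talks",
])
apparatus = Apparatus(AnalysisMechanism(search, frozenset({"acme", "globex", "merger"})))
report = apparatus.receive("Acme is planning a merger. The weather is nice.")
print(report)
```

Keeping the search engine behind a small interface means the in-memory stand-in could be swapped for a real backend without touching the analysis logic.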
Specification