High-accuracy confidential data detection
Abstract
A method and apparatus for providing accurate detection of confidential information is described. In one embodiment, the method includes searching a text document for multiple classified data patterns associated with confidential information that is represented as personal identifiers. The method further includes finding, in the text document, one or more personal identifier candidates matching any of the classified data patterns, and validating each of the personal identifier candidates using one or more personal identifier validators to provide accurate detection of the confidential information in the text document.
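The search-then-validate flow summarized in the abstract can be sketched as follows. This is a minimal illustration and not the patented implementation. The pattern table, the choice of credit card numbers and US Social Security numbers as identifier types, and the names `CLASSIFIED_PATTERNS`, `VALIDATORS`, `luhn_valid`, `ssn_valid`, and `detect` are all assumptions made for the example. The Luhn checksum and the SSN field rules stand in for the "personal identifier validators", which this section does not specify.

```python
import re

# Hypothetical pattern table: each identifier type maps to several
# format variations (dashes, spaces, no separators).
CLASSIFIED_PATTERNS = {
    "credit_card": [
        re.compile(r"\d{4}-\d{4}-\d{4}-\d{4}"),
        re.compile(r"\d{4} \d{4} \d{4} \d{4}"),
        re.compile(r"\d{16}"),
    ],
    "ssn": [
        re.compile(r"\d{3}-\d{2}-\d{4}"),
        re.compile(r"\d{3} \d{2} \d{4}"),
    ],
}


def luhn_valid(candidate):
    """Assumed validator: Luhn checksum over the candidate's digits."""
    digits = [int(c) for c in candidate if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def ssn_valid(candidate):
    """Assumed validator: reject SSN field values that are never issued."""
    digits = "".join(c for c in candidate if c.isdigit())
    area, group, serial = digits[:3], digits[3:5], digits[5:]
    if area in ("000", "666") or area[0] == "9":
        return False
    return group != "00" and serial != "0000"


# Hypothetical mapping from identifier type to its validators.
VALIDATORS = {"credit_card": [luhn_valid], "ssn": [ssn_valid]}


def detect(text):
    """Return (type, candidate) pairs that match a pattern and pass validation."""
    hits = []
    for kind, patterns in CLASSIFIED_PATTERNS.items():
        for pattern in patterns:
            for match in pattern.finditer(text):
                candidate = match.group()
                if all(check(candidate) for check in VALIDATORS[kind]):
                    hits.append((kind, candidate))
    return hits
```

Separating the format patterns from the validators keeps the two stages independent: a string that merely looks like an identifier becomes a candidate, and only candidates that pass every validator for their type are reported.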
20 Claims
1. A computer-implemented method comprising:
storing, in memory, a plurality of classified data patterns for personal identifiers, the plurality of classified data patterns corresponding to variations of personal identifier formats, the personal identifiers including confidential information of a plurality of entities;
identifying a text document to be searched;
searching the text document for data expressed in a format that matches any of the plurality of classified data patterns corresponding to the variations of personal identifier formats;
finding, in the text document, one or more personal identifier candidates matching any of the plurality of classified data patterns; and
validating each of the personal identifier candidates using one or more personal identifier validators to provide accurate detection of the confidential information in the text document.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 10
9. The method of claim 8, wherein the data immediately preceding or following a personal identifier candidate indicates a false positive if the personal identifier candidate is immediately preceded by, or immediately followed by, a number or a letter.
11. A system comprising:
a memory to store a plurality of classified data patterns for personal identifiers, the plurality of classified data patterns corresponding to variations of personal identifier formats, the personal identifiers including confidential information of a plurality of entities;
a processor, coupled to the memory, to:
search a text document for data expressed in a format that matches any of the plurality of classified data patterns corresponding to the variations of personal identifier formats,
find, in the text document, one or more personal identifier candidates matching any of the plurality of classified data patterns, and
validate each of the personal identifier candidates using one or more personal identifier validators to provide accurate detection of the confidential information in the text document.
Dependent claims: 12, 13, 14, 15
16. A non-transitory computer readable storage medium that provides instructions, which when executed on a processing system cause the processing system to perform a method comprising:
storing, in memory, a plurality of classified data patterns for personal identifiers, the plurality of classified data patterns corresponding to variations of personal identifier formats, the personal identifiers including confidential information of a plurality of entities;
identifying a text document to be searched;
searching the text document for data expressed in a format that matches any of the plurality of classified data patterns corresponding to the variations of personal identifier formats;
finding, in the text document, one or more personal identifier candidates matching any of the plurality of classified data patterns; and
validating each of the personal identifier candidates using one or more personal identifier validators to provide accurate detection of the confidential information in the text document.
Dependent claims: 17, 18, 19, 20
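The context check recited in claim 9 (a candidate is treated as a false positive when a number or a letter sits immediately before or after it) can be illustrated with a short sketch. This is an assumption-laden example and not the claimed implementation: the single SSN-style pattern and the names `SSN_PATTERN`, `is_false_positive`, and `candidates_with_context` are hypothetical.

```python
import re

# Hypothetical single pattern used only to produce candidates for the check.
SSN_PATTERN = re.compile(r"\d{3}-\d{2}-\d{4}")


def is_false_positive(text, start, end):
    """Flag a candidate spanning text[start:end] whose neighbor is a letter or digit."""
    before = text[start - 1] if start > 0 else ""
    after = text[end] if end < len(text) else ""
    # An empty string (document boundary) is not alphanumeric.
    return before.isalnum() or after.isalnum()


def candidates_with_context(text):
    """Return (candidate, flagged_as_false_positive) for each pattern match."""
    return [
        (m.group(), is_false_positive(text, m.start(), m.end()))
        for m in SSN_PATTERN.finditer(text)
    ]
```

The intuition is that a digit run embedded in a longer token, such as a part number, matches the format by accident, so an adjacent letter or digit is evidence against the candidate being a standalone identifier.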
Specification