Method and apparatus for forensic analysis of information stored in computer-readable media
Abstract
Ambient data is data created or retained as an artifact of a computer system, rather than by an intention of the user. Ambient data typically includes both textual and binary, i.e., non-textual, data. Ambient data can include information of which the user is unaware, and an investigator can review ambient data to learn about the Internet-related activity performed on the computer. Most of the information in the ambient data is not useful, and the large amount of ambient data on a typical computer system can require significant time to review. The invention locates useful Internet-related information in the ambient data and outputs the information in a useful database format, excluding non-textual data and text that is unrelated to Internet activity. The system locates Internet-related information of interest using proximity rules; that is, the system writes output only when certain characters appear in the ambient data within a specified proximity to other characters. The characters can include symbols, abbreviations, or words, specified either individually or on a pre-compiled list. Exclusionary rules can also eliminate firewall aliases, which are Internet identifiers that are less useful to an investigator. By applying such rules, an output file including only textual data representing useful Internet addresses and URLs is presented to an investigator.
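A proximity rule of the kind summarized in the abstract can be pictured with a short Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the first character group (`@`), the list of second character groups (a few top-level domains), the 32-byte proximity value, the `localhost` exclusion, and the names `extract_identifiers`, `FIRST`, `SECONDS`, `PROXIMITY`, and `EXCLUDED`.

```python
# Illustrative sketch only; every name and value below is an assumption.
FIRST = b"@"                                     # first character group
SECONDS = (b".com", b".net", b".org", b".edu")   # second character groups
PROXIMITY = 32                                   # assumed max distance in bytes
EXCLUDED = ("localhost",)                        # stand-in exclusionary rule


def _printable(byte: int) -> bool:
    """True for printable, non-space ASCII bytes."""
    return 0x21 <= byte <= 0x7E


def extract_identifiers(data: bytes, proximity: int = PROXIMITY) -> list[str]:
    """Return printable tokens in which a second group follows FIRST closely."""
    hits = []
    pos = data.find(FIRST)
    while pos != -1:
        # Expand to the run of printable bytes surrounding the first group,
        # so that binary (non-textual) bytes never reach the output.
        start, end = pos, pos + len(FIRST)
        while start > 0 and _printable(data[start - 1]):
            start -= 1
        while end < len(data) and _printable(data[end]):
            end += 1
        # Proximity rule: a second group must lie within `proximity` bytes
        # after the first group and inside the same printable run.
        window = data[pos:min(end, pos + len(FIRST) + proximity)].lower()
        token = data[start:end].decode("ascii")
        near = any(s in window for s in SECONDS)
        excluded = any(x in token.lower() for x in EXCLUDED)
        if near and not excluded:
            hits.append(token)
        pos = data.find(FIRST, end)   # continue after the current run
    return hits


sample = (b"\x00\x01jdoe@example.com\xff\xfe\x00junk@\x00\x00.com"
          b"\x07root@localhost.com\x00")
print(extract_identifiers(sample))   # ['jdoe@example.com']
```

In the sample, `junk@` is rejected because no second group appears in the same printable run, and `root@localhost.com` is rejected by the assumed exclusion list, so only one identifier reaches the output.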
22 Claims
1. A computer-implemented method of automatically extracting from a large ambient data file containing a mixture of textual data and binary data information that has a relatively high probability of corresponding to Internet-related activity of interest to an investigator, the method comprising:
providing an ambient data file including ambient data from one or more of the following sources, alone or in combination;
unallocated storage space, file slack space at the end of one or more computer files, a windows swap, and one or more temporary system files;
reading a portion of the ambient data file into random access memory;
searching the portion of the ambient data to determine the presence of a first character or character group within a pre-specified proximity to a second character or character group to locate Internet-related identifiers of interest to the investigator;
if internet-related identifiers are located, copying the internet-related identifiers to an output file, thereby providing an output file for the investigator to review that excludes non-textual data, is greatly reduced in size from the original ambient data file; and
that includes most or all of the internet-related information of interest to an investigator in the ambient data file. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
11. A computer-implemented method of reducing the amount of data requiring review by an investigator from a large ambient computer data file containing a mixture of textual data and binary data by eliminating non-textual data unrelated to Internet activity in a subject area of interest to the investigator, the method comprising:
searching the ambient data to locate a first character or group of characters;
searching the ambient data to locate a second character or group of characters within a specified proximity to the first group of characters, the presence and proximity of the characters or character groups being indicative of Internet-related activity of interest to an investigator;
if the second character or group of characters is located within the specified proximity of the first character or group of characters, writing a portion of the ambient data including the first and second characters or groups of characters to an output file, thereby providing an output file for the investigator to review that excludes non-textual data, is greatly reduced in size from the original ambient data file, and that includes most or all of the internet-related information in the ambient data file and of interest to the investigator. - View Dependent Claims (12, 13, 14, 15, 16, 17, 18)
19. An apparatus for extracting from a large ambient data file containing a mixture of textual data and binary data information that has a relatively high probability of corresponding to an Internet-related activity of interest to an investigator, the apparatus comprising:
data storage means including an ambient data file including ambient data from one or more of the following sources;
unallocated storage space, file slack space at the end of one or more computer files, a windows swap, and a temporary system file;
random access memory for sequentially storing portions of the ambient data file;
means for searching the ambient data to locate a first character or group of characters;
means for searching the ambient data to locate a second character or group of characters within a specified proximity of the first group of characters, the presence and proximity of the characters being indicative of Internet-related activity of interest to an investigator;
means for copying information corresponding to the Internet-related activity to an output file, thereby providing an output file for the investigator to review that is greatly reduced in size from the original ambient data file, that excludes non-textual data, and that includes most or all of the Internet-related information of interest to an investigator in the ambient data file. - View Dependent Claims (20, 21, 22)
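The sequential reading of portions of the ambient data file into random access memory, as recited in the claims above, might look roughly like the following Python sketch. It is an assumption-laden illustration rather than the patented implementation: the function names `iter_text_runs` and `reduce_to_text`, the four-byte minimum run length, the chunk size, and the `keep` predicate are all hypothetical. The sketch carries a trailing partial run of text over to the next portion so that an identifier spanning two portions is not split.

```python
import io
import re

# Printable runs of at least 4 bytes; the minimum length is an assumption.
RUN = re.compile(rb"[\x21-\x7e]{4,}")


def iter_text_runs(stream, chunk_size=1 << 20):
    """Yield printable ASCII runs from a binary stream, one portion at a time."""
    carry = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf = carry + chunk
        # Hold back a trailing printable run: it may continue in the next
        # portion. (A file that is one giant run would grow `carry` unbounded.)
        cut = len(buf)
        while cut > 0 and 0x21 <= buf[cut - 1] <= 0x7E:
            cut -= 1
        for m in RUN.finditer(buf[:cut]):
            yield m.group(0).decode("ascii")
        carry = buf[cut:]
    # Flush whatever text remained at the end of the stream.
    for m in RUN.finditer(carry):
        yield m.group(0).decode("ascii")


def reduce_to_text(stream, out, keep, chunk_size=1 << 20):
    """Write runs accepted by `keep` to `out`, one per line; return the count."""
    written = 0
    for run in iter_text_runs(stream, chunk_size):
        if keep(run):
            out.write(run + "\n")
            written += 1
    return written


# Usage with an in-memory stand-in for an ambient data file. The tiny chunk
# size forces both URLs to span several portions.
data = (b"\x00\x00http://example.com/page\x00\x01\x02abc"
        b"\xffhttp://example.org\x00")
out = io.StringIO()
count = reduce_to_text(io.BytesIO(data), out,
                       keep=lambda run: "http://" in run, chunk_size=5)
print(count)
print(out.getvalue(), end="")
```

The `keep` predicate is where a proximity or exclusionary rule would plug in; here it is a plain substring test to keep the sketch short.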
Specification