Method and system for generating analogous fictional data from non-fictional data

US 7,958,162 B2
Filed: 08/28/2009
Issued: 06/07/2011
Est. Priority Date: 06/06/2008
Status: Expired due to Fees

First Claim

1. A computer-implemented method for generating a second data set from a first data set comprising:

determining an occurrence value for each element of said first data set pertaining to the quantity of occurrences of that element within said first data set;

receiving a similarity value configured by a user indicating a desired similarity between said first and second data sets; and

generating said second data set to include elements with said occurrence values for said second data set based on said occurrence values of elements of said first data set and said similarity value, wherein said generating said second data set further includes:

examining said occurrence value of an element in said first data set;

retrieving another element of said first data set with an occurrence value within a range from said occurrence value of said examined element in said first data set, wherein said range is indicated by said similarity value to provide said desired similarity; and

substituting said examined element with said retrieved element by placing said retrieved element at a location within said second data set that corresponds to a location of said examined element in said first data set.
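
The steps of claim 1 can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function name `generate_second_data_set`, the choice to treat the similarity value as a number in [0, 1], and the rule that maps it to an allowed gap between occurrence counts are all assumptions made for the example.

```python
import random
from collections import Counter

def generate_second_data_set(first, similarity, rng=None):
    """Replace each element with another element of similar occurrence count.

    Assumption: `similarity` lies in [0, 1], and a higher value means a
    narrower allowed gap between occurrence counts.
    """
    rng = rng or random.Random()
    counts = Counter(first)  # occurrence value for each element
    second = []
    for element in first:
        target = counts[element]  # examine the occurrence value
        max_gap = (1.0 - similarity) * target  # hypothetical range rule
        # Retrieve other elements whose occurrence value is within range.
        candidates = [
            other for other, n in counts.items()
            if other != element and abs(n - target) <= max_gap
        ]
        # Substitute at the same position; keep the original element
        # only when no other element falls inside the range.
        second.append(rng.choice(candidates) if candidates else element)
    return second
```

With `["Smith", "Smith", "Jones", "Jones", "Lee"]` and a similarity of 1.0, "Smith" and "Jones" swap places because both occur twice, while "Lee" stays unchanged because no other element occurs exactly once. Each position is substituted independently in this sketch, so one source element may map to different replacements when several candidates are in range.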

Abstract

A method and system for generating analogous fictional data from non-fictional data is provided. One implementation involves recording non-fictional data, scoring the non-fictional data in terms of occurrence percentile, obtaining a set of user-configurations that represents a likeness range between non-fictional data and corresponding fictional data, based on the scores and the user-configurations, generating analogous fictional data from the non-fictional data, and comparing hash values for the fictional data with hash values for the non-fictional data to determine matches, and in case of matches, generating analogous fictional data from the non-fictional data based on the scores and incrementally lowered likeness range, whereby entire records of fictional data are generated based on entire records of non-fictional data, wherein the fictional data is consistent with the non-fictional data.
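
The loop described in the abstract (score, generate, compare hashes, lower the likeness range on a match) might look roughly like the following Python sketch. The names `occurrence_percentiles`, `digest`, and `fictionalize`, the percentile formula, the 0 to 100 likeness scale, the `step` parameter, and the use of SHA-256 are all assumptions chosen for illustration; the abstract does not specify any of them.

```python
import hashlib
import random
from collections import Counter

def occurrence_percentiles(values):
    """Score each distinct value by occurrence percentile.

    Assumed formula: the percentage of distinct values whose
    occurrence count is no larger than this value's count.
    """
    counts = Counter(values)
    ordered = list(counts.values())
    total = len(ordered)
    return {
        v: 100.0 * sum(1 for c in ordered if c <= n) / total
        for v, n in counts.items()
    }

def digest(value):
    """Hash a single value (SHA-256 is an arbitrary choice here)."""
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()

def fictionalize(values, likeness, step=10.0, rng=None):
    """Generate stand-in values, lowering likeness whenever hashes match.

    Assumption: `likeness` is on a 0..100 scale and the allowed
    percentile gap is 100 - likeness.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    rng = rng or random.Random()
    scores = occurrence_percentiles(values)
    result = []
    for original in values:
        current = likeness
        while True:
            gap = 100.0 - current
            pool = [v for v, s in scores.items()
                    if abs(s - scores[original]) <= gap]
            replacement = rng.choice(pool)
            if digest(replacement) != digest(original):
                break  # fictional value differs from the source value
            if current <= 0:
                break  # range cannot be lowered further; match is kept
            current = max(0.0, current - step)  # lower likeness and retry
        result.append(replacement)
    return result
```

In this sketch a match can survive when the likeness range has already reached zero, for example when the data contains only one distinct value. A production system would need its own policy for that case.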

26 Citations

21 Claims

1. A computer-implemented method for generating a second data set from a first data set comprising:
- determining an occurrence value for each element of said first data set pertaining to the quantity of occurrences of that element within said first data set;
  
  receiving a similarity value configured by a user indicating a desired similarity between said first and second data sets; and
  
  generating said second data set to include elements with said occurrence values for said second data set based on said occurrence values of elements of said first data set and said similarity value, wherein said generating said second data set further includes:
  
  examining said occurrence value of an element in said first data set;
  
  retrieving another element of said first data set with an occurrence value within a range from said occurrence value of said examined element in said first data set, wherein said range is indicated by said similarity value to provide said desired similarity; and
  
  substituting said examined element with said retrieved element by placing said retrieved element at a location within said second data set that corresponds to a location of said examined element in said first data set.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The method of claim 1, wherein said first data set includes non-fictional data and said second data set includes fictional data.
  - 3. The method of claim 2, wherein entire records of fictional data are generated based on entire records of non-fictional data, and wherein said fictional data is consistent with said non-fictional data.
  - 4. The method of claim 1, further including:
    - comparing elements of said first and second data sets; and
      
      generating new elements for said second data set in response to matching elements with said first data set by adjusting said similarity value for decreased similarity and replacing each matching element in said second data set with another element of said first data set with an occurrence value based on said adjusted similarity value.
  - 5. The method of claim 4, wherein said comparing elements of said first and second data sets further includes:
    - determining hash values for said first and second data sets; and
      
      comparing said hash values of said first and second data sets to determine said matching elements.
  - 6. The method of claim 5, wherein said first and second data sets are stored in respective first and second databases with identical schemas of rows and columns, and said determining said hash values further includes:
    - determining said hash values for each row and column value of said first and second databases; and
      
      selectively determining a hash value from two or more column values selected by a user to produce a combined hash value for said comparison.
  - 7. The method of claim 1, wherein said first and second data sets are stored in identical schemas.
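
The hash comparison of claims 5 and 6 can be illustrated with a short Python sketch. Everything named here is hypothetical: `cell_hash`, `combined_hash`, and `matching_rows` are invented for the example, rows are modeled as dictionaries, SHA-256 is an arbitrary choice of hash, and rows are compared position by position, which the claims do not require.

```python
import hashlib

def cell_hash(value):
    """Hash one row-and-column value."""
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()

def combined_hash(row, columns):
    """Hash two or more user-selected column values of a row together."""
    h = hashlib.sha256()
    for name in columns:
        data = str(row[name]).encode("utf-8")
        # A length prefix keeps ("ab", "c") distinct from ("a", "bc").
        h.update(len(data).to_bytes(4, "big"))
        h.update(data)
    return h.hexdigest()

def matching_rows(first_rows, second_rows, columns):
    """Return indexes of rows whose selected columns hash identically.

    Assumes both tables share a schema and are aligned row by row.
    """
    return [
        i for i, (a, b) in enumerate(zip(first_rows, second_rows))
        if combined_hash(a, columns) == combined_hash(b, columns)
    ]
```

Selecting columns matters because a single fictional value that equals a real one (a common first name, say) may be harmless, while a matching combination such as first name plus last name is more likely to identify a real record. Rows returned by `matching_rows` would be the ones to regenerate with decreased similarity, as in claim 4.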

8. A computer system for generating a second data set from a first data set comprising:
- at least one storage system for storing a computer program; and
  
  at least one processor for processing said computer program to:
  
  determine an occurrence value for each element of said first data set pertaining to the quantity of occurrences of that element within said first data set;
  
  receive a similarity value configured by a user indicating a desired similarity between said first and second data sets; and
  
  generate said second data set to include elements with said occurrence values for said second data set based on said occurrence values of elements of said first data set and said similarity value, wherein said generating said second data set further includes:
  
  examining said occurrence value of an element in said first data set;
  
  retrieving another element of said first data set with an occurrence value within a range from said occurrence value of said examined element in said first data set, wherein said range is indicated by said similarity value to provide said desired similarity; and
  
  substituting said examined element with said retrieved element by placing said retrieved element at a location within said second data set that corresponds to a location of said examined element in said first data set.
- Dependent claims (9, 10, 11, 12, 13, 14):
  - 9. The computer system of claim 8, wherein said first data set includes non-fictional data and said second data set includes fictional data.
  - 10. The computer system of claim 9, wherein entire records of fictional data are generated based on entire records of non-fictional data, and wherein said fictional data is consistent with said non-fictional data.
  - 11. The computer system of claim 8, wherein said processor further processes said computer program to:
    - compare elements of said first and second data sets; and
      
      generate new elements for said second data set in response to matching elements with said first data set by adjusting said similarity value for decreased similarity and replacing each matching element in said second data set with another element of said first data set with an occurrence value based on said adjusted similarity value.
  - 12. The computer system of claim 11, wherein comparison of elements of said first and second data sets further includes:
    - determining hash values for said first and second data sets; and
      
      comparing said hash values of said first and second data sets to determine said matching elements.
  - 13. The computer system of claim 12, wherein said first and second data sets are stored in respective first and second databases with identical schemas of rows and columns, and said determining said hash values further includes:
    - determining said hash values for each row and column value of said first and second databases; and
      
      selectively determining a hash value from two or more column values selected by a user to produce a combined hash value for said comparison.
  - 14. The computer system of claim 8, wherein said first and second data sets are stored in identical schemas.

15. A computer program product for generating a second data set from a first data set, the computer program product comprising:
- a computer useable storage medium having computer readable program code embodied therewith, the computer readable program code comprising computer readable program code to:
  
  determine an occurrence value for each element of said first data set pertaining to the quantity of occurrences of that element within said first data set;
  
  receive a similarity value configured by a user indicating a desired similarity between said first and second data sets; and
  
  generate said second data set to include elements with said occurrence values for said second data set based on said occurrence values of elements of said first data set and said similarity value, wherein said generating said second data set further includes:
  
  examining said occurrence value of an element in said first data set;
  
  retrieving another element of said first data set with an occurrence value within a range from said occurrence value of said examined element in said first data set, wherein said range is indicated by said similarity value to provide said desired similarity; and
  
  substituting said examined element with said retrieved element by placing said retrieved element at a location within said second data set that corresponds to a location of said examined element in said first data set.
- Dependent claims (16, 17, 18, 19, 20, 21):
  - 16. The computer program product of claim 15, wherein said first data set includes non-fictional data and said second data set includes fictional data.
  - 17. The computer program product of claim 16, wherein entire records of fictional data are generated based on entire records of non-fictional data, and wherein said fictional data is consistent with said non-fictional data.
  - 18. The computer program product of claim 15, wherein said computer readable program code further includes computer readable program code to:
    - compare elements of said first and second data sets; and
      
      generate new elements for said second data set in response to matching elements with said first data set by adjusting said similarity value for decreased similarity and replacing each matching element in said second data set with another element of said first data set with an occurrence value based on said adjusted similarity value.
  - 19. The computer program product of claim 18, wherein comparison of elements of said first and second data sets further includes:
    - determining hash values for said first and second data sets; and
      
      comparing said hash values of said first and second data sets to determine said matching elements.
  - 20. The computer program product of claim 19, wherein said first and second data sets are stored in respective first and second databases with identical schemas of rows and columns, and said determining said hash values further includes:
    - determining said hash values for each row and column value of said first and second databases; and
      
      selectively determining a hash value from two or more column values selected by a user to produce a combined hash value for said comparison.
  - 21. The computer program product of claim 15, wherein said first and second data sets are stored in identical schemas.

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Basile, Ryan M.
Primary Examiner(s): TIMBLIN, ROBERT M
Application Number: US12/549,470
Publication Number: US 20090319520A1
Time in Patent Office: 648 Days
Field of Search: 707/687, 707/756, 707/803, 707/804, 707/809, 707/776, 707/796, 707/999.101, 705/74
US Class Current: 707/804
CPC Class Codes:
G06F 16/258   Data format conversion from...
G06F 16/9014   hash tables
G06Q 20/383   Anonymous user system