Analyzing data files

US 6,640,219 B2
Filed: 04/20/1999
Issued: 10/28/2003
Est. Priority Date: 04/24/1998
Status: Expired due to Fees

2 Assignments

0 Petitions

Abstract

Data files (205) are categorised in order to facilitate the searching for information. The analysis is performed in order to identify items which may be considered as having high value without actually being directly specified. Occurrences of unspecified candidate items are identified (207) in contexts for a preferred specified category. Occurrences of unspecified candidate items are identified (209) in non-preferred contexts. The preferred occurrences are processed (211) with the non-preferred occurrences for each candidate item in order to select candidate items as being high value items. In the preferred embodiment, data relating to companies is identified without specific company names being defined.
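The abstract describes the processing step (211) only at a high level. The Python sketch below shows one way that preferred and non-preferred occurrence counts could be combined for each candidate. The function name, the two thresholds and the ratio rule are assumptions made for illustration. They are not the method disclosed in the specification.

```python
from collections import Counter


def select_high_value(
    preferred: Counter,
    non_preferred: Counter,
    min_preferred: int = 2,
    min_ratio: float = 0.5,
) -> list[str]:
    """Combine per-candidate counts from preferred and non-preferred contexts.

    Hypothetical rule: a candidate is selected when it was seen at least
    `min_preferred` times in preferred contexts and when more than
    `min_ratio` of all its contextual occurrences were preferred ones.
    """
    selected = []
    for item, pref in preferred.items():
        if pref < min_preferred:
            continue
        total = pref + non_preferred.get(item, 0)
        # Guard against a zero total when min_preferred is set to 0.
        if total and pref / total > min_ratio:
            selected.append(item)
    return sorted(selected)


preferred = Counter({"Acme": 5, "Jordan": 2, "Widget": 1})
non_preferred = Counter({"Jordan": 6, "Acme": 1})
print(select_high_value(preferred, non_preferred))  # ['Acme']
```

In this example "Jordan" passes the minimum-count test but is rejected, because most of its occurrences fall in non-preferred contexts.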

42 Claims

1. Apparatus arranged to receive data files from data sources and to categorize said data files to facilitate searching in response to user-requests, wherein said data files contain unspecified high value items whose characteristic rather than content may be of interest and high value to a user, said apparatus comprising:
    - identifying means for identifying occurrences within a received data file of unspecified candidate items in preferred contexts based on first rules likely to identify a preferred specified category, and for identifying occurrences within said received data file of said unspecified candidate items in non-preferred contexts based on second rules likely to identify a non-preferred specified category; and
    - processing means for processing said preferred occurrences with said non-preferred occurrences for each of said unspecified candidate items and to select one of said unspecified candidate items as a high value item whose characteristic rather than content may be of interest and high value to a user.
2. Apparatus according to claim 1, wherein said identifying means is configured to identify occurrences of unspecified candidate terms for a plurality of non-preferred categories.
3. Apparatus according to claim 2, wherein said identifying means is configured to identify preferred categories representing companies and to identify non-preferred categories including place names and personal names.
4. Apparatus according to claim 1, wherein said processing means is configured to perform a plurality of processes to remove candidates to produce a refined list of high value items.
5. Apparatus according to claim 4, wherein said processing means is configured to increase score values in response to identifying occurrences and to process said score values.
6. Apparatus according to claim 5, wherein said processing means is configured to increase said score values non-linearly so as to restrain said scores within a predetermined maximum value.
7. Apparatus according to claim 4, wherein said processing means is configured to identify similar entries and to remove one or more of said similar entries in response to a score comparison.
8. Apparatus according to claim 7, wherein similar entries represent situations in which a first entry is the same as a second entry with an extension added thereto.
9. Apparatus according to claim 1, including first transmission means for continually supplying input data files from a plurality of sources.
10. Apparatus according to claim 1, including second transmission means for supplying information to users in response to user requests.
11. Apparatus according to claim 1, wherein said preferred contexts are defined by a first set of phrases, and said non-preferred contexts are defined by a second set of phrases, and said identifying means is configured to identify occurrences of unspecified candidate items in phrases defining preferred contexts and to identify occurrences of unspecified candidate items in phrases defining non-preferred contexts.
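Claims 5 and 6 have score values increase as occurrences are identified, but non-linearly, so that the scores stay within a predetermined maximum. A minimal sketch follows. It assumes a fixed-fraction update that the claims do not themselves specify: each occurrence moves the score a constant fraction of its remaining distance to the maximum, so the increments shrink and the cap is not crossed. `bump`, `MAX_SCORE` and the 0.2 weight are hypothetical.

```python
MAX_SCORE = 100.0  # hypothetical "predetermined maximum value"


def bump(score: float, weight: float = 0.2, max_score: float = MAX_SCORE) -> float:
    """Raise a score for one more occurrence by a fraction of its remaining headroom.

    For 0 < weight <= 1 and score <= max_score, the result stays at or below
    max_score, and each successive increment is smaller than the last.
    """
    return score + weight * (max_score - score)


score = 0.0
history = []
for _ in range(10):  # ten occurrences of the same candidate
    score = bump(score)
    history.append(score)
print([round(s, 1) for s in history])
```

A saturating update of this kind is one way to stop a candidate that is mentioned hundreds of times from swamping candidates that are mentioned only a handful of times.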

12. A method of analyzing data files containing representations of a natural language to identify unspecified high value items whose characteristic rather than content may be of interest and high value to a user, said method comprising:
    - identifying occurrences of unspecified candidate items within a data file in preferred contexts based on first rules likely to identify a preferred specified category;
    - identifying occurrences of unspecified candidate items within said data file in non-preferred contexts based on second rules likely to identify a non-preferred specified category;
    - processing said preferred occurrences with said non-preferred occurrences for each one of said unspecified candidate items; and
    - selecting one of said unspecified candidate items as a high value item whose characteristic rather than content may be of interest and high value to a user.
13. A method according to claim 12, wherein occurrences of unspecified candidate terms are identified for a plurality of non-preferred categories.
14. A method according to claim 13, wherein the preferred category represents companies and said non-preferred categories include place names and personal names.
15. A method according to claim 12, wherein a plurality of processes are performed to remove candidates to produce a refined list of high value items.
16. A method according to claim 15, wherein identified occurrences result in score values being increased and said processing steps involve the processing of said score values.
17. A method according to claim 15, wherein said score values are increased non-linearly so as to restrain said scores within a predetermined maximum value.
18. A method according to claim 15, wherein similar entries are identified and one or more of said similar entries are removed in response to a score comparison.
19. A method according to claim 18, wherein similar entries represent situations in which a first entry is the same as a second entry with an extension added thereto.
20. A method according to claim 12, wherein data files are continually received from a plurality of data sources.
21. A method according to claim 12, wherein information is supplied to users in response to user requests.
22. A method of analyzing data files according to claim 12, wherein said contexts for a preferred category are defined by a first set of phrases, and said contexts for a non-preferred category are defined by a second set of phrases, such that occurrences of said unspecified candidate items in phrases relating to a preferred category are identified, and occurrences of said unspecified candidate items in phrases relating to a non-preferred category are identified.
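Claims 18 and 19 remove one of two similar entries, where one entry is the other with an extension added, in response to a score comparison. The sketch below assumes that an extension means one or more extra trailing words (so "Acme Holdings" extends "Acme") and that the lower-scoring entry is the one dropped. Neither detail is stated in the claims, and `remove_extended_duplicates` is a hypothetical name.

```python
def remove_extended_duplicates(scores: dict[str, float]) -> dict[str, float]:
    """Resolve pairs where one entry equals another entry plus an extension.

    Assumption: an "extension" is one or more extra trailing words. For each
    such pair the lower-scoring entry is dropped; a tie drops the longer form.
    """
    names = sorted(scores, key=len)  # shorter names first
    removed: set[str] = set()
    for i, shorter in enumerate(names):
        for longer in names[i + 1:]:
            if shorter in removed:
                break  # this entry is already gone; nothing left to compare
            if longer in removed or not longer.startswith(shorter + " "):
                continue
            loser = shorter if scores[longer] > scores[shorter] else longer
            removed.add(loser)
    return {name: s for name, s in scores.items() if name not in removed}


scores = {"Acme": 40.0, "Acme Holdings": 75.0, "Jordan": 30.0, "Jordan Smith": 10.0}
print(remove_extended_duplicates(scores))
```

Requiring a space before the extension keeps unrelated names such as "Acmeco" from being treated as variants of "Acme".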

23. A computer system programmed to execute stored instructions such that in response to said stored instructions said system is configured to:
    - identify occurrences within a data file of unspecified candidate items in preferred contexts based on first rules likely to identify a preferred specified category;
    - identify occurrences within said data file of unspecified candidate items in non-preferred contexts based on second rules likely to identify a non-preferred specified category;
    - process said preferred occurrences with said non-preferred occurrences for each one of said unspecified candidate items; and
    - select one of said unspecified candidate items as a high value item whose characteristic rather than content may be of interest and high value to a user.
24. A computer system programmed to execute stored instructions according to claim 23, configured to identify occurrences of unspecified candidate items for a plurality of non-preferred categories.
25. A computer system programmed to execute stored instructions according to claim 24, configured to identify preferred categories representing companies and to identify non-preferred categories including place names and personal names.
26. A computer system programmed to execute stored instructions according to claim 23, configured to perform a plurality of processes to remove candidates to produce a refined list of high value items.
27. A computer system programmed to execute stored instructions according to claim 26, configured to increase score values in response to identifying occurrences and to process said score values.
28. A computer system programmed to execute stored instructions according to claim 27, configured to increase said score values non-linearly so as to restrain said scores within a predetermined maximum value.
29. A computer system programmed to execute stored instructions according to claim 26, configured to identify similar entries and to remove one or more of said similar entries in response to a score comparison.
30. A computer system programmed to execute stored instructions according to claim 29, configured to continually supply input data files from transmission means.
31. A computer system programmed to execute stored instructions according to claim 23, wherein said preferred contexts are defined by a first set of phrases, and said non-preferred contexts are defined by a second set of phrases, and said system is configured to identify occurrences of unspecified candidate items in phrases defining preferred contexts and to identify occurrences of unspecified candidate items in phrases defining non-preferred contexts.
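Claim 31 defines preferred and non-preferred contexts by two sets of phrases. As an illustration only, the phrase sets below are hypothetical regular expressions for company contexts (preferred) and for personal-name and place-name contexts (non-preferred, as in claims 24 and 25). The patent's actual phrase lists are not part of this record, and `count_contexts` is a hypothetical helper.

```python
import re
from collections import Counter

# Hypothetical phrase sets; the record above does not list the patent's actual phrases.
# Each pattern captures one capitalised word as the unspecified candidate item.
PREFERRED_PHRASES = [
    re.compile(r"\bshares in ([A-Z][a-z]+)"),            # company context
    re.compile(r"\b([A-Z][a-z]+) (?:plc|Ltd|Inc)\b"),     # company suffix
]
NON_PREFERRED_PHRASES = [
    re.compile(r"\b(?:Mr|Mrs|Ms|Dr)\.? ([A-Z][a-z]+)"),  # personal-name context
    re.compile(r"\b(?:in|near|at) ([A-Z][a-z]+)"),        # place-name context
]


def count_contexts(text: str, patterns: list[re.Pattern]) -> Counter:
    """Count how often each candidate appears inside any of the given phrases."""
    counts: Counter = Counter()
    for pattern in patterns:
        counts.update(match.group(1) for match in pattern.finditer(text))
    return counts


text = (
    "Analysts bought shares in Acme after Acme plc reported profits in Leeds. "
    "Mr Jordan said Acme Ltd would expand near Leeds."
)
preferred = count_contexts(text, PREFERRED_PHRASES)
non_preferred = count_contexts(text, NON_PREFERRED_PHRASES)
print(preferred, non_preferred)
```

"Acme" is also counted once as a place, because "shares in Acme" matches the "in ..." place phrase as well. A single context can therefore mislead, which is why the claims process the preferred occurrences together with the non-preferred ones for each candidate.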

32. A computer-readable medium having computer-readable instructions executable by a computer such that, when executing said instructions, the computer will perform the steps of:
    - identifying occurrences within a data file of unspecified candidate items in preferred contexts based on first rules likely to identify a preferred specified category;
    - identifying occurrences within said data file of unspecified candidate items in non-preferred contexts based on second rules likely to identify a non-preferred specified category;
    - processing said preferred occurrences with said non-preferred occurrences for each one of said unspecified candidate items; and
    - selecting one of said unspecified candidate items as a high value item whose characteristic rather than content may be of interest and high value to a user.
33. A computer-readable medium having computer-readable instructions according to claim 32, such that when executing said instructions a computer will also perform the step of identifying occurrences of unspecified candidate terms for a plurality of non-preferred categories.
34. A computer-readable medium having computer-readable instructions according to claim 33, such that when executing said instructions a computer will also perform the step of processing preferred categories representing companies and non-preferred categories representing place names and personal names.
35. A computer-readable medium having computer-readable instructions according to claim 32, such that when executing said instructions a computer will also perform the step of performing a plurality of processes to remove candidates and to produce a refined list of high-value items.
36. A computer-readable medium having computer-readable instructions according to claim 35, such that when executing said instructions a computer will also perform the step of increasing score values as a result of occurrences being identified and processing said score values.
37. A computer-readable medium having computer-readable instructions according to claim 36, such that when executing said instructions a computer will also perform the step of non-linearly increasing said score values so as to restrain said scores within a predetermined maximum value.
38. A computer-readable medium having computer-readable instructions according to claim 35, such that when executing said instructions a computer will also perform the step of identifying similar entries and removing one or more of said similar entries in response to a score comparison.
39. A computer-readable medium having computer-readable instructions according to claim 38, such that when executing said instructions a computer will also perform the step of identifying similar entries representing situations in which a first entry is the same as a second entry with an extension added thereto.
40. A computer-readable medium having computer-readable instructions according to claim 32, such that when executing said instructions a computer will also perform the step of continually receiving data files from a plurality of data sources.
41. A computer-readable medium having computer-readable instructions according to claim 32, such that when executing said instructions a computer will also perform the step of supplying information to users in response to user requests.
42. A computer-readable medium having computer-readable instructions according to claim 32, wherein said contexts for a preferred category are defined by a first set of phrases, and said contexts for a non-preferred category are defined by a second set of phrases, such that occurrences of said unspecified candidate items in phrases relating to a preferred category are identified, and occurrences of said unspecified candidate items in phrases relating to a non-preferred category are identified.
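Claims 40 and 41 add the continual receipt of data files and the supply of information in response to user requests, and claim 1 states that the categorization is meant to facilitate searching. The following is a minimal sketch that assumes a simple in-memory inverted index keyed on the selected high value items. `HighValueIndex`, its methods and the file identifiers are hypothetical.

```python
from collections import defaultdict


class HighValueIndex:
    """Hypothetical index from selected high value items to the data files mentioning them."""

    def __init__(self) -> None:
        self._files_by_item = defaultdict(set)

    def add_file(self, file_id: str, high_value_items: list[str]) -> None:
        """Record a newly received and analysed data file under each of its items."""
        for item in high_value_items:
            self._files_by_item[item.casefold()].add(file_id)

    def query(self, item: str) -> list[str]:
        """Answer a user request for one item with the identifiers of matching files."""
        # .get() on a defaultdict does not create a new empty entry.
        return sorted(self._files_by_item.get(item.casefold(), set()))


index = HighValueIndex()
index.add_file("news-001", ["Acme Holdings"])
index.add_file("news-002", ["Acme Holdings", "Widget"])
print(index.query("acme holdings"))
```

Keying the index on case-folded item names lets a user request match regardless of capitalization.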

Current Assignee: APR Smartlogik Limited
Original Assignee: Applied Psychology Research Limited
Inventors: Hammond, Rachel; Fernandes, Llewelyn Ignazio
Primary Examiner(s): Vu, Kim
Assistant Examiner(s): Truong, Cam Y T

Application Number: US09/295,290
Publication Number: US 20020165851A1
Time in Patent Office: 1,652 days
Field of Search: 707/1-5, 707/10, 707/100, 705/1-14, 706/47, 706/50
US Class Current: 1/1
CPC Class Codes:
- G06F 16/35: Clustering; Classification
- G06F 16/358: Browsing; Visualisation the...
- G06F 18/00: Pattern recognition
- Y10S 707/99931: Database or file accessing
- Y10S 707/99932: Access augmentation or opti...
- Y10S 707/99933: Query processing, i.e. sear...
- Y10S 707/99934: Query formulation, input pr...
- Y10S 707/99935: Query augmenting and refini...
