Neural network for classifying speech and textural data based on agglomerates in a taxonomy table

US 8,428,935 B2
Filed: 08/09/2005
Issued: 04/23/2013
Est. Priority Date: 08/13/2004
Status: Active Grant

Abstract

A speech and textual analysis device and method for forming a search and/or classification catalog. The device is based on a linguistic database and includes a taxonomy table containing variable taxon nodes. The speech and textual analysis device includes a weighting module, a weighting parameter being additionally assigned to each stored taxon node to register recurrence frequency of terms in the linguistic and/or textual data that is to be classified and/or sorted. The speech and/or textual analysis device includes an integration module for determining a predefinable number of agglomerates based on the weighting parameters of the taxon nodes in the taxonomy table and at least one neural network module for classifying and/or sorting the speech and/or textual data based on the agglomerates in the taxonomy table.
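
The weighting step described in the abstract can be pictured with a minimal Python sketch. It is an illustration only, not the patented implementation: the `TAXONOMY` table, the `weight_taxon_nodes` helper, and the sample sentence are all hypothetical, and the weighting parameter is assumed to be a plain occurrence count per taxon node.

```python
import re
from collections import Counter

# Hypothetical taxonomy table: taxon node -> linked terms (a synonym group).
TAXONOMY = {
    "vehicle": {"car", "automobile", "truck"},
    "finance": {"bank", "loan", "credit"},
}

def weight_taxon_nodes(text, taxonomy):
    """Count how often the terms of each taxon node occur in the text."""
    term_to_node = {
        term: node for node, terms in taxonomy.items() for term in terms
    }
    # Start every node at zero so nodes without hits still appear.
    weights = Counter({node: 0 for node in taxonomy})
    for token in re.findall(r"[a-z]+", text.lower()):
        node = term_to_node.get(token)
        if node is not None:
            weights[node] += 1
    return dict(weights)

weights = weight_taxon_nodes(
    "The bank approved a car loan; the car is a truck.", TAXONOMY
)
print(weights)  # {'vehicle': 3, 'finance': 2}
```

Counting per node rather than per word means that synonyms reinforce the same weighting parameter, which is the property the later agglomeration step relies on.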

19 Claims

1. A language and text analysis apparatus for forming a search and classification catalog, the apparatus having at least one linguistic databank for accessing linguistic terms with data records so as to perform at least one of classifying and sorting at least one of language and text data corresponding to the data records, the linguistic terms having at least one of keywords and search terms, and the linguistic databank further having links between words and linked terms of similar meaning so that the links are associated with synonym groups in a taxonomy table, the apparatus comprising:
- a weighting module for weighting of table elements in the taxonomy table on a basis of frequency of occurrence of individual links in the linguistic databank,
- an integration module configured to generate a multidimensional, weighted n-dimensional content matrix in an n-dimensional content space on a basis of agglomerates of elements in the taxonomy table, and configured to choose and project axes of the n-dimensional content matrix based on a relevancy of a total hit frequency of words and linked terms of all the data records for the at least one of the language and text data so as to optimally characterize the data records with the axes, and
- a neural network module for at least one of classification and sorting of at least one of the language and text data on a basis of the content matrix, by using definable descriptors for the language and text analysis apparatus to determine appropriate constraints for one or more subject groups.
- Dependent claims (2)
  - 2. The language and text analysis apparatus as claimed in claim 1, wherein the links in the linguistic databank are defined over more than one language.
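
One way to picture the integration module of claim 1 is the following Python sketch. It rests on stated assumptions: the claim does not define how "relevancy of a total hit frequency" is computed, so this sketch simply keeps the `n_axes` agglomerates with the highest total hit count across all records. The function name `build_content_matrix` and the sample data are hypothetical.

```python
def build_content_matrix(doc_weights, agglomerates, n_axes):
    """Pick the n_axes most frequently hit agglomerates as axes and
    project every document onto them.

    doc_weights:  list of dicts, taxon node -> weight, one per document
    agglomerates: dict, agglomerate name -> list of taxon nodes
    """
    # Total hit frequency of each agglomerate over all documents.
    totals = {
        name: sum(w.get(node, 0) for w in doc_weights for node in nodes)
        for name, nodes in agglomerates.items()
    }
    # Assumption: "relevancy" is approximated by the raw total; ties are
    # broken alphabetically to keep the result deterministic.
    axes = sorted(totals, key=lambda name: (-totals[name], name))[:n_axes]
    matrix = [
        [sum(w.get(node, 0) for node in agglomerates[axis]) for axis in axes]
        for w in doc_weights
    ]
    return axes, matrix

doc_weights = [
    {"car": 3, "bank": 1},
    {"loan": 2, "bank": 2, "tennis": 1},
    {"car": 1, "truck": 2},
]
agglomerates = {
    "transport": ["car", "truck"],
    "finance": ["bank", "loan"],
    "sport": ["tennis"],
}
axes, matrix = build_content_matrix(doc_weights, agglomerates, n_axes=2)
print(axes)    # ['transport', 'finance']
print(matrix)  # [[3, 1], [0, 4], [3, 0]]
```

Each row of `matrix` is a document's position in the reduced content space; those rows are what a downstream classifier would consume.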

3. A language and text analysis apparatus for formation of a search and classification catalog, the apparatus having at least one linguistic databank for association of linguistic terms with data records, so that the language and text analysis apparatus is configured to perform at least one of classifying and sorting at least one of language and text data corresponding to the data records, the linguistic terms including at least one of keywords and search terms, the apparatus comprising:
- a taxonomy table with variable taxon nodes on a basis of the linguistic databank, so that one or more data records can be associated with one taxon node in the taxonomy table, and each data record includes a variable significance factor for weighting of terms on a basis of at least one of filling words, linking words, and keywords,
- a weighting module, in which a weighting parameter for recording of frequencies of occurrence of terms within the at least one of language and text data to be at least one of sorted and classified is additionally stored associated with each taxon node,
- an integration module for determination of agglomerates on a basis of the weighting parameters of the taxon nodes in the taxonomy table, with one agglomerate including at least one taxon node, the agglomerates associated with an n-dimensional content matrix in an n-dimensional content space, and for choosing and projecting axes of the n-dimensional content matrix based on a relevancy of a total hit frequency of words and linked terms of all the data records for the at least one of the language and text data so as to optimally characterize the data records with the axes, and
- a neural network module configured to perform at least one of classification and sorting of at least one of the language and the text data on a basis of the agglomerates in the taxonomy table.
- Dependent claims (4, 5, 6, 7, 8, 9, 10)
  - 4. The language and text analysis apparatus as claimed in claim 3, wherein the neural network module includes a self-organizing Kohonen map.
  - 5. The language and text analysis apparatus as claimed in claim 3, further comprising:
    - an entropy module for determining an entropy parameter, stored in a memory module, on a basis of distribution of a data record in the at least one of the language or the text data.
  - 6. The language and text analysis apparatus as claimed in claim 3, wherein the linguistic databank includes multilingual data records.
  - 7. The language and text analysis apparatus as claimed in claim 3, further comprising:
    - a hash table that is associated with the linguistic databank so that a hash value is used to identify linguistically linked data records in the hash table.
  - 8. The language and text analysis apparatus as claimed in claim 3, wherein a language parameter is used to associate the data records with a language and is identified as a synonym in the taxonomy table.
  - 9. The language and text analysis apparatus as claimed in claim 3, further comprising:
    - descriptors by which constraints that correspond to definable descriptors are determined for a subject group.
  - 10. The language and text analysis apparatus as claimed in claim 3, wherein the databank includes a universal, subject-independent, linguistic databank, and the taxonomy table is produced universally and independently of subject.
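
Claim 4 names a self-organizing Kohonen map as the neural network module. The sketch below shows the general technique on a tiny one-dimensional map in pure Python. It is not taken from the patent: the function names, the linear decay schedules, the deterministic initialisation, and the toy content vectors are all assumptions made for illustration.

```python
import math

def best_matching_unit(weights, x):
    """Index of the map unit whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda j: math.dist(weights[j], x))

def train_som(data, n_units=3, epochs=20, lr0=0.5, sigma0=1.0):
    """Train a small 1-D Kohonen map (n_units >= 2) on the rows of data."""
    dims = len(data[0])
    lo = [min(row[d] for row in data) for d in range(dims)]
    hi = [max(row[d] for row in data) for d in range(dims)]
    # Deterministic start: spread the units evenly between the data bounds.
    weights = [
        [lo[d] + (hi[d] - lo[d]) * j / (n_units - 1) for d in range(dims)]
        for j in range(n_units)
    ]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs          # decays from 1 towards 0
        lr = lr0 * frac                      # learning rate
        sigma = max(sigma0 * frac, 0.1)      # neighbourhood radius
        for x in data:
            bmu = best_matching_unit(weights, x)
            for j, w in enumerate(weights):
                # Gaussian neighbourhood: units near the winner move more.
                h = math.exp(-((j - bmu) ** 2) / (2 * sigma ** 2))
                for d in range(dims):
                    w[d] += lr * h * (x[d] - w[d])
    return weights

# Toy content vectors: two documents near (0, 0), two near (1, 1).
data = [[0.0, 0.1], [0.9, 1.0], [0.1, 0.0], [1.0, 0.9]]
som = train_som(data)
labels = [best_matching_unit(som, x) for x in data]
print(labels)
```

Documents that land on the same map unit are treated as belonging to the same class, so no labelled training data is needed.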

11. An automated language and text analysis method for forming a search and classification catalog, with a linguistic databank being used to record data records and to perform at least one of classifying and sorting at least one of language and text data on a basis of the data records, the method comprising the steps of:
- associating the data records that are stored in the linguistic databank with a taxon node in a taxonomy table, with each data record including a variable significance factor for weighting of terms based on at least one of filling words, linking words, and keywords,
- recording at least one of the language and text data on a basis of the taxonomy table, with frequency of individual data records in the at least one of the language and text data being determined by a weighting module and being associated with a weighting parameter for the taxon node,
- determining a determinable number of agglomerates by an integration module in the taxonomy table on a basis of the weighting parameters of one or more taxon nodes, the agglomerates associated with an n-dimensional content matrix in an n-dimensional content space,
- choosing and projecting axes of the n-dimensional content matrix based on a relevancy of a total hit frequency of words and linked terms of all the data records for the at least one of the language and text data so as to optimally characterize the data records with the axes, and
- using a neural network module to perform at least one of classifying and sorting at least one of the language and text data on a basis of the agglomerates in the taxonomy table.
- Dependent claims (12, 13, 14, 15, 16, 17, 18)
  - 12. The automated language and text analysis method as claimed in claim 11, wherein the neural network module includes a self-organizing Kohonen map.
  - 13. The automated language and text analysis method as claimed in claim 11, further comprising a step of:
    - using an entropy module to determine an entropy factor on a basis of distribution of a data record in the at least one of language and text data.
  - 14. The automated language and text analysis method as claimed in claim 11, wherein the linguistic databank includes multilingual data records.
  - 15. The automated language and text analysis method as claimed in claim 11, wherein a hash table is stored associated with the linguistic databank, with the hash table including an identification of linked data records by a hash value.
  - 16. The automated language and text analysis method as claimed in claim 11, further comprising a step of:
    - associating the data records with a language and that is weighted synonymously in the taxonomy table by a language parameter.
  - 17. The automated language and text analysis method as claimed in claim 11, further comprising a step of:
    - using definable descriptors to determine corresponding constraints for a subject group.
  - 18. The language and text analysis method as claimed in claim 11, further comprising a step of:
    - producing the taxon nodes in the taxonomy table on a basis of a universal, subject-independent, linguistic databank, with the databank including at least the universal, subject-independent, linguistic databank.
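
Claim 13 refers to an entropy factor derived from the distribution of a data record in the text data. The claims do not give a formula, so the sketch below assumes Shannon entropy (in bits) over the record's occurrence counts per document. The function name `entropy_factor` and the sample counts are hypothetical.

```python
import math

def entropy_factor(counts):
    """Shannon entropy, in bits, of how a record's hits are spread
    across documents. counts[i] is the hit count in document i."""
    total = sum(counts)
    if total == 0:
        return 0.0
    probs = [c / total for c in counts if c > 0]
    return sum(p * math.log2(1 / p) for p in probs)

evenly_spread = entropy_factor([4, 4, 4, 4])  # hits in every document
concentrated = entropy_factor([8, 0, 0, 0])   # hits in one document only
print(evenly_spread, concentrated)  # 2.0 0.0
```

Under this assumption a record that is spread evenly over all documents has high entropy and discriminates poorly between them, while a record concentrated in a few documents has low entropy and is more useful for classification.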

19. A non-transitory computer-readable medium with computer program code recorded thereon, the computer program code configured to control one or more processors in a computer-based system to perform a method for automated language and text analysis by formation of a search and/or classification catalog, with data records being recorded on the basis of a linguistic databank, and with language and/or text data being classified and/or sorted on the basis of the data records, the method comprising the steps of:
- storing the data records in the linguistic databank associated with a taxon node in a taxonomy table, with each data record including a variable significance factor for weighting of terms on the basis of at least one of filling words, linking words, and keywords,
- recording at least one of the language and text data on the basis of the taxonomy table, with frequency of individual data records in the at least one of language and text data determining a weighting parameter for the taxon node,
- determining a determinable number of agglomerates in the taxonomy table on the basis of the weighting parameter of one or more taxon nodes, the agglomerates associated with an n-dimensional content matrix in an n-dimensional content space,
- choosing and projecting axes of the n-dimensional content matrix based on a relevancy of a total hit frequency of words and linked terms of all the data records for the at least one of the language and text data so as to optimally characterize the data records with the axes, and
- generating a neural network, which performs at least one of classifying and sorting the at least one of language and text data on the basis of the agglomerates in the at least one of the taxonomy table, the language, and text data.


Current Assignee: Infocodex AG
Original Assignee: Infocodex AG
Inventors: Cuypers, Frank; Waelti, Christoph P.; Waelti, Paul; Trugenberger, Carlo A.
Primary Examiner(s): PULLIAS, JESSE SCOTT
Application Number: US11/659,962
Publication Number: US 20070282598A1
Time in Patent Office: 2,814 Days
Field of Search: None
US Class Current: 704/10
CPC Class Codes:
- G06F 16/355   Class or cluster creation o...
- G06F 40/247   Thesauruses; Synonyms
- G06F 40/30   Semantic analysis
- G06N 3/088   Non-supervised learning, e....
