Neural network for classifying speech and textural data based on agglomerates in a taxonomy table
Abstract
A speech and textual analysis device and method for forming a search and/or classification catalog. The device is based on a linguistic database and includes a taxonomy table containing variable taxon nodes. The speech and textual analysis device includes a weighting module, a weighting parameter being additionally assigned to each stored taxon node to register the recurrence frequency of terms in the speech and/or textual data that is to be classified and/or sorted. The speech and/or textual analysis device includes an integration module for determining a predefinable number of agglomerates based on the weighting parameters of the taxon nodes in the taxonomy table, and at least one neural network module for classifying and/or sorting the speech and/or textual data based on the agglomerates in the taxonomy table.
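As an illustration of the weighting step described in the abstract, the following is a minimal Python sketch, not the patented implementation. It assumes a toy linguistic database (the hypothetical `TERM_TO_TAXON` mapping), a simple lowercase word tokenizer, and a plain occurrence count as the weighting parameter of each taxon node; none of these details are given in the source.

```python
import re
from collections import Counter

# Hypothetical linguistic database: each term is linked to one taxon node.
# The terms and node names are invented for illustration only.
TERM_TO_TAXON = {
    "car": "vehicle",
    "automobile": "vehicle",
    "truck": "vehicle",
    "crash": "accident",
    "collision": "accident",
    "fire": "fire_damage",
}


def weight_taxon_nodes(text, term_to_taxon=TERM_TO_TAXON):
    """Count how often terms linked to each taxon node occur in the text."""
    weights = Counter()
    # Assumed tokenizer: lowercase alphabetic runs.
    for token in re.findall(r"[a-z]+", text.lower()):
        node = term_to_taxon.get(token)
        if node is not None:
            weights[node] += 1
    return weights


weights = weight_taxon_nodes(
    "The car hit a truck. The collision damaged the automobile."
)
print(dict(weights))
```

Synonyms such as "car" and "automobile" contribute to the same taxon node, which is the effect the synonym links in the taxonomy table are meant to achieve.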
19 Claims
1. A language and text analysis apparatus for forming a search and classification catalog, the apparatus having at least one linguistic databank for accessing linguistic terms with data records so as to perform at least one of classifying and sorting at least one of language and text data corresponding to the data records, the linguistic terms having at least one of keywords and search terms, and the linguistic databank further having links between words and linked terms of similar meaning so that the links are associated with synonym groups in a taxonomy table, the apparatus comprising:
- a weighting module for weighting of table elements in the taxonomy table on a basis of frequency of occurrence of individual links in the linguistic databank,
- an integration module configured to generate a multidimensional, weighted n-dimensional content matrix in an n-dimensional content space on a basis of agglomerates of elements in the taxonomy table, and configured to choose and project axes of the n-dimensional content matrix based on a relevancy of a total hit frequency of words and linked terms of all the data records for the at least one of the language and text data so as to optimally characterize the data records with the axes, and
- a neural network module for at least one of classification and sorting of at least one of the language and text data on a basis of the content matrix, by using definable descriptors for the language and text analysis apparatus to determine appropriate constraints for one or more subject groups.
Dependent claims: 2
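The axis selection recited for the integration module in claim 1 can be pictured with a short Python sketch. It is only an assumption-laden illustration: the claim does not define "relevancy", so the sketch uses the raw total hit count across all data records as a stand-in, and the function names `choose_axes` and `project` as well as the sample weights are hypothetical.

```python
from collections import Counter


def choose_axes(doc_weights, n_axes):
    """Pick the n_axes elements with the highest total hit count over all documents."""
    totals = Counter()
    for weights in doc_weights:
        totals.update(weights)  # adds the counts of each document
    # Sort by descending total, then by name so that ties are deterministic.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:n_axes]]


def project(weights, axes):
    """Project one document's weights onto the chosen axes."""
    return [weights.get(axis, 0) for axis in axes]


# Invented per-document weights of taxonomy elements.
docs = [
    {"vehicle": 3, "accident": 1},
    {"vehicle": 1, "fire_damage": 2},
    {"accident": 2, "weather": 1},
]
axes = choose_axes(docs, n_axes=2)
content_matrix = [project(d, axes) for d in docs]
print(axes)            # ['vehicle', 'accident']
print(content_matrix)  # [[3, 1], [1, 0], [0, 2]]
```

Each row of `content_matrix` is one data record expressed in the reduced content space; a real system would presumably use a more refined relevancy measure than a raw sum.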
3. A language and text analysis apparatus for formation of a search and classification catalog, the apparatus having at least one linguistic databank for association of linguistic terms with data records, so that the language and text analysis apparatus is configured to perform at least one of classifying and sorting at least one of language and text data corresponding to the data records, the linguistic terms including at least one of keywords and search terms, the apparatus comprising:
- a taxonomy table with variable taxon nodes on a basis of the linguistic databank, so that one or more data records can be associated with one taxon node in the taxonomy table, and each data record includes a variable significance factor for weighting of terms on a basis of at least one of filling words, linking words, and keywords,
- a weighting module, in which a weighting parameter for recording of frequencies of occurrence of terms within the at least one of language and text data to be at least one of sorted and classified is additionally stored associated with each taxon node,
- an integration module for determination of agglomerates on a basis of the weighting parameters of the taxon nodes in the taxonomy table, with one agglomerate including at least one taxon node, the agglomerates associated with an n-dimensional content matrix in an n-dimensional content space, and for choosing and projecting axes of the n-dimensional content matrix based on a relevancy of a total hit frequency of words and linked terms of all the data records for the at least one of the language and text data so as to optimally characterize the data records with the axes, and
- a neural network module configured to perform at least one of classification and sorting of at least one of the language and the text data on a basis of the agglomerates in the taxonomy table.
Dependent claims: 4, 5, 6, 7, 8, 9, 10
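Claim 3 leaves open how agglomerates are formed from the weighted taxon nodes. One possible reading, sketched below in Python purely as an assumption, groups taxon nodes under a shared parent in the taxonomy and keeps a predefined number of groups with the largest summed weight. The `PARENT` mapping, the function name `build_agglomerates`, and the sample weights are all hypothetical.

```python
from collections import defaultdict

# Hypothetical taxonomy: each taxon node has a parent used as the grouping key.
PARENT = {
    "car": "vehicle",
    "truck": "vehicle",
    "collision": "accident",
    "rollover": "accident",
    "hail": "weather",
}


def build_agglomerates(node_weights, parent, max_agglomerates):
    """Group weighted taxon nodes and keep the heaviest groups."""
    groups = defaultdict(dict)
    for node, weight in node_weights.items():
        # Nodes without a known parent form their own single-node agglomerate.
        groups[parent.get(node, node)][node] = weight
    # Rank by descending summed weight, then by name for deterministic ties.
    ranked = sorted(
        groups.items(), key=lambda kv: (-sum(kv[1].values()), kv[0])
    )
    return dict(ranked[:max_agglomerates])


node_weights = {"car": 4, "truck": 1, "collision": 3, "hail": 1}
agglomerates = build_agglomerates(node_weights, PARENT, max_agglomerates=2)
print(agglomerates)
```

Every agglomerate produced this way contains at least one taxon node, which matches the wording of the claim; the grouping criterion itself is an illustrative choice.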
11. An automated language and text analysis method for forming a search and classification catalog, with a linguistic databank being used to record data records and to perform at least one of classifying and sorting at least one of language and text data on a basis of the data records, the method comprising the steps of:
- associating the data records that are stored in the linguistic databank with a taxon node in a taxonomy table, with each data record including a variable significance factor for weighting of terms based on at least one of filling words, linking words, and keywords,
- recording at least one of the language and text data on a basis of the taxonomy table, with frequency of individual data records in the at least one of the language and text data being determined by a weighting module and being associated with a weighting parameter for the taxon node,
- determining a determinable number of agglomerates by an integration module in the taxonomy table on a basis of the weighting parameters of one or more taxon nodes, the agglomerates associated with an n-dimensional content matrix in an n-dimensional content space,
- choosing and projecting axes of the n-dimensional content matrix based on a relevancy of a total hit frequency of words and linked terms of all the data records for the at least one of the language and text data so as to optimally characterize the data records with the axes, and
- using a neural network module to perform at least one of classifying and sorting at least one of the language and text data on a basis of the agglomerates in the taxonomy table.
Dependent claims: 12, 13, 14, 15, 16, 17, 18
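The final step of claim 11, classification by a neural network module, can be illustrated with the simplest possible network. The Python sketch below trains a single-layer perceptron on content vectors; this is a stand-in chosen for brevity, since the claim does not specify any network architecture. The training vectors, the two subject groups (encoded as 1 and 0), and the function names are invented for the example.

```python
def predict(weights, bias, x):
    """Return 1 if the weighted sum is positive, otherwise 0."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else 0


def train_perceptron(vectors, labels, epochs=20, lr=1.0):
    """Classic perceptron learning rule on linearly separable data."""
    weights = [0.0] * len(vectors[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            error = y - predict(weights, bias, x)
            # Weights change only when the prediction is wrong (error != 0).
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


# Invented content vectors along two assumed axes: [vehicle, accident].
vectors = [[3, 1], [4, 0], [0, 2], [1, 3]]
labels = [1, 1, 0, 0]  # 1 = hypothetical "vehicle" group, 0 = "accident" group

weights, bias = train_perceptron(vectors, labels)
print(predict(weights, bias, [5, 1]))  # 1
print(predict(weights, bias, [0, 4]))  # 0
```

A perceptron can only separate groups with a linear boundary; sorting data into many subject groups would call for a larger network, which is outside the scope of this sketch.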
19. A non-transitory computer-readable medium with computer program code recorded thereon, the computer program code configured to control one or more processors in a computer-based system to perform a method for automated language and text analysis by formation of a search and/or classification catalog, with data records being recorded on the basis of a linguistic databank, and with language and/or text data being classified and/or sorted on the basis of the data records, the method comprising the steps of:
- storing the data records in the linguistic databank associated with a taxon node in a taxonomy table, with each data record including a variable significance factor for weighting of terms on the basis of at least one of filling words, linking words, and keywords,
- recording at least one of the language and text data on the basis of the taxonomy table, with frequency of individual data records in the at least one of language and text data determining a weighting parameter for the taxon node,
- determining a determinable number of agglomerates in the taxonomy table on the basis of the weighting parameter of one or more taxon nodes, the agglomerates associated with an n-dimensional content matrix in an n-dimensional content space,
- choosing and projecting axes of the n-dimensional content matrix based on a relevancy of a total hit frequency of words and linked terms of all the data records for the at least one of the language and text data so as to optimally characterize the data records with the axes, and
- generating a neural network, which performs at least one of classifying and sorting the at least one of language and text data on the basis of the agglomerates in the at least one of the taxonomy table, the language, and text data.
Specification