LATENT METONYMICAL ANALYSIS AND INDEXING (LMAI)
Abstract
The present invention relates to Latent Metonymical Analysis and Indexing (LMai), a novel concept for advanced or unsupervised machine learning techniques that uses a statistical approach to identify the relationships between the words in a set of given documents (unstructured data). The approach does not necessarily need training data to decide which related words belong together; it can perform the classification by itself. All that is needed is to give the algorithm a set of natural documents. The method classifies the relationships automatically, without any human guidance during the process, as shown in FIGS. 6 and 7.
140 Citations
93 Claims
1-72. (canceled)
73. A method for advance and/or unsupervised machine learning by Latent Metonymical Analysis and Indexing (LMai), said method comprising steps of:
a. inputting natural documents;
b. eliminating special characters to count number of words within the given document, filtering the contents based on the predefined stop-words and calculating the fraction of the stop-words present in the document;
c. determining Significant Single Value Term data set and Significant Multi Value Term data set from the document being processed;
d. decomposing the words in Significant Single Value Term data set and Significant Multi Value Term data set to extract the Keywords of the document being processed;
e. optionally, determining KeyTerms and their respective hand-in-hand (HiH) words automatically for further decomposition;
f. identifying Topic in an unsupervised manner based not just on File Name but also by manipulating/comparing with various combinations of document attributes that are extracted to identify Best Topic candidates and thereafter defining an appropriate Topic based on predefined rules; and
g. analyzing relationship between the Topics and the Keywords and thereafter indexing the Topics and their related Keywords, KeyTerms and their respective hand-in-hand terms into Metonymy cluster and KeyTerms HiH cluster respectively.
Dependent claims: 74, 75, 76, 77, 78, 79, 80
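
As an illustration only, step b of claim 73 (special-character removal, word count, stop-word filtering and stop-word fraction) might be sketched as follows. The function name `preprocess`, the regular expression and the small `STOP_WORDS` set are assumptions made for this sketch; the claim says only that the stop-words are "predefined" and does not specify how characters are classified.

```python
import re

# Hypothetical stop-word list; the claim only refers to "predefined stop-words".
STOP_WORDS = frozenset({"a", "an", "and", "the", "of", "to", "in", "is", "for", "on"})


def preprocess(text, stop_words=STOP_WORDS):
    """Return (content_words, total_word_count, stop_word_fraction)."""
    # Assumption: a "special character" is anything that is not an ASCII
    # letter, a digit or whitespace. Each one is replaced by a space.
    cleaned = re.sub(r"[^A-Za-z0-9\s]", " ", text)
    words = cleaned.lower().split()
    total = len(words)
    # Keep only the words that are not stop-words.
    content = [w for w in words if w not in stop_words]
    # Fraction of the document's words that were stop-words.
    fraction = (total - len(content)) / total if total else 0.0
    return content, total, fraction
```

For the input `"The cat, and the dog!"` this sketch counts five words, keeps `["cat", "dog"]`, and reports a stop-word fraction of 0.6.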
81. A decomposition method to extract Keywords and KeyTerms from the documents, said method comprising steps of:
a. inputting natural documents;
b. checking the document being processed to identify the prerequisite minimal size of data and/or word articles/words;
c. storing the data or words in the document in a sequential order as per their occurrence in the document;
d. creating two identical instances of the data to facilitate the identification of Significant Single Value Term data set and Significant Multi Value Term data set;
e. determining Significant Single Value Term from one of the instances of the data set and Significant Multi Value Term from the other instance of the data set starting from the highest hand-in-hand range predefined, followed by consecutive hand-in-hand range terms of lesser dimension;
f. storing the identified Significant Single Value Term and Significant Multi Value Term of different hand-in-hand range in their respective data sets;
g. comparing words in Significant Multi Value Term data sets with the words in Significant Single Value Term data set to extract those words in the respective hand-in-hand range of each Significant Multi Value Term data set as Best-Terms, which have at least one instance of Single Value Terms within their range and the rest of the hand-in-hand terms are decomposed; and
h. comparing the data sets in such a way that every individual hand-in-hand range term that has at least one instance of any term in Significant Single Value Term data set is extracted as a Keyword and the rest are decomposed to determine the KeyTerms.
Dependent claims: 82, 83, 84, 85, 86, 87
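
The comparison in steps e to h of claim 81 can be illustrated with a rough sketch. It rests on several assumptions that the claim does not state: a "hand-in-hand range" is read as a run of n consecutive words, "significant" is read as meeting a minimum occurrence count, and the thresholds `single_min` and `multi_min` are hypothetical parameters. The function names are likewise invented for this example.

```python
from collections import Counter


def significant_terms(words, n, min_count):
    """Runs of n consecutive words that occur at least min_count times."""
    grams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    counts = Counter(grams)
    return {g for g, c in counts.items() if c >= min_count}


def extract_keywords(words, max_range=3, single_min=3, multi_min=2):
    """Return (keywords, decomposed) lists of multi-word terms.

    A multi-word term is kept as a keyword when it contains at least one
    significant single word; otherwise it is set aside for decomposition.
    """
    # Significant Single Value Terms (assumed: frequent single words).
    single = {g[0] for g in significant_terms(words, 1, single_min)}
    keywords, decomposed = [], []
    # Start from the highest hand-in-hand range, then move to lower ranges.
    for n in range(max_range, 1, -1):
        for gram in sorted(significant_terms(words, n, multi_min)):
            term = " ".join(gram)
            if any(w in single for w in gram):
                keywords.append(term)
            else:
                decomposed.append(term)
    return keywords, decomposed
```

With the word sequence `machine learning model machine learning data machine learning red car red car`, the sketch keeps `machine learning` as a keyword and sets `red car` aside, because neither `red` nor `car` reaches the assumed single-word threshold of three occurrences. How the set-aside terms are then decomposed into KeyTerms is not specified in the claim and is not modeled here.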
88. A method of defining an appropriate Topic for the document based on the document content, said method comprising steps of:
a. cleaning up the document's File Name to remove the file dot (.) extension and any alphanumeric characters;
b. extracting the first few predefined number of words from the beginning of the document as the Document Header;
c. comparing each word in the File Name and each word in the Document Header with every word in Significant Single Value Terms data set to extract the words that match into two separate data sets;
d. comparing each word in the Document Header with every word in File Name to extract the words that match into a separate data set;
e. transferring the data from the said individual data sets achieved in steps c and d into another data set; thereafter processing the data/words to determine frequency of each word occurrence;
f. comparing every word in the Significant Multi Value Term data sets of a predefined range with the File Name to extract the hand-in-hand words that match in the separate data set;
g. comparing every word in the Significant Multi Value Term data set of a predefined range with the Document Header to extract the hand-in-hand words that match in the separate data set;
h. transferring the data from the individual data sets achieved in steps f and g into another separate data set; thereafter processing the data/words to determine frequency of each word occurrence;
i. comparing the data set achieved in step e, which consists of words of type Single Value Term, with the data set achieved in step h, which consists of words of type Multi Value Term, to extract those hand-in-hand words as Best Topic candidates that have at least one instance of any of the words of type Single Value Term; and
j. defining an appropriate Topic based on predefined rules.
Dependent claims: 89, 90
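
Steps a to i of claim 88 could be sketched roughly as below. This is not the claimed implementation: every function name is hypothetical, the header size of 10 words is an arbitrary default, file-name cleanup is read as dropping the extension plus digits and punctuation, and a multi-word term is treated as a "match" when it appears as a consecutive word sequence. The predefined rules of step j are not given in the claim, so the sketch stops at the Best Topic candidates.

```python
import os
import re
from collections import Counter


def clean_file_name(file_name):
    """Step a (assumed reading): drop the extension, digits and punctuation."""
    stem = os.path.splitext(file_name)[0]
    return re.sub(r"[^A-Za-z]+", " ", stem).lower().split()


def document_header(words, size=10):
    """Step b: the first `size` words of the document."""
    return words[:size]


def single_term_counts(name_words, header, single_terms):
    """Steps c to e: collect matching single words and count occurrences."""
    matches = [w for w in name_words if w in single_terms]   # File Name vs terms
    matches += [w for w in header if w in single_terms]      # Header vs terms
    matches += [w for w in header if w in name_words]        # Header vs File Name
    return Counter(matches)


def contains_phrase(phrase, words):
    """True when the phrase occurs as consecutive whole words in `words`."""
    return f" {phrase} " in f" {' '.join(words)} "


def best_topic_candidates(multi_terms, name_words, header, single_counts):
    """Steps f to i: multi-word matches that contain a matched single word."""
    multi_counts = Counter()
    for phrase in multi_terms:
        for source in (name_words, header):                  # steps f and g
            if contains_phrase(phrase, source):
                multi_counts[phrase] += 1                    # step h
    return [p for p in multi_counts                          # step i
            if any(w in single_counts for w in p.split())]
```

For a file named `machine_learning_intro_2019.pdf` whose text begins "machine learning is a method", with `machine`, `learning` and `data` as single terms and `machine learning` and `data analysis` as multi-word terms, the sketch returns `machine learning` as the only candidate.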
91. A system for automatically identifying Keywords, KeyTerms and Topics from a set of documents and thereafter automatically identifying the metonymical/related words by Latent Metonymical Analysis and Indexing (LMai), said system comprising:
a. document input module for providing unstructured data;
b. analyzer to identify similar words having singular and plural form and to convert the words into one of the forms;
c. means for decomposing the words in Significant Single Value Term data set and Significant Multi Value Term data set to extract the Keywords of the document being processed;
d. means for analyzing relationship between the Topics and the Keywords and thereafter indexing the Topics and their related Keywords, KeyTerms and their respective hand-in-hand terms into Metonymy cluster and KeyTerms HiH cluster respectively;
e. an indexing module for indexing/clustering Topics and their related words, and also KeyTerm and their HiH terms;
f. retrieval engine to analyze the Topics of each document during retrieval process to identify the Topics that are related to each other based on a predefined threshold limit to retrieve the context based results from the index/cluster; and
g. display system to display
  a. link to take the user to content page; and
  b. Topic and significant Keywords extracted by the method to understand the content within the link without having to visit result page.
Dependent claims: 92, 93
Specification