System and method for rectifying a typographical error in a text file
2 Assignments
0 Petitions
Abstract
Disclosed is a system for rectifying a typographical error in a text file. The system includes a network generating module for generating a linguistic network of a plurality of words present in the text file. A computation module is configured to compute the similarity between each pair of words based on a set of parameters. A weight assignment module assigns a weight to the edge present between each pair of words based on the set of parameters. A categorization module is configured to categorize one or more words present in the linguistic network in a category. A word identification module is configured to identify a reference word from the category. A word substitution module is configured to substitute each word of the category deemed erroneous with the corresponding reference word, thereby rectifying the typographical error.
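The network generating module can be pictured as building a complete graph over the distinct words of the file. The following is a minimal Python sketch under stated assumptions, not the patented implementation: the lower-case alphabetic tokenizer and the names `tokenize` and `build_linguistic_network` are hypothetical. Edge weights start at 0.0 and would later be filled in by the weight assignment module.

```python
import itertools
import re


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lower-cased runs of ASCII letters.
    return re.findall(r"[a-z]+", text.lower())


def build_linguistic_network(text: str) -> dict[tuple[str, str], float]:
    """Return a complete graph over the distinct words of `text`.

    Every unordered pair of distinct words becomes an edge keyed by the
    (alphabetically ordered) word pair; weights start at 0.0.
    """
    words = sorted(set(tokenize(text)))
    return {pair: 0.0 for pair in itertools.combinations(words, 2)}


if __name__ == "__main__":
    network = build_linguistic_network("The recieve step and the receive step")
    print(len(network))  # 10
```

With n distinct words the network has n(n-1)/2 edges, so the five distinct words of the sample sentence yield 10 edges; this quadratic growth matters for large files.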
5 Citations
13 Claims
1. A method for rectifying a typographical error in a text file, the method comprising:
generating a linguistic network of a plurality of words present in the text file, wherein each pair of words of the plurality of words in the linguistic network is interconnected via an edge;
computing similarity between each pair of words based on a set of parameters associated with the words of each pair, wherein the set of parameters comprises distance between the words, phonetic similarity between the words, consonant skeleton distance, presence of the words in a lexicon, morphological root form of the words, frequency of the words in the text file, probability of occurrence of the words in the context, domain similarity of the words, and a flag associated with a similar starting character of the words;
assigning a weight to the edge between each pair of words based on the similarity computed from the set of parameters;
categorizing one or more words present in the linguistic network in a category, wherein the one or more words are categorized based on the weight assigned to each edge connecting each pair of words of the one or more words;
identifying, from the category, a reference word for each word deemed to have the typographical error; and
substituting each word of the category having the typographical error with the reference word corresponding to that word in the text file for rectifying the typographical error, wherein the generating, the computing, the assigning, the categorizing, the identifying and the substituting are performed by a processor using programmed instructions stored in a memory.
(Dependent claims: 2, 3, 4, 5, 6, 7)
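The computing and assigning steps can be sketched as one similarity score per edge. This Python sketch is illustrative only: it covers six of the nine listed parameters (Levenshtein edit distance as the "distance between the words", American Soundex swapped in for phonetic similarity, consonant skeleton distance, lexicon presence, frequency, and the starting-character flag) and omits morphological root form, contextual probability and domain similarity. The feature weights in `WEIGHTS`, the reading of the lexicon and frequency features, and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

VOWELS = set("aeiou")
SOUNDEX_CODES = {c: d for d, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                                         "4": "l", "5": "mn", "6": "r"}.items()
                 for c in letters}


def levenshtein(a: str, b: str) -> int:
    # Dynamic-programming edit distance (insertions, deletions, substitutions).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def soundex(word: str) -> str:
    # American Soundex code, standing in for "phonetic similarity".
    word = word.lower()
    out, last = word[0].upper(), SOUNDEX_CODES.get(word[0], "")
    for ch in word[1:]:
        code = SOUNDEX_CODES.get(ch, "")
        if code and code != last:
            out += code
        if ch not in "hw":  # h and w do not separate equal codes; vowels do
            last = code
    return (out + "000")[:4]


def consonant_skeleton(word: str) -> str:
    # The word with its vowels removed, e.g. "receive" -> "rcv".
    return "".join(ch for ch in word if ch not in VOWELS)


# Hypothetical feature weights: the claims name the parameters, not their weights.
WEIGHTS = {"edit": 0.3, "phonetic": 0.2, "skeleton": 0.2,
           "start": 0.1, "lexicon": 0.1, "frequency": 0.1}


def pair_similarity(a: str, b: str, freq: Counter, lexicon: set[str]) -> float:
    sa, sb = consonant_skeleton(a), consonant_skeleton(b)
    features = {
        "edit": 1 - levenshtein(a, b) / max(len(a), len(b)),
        "phonetic": float(soundex(a) == soundex(b)),
        "skeleton": 1 - levenshtein(sa, sb) / max(len(sa), len(sb), 1),
        "start": float(a[0] == b[0]),  # flag: same starting character
        # Assumed reading: exactly one word in the lexicon suggests a (typo, fix) pair.
        "lexicon": float((a in lexicon) != (b in lexicon)),
        # Assumed reading: a large frequency gap suggests a rare misspelling.
        "frequency": abs(freq[a] - freq[b]) / max(freq[a], freq[b], 1),
    }
    return sum(WEIGHTS[name] * value for name, value in features.items())


def assign_weights(words: list[str], lexicon: set[str]) -> dict[tuple[str, str], float]:
    # One weighted edge per unordered pair of distinct words.
    freq = Counter(words)
    return {(a, b): pair_similarity(a, b, freq, lexicon)
            for a, b in combinations(sorted(freq), 2)}


if __name__ == "__main__":
    edges = assign_weights("the receive step and the recieve step".split(),
                           {"the", "receive", "step", "and"})
    print(edges[("receive", "recieve")], edges[("receive", "step")])
```

In this example the edge ("receive", "recieve") scores far higher than ("receive", "step") because the pair shares its Soundex code, consonant skeleton and first letter, and only one of the two words is in the lexicon.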
8. A system for rectifying a typographical error in a text file, the system comprising:
a processor; and
a memory coupled to the processor, wherein the processor is capable of executing a plurality of modules stored in the memory, and wherein the plurality of modules comprises:
a network generating module configured to generate a linguistic network of a plurality of words present in the text file, wherein each pair of words of the plurality of words in the linguistic network is interconnected via an edge;
a computation module coupled to the network generating module, the computation module configured to compute similarity between each pair of words based on a set of parameters associated with the words of each pair, wherein the set of parameters comprises distance between the words, phonetic similarity between the words, consonant skeleton distance, presence of the words in a lexicon, morphological root form of the words, frequency of the words in the text file, probability of occurrence of the words in the context, domain similarity of the words, and a flag associated with a similar starting character of the words;
a weight assignment module configured to assign a weight to the edge between each pair of words based on the similarity computed from the set of parameters;
a categorization module configured to categorize one or more words present in the linguistic network in a category, wherein the one or more words are categorized based on the weight assigned to each edge connecting each pair of words of the one or more words;
a word identification module configured to identify, from the category, a reference word for each word deemed to have the typographical error; and
a word substitution module configured to substitute each word of the category having the typographical error with the corresponding reference word in the text file for rectifying the typographical error.
(Dependent claims: 9, 10, 11, 12)
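The categorization and word identification modules can be sketched as grouping words joined by heavy edges and then choosing a reference word in each group. This Python sketch assumes a fixed weight threshold, treats each connected component of above-threshold edges as a category, and takes the most frequent in-lexicon word as the reference; the threshold value, that selection rule and the names `categorize` and `pick_references` are hypothetical choices, not details stated in the claims.

```python
from collections import Counter


def categorize(words: list[str], weighted_edges: dict[tuple[str, str], float],
               threshold: float = 0.6) -> list[list[str]]:
    # Union-find over the words: an edge whose weight reaches `threshold`
    # merges its two endpoints into the same category.
    parent = {w: w for w in words}

    def find(w: str) -> str:
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving keeps trees shallow
            w = parent[w]
        return w

    for (a, b), weight in weighted_edges.items():
        if weight >= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for w in words:
        groups.setdefault(find(w), []).append(w)
    # Singleton groups have no alternative spelling to compare against.
    return [sorted(g) for g in groups.values() if len(g) > 1]


def pick_references(categories: list[list[str]], freq: Counter,
                    lexicon: set[str]) -> dict[str, str]:
    # Map each out-of-lexicon word to the most frequent in-lexicon word
    # of its category (ties broken alphabetically for determinism).
    corrections: dict[str, str] = {}
    for group in categories:
        candidates = [w for w in group if w in lexicon]
        if not candidates:
            continue  # no trustworthy reference word in this category
        reference = max(candidates, key=lambda w: (freq[w], w))
        for w in group:
            if w not in lexicon:
                corrections[w] = reference  # word deemed to be a typo
    return corrections


if __name__ == "__main__":
    words = ["receive", "recieve", "recive", "step", "stpe", "the"]
    edges = {("receive", "recieve"): 0.86, ("receive", "recive"): 0.8,
             ("step", "stpe"): 0.7, ("receive", "step"): 0.1, ("the", "step"): 0.05}
    freq = Counter({"receive": 3, "recieve": 1, "recive": 1,
                    "step": 2, "stpe": 1, "the": 4})
    categories = categorize(words, edges)
    print(categories)
    print(pick_references(categories, freq, {"receive", "step", "the"}))
```

Here the toy edge weights put "receive", "recieve" and "recive" in one category and "step" and "stpe" in another, so the misspellings map to "receive" and "step" while "the" stays uncategorized.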
13. A non-transitory computer readable medium having embodied thereon a computer program for rectifying a typographical error in a text file, the computer program comprising instructions for:
generating a linguistic network of a plurality of words present in the text file, wherein each pair of words of the plurality of words in the linguistic network is interconnected via an edge;
computing similarity between each pair of words based on a set of parameters associated with the words of each pair, wherein the set of parameters comprises distance between the words, phonetic similarity between the words, consonant skeleton distance, presence of the words in a lexicon, morphological root form of the words, frequency of the words in the text file, probability of occurrence of the words in the context, domain similarity of the words, and a flag associated with a similar starting character of the words;
assigning a weight to the edge between each pair of words based on the similarity computed from the set of parameters;
categorizing one or more words present in the linguistic network in a category, wherein the one or more words are categorized based on the weight assigned to each edge connecting each pair of words of the one or more words;
identifying, from the category, a reference word for each word deemed to have the typographical error; and
substituting each word of the category having the typographical error with the corresponding reference word in the text file for rectifying the typographical error.
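The substituting step can be sketched as a whole-word replacement over the file contents. This Python sketch assumes a `corrections` mapping from lower-case erroneous words to their reference words; the function names `substitute` and `rectify_file` and the capitalization rule are hypothetical. It uses the standard `re` module with `\b` word boundaries so that only complete words are replaced.

```python
import re
from pathlib import Path


def substitute(text: str, corrections: dict[str, str]) -> str:
    # `corrections` maps lower-case erroneous words to their reference words.
    if not corrections:
        return text
    # \b anchors restrict matches to whole words, so "recieved" is left alone
    # when only "recieve" is listed.
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, corrections)) + r")\b",
                         re.IGNORECASE)

    def replace(match):
        word = match.group(0)
        fixed = corrections[word.lower()]
        # Preserve an initial capital, e.g. "Recieve" -> "Receive".
        return fixed.capitalize() if word[0].isupper() else fixed

    return pattern.sub(replace, text)


def rectify_file(path: str, corrections: dict[str, str]) -> None:
    # Read the text file, apply the substitutions and write it back in place.
    file = Path(path)
    file.write_text(substitute(file.read_text(encoding="utf-8"), corrections),
                    encoding="utf-8")
```

For example, `substitute("I recieve it. Recieve stpe.", {"recieve": "receive", "stpe": "step"})` returns `"I receive it. Receive step."`. Matching case-insensitively and restoring only the first letter's case is a simple design choice; all-caps words, for instance, would come back capitalized rather than upper-case.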
Specification