Recognition of target words using designated characteristic values
First Claim
1. A method of target word recognition, comprising:
obtaining a candidate word set and corresponding characteristic computation data, the candidate word set comprising text data, and characteristic computation data being associated with the candidate word set;
performing segmentation of the characteristic computation data to generate a plurality of text segments;
combining the plurality of text segments to form a text data combination set;
determining an intersection of the candidate word set and the text data combination set, the intersection comprising a plurality of text data combinations;
determining a plurality of designated characteristic values for the plurality of text data combinations;
determining, using a processor, a criterion, including:
obtaining a training sample word set and sample characteristic computation data, the sample characteristic computation data comprising a plurality of sample words and designated characteristic values of the plurality of sample words;
obtaining a sample text data combination set based on the plurality of sample words;
determining a plurality of designated characteristic values of sample text data combinations in an intersection of the sample text data combination set and the training sample word set; and
setting a threshold value of a designated characteristic value of a sample text data combination in the intersection as a part of the criterion; and
based at least in part on the plurality of designated characteristic values for the plurality of text data combinations and according to at least the criterion, recognizing among the plurality of text data combinations, target words whose characteristic values fulfill the criterion.
Abstract
Target word recognition includes: obtaining a candidate word set and corresponding characteristic computation data, the candidate word set comprising text data, and characteristic computation data being associated with the candidate word set; performing segmentation of the characteristic computation data to generate a plurality of text segments; combining the plurality of text segments to form a text data combination set; determining an intersection of the candidate word set and the text data combination set, the intersection comprising a plurality of text data combinations; determining a plurality of designated characteristic values for the plurality of text data combinations; and, based at least in part on the plurality of designated characteristic values and according to at least a criterion, recognizing among the plurality of text data combinations target words whose characteristic values fulfill the criterion.
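The steps in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the patent: whitespace tokenization stands in for segmentation, combinations are runs of up to `max_len` adjacent segments, the designated characteristic value is a raw occurrence count, and the criterion is a single fixed threshold. The function names `segment`, `combine`, and `recognize` are hypothetical.

```python
from collections import Counter


def segment(text):
    # Assumption: whitespace tokenization stands in for word segmentation.
    return text.lower().split()


def combine(segments, max_len=3):
    # Join runs of 1..max_len adjacent segments into text data combinations.
    combos = []
    for n in range(1, max_len + 1):
        for i in range(len(segments) - n + 1):
            combos.append(" ".join(segments[i:i + n]))
    return combos


def recognize(candidates, corpus, threshold, max_len=3):
    # Count every combination found in the characteristic computation data.
    counts = Counter()
    for text in corpus:
        counts.update(combine(segment(text), max_len))
    # Keep only combinations that are also candidate words (the intersection).
    in_both = set(candidates) & set(counts)
    # Assumption: the characteristic value is the raw occurrence count.
    values = {combo: counts[combo] for combo in in_both}
    # Recognize the combinations whose value fulfills the criterion.
    return {combo for combo, value in values.items() if value >= threshold}
```

For example, with the corpus `["bat sleeve dress for summer", "bat sleeve shirt", "summer dress"]`, the candidates `{"bat sleeve", "summer dress", "winter coat"}`, and a threshold of 2, only "bat sleeve" is recognized: "summer dress" occurs once and "winter coat" never appears among the combinations.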
21 Claims
1. A method of target word recognition, comprising:
obtaining a candidate word set and corresponding characteristic computation data, the candidate word set comprising text data, and characteristic computation data being associated with the candidate word set;
performing segmentation of the characteristic computation data to generate a plurality of text segments;
combining the plurality of text segments to form a text data combination set;
determining an intersection of the candidate word set and the text data combination set, the intersection comprising a plurality of text data combinations;
determining a plurality of designated characteristic values for the plurality of text data combinations;
determining, using a processor, a criterion, including:
obtaining a training sample word set and sample characteristic computation data, the sample characteristic computation data comprising a plurality of sample words and designated characteristic values of the plurality of sample words;
obtaining a sample text data combination set based on the plurality of sample words;
determining a plurality of designated characteristic values of sample text data combinations in an intersection of the sample text data combination set and the training sample word set; and
setting a threshold value of a designated characteristic value of a sample text data combination in the intersection as a part of the criterion; and
based at least in part on the plurality of designated characteristic values for the plurality of text data combinations and according to at least the criterion, recognizing among the plurality of text data combinations, target words whose characteristic values fulfill the criterion.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A target word recognition system, comprising:
one or more processors configured to:
obtain a candidate word set and corresponding characteristic computation data, the candidate word set comprising text data, and characteristic computation data being associated with the candidate word set;
perform segmentation of the characteristic computation data to generate a plurality of text segments;
combine the plurality of text segments to form a text data combination set;
determine an intersection of the candidate word set and the text data combination set, the intersection comprising a plurality of text data combinations;
determine a plurality of designated characteristic values for the plurality of text data combinations;
determine a criterion, including:
obtain a training sample word set and sample characteristic computation data, the sample characteristic computation data comprising a plurality of sample words and designated characteristic values of the plurality of sample words;
obtain a sample text data combination set based on the plurality of sample words;
determine a plurality of designated characteristic values of sample text data combinations in an intersection of the sample text data combination set and the training sample word set; and
set a threshold value of a designated characteristic value of a sample text data combination in the intersection as a part of the criterion; and
based at least in part on the plurality of designated characteristic values for the plurality of text data combinations and according to at least the criterion, recognize among the plurality of text data combinations, target words whose characteristic values fulfill the criterion; and
one or more memories coupled to the one or more processors, configured to provide the one or more processors with instructions.
Dependent claims: 13, 14, 15, 16, 17
18. A non-transitory computer program product for target word recognition, the computer program product being embodied in a tangible non-transitory computer readable storage medium and comprising computer instructions for:
obtaining a candidate word set and corresponding characteristic computation data, the candidate word set comprising text data, and characteristic computation data being associated with the candidate word set;
performing segmentation of the characteristic computation data to generate a plurality of text segments;
combining the plurality of text segments to form a text data combination set;
determining an intersection of the candidate word set and the text data combination set, the intersection comprising a plurality of text data combinations;
determining a plurality of designated characteristic values for the plurality of text data combinations;
determining a criterion, including:
obtaining a training sample word set and sample characteristic computation data, the sample characteristic computation data comprising a plurality of sample words and designated characteristic values of the plurality of sample words;
obtaining a sample text data combination set based on the plurality of sample words;
determining a plurality of designated characteristic values of sample text data combinations in an intersection of the sample text data combination set and the training sample word set; and
setting a threshold value of a designated characteristic value of a sample text data combination in the intersection as a part of the criterion; and
based at least in part on the plurality of designated characteristic values for the plurality of text data combinations and according to at least the criterion, recognizing among the plurality of text data combinations, target words whose characteristic values fulfill the criterion.
19. A method of target word recognition, comprising:
obtaining a candidate word set and corresponding characteristic computation data, the candidate word set comprising text data, and characteristic computation data being associated with the candidate word set;
performing segmentation of the characteristic computation data to generate a plurality of text segments;
combining the plurality of text segments to form a text data combination set;
determining an intersection of the candidate word set and the text data combination set, the intersection comprising a plurality of text data combinations;
determining a plurality of designated characteristic values for the plurality of text data combinations;
determining, using a processor, a criterion, including:
obtaining a training sample word set and sample characteristic computation data, the training sample word set comprising a plurality of sample words and sorting results indicating whether each of the plurality of sample words is a target word, and the sample characteristic computation data comprising the plurality of sample words and designated characteristic values of the plurality of sample words;
segmenting the plurality of sample words to obtain a plurality of sample segments of minimum granularity;
combining the plurality of sample segments to obtain a sample text data combination set;
determining an intersection of the sample text data combination set and the training sample word set;
determining a plurality of designated characteristic values of sample text data combinations in the intersection; and
setting a threshold value of a designated characteristic value of a sample text data combination in the intersection as a part of the criterion; and
based at least in part on the plurality of designated characteristic values for the plurality of text data combinations and according to at least the criterion, recognizing among the plurality of text data combinations, target words whose characteristic values fulfill the criterion.
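The criterion-determination steps of claim 19 could be sketched as follows. The claim does not say how the threshold value is chosen, so this sketch assumes one simple rule: pick the characteristic value that classifies the most labeled sample combinations correctly. Whitespace tokens stand in for minimum-granularity segments, and the names `sample_combinations` and `fit_threshold` are hypothetical.

```python
def sample_combinations(sample_words, max_len=3):
    # Segment each sample word and recombine adjacent segments.
    combos = set()
    for word in sample_words:
        # Assumption: whitespace tokens are the minimum-granularity segments.
        segs = word.split()
        for n in range(1, max_len + 1):
            for i in range(len(segs) - n + 1):
                combos.add(" ".join(segs[i:i + n]))
    return combos


def fit_threshold(labels, values, max_len=3):
    # labels: sample word -> True if it is a target word (the sorting result).
    # values: sample word -> designated characteristic value.
    in_both = sample_combinations(labels, max_len) & set(labels)
    scored = [(values[w], labels[w]) for w in in_both if w in values]
    best_threshold, best_correct = None, -1
    # Assumption: try each observed value as the threshold and keep the one
    # that classifies the most labeled samples correctly (lowest on ties).
    for t in sorted({v for v, _ in scored}):
        correct = sum((v >= t) == is_target for v, is_target in scored)
        if correct > best_correct:
            best_threshold, best_correct = t, correct
    return best_threshold
```

Given labels marking "bat sleeve" and "leggings" as target words and "for summer" and "the" as non-targets, with values 0.9, 0.7, 0.2, and 0.1 respectively, the function returns 0.7, the lowest threshold that separates the two groups without error.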
20. A target word recognition system, comprising:
one or more processors configured to:
obtain a candidate word set and corresponding characteristic computation data, the candidate word set comprising text data, and characteristic computation data being associated with the candidate word set;
perform segmentation of the characteristic computation data to generate a plurality of text segments;
combine the plurality of text segments to form a text data combination set;
determine an intersection of the candidate word set and the text data combination set, the intersection comprising a plurality of text data combinations;
determine a plurality of designated characteristic values for the plurality of text data combinations;
determine a criterion, including:
obtaining a training sample word set and sample characteristic computation data, the training sample word set comprising a plurality of sample words and sorting results indicating whether each of the plurality of sample words is a target word, and the sample characteristic computation data comprising the plurality of sample words and designated characteristic values of the plurality of sample words;
segmenting the plurality of sample words to obtain a plurality of sample segments of minimum granularity;
combining the plurality of sample segments to obtain a sample text data combination set;
determining an intersection of the sample text data combination set and the training sample word set;
determining a plurality of designated characteristic values of sample text data combinations in the intersection; and
setting a threshold value of a designated characteristic value of a sample text data combination in the intersection as a part of the criterion; and
based at least in part on the plurality of designated characteristic values for the plurality of text data combinations and according to at least the criterion, recognize among the plurality of text data combinations, target words whose characteristic values fulfill the criterion; and
one or more memories coupled to the one or more processors, configured to provide the one or more processors with instructions.
21. A non-transitory computer program product for target word recognition, the computer program product being embodied in a tangible non-transitory computer readable storage medium and comprising computer instructions for:
obtaining a candidate word set and corresponding characteristic computation data, the candidate word set comprising text data, and characteristic computation data being associated with the candidate word set;
performing segmentation of the characteristic computation data to generate a plurality of text segments;
combining the plurality of text segments to form a text data combination set;
determining an intersection of the candidate word set and the text data combination set, the intersection comprising a plurality of text data combinations;
determining a plurality of designated characteristic values for the plurality of text data combinations;
determining a criterion, including:
obtaining a training sample word set and sample characteristic computation data, the training sample word set comprising a plurality of sample words and sorting results indicating whether each of the plurality of sample words is a target word, and the sample characteristic computation data comprising the plurality of sample words and designated characteristic values of the plurality of sample words;
segmenting the plurality of sample words to obtain a plurality of sample segments of minimum granularity;
combining the plurality of sample segments to obtain a sample text data combination set;
determining an intersection of the sample text data combination set and the training sample word set;
determining a plurality of designated characteristic values of sample text data combinations in the intersection; and
setting a threshold value of a designated characteristic value of a sample text data combination in the intersection as a part of the criterion; and
based at least in part on the plurality of designated characteristic values for the plurality of text data combinations and according to at least the criterion, recognizing among the plurality of text data combinations, target words whose characteristic values fulfill the criterion.
Specification