Method, apparatus, and storage medium for text information processing
First Claim
1. A text information processing method, applied to a terminal, the terminal comprising one or more processors, a memory, and program instructions stored in the memory, the program instructions being executed by the one or more processors, and the method comprising:
performing word segmentation on a target text according to a preset fixed word segmentation policy, to obtain a word segmentation result;
comparing the word segmentation result with a preset word segmentation list, and obtaining a word, which is not in the preset word segmentation list, as a new word;
adding the new word to the preset word segmentation list, to obtain a test word segmentation list;
classifying a test text according to the preset word segmentation list, to obtain a first text, and classifying the test text according to the test word segmentation list, to obtain a second text;
calculating classification accuracy of the first text and classification accuracy of the second text;
comparing the classification accuracy of the first text with the classification accuracy of the second text, and determining a target new word from the new word according to a comparison result;
adding the target new word to the preset word segmentation list, to obtain a target preset word segmentation list; and
classifying the target text according to the target preset word segmentation list,
wherein the classifying a test text according to the preset word segmentation list, to obtain a first text, and classifying the test text according to the test word segmentation list, to obtain a second text comprises:
classifying the test text according to a preset classification algorithm, to obtain the first text, wherein the preset classification algorithm is associated with the preset word segmentation list; and
classifying the test text according to the preset classification algorithm, to obtain the second text, wherein the preset classification algorithm is associated with the test word segmentation list; and
the classifying the target text according to the target preset word segmentation list comprises:
calibrating the preset classification algorithm according to the target preset word segmentation list, and classifying the target text according to the calibrated preset classification algorithm.
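The steps of the first claim can be sketched in a few lines of Python. This is only an illustrative sketch, not the patented implementation: the whitespace-based `segment` policy, the count-based classifier, the labeled `train_set`, and the choice to evaluate each candidate new word individually are all assumptions made for the example, and "calibrating" is modeled simply as retraining with the enlarged word list.

```python
from collections import Counter

def segment(text):
    # Assumed stand-in for the "preset fixed word segmentation policy".
    return text.lower().split()

def find_new_words(target_texts, word_list):
    # Words in the segmentation result that are absent from the preset list.
    new_words = []
    for text in target_texts:
        for w in segment(text):
            if w not in word_list and w not in new_words:
                new_words.append(w)
    return new_words

def train(labeled_texts, word_list):
    # Toy classifier "associated with" a word list: per-label counts of listed words.
    model = {}
    for text, label in labeled_texts:
        counts = model.setdefault(label, Counter())
        counts.update(w for w in segment(text) if w in word_list)
    return model

def classify(text, model):
    words = segment(text)
    scores = {label: sum(counts[w] for w in words) for label, counts in model.items()}
    # Ties are broken alphabetically so the result is deterministic.
    return max(sorted(scores), key=lambda label: scores[label])

def accuracy(test_set, model):
    return sum(classify(t, model) == y for t, y in test_set) / len(test_set)

def select_target_new_words(new_words, word_list, train_set, test_set):
    # Keep a candidate only if adding it raises accuracy on the test text.
    baseline = accuracy(test_set, train(train_set, word_list))
    targets = []
    for w in new_words:
        test_list = word_list | {w}
        if accuracy(test_set, train(train_set, test_list)) > baseline:
            targets.append(w)
    return targets

# Hypothetical data for illustration only.
word_list = {"refund", "invoice"}
train_set = [("refund my invoice", "billing"), ("invoice refund please", "billing"),
             ("app crashes on login", "bug"), ("login crashes again", "bug")]
test_set = [("crashes every time", "bug"), ("need a refund", "billing")]
target_texts = ["the app crashes daily"]

new_words = find_new_words(target_texts, word_list)
targets = select_target_new_words(new_words, word_list, train_set, test_set)
target_list = word_list | set(targets)
calibrated = train(train_set, target_list)   # "calibration" modeled as retraining
label = classify(target_texts[0], calibrated)
```

With this toy data, only the candidate that improves test accuracy is promoted to the target word list; the other candidates are discarded because they leave the accuracy unchanged.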
Abstract
Method, apparatus, and storage medium for text information processing are provided. The method includes: performing word segmentation on a target text according to a preset fixed word segmentation policy, and comparing a word segmentation result with a preset word segmentation list, to obtain a new word; adding the new word to the preset word segmentation list, to obtain a test word segmentation list; classifying a test text according to the preset word segmentation list, to obtain a first text, and classifying the test text according to the test word segmentation list, to obtain a second text; comparing classification accuracy of the first text with classification accuracy of the second text, and determining a target new word from the new word according to a comparison result; and adding the target new word to the preset word segmentation list, and classifying the target text.
15 Claims
1. A text information processing method, applied to a terminal, the terminal comprising one or more processors, a memory, and program instructions stored in the memory, the program instructions being executed by the one or more processors, and the method comprising:
performing word segmentation on a target text according to a preset fixed word segmentation policy, to obtain a word segmentation result;
comparing the word segmentation result with a preset word segmentation list, and obtaining a word, which is not in the preset word segmentation list, as a new word;
adding the new word to the preset word segmentation list, to obtain a test word segmentation list;
classifying a test text according to the preset word segmentation list, to obtain a first text, and classifying the test text according to the test word segmentation list, to obtain a second text;
calculating classification accuracy of the first text and classification accuracy of the second text;
comparing the classification accuracy of the first text with the classification accuracy of the second text, and determining a target new word from the new word according to a comparison result;
adding the target new word to the preset word segmentation list, to obtain a target preset word segmentation list; and
classifying the target text according to the target preset word segmentation list,
wherein the classifying a test text according to the preset word segmentation list, to obtain a first text, and classifying the test text according to the test word segmentation list, to obtain a second text comprises:
classifying the test text according to a preset classification algorithm, to obtain the first text, wherein the preset classification algorithm is associated with the preset word segmentation list; and
classifying the test text according to the preset classification algorithm, to obtain the second text, wherein the preset classification algorithm is associated with the test word segmentation list; and
the classifying the target text according to the target preset word segmentation list comprises:
calibrating the preset classification algorithm according to the target preset word segmentation list, and classifying the target text according to the calibrated preset classification algorithm.
Dependent claims: 2, 3, 4, 5.
6. A text information processing apparatus, the apparatus comprising:
one or more processors;
a memory; and
one or more program modules, stored in the memory, executed by the one or more processors, and the one or more program modules comprising:
a new-word processing module, configured to perform word segmentation on a target text according to a preset fixed word segmentation policy, to obtain a word segmentation result; and compare the word segmentation result with a preset word segmentation list, to obtain a word segmentation result, which is not in the preset word segmentation list, as a new word;
an adding module, configured to add the new word to the preset word segmentation list, to obtain a test word segmentation list;
a test-text classification module, configured to classify a test text according to the preset word segmentation list, to obtain a first text, and classify the test text according to the test word segmentation list, to obtain a second text;
a target-new-word determining module, configured to calculate classification accuracy of the first text and classification accuracy of the second text, compare the classification accuracy of the first text with the classification accuracy of the second text, and determine a target new word from the new word according to a comparison result; and
a target-text classification module, configured to add the target new word to the preset word segmentation list, to obtain a target preset word segmentation list; and classify the target text according to the target preset word segmentation list,
wherein the test-text classification module comprises:
a first classification unit, configured to classify the test text according to a preset classification algorithm, to obtain the first text, wherein the preset classification algorithm is associated with the preset word segmentation list; and
a second classification unit, configured to classify the test text according to the preset classification algorithm, to obtain the second text, wherein the preset classification algorithm is associated with the test word segmentation list; and
the target-text classification module classifies the target text according to the target preset word segmentation list comprises:
calibrating the preset classification algorithm according to the target preset word segmentation list, and classifying the target text according to the calibrated preset classification algorithm.
Dependent claims: 7, 8, 9, 10.
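The module layout recited in claim 6 can be pictured as one object whose methods correspond to the claimed modules. The sketch below is a hypothetical arrangement, not the apparatus itself: the class name, the method names, the keyword-lookup classifier `toy_classify`, and the `KEYWORD_LABELS` table are invented for illustration. Here all new words are added to the test list at once, and "calibrating" is modeled as handing the updated word list to the classifier.

```python
from dataclasses import dataclass
from typing import Callable, List, Set

# Hypothetical keyword-to-label table used only by the toy classifier below.
KEYWORD_LABELS = {"refund": "billing", "crashes": "bug"}

def toy_classify(text: str, word_list: Set[str]) -> str:
    # The classifier is "associated with" a word list: it only sees listed words.
    for w in text.lower().split():
        if w in word_list and w in KEYWORD_LABELS:
            return KEYWORD_LABELS[w]
    return "unknown"

@dataclass
class TextProcessingApparatus:
    segment: Callable[[str], List[str]]
    classify: Callable[[str, Set[str]], str]
    word_list: Set[str]

    def find_new_words(self, text: str) -> List[str]:
        # New-word processing module: segment, then compare with the preset list.
        return [w for w in dict.fromkeys(self.segment(text)) if w not in self.word_list]

    def build_test_list(self, new_words: List[str]) -> Set[str]:
        # Adding module: preset list plus the new words.
        return self.word_list | set(new_words)

    def classify_test_texts(self, texts: List[str], test_list: Set[str]):
        # Test-text classification module with its first and second units.
        first = [self.classify(t, self.word_list) for t in texts]
        second = [self.classify(t, test_list) for t in texts]
        return first, second

    def determine_target_new_words(self, first, second, gold, new_words):
        # Target-new-word determining module: compare the two accuracies.
        def acc(preds):
            return sum(p == g for p, g in zip(preds, gold)) / len(gold)
        return list(new_words) if acc(second) > acc(first) else []

    def classify_target_text(self, text: str, target_words: List[str]) -> str:
        # Target-text classification module: update the list, then classify.
        self.word_list = self.word_list | set(target_words)
        return self.classify(text, self.word_list)

apparatus = TextProcessingApparatus(
    segment=lambda t: t.lower().split(), classify=toy_classify, word_list={"refund"})
target_text = "app crashes daily"
test_texts, gold = ["it crashes often", "refund please"], ["bug", "billing"]

new_words = apparatus.find_new_words(target_text)
test_list = apparatus.build_test_list(new_words)
first, second = apparatus.classify_test_texts(test_texts, test_list)
targets = apparatus.determine_target_new_words(first, second, gold, new_words)
label = apparatus.classify_target_text(target_text, targets)
```

Keeping the segmentation policy and the classifier as injected callables mirrors the claim's wording, in which the same preset classification algorithm is reused with different word lists.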
11. A non-transitory computer readable storage medium, having computer executable instructions stored therein, and when these executable instructions run in a terminal, the terminal executing a text information processing method, comprising:
performing word segmentation on a target text according to a preset fixed word segmentation policy, to obtain a word segmentation result;
comparing the word segmentation result with a preset word segmentation list, to obtain a word segmentation result, which is not in the preset word segmentation list, as a new word;
adding the new word to the preset word segmentation list, to obtain a test word segmentation list;
classifying a test text according to the preset word segmentation list, to obtain a first text, and classifying the test text according to the test word segmentation list, to obtain a second text;
calculating classification accuracy of the first text and classification accuracy of the second text;
comparing the classification accuracy of the first text with the classification accuracy of the second text, and determining a target new word from the new word according to a comparison result;
adding the target new word to the preset word segmentation list, to obtain a target preset word segmentation list; and
classifying the target text according to the target preset word segmentation list,
wherein the classifying a test text according to the preset word segmentation list, to obtain a first text, and classifying the test text according to the test word segmentation list, to obtain a second text comprises:
classifying the test text according to a preset classification algorithm, to obtain the first text, wherein the preset classification algorithm is associated with the preset word segmentation list; and
classifying the test text according to the preset classification algorithm, to obtain the second text, wherein the preset classification algorithm is associated with the test word segmentation list; and
the classifying the target text according to the target preset word segmentation list comprises:
calibrating the preset classification algorithm according to the target preset word segmentation list, and classifying the target text according to the calibrated preset classification algorithm.
Dependent claims: 12, 13, 14, 15.
Specification