Semiotic class normalization
First Claim
1. A system, comprising:
a data processing apparatus; and
a non-transitory computer readable storage medium in data communication with the data processing apparatus storing instructions executable by the data processing apparatus and that upon such execution cause the data processing apparatus to perform operations comprising:
building a semiotic class text normalization system, the building comprising:
identifying multiple possible verbalizations for a string, wherein the string includes one or more instances of members of one or more semiotic classes;
generating, for each possible verbalization for the string, a verbalization score according to a scoring function, wherein:
the scoring function comprises a scoring model that is trained using written expressions of instances of members of semiotic classes and corresponding spoken words for each written expression; and
the written expressions of instances of members of semiotic classes are generated from the spoken words by providing the spoken words as inputs to an inverse of a verbalization transducer; and
selecting one of the possible verbalizations as a selected verbalization for the string based on the respective verbalization scores.
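As an illustration of the identify, score, and select steps recited above, the following is a minimal sketch and not the claimed implementation. The candidate table, the toy training pairs, and the smoothed relative-frequency scorer are all assumptions chosen for brevity; the claim does not specify the form of the scoring model.

```python
import math
from collections import Counter

# Hypothetical candidate verbalizations for the written string "1/2".
CANDIDATES = {
    "1/2": ["one half", "january second", "one over two"],
}

# Toy (written expression, spoken words) pairs; illustrative data only.
TRAINING_PAIRS = [
    ("1/2", "one half"),
    ("1/2", "one half"),
    ("1/2", "january second"),
    ("3/4", "three quarters"),
]


def train_scoring_model(pairs):
    """Count how often each spoken form accompanies each written form."""
    counts = {}
    for written, spoken in pairs:
        counts.setdefault(written, Counter())[spoken] += 1
    return counts


def verbalization_score(model, written, spoken):
    """Log relative frequency, with add-one smoothing over the candidates."""
    seen = model.get(written, Counter())
    n_candidates = len(CANDIDATES.get(written, [spoken]))
    return math.log((seen[spoken] + 1) / (sum(seen.values()) + n_candidates))


def select_verbalization(model, written):
    """Score every candidate and return the best one with all scores."""
    scores = {
        candidate: verbalization_score(model, written, candidate)
        for candidate in CANDIDATES[written]
    }
    return max(scores, key=scores.get), scores


model = train_scoring_model(TRAINING_PAIRS)
best, scores = select_verbalization(model, "1/2")
```

With this toy data, "one half" receives the highest score because it is the spoken form most often paired with "1/2" in the training pairs. A practical scoring model would likely also condition on the surrounding context of the string.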
Abstract
A language processing system for text normalization of an input string of a semiotic class. In an aspect, a method includes receiving an input string; accessing, for a semiotic class of non-standard words, a language universal covering grammar for a plurality of languages that generates, for each language of the plurality of languages, one or more sequences of word-level components for each instance of the semiotic class in the language; for each of the plurality of languages, accessing a lexical map that is specific to the language and that maps each sequence of word-level components for each instance of the semiotic class in the language to verbalizations in the language; generating, from the language universal covering grammar and the lexical maps, a lattice of possible verbalizations of the input string; and selecting one of the possible verbalizations as a selected verbalization for the input string.
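The interplay between a covering grammar and per-language lexical maps can be sketched as follows. This is a simplified illustration under stated assumptions: `covering_grammar`, `LEXICAL_MAPS`, and `verbalization_lattice` are hypothetical names, the grammar handles only numbers below 100, the "lattice" is flattened to a list of word sequences, and connectives such as the German "und" are omitted.

```python
def covering_grammar(number):
    """Hypothetical covering grammar for numbers below 100.

    It deliberately overgenerates: both tens-then-units and
    units-then-tens orderings are produced, so that one grammar can
    cover languages with either word order.
    """
    tens, units = divmod(number, 10)
    tens *= 10
    if tens == 0 or units == 0:
        return [[number]]
    return [[tens, units], [units, tens]]


# Hypothetical lexical maps: word-level component -> word, per language.
LEXICAL_MAPS = {
    "en": {20: "twenty", 3: "three"},
    "de": {20: "zwanzig", 3: "drei"},
}


def verbalization_lattice(number, language):
    """Return every component sequence the language's lexicon can spell out."""
    lexicon = LEXICAL_MAPS[language]
    paths = []
    for components in covering_grammar(number):
        if all(component in lexicon for component in components):
            paths.append([lexicon[component] for component in components])
    return paths


english_paths = verbalization_lattice(23, "en")
german_paths = verbalization_lattice(23, "de")
```

Both orderings survive for each language here, which is why a separate selection step is still needed to pick, for example, "twenty three" in English and the units-first ordering in German.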
15 Claims
4. The system of claim 1, wherein the written expressions of instances of members of semiotic classes are collected from a corpus of documents.
6. A non-transitory computer readable storage medium storing instructions executable by a data processing apparatus and that upon such execution cause the data processing apparatus to perform operations comprising:
building a semiotic class text normalization system, the building comprising:
identifying multiple possible verbalizations for a string, wherein the string includes one or more instances of members of one or more semiotic classes;
generating, for each possible verbalization for the string, a verbalization score according to a scoring function, wherein:
the scoring function comprises a scoring model that is trained using written expressions of instances of members of semiotic classes and corresponding spoken words for each written expression; and
the written expressions of instances of members of semiotic classes are generated from the spoken words by providing the spoken words as inputs to an inverse of a verbalization transducer; and
selecting one of the possible verbalizations as a selected verbalization for the string based on the respective verbalization scores.
11. A computer implemented method, comprising:
building a semiotic class text normalization system, the building comprising:
identifying multiple possible verbalizations for a string, wherein the string includes one or more instances of members of one or more semiotic classes;
generating, for each possible verbalization for the string, a verbalization score according to a scoring function, wherein:
the scoring function comprises a scoring model that is trained using written expressions of instances of members of semiotic classes and corresponding spoken words for each written expression; and
the written expressions of instances of members of semiotic classes are generated from the spoken words by providing the spoken words as inputs to an inverse of a verbalization transducer; and
selecting one of the possible verbalizations as a selected verbalization for the string based on the respective verbalization scores.
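The training-data step recited in the independent claims, in which written expressions are generated by passing spoken words through an inverse of a verbalization transducer, can be illustrated with a plain dictionary standing in for the transducer. This is a sketch under that assumption; `VERBALIZER`, `invert`, and `make_training_pairs` are hypothetical names, and a real system would more plausibly use weighted finite-state transducers.

```python
from collections import defaultdict

# Hypothetical verbalization relation: written form -> possible spoken forms.
VERBALIZER = {
    "1/2": {"one half", "january second"},
    "2": {"two"},
    "$2": {"two dollars"},
}


def invert(relation):
    """Swap input and output sides: spoken form -> possible written forms."""
    inverse = defaultdict(set)
    for written, spoken_forms in relation.items():
        for spoken in spoken_forms:
            inverse[spoken].add(written)
    return dict(inverse)


def make_training_pairs(spoken_utterances, inverse):
    """Pair each spoken utterance with every written form that maps to it."""
    pairs = []
    for spoken in spoken_utterances:
        # Utterances the inverse relation does not cover are skipped.
        for written in sorted(inverse.get(spoken, ())):
            pairs.append((written, spoken))
    return pairs


inverse_verbalizer = invert(VERBALIZER)
pairs = make_training_pairs(
    ["two dollars", "one half", "good morning"], inverse_verbalizer
)
```

The resulting (written, spoken) pairs are the kind of data on which the scoring model of the claims could be trained. The utterance "good morning" yields no pair because the toy relation has no written semiotic-class expression for it.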
Specification