System and method for text normalization using atomic tokens

US 10,388,270 B2
Filed: 11/05/2014
Issued: 08/20/2019
Est. Priority Date: 11/05/2014
Status: Active Grant

Abstract

A system, method and computer-readable storage devices are for normalizing text for ASR and TTS in a language-neutral way. The system described herein divides Unicode text into meaningful chunks called “atomic tokens.” The atomic tokens strongly correlate to their actual pronunciation, and not to their meaning. The system combines the tokenization with a data-driven classification scheme, followed by class-determined actions to convert text to normalized form. The classification labels are based on pronunciation, unlike alternative approaches that typically employ Named Entity-based categories. Thus, this approach is relatively simple to adapt to new languages. Non-experts can easily annotate training data because the tokens are based on pronunciation alone.
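
The chunking into atomic tokens described in the abstract can be illustrated with a minimal Python sketch. It assumes that a token is a maximal run of letters, a maximal run of decimal digits, or a single punctuation character, and that whitespace only separates tokens. The names `char_class` and `atomic_tokens` are hypothetical and do not come from the patent; the use of Unicode general categories is likewise an assumption about how "Unicode text" might be handled.

```python
import unicodedata


def char_class(ch):
    """Map one character to a coarse class via its Unicode general category."""
    cat = unicodedata.category(ch)
    if cat.startswith("L") or cat.startswith("M"):
        return "letters"   # letters and combining marks
    if cat == "Nd":
        return "digits"    # decimal digits in any script
    if ch.isspace():
        return "space"
    return "punct"         # assumption: everything else counts as punctuation


def atomic_tokens(text):
    """Split text into (token, class) pairs.

    Runs of letters or digits are merged; each punctuation character
    stands alone; whitespace is dropped but still ends a run.
    """
    tokens = []
    prev = None
    for ch in text:
        cls = char_class(ch)
        if cls in ("letters", "digits") and cls == prev:
            tokens[-1] = (tokens[-1][0] + ch, cls)
        elif cls != "space":
            tokens.append((ch, cls))
        prev = cls
    return tokens


print(atomic_tokens("Call 555-1234 at 10am!"))
```

Under these assumptions, a string such as `10am` yields two tokens (`10` and `am`), which matches the idea that tokens follow pronunciation rather than meaning.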

37 Citations

20 Claims

1. A method comprising:
- receiving a text corpus;
- tokenizing, via a tokenization module on a computing device, the text corpus into application tokens, each application token of the application tokens comprising one of a sequence of letters, a sequence of digits, and punctuation, wherein the tokenization module is trained on training data generated by a feature extraction module that extracts morphological and lexical text features from a training data token and from an n-left token or an n-right token associated with the training data token;
- comparing the application tokens to a language-independent pattern list that comprises number patterns, to yield a token comparison;
- identifying text-to-speech pronunciation guidelines associated with each application token in the application tokens, wherein the text-to-speech pronunciation guidelines comprise at least one of reorder, asword, and split; and
- generating, via a text-to-speech computer system and an output device, audible speech from the application tokens in the text corpus using the token comparison and the text-to-speech pronunciation guidelines.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The method of claim 1, wherein the text-to-speech pronunciation guidelines further comprise at least one of spell, expand, and digits.
  - 3. The method of claim 1, wherein the audible speech is further generated for a given application token based on one of N tokens to a left context and N tokens to a right context of the given application token.
  - 4. The method of claim 1, wherein the generating of the audible speech further comprises generating the text-to-speech pronunciation guidelines for at least one of the application tokens.
  - 5. The method of claim 1, wherein the generating of the audible speech further comprises instructing a text-to-speech module how to pronounce at least one of the application tokens.
  - 6. The method of claim 1, wherein the text corpus is Unicode encoded.
  - 7. The method of claim 1, further comprising normalizing the text corpus prior to generation of the audible speech, wherein the normalization comprises:
    - classifying the application tokens into classes; and
    - modifying the text corpus using class-determined actions corresponding to the classes.
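
Claims 1 and 3 refer to features drawn from a token and from its n-left and n-right neighbors. The sketch below shows one way such a context window might be built in Python. The specific features (lowercased form, length, character-type flags, a two-character suffix) are illustrative assumptions standing in for "morphological and lexical text features"; the patent text here does not enumerate them. The function names and the padding convention are also hypothetical.

```python
def token_features(token):
    """Illustrative per-token features (assumed, not taken from the claims)."""
    return {
        "lower": token.lower(),        # lexical identity
        "length": len(token),
        "is_alpha": token.isalpha(),
        "is_digit": token.isdigit(),
        "is_upper": token.isupper(),
        "suffix2": token[-2:].lower(), # crude morphological cue
    }


def context_features(tokens, i, n=1):
    """Features for tokens[i] plus up to n neighbors on each side."""
    feats = {f"cur.{k}": v for k, v in token_features(tokens[i]).items()}
    for d in range(1, n + 1):
        for side, j in (("left", i - d), ("right", i + d)):
            if 0 <= j < len(tokens):
                for k, v in token_features(tokens[j]).items():
                    feats[f"{side}{d}.{k}"] = v
            else:
                # Mark positions that fall outside the token sequence.
                feats[f"{side}{d}.pad"] = True
    return feats


tokens = ["Dr", ".", "Smith"]
print(context_features(tokens, 1, n=1))
```

A feature dictionary of this shape could be fed to any standard classifier; which learner the patent's modules actually use is not stated in this section.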

8. A system comprising:
- a processor configured to perform text-to-speech generation; and
- a computer-readable storage medium having instructions stored which, when executed by the processor, cause the processor to perform operations comprising:
  - receiving a text corpus;
  - tokenizing, via a tokenization module, the text corpus into application tokens, each application token of the application tokens comprising one of a sequence of letters, a sequence of digits, and punctuation, wherein the tokenization module is trained on training data generated by a feature extraction module that extracts morphological and lexical text features from a training data token and from an n-left token or an n-right token associated with the training data token;
  - comparing the application tokens to a language-independent pattern list that comprises number patterns, to yield a token comparison;
  - identifying text-to-speech pronunciation guidelines associated with each application token in the application tokens, wherein the text-to-speech pronunciation guidelines comprise at least one of reorder, asword, and split; and
  - generating audible speech from the application tokens in the text corpus using the token comparison and the text-to-speech pronunciation guidelines.
- Dependent claims (9, 10, 11, 12, 13, 14):
  - 9. The system of claim 8, wherein the text-to-speech pronunciation guidelines further comprise at least one of spell, expand, and digits.
  - 10. The system of claim 8, wherein the audible speech is further generated for a given application token based on one of N tokens to a left context and N tokens to a right context of the given application token.
  - 11. The system of claim 8, wherein the generating of the audible speech further comprises generating text-to-speech pronunciation guidelines for at least one of the application tokens.
  - 12. The system of claim 8, wherein the generating of the audible speech further comprises instructing a text-to-speech module how to pronounce at least one of the application tokens.
  - 13. The system of claim 8, wherein the text corpus is Unicode encoded.
  - 14. The system of claim 8, the computer-readable storage medium having additional instructions stored which, when executed by the processor, cause the processor to perform operations comprising normalizing the text corpus prior to generation of the audible speech, wherein the normalization comprises:
    - classifying the application tokens into classes; and
    - modifying the text corpus using class-determined actions corresponding to the classes.
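
The step of comparing application tokens to a language-independent pattern list of number patterns might look like the following Python sketch. The pattern names and the patterns themselves (`decimal`, `time`, `range_or_phone`, `integer`) are hypothetical examples; the claims state only that the list comprises number patterns. Tokens are assumed to be already split into runs of digits, runs of letters, and single punctuation characters.

```python
# Hypothetical pattern list: "D" stands for any run of decimal digits,
# every other entry must match the token text exactly.
PATTERNS = {
    "decimal": ["D", ".", "D"],
    "time": ["D", ":", "D"],
    "range_or_phone": ["D", "-", "D"],
    "integer": ["D"],  # listed last so longer patterns win
}


def shape(token):
    """Reduce a token to 'D' if it is all decimal digits, else keep it."""
    return "D" if token.isdecimal() else token


def match_patterns(tokens):
    """Scan left to right and return (pattern_name, matched_tokens) pairs."""
    shapes = [shape(t) for t in tokens]
    matches = []
    i = 0
    while i < len(tokens):
        for name, pat in PATTERNS.items():
            if shapes[i:i + len(pat)] == pat:
                matches.append((name, tokens[i:i + len(pat)]))
                i += len(pat)
                break
        else:
            i += 1  # no pattern starts here
    return matches


print(match_patterns(["at", "10", ":", "30", "pay", "3", ".", "50", "or", "7"]))
```

Because the patterns refer only to digit runs and punctuation, nothing in this sketch depends on the words of a particular language, which is one plausible reading of "language-independent".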

15. A computer-readable storage device having instructions stored which, when executed by a computing device configured to perform text-to-speech generation, cause the computing device to perform operations comprising:
- receiving a text corpus;
- tokenizing, via a tokenization module, the text corpus into application tokens, each application token of the application tokens comprising one of a sequence of letters, a sequence of digits, and punctuation, wherein the tokenization module is trained on training data generated by a feature extraction module that extracts morphological and lexical text features from a training data token and from an n-left token or an n-right token associated with the training data token;
- comparing the application tokens to a language-independent pattern list that comprises number patterns, to yield a token comparison;
- identifying text-to-speech pronunciation guidelines associated with each application token in the application tokens, wherein the text-to-speech pronunciation guidelines comprise at least one of reorder, asword, and split; and
- generating audible speech from the application tokens in the text corpus using the token comparison and the text-to-speech pronunciation guidelines.
- Dependent claims (16, 17, 18, 19, 20):
  - 16. The computer-readable storage device of claim 15, wherein the text-to-speech pronunciation guidelines further comprise at least one of spell, expand, and digits.
  - 17. The computer-readable storage device of claim 15, wherein the audible speech is further generated for a given application token based on one of N tokens to a left context and N tokens to a right context of the given application token.
  - 18. The computer-readable storage device of claim 15, wherein the generating of the audible speech further comprises generating text-to-speech pronunciation guidelines for at least one of the application tokens.
  - 19. The computer-readable storage device of claim 15, wherein the generating of the audible speech further comprises instructing a text-to-speech module how to pronounce at least one of the application tokens.
  - 20. The computer-readable storage device of claim 15, having additional instructions stored which, when executed by the computing device, cause the computing device to perform operations comprising normalizing the text corpus prior to generation of the audible speech, wherein the normalization comprises:
    - classifying the application tokens into classes; and
    - modifying the text corpus using class-determined actions corresponding to the classes.
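
The normalization in claims 7, 14, and 20 (classify tokens, then apply class-determined actions) can be sketched in Python as a dispatch on the class label. The label names `asword`, `spell`, `digits`, and `expand` come from the claims, but the behavior attached to each one below is an assumed interpretation, since this section does not define them. The `reorder` and `split` labels are left out for the same reason. The English digit names and the `EXPANSIONS` table are hypothetical, language-specific stand-ins.

```python
# Hypothetical language-specific resources (English, for illustration only).
DIGIT_NAMES = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]
EXPANSIONS = {"dr": "doctor", "st": "street"}


def apply_action(token, label):
    """Apply an assumed class-determined action to one token."""
    if label == "asword":
        return token                       # read as an ordinary word
    if label == "spell":
        return " ".join(token.upper())     # read letter by letter
    if label == "digits":
        return " ".join(DIGIT_NAMES[int(d)] for d in token)
    if label == "expand":
        return EXPANSIONS.get(token.lower(), token)
    raise ValueError(f"no action sketched for label {label!r}")


def normalize(labeled_tokens):
    """Join the normalized forms of (token, label) pairs into one string."""
    return " ".join(apply_action(t, lab) for t, lab in labeled_tokens)


print(normalize([("Dr", "expand"), ("Smith", "asword"),
                 ("IBM", "spell"), ("911", "digits")]))
```

In a full pipeline the labels would come from the trained classifier rather than being supplied by hand as they are here.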

Current Assignee: AT&T Intellectual Property I LP (AT&T, Inc.)
Original Assignee: AT&T Intellectual Property I LP (AT&T, Inc.)
Inventors: Golipour, Ladan; Conkie, Alistair D.
Primary Examiner(s): Jackson, Jakieda R
Application Number: US14/533,589
Publication Number: US 20160125872A1
Time in Patent Office: 1,749 Days
Field of Search: 704260
US Class Current:
CPC Class Codes: G10L 13/10 Prosody rules derived from ...
