System and method for transcription of spoken words using multilingual mismatched crowd unfamiliar with a spoken language
First Claim
1. A computer implemented method for transcribing one or more spoken word utterances of a source language using a multilingual mismatched crowd unfamiliar with the source language, the method comprises:
collecting, at a word transcription table, a plurality of multi-scripted noisy transcriptions of the spoken word obtained from a plurality of workers of the multilingual mismatched crowd, wherein the word transcription table is configured to store transcription responses of the spoken word segments presented to the plurality of workers, audio chunk id, each of the plurality of workers id, and the plurality of workers transcription text;
mapping each of the collected plurality of multi-scripted transcriptions to a phoneme sequence in the source language using script specific graphemes to phoneme model;
building worker specific insertion-deletion-substitution (IDS) channel model, multi-scripted transcription script specific IDS channel model and a global IDS channel model from the multi-scripted transcriptions;
filtering out a set of workers of the plurality of workers based on the reputation of the workers, estimated by simulating IDS channel for worker specific on dictionary words using worker reputation module;
allocating the transcription tasks to the set of workers such that required number of transcriptions per word are minimized; and
decoding, at a transcription decoding module, the plurality of multi-scripted transcriptions are combined to decode the transcription in source script, wherein the decoding comprises steps of;
finding likelihood probability of the mapped phoneme sequences of the multi-scripted mismatched crowd transcriptions with each of the dictionary words phoneme sequence using insertion-deletion-substitution channel parameters and voting the dictionary word that maximizes above likelihood; and
determining word belief by taking ratio of the likelihood probability of the mapped phoneme sequences of transcriptions given current estimate of word to sum of the likelihood probabilities of mapped phoneme sequences of the transcriptions given the phoneme sequence of each dictionary word.
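The decoding steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the channel here is a simplified IDS model with one shared set of parameters (`p_ins`, `p_del`, `p_sub`, `alphabet_size`), all of which are assumed values, and the function names `ids_likelihood` and `decode_word` are hypothetical. The sketch also assumes that workers' transcriptions are independent given the word, so their likelihoods multiply.

```python
from math import prod


def ids_likelihood(observed, reference, p_ins=0.05, p_del=0.05,
                   p_sub=0.1, alphabet_size=40):
    """P(observed | reference) under a simplified IDS channel (assumed model).

    Before each reference phoneme the channel inserts a uniformly random
    phoneme (p_ins), deletes the reference phoneme (p_del), or transmits it
    (1 - p_ins - p_del), substituting it with probability p_sub.
    """
    n, m = len(reference), len(observed)
    p_keep = 1.0 - p_ins - p_del
    ins = p_ins / alphabet_size
    # f[i][j]: probability of consuming i reference phonemes
    # while emitting the first j observed phonemes.
    f = [[0.0] * (m + 1) for _ in range(n + 1)]
    f[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            total = 0.0
            if j > 0:                      # insertion
                total += f[i][j - 1] * ins
            if i > 0:                      # deletion
                total += f[i - 1][j] * p_del
            if i > 0 and j > 0:            # match or substitution
                if reference[i - 1] == observed[j - 1]:
                    emit = 1.0 - p_sub
                else:
                    emit = p_sub / (alphabet_size - 1)
                total += f[i - 1][j - 1] * p_keep * emit
            f[i][j] = total
    # Stop (rather than insert again) once the reference is consumed.
    return f[n][m] * (1.0 - p_ins)


def decode_word(transcriptions, dictionary, **channel):
    """Return (best_word, belief) for one spoken word.

    transcriptions: phoneme sequences mapped from the workers' responses.
    dictionary: maps each candidate word to its phoneme sequence.
    """
    scores = {
        word: prod(ids_likelihood(t, phonemes, **channel)
                   for t in transcriptions)
        for word, phonemes in dictionary.items()
    }
    best = max(scores, key=scores.get)
    # Word belief: likelihood of the chosen word over the sum across
    # all dictionary words.
    belief = scores[best] / sum(scores.values())
    return best, belief
```

For a toy dictionary `{"cat": ("k", "ae", "t"), "dog": ("d", "ao", "g")}` and two noisy worker transcriptions of "cat", `decode_word` selects the word with the highest combined likelihood and reports a belief between 0 and 1. A belief close to 1 suggests that further transcriptions of that word are unlikely to change the decision, which is one way to read the goal of minimizing transcriptions per word.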
Abstract
The disclosure relates generally to transcription of spoken words, and more particularly to a system and method for transcribing spoken words using a multilingual mismatched crowd. The process collects multi-scripted noisy transcriptions of a spoken word from workers of the multilingual mismatched crowd who are unfamiliar with the spoken language. The collected transcriptions are mapped to phoneme sequences in the source language using a script-specific grapheme-to-phoneme model. The process then builds a script-specific, a worker-specific, and a global insertion-deletion-substitution (IDS) channel model from the multi-scripted transcriptions. The disclosure also determines the reputation of workers in order to allocate the transcription tasks. Reputation is determined from word belief. The word belief is the ratio of the likelihood probability of the mapped phoneme sequences of the transcriptions, given the current estimate of the word, to the sum of the likelihood probabilities of the mapped phoneme sequences of the transcriptions given the phoneme sequence of each dictionary word.
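The word belief described in the abstract can be written compactly as follows. The notation is an assumption introduced here for illustration and does not appear in the source: $Y$ denotes the mapped phoneme sequences of the collected transcriptions, $\hat{w}$ the current estimate of the word, $\mathcal{D}$ the dictionary, and $\phi(w)$ the phoneme sequence of dictionary word $w$.

```latex
\[
  b(\hat{w}) \;=\;
  \frac{P\bigl(Y \mid \phi(\hat{w})\bigr)}
       {\sum_{w \in \mathcal{D}} P\bigl(Y \mid \phi(w)\bigr)}
\]
```

Each likelihood $P(Y \mid \phi(w))$ is evaluated using the IDS channel parameters.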
9 Citations
15 Claims
11. A system for transcribing one or more spoken word utterances of a source language using a multilingual mismatched crowd unfamiliar with the source language, the system comprising:
a processor;
a memory communicatively coupled to the processor and the memory contains instructions that are readable by the processor;
a database parted within the memory, wherein the database comprises an audio chunk table and a word transcription table, wherein the audio chunk table is configured to store one or more information of a plurality of workers, one or more spoken word segments of each of the plurality of workers, number of responses given by the each of the plurality of workers and transcription score of the each of the plurality of workers, and further wherein, the word transcription table is configured to store transcription responses of the spoken word segments presented to the plurality of workers, audio chunk id, each of the plurality of workers id, and the plurality of workers transcription text;
a plurality of typing interfaces are configured according to script preference of the each of the plurality of workers of the multilingual mismatched crowd;
a reputation module is configured to compute the worker reputation and filter out spammer from the plurality of workers;
a task allocation module is configured to compute word beliefs and to allocate the transcription tasks to the plurality of workers, wherein the reputation of the plurality of workers is estimated by simulating the worker specific insertion-deletion-substitution (IDS) channel on dictionary words; and
a transcription decoding module is configured to generate transcription in the source language from multi-scripted transcriptions of the plurality of workers.
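The two tables recited in claim 11 can be pictured with a small Python sketch. The class names, field names, and types below are hypothetical and chosen only to mirror the claim language; the source does not specify a schema. The helper `group_by_chunk` is likewise an assumed convenience that gathers the responses for each audio chunk, as a decoding module would need.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class AudioChunkRecord:
    """One row of the audio chunk table (field names are assumptions)."""
    worker_id: str                                   # worker information
    segment_ids: list = field(default_factory=list)  # segments presented
    num_responses: int = 0                           # responses given
    transcription_score: float = 0.0                 # worker's score


@dataclass(frozen=True)
class WordTranscriptionRecord:
    """One row of the word transcription table (field names are assumptions)."""
    audio_chunk_id: str
    worker_id: str
    transcription_text: str  # the worker's response, in the worker's script


def group_by_chunk(rows):
    """Collect transcription texts per audio chunk id, in input order."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row.audio_chunk_id].append(row.transcription_text)
    return dict(grouped)
```

Grouping by `audio_chunk_id` yields, for each spoken word segment, the list of multi-scripted responses that the transcription decoding module would combine.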
Specification