System and method for real-time detection and preservation of speech onset in a signal
Abstract
A “speech onset detector” provides a variable length frame buffer in combination with either a variable transmission rate or temporal speech compression for buffered signal frames. The variable length buffer holds frames that are not clearly identified as either speech or non-speech frames during an initial analysis. Buffering of signal frames continues until a current frame is identified as either speech or non-speech. If the current frame is identified as non-speech, the buffered frames are encoded as non-speech frames. However, if the current frame is identified as a speech frame, the buffered frames are searched for the actual onset point of the speech. Once that onset point is identified, the signal is either transmitted in a burst, or a time-scale modification is applied to the buffered signal to compress the buffered frames, beginning with the frame in which the onset point is detected. The compressed frames are then encoded as one or more speech frames.
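The buffering behavior described in the abstract can be sketched as a small loop. This is a minimal illustration, not the patented implementation: the energy-based classifier, its thresholds, and the names `FrameType`, `classify_by_energy`, and `encode_stream` are all hypothetical, and the onset search and time-scale compression steps are omitted, so buffered frames are simply labeled with the type of the frame that resolves them.

```python
from enum import Enum


class FrameType(Enum):
    SPEECH = "speech"
    NON_SPEECH = "non_speech"
    UNKNOWN = "unknown"


def classify_by_energy(frame, low=0.01, high=0.1):
    """Hypothetical classifier: mean-square energy with two thresholds.

    Frames between the thresholds are reported as UNKNOWN. The source
    does not specify the classifier; this is an assumption.
    """
    energy = sum(x * x for x in frame) / max(len(frame), 1)
    if energy >= high:
        return FrameType.SPEECH
    if energy <= low:
        return FrameType.NON_SPEECH
    return FrameType.UNKNOWN


def encode_stream(frames, classify=classify_by_energy):
    """Label frames in input order, deferring UNKNOWN frames.

    Returns (encoded, pending): `encoded` is a list of (FrameType, frame)
    pairs standing in for encoder output, and `pending` holds frames
    still undecided when the input ended.
    """
    encoded = []
    pending = []  # variable-length buffer of undecided frames
    for frame in frames:
        kind = classify(frame)
        if kind is FrameType.UNKNOWN:
            pending.append(frame)
            continue
        # A decided frame resolves everything buffered before it.
        for buffered in pending:
            encoded.append((kind, buffered))
        pending.clear()
        encoded.append((kind, frame))
    return encoded, pending
```

In this sketch a run of undecided frames followed by a speech frame is emitted entirely as speech, which preserves a quiet onset at the cost of delay; frames still undecided at the end of the input are returned separately because the text does not say how they should be handled.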
36 Claims
1. A system for encoding an audio signal, comprising:
analyzing sequential segments of at least one digital audio signal to determine segment type as one of speech type segments, non-speech type segments, and unknown type segments;
encoding each speech segment as one or more signal frames using a speech segment-specific encoder;
encoding each non-speech frame as one or more signal frames using a non-speech segment-specific encoder;
buffering each sequential unknown type segment in a segment buffer until analysis of a subsequent segment identifies the subsequent segment type as any of a speech segment and a silence segment; and
encoding the buffered segments and the subsequent segment as one or more signal frames using the segment-specific encoder corresponding to the type of the subsequent segment.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 18, 19, 20
15. A system for encoding speech onset in a signal, comprising:
continuously analyzing and encoding sequential frames of at least one digital audio signal while analysis of the sequential frames indicates that the sequential frames is of a frame type including any of a speech type signal frame and a non-speech type signal frame;
continuously analyzing and buffering sequential frames of the at least one digital audio signal while analysis of each sequential frame is unable to determine whether each sequential frame is of a frame type including any of the speech type signal frame and the non-speech type signal frame;
automatically identifying at least one of the buffered sequential frames as having the same type as a current sequential frame when analysis of the current sequential frame indicates that it is of a frame type including any of the speech type signal frame and the non-speech type signal frame; and
encoding the buffered sequential frames.
Dependent claims: 21, 22, 23
24. A computer-implemented process for encoding at least one frame of a digital audio signal, comprising:
encoding a current frame of the audio signal when it is determined that the current frame of the audio signal includes any of speech and non-speech;
buffering the current frame of the audio signal in a frame buffer when it can not be determined whether the current frame of the audio signal includes any of speech and non-speech;
sequentially analyzing and buffering subsequent frames of the audio signal until analysis of the subsequent frames identifies a frame including any of speech and non-speech;
temporally compressing each buffered frame; and
encoding the temporally compressed frames as one or more signal frames.
Dependent claims: 25, 26, 27, 28, 29, 30
31. A method for capturing speech onset in a digital audio signal, comprising:
sequentially analyzing and encoding chronological frames of a digital audio signal when an analysis of the chronological frames identifies the presence of any of speech and non-speech in the frames of the digital audio signal;
buffering all chronological frames of the digital audio signal when the analysis of the chronological frames is unable to identify a presence of any of speech and non-speech in the frames of the digital audio signal;
identifying at least one of the buffered chronological frames as having a same content type as a current chronological frame of the digital audio signal when the analysis of the current chronological frame identifies the presence of any of speech and non-speech in the digital signal following the buffering of any chronological frames; and
encoding the current chronological frame and at least one of the buffered chronological frames.
Dependent claims: 32, 36
33. The method of claim 32, further comprising searching the buffered chronological frames in the frame buffer, prior to temporally compressing at least one of the buffered chronological frames, for identifying a speech onset point within one of the buffered chronological frames, and wherein said search is initialized using speech identified in the current chronological frame.
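One way to picture an onset search that is initialized from the speech found in the current frame is a backward walk through the buffer. The sketch below is only an assumption about how such a search could work: the energy-ratio criterion, the `ratio` parameter, and the names `frame_energy` and `find_onset_index` are hypothetical, and it locates the onset at frame granularity rather than at a point within a frame.

```python
def frame_energy(frame):
    """Mean-square energy of one frame (a list of samples)."""
    return sum(x * x for x in frame) / max(len(frame), 1)


def find_onset_index(buffered, current, ratio=0.05):
    """Return the index of the earliest buffered frame in the speech run.

    The search starts at the newest buffered frame and walks backward,
    using the energy of the known-speech `current` frame as a reference
    (hypothetical criterion). It stops at the first frame whose energy
    falls below `ratio` times that reference. If no buffered frame
    qualifies, len(buffered) is returned, meaning the onset lies in
    `current` itself.
    """
    threshold = ratio * frame_energy(current)
    onset = len(buffered)
    for i in range(len(buffered) - 1, -1, -1):
        if frame_energy(buffered[i]) < threshold:
            break
        onset = i
    return onset
```

Frames from the returned index onward would then be the candidates for temporal compression or burst transmission, while earlier buffered frames could be treated as non-speech.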
Specification