Systems and methods for automatic creation of soundtracks for speech audio
First Claim
1. A method of automatically generating a digital soundtrack intended for synchronised playback with associated speech audio, the soundtrack comprising one or more audio regions configured for synchronised playback with corresponding speech regions of the speech audio, the method executed by a processing device or devices having associated memory, the method comprising:
(a) receiving or retrieving raw text data representing or corresponding to the speech audio into memory;
(b) applying natural language processing (NLP) to the raw text data to generate processed text data comprising token data that identifies individual tokens in the raw text, the tokens at least identifying distinct words or word concepts;
(c) applying semantic analysis to a series of text segments of the processed text data based on a continuous emotion model defined by a predefined number of emotional category identifiers each representing an emotional category in the model, the semantic analysis being configured to parse the processed text data to generate, for each text segment, a segment emotional data profile based on the continuous emotion model;
(d) identifying a series of text or speech regions comprising a text segment or a plurality of adjacent text segments having an emotional association by processing the segment emotional data profiles of the text segments with respect to predefined rules, and generating audio region data defining the audio regions corresponding to the identified text regions, each audio region being defined by a start position indicative of the position in the text at which the audio region is to commence playback, a stop position indicative of the position in the text at which the audio region is to cease playback, and a generated region emotional profile based on the segment emotional profiles of the text segments within its associated text region;
(e) processing an accessible audio database or databases comprising audio data files and associated audio profile information to select an audio data file for playback in each audio region, the selection at least partly based on the audio profile information of the audio data file corresponding to the region emotional profile of the audio region, and defining audio data for each audio region representing the selected audio data file for playback; and
(f) generating soundtrack data representing the created soundtrack for synchronised playback with the speech audio, the soundtrack data comprising data representing the audio regions and audio data associated with those audio regions.
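Steps (d) through (f) of the claim can be illustrated with a minimal sketch. Here segment emotional data profiles are treated as points in a two-dimensional continuous emotion model (valence, arousal), adjacent segments whose profiles lie close together are merged into one audio region, and the audio file with the nearest profile is attached to each region. The distance threshold, the mock audio database, and all names below are illustrative assumptions, not the patented implementation.

```python
import math

# Hypothetical segment data: (start word index, stop word index, (valence, arousal)).
SEGMENTS = [
    (0, 10, (0.8, 0.5)), (10, 22, (0.7, 0.4)), (22, 30, (-0.6, 0.9)),
]
# Mock audio database: file name -> emotional profile in the same model.
AUDIO_DB = {"bright_theme.ogg": (0.75, 0.45), "dark_strings.ogg": (-0.5, 0.8)}

def dist(p, q):
    """Euclidean distance between two points in the emotion model."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def build_regions(segments, threshold=0.5):
    """Step (d): merge adjacent segments with similar profiles (a predefined rule)."""
    regions = []
    for start, stop, prof in segments:
        if regions and dist(regions[-1]["profile"], prof) < threshold:
            r = regions[-1]
            r["stop"] = stop  # extend the region's stop position
            r["profile"] = tuple((a + b) / 2 for a, b in zip(r["profile"], prof))
        else:
            regions.append({"start": start, "stop": stop, "profile": prof})
    return regions

def attach_audio(regions, db):
    """Step (e): pick the audio file whose profile is nearest to each region's."""
    for r in regions:
        r["audio"] = min(db, key=lambda name: dist(db[name], r["profile"]))
    return regions

soundtrack = attach_audio(build_regions(SEGMENTS), AUDIO_DB)
```

With this data the first two segments merge into one region and the third starts a new one, so the sketch yields two audio regions, each with start/stop positions, a region emotional profile, and a selected audio file.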
2 Assignments
0 Petitions
Abstract
A method of automatically generating a digital soundtrack intended for synchronised playback with associated speech audio, the method executed by a processing device or devices having associated memory. The method comprises syntactically and/or semantically analysing text representing or corresponding to the speech audio at a text segment level to generate an emotional profile for each text segment in the context of a continuous emotion model. The method further comprises generating a soundtrack for the speech audio comprising one or more audio regions that are configured or selected for playback during corresponding speech regions of the speech audio, and wherein the audio configured for playback in the audio regions is based on or a function of the emotional profile of one or more of the text segments within the respective speech regions.
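The abstract's "emotional profile for each text segment in the context of a continuous emotion model" can be sketched as follows. This toy version scores tokens against a tiny hand-made valence/arousal lexicon and averages the hits; a real system would use a trained model or a published affect lexicon, and the lexicon entries and names here are assumptions for illustration only.

```python
from dataclasses import dataclass

# Hypothetical toy lexicon: token -> (valence, arousal), both in [-1, 1].
LEXICON = {
    "happy": (0.9, 0.6),
    "storm": (-0.4, 0.8),
    "calm": (0.3, -0.7),
    "terror": (-0.9, 0.9),
}

@dataclass
class SegmentProfile:
    valence: float  # negative..positive
    arousal: float  # calm..excited

def segment_profile(tokens):
    """Average the lexicon entries of the tokens that have one."""
    hits = [LEXICON[t] for t in tokens if t in LEXICON]
    if not hits:
        return SegmentProfile(0.0, 0.0)  # neutral when nothing matches
    v = sum(h[0] for h in hits) / len(hits)
    a = sum(h[1] for h in hits) / len(hits)
    return SegmentProfile(v, a)

profile = segment_profile("the storm filled her with terror".split())
```

For the sample segment, two tokens match the lexicon, giving a strongly negative, high-arousal profile, which downstream steps would map to correspondingly tense audio.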
37 Citations
34 Claims
1. A method of automatically generating a digital soundtrack intended for synchronised playback with associated speech audio, as recited in full under First Claim above. View Dependent Claims (2-32)
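Step (f) of the claims produces soundtrack data representing the audio regions and their associated audio. A minimal sketch of what such data might look like, serialised as JSON, is shown below; the JSON shape, file names, and field names are assumptions for illustration, not a format defined by the claims.

```python
import json

# Hypothetical audio regions produced by earlier steps: start/stop positions,
# region emotional profile, and the selected audio file.
regions = [
    {"start": 0, "stop": 22, "profile": [0.75, 0.45], "audio": "bright_theme.ogg"},
    {"start": 22, "stop": 30, "profile": [-0.6, 0.9], "audio": "dark_strings.ogg"},
]

# Serialise the soundtrack data for synchronised playback with the speech audio.
soundtrack_data = json.dumps(
    {"speech_audio": "chapter_01.mp3", "regions": regions}, indent=2
)
```

A playback client could parse this file and start or stop each audio region as the speech audio reaches the corresponding positions.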
33. A system comprising a processor or processors configured to implement a method of automatically generating a digital soundtrack intended for synchronised playback with associated speech audio, the soundtrack comprising one or more audio regions configured for synchronised playback with corresponding speech regions of the speech audio, the method comprising:
(a) receiving or retrieving raw text data representing or corresponding to the speech audio into memory;
(b) applying natural language processing (NLP) to the raw text data to generate processed text data comprising token data that identifies individual tokens in the raw text, the tokens at least identifying distinct words or word concepts;
(c) applying semantic analysis to a series of text segments of the processed text data based on a continuous emotion model defined by a predefined number of emotional category identifiers each representing an emotional category in the model, the semantic analysis being configured to parse the processed text data to generate, for each text segment, a segment emotional data profile based on the continuous emotion model;
(d) identifying a series of text or speech regions comprising a text segment or a plurality of adjacent text segments having an emotional association by processing the segment emotional data profiles of the text segments with respect to predefined rules, and generating audio region data defining the audio regions corresponding to the identified text regions, each audio region being defined by a start position indicative of the position in the text at which the audio region is to commence playback, a stop position indicative of the position in the text at which the audio region is to cease playback, and a generated region emotional profile based on the segment emotional profiles of the text segments within its associated text region;
(e) processing an accessible audio database or databases comprising audio data files and associated audio profile information to select an audio data file for playback in each audio region, the selection at least partly based on the audio profile information of the audio data file corresponding to the region emotional profile of the audio region, and defining audio data for each audio region representing the selected audio data file for playback; and
(f) generating soundtrack data representing the created soundtrack for synchronised playback with the speech audio, the soundtrack data comprising data representing the audio regions and audio data associated with those audio regions.
34. A non-transitory computer-readable medium having stored thereon computer readable instructions that, when executed on a processing device or devices, cause the processing device to perform a method of automatically generating a digital soundtrack intended for synchronised playback with associated speech audio, the soundtrack comprising one or more audio regions configured for synchronised playback with corresponding speech regions of the speech audio, the method comprising:
(a) receiving or retrieving raw text data representing or corresponding to the speech audio into memory;
(b) applying natural language processing (NLP) to the raw text data to generate processed text data comprising token data that identifies individual tokens in the raw text, the tokens at least identifying distinct words or word concepts;
(c) applying semantic analysis to a series of text segments of the processed text data based on a continuous emotion model defined by a predefined number of emotional category identifiers each representing an emotional category in the model, the semantic analysis being configured to parse the processed text data to generate, for each text segment, a segment emotional data profile based on the continuous emotion model;
(d) identifying a series of text or speech regions comprising a text segment or a plurality of adjacent text segments having an emotional association by processing the segment emotional data profiles of the text segments with respect to predefined rules, and generating audio region data defining the audio regions corresponding to the identified text regions, each audio region being defined by a start position indicative of the position in the text at which the audio region is to commence playback, a stop position indicative of the position in the text at which the audio region is to cease playback, and a generated region emotional profile based on the segment emotional profiles of the text segments within its associated text region;
(e) processing an accessible audio database or databases comprising audio data files and associated audio profile information to select an audio data file for playback in each audio region, the selection at least partly based on the audio profile information of the audio data file corresponding to the region emotional profile of the audio region, and defining audio data for each audio region representing the selected audio data file for playback; and
(f) generating soundtrack data representing the created soundtrack for synchronised playback with the speech audio, the soundtrack data comprising data representing the audio regions and audio data associated with those audio regions.
Specification