System and Method of Reading Environment Sound Enhancement Based on Image Processing and Semantic Analysis
First Claim
1. A reading environment sound enhancement system based on image processing and semantic analysis, comprising:
an image acquisition device for acquiring a reading image of a user, and
a processing device having an operation unit for respectively controlling a transmission unit, a memory unit and an audio unit, and for performing transmission, storage and audio synthesis, wherein the operation unit comprises:
an image extraction module configured to receive an input signal of the image acquisition device, and to convert the image into an image signal;
a word recognition module configured to process the image signal for clear and easy recognition, to identify at least one word from the image signal, to store the recognized word in a cached text file, and to classify the words in the text file;
a semantic analysis module configured to identify the semantics of the classified word, to extract environmental semantic words and emotional semantic words respectively, and to retrieve an environmental background music or an emotional background music by comparing the environmental semantic words or the emotional semantic words to an element in a background music library; and
an audio synthesis module for audio synthesis and sound enhancement on the basis of background music, comprising:
a time domain recorder for recording at least one reading time node according to a text change in a reading target area of the acquired image, recording at least one emotional time node if the accumulated emotional score value exceeds a preset threshold, each emotional time node corresponding to a position of the emotional word in a text segment, and generating a time domain control bar by integrating the reading time node and the emotional time node; and
a mixer for superimposing audio signals of the background music and the sound effect music in time domain by a saturator having an attenuation factor, by means of the time domain control bar.
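The mixer element can be illustrated with a minimal sketch. The claim does not give the saturator's formula, so everything here is an assumption: the function names `saturate` and `mix`, the use of `tanh` as a soft limiter, the interpretation of the attenuation factor as a gain applied before limiting, and the default value 0.8. Audio is modeled as plain lists of samples in the range -1 to 1.

```python
import math

def saturate(sample, attenuation=0.8):
    """Assumed saturator: scale by the attenuation factor, then soft-limit with tanh."""
    return math.tanh(attenuation * sample)

def mix(background, effect, effect_start, attenuation=0.8):
    """Superimpose an effect onto background music starting at sample index effect_start.

    effect_start stands in for a position read from the time domain control bar.
    Effect samples that would run past the end of the background are dropped.
    """
    mixed = list(background)
    for i, sample in enumerate(effect):
        j = effect_start + i
        if j >= len(mixed):
            break
        mixed[j] += sample
    # The saturator keeps the summed signal strictly inside (-1, 1).
    return [saturate(x, attenuation) for x in mixed]

out = mix([0.5, 0.5, 0.5, 0.5], [0.9, 0.9], effect_start=1)
```

In this sketch the raw sum at the overlapping samples is 1.4, which would clip on a normalized output; the limiter brings it back below 1 without a hard cutoff.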
Abstract
The disclosure relates to a system and a method for arranging background music and sound effects based on image processing and semantic analysis. The method determines the environmental and emotional attributes of the text semantics on a reading page by analyzing and processing an acquired image of that page, and then selects music and sound material according to those semantics to synthesize reading background music and sound effects, thereby achieving sound enhancement in the reading environment. The system includes an image acquisition device for acquiring a reading image of a user, and a processing device having an operation unit for performing the method.
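The selection step in the abstract (semantic words matched against a background music library) can be sketched as a dictionary lookup. This is only an illustration under stated assumptions: the vocabularies, the semantic tags, the track file names, and the functions `extract_semantic_words` and `retrieve_music` are all hypothetical and do not come from the disclosure.

```python
# Hypothetical vocabularies: recognized word -> semantic tag.
ENVIRONMENT_WORDS = {"rain": "rain", "storm": "rain", "forest": "forest", "woods": "forest"}
EMOTION_WORDS = {"afraid": "tense", "terror": "tense", "joy": "happy", "laughed": "happy"}

# Hypothetical background music library: semantic tag -> track file name.
MUSIC_LIBRARY = {
    "rain": "rain_loop.ogg",
    "forest": "forest_ambience.ogg",
    "tense": "low_strings.ogg",
    "happy": "light_piano.ogg",
}

def extract_semantic_words(words):
    """Split recognized words into environmental and emotional semantic words."""
    env = [w for w in words if w in ENVIRONMENT_WORDS]
    emo = [w for w in words if w in EMOTION_WORDS]
    return env, emo

def retrieve_music(semantic_words, vocabulary):
    """Compare semantic words to library elements and return matching tracks in order."""
    tracks = []
    for word in semantic_words:
        track = MUSIC_LIBRARY.get(vocabulary[word])
        if track is not None and track not in tracks:
            tracks.append(track)
    return tracks

# Words as they might arrive from the word recognition step.
words = "the rain fell on the forest and she was afraid".split()
env_words, emo_words = extract_semantic_words(words)
env_tracks = retrieve_music(env_words, ENVIRONMENT_WORDS)
emo_tracks = retrieve_music(emo_words, EMOTION_WORDS)
```

A real system would likely need stemming and a much larger semantic knowledge base; exact string matching is used here only to keep the sketch short.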
13 Claims
1. A reading environment sound enhancement system based on image processing and semantic analysis, comprising:
an image acquisition device for acquiring a reading image of a user, and
a processing device having an operation unit for respectively controlling a transmission unit, a memory unit and an audio unit, and for performing transmission, storage and audio synthesis, wherein the operation unit comprises:
an image extraction module configured to receive an input signal of the image acquisition device, and to convert the image into an image signal;
a word recognition module configured to process the image signal for clear and easy recognition, to identify at least one word from the image signal, to store the recognized word in a cached text file, and to classify the words in the text file;
a semantic analysis module configured to identify the semantics of the classified word, to extract environmental semantic words and emotional semantic words respectively, and to retrieve an environmental background music or an emotional background music by comparing the environmental semantic words or the emotional semantic words to an element in a background music library; and
an audio synthesis module for audio synthesis and sound enhancement on the basis of background music, comprising:
a time domain recorder for recording at least one reading time node according to a text change in a reading target area of the acquired image, recording at least one emotional time node if the accumulated emotional score value exceeds a preset threshold, each emotional time node corresponding to a position of the emotional word in a text segment, and generating a time domain control bar by integrating the reading time node and the emotional time node; and
a mixer for superimposing audio signals of the background music and the sound effect music in time domain by a saturator having an attenuation factor, by means of the time domain control bar.
Dependent claims: 2, 3, 4, 5, 11
6. (canceled)
7. A reading environment sound enhancement method based on image processing and semantic analysis, comprising the steps of:
providing a semantic knowledge base comprising a background semantic set, the background semantic set comprising an environment semantic set and an emotional semantic set, each of the environmental semantic set and the emotional semantic set comprising condition words;
receiving an input signal and converting image information from the input image signal including a screenshot in an electronic device or a page shooting image of a paper book;
processing the image signal for clear and easy recognition, identifying at least one word from the image signal, storing the recognized word in a cached text file, and classifying the word in the text file;
identifying the semantics of the classified word, and extracting environmental semantic words and emotional semantic words respectively;
retrieving an environmental background music and an emotional background music by comparing the environmental semantic words or the emotional semantic words to an element in a background music library;
performing audio synthesis and sound enhancement on the basis of background music;
recording at least one reading time node according to a text change in a reading target area of the acquired image, and recording at least one emotional time node if the accumulated emotional score value exceeds a preset threshold, each emotional time node corresponding to a position of the emotional word in a text segment;
generating a time domain control bar by integrating the reading time node and the emotional time node;
superimposing audio signals of the background music and the sound effect music in time domain by a saturator having an attenuation factor, by means of the time domain control bar; and
playing the synthesized audio by the audio output device.
Dependent claims: 8, 9, 12, 13
10. (canceled)
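The recording and control-bar steps of claim 7 can be sketched as follows. The claims do not say how emotional scores are assigned, whether the accumulated score resets after a node is recorded, or how a word position maps to a time, so this sketch assumes a hypothetical per-word score table, a reset after each emotional node, and a uniform reading rate across the segment. The names `Node`, `emotional_nodes`, and `control_bar` are illustrative only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    time: float     # seconds since reading started
    kind: str       # "reading" or "emotional"
    position: int   # word index in the segment; -1 for reading nodes

def emotional_nodes(words, scores, threshold, start, duration):
    """Record an emotional node whenever the accumulated score exceeds the threshold.

    Assumes the reader moves through the segment at a uniform rate, so word i
    is reached at start + duration * i / len(words).
    """
    nodes = []
    total = 0.0
    for i, word in enumerate(words):
        total += scores.get(word, 0.0)
        if total > threshold:
            nodes.append(Node(start + duration * i / len(words), "emotional", i))
            total = 0.0  # assumed: accumulation restarts after each node
    return nodes

def control_bar(reading_times, emo_nodes):
    """Integrate reading and emotional nodes into one time-ordered control bar."""
    reading = [Node(t, "reading", -1) for t in reading_times]
    return sorted(reading + emo_nodes, key=lambda n: n.time)

segment = ["a", "dark", "storm", "came", "and", "terror", "grew", "slowly"]
scores = {"dark": 0.4, "storm": 0.5, "terror": 0.9}  # hypothetical score table
emo = emotional_nodes(segment, scores, threshold=0.8, start=10.0, duration=8.0)
bar = control_bar([10.0, 18.0], emo)  # text changed at 10 s and again at 18 s
```

Here the two reading time nodes bracket the segment, and the emotional nodes fall at the words where the running score first passes the threshold ("storm" and "terror").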
Specification