System and Method of Reading Environment Sound Enhancement Based on Image Processing and Semantic Analysis

US 20200135158A1
Filed: 06/06/2017
Published: 04/30/2020
Est. Priority Date: 05/02/2017
Status: Active Grant

First Claim

Patent Images

1. A reading environment sound enhancement system based on image processing and semantic analysis, comprising:

an image acquisition device for acquiring a reading image of a user, anda processing device having an operation unit for respectively controlling a transmission unit, a memory unit and an audio unit, and for performing transmission, storage and audio synthesis,wherein the operation unit comprises;

an image extraction module configured to receive an input signal of the image acquisition device, and to convert the image into an image signal;

a word recognition module configured to process the image signal for clear and easy recognition, to identify at least one word from the image signal, to store the recognized word in a cached text file, and to classify the words in the text file;

a semantic analysis module configured to identify the semantics of the classified word, to extract environmental semantic words and emotional semantic words respectively, and to retrieve an environmental background music or an emotional background music by comparing the environmental semantic words or the emotional semantic words to an element in a background music library; and

an audio synthesis module for audio synthesis and sound enhancement on the basis of background music, comprising;

a time domain recorder for recording at least one reading time node according to a text change in a reading target area of the acquired image,recording at least one emotional time node if the accumulated emotional score value exceeds a preset threshold, each emotional time node corresponding to a position of the emotional word in a text segment, and generating a time domain control bar by integrating the reading time node and the emotional time node; and

a mixer for superimposing audio signals of the background music and the sound effect music in time domain by a saturator having an attenuation factor, by means of the time domain control bar.

View all claims

1 Assignment

Timeline View

Assignment View

0 Petitions

Accused Products

Abstract

The disclosure relates to a system and a method for disposing background music and sound effects based on image processing and semantic analysis. The method includes determining the environment and emotional attributes of a text semantics in a reading page by analyzing and processing an acquired reading page image, and selecting music and sound material to synthesize reading background music and sound effects according to the text semantics, so as to achieve sound enhancement in the reading environment. The system includes an image acquisition device for acquiring a reading image of a user, and a processing device having an operation unit for performing the method.

2 Citations

13 Claims

1. A reading environment sound enhancement system based on image processing and semantic analysis, comprising:
- an image acquisition device for acquiring a reading image of a user, anda processing device having an operation unit for respectively controlling a transmission unit, a memory unit and an audio unit, and for performing transmission, storage and audio synthesis,wherein the operation unit comprises;
  
  an image extraction module configured to receive an input signal of the image acquisition device, and to convert the image into an image signal;
  
  a word recognition module configured to process the image signal for clear and easy recognition, to identify at least one word from the image signal, to store the recognized word in a cached text file, and to classify the words in the text file;
  
  a semantic analysis module configured to identify the semantics of the classified word, to extract environmental semantic words and emotional semantic words respectively, and to retrieve an environmental background music or an emotional background music by comparing the environmental semantic words or the emotional semantic words to an element in a background music library; and
  
  an audio synthesis module for audio synthesis and sound enhancement on the basis of background music, comprising;
  
  a time domain recorder for recording at least one reading time node according to a text change in a reading target area of the acquired image,recording at least one emotional time node if the accumulated emotional score value exceeds a preset threshold, each emotional time node corresponding to a position of the emotional word in a text segment, and generating a time domain control bar by integrating the reading time node and the emotional time node; and
  
  a mixer for superimposing audio signals of the background music and the sound effect music in time domain by a saturator having an attenuation factor, by means of the time domain control bar.
- View Dependent Claims (2, 3, 4, 5, 11)
- - 2. The system of claim 1, wherein the image acquisition device comprises a camera and/or a text capturing tool of a smart reading device, and the text capturing tool comprises a screenshot tool, a text memory reading tool or a invoking tool of an application programming interface (API) for a reading application.
  - 3. The system of claim 1, wherein the image acquisition device is attached to an earphone, glasses or a wearable device of the user by at least one accessory.
  - 4. The system of claim 3, further comprising a second operation unit disposed in the image acquisition device, wherein the second operation unit comprises:
    - an image preprocessing module configured to perform calibration of a reading target area on the captured image, and to perform interception, correction, denoising, and binarization processing for the image in the reading target area; and
      
      a transmission module configured to compress and wirelessly transmit the preprocessed image.
  - 5. The system of claim 1, wherein the semantic analysis module comprises:
    - a word slicer configured to invoke a statistical language model to divide at least one word in a text segment, and to calculate a weight value and an emotion score for each divided word;
      
      a topic model solver configured to calculate an optimal solution of an implicit Dirichlet LDA topic model by a random sampling method to represent a classification of each divided word; and
      
      a word feature extractor configured to classify the words in a text string and to distinguish the environmental words and the emotional words in the text string.
  - 11. The system of claim 1, further comprising a second operation unit disposed in the image acquisition device, wherein the second operation unit comprises:
    - an image preprocessing module configured to perform calibration of a reading target area on the captured image, and to perform interception, correction, denoising, and binarization processing for the image in the reading target area; and
      
      a transmission module configured to compress and wirelessly transmit the preprocessed image.

6. (canceled)

7. A reading environment sound enhancement method based on image processing and semantic analysis, comprising the steps of:
- providing a semantic knowledge base comprising a background semantic set, the background semantic set comprising an environment semantic set and an emotional semantic set, each of the environmental semantic set and the emotional semantic set comprising condition words;
  
  receiving an input signal and converting image information from the input image signal including a screenshot in an electronic device or a page shooting image of a paper book;
  
  processing the image signal for clear and easy recognition, identifying at least one word from the image signal, storing the recognized word in a cached text file, and classifying the word in the text file;
  
  identifying the semantics of the classified word, and extracting environmental semantic words and emotional semantic words respectively;
  
  retrieving an environmental background music and an emotional background music by comparing the environmental semantic words or the emotional semantic words to an element in a background music library;
  
  performing audio synthesis and sound enhancement on the basis of background music;
  
  recording at least one reading time node according to a text change in a reading target area of the acquired image, and recording at least one emotional time node if the accumulated emotional score value exceeds a preset threshold, each emotional time node corresponding to a position of the emotional word in a text segment;
  
  generating a time domain control bar by integrating the reading time node and the emotional time node;
  
  superimposing audio signals of the background music and the sound effect music in time domain by a saturator having an attenuation factor, by means of the time domain control bar; and
  
  playing the synthesized audio by the audio output device.
- View Dependent Claims (8, 9, 12, 13)
- - 8. The method of claim 7, wherein:
    - processing the image signal comprises image correction and denoising; and
      
      identifying at least one word comprises text refinement, connected domain digitization, and line segment linearization.
  - 9. The method of claim 7, wherein retrieving an environmental background music and an emotional background music comprises:
    - invoking a statistical language model to divide at least one word in a text segment, and calculating a weight value and an emotion score for each divided word;
      
      calculating an optimal solution of an implicit Dirichlet LDA topic model by a random sampling method to represent a classification of each divided word;
      
      classifying the words in a text string, and distinguishing the environmental words and emotional words in the text string; and
      
      matching a music material to each divided word with a condition word.
  - 12. The method of claim 7, further comprising:
    - determining a reading position of the reader according to a movement of the reading area on a paper book during identifying to the page shooting image; and
      
      updating the reading position according to an identified movement of page turning.
  - 13. The method of claim 7, further comprising:
    - determining a reading position of the reader according to data of a reading software in the reading device; and
      
      locating a position of the text change in a reading article by means of a section break symbol or an end symbol of a paragraph.

10. (canceled)

Specification

Resources

Litigation Campaign Assessment

Current Assignee
Harbin Institute of Technology Company Limited Galaxia, Yunjiang Lou
Original Assignee
Harbin Institute of Technology Company Limited Galaxia
Inventors
YAO, Shunjie, LOU, Wudan, LOU, Yunjiang, CHEN, Yujing

Granted Patent

US 10,692,480 B2
Time in Patent Office

Days
Field of Search
US Class Current
CPC Class Codes

G06F 16/36   Creation of semantic tools,...

G06F 16/635   Filtering based on addition...

G06F 16/686   using information manually ...

G06F 40/216   using statistical methods

G06F 40/237   Lexical tools

G06F 40/284   Lexical analysis, e.g. toke...

G06F 40/30   Semantic analysis

G06T 5/70   Denoising; Smoothing

G06V 30/10   Character recognition

G06V 30/40   Document-oriented image-bas...

G10H 1/36   Accompaniment arrangements

G10H 2210/021   Background music, e.g. for ...

G10H 2210/155   Musical effects

G10H 2240/085   Mood, i.e. generation, dete...

G10H 2240/141   Library retrieval matching,...

G10H 7/00   Instruments in which the to...

G10H 7/008   Means for controlling the t...

G10K 15/02   Synthesis of acoustic waves...

System and Method of Reading Environment Sound Enhancement Based on Image Processing and Semantic Analysis

First Claim

1 Assignment

0 Petitions

Accused Products

Abstract

2 Citations

13 Claims

Specification

Solutions

Use Cases

Quick Links

System and Method of Reading Environment Sound Enhancement Based on Image Processing and Semantic Analysis

First Claim

1 Assignment

Subscription Required

Subscription Required

0 Petitions

Subscription Required

Accused Products

Subscription Required

Abstract

2 Citations

13 Claims

Specification

Subscription Required

Solutions

Use Cases

Quick Links