
System for grasping keyword extraction based speech content on recorded voice data, indexing method using the system, and method for grasping speech content

  • US 10,304,441 B2
  • Filed: 09/18/2014
  • Issued: 05/28/2019
  • Est. Priority Date: 11/06/2013
  • Status: Active Grant
First Claim

1. A system for grasping speech content, comprising:

  • an indexing unit, executed by a processor, for receiving voice data, performing per-frame voice recognition with reference to a phoneme to form a phoneme lattice, and generating divided indexing information for a frame of a limited time configured with a plurality of frames, the divided indexing information including a phoneme lattice formed for each frame of the limited time;

    an indexing database, executed by a processor, for storing the divided indexing information generated by the indexing unit so as to be indexed by respective divided indexing information;

    a searcher, executed by a processor, for using a keyword input by a user as a search word, performing a comparison on the divided indexing information stored in the indexing database with reference to a phoneme, and searching a phoneme string matching the search word; and

    a grasper, executed by a processor, for grasping a representative word through a search result searched by the searcher and outputting it to the user so as to retrieve and display on a display device, speech content of the voice data corresponding to the keyword input together with the keyword input, wherein the indexing unit includes:

    a featuring vector extractor for extracting a featuring vector from per-frame voice data;

    a phoneme recognizer for performing phoneme recognition with reference to frame synchronization by use of the featuring vector extracted by the featuring vector extractor, and generating a phoneme string;

    a candidate group forming unit for receiving the phoneme string generated by the phoneme recognizer, and generating candidate groups of phoneme recognition with respect to time for each frame;

    a phoneme lattice forming unit for performing an operation in reverse order of time on the phoneme string candidate groups formed by the candidate group forming unit to select one phoneme string candidate group and form a corresponding phoneme lattice; and

    an indexing controller for controlling the featuring vector extractor, the phoneme recognizer, the candidate group forming unit, and the phoneme lattice forming unit to perform control so as to form a phoneme based lattice for each limited time for the entire voice data and for each frame within the limited time and to perform control so as to store the phoneme lattice formed in this manner in the indexing database as divided indexing information for each limited time and thereby allow the same to be indexed for each limited time.
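The indexing and search elements recited above can be illustrated with a minimal sketch. It is not the patented implementation, and it assumes that per-frame phoneme recognition has already produced a set of candidate phonemes for each frame. The lattice is simplified to one candidate set per frame; the reverse-time selection of a candidate group is not modeled. All names (`build_divided_index`, `segment_matches`, `search`, `frames_per_segment`) are hypothetical and do not come from the patent.

```python
from typing import Dict, List, Set

FrameCandidates = Set[str]        # candidate phonemes recognized for one frame
Lattice = List[FrameCandidates]   # simplified lattice: one candidate set per frame


def build_divided_index(frames: List[FrameCandidates],
                        frames_per_segment: int) -> Dict[int, Lattice]:
    """Split per-frame candidates into fixed-length segments (the "limited time")
    and store each segment's lattice under its segment number."""
    if frames_per_segment <= 0:
        raise ValueError("frames_per_segment must be positive")
    index: Dict[int, Lattice] = {}
    for start in range(0, len(frames), frames_per_segment):
        index[start // frames_per_segment] = frames[start:start + frames_per_segment]
    return index


def segment_matches(lattice: Lattice, phonemes: List[str]) -> bool:
    """Return True if the phoneme string can be aligned to consecutive frames
    of the lattice, with each phoneme covering one or more frames."""
    if not phonemes:
        return False
    last = len(phonemes) - 1
    active: Set[int] = set()  # phoneme positions matched at the previous frame
    for candidates in lattice:
        reached: Set[int] = set()
        for i, ph in enumerate(phonemes):
            # A position is reachable by starting the keyword here (i == 0),
            # staying on the same phoneme, or advancing from the previous one.
            if ph in candidates and (i == 0 or i in active or i - 1 in active):
                reached.add(i)
        if last in reached:
            return True
        active = reached
    return False


def search(index: Dict[int, Lattice], phonemes: List[str]) -> List[int]:
    """Return the numbers of all segments whose lattice contains the phoneme string."""
    return [seg for seg, lattice in sorted(index.items())
            if segment_matches(lattice, phonemes)]
```

In this sketch a keyword would first be converted to a phoneme string by some pronunciation lexicon (assumed, not shown), and `search` would then report which limited-time segments contain it. Because each segment is searched independently, a keyword that straddles a segment boundary is not found; how the claimed system treats that case is not stated in the claim.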
