METHOD AND SYSTEM FOR ADDING PUNCTUATION TO VOICE FILES

US 20140350918A1
Filed: 03/19/2014
Published: 11/27/2014
Est. Priority Date: 05/24/2013
Status: Active Grant

Abstract

A method and system for adding punctuation to a voice file is disclosed. The method includes: utilizing silence or pause duration detection to divide a voice file into a plurality of speech segments for processing, the voice file includes a plurality of features units; identifying all features units that appear in the voice file according to every term or expression and semantics features of the every term or expression that form each of the plurality of speech segments; using a linguistic model to determine a sum of weight of various punctuation modes in the voice file according to all the feature units, the linguistic model is built upon semantics features of various parsed out terms or expressions from a body text of a spoken sentence according to a language library; and adding punctuations to the voice file based on the determined sum of weight of the various punctuation modes.
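
The first step in the abstract, dividing the voice file wherever a silence or pause lasts long enough, can be illustrated with a minimal sketch. It assumes the audio has already been reduced to one energy value per frame; the names `split_on_silence`, `energy_floor`, and `max_pause_frames` are hypothetical and do not come from the patent.

```python
from typing import List, Tuple


def split_on_silence(energies: List[float],
                     energy_floor: float,
                     max_pause_frames: int) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges of speech segments, end exclusive.

    A frame counts as silent when its energy is at or below energy_floor.
    A segment is closed once a silent run grows longer than max_pause_frames.
    """
    segments = []
    seg_start = None   # first frame of the segment being built, if any
    silence_run = 0    # number of consecutive silent frames seen so far
    for i, energy in enumerate(energies):
        if energy > energy_floor:
            if seg_start is None:
                seg_start = i
            silence_run = 0
        else:
            silence_run += 1
            if seg_start is not None and silence_run > max_pause_frames:
                # The segment ends where the silent run began.
                segments.append((seg_start, i - silence_run + 1))
                seg_start = None
    if seg_start is not None:
        # Close the last segment, dropping any short trailing silence.
        segments.append((seg_start, len(energies) - silence_run))
    return segments


frames = [0.9, 0.8, 0.0, 0.0, 0.0, 0.7, 0.0, 0.6]
print(split_on_silence(frames, energy_floor=0.1, max_pause_frames=2))
# prints [(0, 2), (5, 8)]
```

The one-frame pause at index 6 stays inside the second segment because it does not exceed the threshold, which is the behavior a duration-based threshold is meant to give.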

20 Claims

1. A method for adding punctuations to a voice file, comprising:
- utilizing silence or pause duration detection to divide a voice file into a plurality of speech segments for processing, the voice file comprising a plurality of features units;
  
  identifying all features units that appear in the voice file according to every term or expression and semantics features of the every term or expression that form each of the plurality of speech segments;
  
  using a linguistic model to determine a sum of weight of various punctuation modes in the voice file according to all the feature units, wherein the linguistic model is built upon semantics features of various parsed out terms or expressions from a body text of a spoken sentence according to a language library; and
  
  adding punctuations to the voice file based on the determined sum of weight of the various punctuation modes.
- - 2. The method according to claim 1, wherein the silence or pause detection comprises:
    - determining a silence or pause duration threshold according to a current application scenario;
      
      detecting the silence or pause duration in the voice file to be processed, and when the silence or pause duration is longer than the silence threshold;
      
      separating the speech segments in the voice file at locations that correspond to the silence or pause duration.
  - 3. The method according to claim 1, wherein the identifying of all the features units that appears in the voice file, comprising:
    - identifying the feature unit that appears in each of the plurality of speech segments to be processed; and
      
      gathering into a set, the feature units that appear in the plurality of speech segments to form the all the feature units.
  - 4. The method according to claim 1, wherein the building of the linguistic model comprises:
    - parsing out the various terms or expressions from the body text of the spoken sentence, wherein punctuations have already been added in advance to the spoken sentence according to the language library;
      
      searching the feature units according to the semantics feature of each parsed out term or expression and according to a preset feature template;
      
      recording a number of occurrences of each punctuation mode in each feature unit in the body text of the spoken sentence, according to the punctuation mode that follows the term or expression in the feature unit;
      
      determining a weight of each punctuation mode of each feature unit according to the number of occurrences of each punctuation mode of each feature unit; and
      
      building the linguistic model which comprises every feature unit and its respective punctuation mode with a corresponding weight relationship.
  - 5. The method according to claim 1, wherein the feature units comprise:
    - a single term or expression feature unit and/or a multi-term or expression feature unit, wherein;
      
      the single term or expression feature unit is acquired according to a single term or expression feature template, and the multi-term or expression feature unit is acquired according to a multi-term or expression feature template;
      
      the single term or expression feature template perform functions, comprising;
      
      acquiring the single term or expression whose current reference location in relation to its relative location fulfills a predetermined requirement, and the semantics features of the single term or expression, and the acquiring of the single term or expression feature unit which is based on the single term or expression feature template, comprising;
      
      taking a respective location of each term or expression as the current reference location, determining the single term or expression whose relative location relationship with the current reference location fulfills the requirements of the single term or expression feature template according to the single term or expression feature template;
      
      identifying the single term or expression feature unit according to the semantic features of the single term or expression, the single term or expression feature unit comprises the single term or expression, the semantic features of the single term or expression and the relative location relationship of the single term or expression with the current reference location; and
      
      the multi-term or expression feature template includes acquiring multiple term or expression whose relative location relationship with the current reference location fulfills the predetermined requirements and the semantics features of each of the multiple terms or expressions, and the multi-term or expression feature units perform functions, comprising;
      
      acquiring the respective location of each term or expression as the current reference location, determining the multiple terms or expressions whose relative location relationship with the current reference location fulfills the requirements of the multi-term or expression feature template according to the multi-term or expression feature template;
      
      identifying multi-term or expression feature unit according to the semantic features of each of the multiple terms or expressions, the multi-term or expression feature unit comprises the multiple terms or expressions the semantic features of the each of the multiple terms or expressions and the relative location relationship of each of the multiple terms or expressions with the current reference location.
  - 6. The method according to claim 1, wherein the determining of the sum of weight on various modes of punctuations in the voice file according to all the feature units, comprises:
    - acquiring from the linguistic model corresponding relationships between each feature unit among all the feature units and the weights of the respective various punctuation modes;
      
      determining the weight of the punctuation mode of each term or expression in the voice file to be processed according to the acquired corresponding relationships, and determining the sum of weight of the various punctuation modes of the voice file to be processed according to the weight of the punctuation mode of each term or expression.
  - 7. The method according to claim 1, wherein the semantic features comprise a part of speech and/or content of a sentence.
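
Claims 4 and 6 can be read together as a count-based model followed by a weight sum. The sketch below is one possible reading, not the patented implementation: it assumes a weight is the relative frequency of a punctuation mode for a feature unit, uses only two hypothetical feature units per term (the term itself, and the previous term plus the term), and picks the highest-scoring mode for each term independently, which maximizes the total only because these toy features do not interact across terms. All names (`build_model`, `punctuate`, `units_at`) are illustrative.

```python
from collections import Counter, defaultdict

NO_PUNCT = ""   # punctuation mode meaning "nothing follows this term"
START = "<s>"   # placeholder for "no previous term"


def units_at(terms, idx):
    """Feature units for the term at idx: the term alone, and previous term + term."""
    prev_term = terms[idx - 1] if idx > 0 else START
    return [("cur", terms[idx]), ("prev+cur", prev_term, terms[idx])]


def build_model(tagged_corpus):
    """tagged_corpus: sentences as lists of (term, punctuation_after) pairs."""
    counts = defaultdict(Counter)
    for sentence in tagged_corpus:
        terms = [term for term, _ in sentence]
        for idx, (_, punct) in enumerate(sentence):
            for unit in units_at(terms, idx):
                counts[unit][punct] += 1   # occurrences of each mode per unit
    # Assumption: weight = relative frequency of the mode for that unit.
    return {unit: {p: n / sum(c.values()) for p, n in c.items()}
            for unit, c in counts.items()}


def punctuate(terms, model):
    """Sum the weights of every matching feature unit and keep the best mode."""
    out = []
    for idx, term in enumerate(terms):
        totals = Counter()
        for unit in units_at(terms, idx):
            for punct, weight in model.get(unit, {}).items():
                totals[punct] += weight
        best = max(totals, key=totals.get) if totals else NO_PUNCT
        out.append(term + best)
    return " ".join(out)


corpus = [
    [("hello", ","), ("how", ""), ("are", ""), ("you", "?")],
    [("hello", ","), ("thank", ""), ("you", ".")],
    [("how", ""), ("are", ""), ("you", "?")],
]
model = build_model(corpus)
print(punctuate(["hello", "how", "are", "you"], model))  # prints: hello, how are you?
print(punctuate(["thank", "you"], model))                # prints: thank you.
```

In the second call the single-term unit for "you" favors a question mark, but the two-term unit ("thank", "you") adds enough weight to the period to win, which shows why summing over several feature units can matter.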

8. A system for adding punctuation to a voice file comprises at least a processor working in conjunction with a memory and a plurality of modules, wherein the modules comprise at least:
- a silence or pause duration detection module, an identification module and a punctuation addition module, wherein;
  
  the silence or pause duration detection module divides a voice file to be processed into a plurality of speech segments to be processed based on silence or pause detection;
  
  the identification module identifies all features units that appear in the voice file according to every term or expression and semantics features of the every term or expression that form each of the plurality of speech segments;
  
  the punctuation addition module uses a linguistic model to determine a sum of weight of various punctuation modes in the voice file according to all the feature units, wherein the linguistic model is built upon semantics features of various parsed out terms or expressions from a body text of a spoken sentence according to a language library; and
  
  adding punctuations to the voice file based on the determined sum of weight of the various punctuation modes.
- - 9. The system according to claim 8, wherein:
    - the silence or pause duration detection module performs;
      
      determining the silence or pause duration threshold according to a current application scenario;
      
      detecting the silence or pause duration in the voice file to be processed, and when the silence duration is longer than the silence threshold; and
      
      separating the voice segments to be processed in the voice file at locations that correspond to the silence or pause duration.
  - 10. The system according to claim 8, wherein:
    - the identification module performs;
      
      identifying the feature unit that appears in each of the plurality of speech segments to be processed; and
      
      gathering into a set, the feature units that appear in the plurality of speech segments to form the all the feature units.
  - 11. The system according to claim 8, wherein the linguistic model is a linguistic model built through the following steps:
    - parsing out the various terms or expressions from the body text of the spoken sentence, wherein punctuations have already been added in advance to the spoken sentence according to the language library;
      
      searching the feature units according to the semantics feature of each separated term or expression as a result of the parsing on the body text of the spoken sentence, and according to a preset feature template;
      
      recording a number of occurrences of each punctuation mode in each feature unit in the body text of the spoken sentence, according to the punctuation mode that follows the term or expression in the feature unit;
      
      determining a weight of each punctuation mode of each feature unit according to the number of occurrences of each punctuation mode of each feature unit; and
      
      building the linguistic model which comprises every feature unit and its respective punctuation mode with a corresponding weight relationship.
  - 12. The system according to claim 8, wherein the feature units include single term or expression feature units and/or multi-term or expression feature units;
    - the identification module performs;
      
      taking a respective location of each term or expression as the current reference location, determining the single term or expression whose relative location relationship with the current reference location fulfills the requirements of the single term or expression feature template according to the single term or expression feature template;
      
      identifying the single term or expression feature unit according to the semantic features of the single term or expression, the single term or expression feature unit comprises the single term or expression, the semantic features of the single term or expression and the relative location relationship of the single term or expression with the current reference location.
  - 13. The system according to claim 8, wherein:
    - the punctuation addition module performs;
      
      acquiring from the linguistic model corresponding relationships between each feature unit among all the feature units and the weights of the respective various punctuation modes;
      
      determining the weight of the punctuation mode of each term or expression in the voice file to be processed according to the acquired corresponding relationships, and determining the sum of weight of the various punctuation modes of the voice file to be processed according to the weight of the punctuation mode of each term or expression.
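
The identification module of claims 10 and 12 can be sketched as follows. This is a hedged illustration that assumes each speech segment has already been transcribed into (term, part of speech) pairs and that a single-term feature template is just a tuple of relative offsets; `FeatureUnit`, `single_term_units`, and `gather_units` are hypothetical names.

```python
from typing import Iterable, List, NamedTuple, Sequence, Set, Tuple

Token = Tuple[str, str]  # (term, part of speech)


class FeatureUnit(NamedTuple):
    term: str
    pos: str      # semantic feature; part of speech in this sketch
    offset: int   # location relative to the current reference location


def single_term_units(segment: Sequence[Token],
                      offsets: Sequence[int] = (-1, 0, 1)) -> List[FeatureUnit]:
    """Take each position as the reference location and apply every offset."""
    units = []
    for ref in range(len(segment)):
        for off in offsets:
            j = ref + off
            if 0 <= j < len(segment):   # skip offsets that fall outside the segment
                term, pos = segment[j]
                units.append(FeatureUnit(term, pos, off))
    return units


def gather_units(segments: Iterable[Sequence[Token]]) -> Set[FeatureUnit]:
    """Gather the units found in each segment into one set for the whole file."""
    gathered: Set[FeatureUnit] = set()
    for segment in segments:
        gathered.update(single_term_units(segment))
    return gathered


segments = [
    [("good", "ADJ"), ("morning", "NOUN")],
    [("good", "ADJ"), ("night", "NOUN")],
]
units = gather_units(segments)
print(len(units))                            # prints 6
print(FeatureUnit("good", "ADJ", 0) in units)  # prints True
```

Using a set means a unit that appears in several segments, such as ("good", "ADJ", 0), is kept once.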

14. A non-transitory computer-readable medium having stored thereon, a computer program having at least one code section being executable by a machine for causing the machine to perform steps comprising:
- utilizing silence or pause duration detection to divide a voice file into a plurality of speech segments for processing, the voice file comprising a plurality of features units;
  
  identifying all features units that appear in the voice file according to every term or expression and semantics features of the every term or expression that form each of the plurality of speech segments;
  
  using a linguistic model to determine a sum of weight of various punctuation modes in the voice file according to all the feature units, wherein the linguistic model is built upon semantics features of various parsed out terms or expressions from a body text of a spoken sentence according to a language library; and
  
  adding punctuations to the voice file based on the determined sum of weight of the various punctuation modes.
- - 15. The non-transitory computer-readable medium according to claim 14, wherein the silence or pause detection comprises:
    - determining a silence or pause duration threshold according to a current application scenario;
      
      detecting the silence or pause duration in the voice file to be processed, and when the silence or pause duration is longer than the silence threshold;
      
      separating the speech segments in the voice file at locations that correspond to the silence or pause duration.
  - 16. The non-transitory computer-readable medium according to claim 14, wherein the identifying of all the features units that appears in the voice file, comprises:
    - identifying the feature unit that appears in each of the plurality of speech segments to be processed; and
      
      gathering into a set, the feature units that appear in the plurality of speech segments to form the all the feature units.
  - 17. The non-transitory computer-readable medium according to claim 14, wherein the building of the linguistic model comprises:
    - parsing out the various terms or expressions from the body text of the spoken sentence, wherein punctuations have already been added in advance to the spoken sentence according to the language library;
      
      searching the feature units according to the semantics feature of each parsed out term or expression and according to a preset feature template;
      
      recording a number of occurrences of each punctuation mode in each feature unit in the body text of the spoken sentence, according to the punctuation mode that follows the term or expression in the feature unit;
      
      determining a weight of each punctuation mode of each feature unit according to the number of occurrences of each punctuation mode of each feature unit; and
      
      building the linguistic model which comprises every feature unit and its respective punctuation mode with a corresponding weight relationship.
  - 18. The non-transitory computer-readable medium according to claim 14, wherein the feature units comprise:
    - a single term or expression feature unit and/or a multi-term or expression feature unit, wherein;
      
      the single term or expression feature unit is acquired according to a single term or expression feature template, and the multi-term or expression feature unit is acquired according to a multi-term or expression feature template;
      
      the single term or expression feature template perform functions, comprising;
      
      acquiring the single term or expression whose current reference location in relation to its relative location fulfills a predetermined requirement, and the semantics features of the single term or expression, and the acquiring of the single term or expression feature unit which is based on the single term or expression feature template, comprising;
      
      taking a respective location of each term or expression as the current reference location, determining the single term or expression whose relative location relationship with the current reference location fulfills the requirements of the single term or expression feature template according to the single term or expression feature template;
      
      identifying the single term or expression feature unit according to the semantic features of the single term or expression, the single term or expression feature unit comprises the single term or expression, the semantic features of the single term or expression and the relative location relationship of the single term or expression with the current reference location; and
      
      the multi-term or expression feature template includes acquiring multiple term or expression whose relative location relationship with the current reference location fulfills the predetermined requirements and the semantics features of each of the multiple terms or expressions, and the multi-term or expression feature units perform functions, comprising;
      
      acquiring the respective location of each term or expression as the current reference location, determining the multiple terms or expressions whose relative location relationship with the current reference location fulfills the requirements of the multi-term or expression feature template according to the multi-term or expression feature template;
      
      identifying multi-term or expression feature unit according to the semantic features of each of the multiple terms or expressions, the multi-term or expression feature unit comprises the multiple terms or expressions the semantic features of the each of the multiple terms or expressions and the relative location relationship of each of the multiple terms or expressions with the current reference location.
  - 19. The non-transitory computer-readable medium according to claim 14, wherein the determining of the sum of weight on various modes of punctuations in the voice file according to all the feature units comprises:
    - acquiring from the linguistic model corresponding relationships between each feature unit among all the feature units and the weights of the respective various punctuation modes;
      
      determining the weight of the punctuation mode of each term or expression in the voice file to be processed according to the acquired corresponding relationships, and determining the sum of weight of the various punctuation modes of the voice file to be processed according to the weight of the punctuation mode of each term or expression.
  - 20. The non-transitory computer-readable medium according to claim 14, wherein the semantic features comprise a part of speech and/or content of a sentence.
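
The multi-term feature template described in claim 18, combined with part of speech as the semantic feature from claim 20, might look like the sketch below. It assumes a template is a tuple of relative offsets that must all fall inside the segment; the function name `multi_term_units` and the sample tags are hypothetical.

```python
from typing import List, Sequence, Tuple

Token = Tuple[str, str]           # (term, part of speech)
Member = Tuple[str, str, int]     # (term, part of speech, relative offset)


def multi_term_units(segment: Sequence[Token],
                     template: Sequence[int] = (-1, 0)) -> List[Tuple[Member, ...]]:
    """Build one multi-term feature unit per reference location.

    A unit is produced only when every offset in the template lands
    inside the segment, so edge positions may yield no unit.
    """
    units = []
    for ref in range(len(segment)):
        positions = [ref + off for off in template]
        if all(0 <= p < len(segment) for p in positions):
            unit = tuple((segment[p][0], segment[p][1], off)
                         for p, off in zip(positions, template))
            units.append(unit)
    return units


segment = [("see", "VERB"), ("you", "PRON"), ("tomorrow", "NOUN")]
for unit in multi_term_units(segment):
    print(unit)
# prints (('see', 'VERB', -1), ('you', 'PRON', 0))
# prints (('you', 'PRON', -1), ('tomorrow', 'NOUN', 0))
```

Each unit records the terms, their semantic features, and their locations relative to the reference location, which are the three components the claim lists for a multi-term feature unit.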

Current Assignee
Tencent Technology Company Limited (Tencent Holdings Limited)
Original Assignee
Tencent Technology Shenzhen Company Limited (Tencent Holdings Limited)
Inventors
LIU, Haibo, WANG, Eryu, ZHANG, Xiang, LU, Li, YUE, Shuai, CHEN, Bo, LI, Lou, LIU, Jian

Granted Patent

US 9,442,910 B2
US Class Current

704/9
CPC Class Codes

G06F 40/166   Editing, e.g. inserting or ...

G06F 40/30   Semantic analysis

G10L 15/04   Segmentation; Word boundary...
