Filtering sensitive information

US 10,529,336 B1
Filed: 09/13/2017
Issued: 01/07/2020
Est. Priority Date: 09/13/2017
Status: Active Grant

Abstract

Technology is described for removing sensitive information. An audio block that represents a portion of a conversation may be identified. A text representation for the audio block may be obtained using a speech-to-text process. The text representation for the audio block may be compared to pattern rules to mark sensitive information in the audio block. The portion of audio data marked as sensitive information may then be removed from the audio block.
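
The marking step described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the patented implementation: the `Word` type, the `CARD_RULE` pattern, and the `mark_sensitive` function are hypothetical names, and the speech-to-text output is assumed to arrive as a list of words with start and end times in seconds.

```python
import re
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the audio block
    end: float


# Hypothetical pattern rule: a run of 13 to 16 digits, as in a payment card number.
CARD_RULE = re.compile(r"\d{13,16}")


def mark_sensitive(words):
    """Return (start, end) time spans for runs of digit words matching CARD_RULE."""
    spans = []
    i = 0
    while i < len(words):
        j = i
        # Extend j over a contiguous run of all-digit words.
        while j < len(words) and words[j].text.isdigit():
            j += 1
        if j > i:
            digits = "".join(w.text for w in words[i:j])
            if CARD_RULE.fullmatch(digits):
                spans.append((words[i].start, words[j - 1].end))
            i = j
        else:
            i += 1
    return spans
```

The returned spans are time ranges rather than text offsets, so a later step can map them directly onto audio samples.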

23 Citations

20 Claims

1. A non-transitory machine readable storage medium having instructions embodied thereon, the instructions being executable by one or more processors to perform acts comprising:
- obtaining a text representation for a first audio block in a stream of audio blocks using a speech-to-text application, wherein the stream of audio blocks represents a conversation between at least two persons;
  
  analyzing the text representation for the first audio block to generate metadata for the first audio block, wherein the metadata includes timestamps for at least one phrase in the first audio block;
  
  comparing the text representation for the first audio block to pattern rules to identify a first portion of sensitive information in the first audio block, wherein a timestamp for the first portion of the sensitive information is identified in the metadata for the first audio block;
  
  determining that the sensitive information extends into a second portion of sensitive information in an adjacent audio block in the stream of audio blocks;
  
  combining the first audio block with the adjacent audio block to form a composite audio block; and
  
  removing a portion of audio data from the composite audio block that corresponds to the first portion of sensitive information in the first audio block and the second portion of sensitive information in the adjacent audio block while the conversation is occurring between the at least two persons, wherein the portion of audio data is removed in accordance with the timestamp for the first portion of sensitive information in the first audio block and a second timestamp for the second portion of sensitive information in the adjacent audio block.
  - 2. The non-transitory machine readable storage medium of claim 1, the acts further comprising providing the composite audio block without the portion of audio data to a monitoring entity, wherein the monitoring entity is able to perform quality assurance for the conversation.
  - 3. The non-transitory machine readable storage medium of claim 1, wherein the metadata for the first audio block further includes at least one of:
    - an audio block identifier, a customer identifier, a call identifier, or confidence scores for each word, phrase or number included in the first audio block.
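
The last three acts of claim 1 (detecting that sensitive information extends into an adjacent block, combining the blocks, and removing audio according to both timestamps) might look like the sketch below. It makes several assumptions that are not in the claim text: blocks are modeled by a hypothetical `Block` type holding one integer per sample, "sensitive information" is simplified to a run of digit words, and removal is modeled as overwriting the span with silence.

```python
from dataclasses import dataclass


@dataclass
class Block:
    samples: list  # PCM samples, one int per sample (hypothetical layout)
    rate: int      # samples per second
    words: list    # (text, start_s, end_s) tuples from speech-to-text


def trailing_run_start(words):
    """Start time of the run of digit words that ends the block, or None."""
    start = None
    for text, begin, _ in reversed(words):
        if not text.isdigit():
            break
        start = begin
    return start


def leading_run_end(words):
    """End time of the run of digit words that opens the block, or None."""
    end = None
    for text, _, finish in words:
        if not text.isdigit():
            break
        end = finish
    return end


def redact_across(first, adjacent):
    """Combine two blocks and silence a digit run that crosses the boundary.

    Returns the composite samples, or None if nothing crosses the boundary.
    """
    start = trailing_run_start(first.words)   # timestamp in the first block
    end = leading_run_end(adjacent.words)     # timestamp in the adjacent block
    if start is None or end is None:
        return None
    composite = first.samples + adjacent.samples
    offset = len(first.samples) / first.rate  # shift for the second timestamp
    lo = round(start * first.rate)
    hi = min(round((offset + end) * first.rate), len(composite))
    composite[lo:hi] = [0] * (hi - lo)        # removal modeled as silence
    return composite
```

Working on the composite block means a single contiguous span is silenced, instead of two separate edits that would each need to know about the other block.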

4. A method, using one or more processors, comprising:
- obtaining a text representation for a first audio block in a stream of audio blocks using a speech-to-text process, wherein the stream of audio blocks represents a conversation between at least two persons;
  
  comparing the text representation for the first audio block to pattern rules to identify a first portion of target information in the first audio block;
  
  marking the first portion of the target information in the first audio block;
  
  determining that the target information extends into a second portion of sensitive information in an adjacent audio block in the stream of audio blocks;
  
  combining the first audio block with the adjacent audio block to form a composite audio block; and
  
  removing a portion of audio data from the composite audio block marked as the first portion of target information and the second portion of target information in the adjacent audio block while the conversation is occurring between the at least two persons.
  - 5. The method of claim 4, further comprising providing the composite audio block without the portion of audio data to a monitoring entity.
  - 6. The method of claim 4, further comprising providing the composite audio block along with supplemental audio data in place of the portion of audio data removed from the composite audio block.
  - 7. The method of claim 4, further comprising:
    - removing audio data from a plurality of audio blocks that contain portions of target information; and
      
      storing the plurality of audio blocks without the portions of target information as a single audio file.
  - 8. The method of claim 4, further comprising analyzing the text representation for the first audio block to generate metadata for the first audio block, wherein the metadata includes at least one of:
    - timestamps for at least one phrase in the first audio block, an audio block identifier, a customer identifier, a call identifier, or confidence scores for each word, phrase or number included in the first audio block.
  - 9. The method of claim 4, wherein the portion of target information includes a portion of personally identifiable information (PII) of a person associated with the first audio block, wherein the PII includes at least one of:
    - a name, an address, a telephone number, an email address, a credit card number, a social security number, or payment information.
  - 10. The method of claim 4, wherein the pattern rules are configured to identify a plurality of predefined words, phrases, or numbers that have an increased likelihood of being associated with the first portion of target information, or a plurality of predefined words and phrases that have an increased likelihood of being immediately followed by the first portion of target information.
  - 11. The method of claim 4, further comprising verifying accuracy of the first portion of target information by comparing the first portion of target information to a data store of known target information.
  - 12. The method of claim 4, further comprising selecting the pattern rules depending on a context of the first audio block.
  - 13. The method of claim 4, further comprising deleting metadata for the first audio block after the portion of audio data has been removed.
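
Claims 10 and 11 describe pattern rules keyed on phrases that tend to be immediately followed by target information, and a verification step against a data store of known target information. A rough sketch under stated assumptions: the `TRIGGERS` table, both regular expressions, and the use of a plain Python set as the "data store" are illustrative choices, not details taken from the patent.

```python
import re

# Hypothetical trigger phrases mapped to the pattern expected right after them.
TRIGGERS = {
    "card number": re.compile(r"(?:\d[ -]?){13,16}"),
    "social security number": re.compile(r"\d{3}[ -]?\d{2}[ -]?\d{4}"),
}


def find_candidates(text):
    """Return (trigger, digits) pairs for target info that follows a trigger phrase."""
    lowered = text.lower()
    found = []
    for phrase, rule in TRIGGERS.items():
        # Allow an optional "is" between the trigger phrase and the value.
        for m in re.finditer(re.escape(phrase) + r"(?: is)?\s+", lowered):
            hit = rule.match(lowered, m.end())
            if hit:
                # Keep digits only so spoken groupings compare equal.
                found.append((phrase, re.sub(r"\D", "", hit.group())))
    return found


def verify(candidates, known_values):
    """Keep only candidates whose value appears in the known-values store."""
    return [(kind, value) for kind, value in candidates if value in known_values]
```

Anchoring each rule to a trigger phrase keeps unrelated numbers, such as a ZIP code mentioned later in the same sentence, from being marked.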

14. A system, comprising:
- at least one processor;
  
  at least one memory device including a data store to store a plurality of data and instructions that, when executed, cause the system to:
  
  receive a first audio block in a stream of audio blocks that represents a conversation between at least two persons;
  
  generate a text representation of the first audio block using a speech-to-text service;
  
  analyze the text representation to generate metadata for the first audio block, wherein the metadata includes a timestamp for a phrase in the first audio block;
  
  compare the text representation to pattern rules to identify a first portion of sensitive information in the first audio block, wherein the timestamp for the first portion of the sensitive information is identified in the metadata for the first audio block;
  
  determine that the sensitive information extends into a second portion of sensitive information in an adjacent audio block in the stream of audio blocks;
  
  combine the first audio block with the adjacent audio block to form a composite audio block; and
  
  remove a portion of audio data from the composite audio block that contains the first portion of sensitive information and the second portion of sensitive information while the conversation is occurring between the at least two persons.
  - 15. The system of claim 14, wherein the plurality of data and instructions, when executed, cause the system to store the stream of audio blocks, without the portion of audio data, as a single audio file.
  - 16. The system of claim 14, wherein the metadata for the first audio block further includes at least one of:
    - a customer identifier, a call identifier, or confidence scores for each word, phrase, or number included in the first audio block.
  - 17. The system of claim 14, wherein the plurality of data and instructions, when executed, cause the system to select the pattern rules depending on a type of sensitive information to be identified from the first audio block.
  - 18. The system of claim 14, wherein the plurality of data and instructions, when executed, cause the system to provide the composite audio block without the portion of audio data to a monitoring entity.
  - 19. The system of claim 14, wherein the plurality of data and instructions, when executed, cause the system to provide the composite audio block along with supplemental audio data in place of the portion of audio data removed from the composite audio block.
  - 20. The system of claim 14, wherein the plurality of data and instructions, when executed, cause the system to verify accuracy of the first portion of sensitive information by comparing the first portion of sensitive information to a data store of known sensitive information.
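
Claims 15 and 19 mention storing the filtered stream as a single audio file and providing supplemental audio in place of the removed portion. The sketch below illustrates one possible reading using Python's standard `wave` module. The 8 kHz, 16-bit mono format, the sine beep used as supplemental audio, and the function names are all assumptions made for the example.

```python
import io
import math
import struct
import wave

RATE = 8000  # hypothetical sample rate; audio is assumed to be 16-bit mono PCM


def tone(n_samples, freq=440.0, amp=8000):
    """Supplemental audio: a sine beep of n_samples 16-bit little-endian samples."""
    return b"".join(
        struct.pack("<h", int(amp * math.sin(2 * math.pi * freq * i / RATE)))
        for i in range(n_samples)
    )


def replace_span(pcm, start_s, end_s):
    """Swap the audio between start_s and end_s for a beep of equal length."""
    lo = round(start_s * RATE) * 2  # 2 bytes per sample
    hi = round(end_s * RATE) * 2
    return pcm[:lo] + tone((hi - lo) // 2) + pcm[hi:]


def store_single_file(blocks, out):
    """Write filtered blocks back to back into one WAV file (path or file object)."""
    with wave.open(out, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(RATE)
        for pcm in blocks:
            wav.writeframes(pcm)
```

Replacing the span with audio of equal length keeps the timing of the rest of the recording unchanged, which may matter if other metadata still refers to timestamps in the original stream.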

Current Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors: Matthews, Nicholas Channing; Yeras, Jeddel
Primary Examiner(s): Abebe, Daniel
Application Number: US15/703,662
Time in Patent Office: 846 Days
Field of Search: 704250
US Class Current
CPC Class Codes

G06F 16/635   Filtering based on addition...

G06F 21/50   Monitoring users, programs ...

G06F 21/6245   Protecting personal data, e...

G06Q 30/016   After-sales

G10L 15/26   Speech to text systems G10L...

G10L 21/16   Transforming into a non-vis...

H04M 1/271   controlled by voice recogni...

H04M 1/656   for recording conversations

H04M 2201/40   using speech recognition sp...

H04M 2203/301   Management of recordings

H04M 2203/6009   Personal information, e.g. ...

H04M 2203/6027   Fraud preventions

H04M 3/42221   Conversation recording syst...

H04M 3/4936   Speech interaction details ...

H04M 3/5175   Call or contact centers sup...
