Consolidating Speech Recognition Results

US 20130073286A1
Filed: 09/20/2011
Published: 03/21/2013
Est. Priority Date: 09/20/2011
Status: Abandoned Application



Abstract

Candidate interpretations resulting from application of speech recognition algorithms to spoken input are presented in a consolidated manner that reduces redundancy. A list of candidate interpretations is generated, and each candidate interpretation is subdivided into time-based portions, forming a grid. Those time-based portions that duplicate portions from other candidate interpretations are removed from the grid. A user interface is provided that presents the user with an opportunity to select among the candidate interpretations; the user interface is configured to present these alternatives without duplicate elements.
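
The grid-forming and column-grouping steps (spelled out in claims 2 and 3 below) can be illustrated with a minimal Python sketch. The `Token` class, the function names, and the integer time units are assumptions made for illustration and do not come from the application. Claim 2 ties the number of cells to the number of unique time values; this sketch instead uses one column per interval between consecutive time values, which is a simplifying assumption.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Token:
    text: str
    start: int  # start time; integer units such as milliseconds are assumed
    end: int    # end time

def build_grid(candidates):
    """Form a grid: one row per candidate, one column per interval between
    consecutive unique start/end times (a simplifying assumption)."""
    times = sorted({t for cand in candidates for tok in cand for t in (tok.start, tok.end)})
    columns = list(zip(times, times[1:]))
    grid = [[None] * len(columns) for _ in candidates]
    for row, cand in enumerate(candidates):
        for tok in cand:
            for col, (col_start, col_end) in enumerate(columns):
                # Insert the token into every cell its start/end times span.
                if tok.start <= col_start and col_end <= tok.end:
                    grid[row][col] = tok
    return grid, columns

def group_columns(grid, columns):
    """Split the grid into column groups: a column absorbs the next one
    whenever any token in it ends after the column's own end time."""
    groups = []
    for col, (_, col_end) in enumerate(columns):
        if not groups or groups[-1][-1] != col:
            groups.append([col])  # column not yet in a group: start a new one
        spans_beyond = any(row[col] is not None and row[col].end > col_end for row in grid)
        if spans_beyond and col + 1 < len(columns):
            groups[-1].append(col + 1)
    return groups

candidates = [
    [Token("wreck", 0, 30), Token("a", 30, 40), Token("nice", 40, 70), Token("beach", 70, 100)],
    [Token("recognize", 0, 70), Token("speech", 70, 100)],
]
grid, columns = build_grid(candidates)
print(group_columns(grid, columns))  # [[0, 1, 2], [3]]
```

Because "recognize" spans the first three columns, those columns collapse into a single column group, while "beach" and "speech" share the fourth column and form a second group.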

323 Citations

60 Claims

1. A computer-implemented method for generating a consolidated list of speech recognition results, comprising:
- at a processor, receiving a list of candidate interpretations of spoken input;
  
  at the processor, forming a grid of tokens from the received list, the grid being organized into a plurality of rows and a plurality of columns;
  
  at the processor, splitting the grid into a set of column groups based on timing information, each column group comprising a plurality of token groups, each token group comprising at least one token;
  
  at the processor, responsive to detecting duplicated token groups in the grid, removing the duplicated token groups to generate a consolidated grid; and
  
  at an output device, outputting the candidate interpretations based on the consolidated grid.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22)
- - 2. The computer-implemented method of claim 1, wherein each candidate interpretation in the received list comprises a plurality of tokens, and wherein forming a grid of tokens from the received list comprises:
    - at the processor, for each token in each candidate interpretation, determining a start time and an end time;
      
      at the processor, forming a set of unique integers from the determined start and end times;
      
      at the processor, forming a grid comprising a number of rows corresponding to the number of candidate interpretations in the received list, each row comprising a number of cells corresponding to the number of unique integers in the set of unique integers, the cells being organized into columns; and
      
      at the processor, inserting each token into all cells spanned by the start and end times of the token.
  - 3. The computer-implemented method of claim 1, wherein each candidate interpretation in the received list comprises a plurality of tokens associated with start and end times, and wherein each column of the grid is associated with a start and end time, and wherein splitting the grid into a set of column groups based on timing information comprises:
    - at the processor, for each column in the grid:
      
      responsive to the column not already belonging to a column group, forming a column group including the current column;
      
      for each token in the column, determining whether any tokens in the column are associated with an end time that spans beyond the end time of the column; and
      
      responsive to any token in the column spanning beyond the end time of the column, adding the next column to the column group that includes the current column.
  - 4. The computer-implemented method of claim 1, wherein removing the duplicates to form a consolidated list of candidate interpretations comprises:
    - at the processor, defining a plurality of token phrases, each token phrase comprising at least one token appearing within a row of a column group; and
      
      for each column group in the grid:
      
      determining whether any token phrases are duplicated within the column group; and
      
      responsive to any token phrases being duplicated, deleting the duplicates.
  - 5. The computer-implemented method of claim 1, further comprising:
    - at the processor, responsive to any edge tokens being shared among token phrases within a column group having at least two tokens in all its token phrases, splitting the column group into a first column group comprising the shared edge tokens and a second column group comprising the at least one remaining token in the token phrases.
  - 6. The computer-implemented method of claim 1, further comprising:
    - at the processor, for each column group having at least two tokens in all its token phrases:
      
      responsive to any tokens appearing at the beginning of all token phrases in the column group, splitting the column group into a first column group comprising the first token and a second column group comprising the at least one remaining token in the token phrases; and
      
      responsive to any tokens appearing at the end of all token phrases in the column group, splitting the column group into a first column group comprising the last token and a second column group comprising the at least one remaining token in the token phrases.
  - 7. The computer-implemented method of claim 1, further comprising:
    - at the processor, responsive to any column group having a number of token phrases exceeding a predetermined threshold:
      
      removing at least one token phrase; and
      
      repeating the steps of splitting the grid and removing duplicates.
  - 8. The computer-implemented method of claim 1, wherein receiving the list of candidate interpretations of spoken input comprises:
    - at the processor, receiving a plurality of tokenized candidate interpretations, each candidate interpretation comprising a plurality of tokens; and
      
      at the processor, receiving timing information for each token.
  - 9. The computer-implemented method of claim 1, wherein forming the grid of tokens comprises:
    - at the processor, splitting the candidate interpretations in the received list into tokens;
      
      at the processor, selecting one of the candidate interpretations;
      
      at the processor, applying a differential algorithm to determine differences of each other candidate interpretation with respect to the selected candidate interpretation; and
      
      at the processor, forming a grid of tokens based on results of the differential algorithm.
  - 10. The computer-implemented method of claim 1, wherein the consolidated list of candidate interpretations comprises:
    - at least one column group having a single token group; and
      
      at least one column group having a plurality of token groups.
  - 11. The computer-implemented method of claim 10, wherein outputting the candidate interpretations comprises:
    - for each column group:
      
      responsive to the column group comprising a single token group, displaying the single token group on the output device; and
      
      responsive to the column group comprising a plurality of token groups, displaying the plurality of the token groups.
  - 12. The computer-implemented method of claim 10, wherein outputting the candidate interpretations comprises:
    - for each column group:
      
      responsive to the column group comprising a single token group, displaying the single token group on the output device; and
      
      responsive to the column group comprising a plurality of token groups, displaying, on the output device, a first one of the token groups, and displaying at least a subset of the remaining token groups in the column group as alternatives to the first token group.
  - 13. The computer-implemented method of claim 12, further comprising:
    - for at least one column group comprising a plurality of token groups, displaying, on the display device, a menu comprising at least one alternative token group from the column group.
  - 14. The computer-implemented method of claim 10, wherein outputting the candidate interpretations comprises:
    - for each column group:
      
      responsive to the column group comprising a single token group, displaying the single token group on the output device; and
      
      responsive to the column group comprising a plurality of token groups, displaying, on the output device, a first one of the token groups in a visually distinctive manner as compared with the display of a column group comprising a single token group.
  - 15. The computer-implemented method of claim 14, wherein displaying the first one of the token groups in a visually distinctive manner comprises displaying the first one of the token groups in a manner that indicates a degree of confidence in the displayed token group.
  - 16. The computer-implemented method of claim 14, wherein displaying the first one of the token groups in a visually distinctive manner comprises displaying the first one of the token groups in a manner that indicates relative likelihood that the displayed token group is a correct interpretation of the spoken input.
  - 17. The computer-implemented method of claim 10, wherein outputting the candidate interpretations comprises:
    - for each column group:
      
      responsive to the column group comprising a single token group, displaying the single token group on the output device; and
      
      responsive to the column group comprising a plurality of token groups, displaying and highlighting one of the token groups on the output device.
  - 18. The computer-implemented method of claim 17, further comprising:
    - at an input device, receiving user input associated with a highlighted token group; and
      
      responsive to the user input associated with a highlighted token group, displaying, on the display device, a menu comprising at least one alternative token group from the same column group.
  - 19. The computer-implemented method of claim 18, further comprising:
    - at the input device, receiving user input selecting an alternative token group from the menu;
      
      responsive to the user input selecting an alternative token group from the menu, replacing the highlighted token group with the alternative token group.
  - 20. The computer-implemented method of claim 19, further comprising:
    - responsive to the user input selecting an alternative token group from the menu, providing the selected alternative token group to a speech recognition engine for training of the speech recognition engine.
  - 21. The computer-implemented method of claim 18, wherein receiving user input associated with a highlighted token group comprises user selection of the highlighted token group.
  - 22. The computer-implemented method of claim 18, wherein receiving user input associated with a highlighted token group comprises user contact with a touch-sensitive surface at a location corresponding to a displayed location of the highlighted token group.
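
Claim 9 leaves the "differential algorithm" unspecified. As one possible choice, assumed here rather than taken from the application, Python's standard `difflib.SequenceMatcher` can align each candidate against the first (selected) candidate, and each candidate's tokens can then be placed into columns indexed by the selected candidate's tokens. The function names, and the rule that inserted tokens attach to the preceding column, are hypothetical.

```python
from difflib import SequenceMatcher

def segments_against(reference, candidate):
    """Align `candidate` to `reference`; return (i1, i2, tokens) spans over reference indices."""
    reference, candidate = list(reference), list(candidate)
    segs, pending = [], []
    matcher = SequenceMatcher(None, reference, candidate, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "insert":
            if segs:
                # Assumption: extra tokens attach to the preceding column.
                s, e, toks = segs[-1]
                segs[-1] = (s, e, toks + candidate[j1:j2])
            else:
                pending = candidate[j1:j2]  # insertion before the first reference token
        elif tag == "equal":
            # Matching runs are split into one span per reference token.
            for k in range(i2 - i1):
                segs.append((i1 + k, i1 + k + 1, pending + candidate[j1 + k : j1 + k + 1]))
                pending = []
        else:  # "replace" or "delete" (delete yields an empty token list)
            segs.append((i1, i2, pending + candidate[j1:j2]))
            pending = []
    return segs

def diff_grid(candidates):
    """One row per candidate, one column per token of the selected (first) candidate."""
    reference = candidates[0]
    grid = []
    for cand in candidates:
        row = [()] * len(reference)
        for i1, i2, toks in segments_against(reference, cand):
            for i in range(i1, i2):
                row[i] = tuple(toks)  # a multi-column span fills every cell it covers
        grid.append(row)
    return grid

for row in diff_grid([["call", "mom", "now"], ["call", "tom", "now"], ["call", "my", "mom", "now"]]):
    print(row)
```

Here the second row differs from the reference only in its middle cell ("tom"), and the extra word in the third candidate is folded into its first cell as ("call", "my").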

23. A computer-implemented method for selecting among entries in a consolidated list of candidate interpretations of speech input, the method comprising:
- receiving, at a processor, a consolidated list of candidate interpretations of speech input, the consolidated list comprising at least one column group having a single token group and at least one column group having a plurality of token groups;
  
  for each column group having a single token group, displaying the single token group on an output device; and
  
  for each column group having a plurality of token groups, displaying, on the output device, a first one of the token groups in a visually distinctive manner as compared with the display of a column group comprising a single token group.
- Dependent Claims (24, 25, 26, 27, 28, 29)
- - 24. The computer-implemented method of claim 23, further comprising:
    - for at least one column group comprising a plurality of token groups, displaying, on the display device, a menu comprising at least one alternative token group from the column group.
  - 25. The computer-implemented method of claim 23, wherein displaying a first one of the token groups in a visually distinctive manner comprises highlighting the displayed token group.
  - 26. The computer-implemented method of claim 25, further comprising:
    - at an input device, receiving user input associated with a highlighted token group; and
      
      responsive to the user input associated with a highlighted token group, displaying, on the display device, a menu comprising at least one alternative token group from the same column group.
  - 27. The computer-implemented method of claim 26, further comprising:
    - at the input device, receiving user input selecting an alternative token group from the menu;
      
      responsive to the user input selecting an alternative token group from the menu, replacing the highlighted token group with the alternative token group.
  - 28. The computer-implemented method of claim 26, wherein receiving user input associated with a highlighted token group comprises user selection of the highlighted token group.
  - 29. The computer-implemented method of claim 26, wherein receiving user input associated with a highlighted token group comprises user contact with a touch-sensitive surface at a location corresponding to a displayed location of the highlighted token group.
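
A minimal model of the display and selection behavior in claims 23 through 27, under stated assumptions: the consolidated list arrives as a list of column groups, each holding token-group strings with the first one shown by default; "visually distinctive" display is approximated by square brackets; and the class and method names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class ConsolidatedResult:
    # Each column group lists its token groups; index 0 is displayed by default.
    column_groups: list[list[str]]
    chosen: dict[int, int] = field(default_factory=dict)

    def render(self) -> str:
        parts = []
        for i, groups in enumerate(self.column_groups):
            text = groups[self.chosen.get(i, 0)]
            # Square brackets stand in for highlighting groups that have alternatives.
            parts.append(f"[{text}]" if len(groups) > 1 else text)
        return " ".join(parts)

    def menu(self, i: int) -> list[str]:
        """Alternatives offered when the user selects or taps highlighted group i."""
        current = self.chosen.get(i, 0)
        return [g for k, g in enumerate(self.column_groups[i]) if k != current]

    def select(self, i: int, alternative: str) -> None:
        """Replace the displayed token group in column group i with the chosen alternative."""
        self.chosen[i] = self.column_groups[i].index(alternative)

result = ConsolidatedResult([["call"], ["mom", "tom", "Ma"], ["now"]])
print(result.render())   # call [mom] now
print(result.menu(1))    # ['tom', 'Ma']
result.select(1, "tom")
print(result.render())   # call [tom] now
```

Keeping the selection as an index into each column group, rather than rewriting the list, leaves the original alternatives available so the menu can still offer the previously displayed choice.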

30. A computer-implemented method for generating a consolidated list of speech recognition results, comprising:
- at a processor running at a server, obtaining a list of candidate interpretations of spoken input;
  
  at the processor, forming a grid of tokens from the received list, the grid being organized into a plurality of rows and a plurality of columns;
  
  at the processor, splitting the grid into a set of column groups based on timing information, each column group comprising a plurality of token groups, each token group comprising at least one token;
  
  at the processor, responsive to detecting duplicated token groups in the grid, removing the duplicated token groups to form a consolidated list of candidates; and
  
  transmitting a representation of the consolidated list of candidates from the server to a client.
- Dependent Claims (31, 32)
- - 31. The computer-implemented method of claim 30, wherein obtaining a list of candidate interpretations of spoken input comprises:
    - at the server, receiving a representation of an audio stream from the client; and
      
      at the processor, performing speech recognition analysis on the representation of an audio stream to generate a list of candidate interpretations of spoken input.
  - 32. The computer-implemented method of claim 30, wherein obtaining a list of candidate interpretations of spoken input comprises:
    - at the server, receiving a representation of an audio stream from the client;
      
      relaying the representation of the audio stream to a speech recognition server; and
      
      at the server, receiving a list of candidate interpretations of spoken input generated by the speech recognition server.
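
Claim 30 does not say how the consolidated list is represented when it is transmitted from the server to the client. One possible encoding, assumed here purely for illustration, is a compact JSON document built with Python's standard `json` module; the field names `column_groups` and `token_groups` and both function names are hypothetical.

```python
import json

def to_wire(column_groups):
    """Server side: encode column groups (lists of token-group strings) as compact JSON."""
    payload = {"column_groups": [{"token_groups": groups} for groups in column_groups]}
    return json.dumps(payload, separators=(",", ":"))

def from_wire(message):
    """Client side: recover the column groups from the JSON message."""
    payload = json.loads(message)
    return [cg["token_groups"] for cg in payload["column_groups"]]

message = to_wire([["call"], ["mom", "tom"], ["now"]])
print(message)
print(from_wire(message))
```

Sending the already-consolidated column groups, rather than the raw candidate list, means the client only has to render them and never repeats the deduplication work.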

33. A computer program product for generating a consolidated list of speech recognition results, comprising:
- a nontransitory computer-readable storage medium; and
  
  computer program code, encoded on the medium, configured to cause at least one processor to perform the steps of:
  
  receiving a list of candidate interpretations of spoken input;
  
  forming a grid of tokens from the received list, the grid being organized into a plurality of rows and a plurality of columns;
  
  splitting the grid into a set of column groups based on timing information, each column group comprising a plurality of token groups, each token group comprising at least one token;
  
  responsive to detecting duplicated token groups in the grid, removing the duplicated token groups to generate a consolidated grid; and
  
  causing an output device to output the candidate interpretations based on the consolidated grid.
- Dependent Claims (34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46)
- - 34. The computer program product of claim 33, wherein each candidate interpretation in the received list comprises a plurality of tokens, and wherein the computer program code configured to cause at least one processor to perform the step of forming a grid of tokens from the received list comprises computer program code configured to cause at least one processor to perform the steps of:
    - for each token in each candidate interpretation, determining a start time and an end time;
      
      forming a set of unique integers from the determined start and end times;
      
      forming a grid comprising a number of rows corresponding to the number of candidate interpretations in the received list, each row comprising a number of cells corresponding to the number of unique integers in the set of unique integers, the cells being organized into columns; and
      
      inserting each token into all cells spanned by the start and end times of the token.
  - 35. The computer program product of claim 33, wherein each candidate interpretation in the received list comprises a plurality of tokens associated with start and end times, and wherein each column of the grid is associated with a start and end time, and wherein the computer program code configured to cause at least one processor to perform the step of splitting the grid into a set of column groups based on timing information comprises computer program code configured to cause at least one processor to perform the steps of:
    - for each column in the grid:
      
      responsive to the column not already belonging to a column group, forming a column group including the current column;
      
      for each token in the column, determining whether any tokens in the column are associated with an end time that spans beyond the end time of the column; and
      
      responsive to any token in the column spanning beyond the end time of the column, adding the next column to the column group that includes the current column.
  - 36. The computer program product of claim 33, wherein the computer program code configured to cause at least one processor to perform the step of removing the duplicates to form a consolidated list of candidate interpretations comprises computer program code configured to cause at least one processor to perform the steps of:
    - defining a plurality of token phrases, each token phrase comprising at least one token appearing within a row of a column group; and
      
      for each column group in the grid:
      
      determining whether any token phrases are duplicated within the column group; and
      
      responsive to any token phrases being duplicated, deleting the duplicates.
  - 37. The computer program product of claim 33, further comprising computer program code configured to cause at least one processor to perform the step of:
    - responsive to any edge tokens being shared among token phrases within a column group having at least two tokens in all its token phrases, splitting the column group into a first column group comprising the shared edge tokens and a second column group comprising the at least one remaining token in the token phrases.
  - 38. The computer program product of claim 33, further comprising computer program code configured to cause at least one processor to perform the step of:
    - for each column group having at least two tokens in all its token phrases:
      
      responsive to any tokens appearing at the beginning of all token phrases in the column group, splitting the column group into a first column group comprising the first token and a second column group comprising the at least one remaining token in the token phrases; and
      
      responsive to any tokens appearing at the end of all token phrases in the column group, splitting the column group into a first column group comprising the last token and a second column group comprising the at least one remaining token in the token phrases.
  - 39. The computer program product of claim 33, further comprising computer program code configured to cause at least one processor to perform the step of:
    - responsive to any column group having a number of token phrases exceeding a predetermined threshold:
      
      removing at least one token phrase; and
      
      repeating the steps of splitting the grid and removing duplicates.
  - 40. The computer program product of claim 33, wherein the consolidated list of candidate interpretations comprises:
    - at least one column group having a single token group; and
      
      at least one column group having a plurality of token groups.
  - 41. The computer program product of claim 40, wherein the computer program code configured to cause at least one processor to output the candidate interpretations comprises computer program code configured to cause at least one processor to perform the steps of:
    - for each column group:
      
      responsive to the column group comprising a single token group, causing the output device to display the single token group on the output device; and
      
      responsive to the column group comprising a plurality of token groups, causing the output device to display the plurality of the token groups.
  - 42. The computer program product of claim 40, wherein the computer program code configured to cause at least one processor to output the candidate interpretations comprises computer program code configured to cause at least one processor to perform the steps of:
    - for each column group:
      
      responsive to the column group comprising a single token group, causing the output device to display the single token group on the output device; and
      
      responsive to the column group comprising a plurality of token groups, causing the output device to display a first one of the token groups, and to display at least a subset of the remaining token groups in the column group as alternatives to the first token group.
  - 43. The computer program product of claim 42, further comprising computer program code configured to cause a display device to display a menu comprising at least one alternative token group from the column group.
  - 44. The computer program product of claim 40, wherein the computer program code configured to cause at least one processor to output the candidate interpretations comprises computer program code configured to cause at least one processor to perform the steps of:
    - for each column group:
      
      responsive to the column group comprising a single token group, causing the output device to display the single token group on the output device; and
      
      responsive to the column group comprising a plurality of token groups, causing the output device to display and highlight one of the token groups on the output device.
  - 45. The computer program product of claim 44, further comprising computer program code configured to cause at least one processor to perform the steps of:
    - causing an input device to receive user input associated with a highlighted token group; and
      
      responsive to the user input associated with a highlighted token group, causing the output device to display a menu comprising at least one alternative token group from the same column group.
  - 46. The computer program product of claim 45, further comprising computer program code configured to cause at least one processor to perform the steps of:
    - causing an input device to receive user input selecting an alternative token group from the menu;
      
      responsive to the user input selecting an alternative token group from the menu, causing the output device to replace the highlighted token group with the alternative token group.
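
Claim 39 (like claim 7) caps the number of token phrases in a column group and repeats splitting and deduplication after removing phrases. The sketch below assumes the candidates are ordered from most to least likely and that "removing at least one token phrase" is realized by dropping the lowest-ranked candidate; `positional_consolidate` is a hypothetical toy stand-in for the real grid splitting and deduplication, valid only for equal-length candidates.

```python
from typing import Callable, Sequence

Candidate = tuple[str, ...]
ColumnGroups = list[list[Candidate]]

def positional_consolidate(candidates: Sequence[Candidate]) -> ColumnGroups:
    """Toy stand-in for grid splitting plus deduplication.
    Assumes equal-length candidates and one single-token column group per position."""
    groups = []
    for pos in range(len(candidates[0])):
        phrases = []
        for cand in candidates:
            if (cand[pos],) not in phrases:
                phrases.append((cand[pos],))
        groups.append(phrases)
    return groups

def consolidate_with_limit(
    candidates: Sequence[Candidate],
    consolidate: Callable[[Sequence[Candidate]], ColumnGroups],
    threshold: int,
) -> ColumnGroups:
    """While any column group holds more than `threshold` token phrases, drop the
    lowest-ranked candidate (assumed to be last) and consolidate again."""
    kept = list(candidates)
    groups = consolidate(kept)
    while len(kept) > 1 and any(len(g) > threshold for g in groups):
        kept.pop()
        groups = consolidate(kept)
    return groups

ranked = [("call", "mom"), ("call", "tom"), ("call", "ma"), ("fall", "mom")]
print(consolidate_with_limit(ranked, positional_consolidate, threshold=2))
```

With a threshold of 2, the two lowest-ranked candidates are dropped before the second column group fits, leaving "call" followed by the alternatives "mom" and "tom".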

47. A system for generating a consolidated list of speech recognition results, comprising:
- a processor, configured to:
  
  receive a list of candidate interpretations of spoken input;
  
  form a grid of tokens from the received list, the grid being organized into a plurality of rows and a plurality of columns;
  
  split the grid into a set of column groups based on timing information, each column group comprising a plurality of token groups, each token group comprising at least one token;
  
  responsive to detecting duplicated token groups in the grid, remove the duplicated token groups to generate a consolidated grid; and
  
  an output device, communicatively coupled to the processor and configured to output the candidate interpretations based on the consolidated grid.
- Dependent Claims (48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60)
- - 48. The system of claim 47, wherein each candidate interpretation in the received list comprises a plurality of tokens, and wherein the processor forms the grid of tokens from the received list by:
    - for each token in each candidate interpretation, determining a start time and an end time;
      
      forming a set of unique integers from the determined start and end times;
      
      forming a grid comprising a number of rows corresponding to the number of candidate interpretations in the received list, each row comprising a number of cells corresponding to the number of unique integers in the set of unique integers, the cells being organized into columns; and
      
      inserting each token into all cells spanned by the start and end times of the token.
  - 49. The system of claim 47, wherein each candidate interpretation in the received list comprises a plurality of tokens associated with start and end times, and wherein each column of the grid is associated with a start and end time, and wherein the processor splits the grid into a set of column groups based on timing information by:
    - for each column in the grid:
      
      responsive to the column not already belonging to a column group, forming a column group including the current column;
      
      for each token in the column, determining whether any tokens in the column are associated with an end time that spans beyond the end time of the column; and
      
      responsive to any token in the column spanning beyond the end time of the column, adding the next column to the column group that includes the current column.
  - 50. The system of claim 47, wherein the processor removes the duplicates to form a consolidated list of candidate interpretations by:
    - defining a plurality of token phrases, each token phrase comprising at least one token appearing within a row of a column group; and
      
      for each column group in the grid:
      
      determining whether any token phrases are duplicated within the column group; and
      
      responsive to any token phrases being duplicated, deleting the duplicates.
  - 51. The system of claim 47, wherein the processor is further configured to perform the step of:
    - responsive to any edge tokens being shared among token phrases within a column group having at least two tokens in all its token phrases, splitting the column group into a first column group comprising the shared edge tokens and a second column group comprising the at least one remaining token in the token phrases.
  - 52. The system of claim 47, wherein the processor is further configured to perform the step of:
    - for each column group having at least two tokens in all its token phrases:
      
      responsive to any tokens appearing at the beginning of all token phrases in the column group, splitting the column group into a first column group comprising the first token and a second column group comprising the at least one remaining token in the token phrases; and
      
      responsive to any tokens appearing at the end of all token phrases in the column group, splitting the column group into a first column group comprising the last token and a second column group comprising the at least one remaining token in the token phrases.
  - 53. The system of claim 47, wherein the processor is further configured to perform the step of:
    - responsive to any column group having a number of token phrases exceeding a predetermined threshold:
      
      removing at least one token phrase; and
      
      repeating the steps of splitting the grid and removing duplicates.
  - 54. The system of claim 47, wherein the consolidated list of candidate interpretations comprises:
    - at least one column group having a single token group; and
      
      at least one column group having a plurality of token groups.
  - 55. The system of claim 54, wherein the output device outputs the candidate interpretations by:
    - for each column group:
      
      responsive to the column group comprising a single token group, displaying the single token group; and
      
      responsive to the column group comprising a plurality of token groups, displaying the plurality of the token groups.
  - 56. The system of claim 54, wherein the output device outputs the candidate interpretations by:
    - for each column group:
      
      responsive to the column group comprising a single token group, displaying the single token group; and
      
      responsive to the column group comprising a plurality of token groups, displaying a first one of the token groups, and displaying at least a subset of the remaining token groups in the column group as alternatives to the first token group.
  - 57. The system of claim 56, wherein the output device is configured to display a menu comprising at least one alternative token group from the column group.
  - 58. The system of claim 54, wherein the output device outputs the candidate interpretations by:
    - for each column group:
      
      responsive to the column group comprising a single token group, displaying the single token group; and
      
      responsive to the column group comprising a plurality of token groups, displaying and highlighting one of the token groups.
  - 59. The system of claim 58, further comprising an input device, configured to receive user input associated with a highlighted token group;
    - and wherein:
      
      responsive to the user input associated with a highlighted token group, the output device displays a menu comprising at least one alternative token group from the same column group.
  - 60. The system of claim 59, wherein:
    - the input device receives user input selecting an alternative token group from the menu; and
      
      responsive to the user input selecting an alternative token group from the menu, the output device replaces the highlighted token group with the alternative token group.
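
Deleting duplicated token phrases within a column group (claim 50) and splitting off edge tokens shared by every token phrase (claims 51 and 52, mirroring method claims 4 through 6) can be sketched over token phrases represented as tuples of words. The tuple representation and the function names are assumptions; the recursion peels a shared first token, then a shared last token, until some phrase has fewer than two tokens or no edge token is shared.

```python
Phrase = tuple[str, ...]

def dedupe(phrases: list[Phrase]) -> list[Phrase]:
    """Delete duplicated token phrases in one column group, keeping first-seen order."""
    return list(dict.fromkeys(phrases))

def split_shared_edges(phrases: list[Phrase]) -> list[list[Phrase]]:
    """Peel off a first or last token shared by every phrase into its own column group.
    Applies only while every phrase in the group has at least two tokens."""
    if not phrases or any(len(p) < 2 for p in phrases):
        return [phrases]
    first, last = phrases[0][0], phrases[0][-1]
    if all(p[0] == first for p in phrases):
        return [[(first,)]] + split_shared_edges([p[1:] for p in phrases])
    if all(p[-1] == last for p in phrases):
        return split_shared_edges([p[:-1] for p in phrases]) + [[(last,)]]
    return [phrases]

group = [("call", "my", "mom"), ("call", "my", "mom"), ("call", "tom")]
print(split_shared_edges(dedupe(group)))
```

The duplicate "call my mom" is removed first; the shared leading "call" then becomes its own single-phrase column group, leaving "my mom" and "tom" as the only real alternatives.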


Current Assignee: Apple Inc.
Original Assignee: Apple Inc.
Inventors: Bastea-Forte, Marcello; Winarsky, David A.
Application Number: US13/236,942
Publication Number: US 20130073286A1
US Class Current: 704/244
CPC Class Codes:
G10L 15/22 Procedures used during a sp...
G10L 2015/221 Announcement of recognition...
