Sensor enhanced speech recognition
First Claim
1. A system, comprising:
a memory that stores instructions;
a processor that executes the instructions to perform operations, the operations comprising;
obtaining, from visual content, metadata associated with a user and an environment of the user;
identifying, based on the visual content and the metadata, an interferer and a location of the interferer in the environment;
obtaining audio content associated with the user and the environment;
enhancing a speech recognition process utilized for processing speech of the user that is within the audio content;
cancelling, after identifying the interferer and the location of the interferer in the environment and by utilizing an audio profile corresponding to the interferer, noise generated by the interferer that interferes with the speech of the user; and
adjusting, based on a user profile of the user and a location profile corresponding to a location of the user, a feature of an application executing the speech recognition process so as to tailor the application to the user, wherein adjusting the feature of the application comprises adjusting at least an audio feature of the application based on the user profile and the location profile.
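The cancelling step of the claim can be illustrated with a minimal sketch. It assumes the interferer has already been identified from the visual content, that its "location" is reduced to a distance from the microphone, and that its audio profile is a per-band noise magnitude measured at 1 m. None of these representations come from the claim; `Interferer`, `AUDIO_PROFILES`, and `cancel_interferer` are hypothetical names, and simple magnitude subtraction stands in for whatever cancellation technique an implementation would actually use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interferer:
    # Hypothetical result of the visual identification step.
    label: str         # e.g. "tv"
    distance_m: float  # assumed stand-in for "location of the interferer"


# Hypothetical audio profiles: per-band noise magnitude measured at 1 m.
AUDIO_PROFILES = {
    "tv": [0.5, 0.25, 0.0, 0.0],
    "fan": [0.25, 0.25, 0.25, 0.25],
}


def cancel_interferer(band_magnitudes, interferer, profiles=AUDIO_PROFILES):
    """Subtract the interferer's scaled noise profile from each band.

    Requiring an Interferer argument mirrors the claim's ordering:
    cancellation happens only after the interferer has been identified.
    """
    profile = profiles[interferer.label]
    if len(profile) != len(band_magnitudes):
        raise ValueError("profile and audio frame must have the same bands")
    # Assumption: amplitude falls off as 1/distance beyond 1 m.
    gain = 1.0 / max(interferer.distance_m, 1.0)
    # Clamp at zero so a band never gets a negative magnitude.
    return [max(m - gain * n, 0.0) for m, n in zip(band_magnitudes, profile)]


cleaned = cancel_interferer([1.0, 0.5, 0.25, 0.1], Interferer("tv", 2.0))
print(cleaned)  # [0.75, 0.375, 0.25, 0.1]
```

Keying the profile lookup on the identified interferer is the point of the sketch: the visual identification selects which audio profile is subtracted from the captured audio.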
Abstract
A system for sensor enhanced speech recognition is disclosed. The system may obtain visual content or other content associated with a user and an environment of the user. Additionally, the system may obtain, from the visual content, metadata associated with the user and the environment of the user. The system may also determine, based on the visual content and metadata, whether the user is speaking. If the user is determined to be speaking, the system may obtain audio content associated with the user and the environment. The system may then adapt, based on the visual content, audio content, and metadata, one or more acoustic models that match the user and the environment. Once the one or more acoustic models are adapted and loaded, the system may enhance a speech recognition process or other process associated with the user.
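The flow described in the abstract (visual metadata gates audio capture, which in turn feeds acoustic model adaptation) can be sketched as follows. This is only an illustration under stated assumptions: the metadata fields, the `lips_moving` speaking cue, and the tagging that stands in for model adaptation are all hypothetical and are not specified by the abstract.

```python
from dataclasses import dataclass


@dataclass
class VisualMetadata:
    # Hypothetical fields; the abstract does not enumerate the metadata.
    user_id: str
    environment: str
    lips_moving: bool


@dataclass
class AcousticModel:
    name: str
    adapted_for: tuple = ()


def is_user_speaking(meta):
    # Assumed cue: treat visible lip movement as "the user is speaking".
    return meta.lips_moving


def adapt_models(models, meta, audio):
    # Stand-in for real adaptation: tag each model with the user and
    # environment it was adapted for. The audio argument is unused here.
    return [AcousticModel(m.name, (meta.user_id, meta.environment)) for m in models]


def process_frame(meta, capture_audio, models):
    """Capture audio and adapt models only if the user appears to be speaking."""
    if not is_user_speaking(meta):
        return None  # no audio is captured for a silent user
    audio = capture_audio()
    return adapt_models(models, meta, audio)


adapted = process_frame(
    VisualMetadata("user-1", "car", lips_moving=True),
    capture_audio=lambda: [0.0, 0.1, -0.1],
    models=[AcousticModel("baseline")],
)
print(adapted[0].adapted_for)  # ('user-1', 'car')
```

Passing `capture_audio` as a callable keeps the gating explicit: the microphone is not read unless the visual check succeeds.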
21 Claims
1. A system, comprising:
a memory that stores instructions;
a processor that executes the instructions to perform operations, the operations comprising;
obtaining, from visual content, metadata associated with a user and an environment of the user;
identifying, based on the visual content and the metadata, an interferer and a location of the interferer in the environment;
obtaining audio content associated with the user and the environment;
enhancing a speech recognition process utilized for processing speech of the user that is within the audio content;
cancelling, after identifying the interferer and the location of the interferer in the environment and by utilizing an audio profile corresponding to the interferer, noise generated by the interferer that interferes with the speech of the user; and
adjusting, based on a user profile of the user and a location profile corresponding to a location of the user, a feature of an application executing the speech recognition process so as to tailor the application to the user, wherein adjusting the feature of the application comprises adjusting at least an audio feature of the application based on the user profile and the location profile.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
14. A method, comprising:
extracting, from visual content, metadata associated with a user and an environment of the user;
detecting, based on the visual content and the metadata, an interferer and a location of the interferer in the environment;
obtaining audio content associated with the user and the environment;
enhancing a speech recognition process utilized for processing speech of the user that is within the audio content, wherein the enhancing is performed by utilizing instructions from a memory that are executed by a processor;
cancelling, after identifying the interferer and the location of the interferer in the environment and by utilizing an audio profile corresponding to the interferer, noise generated by the interferer that interferes with the speech of the user; and
modifying, based on a user profile of the user and a location profile corresponding to a location of the user, a feature of an application executing the speech recognition process so as to tailor the application to the user, wherein modifying the feature of the application comprises modifying at least an audio feature of the application based on the user profile and the location profile.
Dependent claims: 15, 16, 17, 18, 19, 20
21. A non-transitory computer-readable device comprising instructions, which when executed by a processor, cause the processor to perform operations comprising:
extracting, from visual content, metadata associated with a user and an environment of the user;
identifying, based on the visual content and the metadata, an interferer and a location of the interferer in the environment;
capturing audio content associated with the user and the environment;
enhancing a speech recognition process utilized for processing speech of the user that is within the audio content;
cancelling, after identifying the interferer and the location of the interferer in the environment and by utilizing an audio profile corresponding to the interferer, noise generated by the interferer that interferes with the speech of the user; and
modifying, based on a user profile of the user and a location profile corresponding to a location of the user, a feature of an application executing the speech recognition process so as to tailor the application to the user, wherein modifying the feature of the application comprises modifying at least an audio feature of the application based on the user profile and the location profile.
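The final element of each independent claim, adjusting an audio feature of the application from both a user profile and a location profile, can be illustrated with a small sketch. The claims do not say which audio feature is adjusted or what the profiles contain, so everything here is an assumption: output volume is chosen as the feature, and `UserProfile`, `LocationProfile`, the 70 dB threshold, and the 10-point boost are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UserProfile:
    # Hypothetical content of a user profile.
    preferred_volume: int  # 0-100


@dataclass
class LocationProfile:
    # Hypothetical content of a location profile.
    name: str
    ambient_noise_db: float
    max_volume: int  # cap appropriate to the location, 0-100


def adjust_audio_feature(user, location):
    """Return an output volume that combines both profiles.

    Starts from the user's preference, raises it in loud locations,
    and never exceeds the cap set by the location.
    """
    boost = 10 if location.ambient_noise_db >= 70 else 0
    return min(user.preferred_volume + boost, location.max_volume)


user = UserProfile(preferred_volume=60)
print(adjust_audio_feature(user, LocationProfile("car", 75.0, 100)))     # 70
print(adjust_audio_feature(user, LocationProfile("library", 35.0, 30)))  # 30
```

The same user gets different settings in different locations, which is the tailoring the claim language describes; a real system could apply the same pattern to other audio features.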
Specification