Health monitoring system and appliance
Abstract
Systems and methods are disclosed. A digitized human vocal expression of a user and digital images are received over a network from a remote device. The digitized human vocal expression is processed to determine characteristics of the human vocal expression, including pitch, volume, rapidity, a magnitude spectrum, and/or pauses in speech. Digital images are received and processed to detect characteristics of the user's face, including detecting whether one or more of the following is present: a sagging lip, a crooked smile, uneven eyebrows, and/or facial droop. Based at least in part on the human vocal expression characteristics and face characteristics, a determination is made as to what action is to be taken. A cepstrum pitch may be determined using an inverse Fourier transform of a logarithm of a spectrum of a human vocal expression signal. The volume may be determined using peak heights in a power spectrum of the human vocal expression.
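By way of non-limiting illustration, the cepstrum pitch and the peak-height volume measure referenced in the abstract may be sketched as follows. The function names, frame length, sample rate, quefrency search range, and number of peaks are assumptions made for the sketch, not values taken from the disclosure.

    # Illustrative sketch only (not the patented implementation): cepstrum pitch as the
    # inverse FFT of the log-magnitude spectrum, and a volume proxy from power-spectrum peaks.
    import numpy as np

    def cepstrum_pitch(frame, sample_rate=16000, f_min=60.0, f_max=400.0):
        """Estimate pitch (Hz) from the quefrency peak of the real cepstrum."""
        spectrum = np.fft.rfft(frame * np.hamming(len(frame)))
        log_magnitude = np.log(np.abs(spectrum) + 1e-12)   # logarithm of the spectrum
        cepstrum = np.fft.irfft(log_magnitude)             # inverse Fourier transform
        q_lo = int(sample_rate / f_max)                    # shortest voice period searched
        q_hi = int(sample_rate / f_min)                    # longest voice period searched
        peak_quefrency = q_lo + int(np.argmax(cepstrum[q_lo:q_hi]))
        return sample_rate / peak_quefrency

    def volume_from_power_spectrum(frame, n_peaks=5):
        """Loudness proxy: average height of the largest peaks in the power spectrum."""
        power = np.abs(np.fft.rfft(frame)) ** 2
        return float(np.mean(np.sort(power)[-n_peaks:]))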
26 Claims
1. An electronic device configured to process audible expressions from users, comprising:
a network interface;
at least one computing device; and
computer readable memory including instructions operable to be executed by the at least one computing device to perform a set of actions, configuring the at least one computing device to:
receive in real time, over a network via the network interface, a digitized human vocal expression of a first user and one or more digital images from a remote device;
process, remotely from the remote device, the received digitized human vocal expression using digital signal processing to convert the digitized audible expression from a time domain to a frequency domain;
use the processed digitized human vocal expression to determine characteristics of the human vocal expression, including:
determining, using a volume analysis module, a volume of the human vocal expression,
determining, using a rapidity analysis module that detects quiet time using a power spectrum of the human vocal expression, how rapidly the first user is speaking in the human vocal expression,
determining, using a vocal tract analysis module, a magnitude spectrum of the human vocal expression, and
identifying, using a non-speech analysis module, pauses and the length of pauses in speech in the human vocal expression;
use a natural language module to:
identify phonemes in the human vocal expression and map the phonemes to words, to convert audible speech in the human vocal expression to text,
divide the text into text elements including words, sentences, and paragraphs,
understand audible speech in the human vocal expression using semantic analysis that assigns respective logical and grammatical roles to the text elements, and
detect violations of grammar rules in the text obtained from the human vocal expression to obtain detected grammar violations;
compare the determined characteristics of the human vocal expression with baseline, historical characteristics of human vocal expressions associated with the first user to identify changes in human vocal expression characteristics of the first user as identified vocal changes;
process the received one or more images to detect characteristics of the first user face, including determining the presence of a sagging lip, a crooked smile, uneven eyebrows, or facial droop;
compare the detected characteristics of the first user face with baseline, historical characteristics of the first user face accessed from a data store, and identify changes in characteristics of the first user face as identified facial changes;
weight, using a first weight, a first identified change, of the identified vocal changes, with respect to a first vocal expression characteristic of the first user;
weight, using a second weight, a second identified change, of the identified vocal changes, with respect to a second vocal expression characteristic of the first user;
weight, using a third weight, a third identified change, of the identified facial changes, with respect to a first characteristic of the first user face;
weight, using a fourth weight, a fourth identified change, of the identified facial changes, with respect to a second characteristic of the first user face;
weight, using a fifth weight, the detected grammar violations;
infer a change in health status of the first user using the weighted first identified change with respect to the first vocal expression characteristic of the first user, the weighted second identified change with respect to the second vocal expression characteristic of the first user, the weighted third identified change with respect to the first characteristic of the first user face, the weighted fourth identified change with respect to the second characteristic of the first user face, and the weighted detected grammar violations;
based at least in part on the inferred change in health status of the first user, determine if a vehicle is to be deployed to the first user; and
at least partly in response to a determination that a vehicle is to be deployed to the first user, enable a vehicle to be deployed to a location of the first user.
Dependent claims: 2, 3, 4, 5, 6, 7.
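For orientation only, the weighted combination recited in claim 1, in which separately weighted vocal changes, facial changes, and detected grammar violations feed an inferred change in health status and a vehicle-deployment decision, might be sketched as follows. The function name, weights, grammar-violation scaling, and threshold are hypothetical assumptions, not values from the claims.

    # Illustrative sketch only: one possible way to combine the five weighted inputs of
    # claim 1 into a health-change score and a deployment decision. Weights, the
    # grammar-violation scaling, and the threshold are hypothetical.
    def infer_health_change(vocal_changes, facial_changes, grammar_violations,
                            weights=(0.30, 0.20, 0.20, 0.20, 0.10), threshold=0.5):
        """vocal_changes, facial_changes: two change magnitudes each, normalized to [0, 1];
        grammar_violations: count of detected grammar-rule violations."""
        w1, w2, w3, w4, w5 = weights
        score = (w1 * vocal_changes[0] + w2 * vocal_changes[1] +
                 w3 * facial_changes[0] + w4 * facial_changes[1] +
                 w5 * min(grammar_violations / 10.0, 1.0))  # cap the grammar contribution
        return score, score >= threshold                    # (inferred change, deploy vehicle?)

    # Example: slowed speech plus a sagging lip corner crosses the hypothetical threshold.
    score, deploy = infer_health_change([0.7, 0.4], [0.8, 0.2], grammar_violations=6)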
8. An electronic device, comprising:
a network interface;
at least one computing device; and
computer readable memory including instructions operable to be executed by the at least one computing device to perform a set of actions, configuring the at least one computing device to:
receive, over a network via the network interface, a digitized human vocal expression of a first user and one or more digital images of the first user from a first source;
process, remotely from the first source, the received digitized human vocal expression using digital signal processing to convert the digitized audible expression from a time domain to a frequency domain;
use the processed digitized human vocal expression to determine characteristics of the human vocal expression, including:
determining a volume, magnitude, and a power spectrum of the human vocal expression, and
detecting quiet time using the power spectrum of the human vocal expression to determine pauses and the length of pauses in speech in the human vocal expression, and to determine how rapidly the first user is speaking in the human vocal expression;
use a natural language module to:
identify phonemes in the human vocal expression and map the phonemes to words, to convert audible speech in the human vocal expression to text,
divide the text into text elements including words, sentences, and/or paragraphs,
understand audible speech in the human vocal expression using semantic analysis, and
detect violations of grammar rules in the text obtained from the human vocal expression to obtain detected grammar violations;
compare the determined characteristics of the human vocal expression with baseline, historical characteristics of human vocal expressions associated with the first user to identify changes in human vocal expression characteristics of the first user as identified vocal changes;
process the received one or more images to detect characteristics of the first user face, including determining the presence of a sagging lip, a crooked smile, uneven eyebrows, or facial droop;
compare the detected characteristics of the first user face with baseline, historical characteristics of the first user face accessed from a data store, and identify changes in characteristics of the first user face as identified facial changes;
weight, using a first weight, a first identified change, of the identified vocal changes, with respect to a first vocal expression characteristic of the first user;
weight, using a second weight, a second identified change, of the identified vocal changes, with respect to a second vocal expression characteristic of the first user;
weight, using a third weight, a third identified change, of the identified facial changes, with respect to a first characteristic of the first user face;
weight, using a fourth weight, a fourth identified change, of the identified facial changes, with respect to a second characteristic of the first user face;
weight, using a fifth weight, the detected grammar violations;
infer a change in health status of the first user based at least in part on the weighted first identified change with respect to the first vocal expression characteristic of the first user, the weighted second identified change with respect to the second vocal expression characteristic of the first user, the weighted third identified change with respect to the first characteristic of the first user face, the weighted fourth identified change with respect to the second characteristic of the first user face, and the weighted detected grammar violations; and
based at least in part on the inferred change in health status of the first user, cause a first action to be taken.
Dependent claims: 9, 10, 11, 12, 13, 14, 15, 16.
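As a rough sketch of the quiet-time analysis recited in claim 8 (detecting quiet frames from the power spectrum and deriving pause lengths and a speaking-rate proxy), one frame-based approach might look like the following. The function name, frame size, hop, and silence threshold are assumptions for the sketch.

    # Illustrative sketch only: frame-wise power-spectrum energy, a silence threshold,
    # pause lengths from runs of quiet frames, and a crude speaking-rate proxy.
    import numpy as np

    def quiet_time_analysis(signal, sample_rate=16000, frame=512, hop=256, silence_ratio=0.05):
        frames = [signal[i:i + frame] for i in range(0, len(signal) - frame, hop)]
        if not frames:
            return [], 0.0
        powers = [float(np.sum(np.abs(np.fft.rfft(f * np.hamming(frame))) ** 2)) for f in frames]
        threshold = silence_ratio * max(powers)
        quiet = [p < threshold for p in powers]

        pauses, run = [], 0
        for q in quiet + [False]:                        # trailing sentinel flushes a final pause
            if q:
                run += 1
            elif run:
                pauses.append(run * hop / sample_rate)   # pause length in seconds
                run = 0

        speech_frames = sum(1 for q in quiet if not q)
        speaking_fraction = speech_frames / len(quiet)
        return pauses, speaking_fraction                 # pause lengths, fraction of time speaking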
17. A computer implemented method, comprising:
receiving, at a system configured to process digitized human vocal expressions using digital signal processing, a digitized human vocal expression of a first user and one or more digital images of the first user from a first device;
processing, remotely from the first source, using digital signal processing, the received digitized human vocal expression to convert the digitized audible expression from a time domain to a frequency domain;
using, by the system, the processed digitized human vocal expression to determine characteristics of the human vocal expression, including:
determining a volume, magnitude, and a power spectrum of the human vocal expression,
detecting quiet time using the power spectrum of the human vocal expression to determine pauses and the length of pauses in speech in the human vocal expression, and to determine how rapidly the first user is speaking in the human vocal expression,
using natural language processing to:
identify phonemes in the human vocal expression and map the phonemes to words, to convert audible speech in the human vocal expression to text,
divide the text into text elements including words, sentences, and/or paragraphs,
understand audible speech in the human vocal expression using semantic analysis, and
detect violations of grammar rules in the text obtained from the human vocal expression to obtain detected grammar violations; and
comparing one or more of the determined characteristics of the human vocal expression with one or more baseline, historical characteristics of human vocal expressions associated with the first user as identified vocal changes;
processing the received one or more images to detect characteristics of the first user face, including determining the presence of a sagging lip, a crooked smile, uneven eyebrows, or facial droop;
comparing the detected characteristics of the first user face with baseline, historical characteristics of the first user face accessed from a data store, and identifying changes in characteristics of the first user face as identified facial changes;
weighting, by the system, using a first weight, a first identified change, of the identified vocal changes, with respect to a first vocal expression characteristic of the first user;
weighting, by the system, using a second weight, a second identified change, of the identified vocal changes, with respect to a second vocal expression characteristic of the first user;
weighting, by the system, using a third weight, a third identified change, of the identified facial changes, with respect to a first characteristic of the first user face;
weighting, by the system, using a fourth weight, a fourth identified change, of the identified facial changes, with respect to a second characteristic of the first user face;
weighting, by the system, using a fifth weight, the detected grammar violations;
inferring, by the system, a change in health status of the first user based at least in part on the weighted first identified change with respect to the first vocal expression characteristic of the first user, the weighted second identified change with respect to the second vocal expression characteristic of the first user, the weighted third identified change with respect to the first characteristic of the first user face, the weighted fourth identified change with respect to the second characteristic of the first user face, and the weighted detected grammar violations; and
based at least in part on the inferred change in health status of the first user, causing a first action to be taken.
Dependent claims: 18, 19, 20, 21, 22.
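The natural-language steps of claim 17 (dividing recognized text into elements and detecting grammar-rule violations) could be toy-sketched as below. A real grammar checker would rely on a full parser; the two rules shown, and the function names, are stand-in assumptions.

    # Illustrative sketch only: split recognized text into sentences/words and count
    # violations of two toy grammar rules (uncapitalized sentence, repeated word).
    import re

    def text_elements(text):
        sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]
        words = [w for s in sentences for w in re.findall(r"[A-Za-z']+", s)]
        return words, sentences

    def count_grammar_violations(text):
        _, sentences = text_elements(text)
        violations = 0
        for sentence in sentences:
            tokens = sentence.split()
            if tokens and tokens[0][0].islower():        # rule 1: sentence not capitalized
                violations += 1
            for a, b in zip(tokens, tokens[1:]):         # rule 2: doubled word ("the the")
                if a.lower() == b.lower():
                    violations += 1
        return violations

    # Example: one uncapitalized sentence and one doubled word -> 2 violations.
    print(count_grammar_violations("the patient felt dizzy. She took the the stairs."))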
23. A computer implemented method, comprising:
receiving from a first source, at a computerized device configured to process digitized human vocal expressions using digital signal processing, a digitized human vocal expression of a first user and one or more digital images of the first user;
processing, remotely from the first source, using digital signal processing, the received digitized human vocal expression to convert the digitized audible expression from a time domain to a frequency domain;
using, by the system, the processed digitized human vocal expression to determine characteristics of the human vocal expression, including:
determining a volume, magnitude, and a power spectrum of the human vocal expression,
determining how rapidly the first user is speaking in the human vocal expression by detecting quiet time using a power spectrum of the human vocal expression, and
determining pauses and the length of pauses in speech in the human vocal expression;
using natural language processing to:
identify phonemes in the human vocal expression and map the phonemes to words, to convert audible speech in the human vocal expression to text,
divide the text into text elements including words, sentences, and/or paragraphs,
understand audible speech in the human vocal expression using semantic analysis, and
detect violations of grammar rules in the text obtained from the human vocal expression to obtain detected grammar violations;
comparing, using the computerized device, one or more of the determined characteristics of the human vocal expression with one or more baseline, historical characteristics of human vocal expressions associated with the first user as identified vocal changes;
processing the received one or more images to detect characteristics of the first user face, including determining the presence of a sagging lip, a crooked smile, uneven eyebrows, or facial droop;
comparing the detected characteristics of the first user face with baseline, historical characteristics of the first user face accessed from a data store, and identifying changes in characteristics of the first user face as identified facial changes;
weighting, by the system, using a first weight, a first identified change, of the identified vocal changes, with respect to a first vocal expression characteristic of the first user;
weighting, by the system, using a second weight, a second identified change, of the identified vocal changes, with respect to a second vocal expression characteristic of the first user;
weighting, by the system, using a third weight, a third identified change, of the identified facial changes, with respect to a first characteristic of the first user face;
weighting, by the system, using a fourth weight, a fourth identified change, of the identified facial changes, with respect to a second characteristic of the first user face;
weighting, using a fifth weight, the detected grammar violations;
inferring, by the system, a change in health status of the first user based at least in part on the weighted first identified change with respect to the first vocal expression characteristic of the first user, the weighted second identified change with respect to the second vocal expression characteristic of the first user, the weighted third identified change with respect to the first characteristic of the first user face, the weighted fourth identified change with respect to the second characteristic of the first user face, and the weighted detected grammar violations; and
based at least in part on the inferred change in health status of the first user, enabling a first action to be taken.
Dependent claims: 24, 25, 26.
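The facial-comparison steps recited across the claims (detecting a sagging lip, a crooked smile, uneven eyebrows, or facial droop relative to stored baseline characteristics) might be approximated with landmark geometry along the lines below. The landmark names, normalization, and thresholds are hypothetical assumptions; the claims do not specify a landmark-based method.

    # Illustrative sketch only: compare current facial landmarks against a stored
    # baseline to flag a drooping mouth corner or newly uneven eyebrows.
    def facial_changes(current, baseline, droop_threshold=0.03):
        """current/baseline: dicts mapping landmark name -> (x, y), with y increasing
        downward and coordinates normalized to face height."""
        changes = {}
        for corner in ("mouth_left", "mouth_right"):
            drop = current[corner][1] - baseline[corner][1]   # positive = corner has sagged
            changes[corner + "_droop"] = drop > droop_threshold
        brow_gap_now = abs(current["brow_left"][1] - current["brow_right"][1])
        brow_gap_base = abs(baseline["brow_left"][1] - baseline["brow_right"][1])
        changes["uneven_eyebrows"] = (brow_gap_now - brow_gap_base) > droop_threshold
        return changes

    # Example: the right mouth corner sits 0.05 lower than in the baseline image.
    baseline = {"mouth_left": (0.35, 0.70), "mouth_right": (0.65, 0.70),
                "brow_left": (0.35, 0.30), "brow_right": (0.65, 0.30)}
    current = {"mouth_left": (0.35, 0.70), "mouth_right": (0.65, 0.75),
               "brow_left": (0.35, 0.30), "brow_right": (0.65, 0.31)}
    print(facial_changes(current, baseline))
    # -> {'mouth_left_droop': False, 'mouth_right_droop': True, 'uneven_eyebrows': False}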
Specification