Client device for interacting with a mixed media reality recognition system

US 9,020,966 B2
Filed: 12/19/2008
Issued: 04/28/2015
Est. Priority Date: 07/31/2006
Status: Active Grant

1 Assignment

0 Petitions

Abstract

The mobile device includes a client that has a number of modules, and the mixed media reality (MMR) Gateway and MMR matching unit are implemented as a server that has its own modules. Implementing the MMR system as a client and a server is advantageous because the modules can be distributed between the client and the server in a variety of configurations. The present invention includes a capture module, a preprocessing module, a feature extraction module, a retrieval module, a send message module, an action module, a prediction module, a feedback module, a sending module, an MMR database, a streaming module, an e-mail module, a voice recognition system and an audio database. Each of these modules and systems can run on either the client or the server.
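
The abstract stresses that the modules can be split between the client and the server in different configurations. The minimal Python sketch below illustrates that idea under stated assumptions. It uses a hypothetical linear ordering of five of the named modules (`PIPELINE`) and a placement map from module name to `"client"` or `"server"`. Neither the ordering nor the function name `network_hops` comes from the patent. The function reports where data would have to cross the network for a given placement.

```python
from typing import Dict, List, Tuple

# Hypothetical processing order for a subset of the modules named in the
# abstract; the patent text does not prescribe this order.
PIPELINE: List[str] = [
    "capture",
    "preprocessing",
    "feature_extraction",
    "retrieval",
    "action",
]

VALID_SIDES = {"client", "server"}


def network_hops(placement: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return each (sender, receiver) module pair whose hand-off crosses the network."""
    for module in PIPELINE:
        if placement.get(module) not in VALID_SIDES:
            raise ValueError(f"module {module!r} must be placed on 'client' or 'server'")
    return [
        (prev, nxt)
        for prev, nxt in zip(PIPELINE, PIPELINE[1:])
        if placement[prev] != placement[nxt]
    ]


# A "thin" client only captures the image; everything else runs on the server.
thin = {m: "server" for m in PIPELINE}
thin["capture"] = "client"

# A "thick" client also preprocesses and extracts features before sending a query.
thick = dict(thin, preprocessing="client", feature_extraction="client")

print(network_hops(thin))
print(network_hops(thick))
```

Only the single hand-off point moves between the two placements. That movement is the kind of flexibility the abstract attributes to the client/server split.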

472 Citations

23 Claims

1. A method for generating and processing a retrieval request for a visual recognition system, the method comprising:
- receiving an image and generating an image query from the image;
  
  receiving audio data associated with the image, the audio data specifying at least one of a location within a document to be retrieved, at least one type of recognition algorithm for recognizing the image, and an order of the at least one type of recognition algorithm that determines a next algorithm used to recognize the image if a previous algorithm fails;
  
  performing command and data recognition on the audio data to produce audio recognition results for improving image recognition, wherein the audio recognition results include a keyword;
  
  performing retrieval of the document from a database of documents based on the image query and the audio recognition results to produce a retrieval result including a document identification, a portion of the document and an x-y location of the image on the portion of the document, wherein performing retrieval of the document includes performing image recognition on the image based on the image query to produce image recognition results from the database of documents, generating confidence scores associated with the image recognition results, modifying the confidence scores using the keyword and identifying the document based on the modified confidence scores; and
  
  providing the document to a user or performing an action based on the document.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11):
  - 2. The method of claim 1, further comprising receiving metadata associated with the image.
  - 3. The method of claim 1, wherein the audio recognition results are used to specify a subject of the document.
  - 4. The method of claim 1, wherein the audio recognition results are used to improve an accuracy of retrieval.
  - 5. The method of claim 1, wherein the audio recognition results are also used to specify how retrieval is performed.
  - 6. The method of claim 1, wherein the audio recognition results are also used to specify the action.
  - 7. The method of claim 1, wherein the audio recognition results are also used for biometric verification of the user that transmitted the image.
  - 8. The method of claim 1, wherein modifying the confidence scores using the keyword includes increasing a confidence score by a certain increment for each keyword found in an image recognition result associated with the confidence score.
  - 9. The method of claim 1, further comprising:
    - receiving metadata associated with the image; and
      
      wherein modifying the confidence scores is also based on the metadata.
  - 10. The method of claim 1, further comprising using the audio recognition results to select one of a plurality of databases to use for performing the retrieval.
  - 11. The method of claim 1, wherein the audio data is from a video.
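
Claims 1 and 8 describe adjusting image-recognition confidence scores with keywords recognized from the audio. Claim 8 specifies increasing a score by a fixed increment for each keyword found in the associated recognition result. The Python sketch below is one minimal reading of that step, not the patentee's implementation. It makes three assumptions that the claims do not specify: each candidate carries some searchable text (the hypothetical `Candidate.text` field), keyword matching is case-insensitive whole-word matching, and `increment` is a free parameter.

```python
from dataclasses import dataclass, replace
from typing import Iterable, List


@dataclass(frozen=True)
class Candidate:
    doc_id: str
    text: str          # hypothetical: text associated with this recognition result
    confidence: float  # score produced by image recognition


def rescore(
    candidates: Iterable[Candidate],
    keywords: Iterable[str],
    increment: float = 0.1,
) -> List[Candidate]:
    """Add `increment` per distinct keyword found in a candidate, then rank by score."""
    wanted = {k.lower() for k in keywords}
    rescored = []
    for cand in candidates:
        words = set(cand.text.lower().split())
        hits = len(wanted & words)  # each distinct keyword counts once
        rescored.append(replace(cand, confidence=cand.confidence + increment * hits))
    # The document is identified from the modified scores: highest first.
    return sorted(rescored, key=lambda c: c.confidence, reverse=True)


candidates = [
    Candidate("doc-A", "quarterly earnings report", 0.60),
    Candidate("doc-B", "travel guide to kyoto", 0.55),
]
ranked = rescore(candidates, keywords=["Kyoto", "travel"])
print(ranked[0].doc_id)
```

Here the spoken keywords lift the image match that was initially ranked second above the other candidate. This illustrates how audio can change which document is identified.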

12. A method for generating and processing a retrieval request in a distributed visual recognition system, the method comprising:
- receiving an image and generating an image query from the image;
  
  receiving audio data associated with the image, the audio data specifying at least one of a location within a document to be retrieved, at least one type of recognition algorithm for recognizing the image, and an order of the at least one type of recognition algorithm that determines a next algorithm used to recognize the image if a previous algorithm fails;
  
  performing command and data recognition on the audio data to produce audio recognition results for improving image recognition, wherein the audio recognition results include a keyword;
  
  performing retrieval of the document from a database of documents based on the image query and the audio recognition results, wherein performing retrieval of the document includes performing image recognition on the image based on the image query to produce image recognition results from the database of documents, generating confidence scores associated with the image recognition results, modifying the confidence scores using the keyword and identifying the document based on the modified confidence scores; and
  
  generating and sending a first message including a document identification, a portion identification and an x-y location of the image on the portion of the document.
- Dependent claims (13, 14, 15, 16):
  - 13. The method of claim 12 further comprising:
    - performing an action based on the document identification, the portion identification and the x-y location.
  - 14. The method of claim 12 wherein retrieval of the document is performed on one from the group of a mobile device and a hardware server.
  - 15. The method of claim 13 wherein performing the action includes:
    - generating and sending a second message, the second message including hotspot data.
  - 16. The method of claim 12 wherein performing retrieval of the document comprises:
    - performing feature extraction on the image query to produce extracted features; and
      
      querying the database of documents using the extracted features to generate the image recognition results including the document identification, the portion identification and the x-y location.
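
Claim 12 ends by generating and sending a first message that carries the document identification, a portion identification and the x-y location. Claim 15 adds a second message with hotspot data. The sketch below serializes both messages as JSON. Several details are assumptions made for illustration, because the claims define neither a wire format nor the contents of hotspot data: the use of JSON itself, every field name, the `Hotspot` layout (a rectangle plus a hypothetical `payload` string), and the rule of returning only hotspots that contain the matched x-y location.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class RetrievalResult:
    document_id: str
    portion_id: str  # e.g. a page identifier (assumption)
    x: int           # x-y location of the query image on that portion
    y: int


@dataclass
class Hotspot:
    x: int
    y: int
    width: int
    height: int
    payload: str  # hypothetical: whatever data the hotspot links to

    def contains(self, px: int, py: int) -> bool:
        return self.x <= px < self.x + self.width and self.y <= py < self.y + self.height


def first_message(result: RetrievalResult) -> str:
    """Build the first message: document ID, portion ID and x-y location."""
    return json.dumps({"type": "retrieval_result", **asdict(result)})


def second_message(result: RetrievalResult, hotspots: List[Hotspot]) -> str:
    """Build a follow-up message with the hotspots under the matched location."""
    hits = [asdict(h) for h in hotspots if h.contains(result.x, result.y)]
    return json.dumps({
        "type": "hotspots",
        "document_id": result.document_id,
        "portion_id": result.portion_id,
        "hotspots": hits,
    })


result = RetrievalResult("doc-42", "page-3", x=120, y=340)
spots = [
    Hotspot(100, 300, 200, 100, "https://example.com/video"),
    Hotspot(0, 0, 50, 50, "https://example.com/other"),
]
print(first_message(result))
print(json.loads(second_message(result, spots))["hotspots"][0]["payload"])
```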

17. A method for generating and processing a retrieval request for a visual recognition system, the method comprising:
- receiving an image and generating an image query from the image;
  
  receiving audio data and metadata associated with the image, the audio data specifying at least one of a location within a document to be retrieved, at least one type of recognition algorithm for recognizing the image, and an order of the at least one type of recognition algorithm that determines a next algorithm used to recognize the image if a previous algorithm fails;
  
  performing command and data recognition on the audio data to produce audio recognition results for improving image recognition, wherein the audio recognition results include a keyword;
  
  performing retrieval of the document from a database of documents based on the image query, the audio recognition results and the metadata to produce a retrieval result including a document identification, a portion of the document and an x-y location of the image on the portion of the document, wherein performing retrieval of the document includes performing image recognition on the image based on the image query to produce image recognition results from the database of documents, generating confidence scores associated with the image recognition results, modifying the confidence scores using the keyword and identifying the document based on the modified confidence scores; and
  
  performing an action based on the document.
- Dependent claims (18, 19, 20, 21):
  - 18. The method of claim 17, wherein receiving the image and receiving the audio data and the metadata are performed by a capture device and wherein the capture device generates a first message that includes the image, the audio data and the metadata in a multimedia messaging service (MMS) format.
  - 19. The method of claim 17, wherein the metadata includes one from the group of an email address, a user identification, a preference, global positioning system (GPS) information, device settings, metadata about a query image, a query image location, an audio file, a query image document name and information about the action.
  - 20. The method of claim 17, wherein the action is one from the group of sending a return confirmation message, sending the document, sending the portion of the document, sending the x-y location, sending a thumbnail of the document, sending a video overview of the document, sending a message about the action performed, and sending instructions about receiving the image.
  - 21. The method of claim 17, wherein performing retrieval of the document and performing the action are performed by a server coupled to a mobile device.
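
Claim 6 lets the audio recognition results specify the action, and claim 20 lists the actions that may be performed. The Python sketch below maps a few hypothetical spoken phrases onto a subset of the claim 20 actions using simple substring matching. When no phrase is recognized, it falls back to a return confirmation message. The phrase table, the matching rule and the default are all assumptions, since the claims do not say how spoken commands are matched to actions.

```python
from enum import Enum
from typing import Dict


class Action(Enum):
    # A subset of the actions listed in claim 20.
    RETURN_CONFIRMATION = "send a return confirmation message"
    SEND_DOCUMENT = "send the document"
    SEND_PORTION = "send the portion of the document"
    SEND_XY_LOCATION = "send the x-y location"
    SEND_THUMBNAIL = "send a thumbnail of the document"
    SEND_VIDEO_OVERVIEW = "send a video overview of the document"


# Hypothetical command vocabulary; checked in insertion order, first match wins.
COMMANDS: Dict[str, Action] = {
    "whole document": Action.SEND_DOCUMENT,
    "this page": Action.SEND_PORTION,
    "where is": Action.SEND_XY_LOCATION,
    "thumbnail": Action.SEND_THUMBNAIL,
    "video": Action.SEND_VIDEO_OVERVIEW,
}


def choose_action(transcript: str, default: Action = Action.RETURN_CONFIRMATION) -> Action:
    """Pick the action named in a speech transcript, or the default if none matches."""
    text = transcript.lower()
    for phrase, action in COMMANDS.items():
        if phrase in text:
            return action
    return default


print(choose_action("Send me the whole document").value)
print(choose_action("Play the video overview").name)
```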

22. A system for generating and processing a retrieval request for a visual recognition, the system comprising:
- a processor;
  
  a send message module stored on a memory and executable by the processor, the send message module for receiving an image query and audio data associated with an image, the audio data specifying at least one of a location within a document to be retrieved, at least one type of recognition algorithm for recognizing the image, and an order of the at least one type of recognition algorithm that determines a next algorithm used to recognize the image if a previous algorithm fails;
  
  a voice recognition module coupled to the send message module, the voice recognition module for performing command and data recognition on the audio data to produce audio recognition results for improving image recognition, wherein the audio recognition results include a keyword; and
  
  a retrieval module coupled to the send message module, the retrieval module for performing retrieval of the document from a database of documents based on the image query and the audio recognition results to produce a retrieval result including a document identification, a portion of the document and an x-y location of the image on the portion of the document, wherein performing retrieval of the document includes performing image recognition on the image based on the image query to produce image recognition results from the database of documents, generating confidence scores associated with the image recognition results, modifying the confidence scores using the keyword and identifying the document based on the modified confidence scores;
  
  wherein the send message module provides the document to a user or performs an action based on the document retrieved by the retrieval module.
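
Every independent claim, including the system of claim 22, lets the audio data specify an order of recognition algorithms. That order determines which algorithm is tried next when the previous one fails. A minimal Python sketch of that fallback loop follows. Three details are assumptions for illustration: the algorithm names, the convention that a recognizer signals failure by returning `None`, and the choice to skip names that have no registered recognizer.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A recognizer takes image bytes and returns a document ID, or None on failure.
Recognizer = Callable[[bytes], Optional[str]]


def recognize_with_fallback(
    image: bytes,
    order: List[str],
    recognizers: Dict[str, Recognizer],
) -> Optional[Tuple[str, str]]:
    """Try recognizers in the spoken order; return (algorithm, document_id) of the first success."""
    for name in order:
        recognizer = recognizers.get(name)
        if recognizer is None:
            continue  # assumption: unknown algorithm names are skipped
        document_id = recognizer(image)
        if document_id is not None:
            return name, document_id
    return None  # every algorithm in the order failed


# Stub recognizers standing in for real algorithms (hypothetical names).
recognizers: Dict[str, Recognizer] = {
    "barcode": lambda img: None,  # e.g. no barcode in the snapshot
    "text_patch": lambda img: "doc-42",
    "invariant_features": lambda img: "doc-7",
}

image = b"fake image bytes"
print(recognize_with_fallback(image, ["barcode", "text_patch", "invariant_features"], recognizers))
```

Because the order comes from the user's speech, reordering the list can change which algorithm answers first, and therefore which document is returned.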

23. A non-transitory computer readable storage medium comprising a computer readable program, wherein the computer readable program when executed on a computer causes the computer to:
- receive an image and generate an image query from the image;
  
  receive audio data associated with the image, the audio data specifying at least one of a location within a document to be retrieved, at least one type of recognition algorithm for recognizing the image, and an order of the at least one type of recognition algorithm that determines a next algorithm used to recognize the image if a previous algorithm fails;
  
  perform command and data recognition on the audio data to produce audio recognition results for improving image recognition, wherein the audio recognition results include a keyword;
  
  perform retrieval of the document from a database of documents based on the image query and the audio recognition results to produce a retrieval result including a document identification, a portion of the document and an x-y location of the image on the portion of the document, wherein performing retrieval of the document includes performing image recognition on the image based on the image query to produce image recognition results from the database of documents, generating confidence scores associated with the image recognition results, modifying the confidence scores using the keyword and identifying the document based on the modified confidence scores; and
  
  providing the document to a user or performing an action based on the document.
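
Every independent claim routes the audio through "command and data recognition". The Python sketch below assumes the speech has already been transcribed to text by some recognizer (not shown). It splits a transcript into command results and data results. The command results are a page number, used as the location within the document, and a spoken algorithm order. The data results are keywords. The grammar, the algorithm vocabulary and the filler-word list are hypothetical and deliberately simplistic.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical vocabularies for the sketch.
KNOWN_ALGORITHMS = {"barcode", "ocr", "features"}
FILLER_WORDS = {"a", "about", "an", "is", "of", "on", "page", "the", "then", "this", "to", "use"}


@dataclass
class AudioRecognitionResult:
    page: Optional[int] = None                                # command: location in the document
    algorithm_order: List[str] = field(default_factory=list)  # command: fallback order
    keywords: List[str] = field(default_factory=list)         # data: used to rescore matches


def recognize_commands(transcript: str) -> AudioRecognitionResult:
    """Split a transcript into command results and keyword data."""
    words = re.findall(r"[a-z0-9]+", transcript.lower())
    result = AudioRecognitionResult()
    for current, following in zip(words, words[1:]):
        if current == "page" and following.isdigit():
            result.page = int(following)
    result.algorithm_order = [w for w in words if w in KNOWN_ALGORITHMS]
    result.keywords = [
        w for w in words
        if w not in FILLER_WORDS and w not in KNOWN_ALGORITHMS and not w.isdigit()
    ]
    return result


parsed = recognize_commands("Page 3, use barcode then OCR; this is about solar panels")
print(parsed)
```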

Current Assignee
Ricoh Company Limited
Original Assignee
Ricoh Company Limited
Inventors
Erol, Berna, Moraleda, Jorge, Hull, Jonathan J.
Primary Examiner(s)
Liao, Jason
Assistant Examiner(s)
Mitiku, Berhanu

Application Number
US12/340,124
Publication Number
US 20090100050A1
Time in Patent Office
2,321 Days
US Class Current
707/769
CPC Class Codes

G06F 16/433   using audio data

G06F 16/434   using image data, e.g. imag...

G06F 16/48   Retrieval characterised by ...

G06F 16/583   using metadata automaticall...

G06F 16/955   using information identifie...

G06F 18/21   Design or setup of recognit...

G06F 18/217   Validation; Performance eva...

G06F 18/254   of classification results, ...

G06F 18/285   Selection of pattern recogn...

G06V 10/809   of classification results, ...

G06V 10/95   structured as a network, e....

G06V 10/993   Evaluation of the quality o...

G06V 30/19113   Selection of pattern recogn...

G06V 30/1916   Validation; Performance eva...

G06V 30/414   Extracting the geometrical ...
