Systems and methods for displaying foreign character sets and their translations in real time on resource-constrained mobile devices
First Claim
1. A method for translating a video feed in real-time augmented reality from a first language to a second language using a mobile device comprising a video camera, a processor, a memory, and a display, the method comprising the steps of:
(a) capturing a frame in real-time from the video feed of one or more words in the first language which need to be translated using the video camera to produce a captured frame;
(b) cropping the captured frame to fit inside an image processing bounding box to produce a cropped frame;
(c) pre-processing the cropped frame to produce a pre-processed frame;
(d) performing character segment recognition on the pre-processed frame to produce a plurality of character segments;
(e) performing character merging on the character segments to produce a plurality of merged character segments;
(f) performing character recognition on the merged character segments to produce a recognized frame having a plurality of recognized characters;
(g) processing the recognized frame through a translation engine to produce a translation of the recognized characters in the first language into one or more words of the second language to produce a translated frame, while also calculating a translation quality representing how well the recognized characters have been translated for each translated frame;
(h) storing the translated frame to the memory as a current translated frame, wherein a previous translated frame and a previous translation quality is also stored in the memory;
(i) checking that the bounding box has stayed on a same set of characters for the current translated frame and the previous translated frame by determining a fraction of similar characters that are overlapping between the current translated frame and the previous translated frame, wherein a higher fraction indicates that the bounding box has stayed on the same set of characters for the current translated frame and the previous translated frame;
(j) comparing the translation quality determined by the translation engine for the current translated frame to the previous translation quality for the previous translated frame;
(k) selecting one of the previous translated frame and the current translated frame to be removed from the memory based on a frame having a lower translation quality; and
(l) displaying an optimal translated frame from the previous translated frame and the current translated frame, the optimal translated frame having a higher translation quality, wherein the words of the second language are overlaid over or next to the words in the first language which is being translated in an augmented reality on the display of the mobile device.
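Steps (i) through (l) can be read as a small frame-selection routine. The following is a minimal sketch under stated assumptions, not the patented implementation: the claim does not define how the "fraction of similar characters" is computed, so a multiset overlap of the recognized characters is assumed here, and the `min_overlap` threshold, the `TranslatedFrame` type, and the behavior when the bounding box has moved (start over with the current frame) are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class TranslatedFrame:
    """Hypothetical container for the output of step (g)."""
    recognized: str   # recognized characters in the first language
    translation: str  # words in the second language
    quality: float    # translation quality reported by the engine


def overlap_fraction(current: str, previous: str) -> float:
    """Step (i), assumed metric: shared characters (counted as a multiset)
    divided by the length of the longer recognized string."""
    if not current or not previous:
        return 0.0
    shared = sum((Counter(current) & Counter(previous)).values())
    return shared / max(len(current), len(previous))


def select_frame(current: TranslatedFrame,
                 previous: Optional[TranslatedFrame],
                 min_overlap: float = 0.5) -> TranslatedFrame:
    """Steps (j)-(l): return the frame to keep in memory and display."""
    if previous is None:
        return current
    # Assumption: a low overlap means the bounding box moved to new text,
    # so the previous frame is no longer comparable and is dropped.
    if overlap_fraction(current.recognized, previous.recognized) < min_overlap:
        return current
    # Same text in both frames: keep the higher-quality translation.
    return current if current.quality >= previous.quality else previous
```

Called once per captured frame, with the returned frame stored as the next `previous`, this keeps the displayed translation from flickering to a worse result while the camera stays on the same text.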
Abstract
The present invention is related to systems and methods for translating language text on a mobile camera device offline without access to the Internet. More specifically, the present invention relates to systems and methods for displaying text of a first language and a translation of the first language text into a second language text which is displayed in real time in augmented reality on the mobile device. The processing can use a single line or a multiline algorithm designed with a plurality of processing innovations to ensure accurate real-time translations without motion jitter. The invention may be used to help travelers in a foreign country with difficulties in reading and understanding text written in the local language of that country. The present invention may be utilized with wearable computers or glasses, producing seamless augmented reality foreign language translations. Some embodiments are particularly useful in translations from Asian languages to English.
84 Citations
30 Claims
1. Independent method claim, recited in full under First Claim above. Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14.
15. A mobile device for translating a video feed in real-time from a first language to a second language, the mobile device comprising:
a video camera for capturing the video feed of one or more words in the first language which need translation;
a display for displaying the words of the first language and the words of the second language in augmented reality;
a processor for processing program code; and
at least one memory operatively connected to the processor for storing the program code and one or more frames, which program code when executed by the processor causes the processor to execute a process to:
(a) capture a frame in real-time from the video feed of one or more words in the first language which need to be translated using the video camera to produce a captured frame;
(b) crop the captured frame to fit inside an image processing bounding box to produce a cropped frame;
(c) pre-process the cropped frame to produce a pre-processed frame;
(d) perform character segment recognition on the pre-processed frame to produce a plurality of character segments;
(e) perform character merging on the character segments to produce a plurality of merged character segments;
(f) perform character recognition on the merged character segments to produce a recognized frame having a plurality of recognized characters;
(g) process the recognized frame through a translation engine to produce a translation of the recognized characters in the first language into one or more words of the second language to produce a translated frame, while also calculating a translation quality representing how well the recognized characters have been translated for each translated frame;
(h) store the translated frame to the memory as a current translated frame, wherein a previous translated frame and a previous translation quality is also stored in the memory;
(i) check that the bounding box has stayed on a same set of characters for the current translated frame and the previous translated frame by determining a fraction of similar characters that are overlapping between the current translated frame and the previous translated frame, wherein a higher fraction indicates that the bounding box has stayed on the same set of characters for the current translated frame and the previous translated frame;
(j) compare the translation quality determined by the translation engine for the current translated frame to the previous translation quality for the previous translated frame;
(k) select one of the previous translated frame and the current translated frame to be removed from the memory based on a frame having a lower translation quality; and
(l) display an optimal translated frame from the previous translated frame and the current translated frame, the optimal translated frame having a higher translation quality, wherein the words of the second language are overlaid over or next to the words in the first language which is being translated in an augmented reality on the display of the mobile device.
Dependent claims: 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27.
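The process recited above is a linear chain of stages, each consuming the previous stage's output. The sketch below illustrates only that chaining, with toy versions of steps (b) and (c). It is an illustration under assumptions: the claim does not say what pre-processing consists of, so simple thresholding (binarization) is assumed, and the names `crop`, `binarize`, and `run_pipeline`, the list-of-lists image type, and the `(top, left, height, width)` box layout are all hypothetical.

```python
from functools import partial
from typing import Any, Callable, List, Sequence, Tuple

Image = List[List[int]]  # grayscale pixel rows, values 0-255 (assumed format)


def crop(frame: Image, box: Tuple[int, int, int, int]) -> Image:
    """Step (b): keep only the pixels inside the bounding box.
    box is (top, left, height, width), an assumed layout."""
    top, left, height, width = box
    return [row[left:left + width] for row in frame[top:top + height]]


def binarize(frame: Image, threshold: int = 128) -> Image:
    """Step (c), assumed pre-processing: dark pixels (text) become 1,
    light pixels (background) become 0."""
    return [[1 if pixel < threshold else 0 for pixel in row] for row in frame]


def run_pipeline(frame: Any, stages: Sequence[Callable[[Any], Any]]) -> Any:
    """Feed the frame through each stage in order, as in steps (a)-(g)."""
    for stage in stages:
        frame = stage(frame)
    return frame


# A 4x4 captured frame with a dark 2x2 region near the center.
captured = [
    [255, 255, 255, 255],
    [255,  10, 200, 255],
    [255,  20,  30, 255],
    [255, 255, 255, 255],
]
stages = [partial(crop, box=(1, 1, 2, 2)), binarize]
pre_processed = run_pipeline(captured, stages)
```

Segmentation, merging, recognition, and translation would be appended to `stages` as further callables; they are omitted here because the claim does not describe their internals.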
28. A non-transitory, computer-readable storage medium for storing program code for translating a video feed in real-time from a first language to a second language, the program code, when executed by a processor causes the processor to execute a translation process comprising:
(a) a step for capturing a frame in real-time from the video feed of one or more words in the first language which need to be translated using a video camera to produce a captured frame;
(b) a step for cropping the captured frame to fit inside an image processing bounding box to produce a cropped frame;
(c) a step for pre-processing the cropped frame to produce a pre-processed frame;
(d) a step for performing character segment recognition on the pre-processed frame to produce a plurality of character segments;
(e) a step for performing character merging on the character segments to produce a plurality of merged character segments;
(f) a step for performing character recognition on the merged character segments to produce a recognized frame having a plurality of recognized characters;
(g) a step for processing the recognized frame through a translation engine to produce a translation of the recognized characters in the first language into one or more words of the second language to produce a translated frame, while also calculating a translation quality representing how well the recognized characters have been translated for each translated frame;
(h) a step for storing the translated frame to a memory as a current translated frame, wherein a previous translated frame and a previous translation quality is also stored in the memory;
(i) a step for checking that the bounding box has stayed on a same set of characters for the current translated frame and the previous translated frame by determining a fraction of similar characters that are overlapping between the current translated frame and the previous translated frame, wherein a higher fraction indicates that the bounding box has stayed on the same set of characters for the current translated frame and the previous translated frame;
(j) a step for comparing the translation quality determined by the translation engine for the current translated frame to the previous translation quality for the previous translated frame;
(k) a step for selecting one of the previous translated frame and the current translated frame to be removed from the memory based on a frame having a lower translation quality; and
(l) a step for displaying an optimal translated frame from the previous translated frame and the current translated frame, the optimal translated frame having a higher translation quality, wherein the words of the second language are overlaid over or next to the words in the first language which is being translated in an augmented reality on a display.
Dependent claims: 29, 30.
Specification