System and method for keyword spotting using multiple character encoding schemes
First Claim
1. A method, comprising:
- accepting an input search phrase to be located in a body of data;
- identifying multiple candidate character encoding schemes using one or more characteristics of the input search phrase;
- translating the input search phrase into multiple encoding-specific search phrases, each encoding-specific search phrase representing the input search phrase in a different, respective candidate character encoding scheme; and
- identifying one or more occurrences of the input search phrase in the body of data by searching the body of data using each of the multiple encoding-specific search phrases.
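The claim does not say which characteristics of the search phrase drive the choice of candidate encodings. As an illustration only, the sketch below assumes one simple characteristic: whether every character of the phrase can be represented in a given encoding. The pool `ENCODING_POOL` and the function `candidate_encodings` are hypothetical names introduced for this example and do not come from the patent.

```python
# Hypothetical pool of encodings to consider; a real system would pick its own.
ENCODING_POOL = ["ascii", "utf-8", "utf-16-le", "latin-1", "cp1251", "koi8-r"]


def candidate_encodings(phrase, pool=ENCODING_POOL):
    """Return the encodings from `pool` that can represent every character of `phrase`.

    Assumption: "representable in the encoding" is the phrase characteristic
    used for selection. The patent may rely on other characteristics.
    """
    candidates = []
    for name in pool:
        try:
            phrase.encode(name)  # raises if any character has no mapping
        except UnicodeEncodeError:
            continue
        candidates.append(name)
    return candidates


if __name__ == "__main__":
    # A Cyrillic phrase rules out ASCII and Latin-1 but keeps the others.
    print(candidate_encodings("привет"))
```

Under this assumption, a phrase made only of ASCII letters keeps every encoding in the pool as a candidate, while a Cyrillic phrase narrows the pool to the Unicode and Cyrillic encodings.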
3 Assignments
0 Petitions
Abstract
Methods and systems for finding search phrases in a body of data that is encoded using any of multiple possible character encoding schemes. An analytics system accepts an input search phrase for searching in a certain body of data. The system identifies two or more candidate character encoding schemes, which may have been used for encoding the body of data. Having determined the candidate encoding schemes, the system translates the input search phrase into multiple encoding-specific search phrases that represent the input search phrase in the respective candidate encoding schemes. The system then searches the body of data for occurrences of the input search phrase using the multiple encoding-specific search phrases.
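The translate-then-search flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: it assumes the body of data is available as a Python `bytes` object, that the candidate encodings are already known, and that a plain substring scan is an acceptable search. The names `translate_phrase` and `find_occurrences` are hypothetical.

```python
def translate_phrase(phrase, encodings):
    """Map each candidate encoding to the byte pattern of `phrase` in that encoding.

    Encodings that cannot represent the phrase are skipped, and encodings that
    yield a byte pattern already seen are dropped so the data is not scanned twice.
    """
    patterns = {}
    seen = set()
    for name in encodings:
        try:
            raw = phrase.encode(name)
        except UnicodeEncodeError:
            continue
        if raw not in seen:
            seen.add(raw)
            patterns[name] = raw
    return patterns


def find_occurrences(data, phrase, encodings):
    """Return sorted (offset, encoding) pairs for every match of `phrase` in `data`."""
    hits = []
    for name, pattern in translate_phrase(phrase, encodings).items():
        start = data.find(pattern)
        while start != -1:
            hits.append((start, name))
            start = data.find(pattern, start + 1)  # allow overlapping matches
    return sorted(hits)


if __name__ == "__main__":
    # Body of data containing the same word in two different encodings.
    body = b"xx" + "привет".encode("cp1251") + b"yy" + "привет".encode("utf-8")
    for offset, encoding in find_occurrences(body, "привет", ["utf-8", "cp1251", "koi8-r"]):
        print(offset, encoding)
```

Scanning once per pattern keeps the sketch short. A system that searches large volumes of data would more likely load all encoding-specific patterns into a single multi-pattern matcher, but that choice is outside what the abstract states.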
28 Citations
16 Claims
1. A method, comprising:
- accepting an input search phrase to be located in a body of data;
- identifying multiple candidate character encoding schemes using one or more characteristics of the input search phrase;
- translating the input search phrase into multiple encoding-specific search phrases, each encoding-specific search phrase representing the input search phrase in a different, respective candidate character encoding scheme; and
- identifying one or more occurrences of the input search phrase in the body of data by searching the body of data using each of the multiple encoding-specific search phrases.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. Apparatus, comprising:
- an interface, which is configured to accept an input search phrase to be located in a body of data; and
- a processor, which is configured to identify multiple candidate character encoding schemes using one or more characteristics of the input search phrase, to translate the input search phrase into multiple encoding-specific search phrases, each encoding-specific search phrase representing the input search phrase in a different, respective candidate character encoding scheme, and to identify one or more occurrences of the input search phrase in the body of data by searching the body of data using each of the multiple encoding-specific search phrases.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
Specification