Content search in complex language, such as Japanese
Abstract
A search facility provides searching capabilities in languages such as Japanese. The facility may use a vocabulary knowledge base organized by concepts. For example, each concept may be associated with at least one keyword (as well as any synonyms or variant forms) by applying one or more rules that relate to identifying common main forms, script variants, alternative grammatical forms, phonetic variants, proper noun variants, numerical variants, scientific names, cultural relevance, etc. The contents of the vocabulary knowledge base are then used in executing search queries. A user may enter a search query in which keywords (or synonyms associated with those keywords) may be identified, along with various stopwords that facilitate segmentation of the search query and other actions. Execution of the search query may return a list of assets or similar indications that relate to the concepts identified within the search query.
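As an illustration of what a "script variant" rule might look like, the following minimal sketch derives a katakana variant from a hiragana keyword. It relies on the fact that Unicode places each katakana character 0x60 code points after its hiragana counterpart. The function names `hiragana_to_katakana` and `script_variants` are hypothetical and do not come from the patent, which does not specify how such rules are implemented.

```python
# Hypothetical rule: derive a katakana script variant from a hiragana keyword.
# Names and structure are illustrative assumptions, not the patent's method.

HIRAGANA_START = 0x3041  # first hiragana letter handled here
HIRAGANA_END = 0x3096    # last hiragana letter handled here
KANA_OFFSET = 0x60       # distance from a hiragana code point to its katakana counterpart


def hiragana_to_katakana(text: str) -> str:
    """Shift each hiragana character into the katakana block; leave others unchanged."""
    return "".join(
        chr(ord(ch) + KANA_OFFSET) if HIRAGANA_START <= ord(ch) <= HIRAGANA_END else ch
        for ch in text
    )


def script_variants(keyword: str) -> set[str]:
    """Return the keyword together with its katakana rendering."""
    return {keyword, hiragana_to_katakana(keyword)}


if __name__ == "__main__":
    # "neko" (cat) written in hiragana, plus its katakana variant
    print(script_variants("ねこ"))
```

A real knowledge base would presumably combine several such rules (kanji forms, phonetic variants, numerical variants); this sketch covers only one direction of one script conversion.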
103 Citations
19 Claims
1. A system for searching for content identified using text and symbols of a complex language associated with multiple written forms, the system comprising:
a computer-readable storage medium having stored thereon an asset repository associated with multiple searchable assets;
a computer-readable storage medium having stored thereon a vocabulary knowledge base for storing vocabulary information associated with the complex language, wherein the vocabulary knowledge base stores information related to multiple semantic concepts that are usable to identify assets within the asset repository, wherein the vocabulary knowledge base is generated or updated by a repeatable method comprising:
assigning an identifier to a semantic concept;
identifying a main written form for the semantic concept, wherein the main written form is based on at least one of the multiple written forms;
for at least one of the multiple written forms associated with the complex language, associating at least one synonymous written form with the semantic concept, wherein the synonymous written form is at least partially distinct from the main written form; and
storing the identifier, the main written form, and the at least one synonymous written form in a data storage component associated with the system; and
a computing system having a processor to execute a search engine for receiving and executing queries for the searchable assets, wherein the execution is based, at least in part, on the contents of the vocabulary knowledge base.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
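The repeatable method recited in claim 1 (assign an identifier, pick a main written form, attach synonymous written forms, store all three) can be sketched as a small in-memory data structure. Everything below is an assumption made for illustration: the class names `Concept` and `VocabularyKnowledgeBase`, the sequential integer identifiers, the dictionary-backed storage, and the example asset file names are not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    identifier: int                                  # identifier assigned to the semantic concept
    main_form: str                                   # main written form
    synonyms: set[str] = field(default_factory=set)  # synonymous written forms


class VocabularyKnowledgeBase:
    """Illustrative in-memory store; a real system would use persistent storage."""

    def __init__(self) -> None:
        self._concepts: dict[int, Concept] = {}
        self._form_index: dict[str, int] = {}  # any written form -> concept identifier
        self._next_id = 1

    def add_concept(self, main_form: str, synonyms=()) -> Concept:
        identifier = self._next_id              # step 1: assign an identifier
        self._next_id += 1
        # steps 2-3: keep the main form and only synonyms that differ from it
        distinct = {s for s in synonyms if s != main_form}
        concept = Concept(identifier, main_form, distinct)
        self._concepts[identifier] = concept    # step 4: store all three together
        for form in {main_form} | distinct:
            # Simplification: a written form maps to one concept (homographs ignored).
            self._form_index[form] = identifier
        return concept

    def lookup(self, written_form: str):
        identifier = self._form_index.get(written_form)
        return None if identifier is None else self._concepts[identifier]


def search(kb: VocabularyKnowledgeBase, assets: dict[str, set[int]], query: str) -> list[str]:
    """Return names of assets tagged with the concept that the query resolves to."""
    concept = kb.lookup(query)
    if concept is None:
        return []
    return sorted(name for name, ids in assets.items() if concept.identifier in ids)


if __name__ == "__main__":
    kb = VocabularyKnowledgeBase()
    cat = kb.add_concept("猫", ["ねこ", "ネコ"])
    dog = kb.add_concept("犬", ["いぬ", "イヌ"])
    assets = {"cat_photo.jpg": {cat.identifier}, "dog_photo.jpg": {dog.identifier}}
    print(search(kb, assets, "ネコ"))
```

Because every written form resolves to the same concept identifier, a query in kanji, hiragana, or katakana reaches the same assets, which is the behavior the claim's search engine element relies on.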
9. A computer-implemented method for executing a search query, the method comprising:
receiving a search query including a textual expression, wherein the textual expression is written in a language that at least occasionally lacks discrete boundaries between words or autonomous language units;
referencing a structured vocabulary knowledge base to determine whether the textual expression comprises a keyword or synonym associated with the structured vocabulary knowledge base, wherein the structured vocabulary knowledge base is for storing vocabulary information associated with a language having multiple orthographic forms or scripts, and wherein the structured vocabulary knowledge base is generated prior to receiving the search query by a repeatable method comprising:
assigning an identifier to a semantic concept that is usable as a keyword or key phrase;
identifying a main written form for the semantic concept, wherein the main written form is based on at least one of the multiple written forms;
for at least one of the multiple written forms associated with the complex language, associating at least one synonymous written form with the semantic concept, wherein the synonymous written form is at least partially distinct from the main written form; and
storing the identifier, the main written form, and the at least one synonymous written form in a data storage component associated with the vocabulary knowledge base;
if the textual expression does not comprise a keyword, key phrase or synonym associated with the structured vocabulary knowledge base, performing segmentation on the textual expression, wherein the segmentation includes systematically splitting the textual expression into two or more segments, and identifying at least one keyword from the vocabulary knowledge base based on the textual expression and the two or more segments;
performing the search query using the at least one identified keyword; and
providing for display results of the search query.
Dependent claims: 10, 11, 12, 13, 14, 15, 16, 17
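The lookup-then-segment flow of claim 9 can be illustrated with a short sketch. The claim says only that the expression is "systematically" split into segments; the greedy longest-match scan used here is a common technique swapped in for illustration and is not necessarily the segmentation the patent describes. The function name `find_keywords` and the `form_index` mapping (written form to concept identifier) are hypothetical.

```python
def find_keywords(expression: str, form_index: dict[str, int]) -> list[int]:
    """Resolve a query with no word boundaries to concept identifiers.

    Assumed approach: try the whole expression first, then fall back to a
    greedy longest-match scan. Characters that match nothing are skipped.
    """
    # Step 1: the whole expression may itself be a keyword or synonym.
    if expression in form_index:
        return [form_index[expression]]

    # Step 2: segmentation. At each position, try the longest candidate first.
    found: list[int] = []
    start = 0
    while start < len(expression):
        for end in range(len(expression), start, -1):
            concept_id = form_index.get(expression[start:end])
            if concept_id is not None:
                found.append(concept_id)
                start = end  # continue after the matched segment
                break
        else:
            start += 1  # no known form starts here; skip one character
    return found


if __name__ == "__main__":
    # Illustrative index: three written forms for "cat", one for "photograph".
    form_index = {"猫": 1, "ねこ": 1, "ネコ": 1, "写真": 2}
    print(find_keywords("猫の写真", form_index))  # "photograph of a cat"
```

In this sketch the particle between the two keywords matches nothing and is simply skipped; a fuller implementation might instead treat such particles as explicit stopwords that mark segment boundaries, as the abstract suggests.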
18. A method in a first computer system for retrieving a media content unit from a second computer system having a plurality of media content units that have been classified according to keyword terms of a structured vocabulary, comprising:
sending a request for a media content unit, the request specifying a search term;
receiving an indication of at least one media content unit that corresponds to the specified search term, wherein the search term is located within the structured vocabulary and is used to determine at least one media content unit that corresponds to the search term, and wherein orthographic variations of the search term are automatically provided to assist in determining the at least one media content unit that corresponds to the search term, wherein the structured vocabulary is generated at the second computer system prior to the first computer sending the request by a repeatable method comprising:
assigning an identifier to a semantic concept that is usable as a keyword or key phrase;
identifying a main written form for the semantic concept, wherein the main written form is based on at least one of the multiple written forms;
for at least one of the multiple written forms associated with the complex language, associating at least one synonymous written form with the semantic concept, wherein the synonymous written form is at least partially distinct from the main written form; and
storing the identifier, the main written form, and the at least one synonymous written form in a data storage component associated with the vocabulary knowledge base; and
displaying the at least one media content unit on a display.
Dependent claims: 19
Specification