CUSTOM LOCAL SEARCH
A method described herein includes receiving a query that is configured by an issuer of the query to be utilized to retrieve a portion of local search content, wherein the local search content comprises a first dataset and a second dataset, wherein the first dataset comprises a first entity and the second dataset comprises a second entity, and wherein the query comprises a first token and a second token. The method also includes parsing the query such that the first token of the query is mapped to the first entity in the first dataset and the second token of the query is mapped to the second entity in the second dataset. The method also includes returning search results based at least in part upon the parsing of the query.
- 1-20. (canceled)
- 21. A system that facilitates performing a search in a map application, the system comprising:
at least one processor; and memory storing instructions that when executed by the at least one processor cause the system to perform a set of operations comprising: displaying within the map application a plurality of entities; receiving, from an end user, a selection of an entity from the plurality of entities; based on receiving the selection of the entity, displaying a user interface configured to receive attributes about the selected entity; receiving, via the user interface, updated entity information about the entity; storing the updated entity information in a first dataset; receiving, in the map application, a query; based on the received query, executing a search across at least the first dataset; generating search results for the executed search, wherein the search results include the updated entity information; and displaying, within the map application, the generated search results.
- 28. A computer-implemented method for executing a search in a map application, the method comprising:
displaying within the map application a plurality of entities; receiving, from an end user, a selection of an entity from the plurality of entities; receiving, via a user interface, updated entity information about the selected entity; storing the updated entity information in a first dataset; receiving, in the map application, a query; based on the received query, executing a search across at least the first dataset; and displaying, within the map application, the generated search results.
- 35. A computer-implemented method comprising:
displaying a map having a plurality of entities; receiving, from an end user, a selection of an entity from the plurality of entities; based on receiving the selection of the entity, displaying a user interface configured to receive attributes about the selected entity from the end user; receiving, via the user interface, updated entity information about the entity; storing the updated entity information in a first dataset of a plurality of datasets; receiving a query; identifying the first dataset from the plurality of datasets; based on the received query, executing a search across at least the first dataset; displaying the generated search results.
This application is a Continuation of U.S. patent application Ser. No. 12/615,289, filed on Nov. 10, 2009, and entitled “CUSTOM LOCAL SEARCH”. The entirety of this application is incorporated herein by reference.
Local search tools accessible by way of a browser are commonplace on the Internet. A user of such a tool directs the browser to the web page of the local search tool by providing the browser with the Uniform Resource Locator (URL) of that web page. The user then provides the local search tool with a location of interest, which typically is the name of a city. Thereafter, the user can issue a query to the search tool, wherein the query is the name of a business/service or a type of business/service (e.g., “pizza restaurants”). Typically, a search tool is configured to provide the user with a list of relevant organizations and corresponding contact data, such as telephone numbers and addresses of the organizations. The search tool searches over and returns what is referred to herein as “yellow page data.”
Typically, search tools receive yellow page data from consolidators who, in turn, generally rely upon specialized yellow page data collecting firms. The aggregated data is then treated as a single pool of data about organizations, and searches are performed over this pool. The content of the yellow page data, therefore, is limited to that provided by the consolidators and/or yellow page data collecting firms. Additionally, the yellow page data is structured such that only queries of a limited type and/or structure can be executed over it. Still further, misspelled locations or services and/or otherwise imprecise queries can cause incorrect results, or no results, to be returned to the user.
The following is a brief summary of subject matter that is described in greater detail herein. This summary is not intended to be limiting as to the scope of the claims.
Described herein are various technologies pertaining to local search in general, and custom local search in particular. Local search refers to searching for certain entities/services/attributes over a particular geographic location, such as a city, a street, an intersection of streets, etc. Custom local search refers to searching for certain entities/services/attributes over a particular geographic region, wherein the entities/services/attributes are included in custom content. Custom content refers to content provided by an independent entity, such as an end user, a community group, or other suitable person/group that is typically not associated with providing yellow page data to search engines.
In an example, a community group may wish to provide a searchable site that maintains a user-provided list of pedestrian-friendly features in a particular city. Such features may be, for instance, jogging trails, water fountains, playgrounds, etc. This data (referred to herein as “custom data”) may be provided to a system that facilitates searching over such data in a particular format: each entity (in this example) may comprise a shape and an attribute that describes the entity. The shape can be indicative of a geographic region pertaining to the entity (e.g., may be a polygon describing a location of the entity on a map). The system that receives the data may perform one or more preprocessing actions to make the data searchable by others, including generating precomputed indexes as well as attribute footprints that describe the union of geometries of entities sharing a particular attribute. The data can be placed in the form of a logically independent dataset, and the system can update the dataset (including the indexes and footprints) when additional data is received from the community group. The system described herein supports searching over multiple such datasets, which are referred to herein as “custom datasets”.
The system can additionally support or have access to one or more contextual datasets, which can comprise detailed geographic data (e.g., street-level geographic data). This geographic data can be vector data, which may include precise geometry of streets, landmarks, localities, boundaries, etc. A free-form textual query configured to retrieve data pertaining to a geographic location can be received, and such textual query can be parsed such that different subsequences of tokens in the query are mapped to different sets of entities in the datasets (the custom dataset(s) and the contextual dataset(s)). The subsequences can be mapped to entities in the dataset(s) utilizing any suitable technique; however, subsequences of tokens may not be mapped simultaneously to different entities in different datasets. Spatial processing can be undertaken to determine a substantially optimal interpretation of the query, and custom local results can be returned to the issuer of the query.
In an example, the query may be “Seattle water fountains near 4th ave. and Pike St.” “Seattle” can map to boundary data in the contextual dataset, “4th ave.” may map to certain vector data in the contextual dataset, and “Pike St.” may map to other certain vector data in the contextual dataset. “Water fountains” may map to a set of entities that have an attribute “fountain” in the custom dataset. Using spatial processing (e.g., locating the intersection of 4th Avenue and Pike Street in the contextual dataset), subsequences of query tokens can be mapped to particular entities in the datasets, and results can be displayed to the issuer of the query on a display screen of a computing device.
Other aspects will be appreciated upon reading and understanding the attached figures and description.
Various technologies pertaining to local search in general and custom local search in particular will now be described with reference to the drawings, where like reference numerals represent like elements throughout. In addition, several functional block diagrams of example systems are illustrated and described herein for purposes of explanation; however, it is to be understood that functionality that is described as being carried out by certain system components may be performed by multiple components. Similarly, for instance, a component may be configured to perform functionality that is described as being carried out by multiple components.
With reference to
The system 100 comprises at least one contextual dataset 102, which can include detailed geographic data. For example, the contextual dataset 102 may comprise detailed geographic vector data, such as precise geometry of streets, landmarks, localities, boundaries, amongst other data. Furthermore, the contextual dataset 102 may include three-dimensional data (e.g., that describes buildings, rooms, . . . ). The system 100 can additionally include one or more custom datasets 104a-b (referred to herein individually and collectively as custom datasets 104) that can be generated by many different individuals or groups. The custom datasets 104 can include data pertaining to one or more geographic regions, wherein the data is provided by an end user or a group of end users. For example, a group of end users may wish to provide a list of pedestrian-friendly features in a city. In another example, an organization may wish to provide local search over sites in a database maintained by the organization, such that free-form text queries that are robust with respect to spelling variations can be executed, such as “Seatle Main Street scluptures” (Seattle and sculptures are misspelled).
The system 100 further includes a receiver component 106 that receives a query from an individual or program that wishes to perform a search over data in one or more of the custom datasets 104 with respect to location data in the contextual dataset 102. The query can be a free-form text query that comprises an indication of a geographic region pertaining to the query and a portion of the query that describes entities of interest. Some example forms that queries can take are shown in the following example queries: “Fort ruins along the river Ganges”, “Ganesha Temples in Malleshwaram Bangalore”, “Seattle kid friendly parks”, “Seattle water fountains near 4th ave and Pike St”, and “I-405 north webcams in Kirkland.” Moreover, prior to processing the query, the query can be translated and/or transliterated. For instance, the query can be submitted in a first language and at least a portion of the query can be translated to a second language and thereafter processed.
Each term of the query can be referred to as a token. Thus, six tokens exist in the query “Fort ruins along the river Ganges” (Fort, ruins, along, the, river, Ganges). A subsequence of tokens of the query can be any contiguous run of tokens in such a query; thus, a subsequence of tokens may be “Fort ruins”. A subsequence of tokens may also consist of a single token.
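The enumeration of subsequences can be sketched in Python. This is a minimal illustration, assuming that tokens are obtained by splitting on whitespace (the description does not specify a tokenizer); the function name `token_subsequences` is hypothetical. A query of n tokens has n(n+1)/2 contiguous subsequences, so the six-token example yields 21.

```python
def token_subsequences(query: str) -> list[tuple[int, int, str]]:
    """Return every contiguous run of tokens as (start, end, text), end inclusive."""
    tokens = query.split()  # assumption: plain whitespace tokenization
    spans = []
    for i in range(len(tokens)):
        for j in range(i, len(tokens)):
            spans.append((i, j, " ".join(tokens[i : j + 1])))
    return spans


subs = token_subsequences("Fort ruins along the river Ganges")
print(len(subs))
print(subs[1])
```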
An interpreter component 108 is in communication with the receiver component 106 and can parse the query such that a first subsequence of tokens in the query is mapped to a first entity in the contextual dataset 102 and a second subsequence of tokens in the query is mapped to a second entity in one or more of the custom datasets 104. The first and second subsequences do not share any tokens. Entities in the datasets, as will be described in greater detail below, can include a shape that is indicative of a location pertaining to the entity and one or more attributes that describe the entity. Thus, for instance, an entity representative of a water fountain in the custom dataset 104 may include a geographic shape (e.g., defined by latitude/longitude coordinates) and an attribute “drinking fountain”. An entity representative of a street (e.g., Main Street in Seattle) can comprise a shape indicative of the location of the street (e.g., defined by a series of latitude/longitude coordinates) and an attribute “Main Street”. Continuing with these examples, if the query received by the receiver component 106 is “Water Fountains on Main Street in Seattle”, the interpreter component 108 can map the subsequence of tokens “Water Fountains” to at least one entity in one of the custom datasets 104 that has the attribute “drinking fountain” assigned thereto, and can map the subsequence of tokens “Main Street Seattle” to at least one entity in the contextual dataset 102 with the attribute “Main Street” assigned thereto.
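A minimal sketch of the entity model described above, assuming for simplicity that every shape is approximated by an axis-aligned bounding box `(min_lon, min_lat, max_lon, max_lat)` rather than a polygon or polyline. The `Entity` class, the `boxes_overlap` helper, the identifiers, and the coordinates are hypothetical illustrations, not values from any real dataset.

```python
from dataclasses import dataclass, field

# Hypothetical simplification: a shape is a bounding box
# (min_lon, min_lat, max_lon, max_lat) instead of a full geometry.
BBox = tuple[float, float, float, float]


@dataclass(frozen=True)
class Entity:
    uid: str                      # unique identifier
    shape: BBox                   # approximate geographic extent
    attributes: frozenset[str] = field(default_factory=frozenset)


def boxes_overlap(a: BBox, b: BBox) -> bool:
    """True if two bounding boxes share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


# A custom-dataset entity and a contextual-dataset entity (made-up coordinates).
fountain = Entity("cust-1", (-122.3400, 47.6100, -122.3399, 47.6101),
                  frozenset({"drinking fountain"}))
main_st = Entity("ctx-42", (-122.3500, 47.6050, -122.3300, 47.6120),
                 frozenset({"Main Street"}))
print(boxes_overlap(fountain.shape, main_st.shape))
```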
The interpreter component 108, as will be described in greater detail below, can undertake such mapping numerous times with respect to various different subsequences of tokens and attributes in the contextual dataset 102 and the custom datasets 104. These different mappings can be referred to herein as query interpretations. The query interpretations can be refined and ranked. A most highly ranked query interpretation can be utilized to provide results to the issuer of the query. For example, the results may include a map of a region (e.g., from the contextual dataset 102) with graphical indicia thereon describing elements pertaining to the query (e.g., from the custom datasets 104). In another example, the results may include a list of entities corresponding to a certain geographic region. Other manners of organizing and presenting search results to the issuer of the query are contemplated and are intended to fall under the scope of the hereto-appended claims.
Turning now to
The custom dataset 104 may also comprise an index 211 and an attribute footprint 211a that can be computed offline (e.g., prior to execution of a query). For each attribute in the custom dataset 104, an attribute footprint can be computed, wherein an attribute footprint is an approximate representation of the union of geometries of all entities in the custom dataset 104 that share that attribute. The index 211 can comprise an entity lookup index that supports efficient lookup of multiple entities by attribute. The index 211 can also comprise a fuzzy text index that supports approximate lookup of attributes (e.g., so that misspellings can be taken into consideration).
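The offline structures can be sketched as follows, under the same bounding-box simplification. Each attribute footprint is approximated by the smallest box enclosing every entity that carries the attribute, and Python's standard `difflib.get_close_matches` stands in for the fuzzy text index; the description does not name a particular fuzzy-matching technique, so this substitution and the helper names (`build_indexes`, `fuzzy_attributes`) are assumptions.

```python
import difflib
from collections import defaultdict

BBox = tuple[float, float, float, float]


def union_box(a: BBox, b: BBox) -> BBox:
    """Smallest bounding box enclosing both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def build_indexes(entities):
    """entities: iterable of (uid, bbox, attributes). Returns (lookup, footprints)."""
    lookup = defaultdict(list)   # attribute -> [uid, ...] (entity lookup index)
    footprints = {}              # attribute -> box approximating the union of geometries
    for uid, box, attrs in entities:
        for attr in attrs:
            lookup[attr].append(uid)
            footprints[attr] = union_box(footprints[attr], box) if attr in footprints else box
    return dict(lookup), footprints


def fuzzy_attributes(term, lookup, cutoff=0.6):
    """Approximate attribute lookup that tolerates misspellings."""
    return difflib.get_close_matches(term.lower(), list(lookup), n=3, cutoff=cutoff)


entities = [
    ("f1", (0, 0, 1, 1), {"fountain"}),
    ("f2", (5, 5, 6, 6), {"fountain"}),
    ("s1", (2, 2, 3, 3), {"sculpture"}),
]
lookup, footprints = build_indexes(entities)
print(footprints["fountain"], fuzzy_attributes("Scluptures", lookup))
```

With this index, the misspelled term from the earlier example, “scluptures”, resolves to the attribute `sculpture`.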
The contextual dataset 102 also comprises a plurality of entities 212-214 (a first entity through a Pth entity). As described above, the contextual dataset 102 may be a relatively large dataset that comprises detailed street-level data. Each of the entities 212-214 in the contextual dataset 102 can comprise a shape 216, 216a, attributes 218, 218a, and unique identifiers 220, 220a. In the contextual dataset 102, the shapes 216 and 216a can describe shape of a street, monument, boundary line, etc. The attributes can be names, descriptors, or the like. Moreover, the contextual dataset 102 can comprise at least one attribute footprint 221 and an index 221a (similar to the index 211 and attribute footprint 211a in the custom dataset 104). As can be ascertained from reviewing
Now referring to
The query 300 can also include a text query 304, which can include embedded references to attributes (names) from custom datasets and/or contextual datasets. These embedded references can be in any suitable order and can include misspellings and “nearness” operators, such as “x near y”. Furthermore, the text query 304 may include other operators that describe geographic relationships, such as “along”, “within X minutes of”, or “behind”.
The query 300 can also include a geometric scope 306, which indicates a geographic region pertaining to the query 300. For example, the geometric scope 306 could be or include a name of a city, a name of a state, a name of a street or intersection, a name of a boundary, a name of a monument that pertains to a certain geographic region, etc. In another example, the geometric scope 306 can be inferred based upon a current view of a map provided to the user.
With reference now to
The interpreter component 108 can further include a refinement component 404 that can perform query interpretation refinement. During query interpretation refinement, the refinement component 404 can identify specific sets of entities that have attributes (that were identified in the previous phase) and that intersect the approximate region of interest identified in the previous phase. This information can be used to construct precise regions of interest, taking user-supplied per-query geometric parameters into account.
The interpreter component 108 can additionally comprise a ranker component 406 that ranks multiple possible refined query interpretations. The ranker component 406 can take into consideration various factors, including textual similarity between terms in the query and terms of matched entities, as well as factors computed from the specific entities that make up a result set. Operation of the query interpreter component 402, the refinement component 404, and the ranker component 406 is now described in greater detail.
With more detail pertaining to the query interpreter component 402, such component 402 can receive a query and can output one or more partial query interpretations (PIs). Turning briefly to
Referring now to
A query subsequence $q_{i..j}$ considered by the query interpreter component 402 represents a list of contiguous tokens $(q_i, q_{i+1}, \ldots, q_j)$ of the query $Q$. The query interpreter component 402 can generate a plurality of lists of what can be referred to herein as “match candidates” (MCs), one list for each dataset considered (e.g., one for each of the custom datasets 104 under consideration and one for the contextual dataset 102). Each MC $(q_{i..j}, \mathit{name}, D)$ maps the query subsequence $q_{i..j}$ to an attribute name in the dataset $D$. MCs are candidate PI mappings, such as that shown in
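Generating an MC list for one dataset can be sketched as below. It is a minimal, brute-force illustration under stated assumptions: every contiguous subsequence is compared against every attribute name using `difflib.SequenceMatcher.ratio()` as the fuzzy-text match score, and pairs scoring at least a hypothetical threshold of 0.85 become MCs. A real fuzzy text index would avoid comparing every subsequence with every name; `MatchCandidate` and `match_candidates` are hypothetical names.

```python
from difflib import SequenceMatcher
from typing import NamedTuple


class MatchCandidate(NamedTuple):
    start: int      # index of the first query token in the subsequence
    end: int        # index of the last query token (inclusive)
    name: str       # matched attribute name in the dataset
    dataset: str    # dataset identifier D
    score: float    # fuzzy-text match score in [0, 1]


def match_candidates(tokens, attribute_names, dataset, threshold=0.85):
    """Map every contiguous subsequence q_{i..j} to attribute names in one dataset."""
    mcs = []
    for i in range(len(tokens)):
        for j in range(i, len(tokens)):
            text = " ".join(tokens[i : j + 1]).lower()
            for name in attribute_names:
                score = SequenceMatcher(None, text, name.lower()).ratio()
                if score >= threshold:
                    mcs.append(MatchCandidate(i, j, name, dataset, score))
    return mcs


tokens = "Seattle water fountains near Pike St".split()
cust = match_candidates(tokens, ["water fountain"], "custom")
ctx = match_candidates(tokens, ["Seattle", "Pike St"], "context")
print(cust)
print(ctx)
```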
The query interpreter component 402 may then select an MC pertaining to one of the custom datasets 104, which can be referred to as CustMC. The query interpreter component 402 may then filter the MCs pertaining to the contextual dataset 102 such that only those that are compatible with the CustMC are retained (incompatible MCs are not considered). To be compatible, the CustMC and an MC pertaining to the contextual dataset 102 must spatially overlap and must be derived from non-overlapping subsequences of the query Q. This filtering yields an anchored MC list for each CustMC (the anchored MC list comprises the MCs pertaining to the contextual dataset 102 that are compatible with the CustMC).
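The compatibility filter that produces an anchored MC list can be sketched as follows, again approximating footprints by bounding boxes. The `MC` tuple layout, the helper names, and the sample footprints are hypothetical.

```python
from typing import NamedTuple

BBox = tuple[float, float, float, float]


class MC(NamedTuple):
    start: int        # first query-token index covered
    end: int          # last query-token index covered (inclusive)
    name: str         # matched attribute name
    footprint: BBox   # attribute footprint, simplified to a bounding box


def boxes_overlap(a: BBox, b: BBox) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def spans_disjoint(a: MC, b: MC) -> bool:
    """True if the two MCs were derived from non-overlapping query subsequences."""
    return a.end < b.start or b.end < a.start


def anchored_mc_list(cust_mc: MC, context_mcs: list[MC]) -> list[MC]:
    """Keep contextual MCs that spatially overlap cust_mc and share no tokens with it."""
    return [mc for mc in context_mcs
            if spans_disjoint(cust_mc, mc) and boxes_overlap(cust_mc.footprint, mc.footprint)]


anchor = MC(1, 2, "water fountain", (0, 0, 10, 10))
context = [
    MC(0, 0, "Seattle", (-5, -5, 20, 20)),      # compatible
    MC(4, 5, "Pike St", (3, 3, 4, 4)),          # compatible
    MC(2, 2, "Fountain Ave", (1, 1, 2, 2)),     # shares token 2 with the anchor
    MC(4, 5, "Pike Rd", (50, 50, 51, 51)),      # footprint does not overlap
]
print([mc.name for mc in anchored_mc_list(anchor, context)])
```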
The query interpreter component 402 can utilize the anchored MC list in connection with generating a PI. Given the anchored MC list, an algorithm can be called that starts with an empty PI, an empty solution set pertaining to the query, and the approximate region of interest (the focus). The focus can be a large region (e.g., the whole world) or a smaller region provided by the issuer of the query. The query interpreter component 402 can order the MCs in the anchored MC list in order of decreasing “promise” (e.g., in order of decreasing fuzzy-text match score). In other words, MCs in the anchored MC list with attributes that match more closely with subsequences in the query can be placed earlier on the list. It is to be understood that other orderings are contemplated and intended to fall under the scope of the hereto-appended claims.
Thereafter, MCs in the anchored MC list can be added, one at a time, to the current PI (which is initially empty). For each MC in the anchored MC list, the query interpreter component 402 can compute three things: a new partial interpretation that adds the MC under consideration to the current PI; a new focus that is the spatial intersection of the footprint of that MC and the old focus; and a new anchored MC list from which incompatible MCs have been filtered. Computing the spatial intersection can result in a narrower focus. The query interpreter component 402 computes the new anchored MC list by removing, from the existing anchored MC list, MCs that are either textually incompatible or spatially incompatible with the current PI. An MC is spatially incompatible if its footprint does not overlap the newly computed focus. Two MCs are textually incompatible if their matched subsequences share one or more tokens of the input query. If the new anchored MC list is empty, the partial interpretation cannot be expanded further, implying that a viable interpretation has been discovered; this partial interpretation can be added to the solution set of possible partial interpretations of the query. If the new anchored MC list still includes MCs, then a next MC can be added to the current PI and the process can repeat. Moreover, this process can be repeated with each MC from the custom datasets pertaining to the query serving as the anchor. Therefore, the query interpreter component 402 can compute a plurality of PIs with respect to the query.
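The expansion loop can be sketched as a recursive depth-first search. This is a minimal sketch under the bounding-box assumption, not the described system's implementation; `interpret`, `expand`, and the `MC` fields are hypothetical names. Because each call branches on every remaining candidate in turn, it also yields interpretations in which an earlier candidate was skipped, which corresponds to the variant of the depth-first exploration in which an MC is dropped; the ranker can then prefer fuller interpretations.

```python
from typing import NamedTuple, Optional

BBox = tuple[float, float, float, float]


class MC(NamedTuple):
    start: int        # first query-token index covered by the MC
    end: int          # last query-token index covered (inclusive)
    name: str         # matched attribute name
    footprint: BBox   # attribute footprint, simplified to a bounding box
    score: float      # fuzzy-text match score, used as the "promise" ordering


def intersect(a: BBox, b: BBox) -> Optional[BBox]:
    """Intersection of two boxes, or None if they do not overlap."""
    box = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    return box if box[0] <= box[2] and box[1] <= box[3] else None


def disjoint(a: MC, b: MC) -> bool:
    """True if the two MCs share no query tokens."""
    return a.end < b.start or b.end < a.start


def expand(pi, focus, anchored, solutions):
    """Grow one partial interpretation (PI) depth-first."""
    if not anchored:
        # No compatible MCs remain to try: record the PI and its final focus.
        solutions.append((tuple(pi), focus))
        return
    for k, mc in enumerate(anchored):
        new_focus = intersect(focus, mc.footprint)
        if new_focus is None:
            continue
        new_pi = pi + [mc]
        # Keep only later MCs that are textually and spatially compatible.
        remaining = [m for m in anchored[k + 1:]
                     if all(disjoint(m, p) for p in new_pi)
                     and intersect(new_focus, m.footprint) is not None]
        expand(new_pi, new_focus, remaining, solutions)


def interpret(anchor: MC, context_mcs, initial_focus: BBox):
    """Enumerate PIs that start from one custom-dataset anchor MC."""
    focus = intersect(initial_focus, anchor.footprint)
    if focus is None:
        return []
    anchored = [m for m in context_mcs
                if disjoint(m, anchor) and intersect(focus, m.footprint) is not None]
    anchored.sort(key=lambda m: m.score, reverse=True)  # most promising first
    solutions = []
    expand([anchor], focus, anchored, solutions)
    return solutions


anchor = MC(1, 2, "water fountain", (0, 0, 10, 10), 0.97)
context = [
    MC(0, 0, "Seattle", (-5, -5, 20, 20), 1.0),
    MC(4, 5, "Pike St", (3, 3, 4, 4), 1.0),
    MC(4, 4, "Pike Pl", (8, 8, 9, 9), 0.8),
]
for pi, focus in interpret(anchor, context, (-100, -100, 100, 100)):
    print([m.name for m in pi], focus)
```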
The refinement component 404 can receive the PIs and output query refinements corresponding to the PIs. Specifically, the refinement component 404 can undertake the refinements by identifying entities that make up a refined interpretation. In a refined interpretation, each mapping references a specific entity (and there may be multiple subsets of entities in cases that more than one entity corresponds to an attribute). Referring briefly to
The refinement component 404 can generate refined interpretations as follows. First, the refinement component 404 can build a list of entities for each attribute in the PI. The refinement component 404 can construct each list by looking up, in the corresponding dataset, entities that have the particular attribute and that lie spatially within the focus of the PI; the precomputed spatial (attribute footprint) index can be used to generate such a list. The refinement component 404 can then accumulate subsets of entities from these lists by taking one entity from each list. Each subset includes entities that mutually overlap spatially; the refinement component 404 can detect overlapping entities through utilization of shape information associated with individual entities. The refinement component 404 can also compute a final region of interest pertaining to the query. This can be undertaken by intersecting the shapes of the mutually overlapping entities and growing the resultant shape by a query-specific boundary width. In another example, the region of interest can be computed by growing the final focus by a per-query boundary width. Thereafter, the refinement component 404 can look up custom dataset entities given the computed region of interest. The refinement component 404 can locate candidate unique identifiers from all datasets (including custom datasets) involved in the query and may then filter this list, pruning out entities that lie outside the computed region of interest. In another example, the region of interest can be used directly to look up a separate spatial database (one maintained per custom dataset) to retrieve unique identifiers.
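The core of refinement can be sketched as below, assuming bounding-box shapes. For axis-aligned boxes, a set of boxes overlaps pairwise exactly when their common intersection is non-empty, so intersecting the shapes one at a time both tests mutual overlap and produces the intersected shape that is then grown by the boundary width. The function names and sample coordinates are hypothetical, and a brute-force `itertools.product` over the entity lists stands in for whatever pruning a production system would use.

```python
from itertools import product
from typing import Optional

BBox = tuple[float, float, float, float]


def intersect(a: BBox, b: BBox) -> Optional[BBox]:
    """Intersection of two boxes, or None if they do not overlap."""
    box = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    return box if box[0] <= box[2] and box[1] <= box[3] else None


def grow(box: BBox, width: float) -> BBox:
    """Expand a box by a boundary width on every side."""
    return (box[0] - width, box[1] - width, box[2] + width, box[3] + width)


def refine(entity_lists, boundary_width):
    """entity_lists: one list of (uid, bbox) per PI attribute, already limited to
    the PI's focus. Returns (uids, region_of_interest) for every choice of one
    entity per list whose shapes mutually overlap."""
    if not entity_lists:
        return []
    refined = []
    for combo in product(*entity_lists):
        region = combo[0][1]
        for _, box in combo[1:]:
            region = intersect(region, box)
            if region is None:
                break
        if region is not None:
            refined.append((tuple(uid for uid, _ in combo), grow(region, boundary_width)))
    return refined


def entities_in_region(custom_entities, region):
    """Prune custom-dataset entities that lie outside the region of interest."""
    return [uid for uid, box in custom_entities if intersect(box, region) is not None]


avenue = [("4th-ave", (0, 4, 10, 5))]
street = [("pike-st", (5, 0, 6, 10)), ("pike-st-elsewhere", (50, 0, 51, 10))]
refined = refine([avenue, street], boundary_width=1)
custom = [("f1", (4.5, 3.5, 4.6, 3.6)), ("f2", (20, 20, 21, 21))]
print(refined, entities_in_region(custom, refined[0][1]))
```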
The interpreter component 108 can also include a ranker component 406 that ranks refined interpretations output by the refinement component 404. For instance, the ranker component 406 can utilize heuristics that take into consideration a plurality of factors when ranking refined interpretations, including but not limited to a weighted edit-distance score between query subsequences and matched attributes, the count of (matched attribute, query subsequence) pairs, the fraction of the query that is unmatched, etc. Of course, the ranker component 406 can utilize any suitable technologies in connection with ranking refined interpretations.
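One possible scoring heuristic that combines the three factors listed above is sketched below. The normalization of edit distance by attribute length and the weights `w_edit`, `w_count`, and `w_unmatched` are hypothetical choices for illustration, not values taken from the description.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance computed by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def score(interpretation, num_query_tokens, w_edit=1.0, w_count=0.5, w_unmatched=2.0):
    """interpretation: list of (query_subsequence, start, end, matched_attribute).
    Higher is better; all weights are hypothetical."""
    edit = sum(edit_distance(q.lower(), attr.lower()) / max(len(attr), 1)
               for q, _, _, attr in interpretation)
    covered = sum(end - start + 1 for _, start, end, _ in interpretation)
    unmatched = 1 - covered / num_query_tokens
    return w_count * len(interpretation) - w_edit * edit - w_unmatched * unmatched


# Query: "Seatle water fountains near Pike St" (6 tokens).
full = [("Seatle", 0, 0, "Seattle"), ("water fountains", 1, 2, "water fountain"),
        ("Pike St", 4, 5, "Pike St")]
partial = [("water fountains", 1, 2, "water fountain")]
ranked = sorted([partial, full], key=lambda i: score(i, 6), reverse=True)
print(len(ranked[0]))
```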
The output provided to a user can comprise entities corresponding to a most highly ranked refined interpretation. That is, the set of entities returned to the user can include entities from the custom dataset that fall within the region of interest for the query. The user can also receive contextual data from the contextual dataset that provides context for the entities from the custom dataset.
With reference now to
The system 800 also includes a receiver component 810 that can receive new custom datasets from an end user or group of end users and/or updates to one or more of the datasets 804-808 in the data repository 802. In an example, an end user can access a map application through utilization of a browser, and select (using a mouse) locations on the map that pertain to certain entities. Furthermore, the map application may have a graphical user interface associated therewith that is configured to receive attributes from the end user that describe the entities. The end user may then make a selection (e.g., select a push-button shown on a web page) that causes data to be uploaded from a client computing device used by the end user.
A creator component 812 can receive data uploaded by the end user and can cause a new dataset to be created in the data repository 802. The creator component 812 can allocate space in the data repository 802 to store a new dataset or to update an existing dataset. The creator component 812 can compute a fuzzy text index as well as a spatial footprint for attributes/entities in the new/updated dataset. The spatial footprint(s) and indexes can be stored in connection with their respective datasets for utilization during query interpretation.
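The create-or-update path can be sketched as an in-memory class that recomputes the entity lookup index and attribute footprints after each change. It is a hypothetical simplification (bounding-box footprints, exact attribute lookup, a full rebuild on every update); a production system would more likely update only the affected attributes and would also refresh the fuzzy text index. `CustomDataset` and its methods are illustrative names only.

```python
from collections import defaultdict

BBox = tuple[float, float, float, float]


class CustomDataset:
    """Hypothetical in-memory custom dataset with derived index and footprints."""

    def __init__(self):
        self.entities = {}    # uid -> (bbox, set of lowercase attributes)
        self.index = {}       # attribute -> sorted list of uids
        self.footprints = {}  # attribute -> bbox enclosing those entities

    def upsert(self, uid: str, bbox: BBox, attributes):
        """Create or update an entity, then recompute derived structures."""
        self.entities[uid] = (bbox, {a.lower() for a in attributes})
        self._rebuild()

    def _rebuild(self):
        index, footprints = defaultdict(list), {}
        for uid, (box, attrs) in self.entities.items():
            for attr in attrs:
                index[attr].append(uid)
                old = footprints.get(attr)
                footprints[attr] = box if old is None else (
                    min(old[0], box[0]), min(old[1], box[1]),
                    max(old[2], box[2]), max(old[3], box[3]))
        self.index = {a: sorted(u) for a, u in index.items()}
        self.footprints = footprints

    def search(self, attribute: str):
        return self.index.get(attribute.lower(), [])


ds = CustomDataset()
ds.upsert("p1", (0, 0, 1, 1), ["playground"])
ds.upsert("p2", (4, 4, 5, 5), ["playground"])
ds.upsert("p1", (0, 0, 1, 1), ["water fountain"])  # an end user corrects p1
print(ds.search("playground"), ds.search("Water Fountain"))
```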
With reference now to
Moreover, the acts described herein may be computer-executable instructions that can be implemented by one or more processors and/or stored on a computer-readable medium or media. The computer-executable instructions may include a routine, a sub-routine, programs, a thread of execution, and/or the like. Still further, results of acts of the methodologies may be stored in a computer-readable medium, displayed on a display device, and/or the like.
Referring now to
At 906, a MC is selected from MCLCUST and used as an anchor MC. Selection of this MC can be the beginning of a partial interpretation. A compatible MC from MCLCTXT can then be selected to grow the PI at 908. A compatible MC, in an example, can be an MC that is textually non-overlapping with the anchor MC. At 910, once the compatible MC has been selected, a new focus can be generated, wherein the new focus can be a spatial intersection of an existing focus and the spatial footprint of the compatible MC. An initial focus can be a large region or a smaller region inferred from content of the query or a current view of a map provided to a user.
At 912, MCs in MCLCTXT that are incompatible with the current partial interpretation can be labelled such that they are not considered for the current partial interpretation of the query. In this example, incompatible MCs are those that textually overlap any MC in the current partial interpretation, as well as those that do not spatially intersect the newly generated focus.
At 914, a determination is made if there are any more MCs in MCLCTXT to be considered for the current partial interpretation (if there are any MCs that are compatible with the current partial interpretation). If there are additional MCs in MCLCTXT that are compatible with the current partial interpretation, then the methodology 900 returns to 908.
If there are no more MCs in MCLCTXT to be considered for the current partial interpretation, such partial interpretation can be output as a partial interpretation for the query, and the methodology can proceed to 916. Additionally or alternatively, a recursive depth-first exploration can be undertaken as described above, where a MC is dropped from the MCList. At 916, a determination is made regarding whether there are any other MCs in MCLCUST to be considered as an anchor MC. If there are additional MCs in MCLCUST to be considered as anchor MCs, then the methodology 900 returns to 906. If there are no additional MCs in MCLCUST to consider, then the methodology 900 completes at 918.
With reference now to
At 1010, the subset(s) of entities are output as refined interpretations. At 1012, a determination is made regarding whether any additional partial interpretations are to be refined. If there are additional partial interpretations, then the methodology 1000 returns to 1004. If there are no additional partial interpretations, then the methodology 1000 completes at 1012.
Now referring to
The computing device 1100 additionally includes a data store 1108 that is accessible by the processor 1102 by way of the system bus 1106. The data store 1108 may include executable instructions, contextual datasets, custom datasets, precomputed indexes, query token subsequences, etc. The computing device 1100 also includes an input interface 1110 that allows external devices to communicate with the computing device 1100. For instance, the input interface 1110 may be used to receive instructions from an external computer device, from a user, etc. The computing device 1100 also includes an output interface 1112 that interfaces the computing device 1100 with one or more external devices. For example, the computing device 1100 may display text, images, etc. by way of the output interface 1112.
Additionally, while illustrated as a single system, it is to be understood that the computing device 1100 may be a distributed system. Thus, for instance, several devices may be in communication by way of a network connection and may collectively perform tasks described as being performed by the computing device 1100.
As used herein, the terms “component” and “system” are intended to encompass hardware, software, or a combination of hardware and software. Thus, for example, a system or component may be a process, a process executing on a processor, or a processor. Additionally, a component or system may be localized on a single device or distributed across several devices.
It is noted that several examples have been provided for purposes of explanation. These examples are not to be construed as limiting the hereto-appended claims. Additionally, it may be recognized that the examples provided herein may be permutated while still falling under the scope of the claims.