Determining word boundary likelihoods in potentially incomplete text

  • US 8,930,399 B1
  • Filed: 01/11/2013
  • Issued: 01/06/2015
  • Est. Priority Date: 11/22/2010
  • Status: Active Grant
First Claim

1. A computer implemented method, comprising:

  • accessing stored queries, each query being one or more characters in a first sequence constituting one or more words in a second sequence;

    for each query:

    selecting query sequences from the query, each query sequence being at least a portion of a word n-gram, the word n-gram being a subsequence of up to n words selected from the second sequence of words of the query, and for each selected query sequence:

    determining a query sequence key for the selected query sequence;

    determining a word boundary likelihood that represents a likelihood that the selected query sequence terminates at a word boundary, the word boundary likelihood being based on a second likelihood that query sequences that are the same as the selected query sequence are one of an end portion of a completed query or a portion of a query sequence that includes a space character as a next character, wherein the second likelihood is based on a word boundary count for the query sequence, the word boundary count being based on a number of the queries for which the query sequence includes a space character as a next character; and

    associating, in a data storage device, the word boundary likelihood with the query sequence key.
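
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and several details are assumptions made only for illustration: the function names (`sequence_key`, `query_sequences`, `word_boundary_likelihoods`), the use of the normalized sequence text as the query sequence key, an in-memory dictionary standing in for the data storage device, and the likelihood being computed as a simple ratio of the word boundary count to the number of queries containing the sequence. The claim only states that the likelihood is "based on" that count.

```python
from collections import defaultdict


def sequence_key(sequence):
    # Hypothetical key: the normalized sequence text itself.
    # A hash of the text could be substituted here.
    return sequence


def query_sequences(query, n=2):
    """Map each character prefix of each word n-gram (up to n words) in the
    query to True if any occurrence is followed by a space or the query end."""
    words = query.lower().split()
    text = " ".join(words)
    # Character offset at which each word starts in the normalized text.
    starts, pos = [], 0
    for word in words:
        starts.append(pos)
        pos += len(word) + 1
    seen = {}
    for i, start in enumerate(starts):
        last = min(i + n, len(words)) - 1
        ngram_end = starts[last] + len(words[last])
        for end in range(start + 1, ngram_end + 1):
            # Boundary: the sequence ends the query or a space comes next.
            at_boundary = end == len(text) or text[end] == " "
            seq = text[start:end]
            seen[seq] = seen.get(seq, False) or at_boundary
    return seen


def word_boundary_likelihoods(queries, n=2):
    """Return {key: likelihood}; each query counts at most once per sequence."""
    totals = defaultdict(int)      # queries containing the sequence
    boundaries = defaultdict(int)  # assumed word boundary count
    for query in queries:
        for seq, at_boundary in query_sequences(query, n).items():
            key = sequence_key(seq)
            totals[key] += 1
            boundaries[key] += int(at_boundary)
    # Assumed formula: boundary count divided by total occurrences.
    return {key: boundaries[key] / totals[key] for key in totals}
```

For the stored queries "new york", "new york hotels", "newark airport" and "news", the sequence "new" appears in all four queries but is followed by a space in only two, so this sketch assigns it a likelihood of 0.5, while "new york" receives 1.0 and "ne" receives 0.0.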
