Smart selection of text spans

US 9,436,918 B2
Filed: 04/04/2014
Issued: 09/06/2016
Est. Priority Date: 10/07/2013
Status: Active Grant

Abstract

A text span forming either a single word or a series of two or more words that a user intended to select is predicted. A document and a location pointer that indicates a particular location in the document are received and input to different candidate text span generation methods. A ranked list of one or more scored candidate text spans is received from each of the different candidate text span generation methods. A machine-learned ensemble model is used to re-score each of the scored candidate text spans that is received from each of the different candidate text span generation methods. The ensemble model is trained using a machine learning method and features from a dataset of true intended user text span selections. A ranked list of re-scored candidate text spans is received from the ensemble model.
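
The flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the two generators (`single_word`, `word_and_next`) and the `rescore` function are hypothetical stand-ins, and the re-scoring weights are set by hand rather than learned from a dataset of true user selections.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Candidate:
    start: int    # inclusive character offset into the document
    end: int      # exclusive character offset
    score: float  # score assigned by the generator
    source: str   # name of the generator that proposed the span


Generator = Callable[[str, int], List[Candidate]]


def word_at(doc: str, offset: int) -> Tuple[int, int]:
    """Return (start, end) of the whitespace-delimited word around offset."""
    start = offset
    while start > 0 and not doc[start - 1].isspace():
        start -= 1
    end = offset
    while end < len(doc) and not doc[end].isspace():
        end += 1
    return start, end


def single_word(doc: str, offset: int) -> List[Candidate]:
    """Hypothetical heuristic generator: the touched word is the selection."""
    start, end = word_at(doc, offset)
    return [Candidate(start, end, 1.0, "single_word")]


def word_and_next(doc: str, offset: int) -> List[Candidate]:
    """Hypothetical generator: the touched word plus the following word."""
    start, end = word_at(doc, offset)
    nxt = end
    while nxt < len(doc) and doc[nxt].isspace():
        nxt += 1
    if nxt < len(doc):
        _, end = word_at(doc, nxt)
    return [Candidate(start, end, 0.8, "word_and_next")]


def rescore(doc: str, candidates: List[Candidate]) -> List[Tuple[str, float]]:
    """Stand-in for the ensemble model (hand-set weights, NOT trained)."""
    votes: Dict[Tuple[int, int], int] = {}
    for c in candidates:
        votes[(c.start, c.end)] = votes.get((c.start, c.end), 0) + 1
    best: Dict[Tuple[int, int], float] = {}
    for c in candidates:
        key = (c.start, c.end)
        new_score = 0.5 * c.score + 0.5 * votes[key]
        best[key] = max(best.get(key, 0.0), new_score)
    ranked = [(doc[s:e], score) for (s, e), score in best.items()]
    return sorted(ranked, key=lambda item: -item[1])


def predict_spans(doc: str, offset: int,
                  generators: List[Generator]) -> List[Tuple[str, float]]:
    """Collect candidates from every generator, then re-score and rank them."""
    candidates: List[Candidate] = []
    for generate in generators:
        candidates.extend(generate(doc, offset))
    return rescore(doc, candidates)


ranked = predict_spans("Visit New York today", 7, [single_word, word_and_next])
```

Here `ranked` holds one entry per distinct span, highest re-scored candidate first. A real ensemble model would replace `rescore` with a model trained on features of each candidate.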

201 Citations

20 Claims

1. A computer-implemented process for predicting a text span forming either a single word or a series of two or more words that a user intended to select, comprising:
- using a computer to perform the following process actions:
  
  receiving a document comprising a string of characters;
  
  receiving a location pointer indicating a particular location in the document;
  
  inputting the document and the location pointer to a plurality of different candidate text span generation methods;
  
  receiving a ranked list of one or more scored candidate text spans from each of the different candidate text span generation methods;
  
  using a machine-learned ensemble model to re-score each of the scored candidate text spans received from each of the different candidate text span generation methods, the ensemble model being trained using a machine learning method and features from a dataset of true intended user text span selections; and
  
  receiving a ranked list of re-scored candidate text spans from the ensemble model.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)
  - 2. The process of claim 1, wherein the location pointer comprises a character offset indicating a specific character that the user selected in the document.
  - 3. The process of claim 1, further comprising the actions of:
    - identifying the candidate text span in the ranked list of re-scored candidate text spans having the highest score; and
      
      displaying said identified candidate text span to the user as a prediction of the text span that they intended to select.
  - 4. The process of claim 3, wherein said identified candidate text span comprises a phrase comprising two or more words.
  - 5. The process of claim 1, further comprising the actions of:
    - identifying two or more of the candidate text spans in the ranked list of re-scored candidate text spans having the highest scores; and
      
      displaying said identified candidate text spans to the user as proposed predictions of the text span they intended to select.
  - 6. The process of claim 1, wherein the different candidate text span generation methods comprise either:
    - a plurality of different linguistic unit detector methods;
      
      or a plurality of different heuristic methods;
      
      or a combination of one or more different linguistic unit detector methods and one or more different heuristic methods.
  - 7. The process of claim 6, wherein the location pointer identifies a word that the user selected in the document, and the different linguistic unit detector methods comprise a hyperlink intent model method which uses a machine-learned hyperlink intent model to identify candidate text spans that subsume said identified word.
  - 8. The process of claim 6, wherein the location pointer identifies a word that the user selected in the document, and the different linguistic unit detector methods comprise one or more different named entity recognizer methods each of which identifies a candidate text span comprising a named entity that subsumes said identified word.
  - 9. The process of claim 6, wherein the location pointer identifies a word that the user selected in the document, and the different linguistic unit detector methods comprise one or more different noun phrase detector methods each of which identifies candidate text spans comprising noun phrases that subsume said identified word.
  - 10. The process of claim 6, wherein the location pointer identifies a word that the user selected in the document, and the different linguistic unit detector methods comprise a knowledge base lookup method which uses a Web graph to identify candidate text spans comprising either named entities that subsume said identified word, or noun phrases that subsume said identified word, or concepts that subsume said identified word, the Web graph comprising information from one or more different knowledge bases.
  - 11. The process of claim 6, wherein the location pointer identifies a word that the user selected in the document, and the different heuristic methods comprise a heuristic which assumes that said identified word is the text span that the user intended to select.
  - 12. The process of claim 6, wherein the location pointer identifies a word that the user selected in the document, and the different heuristic methods comprise a capitalization-based heuristic which, whenever said identified word is capitalized, evaluates the string of characters to the left of said identified word and the string of characters to the right of said identified word, and expands said identified word to the longest possible uninterrupted sequence of capitalized words.
  - 13. The process of claim 1, wherein the dataset of true intended user text span selections is either, (a) constructed using a large-scale crowd-sourcing method, or (b) augmented with a testset of simulated user text span selections, or (c) both (a) and (b).
  - 14. The process of claim 1, wherein the computer is touch-enabled and comprises a touch-sensitive display screen, the document is displayed on said screen, and the location pointer is generated by the user touching said screen on top of the particular location in the document.
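
As an illustration of the capitalization-based heuristic in claim 12, combined with the character-offset pointer of claim 2, the following sketch expands a touched capitalized word to the longest run of adjacent capitalized words. The tokenization rule (`\w+`) and the decision that punctuation between words interrupts a run are assumptions made for this example; the claim does not specify either.

```python
import re
from typing import List, Optional, Tuple


def tokenize(doc: str) -> List[Tuple[int, int]]:
    """Return (start, end) character offsets of each word (assumed rule: \\w+)."""
    return [(m.start(), m.end()) for m in re.finditer(r"\w+", doc)]


def is_capitalized(doc: str, token: Tuple[int, int]) -> bool:
    """A word counts as capitalized if its first character is uppercase."""
    return doc[token[0]].isupper()


def capitalized_span(doc: str, offset: int) -> Optional[str]:
    """Expand the word at offset to the longest uninterrupted capitalized run.

    Returns None when the offset does not fall inside a word. If the touched
    word is not capitalized, the word itself is returned unchanged.
    """
    tokens = tokenize(doc)
    idx = next((i for i, (s, e) in enumerate(tokens) if s <= offset < e), None)
    if idx is None:
        return None

    lo = hi = idx
    if is_capitalized(doc, tokens[idx]):
        # Assumption: only whitespace may separate words in an uninterrupted run.
        while (lo > 0 and is_capitalized(doc, tokens[lo - 1])
               and doc[tokens[lo - 1][1]:tokens[lo][0]].isspace()):
            lo -= 1
        while (hi + 1 < len(tokens) and is_capitalized(doc, tokens[hi + 1])
               and doc[tokens[hi][1]:tokens[hi + 1][0]].isspace()):
            hi += 1
    return doc[tokens[lo][0]:tokens[hi][1]]


sentence = "She moved to New York City, Then left."
span = capitalized_span(sentence, sentence.index("York"))
```

With this input, touching "York" expands left to "New" and right to "City", and stops before "Then" because the comma breaks the run under the stated assumption.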

15. A computer-implemented process for predicting a text span forming either a single word or a series of two or more words that a user intended to select, comprising:
- using a computer to perform the following process actions:
  
  receiving a document comprising a string of characters;
  
  receiving a location pointer indicating a particular location in the document;
  
  inputting the document and the location pointer to a machine-learned hyperlink intent model; and
  
  receiving a ranked list of scored candidate text spans from the hyperlink intent model.
- Dependent claims (16, 17, 18, 19)
  - 16. The process of claim 15, wherein the location pointer identifies a word that the user selected in the document, and the action of receiving a ranked list of scored candidate text spans from the hyperlink intent model comprises the actions of:
    - (a) assigning said identified word to be a current candidate text span;
      
      (b) evaluating the expansion of the current candidate text span one word to the left thereof, said evaluation comprising the actions of using the hyperlink intent model and a leftward binary classifier to score said leftward expansion, and storing said leftward expansion and its score in the ranked list of scored candidate text spans;
      
      (c) evaluating the expansion of the current candidate text span one word to the right thereof, said evaluation comprising the actions of using the hyperlink intent model and a rightward binary classifier to score said rightward expansion, and storing said rightward expansion and its score in the ranked list of scored candidate text spans;
      
      (d) selecting the greater of the score for expanding the current candidate text span one word to the left thereof and the score for expanding the current candidate text span one word to the right thereof;
      
      (e) whenever said selected score is greater than a prescribed threshold, assigning the expansion corresponding to said selected score to be the current candidate text span, and repeating actions (b)-(e).
  - 17. The process of claim 16, wherein the leftward binary classifier uses logistic regression and a leftward set of features comprising features which are computed over the current candidate text span, features which are computed over the one word to the left of the current candidate text span, and features which are computed over another word that is immediately to the left of said one word to the left, and the rightward binary classifier uses logistic regression and a rightward set of features comprising features which are computed over the current candidate text span, features which are computed over the one word to the right of the current candidate text span, and features which are computed over another word that is immediately to the right of said one word to the right.
  - 18. The process of claim 15, wherein the hyperlink intent model is trained using a set of training data that is automatically generated from anchor texts which are randomly sampled from a knowledge base, said training data comprising both positive training examples and negative training examples.
  - 19. The process of claim 15, further comprising the actions of:
    - using a machine-learned ensemble model to re-score each of the scored candidate text spans, the ensemble model being trained using a machine learning method and features from a dataset of true intended user text span selections; and
      
      receiving a ranked list of re-scored candidate text spans from the ensemble model.
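
The iterative expansion in actions (a) through (e) of claim 16 can be sketched as a greedy loop. In this illustration the two scorers are toy placeholders for the leftward and rightward logistic-regression classifiers of claim 17; the names `expand_span`, `toy_left`, and `toy_right`, the default threshold of 0.5, and the handling of document boundaries are all assumptions for the example.

```python
from typing import Callable, Dict, List, Tuple

Span = Tuple[int, int]  # inclusive word indices (lo, hi)
Scorer = Callable[[List[str], int, int], float]


def expand_span(words: List[str], idx: int, score_left: Scorer,
                score_right: Scorer, threshold: float = 0.5
                ) -> Tuple[Span, List[Tuple[Span, float]]]:
    """Greedily grow a span around words[idx], one word at a time."""
    lo = hi = idx                       # (a) the touched word is the start
    scored: Dict[Span, float] = {}      # every expansion that was evaluated
    while True:
        options: List[Tuple[float, Span]] = []
        if lo > 0:                      # (b) score the leftward expansion
            s = score_left(words, lo, hi)
            scored[(lo - 1, hi)] = s
            options.append((s, (lo - 1, hi)))
        if hi < len(words) - 1:         # (c) score the rightward expansion
            s = score_right(words, lo, hi)
            scored[(lo, hi + 1)] = s
            options.append((s, (lo, hi + 1)))
        if not options:                 # assumption: stop at document edges
            break
        best_score, best_span = max(options)   # (d) pick the greater score
        if best_score <= threshold:            # (e) stop unless above threshold
            break
        lo, hi = best_span
    ranked = sorted(scored.items(), key=lambda item: -item[1])
    return (lo, hi), ranked


# Toy scorers: placeholders for trained classifiers, based only on capitalization.
def toy_left(words: List[str], lo: int, hi: int) -> float:
    return 0.9 if words[lo - 1][0].isupper() else 0.1


def toy_right(words: List[str], lo: int, hi: int) -> float:
    return 0.9 if words[hi + 1][0].isupper() else 0.1


words = "she moved to New York City yesterday".split()
final_span, ranked_expansions = expand_span(words, 4, toy_left, toy_right)
```

Following the claim wording, only the evaluated expansions are stored in the ranked list; the starting word itself is not given a score here.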

20. A system for predicting a text span forming either a single word or a series of two or more words that a user intended to select, comprising:
- a computing device comprising a display device; and
  
  a computer program having program modules executable by the computing device, the computing device being directed by the program modules of the computer program to,
    - receive a document comprising text,
    - receive a location pointer indicating a particular location in the document,
    - input the document and the location pointer to a plurality of different candidate text span generation methods comprising one or more different linguistic unit detector methods and one or more different heuristic methods,
    - receive a ranked list of one or more scored candidate text spans from each of the different candidate text span generation methods,
    - use a machine-learned ensemble model to re-score each of the scored candidate text spans received from each of the different candidate text span generation methods, the ensemble model being trained using a machine learning method and features from a dataset of true intended user text span selections, said dataset being augmented with a testset of simulated user text span selections,
    - receive a ranked list of re-scored candidate text spans from the ensemble model,
    - identify the candidate text span in the ranked list of re-scored candidate text spans having the highest score, and
    - display said identified candidate text span on the display device as a prediction of the text span that the user intended to select.

Current Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Inventors
Pantel, Patrick; Gamon, Michael; Fuxman, Ariel Damian; Kohlmeier, Bernhard; Chilakamarri, Pradeep
Primary Examiner(s)
Hill, Stanley K
Assistant Examiner(s)
Fink, Thomas

Application Number
US14/245,646
Publication Number
US 20150100524A1
Time in Patent Office
886 Days
Field of Search
US Class Current

1/1
CPC Class Codes

G06F 16/24578   using ranking

G06F 16/332   Query formulation

G06F 16/3322   using system suggestions G0...

G06F 16/9535   Search customisation based ...

G06F 3/04842   Selection of displayed obje...

G06N 20/00   Machine learning

G06N 20/20   Ensemble learning

G06N 3/04   Architecture, e.g. intercon...

G06N 7/01   Probabilistic graphical mod...
