Automatic index creation for handwritten digital ink notes
Abstract
A system for automatically generating indexes for handwritten notes captured as digital ink in a computer is disclosed. Ink words are identified, and features of the ink words are computed. Pairwise distances, or match scores, which measure the distance in the features between two ink words, are calculated. A clustering technique selects equivalence classes of ink words. Index terms, which occur non-uniformly throughout the notes, are selected from the equivalence classes of ink words. The system generates an index from the index terms, including displaying the page numbers where the index terms are located in the notes as well as hyperlinking the index terms. A technique to identify a threshold for use in clustering the ink words is also disclosed.
19 Claims
-
1. A method for generating an index for handwritten notes captured as digital ink in a computer, said method comprising the steps of:
-
processing strokes of raw data for handwritten notes captured as digital ink in a computer to identify index terms, wherein an index term comprises at least one ink word, the index terms identified as one of at least two ink words with a pairwise distance within a predetermined threshold;
automatically generating an index for said index terms by linking said index terms to a location in said handwritten notes where said index terms are located; and
displaying, on an output display, said index for said handwritten notes.
-
-
2. The method as set forth in claim 1, wherein:
-
the step of generating an index comprises the step of generating page numbers in said handwritten notes for said index terms; and
the step of displaying an index comprises the step of displaying said page numbers along with index terms.
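As an illustration of the page-number index recited above, the following is a minimal sketch, not the patented implementation. It assumes each index term has already been reduced to a hypothetical class label (for example `class_a`) and that each occurrence is available as a `(label, page)` pair; the names `build_index` and `format_index` are invented for this example.

```python
from collections import defaultdict


def build_index(occurrences):
    """Map each index term label to the sorted pages on which it occurs.

    `occurrences` is an iterable of (term_label, page_number) pairs.
    The labels are hypothetical stand-ins for equivalence classes of ink words.
    """
    pages_by_term = defaultdict(set)
    for term, page in occurrences:
        pages_by_term[term].add(page)  # a set drops repeated pages
    # Sort terms and pages so the displayed index is stable.
    return {term: sorted(pages) for term, pages in sorted(pages_by_term.items())}


def format_index(index):
    """Render one display line per index term, followed by its page numbers."""
    return [f"{term}: {', '.join(str(p) for p in pages)}"
            for term, pages in index.items()]
```

In a real system the label would presumably be replaced by a rendering of a representative ink word, since the ink is not necessarily recognized as text.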
-
-
3. The method as set forth in claim 1, wherein:
-
the step of generating an index comprises the step of generating hyper linked index terms to link said index terms to a location in said handwritten notes; and
the step of displaying an index comprises the step of displaying index terms as hyper linked text.
-
-
4. A method for generating an index for handwritten notes captured as digital ink in a computer, said method comprising the steps of:
-
identifying a plurality of ink words from said handwritten notes;
generating at least one equivalence class of said ink words, wherein an equivalence class comprises at least two ink words with a pairwise distance within a predetermined threshold;
selecting at least one of said equivalence classes of ink words as index terms for said computer handwritten notes; and
automatically generating an index for said index terms selected to generate a link from said index terms of an equivalence class to a location in said handwritten notes where said index terms are located.
-
-
5. The method as set forth in claim 4, further comprising the steps of:
-
generating a plurality of feature sequences based on time and spatial distances of strokes of raw data in said handwritten notes;
generating said pairwise distance for said pairs of ink words based on said feature sequences; and
clustering said ink words into said equivalence classes based on said pairwise distances.
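The claim does not name the measure used to compare feature sequences. As one plausible illustration, the sketch below uses dynamic time warping (DTW), which is an assumption of this example rather than something the claim recites. It also assumes each ink word is already represented as a sequence of equal-length numeric feature vectors; `dtw_distance` and `pairwise_distances` are hypothetical names.

```python
import math


def dtw_distance(seq_a, seq_b):
    """Dynamic-time-warping distance between two sequences of feature vectors."""
    n, m = len(seq_a), len(seq_b)
    # cost[i][j] = cheapest alignment of the first i items of seq_a
    # with the first j items of seq_b.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(seq_a[i - 1], seq_b[j - 1])  # Euclidean distance
            cost[i][j] = local + min(cost[i - 1][j],      # skip in seq_b
                                     cost[i][j - 1],      # skip in seq_a
                                     cost[i - 1][j - 1])  # match both
    return cost[n][m]


def pairwise_distances(feature_sequences):
    """Symmetric matrix of DTW distances between every pair of ink words."""
    n = len(feature_sequences)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = dtw_distance(feature_sequences[i], feature_sequences[j])
            dist[i][j] = dist[j][i] = d
    return dist
```

An elastic measure of this kind tolerates the same word being written faster or slower, which is why it is a common choice for comparing handwriting; any other sequence distance could be substituted.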
-
-
6. The method as set forth in claim 5, wherein the step of clustering said ink words into said equivalence classes comprises the steps of:
-
identifying each of said ink words as an initial cluster;
selecting, to generate a single cluster, two clusters that comprise the closest pairwise distance;
selecting a threshold to define a maximum pairwise distance; and
repeating the step of selecting two clusters that comprise the closest pairwise distance, to generate a single cluster, until said closest pairwise distance exceeds said threshold.
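The clustering steps above can be sketched as a threshold-stopped agglomerative procedure. This is a minimal illustration under stated assumptions: the distance between two clusters is taken to be the smallest distance between any of their members (single linkage), which is one reading of "closest pairwise distance", and `dist` is assumed to be a symmetric matrix of pairwise distances. The function names are hypothetical.

```python
def cluster_ink_words(dist, threshold):
    """Merge the two closest clusters until the closest pair exceeds `threshold`.

    `dist` is a symmetric n x n matrix of pairwise ink-word distances.
    Returns a list of clusters, each a list of ink-word indices.
    """
    clusters = [[i] for i in range(len(dist))]  # each ink word starts alone
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage (an assumption): closest pair of members.
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break  # the closest remaining pair is too far apart
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def equivalence_classes(clusters):
    """Keep only clusters with at least two ink words."""
    return [sorted(c) for c in clusters if len(c) >= 2]
```

The exhaustive search for the closest pair keeps the sketch short; it is cubic or worse in the number of ink words, so a practical system would likely cache inter-cluster distances.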
-
-
7. The method as set forth in claim 6, wherein the step of selecting a threshold comprises the steps of:
-
generating a distribution curve that represents a relationship between a number of occurrences among pairs of said ink words at a particular pairwise distance;
identifying a knee of said distribution curve, τ, by approximating said distribution curve with a first line of gradient 0 to τ, and a second line comprising a constant gradient from said knee, τ, throughout said distribution curve; and
selecting, as said threshold, said pairwise distance approximated by said knee of said distribution curve, τ.
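One way to locate such a knee is to try each candidate position, fit a horizontal line to the points up to it and a straight line to the points from it onward, and keep the candidate with the smallest total squared error. The sketch below follows that idea. It is an assumption of this example that the curve is supplied as binned distances with pair counts, that the two segments are fitted independently by least squares, and that the knee point is shared by both segments; `find_knee` is a hypothetical name.

```python
def _sse_flat(ys):
    """Squared error of the best horizontal (gradient 0) line through ys."""
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def _sse_line(xs, ys):
    """Squared error of the least-squares straight line through (xs, ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return sum((y - (my + slope * (x - mx))) ** 2 for x, y in zip(xs, ys))


def find_knee(distances, counts):
    """Return the distance at which a flat-then-sloped fit has least error.

    `distances` are increasing bin positions; `counts` are the number of
    ink-word pairs observed at each distance. Returns None if there are
    too few points to fit both segments.
    """
    best_tau, best_err = None, float("inf")
    for k in range(2, len(distances) - 1):
        # Points 0..k-1 form the flat segment; points k-1..end form the
        # sloped segment, so the candidate knee belongs to both.
        err = (_sse_flat(counts[:k])
               + _sse_line(distances[k - 1:], counts[k - 1:]))
        if err < best_err:
            best_tau, best_err = distances[k - 1], err
    return best_tau
```

The returned distance would then serve as the stopping threshold for the clustering step.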
-
-
8. The method as set forth in claim 4, wherein the step of selecting at least one of said equivalence classes of ink words as index terms comprises the step of selecting index terms that occur non-uniformly throughout said handwritten notes.
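The claim does not say how non-uniformity is measured. As a hedged illustration only, the sketch below scores each equivalence class by the normalized entropy of its page distribution, where 1.0 means the class is spread evenly over all pages, and keeps the classes that fall at or below a cutoff. The entropy measure, the `max_uniformity` cutoff of 0.8, and the function names are all assumptions of this example.

```python
import math
from collections import Counter


def page_uniformity(pages, total_pages):
    """Normalized entropy of a class's page distribution (1.0 = uniform)."""
    if total_pages < 2:
        return 1.0  # with a single page every term is trivially uniform
    counts = Counter(pages)
    n = len(pages)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return entropy / math.log(total_pages)


def select_index_terms(class_pages, total_pages, max_uniformity=0.8):
    """Keep classes whose occurrences are concentrated on few pages.

    `class_pages` maps a hypothetical class id to the list of page
    numbers on which members of that class occur.
    """
    return [cid for cid, pages in class_pages.items()
            if page_uniformity(pages, total_pages) <= max_uniformity]
```

The intuition is that a word appearing evenly on every page carries little information about any particular page, whereas a word concentrated on a few pages is a better index entry.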
-
9. A method for identifying equivalence classes of ink words in handwritten notes entered into a computer, said method comprising the steps of:
-
identifying a plurality of ink words from handwritten notes;
identifying a plurality of features of said ink words;
generating a pairwise distance among said ink words based on said features;
generating a distribution curve that represents a relationship between a number of occurrences among pairs of said ink words in said handwritten notes at a particular pairwise distance;
identifying a knee of said distribution curve, τ, by approximating said distribution curve with a first line of gradient 0 to τ, and a second line comprising a constant gradient from said knee, τ, throughout a greater pairwise distance on said distribution curve;
selecting, as a threshold for clustering, said pairwise distance approximated by said knee of said distribution curve, τ; and
identifying at least one equivalence class of said ink words by generating clusters of said ink words within a pairwise distance of said threshold.
-
-
10. A computer readable medium comprising a plurality of instructions which, when executed by a computer, cause the computer to perform the steps of:
-
processing strokes of raw data for handwritten notes captured as digital ink in a computer to identify index terms, wherein an index term comprises at least one ink word, the index terms identified as one of at least two ink words with a pairwise distance within a predetermined threshold;
automatically generating an index for said index terms by linking said index terms to a location in said handwritten notes where said index terms are located; and
displaying, on an output display, said index for said handwritten notes.
-
-
11. The computer readable medium as set forth in claim 10, wherein:
-
the step of generating an index comprises the step of generating page numbers in said handwritten notes for said index terms; and
the step of displaying an index comprises the step of displaying said page numbers along with index terms.
-
-
12. The computer readable medium as set forth in claim 10, wherein:
-
the step of generating an index comprises the step of generating hyper linked index terms to link said index terms to a location in said handwritten notes; and
the step of displaying an index comprises the step of displaying index terms as hyper linked text.
-
-
13. A computer readable medium comprising a plurality of instructions which, when executed by a computer, cause the computer to perform the steps of:
-
identifying a plurality of ink words from said handwritten notes;
automatically generating at least one equivalence class of said ink words, wherein an equivalence class comprises at least two ink words with a pairwise distance within a predetermined threshold;
selecting at least one of said equivalence classes of ink words as index terms for said computer handwritten notes; and
generating an index for said index terms selected to generate a link from said index terms of an equivalence class to a location in said handwritten notes where said index terms are located.
-
-
14. The computer readable medium as set forth in claim 13, further comprising the steps of:
-
generating a plurality of feature sequences based on time and spatial distances of strokes of raw data in said handwritten notes;
generating said pairwise distance for said pairs of ink words based on said feature sequences; and
clustering said ink words into said equivalence classes based on said pairwise distances.
-
-
15. The computer readable medium as set forth in claim 14, wherein the step of clustering said ink words into said equivalence classes comprises the steps of:
-
identifying each of said ink words as an initial cluster;
selecting, to generate a single cluster, two clusters that comprise the closest pairwise distance;
selecting a threshold to define a maximum pairwise distance; and
repeating the step of selecting two clusters that comprise the closest pairwise distance, to generate a single cluster, until said closest pairwise distance exceeds said threshold.
-
-
16. The computer readable medium as set forth in claim 15, wherein the step of selecting a threshold comprises the steps of:
-
generating a distribution curve that represents a relationship between a number of occurrences among pairs of said ink words at a particular pairwise distance;
identifying a knee of said distribution curve, τ, by approximating said distribution curve with a first line of gradient 0 to τ, and a second line comprising a constant gradient from said knee, τ, throughout said distribution curve; and
selecting, as said threshold, said pairwise distance approximated by said knee of said distribution curve, τ.
-
-
17. The computer readable medium as set forth in claim 13, wherein the step of selecting at least one of said equivalence classes of ink words as index terms comprises the step of selecting index terms that occur non-uniformly throughout said handwritten notes.
-
18. A computer readable medium comprising a plurality of instructions which, when executed by a computer, cause the computer to perform the steps of:
-
identifying a plurality of ink words from handwritten notes;
identifying a plurality of features of said ink words;
generating a pairwise distance among said ink words based on said features;
generating a distribution curve that represents a relationship between a number of occurrences among pairs of said ink words in said handwritten notes at a particular pairwise distance;
identifying a knee of said distribution curve, τ, by approximating said distribution curve with a first line of gradient 0 to τ, and a second line comprising a constant gradient from said knee, τ, throughout a greater pairwise distance on said distribution curve;
selecting, as a threshold for clustering, said pairwise distance approximated by said knee of said distribution curve, τ; and
identifying at least one equivalence class of said ink words by generating clusters of said ink words within a pairwise distance of said threshold.
-
-
19. A computer comprising:
-
a user input pen-based device for receiving strokes of raw data for handwritten notes;
a processor unit, coupled to said user input pen-based device, for processing said strokes of raw data for handwritten notes to identify a plurality of index terms, wherein an index term comprises at least one ink word, the index terms identified as one of at least two ink words with a pairwise distance within a predetermined threshold, and for automatically generating an index for said index terms by linking said index terms to a location in said handwritten notes where said index terms are located; and
an output display, coupled to said processor unit, for displaying said index for said handwritten notes.
-
Specification