Concept based cross media indexing and retrieval of speech documents

US 7,716,221 B2
Filed: 06/01/2007
Issued: 05/11/2010
Est. Priority Date: 06/02/2006
Status: Active Grant

5 Assignments

0 Petitions

Abstract

Indexing, searching, and retrieving the content of speech documents (including but not limited to recorded books, audio broadcasts, and recorded conversations) is accomplished by finding and retrieving speech documents that are related to a query term at a conceptual level, even if the speech documents do not contain the spoken (or textual) query terms. Concept-based cross-media information retrieval is used. A term-phoneme/document matrix is constructed from a training set of documents. Documents are then added to the matrix constructed from the training data. Singular Value Decomposition is used to compute a vector space from the term-phoneme/document matrix. The result is a lower-dimensional numerical space where term-phoneme and document vectors are related conceptually as nearest neighbors. A query engine computes a cosine value between the query vector and all other vectors in the space and returns a list of those term-phonemes and/or documents with the highest cosine value.
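
The retrieval step described above (a cosine between the query vector and every other vector, best matches first) can be sketched in plain Python. The concept vectors are made-up two-dimensional values, and `cosine` and `rank_by_cosine` are hypothetical helper names, not taken from the patent.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank_by_cosine(query_vec, vectors, top_k=3):
    """Return (id, cosine) pairs for the top_k vectors, highest cosine first."""
    scored = [(key, cosine(query_vec, vec)) for key, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Made-up 2-D concept vectors for three speech documents.
docs = {
    "book_ch1": [0.9, 0.1],
    "broadcast_7": [0.2, 0.8],
    "call_42": [0.7, 0.3],
}
top = rank_by_cosine([1.0, 0.0], docs, top_k=2)
print([doc_id for doc_id, _ in top])  # ['book_ch1', 'call_42']
```

Because the cosine divides out vector length, a long recording is not ranked higher merely for containing more terms or phonemes.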

16 Citations

32 Claims

1. A method of cross media indexing, registering and retrieving speech documents, the method comprising:
- a computing device pre-processing a set of training documents, including at least creating training document metadata;
  
  the computing device constructing a terms-phonemes/document matrix from the training document metadata where rows are created for the terms and phonemes contained in the set of training documents and columns are created for each training document;
  
  the computing device normalizing entries in the terms-phonemes/document matrix;
  
  the computing device computing a vector space from the training documents by computing from the terms-phonemes/document matrix and storing the vector space in a catalog; and
  
  the computing device computing vectors for new documents and adding the vectors to the vector space without computing a new vector space in response to adding the vectors.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8)
- - 2. The method of claim 1, wherein creating training document metadata comprises creating a record for each document in the set of training documents, the metadata comprising at least one of document terms and phonemes counts, document type, creation date, or location.
  - 3. The method of claim 1, wherein pre-processing comprises:
    - the computing device transcribing phonetically speech documents in the set of training documents into an intermediate representative language, thereby creating phonetic transcriptions;
      
      the computing device converting the training documents from native format to UTF-8 format; and
      
      the computing device segmenting the training documents.
  - 4. The method of claim 3, wherein segmenting comprises tokenizing the phonetic transcriptions and the converted documents to create counts for index terms and phonemes.
  - 5. The method of claim 1, wherein computing a vector space comprises using a Singular Value Decomposition technique.
  - 6. The method of claim 1, wherein computing vectors for new documents comprises creating a term-phoneme vector for each new document by summing weighted vectors for words and phonemes contained in each new document.
  - 7. The method of claim 1 further comprising:
    - the computing device searching the vector space for vectors that are close to a vector computed for one or more query terms or phonemes; and
      
      the computing device providing a list of documents associated with vectors in the vector space that are closest to the vector computed for the one or more query terms or phonemes.
  - 8. The method of claim 7, wherein searching the vector space comprises:
    - computing a cosine value between a query vector and the vectors in the vector space; and
      
      returning a list of documents having vectors with the highest cosine values.
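
Claims 1, 3 and 4 describe converting documents out of their native format, tokenizing text and phonetic transcriptions into counts, and building a matrix with one row per term or phoneme and one column per document. A minimal sketch follows, assuming the phonetic transcriptions already exist as space-separated phoneme symbols (producing them from audio needs a speech recognizer, which is not shown). The `ph:` row prefix, the simple tokenizer, and all function names are illustrative assumptions.

```python
import string
from collections import Counter

def tokenize(text):
    """Lowercase, whitespace-split tokens with surrounding punctuation removed."""
    tokens = (tok.strip(string.punctuation).lower() for tok in text.split())
    return [tok for tok in tokens if tok]

def document_counts(raw_bytes, encoding="utf-8", phonemes=""):
    """Count index terms and phonemes for one document.

    raw_bytes: document text in its native encoding (decoded to Unicode here;
               .encode("utf-8") would give the UTF-8 form).
    phonemes:  an already-produced phonetic transcription, e.g. "HH AH L OW".
    """
    counts = Counter(tokenize(raw_bytes.decode(encoding)))
    # Prefix phonemes so a phoneme row never collides with a word row.
    counts.update("ph:" + p for p in phonemes.split())
    return counts

def build_matrix(doc_counts):
    """Rows: sorted terms and phonemes. Columns: documents. Entries: raw counts."""
    rows = sorted(set().union(*doc_counts.values()))
    doc_ids = list(doc_counts)
    matrix = [[doc_counts[d][r] for d in doc_ids] for r in rows]
    return rows, doc_ids, matrix

training = {
    "memo": document_counts("Hello world, hello.".encode("latin-1"), encoding="latin-1"),
    "clip": document_counts(b"", phonemes="HH AH L OW"),  # speech-only document
}
rows, doc_ids, matrix = build_matrix(training)
print(rows)  # ['hello', 'ph:AH', 'ph:HH', 'ph:L', 'ph:OW', 'world']
print(matrix[0])  # [2, 0]
```

Keeping word rows and phoneme rows in one matrix is what later lets a textual query and a speech-only document land near each other in the same space.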

9. An apparatus comprising:
- a pre-processor configured to register a set of training documents, including creating metadata comprising at least document terms and phonemes counts;
  
  a computing device configured to:
  
  compute a terms-phonemes/document matrix from the metadata;
  
  normalize the terms-phonemes/document matrix; and
  
  compute a vector space from the normalized terms-phonemes/document matrix;
  
  the pre-processor further configured to compute vectors for new documents and to add the vectors to the vector space without computing a new vector space in response to adding the vectors to the vector space;
  
  a query engine configured to search the vector space for vectors that are close to a vector computed for one or more query terms or phonemes; and
  
  an interface configured to provide a list of documents associated with vectors in the vector space that are closest to the vector computed for the one or more query terms or phonemes.
- Dependent Claims (10, 11, 12, 13, 14, 15)
- - 10. The apparatus of claim 9, wherein the training document metadata for each training document further comprises at least one of document type, creation date, or location.
  - 11. The apparatus of claim 9, wherein the pre-processor is further configured to:
    - phonetically transcribe speech documents in the set of training documents into an intermediate representative language, thereby creating phonetic transcriptions;
      
      convert the set of training documents from native format to UTF-8 format; and
      
      segment each document in the set of training documents.
  - 12. The apparatus of claim 11, wherein the pre-processor configured to segment each document comprises the pre-processor configured to tokenize the phonetic transcriptions and the converted documents to create counts for index terms and phonemes.
  - 13. The apparatus of claim 9, wherein the vectors for new documents comprise a summation of weighted vectors for words or phonemes contained in a new document.
  - 14. The apparatus of claim 9, wherein the query engine is configured to compute a cosine value between a query vector and the vectors in the concept vector space, and wherein the interface is configured to provide documents having vectors with the highest cosine values.
  - 15. The apparatus of claim 9, wherein the computing device is configured to perform a singular value decomposition on the terms-phonemes/document matrix.
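
Claims 9 and 15 normalize the matrix and then apply a singular value decomposition. The claims do not name a normalization, so the sketch below assumes log-entropy weighting, a common choice in latent semantic indexing, followed by a truncated SVD through NumPy's `numpy.linalg.svd`. The number of retained dimensions `k`, the scaling of term and document coordinates by the singular values, and the helper names are illustrative choices.

```python
import numpy as np

def log_entropy(counts):
    """Log-entropy weighting (an assumed normalization; needs at least 2 documents).

    Local weight: log(1 + count). Global weight per row: 1 + sum_j p_j log p_j / log n,
    where p_j is the row's share of its total count that falls in document j.
    """
    counts = np.asarray(counts, dtype=float)
    n_docs = counts.shape[1]
    row_totals = counts.sum(axis=1, keepdims=True)
    p = np.divide(counts, row_totals, out=np.zeros_like(counts), where=row_totals > 0)
    safe_p = np.where(p > 0, p, 1.0)  # log(1) = 0, so zero shares contribute nothing
    global_w = 1.0 + (p * np.log(safe_p)).sum(axis=1) / np.log(n_docs)
    return np.log1p(counts) * global_w[:, None]

def concept_space(weighted, k):
    """Truncated SVD: keep the k largest singular values and their vectors."""
    U, s, Vt = np.linalg.svd(weighted, full_matrices=False)
    U_k, s_k = U[:, :k], s[:k]
    term_vecs = U_k * s_k          # one row per term/phoneme
    doc_vecs = Vt[:k, :].T * s_k   # one row per training document
    return U_k, s_k, term_vecs, doc_vecs

counts = [[2, 0, 1],
          [0, 3, 0],
          [1, 1, 1],   # spread evenly over all documents -> global weight 0
          [0, 0, 2]]
weighted = log_entropy(counts)
U_k, s_k, term_vecs, doc_vecs = concept_space(weighted, k=2)
print(term_vecs.shape, doc_vecs.shape)  # (4, 2) (3, 2)
```

Under this weighting a row that occurs equally in every document carries no information and is zeroed out, while a row concentrated in a single document keeps its full local weight.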

16. A tangible computer readable medium having instructions stored thereon, the instructions configured to cause a computing device to:
- pre-process a set of training documents, including at least creating training document metadata;
  
  construct a terms-phonemes/document matrix from the training document metadata where rows are created for the terms and phonemes contained in the set of training documents and a column is created for each training document;
  
  normalize entries in the terms-phonemes/document matrix;
  
  compute a vector space from the training documents by computing from the terms-phonemes/document matrix; and
  
  compute vectors for new documents and add the vectors to the vector space without computing a new vector space in response to adding the vectors.
- Dependent Claims (17, 18, 19, 20, 21)
- - 17. The tangible computer readable medium of claim 16, wherein the instructions configured to cause the computing device to pre-process the set of training documents comprise instructions configured to cause the computing device to:
    - phonetically transcribe speech documents in the set of training documents into an intermediate representative language, thereby creating phonetic transcriptions;
      
      convert the training documents from native format to UTF-8 format; and
      
      segment each document in the set of training documents.
  - 18. The tangible computer readable medium of claim 17, wherein the instructions configured to cause the computing device to segment each document in the set of training documents comprise instructions configured to cause the computing device to tokenize the phonetic transcriptions and the converted documents to create counts for index terms and phonemes.
  - 19. The tangible computer readable medium of claim 16, wherein the instructions configured to cause the computing device to compute vectors for new documents comprise instructions configured to cause the computing device to create a term-phoneme vector for each new document by summing weighted vectors for words and phonemes contained in each new document.
  - 20. The tangible computer readable medium of claim 16, wherein the instructions are further configured to cause the computing device to:
    - search the vector space for vectors that are close to a vector computed for one or more query terms or phonemes; and
      
      provide a list of documents associated with vectors in the vector space that are closest to the vector computed for the one or more query terms or phonemes.
  - 21. The tangible computer readable medium of claim 20, wherein the instructions configured to cause the computing device to search the vector space for vectors that are close to a vector computed for one or more query terms or phonemes comprise instructions configured to cause the computing device to:
    - compute a cosine value between the vector computed for the one or more query terms or phonemes and the vectors in the vector space; and
      
      return a list of documents having vectors with the highest cosine values.
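
Claims 16 and 19 add new documents by summing weighted vectors for the words and phonemes each new document contains, leaving the trained space untouched. The sketch assumes per-term vectors and per-term global weights are available from training; the local weight log(1 + count), the skipping of unseen terms, and all names are assumptions. Depending on how the term vectors were scaled, published LSI fold-in formulas also rescale the sum by the singular values; that step is left out here.

```python
import math

def fold_in(counts, term_vectors, global_weights):
    """Vector for a new document: the sum of its terms' vectors, each weighted.

    counts:         term/phoneme -> count in the new document.
    term_vectors:   term/phoneme -> vector from the trained space (not modified).
    global_weights: term/phoneme -> global weight from training (default 1.0).
    Terms unseen in training have no vector and are skipped.
    """
    dim = len(next(iter(term_vectors.values())))
    vec = [0.0] * dim
    for term, tf in counts.items():
        if term not in term_vectors:
            continue
        w = math.log1p(tf) * global_weights.get(term, 1.0)
        for i, x in enumerate(term_vectors[term]):
            vec[i] += w * x
    return vec

# Hypothetical trained space and document catalog.
term_vectors = {"hello": [1.0, 0.0], "ph:OW": [0.0, 1.0]}
global_weights = {"hello": 1.0, "ph:OW": 0.5}
catalog = {"memo": [0.8, 0.1]}

new_doc = {"hello": 1, "ph:OW": 3, "zebra": 2}  # "zebra" was never seen in training
catalog["voicemail_9"] = fold_in(new_doc, term_vectors, global_weights)
print(catalog["voicemail_9"])
```

Only the catalog grows; `term_vectors` is read but never changed, which is the property the claims describe as adding vectors "without computing a new vector space".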

22. An apparatus comprising:
- a memory;
  
  a processor device in communication with the memory and configured, in combination with the memory, to:
  
  pre-process a set of training documents, including at least creating training document metadata;
  
  construct a terms-phonemes/document matrix from the training document metadata where rows are created for the terms and phonemes contained in the set of training documents and columns are created for each training document;
  
  normalize entries in the terms-phonemes/document matrix;
  
  compute a vector space from the training documents by computing from the terms-phonemes/document matrix; and
  
  compute vectors for new documents and add the vectors to the vector space without computing a new vector space in response to adding the vectors.
- Dependent Claims (23, 24, 25)
- - 23. The apparatus of claim 22, wherein the training document metadata for each training document further comprises at least one of document terms and phonemes counts, document type, creation date, or location.
  - 24. The apparatus of claim 22, wherein the vectors for new documents comprise a summation of weighted vectors for words or phonemes contained in each new document.
  - 25. The apparatus of claim 22, wherein the processor device is further configured to:
    - search the vector space for vectors that are close to a vector computed for one or more query terms or phonemes; and
      
      provide a list of documents associated with vectors in the vector space that are closest to the vector computed for the one or more query terms or phonemes.
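
Claims 2, 10 and 23 list what a per-document metadata record may hold: term and phoneme counts, document type, creation date, or location. One way to represent such a record is a Python dataclass; the field names and example values below are illustrative assumptions, not the patent's schema.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass
class DocumentRecord:
    """Metadata for one registered document (illustrative field names)."""
    doc_id: str
    counts: Counter = field(default_factory=Counter)  # term and phoneme counts
    doc_type: Optional[str] = None                    # e.g. "audio" or "text"
    creation_date: Optional[date] = None
    location: Optional[str] = None                    # e.g. a file path or URL

record = DocumentRecord(
    doc_id="call_42",
    counts=Counter({"hello": 2, "ph:OW": 1}),
    doc_type="audio",
    creation_date=date(2006, 6, 2),
    location="archive/call_42.wav",  # hypothetical path
)
print(record.counts["hello"], record.counts["goodbye"])  # 2 0
```

Using `Counter` for the counts means a term absent from the document reads as 0, which matches the zero entries it contributes to its column of the matrix.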

26. A method comprising:
- a computing device pre-processing a set of training documents, including at least tokenizing the set of training documents to create counts for index terms and phonemes;
  
  the computing device constructing a terms-phonemes/document matrix from the counts;
  
  the computing device normalizing entries in the terms-phonemes/document matrix;
  
  the computing device computing a vector space from the normalized terms-phonemes/document matrix;
  
  the computing device computing vectors for new documents and adding the vectors to the vector space.
- Dependent Claims (27, 28, 29, 30)
- - 27. The method of claim 26, wherein pre-processing comprises:
    - the computing device transcribing phonetically any speech documents in the set of training documents into an intermediate representative language, thereby creating phonetic transcriptions;
      
      the computing device converting the set of training documents from native format to UTF-8 format; and
      
      wherein the computing device tokenizing the set of training documents comprises the computing device tokenizing the transcribed and converted documents.
  - 28. The method of claim 26, wherein computing vectors for new documents comprises creating a term-phoneme vector for each new document by summing weighted vectors for words and phonemes contained in each new document.
  - 29. The method of claim 26, further comprising:
    - the computing device searching the vector space for vectors that are close to a vector computed for one or more query terms or phonemes; and
      
      the computing device providing a list of documents associated with vectors in the vector space that are closest to the vector computed for the one or more query terms or phonemes.
  - 30. The method of claim 29, wherein searching the vector space comprises:
    - the computing device computing a cosine value between a query vector and the vectors in the vector space; and
      
      the computing device returning a list of documents having vectors with the highest cosine values.
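
Claims 29 and 30 search with a vector computed from query terms or phonemes. Because the matrix holds both word rows and phoneme rows, one way to let a typed query reach speech-only documents is to expand each query word into phonemes. The sketch assumes a tiny hand-written pronunciation lexicon with ARPAbet-style symbols and phoneme rows keyed with a `ph:` prefix; both are illustrative assumptions. The resulting counts could then be turned into a query vector the same way a new document's counts are.

```python
import string
from collections import Counter

# Hypothetical pronunciation lexicon (ARPAbet-style symbols). A real system
# would use a full pronouncing dictionary or a letter-to-sound model.
LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}

def query_counts(query):
    """Bag of query words plus the phonemes of every word found in LEXICON.

    Phoneme tokens carry a "ph:" prefix so they match phoneme rows, not word rows.
    """
    counts = Counter()
    for raw in query.split():
        word = raw.strip(string.punctuation).lower()
        if not word:
            continue
        counts[word] += 1
        counts.update("ph:" + p for p in LEXICON.get(word, []))
    return counts

q = query_counts("Hello, world!")
print(q["hello"], q["ph:L"], q["ph:OW"])  # 1 2 1
```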

31. An apparatus for cross media indexing, registering, and retrieving speech documents, the apparatus comprising:
- an interface configured to receive a set of documents;
  
  a pre-processor configured to transcribe the set of documents and tokenize the transcriptions to create counts for index terms and phonemes;
  
  a computing device configured to compute a term-phoneme/document matrix, normalize the term-phoneme/document matrix, and compute a vector space from the normalized term-phoneme/document matrix;
  
  a database configured to store the vector space and the counts;
  
  a query engine configured to search the vector space for documents closest to a query vector.
- Dependent Claim (32)
- - 32. The apparatus of claim 31, wherein the interface is further configured to receive one or more new documents and wherein the computing device is further configured to add vectors of the one or more new documents to the vector space.
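
Claims 31 and 32 add a database that stores the vector space and the counts and accepts vectors for new documents. A minimal sketch using Python's built-in `sqlite3` module follows; the table layout, the JSON encoding of vectors, and the function names are assumptions made for illustration.

```python
import json
import sqlite3

def open_catalog(path=":memory:"):
    """Create (if needed) tables for document vectors and term/phoneme counts."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS vectors (doc_id TEXT PRIMARY KEY, vec TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS counts ("
        "doc_id TEXT, token TEXT, n INTEGER, PRIMARY KEY (doc_id, token))"
    )
    return conn

def store_document(conn, doc_id, vec, counts):
    """Insert or replace one document's vector (as JSON) and all of its counts."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT OR REPLACE INTO vectors VALUES (?, ?)", (doc_id, json.dumps(vec)))
        # Drop stale counts so tokens no longer in the document do not linger.
        conn.execute("DELETE FROM counts WHERE doc_id = ?", (doc_id,))
        conn.executemany(
            "INSERT INTO counts VALUES (?, ?, ?)",
            [(doc_id, tok, n) for tok, n in counts.items()],
        )

def load_vectors(conn):
    """Return {doc_id: vector} for every stored document."""
    rows = conn.execute("SELECT doc_id, vec FROM vectors")
    return {doc_id: json.loads(vec) for doc_id, vec in rows}

conn = open_catalog()
store_document(conn, "call_42", [0.7, 0.3], {"hello": 2, "ph:OW": 1})
store_document(conn, "voicemail_9", [0.69, 0.69], {"hello": 1})  # a newly added document
print(sorted(load_vectors(conn)))  # ['call_42', 'voicemail_9']
```

Adding a document is a single row insert into `vectors`; nothing about the trained space is recomputed.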

Current Assignee
Nytell Software LLC (Intellectual Ventures LLC)
Original Assignee
Telcordia Licensing Co LLC (Telefonaktiebolaget LM Ericsson)
Inventors
Bassu, Devasis; Egan, Dennis E.; Behrens, Clifford A.
Primary Examiner(s)
Wong, Don
Assistant Examiner(s)
Nguyen, Merilyn P.

Application Number

US11/809,455
Publication Number

US 20070299838A1
Time in Patent Office

1,075 Days
Field of Search

707/1-3, 707/5, 707/6, 707/100-102, 707/104.1, 704/9, 704/254, 704/255
US Class Current

707/736
CPC Class Codes

G06F 16/3343 using phonetics

G06F 16/685 using automatically derived...
