
Trans-lingual representation of text documents

  • US 8,738,354 B2
  • Filed: 06/19/2009
  • Issued: 05/27/2014
  • Est. Priority Date: 06/19/2009
  • Status: Active Grant
First Claim

1. A method comprising:

    accepting first language data, wherein the first language data comprises first documents in a first language and the first documents are associated with multiple topics;

    accepting second language data, wherein the second language data comprises second documents in a second language that is different than the first language, wherein the second documents in the second language are also associated with at least some of the multiple topics and the first language data and second language data collectively comprise pairs of documents that are on the same topic;

    obtaining a first document-term matrix from the first language data, wherein the first document-term matrix comprises a plurality of first rows and different first rows of the first document-term matrix correspond to different first documents in the first language;

    obtaining a second document-term matrix from the second language data, wherein the second document-term matrix comprises a plurality of second rows and different second rows of the second document-term matrix correspond to different second documents in the second language; and

    applying an algorithm to the first document-term matrix to produce a first stored matrix for the first language and to the second document-term matrix to produce a second stored matrix for the second language, wherein:

        multiplying the first stored matrix by the first document-term matrix produces a plurality of first translingual text representation vectors,

        multiplying the second stored matrix by the second document-term matrix produces a plurality of second translingual text representation vectors, and

        applying the algorithm comprises adjusting the first stored matrix and the second stored matrix to thereby reduce distances between individual first translingual text representation vectors and individual second translingual text representation vectors for the pairs of documents that are on the same topic,

    wherein at least the applying the algorithm is performed by a computer.
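
The steps recited in claim 1 can be illustrated with a minimal numerical sketch. This is not the patented algorithm, which the claim does not specify: plain gradient descent on the mean squared distance between paired projections stands in for "adjusting" the stored matrices. The toy matrices `X1` and `X2`, the function names `paired_distance` and `adjust_projections`, and the parameters `k`, `steps`, and `lr` are all hypothetical. The sketch also assumes that rows of each document-term matrix are documents, so each stored matrix multiplies the transposed document-term matrix.

```python
import numpy as np

# Hypothetical toy data: 4 paired documents per language.
# Row i of X1 and row i of X2 are assumed to be on the same topic.
X1 = np.array([[2, 1, 0, 0],
               [1, 2, 0, 0],
               [0, 0, 2, 1],
               [0, 0, 1, 2]], dtype=float)      # 4 docs x 4 first-language terms
X2 = np.array([[1, 1, 0, 0, 0],
               [2, 1, 0, 0, 0],
               [0, 0, 1, 2, 1],
               [0, 0, 0, 1, 2]], dtype=float)   # 4 docs x 5 second-language terms


def paired_distance(M1, X1, M2, X2):
    """Mean squared distance between the projected vectors of paired documents."""
    Z1 = M1 @ X1.T  # k x docs: first translingual vectors, one per column
    Z2 = M2 @ X2.T  # k x docs: second translingual vectors, one per column
    return float(np.mean(np.sum((Z1 - Z2) ** 2, axis=0)))


def adjust_projections(X1, X2, k=2, steps=200, lr=0.01, seed=0):
    """Adjust two stored matrices by gradient descent on the paired distance.

    Assumption: unconstrained descent is used only for illustration. Left to
    run, it drives both matrices toward zero, so a practical method would add
    a constraint or a term that keeps unrelated documents apart.
    """
    rng = np.random.default_rng(seed)
    M1 = rng.normal(scale=0.1, size=(k, X1.shape[1]))
    M2 = rng.normal(scale=0.1, size=(k, X2.shape[1]))
    n_docs = X1.shape[0]
    for _ in range(steps):
        diff = M1 @ X1.T - M2 @ X2.T          # k x docs
        M1 -= lr * (2.0 * diff @ X1 / n_docs)   # gradient w.r.t. M1
        M2 -= lr * (-2.0 * diff @ X2 / n_docs)  # gradient w.r.t. M2
    return M1, M2


M1, M2 = adjust_projections(X1, X2)
print(paired_distance(M1, X1, M2, X2))
```

A new document in either language would be represented by multiplying its term vector by the stored matrix for that language, which places documents from both languages in the same `k`-dimensional space.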
