TRANS-LINGUAL REPRESENTATION OF TEXT DOCUMENTS
First Claim
1. A method of creating a trans-lingual text representation comprising:
accepting first language data, wherein the first language data comprises a plurality of documents in a first language;
accepting second language data, wherein the second language data comprises a plurality of documents in a second language, wherein each document in a second language is comparable to a corresponding document in the first language;
creating a first document-term matrix from the first language data, comprising a plurality of rows, each of said rows corresponding to one of a plurality of documents in a first language;
creating a second document-term matrix from the second language data, comprising a plurality of rows, each of said rows corresponding to one of a plurality of documents in a second language;
applying an algorithm to the first matrix and the second matrix to produce a translingual text representation, wherein the translingual text representation comprises a plurality of vectors, each vector corresponding to either one row in the first document-term matrix or one row in the second document-term matrix, wherein the algorithm:
minimizes the distance between pairs of translingual text representation vectors which correspond to a document in a first language and a document in a second language that is comparable to the document in the first language; and
maximizes the distance between pairs of translingual text representation vectors which do not correspond to a document in a first language and a document in a second language that is comparable to the document in the first language.
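The two "creating a document-term matrix" steps can be illustrated with a minimal sketch. It assumes whitespace tokenization and raw term counts, neither of which the claim specifies; the function name `build_document_term_matrix` and the toy English and Spanish documents are hypothetical and chosen only for illustration.

```python
from collections import Counter


def build_document_term_matrix(documents):
    """Return (vocabulary, matrix); matrix[i][j] counts term j in document i.

    Assumption: terms are lowercase whitespace-separated tokens.
    """
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    column = {term: j for j, term in enumerate(vocabulary)}
    matrix = []
    for tokens in tokenized:
        row = [0] * len(vocabulary)  # one row per document
        for term, count in Counter(tokens).items():
            row[column[term]] = count
        matrix.append(row)
    return vocabulary, matrix


# Hypothetical comparable corpora: document i in one language
# is comparable to document i in the other.
english_docs = ["the cat sleeps", "the dog barks at the cat"]
spanish_docs = ["el gato duerme", "el perro ladra al gato"]

english_vocab, english_matrix = build_document_term_matrix(english_docs)
spanish_vocab, spanish_matrix = build_document_term_matrix(spanish_docs)
```

Each matrix has one row per document, and row i of the first matrix is paired with row i of the second, which is the correspondence the later steps of the claim rely on.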
Abstract
A method of creating translingual text representations takes in documents in a first language and in a second language and creates a matrix using the words in the documents to represent which words are present in which language. An algorithm is applied to each matrix such that like documents are placed close to each other and unlike documents are moved far from each other.
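The idea of placing like documents close together and unlike documents far apart can be sketched as an objective to be minimized. This is an illustration under stated assumptions, not the algorithm the patent discloses: it assumes one linear projection per language, squared Euclidean distance, and a hinge with a `margin` parameter to bound the "push apart" term. All function names and the toy matrices are hypothetical.

```python
def project(matrix, weights):
    """Map each document-term row to a vector: row @ weights.

    weights has one row per term and one column per output dimension.
    """
    return [
        [sum(x * w for x, w in zip(row, column)) for column in zip(*weights)]
        for row in matrix
    ]


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def contrastive_objective(first_vectors, second_vectors, margin=1.0):
    """Lower is better.

    Comparable pairs (same index) contribute their squared distance;
    non-comparable pairs contribute a penalty while closer than `margin`.
    """
    total = 0.0
    for i, u in enumerate(first_vectors):
        for j, v in enumerate(second_vectors):
            d = squared_distance(u, v)
            total += d if i == j else max(0.0, margin - d)
    return total


# Toy document-term matrices: two documents, two terms per language.
first_matrix = [[1, 0], [0, 1]]
second_matrix = [[1, 0], [0, 1]]

identity = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]

aligned = contrastive_objective(
    project(first_matrix, identity), project(second_matrix, identity)
)
misaligned = contrastive_objective(
    project(first_matrix, identity), project(second_matrix, swapped)
)
```

In this toy case the aligned projections score 0.0 and the swapped projections score 6.0, so an optimizer searching over the projection weights would prefer the aligned pair.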
59 Citations
18 Claims
1. A method of creating a trans-lingual text representation, recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6
7. A computer storage medium comprising computer executable instructions for creating a trans-lingual text representation, the computer executable instructions comprising instructions for:
accepting first language data, wherein the first language data comprises a plurality of documents in a first language;
accepting second language data, wherein the second language data comprises a plurality of documents in a second language, wherein each document in a second language is comparable to a corresponding document in the first language;
creating a first document-term matrix from the first language data, comprising a plurality of rows, each of said rows corresponding to one of a plurality of documents in a first language;
creating a second document-term matrix from the second language data, comprising a plurality of rows, each of said rows corresponding to one of a plurality of documents in a second language;
applying an algorithm to the first matrix and the second matrix to produce a translingual text representation, wherein the translingual text representation comprises a plurality of vectors, each vector corresponding to either one row in the first document-term matrix or one row in the second document-term matrix, wherein the algorithm:
minimizes the distance between pairs of translingual text representation vectors which correspond to a document in a first language and a document in a second language that is comparable to the document in the first language; and
maximizes the distance between pairs of translingual text representation vectors which do not correspond to a document in a first language and a document in a second language that is comparable to the document in the first language.
Dependent claims: 8, 9, 10, 11, 12
13. A computer system comprising a processor for executing computer executable instructions, a memory for assisting execution of the computer executable instructions and an input/output circuit, the computer executable instructions comprising instructions for:
accepting first language data, wherein the first language data comprises a plurality of documents in a first language;
accepting second language data, wherein the second language data comprises a plurality of documents in a second language, wherein each document in a second language is comparable to a corresponding document in the first language;
creating a first document-term matrix from the first language data, comprising a plurality of rows, each of said rows corresponding to one of a plurality of documents in a first language;
creating a second document-term matrix from the second language data, comprising a plurality of rows, each of said rows corresponding to one of a plurality of documents in a second language;
applying an algorithm to the first matrix and the second matrix to produce a translingual text representation, wherein the translingual text representation comprises a plurality of vectors, each vector corresponding to either one row in the first document-term matrix or one row in the second document-term matrix, wherein the algorithm:
minimizes the distance between pairs of translingual text representation vectors which correspond to a document in a first language and a document in a second language that is comparable to the document in the first language; and
maximizes the distance between pairs of translingual text representation vectors which do not correspond to a document in a first language and a document in a second language that is comparable to the document in the first language.
Dependent claims: 14, 15, 16, 17, 18
Specification