DETERMINING INTENT FROM MULTIMODAL CONTENT EMBEDDED IN A COMMON GEOMETRIC SPACE
Abstract
Inferring the intent of multimodal content in a common geometric space, in order to improve recognition of the content's influential impact, includes mapping the multimodal content into the common geometric space by embedding a multimodal feature vector that represents a first modality and a second modality of the content, and inferring the intent of the multimodal content so mapped, such that connections between items of multimodal content improve recognition of their influential impact.
Claims (20)
1. A method of creating a semantic embedding space for multimodal content for determining intent of content, the method comprising:
for each of a plurality of content of the multimodal content, creating a respective first modality feature vector representative of content of the multimodal content having a first modality using a first machine learning model;
for each of a plurality of content of the multimodal content, creating a respective second modality feature vector representative of content of the multimodal content having a second modality using a second machine learning model;
for each of a plurality of first modality feature vector and second modality feature vector multimodal content pairs, forming a combined multimodal feature vector from the first modality feature vector and the second modality feature vector;
for at least one first modality feature vector and second modality feature vector multimodal content pair, assigning at least one taxonomy class of intent; and
semantically embedding the respective combined multimodal feature vectors in a common geometric space, wherein embedded combined multimodal feature vectors having related intent are closer together in the common geometric space than unrelated multimodal feature vectors.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12.
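The pipeline of claim 1 can be sketched in code. This is a minimal illustration only, not the patented implementation: the encoders below are fixed random projections standing in for trained per-modality machine learning models (e.g. an image network and a text network), and all dimensions are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in encoders. In the claimed method these are trained machine
# learning models, one per modality; here they are random projections
# used purely to show the data flow.
W_img = rng.standard_normal((512, 128))   # first-modality encoder (hypothetical dims)
W_txt = rng.standard_normal((300, 128))   # second-modality encoder (hypothetical dims)
W_embed = rng.standard_normal((256, 64))  # projection into the common geometric space

def embed_pair(image_feats, text_feats):
    """Map one first-/second-modality content pair into the common space."""
    v_img = image_feats @ W_img                # first modality feature vector
    v_txt = text_feats @ W_txt                 # second modality feature vector
    combined = np.concatenate([v_img, v_txt])  # combined multimodal feature vector
    z = combined @ W_embed                     # semantic embedding
    return z / np.linalg.norm(z)               # unit norm -> cosine geometry

# Two content items; after training, pairs with related intent
# would land closer together in this space than unrelated pairs.
z1 = embed_pair(rng.standard_normal(512), rng.standard_normal(300))
z2 = embed_pair(rng.standard_normal(512), rng.standard_normal(300))
similarity = float(z1 @ z2)  # cosine similarity in the common space
```

With unit-normalized embeddings, "closer together" in the claim corresponds to higher cosine similarity; a training objective (not shown) would pull same-intent pairs together.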
13. A method of creating a semantic embedding space for multimodal content for determining intent of content, the method comprising:
for each of a plurality of content of the multimodal content, creating a respective first modality feature vector representative of content of the multimodal content having a first modality using a first machine learning model;
for each of a plurality of content of the multimodal content, creating a respective second modality feature vector representative of content of the multimodal content having a second modality using a second machine learning model;
for each of a plurality of first modality feature vector and second modality feature vector multimodal content pairs, forming a combined multimodal feature vector from the first modality feature vector and the second modality feature vector;
for at least one first modality feature vector and second modality feature vector multimodal content pair, assigning at least one taxonomy class of intent;
projecting the combined multimodal feature vector into the common geometric space; and
inferring an intent of the multimodal content represented by the combined multimodal feature vector based on the projection of the multimodal feature vector in the common geometric space and a classifier.
Dependent claims: 14, 15, 16.
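Claim 13 adds an inference step: a classifier over the projected vector. The claim does not specify the classifier, so the sketch below assumes a simple nearest-centroid classifier, with purely hypothetical intent class labels and centroids standing in for the taxonomy classes learned from labeled, embedded pairs.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical taxonomy classes of intent; the real taxonomy and the
# centroids (which would be derived from labeled training embeddings)
# are illustrative assumptions, not taken from the patent.
INTENT_CLASSES = ["advocative", "promotive", "informative"]
centroids = rng.standard_normal((3, 64))
centroids /= np.linalg.norm(centroids, axis=1, keepdims=True)

def infer_intent(z):
    """Classify a combined multimodal vector projected into the common space."""
    z = z / np.linalg.norm(z)       # place the projection on the unit sphere
    scores = centroids @ z          # cosine similarity to each intent class
    return INTENT_CLASSES[int(np.argmax(scores))]

z = rng.standard_normal(64)         # a projected combined multimodal vector
predicted = infer_intent(z)
```

Any classifier operating on the projected vector (k-NN, logistic regression, an MLP head) would fit the claim language equally well; nearest-centroid is used here only because it makes the geometric reading of "closer together means related intent" explicit.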
17. A non-transitory computer-readable medium having stored thereon at least one program, the at least one program including instructions which, when executed by a processor, cause the processor to perform a method of creating a semantic embedding space for multimodal content for determining intent of content, comprising:
for each of a plurality of content of the multimodal content, creating a respective first modality feature vector representative of content of the multimodal content having a first modality using a first machine learning model;
for each of a plurality of content of the multimodal content, creating a respective second modality feature vector representative of content of the multimodal content having a second modality using a second machine learning model;
for each of a plurality of first modality feature vector and second modality feature vector multimodal content pairs, forming a combined multimodal feature vector from the first modality feature vector and the second modality feature vector;
for at least one first modality feature vector and second modality feature vector multimodal content pair, assigning at least one taxonomy class of intent; and
semantically embedding the respective combined multimodal feature vectors in a common geometric space, wherein embedded combined multimodal feature vectors having related intent are closer together in the common geometric space than unrelated multimodal feature vectors.
Dependent claims: 18, 19, 20.