Generation of a semantic model from textual listings
Abstract
A corpus of textual listings is received and main concept words and attribute words therein are identified via an iterative process of parsing listings and expanding a semantic model. During the parsing phase, the corpus of textual listings is parsed to tag one or more head noun words and/or one or more identifier words in each listing based on previously identified main concept words or using a head noun identification rule. Once substantially each listing in the corpus has been parsed in this manner, the expansion phase assigns head noun words as main concept words and modifier words as attribute words, where possible. During the next iteration, the newly identified main concept words and/or attribute words are used to further parse the listings. These iterations are repeated until a termination condition is reached. Remaining words in the corpus are clustered based on the main concept words and attribute words.
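The parse-and-expand iteration described in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the function name `build_semantic_model`, the head noun identification rule (the last word of a listing is treated as its head noun), whitespace tokenization, and the termination condition (stop when an iteration adds nothing new, or after `max_iterations`). The abstract does not specify any of these details.

```python
def build_semantic_model(listings, max_iterations=10):
    """Hypothetical sketch of the iterative parse/expand loop."""
    concepts, attributes = set(), set()
    for _ in range(max_iterations):
        # Parsing phase: tag a head noun and modifier words in each listing.
        tagged = []
        for listing in listings:
            words = listing.lower().split()
            if not words:
                continue
            # Prefer a previously identified main concept word; otherwise fall
            # back to the ASSUMED head noun rule (last word of the listing).
            known = [w for w in words if w in concepts]
            head = known[-1] if known else words[-1]
            modifiers = [w for w in words if w != head]
            tagged.append((head, modifiers))

        # Expansion phase: head nouns become main concept words and
        # modifiers become attribute words, where possible.
        before = (set(concepts), set(attributes))
        for head, modifiers in tagged:
            concepts.add(head)
        for head, modifiers in tagged:
            attributes.update(m for m in modifiers if m not in concepts)
        attributes -= concepts

        # ASSUMED termination condition: nothing changed in this iteration.
        if (concepts, attributes) == before:
            break
    return concepts, attributes


concepts, attributes = build_semantic_model(["red leather sofa", "blue sofa"])
```

In this toy run, "sofa" ends up as the only main concept word and "red", "leather" and "blue" as attribute words. A real head noun rule would need part-of-speech information rather than word position.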
20 Claims
1. A system comprising:
a processing device to:
identify main concept words and attribute words in textual listings;
cluster words, in the textual listings, based on at least one of the main concept words or the attribute words according to at least one clustering rule, the at least one clustering rule including at least one of:
a first rule preventing clustering of words based on a frequency of appearance of words in a same textual listing,
a second rule preventing clustering of a quantitative attribute word with a qualitative attribute word, or
a third rule indicating clustering of two words when characters of a first word, of the two words, are included in a second word of the two words; and
provide, after clustering the words, the main concept words and the attribute words as at least a portion of a semantic model, the semantic model being used for subsequent clustering.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A non-transitory computer-readable medium storing instructions, the instructions comprising:
one or more instructions that, when executed by one or more processors, cause the one or more processors to:
identify main concept tokens and attribute tokens in text corpora;
cluster tokens, in the text corpora, based on at least one of the main concept tokens or the attribute tokens according to at least one clustering rule, the at least one clustering rule including at least one of:
a first rule associated with a frequency of appearance of tokens in a same text corpora,
a second rule associated with a type of an attribute token, or
a third rule associated with characters of a first token and a second token; and
provide, after clustering the tokens, the main concept tokens and the attribute tokens as at least a portion of a semantic model, the semantic model being used for subsequent clustering.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A method, comprising:
identifying, by a device, main concepts and attributes in listing corpora;
clustering, by the device, words, in the listing corpora, based on at least one of the main concepts or the attributes according to one or more rules, the one or more rules including one or more of:
a first rule preventing clustering of words based on a frequency of appearance of words in a same listing corpora,
a second rule preventing clustering of a quantitative attribute word with a qualitative attribute word, or
a third rule indicating clustering of two words when characters of a first word, of the two words, are included in a second word of the two words; and
providing, by the device, after clustering the words, the main concept words and the attribute words as at least a portion of a semantic model, the semantic model being used for subsequent clustering.
Dependent claims: 16, 17, 18, 19, 20
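The three clustering rules recited in the independent claims could be read, for illustration only, as a pairwise test on two candidate words. The sketch below is one possible interpretation and not the claimed implementation: the names `is_quantitative`, `cooccurrence_count` and `may_cluster` are hypothetical, "quantitative" is assumed to mean "contains a digit", the first rule is assumed to block words that appear together in the same listing more than `max_cooccurrence` times, and the third rule is assumed to mean substring containment. The claims require only at least one of the rules; this sketch applies all three together.

```python
def is_quantitative(word):
    # ASSUMPTION: a word counts as quantitative if it contains a digit.
    return any(ch.isdigit() for ch in word)


def cooccurrence_count(a, b, listings):
    # Number of listings in which both words appear.
    count = 0
    for listing in listings:
        words = listing.lower().split()
        if a in words and b in words:
            count += 1
    return count


def may_cluster(a, b, listings, max_cooccurrence=0):
    """Hypothetical pairwise check combining the three rules."""
    # First rule (assumed reading): words that appear together in the same
    # listing too often are not clustered.
    if cooccurrence_count(a, b, listings) > max_cooccurrence:
        return False
    # Second rule: never cluster a quantitative word with a qualitative one.
    if is_quantitative(a) != is_quantitative(b):
        return False
    # Third rule (assumed reading): cluster when the characters of one word
    # are contained in the other.
    return a in b or b in a


listings = ["red leather sofa", "blue sofas"]
sofa_pair = may_cluster("sofa", "sofas", listings)
color_pair = may_cluster("red", "leather", listings)
```

Here `sofa_pair` is true because "sofa" is contained in "sofas" and the two never share a listing, while `color_pair` is false because "red" and "leather" appear in the same listing.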
Specification