Generation of nickname dictionary based on analysis of user communications
Abstract
Methods, apparatuses and systems for generating a nickname dictionary that includes associations between names of users and candidate nicknames based on statistical analysis of user communications observed at a network communications facility, such as a social network system, an email provider and the like.
57 Citations
23 Claims
-
1. A method comprising:
-
accessing, by one or more computing devices, a data store of messages of a select message type to generate data relevant to associations between names associated with target users associated with a social network system and words in the messages, each message being directed to a target user and comprising one or more words, each target user being associated with one or more names;
performing, by one or more computing devices, statistical analysis of the data to identify associations between at least one name associated with at least one of the target users and one or more words identified in the messages, the statistical analysis being based, at least in part, on:
a number of occurrences of a candidate word under consideration in messages directed to target users sharing a same name; and
a number of occurrences of words other than the candidate word in messages directed to target users sharing the same name; and
constructing, by one or more computing devices, a data structure indicating confidence-of-nickname associations between the at least one name and the one or more words based at least in part on the statistical analysis.
-
-
2. The method of claim 1 wherein the data relevant to associations between names associated with target users and words in the messages comprises statistical attribute data.
-
3. The method of claim 1 wherein the select message type corresponds to birthdays of the target users of the respective messages.
-
4. The method of claim 1 wherein the messages are wall posts on profile pages of a social network system.
-
5. The method of claim 1 wherein the messages are electronic mail messages directed to target users associated with a social network system.
-
6. The method of claim 1 wherein the accessing the data store comprises constructing a data structure that groups the messages by one of the names associated with the target users.
-
7. The method of claim 1 wherein the performing statistical analysis of the data comprises applying one or more thresholds to the statistical attribute data.
-
8. The method of claim 2 wherein the performing statistical analysis of the data comprises applying the data to a statistical analysis module to identify a confidence of the nickname association between the at least one name and the one or more words identified in the messages.
-
9. The method of claim 8 wherein the statistical analysis module implements Fisher's exact test.
-
10. The method of claim 1 further comprising consulting the data structure to modify at least one search operation directed to search queries including names.
-
11. The method of claim 10 wherein the search operation comprises ranking of results including nicknames that are returned in response to queries including names.
-
12. The method of claim 1 wherein the select message type corresponds to a high density of name use in messages.
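The counting and testing steps recited in claims 1 and 7 to 9 can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names, the `(name, text)` message format, the choice of a one-sided test, and the definition of confidence as one minus the p-value are all assumptions made for illustration. The p-value is computed directly from the hypergeometric distribution with the standard library.

```python
from collections import Counter
from math import comb


def fisher_right_tail(a, b, c, d):
    """One-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins whose top-left cell is at least `a`.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / comb(n, col1)


def word_counts_by_name(messages):
    """Count word occurrences in messages, grouped by lowercased target name."""
    counts = {}
    for name, text in messages:
        counts.setdefault(name.lower(), Counter()).update(text.lower().split())
    return counts


def nickname_confidence(counts, name, word):
    """Assumed confidence score: 1 minus the one-sided p-value."""
    total = sum(sum(c.values()) for c in counts.values())
    name_total = sum(counts[name].values())
    word_total = sum(c[word] for c in counts.values())
    a = counts[name][word]            # candidate word under this name
    b = name_total - a                # other words under this name
    c = word_total - a                # candidate word under other names
    d = total - name_total - c        # other words under other names
    return 1.0 - fisher_right_tail(a, b, c, d)


def build_nickname_dictionary(messages, min_confidence):
    """Keep (name, word) pairs whose score meets a threshold (hypothetical value)."""
    counts = word_counts_by_name(messages)
    result = {}
    for name, words in counts.items():
        for word in words:
            score = nickname_confidence(counts, name, word)
            if score >= min_confidence:
                result[(name, word)] = score
    return result


# Toy data: each tuple is (target user's name, message text).
messages = [
    ("William", "happy birthday bill"),
    ("William", "hope you have a great day bill"),
    ("William", "happy birthday will"),
    ("Robert", "happy birthday bob"),
    ("Robert", "have a great birthday bob"),
    ("Susan", "happy birthday sue"),
]
nicknames = build_nickname_dictionary(messages, min_confidence=0.5)
```

On this toy data "bill" scores higher for "william" than a common greeting word such as "happy" does, because "bill" appears only in messages to users named William. With so few messages the scores are noisy, so any threshold used in practice would need to be tuned on real data.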
-
13. An apparatus comprising:
-
one or more processors; and
a memory operably coupled to the processors comprising instructions executable by the processors, the processors being operable when executing the instructions to:
access a data store of messages of a select message type to generate data relevant to associations between names associated with target users associated with a social network system and words in the messages, each message being directed to a target user and comprising one or more words, each target user being associated with one or more names;
perform statistical analysis of the data to identify associations between at least one name associated with at least one of the target users and one or more words identified in the messages, the statistical analysis being based, at least in part, on:
a number of occurrences of a candidate word under consideration in messages directed to target users sharing a same name; and
a number of occurrences of words other than the candidate word in messages directed to target users sharing the same name; and
construct a data structure indicating confidence-of-nickname associations between the at least one name and the one or more words based at least in part on the statistical analysis.
-
-
14. The apparatus of claim 13 wherein the data relevant to associations between names associated with target users and words in the messages comprises statistical attribute data.
-
15. The apparatus of claim 13 wherein the select message type corresponds to birthdays of the target users of the respective messages.
-
16. The apparatus of claim 13 wherein the one or more code modules further comprise computer-readable instructions operative to cause the one or more processors to construct a data structure that groups the messages by one of the names associated with the target users.
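The message selection and grouping recited in claims 15 and 16 might look roughly like the following Python sketch. The `Message` record, its field names, and the rule that a birthday message is one sent on the same month and day as the target user's birthday are assumptions for illustration; the claims do not specify them.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Message:
    target_names: tuple      # every name associated with the target user
    target_birthday: date    # hypothetical profile field
    sent_on: date
    text: str


def is_birthday_message(msg: Message) -> bool:
    """True when the message was sent on the target user's birthday (month and day)."""
    return (msg.sent_on.month, msg.sent_on.day) == (
        msg.target_birthday.month,
        msg.target_birthday.day,
    )


def group_words_by_name(messages):
    """Map each lowercased target name to the words of the birthday messages sent to it."""
    table = defaultdict(list)
    for msg in messages:
        if not is_birthday_message(msg):
            continue  # not of the select message type
        words = msg.text.lower().split()
        for name in msg.target_names:
            table[name.lower()].extend(words)
    return dict(table)


messages = [
    Message(("William", "Smith"), date(1980, 5, 2), date(2010, 5, 2), "Happy birthday Bill"),
    Message(("William", "Jones"), date(1975, 7, 9), date(2010, 7, 9), "Cheers Will"),
    Message(("William", "Smith"), date(1980, 5, 2), date(2010, 6, 1), "See you Friday"),
]
grouped = group_words_by_name(messages)
```

The third message is dropped because it was not sent on the target user's birthday, so its words never reach the grouped table. Words from the two remaining messages are pooled under "william" because both target users share that name.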
-
17. A method comprising:
-
accessing a data store of messages;
generating statistical counts relevant to name-word associations, wherein each message is directed to a target user and includes one or more words, wherein each target user is associated with one or more names, and wherein the statistical counts, for a given name-candidate word pair, comprise:
a first count of the association occurrences between the name and the candidate word,
a second count of the association occurrences between the name and all words other than candidate word,
a third count of the association occurrences between the candidate name and all other words than the candidate word, and
a fourth count of the number of occurrences of words other than the candidate word that have at least one association occurrence with a name, including the candidate name, that has at least one association occurrence with the candidate word;
applying, for one or more pairs of a candidate name and a candidate word, a statistical algorithm to the statistical counts to determine a confidence-of-nickname association between the candidate name and candidate word; and
constructing a data structure indicating the confidence-of-nickname association between the candidate name and the candidate word.
-
-
18. The method of claim 17 wherein the statistical algorithm is Fisher's exact test.
-
19. The method of claim 17 wherein the messages are filtered against a select message type during the accessing step.
-
20. The method of claim 19 wherein the select message type is defined relative to a plurality of attributes including a message channel type and a temporal attribute.
-
21. The method of claim 20 wherein the temporal attribute is a time range.
-
22. The method of claim 20 wherein the temporal attribute is defined relative to birthdays of target users.
-
23. The method of claim 17 wherein the accessing the data store comprises generating a message table including the one or more words of one or more messages in the data store grouped by target user name.
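The four statistical counts of claim 17 can be sketched in Python from a message table grouped by target user name, as in claim 23. The names `PairCounts` and `pair_counts` and the table layout are hypothetical. One interpretation is also an assumption: the third count as worded above reads the same as the second, so this sketch computes the occurrences of the candidate word under other names, which is the cell a 2x2 contingency table would need. The fourth count follows the claim wording literally and includes the candidate name.

```python
from collections import Counter
from typing import NamedTuple


class PairCounts(NamedTuple):
    first: int   # candidate word under the candidate name
    second: int  # other words under the candidate name
    third: int   # candidate word under other names (assumed reading)
    fourth: int  # other words under every name that co-occurs with the candidate word


def pair_counts(table, name, word):
    """Compute the four counts for one (name, candidate word) pair.

    `table` maps each name to a Counter of the words seen in messages
    directed to users with that name.
    """
    name_words = table[name]
    first = name_words[word]
    second = sum(name_words.values()) - first
    third = sum(c[word] for n, c in table.items() if n != name)
    fourth = sum(
        sum(c.values()) - c[word]
        for c in table.values()
        if c[word] > 0  # only names with at least one occurrence of the word
    )
    return PairCounts(first, second, third, fourth)


# Toy message table grouped by target user name.
table = {
    "william": Counter({"bill": 2, "happy": 2, "birthday": 2}),
    "robert": Counter({"bob": 2, "happy": 1}),
    "billy": Counter({"bill": 1, "hey": 1}),
}
counts = pair_counts(table, "william", "bill")
```

In this toy table "robert" contributes nothing to the fourth count because no message to a Robert contains "bill". The resulting counts could then be passed to a statistical algorithm such as Fisher's exact test (claim 18) to score the pair.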
Specification