METHOD AND SYSTEM FOR PROVIDING ACCESS TO INFORMATION OF POTENTIAL INTEREST TO A USER
Abstract
The present invention provides a method and system for providing access to information of potential interest to a user. Closed-caption information is analyzed to find related information on the Internet. User interactions with a TV which receives programming including closed-caption information are monitored to determine user interests or topics.
20 Claims
1. A method for extracting a sentence from an incoming stream of text corresponding to a program, the method comprising:
retrieving end-of-sentence punctuation marks for a language identified for the incoming stream of text;
locating punctuation marks in the incoming stream of text that match one or more of the retrieved end-of-sentence punctuation marks;
comparing characters around the located punctuation marks to a list of word-punctuation pairs for the identified language to determine when a located punctuation mark is a valid end-of-sentence punctuation mark, as opposed to an invalid one not to be considered an end-of-sentence punctuation mark despite its presence in the retrieved end-of-sentence punctuation marks for the identified language; and
for any located valid punctuation marks, identifying a group of words between located valid punctuation marks as sentences.
Dependent claims: 2, 3, 4
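The steps of claim 1 can be illustrated with a minimal Python sketch. The tables END_MARKS and NON_TERMINAL_PAIRS and the function extract_sentences are hypothetical names introduced here, and their contents are assumptions; the claim does not specify them. The sketch also assumes that the word-punctuation pairs list the cases in which a mark does not end a sentence (for example, "Dr" followed by a period), which is one possible reading of the claim.

```python
# Hypothetical per-language tables; the claim does not specify their contents.
END_MARKS = {"en": {".", "!", "?"}}
# Assumed reading: a listed (word, mark) pair means the mark does NOT end a sentence.
NON_TERMINAL_PAIRS = {"en": {("Mr", "."), ("Mrs", "."), ("Dr", "."), ("vs", ".")}}


def extract_sentences(text, lang="en"):
    """Return the sentences found between valid end-of-sentence marks."""
    marks = END_MARKS[lang]            # retrieve the marks for the language
    pairs = NON_TERMINAL_PAIRS[lang]
    sentences = []
    start = 0                          # index just after the last valid mark
    for i, ch in enumerate(text):
        if ch not in marks:            # locate candidate marks
            continue
        words = text[start:i].split()
        if not words:                  # no words since the last valid mark
            start = i + 1
            continue
        if (words[-1], ch) in pairs:   # compare the preceding word to the pair list
            continue                   # invalid mark: keep scanning
        sentences.append(text[start:i + 1].strip())
        start = i + 1
    # Text after the last valid mark is left out, since it is not yet a sentence.
    return sentences
```

For example, extract_sentences("Dr. Smith arrived. He left!") yields two sentences, because the period after "Dr" matches a listed pair and is skipped.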
5. A method for identifying a language of an incoming stream of text corresponding to a program, the method comprising:
comparing the incoming stream of text against one or more character sets, each of the character sets identifying characters used in a different language;
identifying stop words in the incoming stream of text and comparing the identified stop words with stop words corresponding to one or more languages; and
identifying a particular language for the incoming stream of text based on a matched character set and identified stop words corresponding to the particular language.
Dependent claims: 6, 7, 8
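The two signals in claim 5, character sets and stop words, can be combined roughly as in the following Python sketch. The tables CHARSETS and STOP_WORDS and the function identify_language are hypothetical and deliberately tiny. The scoring rule (the character set must cover every letter in the text, then the language with the most stop-word hits wins) is an assumption; the claim does not say how the two signals are weighed.

```python
import re

# Hypothetical, deliberately small tables for illustration only.
BASIC_LATIN = "abcdefghijklmnopqrstuvwxyz"
CHARSETS = {
    "en": set(BASIC_LATIN),
    "es": set(BASIC_LATIN + "áéíóúüñ"),
    "de": set(BASIC_LATIN + "äöüß"),
}
STOP_WORDS = {
    "en": {"the", "and", "is", "of", "to"},
    "es": {"el", "la", "y", "de", "es"},
    "de": {"der", "die", "und", "ist", "das"},
}


def identify_language(text):
    """Return the best-matching language code, or None if nothing matches."""
    lowered = text.lower()
    letters = {c for c in lowered if c.isalpha()}
    words = re.findall(r"[^\W\d_]+", lowered)   # runs of letters only
    best, best_hits = None, 0
    for lang, charset in CHARSETS.items():
        if not letters <= charset:              # character set must cover the text
            continue
        hits = sum(1 for w in words if w in STOP_WORDS[lang])
        if hits > best_hits:                    # most stop-word matches wins
            best, best_hits = lang, hits
    return best
```

A text with no stop-word hits in any candidate language returns None here; a fuller implementation would need a fallback, which the claim leaves open.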
9. A method for validating a topic extracted from a stream of text corresponding to a program, the method comprising:
locally validating the topic by comparing the topic against one or more local word lists; and
remotely validating the topic by submitting the topic as a query to an Internet search engine and comparing the number of results received from the Internet search engine to a predefined threshold.
Dependent claims: 10, 11, 12
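A rough Python sketch of the two-stage validation in claim 9 follows. No real search engine API is used: result_count is a hypothetical callable supplied by the caller that returns the number of results for a query. The sketch assumes that the local word lists contain acceptable terms and that a result count at or above the threshold counts as valid; the claim states neither the direction of the comparison nor how the local and remote outcomes are combined, so both flags are reported separately.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Set


@dataclass
class ValidationResult:
    local: bool    # topic found in at least one local word list
    remote: bool   # search result count met the threshold

    @property
    def valid(self) -> bool:
        # Assumption: a topic is accepted only if both checks pass.
        return self.local and self.remote


def validate_topic(
    topic: str,
    local_word_lists: Iterable[Set[str]],
    result_count: Callable[[str], int],   # hypothetical search-engine wrapper
    threshold: int = 1000,                # hypothetical predefined threshold
) -> ValidationResult:
    """Check a topic against local lists and a remote result count."""
    term = topic.strip().lower()
    local = any(term in words for words in local_word_lists)
    remote = result_count(topic) >= threshold
    return ValidationResult(local, remote)
```

In testing, result_count can be a stub such as lambda q: 5000, which keeps the example runnable without network access.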
13. An apparatus comprising:
a closed-caption decoder configured to decode a raw closed caption stream for a program and produce closed caption text;
a language detection module configured to determine a language for the closed caption text;
a sentence detection module configured to determine sentences within the closed caption text;
a tagger configured to tag keywords in the closed caption text based on the determined language and based on determined sentences;
a topic extractor configured to extract topics based on the tagged keywords; and
a validation module configured to validate the extracted topics.
Dependent claims: 14, 15, 16, 17, 18, 19, 20
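The data flow between the modules of claim 13 can be sketched in Python as a chain of pluggable callables. CaptionPipeline and its field names are hypothetical, and the signatures are assumptions chosen only to show which module consumes which output; the claim describes an apparatus and does not prescribe a software interface.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CaptionPipeline:
    # Each field stands in for one module named in the claim.
    decode: Callable[[bytes], str]                       # closed-caption decoder
    detect_language: Callable[[str], str]                # language detection module
    detect_sentences: Callable[[str, str], List[str]]    # sentence detection module
    tag_keywords: Callable[[List[str], str], List[str]]  # tagger
    extract_topics: Callable[[List[str]], List[str]]     # topic extractor
    validate: Callable[[str], bool]                      # validation module

    def run(self, raw: bytes) -> List[str]:
        """Pass a raw caption stream through every module in order."""
        text = self.decode(raw)
        lang = self.detect_language(text)
        sentences = self.detect_sentences(text, lang)
        keywords = self.tag_keywords(sentences, lang)    # uses language and sentences
        topics = self.extract_topics(keywords)
        return [t for t in topics if self.validate(t)]   # keep validated topics only
```

Keeping each module behind a plain callable lets a toy stand-in (for example, a decoder that simply calls raw.decode("ascii")) be swapped for a real component without changing the flow.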
Specification