Text sample entry group formulation

US 9,535,983 B2
Filed: 10/29/2013
Issued: 01/03/2017
Est. Priority Date: 10/29/2013
Status: Active Grant

2 Assignments

0 Petitions

Abstract

Storing text samples in a manner that the text samples may be quickly searched. The text samples are assigned a text sample identifier and are each parsed to thereby extract text components from the text samples. Text components that have the same content are assigned the same text component identifier. For each parsed text component, a text component entry is created that includes the assigned text component identifier as well as the text sample identifier for the text sample from which the text component was parsed. A text sample entry group is created for each text sample that contains the text component entries in sequence for the text components found within the text sample. The text sample entry groups are stored so as to be scannable during a future search.
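
The storage scheme summarized in the abstract can be illustrated with a minimal sketch. It assumes whitespace tokenization with lowercasing as the parsing step and sequential integers as text component identifiers; the abstract prescribes neither, and the names `Entry` and `build_entry_table` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One text component entry: (text sample id, text component id)."""
    sample_id: int
    component_id: int


def build_entry_table(samples):
    """Build a component-id lookup and a flat entry table.

    `samples` maps text sample id -> text. Parsing by whitespace and
    lowercasing is an assumption made only for this illustration.
    """
    component_ids = {}   # component text -> component id
    table = []           # entries, grouped by sample, in parse order
    for sample_id in sorted(samples):
        for component in samples[sample_id].lower().split():
            # Reuse the id if the component was seen before, else assign
            # the next unused integer as a new id.
            component_id = component_ids.setdefault(component, len(component_ids))
            table.append(Entry(sample_id, component_id))
    return component_ids, table


ids, table = build_entry_table({1: "the quick fox", 2: "the lazy dog the end"})
# "the" receives one id shared by both samples. Sample 2 contributes two
# identical (sample id, component id) entries for it, one per occurrence.
```

Each run of consecutive entries with the same `sample_id` plays the role of a text sample entry group, since the entries are appended in the order the components occur in the sample.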

8 Citations

20 Claims

1. A method comprising:
- an act of accessing a set of text samples, each having a corresponding text sample identifier;
  
  for each of at least some of the set of text samples, an act of preparing the text sample, the act of preparing the text sample comprising:
  
  an act of parsing a plurality of text components from the text sample; and
  
  for each of at least some of the parsed plurality of text components, an act of identifying the text component, the act of identifying the text component comprising:
  
  an act of determining if the text component is already correlated to a text component identifier, the text component identifier representing the content while being distinguished from the content;
  
  if the text component is already correlated to a text component identifier, assigning the text component identifier to the text component and such that when two text components are the same then the two text components will be assigned a same text component identifier;
  
  if the text component is not already correlated to a text component identifier, assigning a new text component identifier to the text component; and
  
  an act of creating a text component entry comprising a) the text sample identifier for the text sample from which the text component was parsed, and b) the assigned text component identifier;
  
  an act of creating a text sample entry group comprising a plurality of text component entries corresponding to text components parsed from the text sample, and such that the plurality of text component entries are sorted by sequence of the corresponding text component within the text sample; and
  
  an act of storing a plurality of text sample entry groups created by performance of the act of preparing the text sample for each of the at least some of the set of text samples, wherein the plurality of text sample entries are stored in a text component entry table that includes a duplicate set of text component entries having a same text sample identifier and component identifier pairing.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15):
  - 2. The method in accordance with claim 1, wherein the act of storing comprises an act of storing the plurality of text sample entry groups in a predetermined ordering by text sample identifier.
  - 3. The method in accordance with claim 1, further comprising:
    - an act of performing a search on the plurality of text sample entry groups.
  - 4. The method in accordance with claim 3, wherein the act of performing a search comprises an act of performing a search for a sequence of text components, the method comprising:
    - an act of identifying a plurality of text components in the sequence of text components to be searched for;
      
      an act of scanning through the text sample identifiers of the plurality of text sample entry groups in search of a text component identifier associated with a first text component in the sequence of text components;
      
      whenever a text component identifier associated with the first text component is found during the act of scanning, performing the following:
      
      an act of confirming whether or not the found text component identifier in association with one or more text component identifiers that follow within the same text sample entry group collectively identify the sequence of text components to be searched for; and
      
      if the act of confirming confirms that the found text component identifier in association with the one or more text component identifiers that follow within the same text sample entry group do collectively identify the sequence of text components to be searched for, an act of using the corresponding text sample identifier to identify the text sample that includes the sequence to be searched for.
  - 5. The method in accordance with claim 3, wherein the act of performing a search comprises an act of performing a search for a text sample that includes a first particular text component, the method comprising:
    - an act of scanning through the text sample identifiers of the plurality of text sample entry groups in search of a first particular text component identifier that identifies the first particular text component;
      
      whenever the first particular text component identifier is found during the act of scanning, an act of using the corresponding text sample identifier to identify the text sample that includes the first particular text component.
  - 6. The method in accordance with claim 5, wherein the act of using the corresponding text sample identifier to identify the text sample that includes the first particular text component comprises:
    - an act of using a bit of a first bitmap to record that the text sample includes the first particular text component to be searched for, wherein the first bitmap has a corresponding bit for each text sample in the set of text samples.
  - 7. The method in accordance with claim 5, wherein the act of performing a search comprises an act of performing a search for a text sample that includes the first particular text component and a second particular text component, the act of scanning also performed in search of a second particular text component identifier that identifies the second particular text component;
    - whenever the second particular text component identifier is found during the act of scanning, an act of using the corresponding text sample identifier to identify the text sample that includes the second particular text component, the method further comprising:
      
      an act of identifying a result of the search as including at least some of the text samples that include the first particular text component and the second particular text component.
  - 8. The method in accordance with claim 7,
    - wherein the act of using the corresponding text sample identifier to identify the text sample that includes the first particular text component comprises an act of using a bit of a first bitmap to record that the text sample includes the first particular text component, wherein the first bitmap has a corresponding bit for each text sample in the set of text samples, and
      
      wherein the act of using the corresponding text sample identifier to identify the text sample that includes the second particular text component comprises an act of using a bit of a second bitmap to record that the text sample includes the second particular text component, wherein the second bitmap also has a corresponding bit for each text sample in the set of text samples.
  - 9. The method in accordance with claim 8, wherein the act of identifying a result of the search as including at least some of the text samples that include the first particular text component and the second particular text component comprises:
    - an act of performing a bit-wise logical operation on the first bitmap and the second bitmap to formulate a resulting bitmap, wherein the resulting bitmap also has a corresponding bit for each text sample in the set of text samples.
  - 10. The method in accordance with claim 7, wherein the act of performing a search comprises an act of performing a search for a text sample that includes the first particular text component and a second particular text component, but which does not include a third particular text component, the act of scanning also performed in search of a third particular text component identifier that identifies the third particular text component;
    - whenever the third particular text component identifier is found during the act of scanning, an act of using the corresponding text sample identifier to identify the text sample that includes the third particular text component, the method further comprising:
      
      an act of identifying a result of the search as including at least some of the text samples that include the first particular text component and the second particular text component, but which does not include the third particular text component.
  - 11. The method in accordance with claim 10,
    - wherein the act of using the corresponding text sample identifier to identify the text sample that includes the first particular text component comprises an act of using a bit of a first bitmap to record that the text sample includes the first particular text component, wherein the first bitmap has a corresponding bit for each text sample in the set of text samples,
      
      wherein the act of using the corresponding text sample identifier to identify the text sample that includes the second particular text component comprises an act of using a bit of a second bitmap to record that the text sample includes the second particular text component, wherein the second bitmap also has a corresponding bit for each text sample in the set of text samples, and
      
      wherein the act of using the corresponding text sample identifier to identify the text sample that includes the third particular text component comprises an act of using a bit of a third bitmap to record that the text sample includes the third particular text component, wherein the third bitmap also has a corresponding bit for each text sample in the set of text samples.
  - 12. The method in accordance with claim 11, wherein the act of identifying a result of the search as including at least some of the text samples that include the first particular text component and the second particular text component, but which does not include the third particular text component, comprises:
    - an act of performing a bit-wise logical operation on the first bitmap, the second bitmap and the third bitmap to formulate a resulting bitmap, wherein the resulting bitmap also has a corresponding bit for each text sample in the set of text samples.
  - 13. The method in accordance with claim 5, wherein the act of performing a search comprises an act of performing a search for a text sample that includes the first particular text component, but which also does not include a second particular text component, the act of scanning also performed in search of a second particular text component identifier that identifies the second particular text component;
    - whenever the second particular text component identifier is found during the act of scanning, an act of using the corresponding text sample identifier to identify the text sample that includes the second particular text component, the method further comprising:
      
      an act of identifying a result of the search as including at least some of the text samples that include the first particular text component, but which does not include the second particular text component.
  - 14. The method in accordance with claim 13,
    - wherein the act of using the corresponding text sample identifier to identify the text sample that includes the first particular text component comprises an act of using a bit of a first bitmap to record that the text sample includes the first particular text component, wherein the first bitmap has a corresponding bit for each text sample in the set of text samples, and
      
      wherein the act of using the corresponding text sample identifier to identify the text sample that includes the second particular text component comprises an act of using a bit of a second bitmap to record that the text sample includes the second particular text component, wherein the second bitmap also has a corresponding bit for each text sample in the set of text samples.
  - 15. The method in accordance with claim 14, wherein the act of identifying a result of the search as including at least some of the text samples that include the first particular text component, but which also does not include the second particular text component comprises:
    - an act of performing a bit-wise logical operation on the first bitmap and the second bitmap to formulate a resulting bitmap, wherein the resulting bitmap also has a corresponding bit for each text sample in the set of text samples.
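
The searches recited in claims 4 through 15 can be sketched roughly as follows. The sketch assumes the entry table is a list of `(text sample id, text component id)` pairs grouped by sample, and that sample identifiers are small dense integers that can double as bit positions; neither assumption comes from the claims. The names `ENTRIES`, `find_sequence`, `bitmap_for`, and `samples_in` are hypothetical, and a Python integer stands in for each bitmap.

```python
from itertools import groupby

# Hypothetical entry table: (text_sample_id, text_component_id) pairs,
# grouped by sample and ordered by position within each sample.
ENTRIES = [(0, 10), (0, 11), (0, 12),
           (1, 11), (1, 10), (1, 13),
           (2, 10), (2, 11), (2, 10), (2, 11), (2, 12)]
NUM_SAMPLES = 3


def find_sequence(entries, wanted):
    """Claim 4 style: ids of samples whose group holds `wanted` contiguously."""
    hits = []
    for sample_id, group in groupby(entries, key=lambda e: e[0]):
        ids = [component_id for _, component_id in group]
        for start, component_id in enumerate(ids):
            # On finding the first component id, confirm the ids that follow.
            if component_id == wanted[0] and ids[start:start + len(wanted)] == wanted:
                hits.append(sample_id)
                break
    return hits


def bitmap_for(entries, component_id):
    """Claims 6, 8, 11, 14 style: one bit per sample, set when the id is found."""
    bits = 0
    for sample_id, found in entries:
        if found == component_id:
            bits |= 1 << sample_id
    return bits


def samples_in(bits):
    """Translate a result bitmap back into a list of sample ids."""
    return [i for i in range(NUM_SAMPLES) if (bits >> i) & 1]


print(find_sequence(ENTRIES, [10, 11, 12]))                              # [0, 2]
both = bitmap_for(ENTRIES, 10) & bitmap_for(ENTRIES, 13)                 # AND
print(samples_in(both))                                                  # [1]
without = bitmap_for(ENTRIES, 10) & ~bitmap_for(ENTRIES, 12)             # AND NOT
print(samples_in(without))                                               # [1]
```

Combining three bitmaps, as in claim 12, is the same pattern with one more operand, for example `a & b & ~c`.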

16. A computer program product comprising one or more computer-readable storage media having thereon computer-executable instructions that are structured such that, when executed by one or more processors of a computing system, cause the computing system to perform a method for storing representations of a set of text samples, each having a corresponding text sample identifier, the method comprising:
- an act of creating a content identification table that includes text components and corresponding text component identifier pairings, the act of creating including, for each of at least some of the set of text samples:
  
  an act of parsing a plurality of text components from the text sample; and
  
  for each of at least some of the parsed plurality of text components, an act of identifying the text component, the act of identifying the text component comprising:
  
  an act of determining if the text component is already correlated to a text component identifier, the text component identifier representing the content while being distinguished from the content;
  
  if the text component is already correlated to a text component identifier, assigning the text component identifier to the text component;
  
  if the text component is not already correlated to a text component identifier, assigning a new text component identifier to the text component as a text component and corresponding text component identifier pairing; and
  
  an act of creating a text component entry comprising a) the text sample identifier for the text sample from which the text component was parsed, and b) the assigned text component identifier;
  
  while creating the content identification table, refraining from creating a new entry in the content identification table for a text component that is encountered that is the same as another text component in the content identification table, and such that the content identification table omits duplicate entries of the text component and corresponding text component identifier pairing; and
  
  an act of creating a text component entry table by at least:
  
  creating a text sample entry group comprising a plurality of text component entries corresponding to text components parsed from the text sample, and such that the plurality of text component entries are sorted by sequence of the corresponding text component within the text sample; and
  
  an act of storing a plurality of text sample entry groups created by performance of the act of preparing the text sample for each of the at least some of the set of text samples.
- Dependent claims (17):
  - 17. The computer program product in accordance with claim 16, wherein the act of storing comprises an act of storing the plurality of text sample entry groups in a predetermined ordering by text sample identifier.
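
As a rough illustration of claims 16 and 17, the sketch below keeps a content identification table with exactly one row per distinct text component and builds the text component entry table alongside it, ordered by text sample identifier. Whitespace splitting stands in for parsing, and the names `ContentIdTable`, `intern`, and `build_tables` are hypothetical.

```python
class ContentIdTable:
    """Text component <-> text component id pairings, with no duplicate rows."""

    def __init__(self):
        self._id_by_component = {}
        self._component_by_id = []

    def intern(self, component):
        """Return the existing id for `component`, or assign a new one."""
        existing = self._id_by_component.get(component)
        if existing is not None:
            return existing          # refrain from creating a second row
        new_id = len(self._component_by_id)
        self._id_by_component[component] = new_id
        self._component_by_id.append(component)
        return new_id

    def component(self, component_id):
        """Recover the content that an id represents."""
        return self._component_by_id[component_id]

    def __len__(self):
        return len(self._component_by_id)


def build_tables(samples):
    """Return (content id table, entry table) for `samples` (id -> text)."""
    ids = ContentIdTable()
    entry_table = []
    for sample_id in sorted(samples):            # ordering by sample id (claim 17)
        for component in samples[sample_id].split():   # assumed parsing step
            entry_table.append((sample_id, ids.intern(component)))
    return ids, entry_table


ids, entry_table = build_tables({7: "a b a", 3: "b c"})
# The content id table holds three rows ("b", "c", "a"), while the entry
# table holds five entries, one per parsed component occurrence.
```

The entry table may repeat a pairing when a component occurs twice in one sample, whereas the content identification table never repeats a component.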

18. A computing system comprising:
- at least one processor; and
  
  one or more storage devices having stored computer-executable instructions that are executable by the at least one processor to cause the computing system to implement the following:
  
  an act of accessing a set of text samples, each having a corresponding text sample identifier;
  
  for each of at least some of the set of text samples, an act of preparing the text sample, the act of preparing the text sample comprising:
  
  an act of parsing a plurality of text components from the text sample; and
  
  for each of at least some of the parsed plurality of text components, an act of identifying the text component, the act of identifying the text component comprising:
  
  an act of determining if the text component is already correlated to a text component identifier, the text component identifier representing the content while being distinguished from the content;
  
  if the text component is already correlated to a text component identifier, assigning the text component identifier to the text component;
  
  if the text component is not already correlated to a text component identifier, assigning a new text component identifier to the text component; and
  
  an act of creating a text component entry comprising a) the text sample identifier for the text sample from which the text component was parsed, and b) the assigned text component identifier;
  
  an act of creating a text sample entry group comprising a plurality of text component entries corresponding to text components parsed from the text sample, and such that the plurality of text component entries are sorted by sequence of the corresponding text component within the text sample;
  
  an act of storing a plurality of text sample entry groups created by performance of the act of preparing the text sample for each of the at least some of the set of text samples, the plurality of text sample entry groups being stored in a predetermined ordering by text sample identifier;
  
  an act of performing a search on the plurality of sample entry groups, wherein the act of performing a search comprises:
  
  scanning through the text sample identifiers of the plurality of text sample entry groups in search of a first particular text component identifier that identifies a first particular text component and a second particular text component identifier that identifies a second particular text component, and upon finding the first and second particular text component identifiers during the act of scanning, an act of using text sample identifiers corresponding to the first and second particular text component identifiers to identify one or more text samples that include the first particular text component and to identify one or more text samples that include the second particular text component, and
  
  an act of identifying a result of the search, the result including at least some of the text samples that include the first particular text component and the second particular text component, or identifying a result of the search as including at least some of the text samples that include the first particular text component but which omit the second particular text component.
- Dependent claims (19, 20):
  - 19. The computing system of claim 18, wherein the result includes said at least some of the text samples that include the first particular text component and the second particular text component.
  - 20. The computing system of claim 18, wherein the result includes said at least some of the text samples that include the first particular text component but which omit the second particular text component.
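
The two alternative results in claims 18 through 20 can be sketched with a single scan over the entry table. Claim 18 does not name a data structure for recording matches, so Python sets are used here purely as an assumption; the function name `search_two` and the sample data are hypothetical.

```python
def search_two(entries, first_id, second_id):
    """Scan once for two component ids.

    `entries` is a list of (text_sample_id, text_component_id) pairs.
    Returns (samples containing both ids, samples containing the first
    id but omitting the second), each as a sorted list of sample ids.
    """
    with_first, with_second = set(), set()
    for sample_id, component_id in entries:
        if component_id == first_id:
            with_first.add(sample_id)
        elif component_id == second_id:
            with_second.add(sample_id)
    both = sorted(with_first & with_second)          # claim 19 style result
    first_only = sorted(with_first - with_second)    # claim 20 style result
    return both, first_only


entries = [(1, 5), (1, 6), (2, 5), (3, 6), (3, 7)]
both, first_only = search_two(entries, 5, 6)
print(both)        # [1]
print(first_only)  # [2]
```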

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Inventors: Petculescu, Cristian; Dumitru, Marius; Paraschiv, Vasile; Netz, Amir; Sanders, Paul Jonathon
Primary Examiner(s): Bullock, Joshua
Application Number: US14/066,505
Publication Number: US 20150120730A1
Time in Patent Office: 1,162 Days
Field of Search: 707/737
US Class Current: 1/1
CPC Class Codes:
G06F 16/319   Inverted lists
G06F 16/3331   Query processing
G06F 16/35   Clustering; Classification
