Processing text with domain-specific spreading activation methods

US 9,477,655 B2
Filed: 11/25/2014
Issued: 10/25/2016
Est. Priority Date: 01/04/2007
Status: Active Grant

4 Assignments

0 Petitions

Abstract

A method for performing natural language processing of free text using domain-specific spreading activation. Embodiments of the present invention ontologize free text using an algorithm based on neurocognitive theory by simulating human recognition, semantic, and episodic memory approaches. Embodiments of the invention may be used to process clinical text for assignment of billing codes, analyze suicide notes or legal discovery materials, and process other collections of text. Further, embodiments of the invention may be used to more effectively search large databases, such as a database containing a large number of medical publications.
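Spreading activation, as named in the abstract, can be illustrated with a minimal Python sketch. It is a generic pulse-based variant, not the patented algorithm: activation starts at seed concepts, each node fires at most once and passes its activation, scaled by a decay factor and by the link weight, to its neighbors, and a node fires only if its activation reaches a firing threshold. The link format, the `decay`, `firing_threshold`, and `max_iters` parameters, the function names, and the toy `sem:`/`epi:` concept labels are all assumptions made for illustration.

```python
from collections import defaultdict


def spread_activation(links, seeds, decay=0.8, firing_threshold=0.1, max_iters=10):
    """Spread activation from seed concepts over an undirected, weighted network.

    links: {(node_a, node_b): weight in [0, 1]}; nodes may be semantic or episodic.
    seeds: {node: initial activation}. Returns {node: accumulated activation}.
    """
    neighbors = defaultdict(list)
    for (a, b), w in links.items():
        neighbors[a].append((b, w))
        neighbors[b].append((a, w))

    activation = defaultdict(float, seeds)
    fired = set()
    frontier = {n for n, a in seeds.items() if a >= firing_threshold}
    for _ in range(max_iters):
        if not frontier:
            break
        fired |= frontier              # each node fires at most once
        snapshot = dict(activation)    # outputs use activation from the start of this pulse
        next_frontier = set()
        for node in frontier:
            out = snapshot[node] * decay
            for nbr, w in neighbors[node]:
                activation[nbr] += out * w
                if nbr not in fired and activation[nbr] >= firing_threshold:
                    next_frontier.add(nbr)
        frontier = next_frontier
    return dict(activation)


def select_top(activation, exclude=(), k=3):
    """Return the k most activated concepts, skipping any in `exclude` (e.g. the seeds)."""
    ranked = sorted((n for n in activation if n not in exclude),
                    key=lambda n: activation[n], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    links = {
        ("sem:fever", "sem:infection"): 0.9,
        ("sem:infection", "sem:antibiotic"): 0.5,
        ("sem:fever", "epi:prior_ear_infection"): 0.2,
    }
    act = spread_activation(links, {"sem:fever": 1.0})
    print(select_top(act, exclude={"sem:fever"}, k=2))  # ['sem:infection', 'sem:antibiotic']
```

Snapshotting activation at the start of each pulse keeps the result independent of the order in which frontier nodes are visited.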

34 Citations

19 Claims

1. One or more non-transitory electronic memory devices including computer instructions for performing a method comprising:
- using a central processing unit (CPU) connected via a network to a remote storage device, to process text documents stored in said memory device;
  
  identifying, using the CPU, one or more of a plurality of groups of characters of a text in the text document as corresponding to at least one of a plurality of known words;
  
  using the CPU for creating a list of the identified known words;
  
  querying a first database contained in a second memory device to obtain a set of one or more semantic concepts associated with each of the identified known words, the first database comprising associations between the plurality of known words and a plurality of semantic concepts;
  
  annotating, using the CPU, the list of identified known words with the first set of semantic concepts associated with each identified known word;
  
  querying a second database contained in a third memory device to obtain a set of one or more episodic concepts associated with the set of semantic concepts, the second database comprising associations between a plurality of episodic concepts and at least one of the plurality of known words and the plurality of semantic concepts, the plurality of episodic concepts being separate from the plurality of semantic concepts;
  
  creating, using the CPU, a semantic network having a plurality of nodes corresponding to the first and second sets of semantic and episodic concepts and weighted links between the first and second sets of semantic and episodic concepts;
  
  utilizing, using the CPU, spreading activation algorithms to refine the weighted links in the semantic network; and
  
  selecting, using the CPU, at least one of the concepts from the sets of semantic and episodic concepts based upon an associated weight for the at least one node derived from the step of utilizing spreading activation.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8)
  - 2. The one or more non-transitory memory devices of claim 1, wherein the computer instructions are further configured to prepare the text prior to the identifying step, by including at least one of tagging parts of speech, replacing abbreviations with words, and correcting misspelled words.
  - 3. The one or more non-transitory memory devices of claim 1, wherein the computer instructions are further configured to provide an output including the selected at least one of the concepts.
  - 4. The one or more non-transitory memory devices of claim 1, wherein the text comprises clinical free text.
  - 5. The one or more non-transitory memory devices of claim 4, wherein the clinical free text comprises pediatric clinical free text.
  - 6. The one or more non-transitory memory devices of claim 1, wherein the text comprises a plurality of documents and the computer instructions are further configured to identify a subset of the plurality of documents by identifying at least two documents having associations with the selected at least one of the concepts.
  - 7. The one or more non-transitory memory devices of claim 6, wherein the computer instructions are further configured to produce an output, the output including identification of one or more portions of each of the at least two documents having associations with the selected at least one of the concepts.
  - 8. The one or more non-transitory memory devices of claim 1, wherein the text comprises at least one suicide note and the computer instructions are further configured to evaluate the suicide note for concepts indicative of suicidal intent.
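The identifying, listing, and annotating steps of claim 1, together with the text preparation of claim 2, can be illustrated with a minimal Python sketch. It assumes that tiny in-memory dictionaries stand in for the known-word lexicon and for the first (semantic) and second (episodic) databases; the table contents, the `sem:`/`epi:` labels, the fixed word-to-concept weight, and all function names are hypothetical. Part-of-speech tagging is omitted, and the network is represented simply as a dictionary of weighted links.

```python
import re

# Hypothetical stand-ins for the lexicon and the two databases named in claim 1.
ABBREVIATIONS = {"pt": "patient", "hx": "history"}
SPELLING_FIXES = {"feever": "fever"}
KNOWN_WORDS = {"patient", "fever", "cough", "history", "asthma"}
SEMANTIC_DB = {  # first database: known word -> semantic concepts
    "fever": ["sem:fever"],
    "cough": ["sem:cough"],
    "asthma": ["sem:asthma"],
}
EPISODIC_DB = {  # second database: semantic concept -> [(episodic concept, link weight)]
    "sem:asthma": [("epi:prior_asthma_admission", 0.7)],
    "sem:cough": [("epi:prior_asthma_admission", 0.4)],
}


def prepare(text):
    """Tokenize, expand abbreviations, and fix known misspellings (POS tagging omitted)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    tokens = [ABBREVIATIONS.get(t, t) for t in tokens]
    return [SPELLING_FIXES.get(t, t) for t in tokens]


def identify_known_words(tokens):
    """List tokens found in the known-word lexicon, once each, in first-seen order."""
    return list(dict.fromkeys(t for t in tokens if t in KNOWN_WORDS))


def annotate(words):
    """Annotate each identified word with its semantic concepts from the first database."""
    return {w: SEMANTIC_DB.get(w, []) for w in words}


def build_network(annotations, word_link_weight=1.0):
    """Return weighted links {(node_a, node_b): weight} joining words, semantic and episodic concepts."""
    links = {}
    for word, concepts in annotations.items():
        for sem in concepts:
            links[(word, sem)] = word_link_weight  # assumed fixed word-to-concept weight
            for epi, weight in EPISODIC_DB.get(sem, []):
                links[(sem, epi)] = weight
    return links


if __name__ == "__main__":
    words = identify_known_words(prepare("Pt has feever and cough, hx of asthma."))
    print(words)  # ['patient', 'fever', 'cough', 'history', 'asthma']
    print(build_network(annotate(words)))
```

Keeping episodic concepts under their own label prefix mirrors the claim's requirement that episodic concepts be separate from semantic ones; the resulting links could then be refined and scored by a spreading activation step.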

9. One or more non-transitory electronic memory devices including computer instructions for performing a method for processing a text containing natural language, the method comprising:
- using a central processing unit (CPU) connected via a network to a remote storage device to process text documents stored in said memory device;
  
  tagging, using the CPU, parts of speech in the text;
  
  recognizing, using the CPU, known words in the text;
  
  creating, using the CPU, a semantic network, the semantic network including at least one of the recognized known words and at least one relationship with at least one semantic concept associated with at least one of the recognized known words; and
  
  supplementing the semantic network by iteratively adding additional concepts and additional relationships to the semantic network until a termination requirement is met, each additional concept being associated with at least a prior one of the concepts and additional concepts in the semantic network by a respective additional relationship, at least one of the additional concepts being an episodic concept separate from the at least one semantic concept.
- Dependent Claims (10, 11, 12, 13, 14, 15, 16)
  - 10. The one or more non-transitory memory devices of claim 9, wherein the computer instructions are further configured to perform the following steps:
    - weighting each of the at least one relationships and each of the additional relationships with a weighting value reflecting the strength of each relationship and additional relationship;
      
      determining a minimum threshold weighting value; and
      
      terminating the iterative growth of any network node in which the weighting between the relationships and the additional relationships do not satisfy the minimum threshold weighting value.
  - 11. The one or more non-transitory memory devices of claim 10, wherein the computer instructions are further configured to perform the following steps:
    - comparing the at least one semantic concept and the additional concepts to a list of known relevant concepts to generate a list of identified relevant concepts; and
      
      providing an output based on at least one of a number and a significance of the identified relevant concepts.
  - 12. The one or more non-transitory memory devices of claim 11, wherein the output pertains to a probability of a particular occurrence.
  - 13. The one or more non-transitory memory devices of claim 12, wherein the text includes at least one suicide note and the particular occurrence is a suicide attempt.
  - 14. The one or more non-transitory memory devices of claim 10, wherein the text includes a plurality of documents and the computer instructions are further configured to perform the following steps:
    - receiving a query including a search concept; and
      
      displaying a list of documents including one or more of the plurality of documents that is associated with the at least one semantic concept and the additional concepts that matches the search concept.
  - 15. The one or more non-transitory memory devices of claim 14, wherein the list of documents is sorted by the weighting value pertaining to at least one relationship or additional relationship between the search concept and the corresponding recognized known word.
  - 16. The one or more non-transitory memory devices of claim 15, wherein the one or more episodic concepts are uniquely associated with a patient's prior clinical history.
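Claims 11, 14, and 15 describe comparing network concepts with a list of known relevant concepts, and ranking documents that match a search concept by relationship weight. A minimal Python sketch under stated assumptions: each document has already been reduced to a hypothetical `{concept: weight}` map, "significance" is taken to be the sum of the matched weights, and the function names and concept labels are illustrative only.

```python
def relevant_concept_report(concept_weights, known_relevant):
    """Compare network concepts with known relevant concepts (cf. claim 11).

    Returns (sorted matched concepts, number matched, significance), where
    significance is assumed here to be the sum of the matched weights.
    """
    matched = {c: w for c, w in concept_weights.items() if c in known_relevant}
    return sorted(matched), len(matched), sum(matched.values())


def search_documents(doc_concepts, search_concept):
    """List documents linked to search_concept, highest weight first (cf. claims 14-15).

    doc_concepts: {doc_id: {concept: relationship weight}} (hypothetical format).
    """
    hits = [(doc, concepts[search_concept])
            for doc, concepts in doc_concepts.items()
            if search_concept in concepts]
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits


if __name__ == "__main__":
    docs = {
        "note1": {"sem:asthma": 0.9, "sem:fever": 0.3},
        "note2": {"sem:asthma": 0.4},
        "note3": {"sem:fever": 0.8},
    }
    print(search_documents(docs, "sem:asthma"))  # [('note1', 0.9), ('note2', 0.4)]
```

A weighted sum is only one way to express "significance"; a maximum weight or a per-concept importance score would fit the claim language equally well.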

17. One or more non-transitory electronic memory devices including computer instructions for performing a method for processing natural language, comprising:
- using a central processing unit (CPU) connected via a network to a remote storage device to process text documents stored in said memory;
  
  identifying, using the CPU, one or more of a plurality of groups of characters of a text as corresponding to at least one of a plurality of known words;
  
  creating, using the CPU, a list of the identified known words;
  
  querying one or more databases to obtain a first set of semantic concepts associated with each of the identified known words, the one or more databases including associations between a plurality of known words and a plurality of concepts, and including quantitative values representative of a strength of a relationship between the plurality of concepts;
  
  annotating, using the CPU, the list of identified known words with the first set of semantic concepts associated with each identified known word;
  
  creating, using the CPU, a semantic network having a plurality of nodes corresponding to the first set of semantic concepts;
  
  iteratively expanding the semantic network with additional concepts taken from the one or more databases and linked to respective nodes in the semantic network to iteratively add new nodes to the semantic network for such additional concepts, each new node including a weighted link with an existing node, the additional concepts being separate from the first set of semantic concepts and including at least one episodic concept; and
  
  selecting, using the CPU, at least one of the concepts from the combination of the first set of concepts and the additional concepts based upon a value of the weighted link included with the node associated with the at least one selected concept.
- Dependent Claims (18, 19)
  - 18. The one or more non-transitory memory devices of claim 17, wherein the computer instructions are further configured to perform the step of repeating the iteratively expanding continuously until a termination requirement is met.
  - 19. The one or more non-transitory memory devices of claim 18, wherein the termination requirement is a value of a weighted link falling below a predefined threshold.
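Claims 17 to 19 describe growing the network iteratively, attaching each new concept to an existing node through a weighted link, stopping when a link value falls below a threshold, and selecting concepts by link value. The Python sketch below is one possible reading, not the patented procedure. It assumes association strengths lie in [0, 1] and that a new link's weight is the parent node's weight multiplied by the association strength, so weights decay with distance from the seed concepts; the association table, the `threshold` value, and the function names are hypothetical.

```python
from collections import deque


def expand_network(seeds, associations, threshold=0.3):
    """Iteratively add related concepts until every candidate link falls below threshold.

    associations: {concept: [(related concept, strength in [0, 1])]} (hypothetical store).
    Returns {node: (parent node or None, weight of the link that attached it)}.
    """
    nodes = {s: (None, 1.0) for s in seeds}
    queue = deque(seeds)
    while queue:
        current = queue.popleft()
        current_weight = nodes[current][1]
        for related, strength in associations.get(current, []):
            weight = current_weight * strength  # assumed decay along the path
            if weight < threshold:
                continue  # termination requirement for this branch (cf. claim 19)
            # Add a new node, or re-attach an existing one if this link is stronger.
            if related not in nodes or weight > nodes[related][1]:
                nodes[related] = (current, weight)
                queue.append(related)
    return nodes


def select_concepts(nodes, seeds, k=2):
    """Pick the k added concepts with the highest link weight (cf. claim 17)."""
    added = [(c, w) for c, (_, w) in nodes.items() if c not in seeds]
    return sorted(added, key=lambda cw: cw[1], reverse=True)[:k]


if __name__ == "__main__":
    associations = {
        "sem:asthma": [("sem:wheezing", 0.8), ("epi:prior_asthma_admission", 0.6)],
        "sem:wheezing": [("sem:bronchospasm", 0.5)],
        "epi:prior_asthma_admission": [("sem:albuterol", 0.7)],
    }
    seeds = ["sem:asthma"]
    nodes = expand_network(seeds, associations)
    print(select_concepts(nodes, seeds))
```

Because strengths never exceed 1, a cycle can never raise a node's weight, so the loop only re-queues a node when a strictly stronger link is found and therefore terminates.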

Current Assignee: Children's Hospital & Medical Center
Original Assignee: Children's Hospital & Medical Center
Inventors: Duch, Wlodzislaw; Grupp-Phelan, Jacqueline M.; Sorter, Michael; Pestian, John P.; Matykiewicz, Pawel; Glauser, Tracy A.; Kowatch, Robert A.
Primary Examiner(s): ORTIZ SANCHEZ, MICHAEL
Application Number: US14/553,562
Publication Number: US 20150081280A1
Time in Patent Office: 700 Days
US Class Current: 1/1

CPC Class Codes

G06F 16/3344   using natural language anal...

G06F 40/117   Tagging; Marking up details...

G06F 40/205   Parsing

G06F 40/232   Orthographic correction, e....

G06F 40/237   Lexical tools

G06F 40/253   Grammatical analysis; Style...

G06F 40/30   Semantic analysis

G06F 40/40   Processing or translation o...

G06Q 10/10   Office automation; Time man...

G16H 40/20   for the management or admin...

G16H 50/70   for mining of medical data,...

G16H 70/60   relating to pathologies
