System and method of prediction through the use of latent semantic indexing

  • US 10,224,119 B1
  • Filed: 09/23/2014
  • Issued: 03/05/2019
  • Est. Priority Date: 11/25/2013
  • Status: Active Grant
First Claim

1. An optimizing process for optimizing a predictive modeling method implemented on a computer for predicting patient outcomes and conditions from medical database records of a population of patients, said predictive modeling method comprising the steps of:

  • (a) providing medical database records of a population of patients, each patient of said population of patients having corresponding outcome health score values;

    (b) processing said medical database records by using Natural Language Processing;

    (c) building a patient document corpus from said medical database records processed by using Natural Language Processing;

    (d) weighting terms in said corpus by assigning a weight to each term in the corpus, said weight representing said term's frequency for a patient's document with respect to said term's frequency across all documents in said corpus;

    (e) Constructing a high-dimensional and sparse term-by-document matrix from said weighted terms, each said term in said corpus being represented as a vector across said population of patients;

    (f) performing matrix factorization of said term-by-document matrix using Latent Semantic Indexing to reduce the dimensionality of said term-by-document matrix into a lower-dimensional matrix concept space;

    (g) querying said lower-dimensional matrix concept space using a term or combination of terms to produce a single ranking of patients in said corpus using a similarity score;

    (h) given a single threshold of said similarity score, combining multiple said single rankings to re-rank said population of patients in said corpus based on relatedness to multiple queries;

    (i) optimizing said predictive modeling method through iterative variation of certain parameters to achieve a best precision fit, said certain parameters comprising;

    (1) the number of patients used for each said query of said multiple queries;

    (2) said threshold of said similarity score;

    (3) a frequency of association to query of said patients of said corpus;

    (4) a recall value of said patients returned by said query; and

    (5) a precision value of said patients returned by said query; and

    transmitting data associated with optimized re-ranked population of patients to one or more practitioners.
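
Steps (d) through (h) of the claim can be illustrated with a minimal sketch. It is not the patented implementation: the toy corpus, the patient IDs, the choice of k = 2, the log-based TF-IDF formula, the query fold-in, and the helper names `rank_patients` and `combine_rankings` are all assumptions made for illustration, and the rule used to combine rankings is only one possible reading of step (h). NumPy is assumed to be installed.

```python
import numpy as np

# Hypothetical toy corpus: one short "document" per patient (step c).
docs = {
    "p1": "chest pain shortness breath chest",
    "p2": "chest pain nausea",
    "p3": "knee pain swelling",
    "p4": "knee swelling fracture",
}
patients = list(docs)
vocab = sorted({t for text in docs.values() for t in text.split()})

# Steps (d)-(e): TF-IDF weighted term-by-document matrix
# (rows are terms, columns are patients). The weighting formula is an assumption.
tf = np.array([[docs[p].split().count(t) for p in patients] for t in vocab],
              dtype=float)
idf = np.log(len(patients) / (tf > 0).sum(axis=1))
A = tf * idf[:, None]

# Step (f): truncated SVD keeps the k strongest "concepts".
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, s_k, V_k = U[:, :k], s[:k], Vt[:k].T  # rows of V_k: patients in concept space


def rank_patients(query_terms):
    """Step (g): rank patients by cosine similarity to a query in concept space."""
    q = np.array([float(t in query_terms) for t in vocab]) * idf
    q_k = (q @ U_k) / s_k  # fold the query into the concept space
    norms = np.linalg.norm(V_k, axis=1) * np.linalg.norm(q_k) + 1e-12
    sims = (V_k @ q_k) / norms
    order = np.argsort(-sims)
    return [(patients[i], float(sims[i])) for i in order]


def combine_rankings(queries, threshold):
    """Step (h), one possible reading: count the queries for which a patient
    clears the threshold, breaking ties by mean similarity."""
    hits = {p: 0 for p in patients}
    total = {p: 0.0 for p in patients}
    for terms in queries:
        for p, sim in rank_patients(terms):
            hits[p] += int(sim >= threshold)
            total[p] += sim
    rows = [(p, hits[p], total[p] / len(queries)) for p in patients]
    return sorted(rows, key=lambda r: (-r[1], -r[2]))


single = rank_patients({"chest"})
combined = combine_rankings([{"chest"}, {"chest", "nausea"}], threshold=0.5)
```

In this sketch the per-patient hit count plays the role of the "frequency of association to query" in parameter (3). Step (i) would then vary values such as `threshold` and the number of patients kept per query, scoring each setting by precision and recall against the known outcome health scores.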
