Natural language processing for extracting conveyance graphs
Abstract
Provided is a process for extracting conveyance records from unstructured text documents, the process including: obtaining, with one or more processors, a plurality of documents describing, in unstructured form, one or more conveyances of interest in real property; determining, with one or more processors, for each of the documents, a respective jurisdiction; selecting, with one or more processors, from a plurality of language processing models for the English language, a respective language processing model for each of the documents based on the respective determined jurisdiction; extracting, with one or more processors, for each of the documents, a plurality of structured conveyance records from each of the plurality of documents by applying the language processing model selected for the respective document based on the jurisdiction associated with the document; and storing, with one or more processors, the extracted, structured conveyance record in memory.
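The steps summarized in the abstract (obtain documents, determine jurisdiction, select a jurisdiction-specific model, extract structured records, store them) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the jurisdiction codes "TX" and "LA", the sample deed phrasings, and the use of regular expressions as stand-ins for language processing models are hypothetical and are not taken from the patent, which does not specify how a model is implemented.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ConveyanceRecord:
    """Structured record: who conveyed what interest in which land to whom."""
    grantor: str
    grantee: str
    interest: str
    land: str


@dataclass
class Document:
    text: str                 # OCR-produced, unstructured English text
    metadata: Dict[str, str]  # assumed to carry a "jurisdiction" key


Model = Callable[[str], List[ConveyanceRecord]]


def make_regex_model(conveyance_phrase: str) -> Model:
    """Build a toy 'model' keyed to one jurisdiction's conveyance wording.

    A regex is only a stand-in for a real language processing model.
    """
    pattern = re.compile(
        r"(?P<grantor>[A-Z][A-Za-z ]+?) " + conveyance_phrase +
        r" (?P<grantee>[A-Z][A-Za-z ]+?) "
        r"(?P<interest>an? [a-z\- ]+? interest) in (?P<land>[^.;]+)"
    )

    def model(text: str) -> List[ConveyanceRecord]:
        return [ConveyanceRecord(**m.groupdict()) for m in pattern.finditer(text)]

    return model


# Hypothetical registry: one model per jurisdiction, each tuned to local wording.
MODELS: Dict[str, Model] = {
    "TX": make_regex_model("does hereby grant and convey unto"),
    "LA": make_regex_model("sells and delivers to"),
}


def select_model(jurisdiction: str) -> Model:
    """Select the language processing model for a determined jurisdiction."""
    try:
        return MODELS[jurisdiction]
    except KeyError:
        raise ValueError(f"no model registered for jurisdiction {jurisdiction!r}")


def extract_all(documents: List[Document]) -> List[ConveyanceRecord]:
    """Determine jurisdiction, select a model, extract, and store in memory."""
    store: List[ConveyanceRecord] = []
    for doc in documents:
        jurisdiction = doc.metadata["jurisdiction"]  # determined from metadata
        model = select_model(jurisdiction)
        store.extend(model(doc.text))
    return store
```

In this sketch, a document tagged "TX" is processed only by the model registered for "TX", so wording that is common in one jurisdiction does not have to be handled by the model for another.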
30 Claims
1. A method for extracting conveyance records from unstructured text documents, the method comprising:
obtaining, with one or more processors, a plurality of scanned, optical-character-recognized (OCR) documents, each having OCR-produced English language text describing, in unstructured form, one or more conveyances of interest in real property, wherein each document is associated with metadata identifying a jurisdiction in which the respective real property is located;
determining, with one or more processors, for each of the documents, a respective jurisdiction based on the metadata;
selecting, with one or more processors, from a plurality of language processing models for the English language, a respective language processing model for each of the documents based on the respective determined jurisdiction, wherein a first language processing model is selected for at least some of the documents associated with a first jurisdiction and a second language processing model, different from the first language processing model, is selected for at least some of the documents associated with a second jurisdiction that is different from the first jurisdiction, and wherein each language processing model is configured to extract structured data from unstructured text, and wherein each language processing model is configured to detect different terminology used in different jurisdictions with different frequencies;
extracting, with one or more processors, for each of the documents, from the respective OCR-produced English language text describing, in unstructured form, one or more conveyances of interest in real property, a plurality of structured conveyance records from each of the plurality of documents by applying the language processing model selected for the respective document based on the jurisdiction associated with the document, wherein each extracted conveyance record identifies land in which an interest is conveyed by the respective document, identifies a grantor of the conveyance, identifies a grantee of the conveyance, and identifies the interest conveyed; and
storing, with one or more processors, the extracted, structured conveyance records in memory.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19
20. A system configured to extract conveyance records from unstructured text documents, the system comprising:
one or more computer processors; and
storage media, storing machine-readable instructions that, when executed by at least some of the one or more processors, cause operations comprising:
obtaining a plurality of scanned, optical-character-recognized (OCR) documents, each having OCR-produced English language text describing, in unstructured form, one or more conveyances of interest in real property, wherein each document is associated with metadata identifying a jurisdiction in which the respective real property is located;
determining, for each of the documents, a respective jurisdiction based on the metadata;
selecting from a plurality of language processing models for the English language, a respective language processing model for each of the documents based on the respective determined jurisdiction, wherein a first language processing model is selected for at least some of the documents associated with a first jurisdiction and a second language processing model, different from the first language processing model, is selected for at least some of the documents associated with a second jurisdiction that is different from the first jurisdiction, and wherein each language processing model is configured to extract structured data from unstructured text, and wherein each language processing model is configured to detect different terminology used in different jurisdictions with different frequencies;
extracting, for each of the documents, from the respective OCR-produced English language text describing, in unstructured form, one or more conveyances of interest in real property, a plurality of structured conveyance records from each of the plurality of documents by applying the language processing model selected for the respective document based on the jurisdiction associated with the document, wherein each extracted conveyance record identifies a plot of land in which an interest is conveyed by the respective document, identifies a grantor of the conveyance, identifies a grantee of the conveyance, and identifies the interest conveyed; and
storing the extracted structured conveyance records in memory.
Dependent claims: 21, 22, 23, 24, 25, 26, 27, 28, 29, 30
Specification