Method and system for searching words in documents written in a source language as transcript of words in an origin language

US 10,042,843 B2
Filed: 06/10/2015
Issued: 08/07/2018
Est. Priority Date: 06/15/2014
Status: Active Grant

4 Assignments

0 Petitions

Abstract

The invention relates to a computer-implemented method for searching documents written in a source language for words that are not in the vocabulary of that language but are transcriptions of meaningful words in an origin language. The method comprises a preparation process and a search process. During the preparation process, a database of unrecognized words in the source language is maintained, which contains, among other data, a normalized phonetic conversion of each unrecognized word, as well as a corpus of all words of the documents in the search domain and indexes for efficient search. During search, the search word is phonetically converted and normalized, and its distance to phonetically similar words in the corpus is calculated. The words found in the corpus are arranged in ascending order of distance, and the relevant documents are displayed.
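The search process described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the record does not specify the phonetic conversion or the distance measure, so `phonetic_key` is a toy normalization over Latin-letter stand-ins and Levenshtein edit distance is swapped in as the distance. The names `phonetic_key`, `levenshtein`, `search`, `max_distance`, and the sample index are all hypothetical.

```python
# Assumed toy rules: merge a few sound-alike letters (not from the patent).
SOUND_ALIKE = str.maketrans({"c": "k", "q": "k", "z": "s", "y": "i"})
VOWELS = set("aeiou")


def phonetic_key(word: str) -> str:
    """Toy phonetic normalization: merge sound-alike letters, drop
    non-initial vowels, and collapse repeated letters."""
    w = word.lower().translate(SOUND_ALIKE).replace("ph", "f")
    key = []
    for i, ch in enumerate(w):
        if i > 0 and ch in VOWELS:
            continue  # keep a vowel only in first position
        if key and key[-1] == ch:
            continue  # collapse doubled letters
        key.append(ch)
    return "".join(key)


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (assumed metric)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def search(query: str, index: dict[str, set[str]], max_distance: int = 1):
    """Return (distance, term, document ids) tuples in ascending order."""
    qkey = phonetic_key(query)
    hits = []
    for term, doc_ids in index.items():
        d = levenshtein(qkey, phonetic_key(term))
        if d <= max_distance:
            hits.append((d, term, sorted(doc_ids)))
    return sorted(hits)


# Hypothetical inverted index: corpus term -> ids of documents containing it.
index = {
    "kardiologia": {"doc2"},
    "cardiology": {"doc1", "doc3"},
    "nephrology": {"doc4"},
}
for distance, term, docs in search("kardiology", index):
    print(distance, term, docs)
```

Comparing phonetic keys rather than raw spellings lets variant transcriptions of the same origin-language word land at distance zero, while the `max_distance` cut-off (an assumed parameter) limits how far the match is allowed to drift.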

8 Citations

4 Claims

1. Computer implemented method for searching words in documents written in a source language, words which are not meaningful in said source language, but are transcript of meaningful words in an origin language, the method is comprised of two processes:

  a) preparation process executed for each new document, the preparation process is comprised of the following steps;
    i) reading the document;
    ii) extracting unrecognized words in the source language;
    iii) updating search indexes in the corpus for all document words;
    iv) for each new unrecognized word in the source language;
      1) removing prefixes and suffixes;
      2) performing phonetic conversion;
      3) checking frequency of unrecognized word spelling in System Hebraized Medical Lexicon (SHML);
      4) defining the most frequent spelling of the unrecognized word as the central term and connect it to other allowable and close spellings of that term;
      5) updating System Hebraized Medical Lexicon (SHML);

  b) search process which is comprised of the following steps;
    i) reading the search request and perform auto-complete for terms from the System Hebraized Medical Lexicon (SHML);
    ii) generating phonetic conversion for all words in query;
    iii) for each word;
      1) searching for similar phonetics in the corpus and find central terms;
      2) calculating the distance to the found similar words and order them in ascending order; and
      3) displaying relevant documents according to the distance.

2. The method according to claim 1, where the source language is Hebrew, the origin language is either English or Latin.

3. The method according to claim 1, where the unknown words in the source language are medical terms in the origin language.

4. The method according to claim 1, where the words in the input query goes through an autocomplete procedure from the System Hebraized Medical Lexicon (SHML).
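Steps a) iv) 1 through 5 of claim 1 (affix removal, phonetic conversion, frequency check, central-term selection, lexicon update) might look roughly like the sketch below. It is an illustration under stated assumptions only: the claim does not enumerate the affixes or define the phonetic conversion, so the affix tuples, `strip_affixes`, `phonetic_key`, and the `Lexicon` class (a stand-in for the SHML) are hypothetical, and Latin-letter strings stand in for Hebrew spellings. Grouping spellings by an identical phonetic key is a simplification of "allowable and close spellings".

```python
from collections import Counter, defaultdict


def strip_affixes(word, prefixes=("ha", "ve"), suffixes=("im", "ot")):
    """Strip at most one assumed prefix and one assumed suffix,
    keeping a stem of at least four letters."""
    for p in prefixes:
        if word.startswith(p) and len(word) - len(p) >= 4:
            word = word[len(p):]
            break
    for s in suffixes:
        if word.endswith(s) and len(word) - len(s) >= 4:
            word = word[:-len(s)]
            break
    return word


def phonetic_key(word):
    """Toy phonetic conversion: merge sound-alike letters and drop
    non-initial vowels (assumed rules, not from the patent)."""
    w = word.lower().translate(str.maketrans({"c": "k", "q": "k", "y": "i"}))
    return w[:1] + "".join(ch for ch in w[1:] if ch not in "aeiou")


class Lexicon:
    """Hypothetical stand-in for the SHML: spelling counts per phonetic key."""

    def __init__(self):
        self.spellings = defaultdict(Counter)

    def add(self, word):
        """Record one occurrence of an unrecognized word."""
        stem = strip_affixes(word.lower())
        self.spellings[phonetic_key(stem)][stem] += 1

    def central_term(self, word):
        """Most frequent spelling sharing the word's phonetic key, or None."""
        counts = self.spellings.get(phonetic_key(strip_affixes(word.lower())))
        return counts.most_common(1)[0][0] if counts else None

    def variants(self, word):
        """Other recorded spellings linked to the same central term."""
        key = phonetic_key(strip_affixes(word.lower()))
        central = self.central_term(word)
        return sorted(s for s in self.spellings.get(key, {}) if s != central)


lex = Lexicon()
for w in ["kardiologia", "hakardiologia", "cardiologia", "kardiologia"]:
    lex.add(w)
print(lex.central_term("cardiologia"), lex.variants("cardiologia"))
```

Keeping a per-key `Counter` means the central term can change as new documents arrive, which matches the claim's wording that the lexicon is updated for each new unrecognized word.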

Current Assignee: Milagro-Ai Care Limited
Original Assignee: Opisoft Care Ltd.
Inventors: Alter, Alon; Tozhovez, Oksana
Primary Examiner(s): ALBERTALLI, BRIAN LOUIS
Application Number: US15/317,767
Publication Number: US 20170116175A1
Time in Patent Office: 1,154 Days

CPC Class Codes

G06F 16/31   Indexing; Data structures t...

G06F 16/316   Indexing structures

G06F 16/3343   using phonetics

G06F 16/338   Presentation of query results

G06F 16/93   Document management systems

G06F 40/129   Handling non-Latin characte...

G06F 40/242   Dictionaries

G06F 40/268   Morphological analysis

G06F 40/274   Converting codes to words; ...

G06F 40/284   Lexical analysis, e.g. toke...

G06F 40/40   Processing or translation o...
