Document processing device, document processing method, and storage medium recording program therefor
Abstract
The invention provides a document processing device including: a memory that stores syntax data expressing syntax of character strings whose probability of being a title of a document is high or character strings whose probability of being a title of a document is low; an input unit that inputs document data obtained by digitizing a document; an extraction unit that analyzes the input document data and extracts character string data expressing character strings; a syntax analyzing unit that analyzes the extracted character string data and specifies the syntax of each character string contained in the document corresponding to the document data; and a specifying unit that specifies, from among the extracted character string data, character string data expressing a title of the document corresponding to the document data, based on results of specification by the syntax analyzing unit and content stored in the memory.
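The flow described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: the syntax labels, the regex-based stand-in for syntax analysis, and every name below (`SYNTAX_DATA`, `extract_strings`, `analyze_syntax`, `specify_title`) are assumptions made only for illustration. A real device would presumably use proper linguistic parsing rather than punctuation checks.

```python
import re
from typing import List, Optional

# Hypothetical "syntax data" held in memory: each syntax label is marked as
# having a high or low probability of being a title. Labels are illustrative.
SYNTAX_DATA = {
    "noun_phrase": "high",  # e.g. "Quarterly Sales Report"
    "sentence": "low",      # e.g. "Please find the figures below."
    "date": "low",          # e.g. "2006-03-31"
}


def extract_strings(document_data: str) -> List[str]:
    """Extraction step: split digitized text into non-empty character strings."""
    return [line.strip() for line in document_data.splitlines() if line.strip()]


def analyze_syntax(text: str) -> str:
    """Syntax analysis step: a crude regex stand-in for real parsing (assumption)."""
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", text):
        return "date"
    if text.endswith((".", "?", "!")):
        return "sentence"
    return "noun_phrase"


def specify_title(document_data: str) -> Optional[str]:
    """Specifying step: return the first string whose syntax is rated 'high'."""
    for text in extract_strings(document_data):
        if SYNTAX_DATA.get(analyze_syntax(text)) == "high":
            return text
    return None


if __name__ == "__main__":
    sample = "2006-03-31\nQuarterly Sales Report\nPlease find the figures below."
    print(specify_title(sample))
```

In this sketch the stored syntax data is consulted only after each string has been labeled, mirroring the separation between the syntax analyzing unit and the specifying unit.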
20 Citations
11 Claims
1. A document processing device comprising:
a memory that stores syntax data which expresses syntax of character strings whose probability of being a title of a document is high or character strings whose probability of being a title of a document is low;
an input unit that inputs document data obtained by digitizing a document;
an extraction unit that analyzes document data input by the input unit and extracts character string data which expresses character strings;
a syntax analyzing unit that analyzes the character string data extracted by the extraction unit and specifies the syntax of each character string contained in the document corresponding to the document data; and
a specifying unit that specifies, from among the character string data extracted by the extraction unit, character string data that expresses a title of the document corresponding to the document data, based on results of specification by the syntax analyzing unit and content stored in the memory.
6. A document processing method comprising:
storing, in a memory, syntax data which expresses syntax of character strings whose probability of being a title of a document is high or character strings whose probability of being a title of a document is low;
inputting document data obtained by digitizing a document;
extracting character string data which expresses character strings by analyzing the input document data;
specifying a syntax of each character string contained in the document corresponding to the document data by analyzing the extracted character string data; and
specifying, from among the extracted character string data, character string data that expresses a title of the document corresponding to the document data, based on a result of the specification and content stored in the memory.
11. A computer-readable storage medium recording a program for causing a computer to function as:
extraction means that, when document data obtained by digitizing a document is input, analyzes the document data and extracts character string data expressing character strings;
syntax analysis means for analyzing the character string data extracted by the extraction means and specifying the syntax of each character string contained in the document corresponding to the document data; and
specifying means for specifying, from among the character string data extracted by the extraction means, character string data that expresses a title of the document corresponding to the document data, based on results of specification by the syntax analysis means and syntax data stored in advance in the computer as data expressing the syntax of character strings whose probability of being a title of a document is high or character strings whose probability of being a title of a document is low.
Specification