Method and apparatus for configurable microplanning
Abstract
Methods, apparatuses, and computer program products are described herein that are configured to be embodied as a configurable microplanner. In some example embodiments, a method is provided that comprises accessing a document plan containing one or more messages. The method of this embodiment may also include generating a text specification containing one or more phrase specifications that correspond to the one or more messages in the document plan. The method of this embodiment may also include applying a set of lexicalization rules to each of the one or more messages to populate the one or more phrase specifications. In some example embodiments, the set of lexicalization rules is specified using a microplanning rule specification language that is configured to hide linguistic complexities from a user. In some example embodiments, genre parameters may also be used to specify constraints that provide default behaviors for the realization process.
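The flow described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `Message` and `PhraseSpec` classes, the `HeartRateSpike` message kind, the rule table, and the `tense` genre parameter are all hypothetical names chosen for illustration. The sketch only shows a document plan being turned into a text specification, with one phrase specification populated per message and genre parameters supplying defaults.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    """One message in a document plan (hypothetical shape)."""
    kind: str
    slots: dict


@dataclass
class PhraseSpec:
    """A phrase specification: constituents plus features for a realizer."""
    constituents: dict = field(default_factory=dict)
    features: dict = field(default_factory=dict)


# Hypothetical genre parameters that supply default realization behavior.
GENRE_DEFAULTS = {"tense": "past"}

# Hypothetical lexicalization rules, keyed by message kind.
LEXICALIZATION_RULES = {
    "HeartRateSpike": lambda m: {
        "subject": "heart rate",
        "verb": "rise",
        "object": f"to {m.slots['peak']} bpm",
    },
}


def microplan(document_plan, rules=LEXICALIZATION_RULES, genre=GENRE_DEFAULTS):
    """Return a text specification: one populated phrase spec per message."""
    text_spec = []
    for message in document_plan:
        rule = rules[message.kind]
        text_spec.append(
            PhraseSpec(constituents=rule(message), features=dict(genre))
        )
    return text_spec


plan = [Message("HeartRateSpike", {"peak": 118})]
spec = microplan(plan)
print(spec[0].constituents, spec[0].features)
```

Keeping the genre defaults separate from the rules means a rule author only states what is specific to a message, while shared behavior such as tense is set once.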
22 Claims
1. A computer-implemented method for transforming an input data stream comprising raw input data that is expressed in a non-linguistic format into a format that can be expressed linguistically in a textual output, the method comprising:
receiving, using a processor, one or more rules written by a user via a user interface, wherein the user interface enables the user to define the one or more rules in at least a first format that is configured to hide linguistic complexities from the user;
converting, using the processor, the one or more rules in a first format into a set of lexicalization rules in a second format that is configured to be processed by a natural language generation system, wherein the set of lexicalization rules are specified using a microplanning rule specification language that comprises a set of message-level rules and a set of slot-level rules, wherein the microplanning rule specification language is configured to fill syntactic constituents using one or more message-level rules, and wherein message-level rules specify how an overall form of a phrase is to be constructed from message contents and slot-level rules specify how specific entities present in a message should be described or otherwise referred to;
generating, using the processor, at least one data structure based on the input data stream comprising raw input data that is expressed in a non-linguistic format, wherein the at least one data structure represents a phrase or a simple sentence and is created in an instance in which the input data stream comprises data that satisfies one or more predetermined requirements, wherein the one or more predetermined requirements are selected based in part on a domain associated with the textual output;
applying, using the natural language generation system operating on the processor, the set of lexicalization rules to the at least one data structure to generate a text specification, wherein the at least one data structure is generated based on the input data stream;
realizing, using the processor, the text specification to generate the textual output that linguistically describes at least a portion of the input data stream; and
outputting, using the processor, the textual output to the user interface.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9.
10. An apparatus that is configured to transform an input data stream comprising raw input data that is expressed in a non-linguistic format into a format that can be expressed linguistically in a textual output, the apparatus comprising:
at least one processor; and
at least one memory including computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus to at least:
receive one or more rules written by a user via a user interface, wherein the user interface enables the user to define the one or more rules in at least a first format that is configured to hide linguistic complexities from the user;
convert the one or more rules in a first format into a set of lexicalization rules in a second format that is configured to be processed by a natural language generation system, wherein the set of lexicalization rules are specified using a microplanning rule specification language that comprises a set of message-level rules and a set of slot-level rules, wherein the microplanning rule specification language is configured to fill syntactic constituents using one or more message-level rules, and wherein message-level rules specify how an overall form of a phrase is to be constructed from message contents and slot-level rules specify how specific entities present in a message should be described or otherwise referred to;
generate at least one data structure based on the input data stream comprising raw input data that is expressed in a non-linguistic format, wherein the at least one data structure represents a phrase or a simple sentence and is created in an instance in which the input data stream comprises data that satisfies one or more predetermined requirements, wherein the one or more predetermined requirements are selected based in part on a domain associated with the textual output;
apply the set of lexicalization rules to the at least one data structure to generate a text specification in the natural language generation system, wherein the at least one data structure is generated based on the input data stream;
realize the text specification to generate the textual output that linguistically describes at least a portion of the input data stream; and
output the textual output to the user interface.
Dependent claims: 11, 12, 13, 14, 15, 16, 17.
18. A computer program product that is configured to transform an input data stream comprising raw input data that is expressed in a non-linguistic format into a format that can be expressed linguistically in a textual output, the computer program product comprising:
at least one computer readable non-transitory memory medium having program code instructions stored thereon, the program code instructions which when executed by an apparatus cause the apparatus at least to:
receive one or more rules written by a user via a user interface, wherein the user interface enables the user to define the one or more rules in at least a first format that is configured to hide linguistic complexities from the user;
convert the one or more rules in a first format into a set of lexicalization rules in a second format that is configured to be processed by a natural language generation system, wherein the set of lexicalization rules are specified using a microplanning rule specification language that comprises a set of message-level rules and a set of slot-level rules, wherein the microplanning rule specification language is configured to fill syntactic constituents using one or more message-level rules, and wherein message-level rules specify how an overall form of a phrase is to be constructed from message contents and slot-level rules specify how specific entities present in a message should be described or otherwise referred to;
generate at least one data structure based on the input data stream comprising raw input data that is expressed in a non-linguistic format, wherein the at least one data structure represents a phrase or a simple sentence and is created in an instance in which the input data stream comprises data that satisfies one or more predetermined requirements, wherein the one or more predetermined requirements are selected based in part on a domain associated with the textual output;
apply the set of lexicalization rules to the at least one data structure to generate a text specification in the natural language generation system, wherein the at least one data structure is generated based on the input data stream;
realize the text specification to generate the textual output that linguistically describes at least a portion of the input data stream; and
output the textual output to the user interface.
Dependent claims: 19, 20, 21, 22.
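The steps recited in the independent claims can be sketched end to end as follows. This is an illustration under stated assumptions, not the claimed system: the `key=value` first format, the `TempRise` message kind, the temperature threshold, and the toy realizer are all hypothetical. The sketch shows user-written rules being converted to a second format, a message being created only when the raw data satisfies a requirement, a message-level rule choosing the overall phrase form, and slot-level rules wording the individual entities.

```python
# First format (assumed): plain strings a non-linguist might write.
USER_RULES = {
    "message": {"TempRise": "subject=sensor; verb=rise; complement=peak"},
    "slot": {"sensor": "the {value} temperature", "peak": "to {value} degrees"},
}


def convert_rules(user_rules):
    """Convert user-written rules into the assumed second format."""
    message_rules = {
        kind: dict(part.strip().split("=") for part in text.split(";"))
        for kind, text in user_rules["message"].items()
    }
    slot_rules = {
        name: (lambda value, t=template: t.format(value=value))
        for name, template in user_rules["slot"].items()
    }
    return message_rules, slot_rules


def build_messages(readings, sensor, threshold=30.0):
    """Create a message only when the data meets a hypothetical requirement."""
    peak = max(readings)
    if peak < threshold:
        return []
    return [{"kind": "TempRise", "slots": {"sensor": sensor, "peak": peak}}]


def apply_rules(message, message_rules, slot_rules):
    """Message-level rule sets the phrase form; slot-level rules word entities."""
    phrase = {}
    for constituent, source in message_rules[message["kind"]].items():
        if source in message["slots"]:
            phrase[constituent] = slot_rules[source](message["slots"][source])
        else:
            phrase[constituent] = source  # literal lemma, e.g. the verb
    return phrase


def realize(phrase):
    """Toy realizer: past tense from a tiny lookup, then linearize."""
    past = {"rise": "rose"}
    verb = past.get(phrase["verb"], phrase["verb"])
    sentence = " ".join([phrase["subject"], verb, phrase["complement"]])
    return sentence[0].upper() + sentence[1:] + "."


message_rules, slot_rules = convert_rules(USER_RULES)
messages = build_messages([21.5, 24.0, 33.5], sensor="inlet")
text = " ".join(
    realize(apply_rules(m, message_rules, slot_rules)) for m in messages
)
print(text)
```

In this sketch the rule author never writes tense or word-order logic; those stay in the realizer, which is one way to read the requirement that the first format hide linguistic complexities from the user.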
Specification