ONLINE LEARNING OF MODEL PARAMETERS

Abstract
Online learning of model parameters is performed by obtaining a first target value in a target sequence and a feature vector corresponding to the first target value. The feature vector includes a plurality of elements. The feature vector can be modified to obtain a modified feature vector by reducing an absolute value of at least one element of the feature vector. An inverse Hessian matrix can be generated recursively from a previous inverse Hessian matrix using at least the feature vector and the modified feature vector. Parameters of a model can be updated using the inverse Hessian matrix.
20 Claims
1. A computer-implemented method comprising:
obtaining a first target value in a target sequence and a feature vector corresponding to the first target value, the feature vector including a plurality of elements;
modifying the feature vector to obtain a modified feature vector by reducing an absolute value of at least one element of the feature vector;
generating an inverse Hessian matrix recursively from a previous inverse Hessian matrix using at least the feature vector and the modified feature vector; and
updating parameters of a model using the inverse Hessian matrix.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
 11. A computer program product including one or more computer readable storage mediums collectively storing program instructions that are executable by a processor or programmable circuitry to cause the processor or programmable circuitry to perform operations comprising:
obtaining a first target value in a target sequence and a feature vector corresponding to the first target value, the feature vector including a plurality of elements;
modifying the feature vector to obtain a modified feature vector by reducing an absolute value of at least one element of the feature vector;
generating an inverse Hessian matrix recursively from a previous inverse Hessian matrix using at least the feature vector and the modified feature vector; and
updating parameters of a model using the inverse Hessian matrix.
Dependent claims: 12, 13, 14, 15
 16. An apparatus comprising:
a processor or a programmable circuitry; and
one or more computer readable mediums collectively including instructions that, when executed by the processor or the programmable circuitry, cause the processor or the programmable circuitry to:
obtain a first target value in a target sequence and a feature vector corresponding to the first target value, the feature vector including a plurality of elements;
modify the feature vector to obtain a modified feature vector by reducing an absolute value of at least one element of the feature vector;
generate an inverse Hessian matrix recursively from a previous inverse Hessian matrix using at least the feature vector and the modified feature vector; and
update parameters of a model using the inverse Hessian matrix.
Dependent claims: 17, 18, 19, 20
Specification
The present invention relates to online learning of model parameters. More specifically, the present invention relates to an improvement of incremental learning of model parameters.
Learning a data or pattern sequence (e.g., time-series data or a numerical sequence) is frequently used for forecasting and anomaly detection in a variety of fields (e.g., predicting a stock price, finding a potential problem of a vehicle). Such pattern sequences are usually nonstationary, and thus it is necessary to adopt online learning that continuously updates parameters of a prediction model while receiving new patterns. The accuracy of a prediction model can be improved if the prediction model is trained using the available historical data of the pattern sequence every time a new pattern is observed. However, retraining on the available historical data every time a new pattern is observed can be prohibitively expensive in practical applications, even with the best currently available learning techniques. Therefore, it is desired to improve the accuracy of incremental learning of a prediction model without repeating the training processes for each pattern in the available historical data.
According to an embodiment of the present invention, a computer-implemented method is provided that includes obtaining a first target value in a target sequence and a feature vector corresponding to the first target value, the feature vector including a plurality of elements, modifying the feature vector to obtain a modified feature vector by reducing an absolute value of at least one element of the feature vector, generating an inverse Hessian matrix recursively from a previous inverse Hessian matrix using at least the feature vector and the modified feature vector, and updating parameters of a model using the inverse Hessian matrix.
According to another embodiment of the present invention, a computer program product is provided that includes one or more computer readable storage mediums collectively storing program instructions that are executable by a processor or programmable circuitry to cause the processor or programmable circuitry to perform operations including obtaining a first target value in a target sequence and a feature vector corresponding to the first target value, the feature vector including a plurality of elements, modifying the feature vector to obtain a modified feature vector by reducing an absolute value of at least one element of the feature vector, generating an inverse Hessian matrix recursively from a previous inverse Hessian matrix using at least the feature vector and the modified feature vector, and updating parameters of a model using the inverse Hessian matrix.
According to another embodiment of the present invention, an apparatus is provided that includes a processor or a programmable circuitry, and one or more computer readable mediums collectively including instructions that, when executed by the processor or the programmable circuitry, cause the processor or the programmable circuitry to obtain a first target value in a target sequence and a feature vector corresponding to the first target value, the feature vector including a plurality of elements, modify the feature vector to obtain a modified feature vector by reducing an absolute value of at least one element of the feature vector, generate an inverse Hessian matrix recursively from a previous inverse Hessian matrix using at least the feature vector and the modified feature vector, and update parameters of a model using the inverse Hessian matrix.
The summary clause does not necessarily describe all necessary features of the embodiments of the present invention. The present invention can also be a subcombination of the features described above.
The following description will provide details of preferred embodiments with reference to the following figures wherein:
Machine learning has become a basic function of many computer systems, such that machine learning and neural network optimized processors have been developed. However, both general purpose and machine-learning-specific computer systems require extensive training in order to be effective. Since the initial training provided to a machine learning system cannot take all potential situations or anomalies into consideration, the training is often ongoing even during deployment of the model. Consequently, the machine learning system requires continuous updating (e.g., training) of the model parameters as new data is made available. Conventionally, such updating can be time and resource consuming for the computer system, and thus can become impractical once the model is deployed. The present invention provides computer-implemented methods, systems and program products that improve machine learning functionality in computer systems by implementing methods and systems for incrementally updating model parameters based on previous training, thus reducing time and processor resource requirements for maintaining up-to-date model parameters reflecting current data.
Hereinafter, example embodiments of the present invention will be described. The example embodiments shall not limit the invention according to the claims, and the combinations of the features described in the embodiments are not necessarily essential to the invention.
In this embodiment, the target sequence relates or is expected to relate to an input sequence x_{0}, x_{1}, . . . , x_{t}. In this case, apparatus 100 learns the relationship between the input sequence and the target sequence. Each input pattern (or input values) x_{t }of the input sequence can be represented as a vector including N values (e.g., vector x_{t}=(x_{0}^{[t]}, x_{1}^{[t]}, . . . , x_{N}^{[t]})).
In this embodiment, apparatus 100 can obtain input pattern x_{t }and target value y_{t }at each time t, and can predict future (or succeeding) target value y_{t+1 }by calculating predicted target value ŷ_{t+1 }based on input patterns, model parameters, and other internal data received or generated before time t+1. Therefore, as shown in
In other embodiments, the target sequence may not relate or is not expected to relate to any input sequences. In such embodiments, apparatus 100 may not receive any input sequences for predicting future target value ŷ.
In this embodiment, the prediction model can be a linear model using a feature vector ϕ_{t}. In the linear model, a predicted target value ŷ_{t+1 }can be calculated based on a weighted sum of elements of the feature vector for the predicted target value (e.g., feature vector ϕ_{t }for time t corresponding to the time t+1 under prediction). For example, apparatus 100 can predict a target value at time t+1 by using an inner product of parameter vector at time t (e.g., θ_{t}) and a feature vector at time t (e.g., ϕ_{t}) as shown in the following expression (1).
ŷ_{t+1}=θ_{t}^{T }ϕ_{t} (1)
Feature vector ϕ_{t }can include a plurality of features (ϕ_{0}^{[t]}, ϕ_{1}^{[t]}, . . . , ϕ_{K−1}^{[t]}) as vector elements. Each feature can be a function of at least one input pattern at time t or before time t, a function of time t, a function of the at least one input pattern and time t, or a constant. The following expression (2) shows an example of a feature vector ϕ_{t}.
In expression (2), the first element, ϕ_{0}^{[t]}, is a constant, 1, the second and third elements are functions of an input pattern, and the fourth and fifth elements are functions of two or more input patterns. Feature vector ϕ_{t }can also include a function of time t or a function of at least one input pattern and time t.
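As an illustration of the linear model of expression (1), the following minimal Python sketch builds a five-element feature vector and computes the predicted target value as an inner product. Expression (2) is not reproduced in this text, so the particular feature functions, the helper names `feature_vector` and `predict`, and the numeric values are hypothetical; they are chosen only to match the pattern described above (a constant, two functions of one input pattern, and two functions of two input patterns).

```python
from typing import List, Sequence


def feature_vector(x_t: Sequence[float], x_prev: Sequence[float]) -> List[float]:
    """Hypothetical feature map for illustration only (not expression (2))."""
    return [
        1.0,                 # constant feature (intercept component)
        x_t[0],              # function of one input pattern
        x_t[0] ** 2,         # function of one input pattern
        x_t[0] * x_prev[0],  # function of two input patterns
        x_t[1] * x_prev[1],  # function of two input patterns
    ]


def predict(theta: Sequence[float], phi: Sequence[float]) -> float:
    """Expression (1): y_hat_{t+1} = theta_t^T phi_t."""
    return sum(w * f for w, f in zip(theta, phi))


# Assumed example values: current pattern x_t and previous pattern x_{t-1}.
phi_t = feature_vector([2.0, 3.0], [1.0, 4.0])
theta_t = [0.5, 1.0, 0.0, -1.0, 0.25]
y_hat = predict(theta_t, phi_t)
```

Because the prediction is a plain weighted sum, adding further feature functions only lengthens both vectors and leaves the prediction step unchanged.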
Although model parameters are updated in online training in this embodiment, the model parameters are constant values of the model at each time t. In this embodiment, a constant element is multiplied by a constant model parameter and becomes a constant term or a portion of the constant term of the model function shown in expression (1). The constant term of the model function is also referred to as an intercept of the model, and an element of the feature vector that is a constant is also referred to as an intercept component of the model.
In this embodiment, these functions of the feature vector can be predetermined or fixed in advance before starting the online learning. The provider or the user of apparatus 100 can define the mathematical form of feature vector ϕ_{t }that is expected to achieve higher accuracy in predicting the target sequence depending on the practical application of apparatus 100. Since feature vector ϕ_{t }can be determined before receiving the target sequence, and the target sequence can be predicted by combining the feature functions ϕ_{0}(t, x_{0}, x_{1}, . . . , x_{t}), ϕ_{1}(t, x_{0}, x_{1}, . . . , x_{t}), . . . , ϕ_{K−1}(t, x_{0}, x_{1}, . . . , x_{t}), it can be preferred to include a variety of feature functions in the feature vector ϕ_{t}.
An optimal goal for training a prediction model of apparatus 100 is to minimize a weighted mean squared error between the target sequence and the predicted target sequence as shown in the following expression (3). In expression (3), γ is a forgetting factor, and θ_{t+1 }is a parameter vector including trainable model parameters of the prediction model at time t+1.
Using expression (1), expression (3) can be transformed into the following expression (4).
The formula in the square bracket in expression (4) is minimized when the partial derivative of the formula with respect to every element of parameter vector θ equals 0, as shown in the following expression (5).
Expression (5) can be transformed into the following expression (6).
Expression (6) can be further transformed into the following expression (7).
The matrix generated from a direct product of feature vector and feature vector on the left side of expression (7) can be regarded as a Hessian matrix H_{t+1}. The right side of expression (7) is based on a product of the target value and the feature vector at each time. By replacing the right side of expression (7) with a first vector h_{t+1}, expression (7) can be transformed into expression (8).
H_{t+1}θ=h_{t+1} (8)
Therefore, the parameter vector shown in expression (4) can be optimally calculated by expression (9), where H_{t+1}^{−1 }is an inverse Hessian matrix.
θ_{t+1}=H_{t+1}^{−1 }h_{t+1} (9)
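Expressions (3) through (7) are not reproduced in this text. As a hedged reconstruction, assuming that expression (3) is the usual exponentially weighted least-squares objective with forgetting factor γ (any normalizing constant is omitted because it does not change the minimizer), the chain from the objective to expressions (8) and (9) can be sketched as follows. The summation limits are an assumption chosen to agree with the recursions of expressions (11) and (17).

```latex
\begin{align}
\theta_{t+1}
  &= \arg\min_{\theta} \sum_{d=0}^{t} \gamma^{d}
     \bigl(y_{t+1-d} - \theta^{T}\phi_{t-d}\bigr)^{2}
  && \text{(objective, using } \hat{y}_{t+1-d} = \theta^{T}\phi_{t-d}\text{)} \\
0 &= -2 \sum_{d=0}^{t} \gamma^{d}
     \bigl(y_{t+1-d} - \theta^{T}\phi_{t-d}\bigr)\,\phi_{t-d}
  && \text{(gradient with respect to } \theta \text{ set to zero)} \\
\underbrace{\Bigl(\sum_{d=0}^{t} \gamma^{d}\,\phi_{t-d}\,\phi_{t-d}^{T}\Bigr)}_{H_{t+1}}\,\theta
  &= \underbrace{\sum_{d=0}^{t} \gamma^{d}\, y_{t+1-d}\,\phi_{t-d}}_{h_{t+1}}
  && \text{(normal equations, i.e. } H_{t+1}\theta = h_{t+1}\text{)}
\end{align}
```

Under this reading, both H_{t+1} and h_{t+1} are geometric sums in γ, which is what allows each of them to be updated from its previous value with a single multiplication by γ and one added term.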
However, to avoid overfitting, it can also be desirable to keep the model parameters smaller, or even as small as possible as long as the weighted mean squared error is also kept small. Therefore, it can be preferable to use expression (10) instead of using expression (3).
In expression (10), “Reg_term(θ)” is a regularization term having a smaller value if the absolute values of the model parameters become smaller. Apparatus 100 can perform online training of a prediction model that can incrementally update model parameters θ_{t+1 }without repeating, at each time t, training processes for each pair of a target value y_{t+1−d }and a predicted target value ŷ_{t+1−d }for all times d.
The operational flow of
At block S200, apparatus 100 obtains a first target value for time t+1 in the target sequence (e.g., target value y_{t+1}). In one implementation, apparatus 100 observes target value y_{t+1 }from one or more sensors, computers or other devices generating or receiving target value y_{t+1}. In other implementations, apparatus 100 reads target value y_{t+1 }from a memory or a storage storing the target sequence.
At block S210, apparatus 100 obtains a feature vector corresponding to the first target value (e.g., ϕ_{t }which is used at the same iteration of the operational flow). In one implementation, apparatus 100 receives or observes the newest input pattern corresponding to the first target value (e.g., x_{t}). Apparatus 100 has predetermined feature functions ϕ_{0}(t, x_{0}, x_{1}, . . . , x_{t}), ϕ_{1}(t, x_{0}, x_{1}, . . . , x_{t}), . . . , ϕ_{K−1}(t, x_{0}, x_{1}, . . . , x_{t}), and calculates feature vector ϕ_{t }by calculating the predetermined feature functions based on the current time t and the current input sequence. In another implementation, apparatus 100 receives feature vector ϕ_{t }from one or more sequence generators, or other apparatuses outside of apparatus 100. In other embodiments, apparatus 100 has predetermined feature functions which are not based on the input sequence. In this case, apparatus 100 can calculate the predetermined feature functions without receiving or observing input sequences.
At block S220, apparatus 100 calculates a first vector (e.g., h_{t+1}) recursively from previous first vector h_{t }using first target value y_{t+1 }and feature vector ϕ_{t}. From the right side of expressions (7) and (8), the first vector can be updated or calculated from the previous first vector by multiplying the previous first vector by a forgetting factor and adding a product of the target value and the feature vector, as shown in the following expression (11). The first vector can be modified by, for example, adding another term that may not change the value of the first vector significantly.
h_{t+1}←γh_{t}+y_{t+1 }ϕ_{t} (11)
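The update of expression (11) can be sketched in a few lines of Python. The function name `update_first_vector` and the numeric values are assumptions made for illustration; only the recursion itself comes from expression (11).

```python
from typing import List, Sequence


def update_first_vector(
    h_prev: Sequence[float], y_next: float, phi: Sequence[float], gamma: float
) -> List[float]:
    """Expression (11): h_{t+1} <- gamma * h_t + y_{t+1} * phi_t."""
    return [gamma * h + y_next * f for h, f in zip(h_prev, phi)]


# Assumed example values for the previous first vector, target, and features.
h_t = [1.0, 2.0, 0.0]
h_next = update_first_vector(h_t, y_next=2.0, phi=[1.0, 0.5, -1.0], gamma=0.9)
```

The cost of this step is linear in the number of features and does not depend on how many patterns have been observed so far.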
At block S230, apparatus 100 modifies feature vector ϕ_{t }to obtain a modified feature vector (e.g., ϕ̂_{t}) by reducing an absolute value of at least one element of the feature vector. In this embodiment, apparatus 100 changes the at least one element of the feature vector to 0. This at least one element of the feature vector can be at least one intercept component of the model, also referred to as a constant feature. In another embodiment, apparatus 100 reduces an absolute value of at least one element of the feature vector by, for example, multiplying the at least one element of the feature vector by a reducing factor between 0 and 1 to obtain the modified feature vector. Modified feature vector ϕ̂_{t }is used for implementing the regularization shown in expression (10) in the incremental learning of apparatus 100.
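A minimal Python sketch of block S230 follows. The helper name `modify_feature_vector`, its `indices` and `factor` parameters, and the example values are hypothetical; the sketch only shows the two variants described above, namely setting selected elements to 0 or scaling them by a reducing factor.

```python
from typing import Iterable, List, Sequence


def modify_feature_vector(
    phi: Sequence[float], indices: Iterable[int], factor: float = 0.0
) -> List[float]:
    """Return a copy of phi with the selected elements scaled by `factor`.

    factor == 0.0 zeroes the selected elements; 0 < factor < 1 shrinks
    their absolute values. Other elements are left unchanged.
    """
    if not 0.0 <= factor < 1.0:
        raise ValueError("factor must satisfy 0 <= factor < 1")
    targets = set(indices)
    return [factor * v if i in targets else v for i, v in enumerate(phi)]


phi_t = [1.0, 2.0, -4.0]                             # element 0 is the constant feature
phi_zeroed = modify_feature_vector(phi_t, [0])       # intercept component set to 0
phi_shrunk = modify_feature_vector(phi_t, [2], 0.5)  # one element halved
```

The original feature vector is not mutated, since block S240 uses both the feature vector and the modified feature vector.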
At block S240, apparatus 100 generates an inverse Hessian matrix (e.g., H_{t+1}^{−1}) recursively from a previous inverse Hessian matrix (e.g., H_{t}^{−1}) using at least the feature vector (e.g., ϕ_{t}) and the modified feature vector (e.g., ϕ̂_{t}). In this embodiment, apparatus 100 calculates a temporal inverse Hessian matrix H′_{t+1}^{−1 }from the previous inverse Hessian matrix by using the feature vector. From the left side of expressions (7) and (8), apparatus 100 calculates the temporal inverse Hessian matrix as shown in the following expression (12).
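Expression (12) is not reproduced in this text. Since it is described elsewhere in this specification as the inverse of the Hessian matrix γH_{t}+ϕ_{t}ϕ_{t}^{T}, one plausible form is the Sherman-Morrison rank-one update, which is an assumption here rather than a quotation of expression (12). The Python sketch below implements that update for a symmetric previous inverse Hessian matrix; the function names and the example values are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> List[float]:
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def temporal_inverse_hessian(
    h_inv_prev: Matrix, phi: Sequence[float], gamma: float
) -> Matrix:
    """Inverse of (gamma * H_t + phi phi^T), computed from H_t^{-1}.

    Sherman-Morrison form, assuming H_t^{-1} is symmetric:
    (1/gamma) * (H_t^{-1} - u u^T / (gamma + phi^T u)), with u = H_t^{-1} phi.
    """
    u = matvec(h_inv_prev, phi)
    denom = gamma + sum(p * q for p, q in zip(phi, u))
    n = len(phi)
    return [
        [(h_inv_prev[i][j] - u[i] * u[j] / denom) / gamma for j in range(n)]
        for i in range(n)
    ]


# Assumed example: H_t = diag(2, 4), so H_t^{-1} = diag(0.5, 0.25).
h_inv_t = [[0.5, 0.0], [0.0, 0.25]]
h_inv_temporal = temporal_inverse_hessian(h_inv_t, phi=[1.0, 1.0], gamma=0.5)
```

In this example γH_{t}+ϕ_{t}ϕ_{t}^{T} equals [[2, 1], [1, 3]], so the result can be checked by multiplying the two matrices and comparing the product with the identity. The update needs only matrix-vector products, so no matrix inversion is performed at this step.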
To implement the regularization shown in expression (10), apparatus 100 calculates the inverse Hessian matrix from the temporal inverse Hessian matrix and the previous inverse Hessian matrix by using the modified feature vector as shown in the following expression (13), where λ is a weight for the regularization term.
At block S250, apparatus 100 updates parameters of the prediction model (e.g., θ_{t+1}) using the inverse Hessian matrix calculated in block S240. In this embodiment, apparatus 100 also uses the first vector to update the parameters. Apparatus 100 can calculate the updated parameters by multiplying the inverse Hessian matrix and the first vector, as shown in expression (9).
At block S260, apparatus 100 obtains a feature vector corresponding to a future target value (e.g., a feature vector ϕ_{t+1 }for calculating next target value y_{t+2}). Apparatus 100 can execute this step in the manner described for block S210.
At block S270, apparatus 100 predicts the future target value in the target sequence by using the updated parameters and a feature vector corresponding to the future target value. In this embodiment, apparatus 100 predicts the future target value by calculating an inner product of the updated parameters and the feature vector corresponding to the future target value. Apparatus 100 can use the following expression (14) to calculate predicted target value ŷ_{t+2}.
ŷ_{t+2}=θ_{t+1}^{T }ϕ_{t+1} (14)
In this embodiment, apparatus 100 can calculate all parameters including first vector h_{t+1 }and inverse Hessian matrix H_{t+1}^{−1 }incrementally or recursively based on previous values, such as h_{t }and H_{t}^{−1}, without repeating the training processes for each pattern in the available historical data at each time step. Furthermore, apparatus 100 can update model parameters to decrease or minimize the weighted mean square error between the target sequence and the predicted target sequence by adopting expression (10).
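The incremental flow of blocks S200 through S270 can be sketched end to end in Python under several stated assumptions. The regularization step of expression (13) is omitted because that expression is not reproduced in this text, so the sketch covers only expressions (11), (9), and (14) together with a Sherman-Morrison update standing in for expression (12). The initial inverse Hessian matrix (a large multiple of the identity), the synthetic data y_{t+1}=1+2x_{t}, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


def matvec(m: Matrix, v: Sequence[float]) -> List[float]:
    return [dot(row, v) for row in m]


def online_step(
    h: Sequence[float], h_inv: Matrix, phi: Sequence[float],
    y_next: float, gamma: float,
) -> Tuple[List[float], Matrix, List[float]]:
    """One unregularized iteration: update h, the inverse Hessian, and theta."""
    # Expression (11): first vector update.
    h_new = [gamma * a + y_next * f for a, f in zip(h, phi)]
    # Rank-one inverse update (assumed Sherman-Morrison form).
    u = matvec(h_inv, phi)
    denom = gamma + dot(phi, u)
    n = len(phi)
    h_inv_new = [
        [(h_inv[i][j] - u[i] * u[j] / denom) / gamma for j in range(n)]
        for i in range(n)
    ]
    # Expression (9): theta_{t+1} = H_{t+1}^{-1} h_{t+1}.
    theta_new = matvec(h_inv_new, h_new)
    return h_new, h_inv_new, theta_new


gamma = 0.95
h = [0.0, 0.0]
h_inv = [[1000.0, 0.0], [0.0, 1000.0]]  # assumed initial value, not from the source
theta = [0.0, 0.0]
for t in range(30):
    x_t = float(t % 5)            # synthetic input pattern
    phi_t = [1.0, x_t]            # constant feature plus the input itself
    y_next = 1.0 + 2.0 * x_t      # synthetic target y_{t+1}
    h, h_inv, theta = online_step(h, h_inv, phi_t, y_next, gamma)

# Expression (14): predict the next target from the next feature vector.
y_hat_next = dot(theta, [1.0, 3.0])
```

Each iteration touches only the stored first vector and inverse Hessian matrix, so the per-step cost is quadratic in the number of features and independent of the length of the history.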
By using the modified feature vector in generating the inverse Hessian matrix, apparatus 100 implements the regularization term in expression (10). More specifically, apparatus 100 uses the following expression (15) instead of using expression (3), where λ is a weight for the regularization term as shown in expression (13).
The second term in the square bracket in expression (15) is an L2 regularization term, which has a shrinking effect on the model parameters. By zeroing or reducing an absolute value of at least one element of the feature vector, factors relating to the at least one element are not subtracted from temporal inverse Hessian matrix H′_{t+1}^{−1 }in expression (13), but factors relating to the other elements of the feature vector are subtracted from temporal inverse Hessian matrix H′_{t+1}^{−1}. Because the inverse Hessian matrix is reduced with respect to elements of the feature vector other than the at least one element, model parameters θ_{t+1 }are decreased with respect to elements of the feature vector other than the at least one element. The inverse Hessian matrix is not reduced with respect to the at least one element because it is preferable that a constant or an intercept of the predicted target sequence not be reduced or changed from expression (3).
In other embodiments, apparatus 100 can reduce an absolute value of at least one intercept of at least one feature function ϕ_{k}(t, x_{0}, x_{1}, . . . , x_{t}), or change the at least one intercept of the at least one feature function to 0 at block S230. Apparatus 100 can reduce or change the intercept of every feature function ϕ_{k}(t, x_{0}, x_{1}, . . . , x_{t}) for k=0, . . . , K−1.
In other embodiments, apparatus 100 can reduce or change at least one element of the feature vector at block S230 in order to keep factors relating to at least one important feature, which may not be an intercept component of the model, not subtracted from the inverse Hessian matrix at block S240.
As shown in expressions (7) and (8), the Hessian matrix is defined as shown in the following expression (16).
In the beginning or at early times near t=0, there is a possibility that the Hessian matrix and the inverse Hessian matrix will fluctuate greatly. In this situation, each matrix element of the Hessian matrix becomes a very small value, and some matrix elements of the inverse Hessian matrix then become very large values. Therefore, it can be difficult or impractical to update inverse Hessian matrix H_{t+1}^{−1 }from previous inverse Hessian matrix H_{t}^{−1}. In this situation, apparatus 100 can adopt the operations of
At block S300, apparatus 100 determines whether it is the initial generation time for the Hessian matrix (e.g., time t=0). If it is the initial generation time for the Hessian matrix, then apparatus 100 initializes Hessian matrix H_{0 }at block S310. From the definition shown in expression (16), apparatus 100 can initialize Hessian matrix H_{0 }as a zero matrix.
At block S320, apparatus 100 determines whether the current time is before threshold time T_{th}. If the current time is before the threshold time, apparatus 100 generates, at block S330, Hessian matrix H_{t+1 }recursively from previous Hessian matrix H_{t}. From the definition of expression (16), apparatus 100 can generate the Hessian matrix by using the following expression (17).
H_{t+1}=γH_{t}+ϕ_{t}ϕ_{t}^{T} (17)
In expression (17), Hessian matrix H_{t+1 }is calculated by multiplying previous Hessian matrix H_{t }by forgetting factor γ and adding feature vector ϕ_{t }multiplied by a transpose of feature vector ϕ_{t }(e.g., a direct product of feature vector ϕ_{t }and feature vector ϕ_{t}). Expression (12) is an expression for calculating an inverse matrix of Hessian matrix H_{t+1 }shown in expression (17).
To implement the L2 regularization of expression (15) also in the early timing, apparatus 100 can generate the Hessian matrix by using the following expression (18) instead of the expression (17).
H_{t+1}=γH_{t}+ϕ_{t}ϕ_{t}^{T}+λϕ̂_{t}^{T}H_{t}ϕ̂_{t} (18)
In expression (18), a term formed by multiplying modified feature vector ϕ̂_{t}, previous Hessian matrix H_{t}, a transpose of the modified feature vector ϕ̂_{t}^{T}, and weight λ is further added to the Hessian matrix of expression (17). Expression (13) is an expression for calculating an inverse matrix of Hessian matrix H_{t+1 }of expression (18). In other embodiments, expression (18) can be modified by, for example, adding other terms or modifying the third term for modifying the regularization term of expression (10). In that case, expression (13) is also modified so as to calculate an inverse matrix of Hessian matrix H_{t+1 }of the modified expression (18).
At block S340, apparatus 100 inverts Hessian matrix H_{t+1 }to obtain inverse Hessian matrix H_{t+1}^{−1}.
If the current time is at or after the threshold time at block S320, apparatus 100 generates, at block S350, inverse Hessian matrix H_{t+1}^{−1 }from previous inverse Hessian matrix H_{t}^{−1 }as shown in
In this embodiment, apparatus 100 can generate the inverse Hessian matrix from the Hessian matrix and can avoid the difficulty of generating the inverse Hessian matrix from the previous inverse Hessian matrix. In other embodiments, apparatus 100 can skip block S340 until the current time becomes T_{th }and perform block S340 at time t=T_{th}. In these embodiments, apparatus 100 may not execute block S250, block S260, and block S270 before T_{th}, and then apparatus 100 may not predict the target values at early timings. This is acceptable because the accuracy of prediction at early timings is very low for most applications.
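The switch between the early phase and the recursive phase can be sketched in Python as follows. This is a simplified illustration under stated assumptions: it uses the unregularized recursion of expression (17), defers the inversion of block S340 until the last step before the threshold time, and uses a Sherman-Morrison update as a stand-in for the recursive inverse update. The threshold `T_TH`, the feature sequence, and all helper names are hypothetical.

```python
from typing import List, Optional, Sequence

Matrix = List[List[float]]


def invert(m: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def hessian_step(h: Matrix, phi: Sequence[float], gamma: float) -> Matrix:
    """Expression (17): H_{t+1} = gamma * H_t + phi_t phi_t^T."""
    n = len(phi)
    return [[gamma * h[i][j] + phi[i] * phi[j] for j in range(n)] for i in range(n)]


def inverse_step(h_inv: Matrix, phi: Sequence[float], gamma: float) -> Matrix:
    """Recursive inverse update (assumed Sherman-Morrison form, symmetric H^{-1})."""
    u = [sum(a * b for a, b in zip(row, phi)) for row in h_inv]
    denom = gamma + sum(p * q for p, q in zip(phi, u))
    n = len(phi)
    return [[(h_inv[i][j] - u[i] * u[j] / denom) / gamma for j in range(n)]
            for i in range(n)]


GAMMA = 0.9
T_TH = 3                                       # hypothetical threshold time
phis = [[1.0, float(t)] for t in range(6)]     # assumed feature vectors phi_0..phi_5

hessian: Matrix = [[0.0, 0.0], [0.0, 0.0]]     # block S310: H_0 is a zero matrix
h_inv: Optional[Matrix] = None
for t, phi in enumerate(phis):
    if t < T_TH:                               # block S320: before the threshold
        hessian = hessian_step(hessian, phi, GAMMA)   # block S330
        if t == T_TH - 1:
            h_inv = invert(hessian)            # block S340, deferred to the last early step
    else:
        h_inv = inverse_step(h_inv, phi, GAMMA)       # block S350
```

In this example the Hessian matrix after the first step is [[1, 0], [0, 0]], which is singular, so inverting before enough patterns have accumulated would fail; waiting until the threshold time avoids that case.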
Apparatus 400 includes obtaining section 410, calculating section 420, modifying section 430, generating section 440, updating section 450, and predicting section 460. At each time t, obtaining section 410 obtains input pattern x_{t }and target value y_{t}. In this embodiment, obtaining section 410 performs operations of block S200 in
Obtaining section 410 stores a feature function for each feature of feature vector ϕ_{t}. For updating model parameters θ for time t+1, obtaining section 410 obtains feature vector ϕ_{t }by calculating each feature in ϕ_{t }based on predetermined functions which can input time t and/or the input sequence at or before time t. In this embodiment, obtaining section 410 performs operations of block S210 in
For prediction at time t+1, obtaining section 410 obtains feature vector ϕ_{t+1 }by calculating each feature in ϕ_{t+1}. In this embodiment, obtaining section 410 performs operations of block S260 in
Calculating section 420 is connected to obtaining section 410. Calculating section 420 stores first vector h_{t }and forgetting factor γ. At time t+1, calculating section 420 receives target value y_{t+1 }and feature vector ϕ_{t }from obtaining section 410, and calculates first vector h_{t+1 }for time t+1 recursively from a previous first vector h_{t }based on first target value y_{t+1 }and the feature vector ϕ_{t}. In this embodiment, calculating section 420 performs the operations of block S220 in
Modifying section 430 is connected to obtaining section 410. At time t+1, modifying section 430 receives feature vector ϕ_{t }from obtaining section 410, and modifies the feature vector to obtain a modified feature vector ϕ̂_{t}. In this embodiment, modifying section 430 performs the operations of block S230 in
Generating section 440 is connected to obtaining section 410 and modifying section 430. Generating section 440 stores inverse Hessian matrix H_{t}^{−1}, forgetting factor γ, and weight λ. Generating section 440 can store Hessian matrix H_{t }if generating section 440 performs block S310, block S330, and block S340 in
Updating section 450 is connected to calculating section 420 and generating section 440. At time t+1, updating section 450 receives first vector h_{t+1 }from calculating section 420 and inverse Hessian matrix H_{t+1}^{−1 }from generating section 440 and updates model parameters θ_{t+1 }based on first vector h_{t+1 }and inverse Hessian matrix H_{t+1}^{−1}. In this embodiment, updating section 450 performs the operations of block S250 in
Predicting section 460 is connected to obtaining section 410 and updating section 450. At time t+1, predicting section 460 receives feature vector ϕ_{t+1 }from obtaining section 410 and model parameters θ_{t+1 }from updating section 450. Predicting section 460 predicts future target value y_{t+2 }by calculating predicted target value ŷ_{t+2 }for time t+2 based on feature vector ϕ_{t+1 }and model parameters θ_{t+1}. In this embodiment, predicting section 460 performs the operations of block S270 in
Generating section 500 includes generator 510, matrix inverter 520, first calculator 530, and second calculator 540. Generator 510 stores Hessian matrix H_{t }and forgetting factor γ. Generator 510 receives feature vector ϕ_{t }at time t+1 if t is less than threshold time T_{th}, and generates Hessian matrix H_{t+1 }based on previous Hessian matrix H_{t }and feature vector ϕ_{t}. At time 0, generator 510 initializes Hessian matrix H_{0}. In this embodiment, generator 510 performs block S300, block S310, block S320, and block S330 of
Matrix inverter 520 is connected to generator 510. Matrix inverter 520 receives Hessian matrix H_{t+1 }at time t+1 and calculates an inverse matrix of Hessian matrix H_{t+1 }(e.g., inverse Hessian matrix H_{t+1}^{−1}). In this embodiment, matrix inverter 520 performs block S340 of
First calculator 530 is connected to matrix inverter 520 and second calculator 540. First calculator 530 stores inverse Hessian matrix H_{t}^{−1}. At time t+1, if t is equal to or more than threshold time T_{th}, then first calculator 530 receives feature vector ϕ_{t }and calculates temporal inverse Hessian matrix H′_{t+1}^{−1 }from previous inverse Hessian matrix H_{t}^{−1 }based on feature vector ϕ_{t}. In this embodiment, first calculator 530 performs block S240 of
At time t+1, if t is less than threshold time T_{th }or t is equal to T_{th}−1, then first calculator 530 receives inverse Hessian matrix H_{t+1}^{−1 }from matrix inverter 520, and updates the inverse Hessian matrix stored in a memory of first calculator 530 with the received inverse Hessian matrix H_{t+1}^{−1}. If t is equal to or more than threshold time T_{th}, first calculator 530 receives inverse Hessian matrix H_{t+1}^{−1 }from second calculator 540, and updates the inverse Hessian matrix in the memory with the received inverse Hessian matrix H_{t+1}^{−1}.
Second calculator 540 is connected to first calculator 530. At time t+1, if t is equal to or more than threshold time T_{th}, then second calculator 540 calculates inverse Hessian matrix H_{t+1}^{−1 }from temporal inverse Hessian matrix H′_{t+1}^{−1 }based on modified feature vector ϕ̂_{t}. In this embodiment, second calculator 540 performs block S240 of
Various embodiments of the present invention can be described with reference to flowcharts and block diagrams whose blocks can represent (1) steps of processes in which operations are performed or (2) sections of apparatuses responsible for performing operations. Certain steps and sections can be implemented by dedicated circuitry, programmable circuitry supplied with computer-readable instructions stored on computer-readable media, and/or processors supplied with computer-readable instructions stored on computer-readable media. Dedicated circuitry can include digital and/or analog hardware circuits and can include integrated circuits (IC) and/or discrete circuits. Programmable circuitry can include reconfigurable hardware circuits including logical AND, OR, XOR, NAND, NOR, and other logical operations, flip-flops, registers, memory elements, etc., such as field-programmable gate arrays (FPGA), programmable logic arrays (PLA), etc.
Computer-readable media can include any tangible device that can store instructions for execution by a suitable device, such that the computer-readable medium having instructions stored therein includes an article of manufacture including instructions which can be executed to create means for performing operations specified in the flowcharts or block diagrams. Examples of computer-readable media can include an electronic storage medium, a magnetic storage medium, an optical storage medium, an electromagnetic storage medium, a semiconductor storage medium, etc. More specific examples of computer-readable media can include a floppy disk, a diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an electrically erasable programmable read-only memory (EEPROM), a static random access memory (SRAM), a compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a BLU-RAY® disc, a memory stick, an integrated circuit card, etc.
Computer-readable instructions can include assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object-oriented programming language such as Smalltalk, JAVA®, C++, etc., and conventional procedural programming languages, such as the “C” programming language or similar programming languages.
Computer-readable instructions can be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, or to programmable circuitry, locally or via a local area network (LAN), wide area network (WAN) such as the Internet, etc., to execute the computer-readable instructions to create means for performing operations specified in the flowcharts or block diagrams. Examples of processors include computer processors, processing units, microprocessors, digital signal processors, controllers, microcontrollers, etc.
The computer 1200 according to the present embodiment includes a CPU 1212, a RAM 1214, a graphics controller 1216, and a display device 1218, which are mutually connected by a host controller 1210. The computer 1200 also includes input/output units such as a communication interface 1222, a hard disk drive 1224, a DVD-ROM drive 1226, and an IC card drive, which are connected to the host controller 1210 via an input/output controller 1220. The computer also includes legacy input/output units such as a ROM 1230 and a keyboard 1242, which are connected to the input/output controller 1220 through an input/output chip 1240.
The CPU 1212 operates according to programs stored in the ROM 1230 and the RAM 1214, thereby controlling each unit. The graphics controller 1216 obtains image data generated by the CPU 1212 on a frame buffer or the like provided in the RAM 1214 or in itself, and causes the image data to be displayed on the display device 1218.
The communication interface 1222 communicates with other electronic devices via a network. The hard disk drive 1224 stores programs and data used by the CPU 1212 within the computer 1200. The DVD-ROM drive 1226 reads the programs or the data from the DVD-ROM 1201, and provides the hard disk drive 1224 with the programs or the data via the RAM 1214. The IC card drive reads programs and data from an IC card, and/or writes programs and data into the IC card.
The ROM 1230 stores therein a boot program or the like executed by the computer 1200 at the time of activation, and/or a program depending on the hardware of the computer 1200. The input/output chip 1240 can also connect various input/output units via a parallel port, a serial port, a keyboard port, a mouse port, and the like to the input/output controller 1220.
A program can be provided by computer-readable media such as the DVD-ROM 1201 or the IC card. The program is read from the computer-readable media, installed into the hard disk drive 1224, RAM 1214, or ROM 1230, which are also examples of computer-readable media, and executed by the CPU 1212. The information processing described in these programs is read into the computer 1200, resulting in cooperation between a program and the above-mentioned various types of hardware resources. An apparatus or method can be constituted by realizing the operation or processing of information in accordance with the usage of the computer 1200.
For example, when communication is performed between the computer 1200 and an external device, the CPU 1212 can execute a communication program loaded onto the RAM 1214 to instruct the communication interface 1222 to perform communication processing, based on the processing described in the communication program. The communication interface 1222, under control of the CPU 1212, reads transmission data stored in a transmission buffering region provided in a recording medium such as the RAM 1214, the hard disk drive 1224, the DVD-ROM 1201, or the IC card, and transmits the read transmission data to a network, or writes reception data received from a network to a reception buffering region or the like provided on the recording medium.
In addition, the CPU 1212 can cause all or a necessary portion of a file or a database to be read into the RAM 1214, the file or the database having been stored in an external recording medium such as the hard disk drive 1224, the DVD-ROM drive 1226 (DVD-ROM 1201), the IC card, etc., and perform various types of processing on the data on the RAM 1214. The CPU 1212 can then write back the processed data to the external recording medium.
Various types of information, such as various types of programs, data, tables, and databases, can be stored in the recording medium to undergo information processing. The CPU 1212 can perform various types of processing on the data read from the RAM 1214, including various types of operations, processing of information, condition judging, conditional branching, unconditional branching, and search/replace of information, as described throughout this disclosure and designated by an instruction sequence of programs, and can write the result back to the RAM 1214. In addition, the CPU 1212 can search for information in a file, a database, etc., in the recording medium. For example, when a plurality of entries, each having an attribute value of a first attribute associated with an attribute value of a second attribute, are stored in the recording medium, the CPU 1212 can search, from among the plurality of entries, for an entry whose attribute value of the first attribute satisfies a designated condition, and read the attribute value of the second attribute stored in that entry, thereby obtaining the attribute value of the second attribute associated with the first attribute satisfying the predetermined condition.
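The attribute search described above can be sketched as follows. This is only an illustration under assumed names: the entry layout (pairs of first and second attribute values), the function `find_second_attribute`, and the sample data are hypothetical and do not appear in the embodiment.

```python
def find_second_attribute(entries, condition):
    """Return the second attribute of the first entry whose first
    attribute satisfies `condition`, or None if no entry matches."""
    for first_attribute, second_attribute in entries:
        if condition(first_attribute):
            return second_attribute
    return None


# Hypothetical entries: (first attribute value, second attribute value).
entries = [("alice", 30), ("bob", 41), ("carol", 27)]

# Designate a condition on the first attribute and read the second attribute.
result = find_second_attribute(entries, lambda name: name == "bob")
missing = find_second_attribute(entries, lambda name: name == "zed")
```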
The above-explained program or software modules can be stored in the computer-readable media on or near the computer 1200. In addition, a recording medium such as a hard disk or a RAM provided in a server system connected to a dedicated communication network or the Internet can be used as the computer-readable media, thereby providing the program to the computer 1200 via the network.
While the embodiments of the present invention have been described, the technical scope of the invention is not limited to the above-described embodiments. It will be apparent to persons skilled in the art that various alterations and improvements can be added to the above-described embodiments. It should also be apparent from the scope of the claims that the embodiments added with such alterations or improvements are within the technical scope of the invention.
The operations, procedures, steps, and stages of each process performed by an apparatus, system, program, and method shown in the claims, embodiments, or diagrams can be performed in any order as long as the order is not indicated by “prior to,” “before,” or the like and as long as the output from a previous process is not used in a later process. Even if the process flow is described using phrases such as “first” or “next” in the claims, embodiments, or diagrams, it does not necessarily mean that the process must be performed in this order.