TRF Language Model
#include <hrf-sa-train.h>
Public Member Functions | |
SAfunc () | |
SAfunc (Model *pModel, CorpusBase *pTrain, CorpusBase *pValid=NULL, CorpusBase *pTest=NULL, int nMinibatch=100) | |
~SAfunc () | |
void | Reset (Model *pModel, CorpusBase *pTrain, CorpusBase *pValid=NULL, CorpusBase *pTest=NULL, int nMinibatch=100) |
reset More... | |
void | PrintInfo () |
print information More... | |
int | GetNgramFeatNum () const |
get the ngram feature number More... | |
int | GetVHmatSize () const |
get the VH mat number More... | |
int | GetCHmatSize () const |
get the CH mat number More... | |
int | GetHHmatSize () const |
get the HH mat number More... | |
int | GetWeightNum () const |
get the bias mat number More... | |
int | GetZetaNum () const |
get the zeta parameter number More... | |
void | RandSeq (Seq &seq, int nLen=-1) |
get a random sequence More... | |
void | GetParam (double *pdParams) |
get the parameters More... | |
void | GetEmpiricalFeatExp (Vec< double > &vExp) |
get the empirical expectation of features More... | |
void | GetEmpiricalFeatVar (Vec< double > &vVar) |
calculate the empirical variance of features More... | |
int | GetEmpiricalExp (VecShell< double > &vExp, VecShell< double > &vExp2, Array< int > &aRandIdx) |
calculate the empirical expectation of given sequence More... | |
int | GetEmpiricalExp (VecShell< double > &vExp, VecShell< double > &vExp2) |
calculate the empirical expectation More... | |
int | GetSampleExp (VecShell< double > &vExp, VecShell< double > &vLen) |
calculate the expectation of SA samples More... | |
void | PerfromCD (VecShell< double > &vEmpExp, VecShell< double > &vSamExp, VecShell< double > &vEmpExp2, VecShell< double > &vLen) |
perform CD process and get the expectation More... | |
void | PerfromSA (VecShell< double > &vEmpExp, VecShell< double > &vSamExp, VecShell< double > &vEmpExp2, VecShell< double > &vLen) |
perform SA process and get the expectation More... | |
double | GetSampleLL (CorpusBase *pCorpus, int nCalNum=-1, int method=0) |
perform SAMS, and then select the training sequences of the same length. More... | |
void | IterEnd (double *pFinalParams) |
do something at the end of the SA iteration More... | |
void | WriteModel (int nEpoch) |
Write Model. More... | |
virtual void | SetParam (double *pdParams) |
set the parameter. More... | |
virtual void | GetGradient (double *pdGradient) |
calculate the gradient g(x) More... | |
virtual double | GetValue () |
calculate the function value f(x) More... | |
virtual int | GetExtraValues (int t, double *pdValues) |
calculate extra values which will be printed at each iteration More... | |
Public Member Functions inherited from hrf::MLfunc | |
MLfunc () | |
MLfunc (Model *pModel, CorpusBase *pTrain, CorpusBase *pValid=NULL, CorpusBase *pTest=NULL) | |
void | Reset (Model *pModel, CorpusBase *pTrain, CorpusBase *pValid=NULL, CorpusBase *pTest=NULL) |
Model * | GetModel () const |
void | GetParam (double *pdParams) |
virtual double | GetLL (CorpusBase *pCorpus, int nCalNum=-1) |
calculate the log-likelihood on corpus More... | |
Public Member Functions inherited from wb::Func | |
Func (int nParamNum=0) | |
void | SetParamNum (int n) |
setting the parameter number More... | |
int | GetParamNum () const |
get the parameter number More... | |
Public Attributes | |
AISConfig | m_AISConfigForZ |
the AIS configuration for normalization More... | |
AISConfig | m_AISConfigForP |
the AIS configuration for calculating the LL. More... | |
int | m_nTrainHiddenSampleTimes |
the sample times for training sequence More... | |
int | m_nSampleHiddenSampleTimes |
the sample times for the hidden of samples More... | |
int | m_nCDSampleTimes |
the CD-n: the sample number. More... | |
int | m_nSASampleTimes |
the SA sample times More... | |
bool | m_bSAMSSample |
if using the sams sampling method More... | |
File | m_fdbg |
output the sample pi/zeta information More... | |
File | m_fparm |
output the parameters of each iteration More... | |
File | m_fgrad |
output the gradient of each iteration More... | |
File | m_fvar |
output the variance at each iteration More... | |
File | m_fexp |
output the expectation of each iteration More... | |
File | m_fsamp |
output all the samples More... | |
File | m_ftrain |
output all the training sequences More... | |
File | m_feat_mean |
output the empirical mean More... | |
File | m_feat_var |
output the empirical variance More... | |
bool | m_bPrintTrain |
output the LL on training set More... | |
bool | m_bPrintValie |
output the LL on valid set More... | |
bool | m_bPrintTest |
output the LL on test set More... | |
Public Attributes inherited from hrf::MLfunc | |
const char * | m_pathOutputModel |
Write to model during iteration. More... | |
Protected Attributes | |
int | m_nMiniBatchSample |
mini-batch for samples More... | |
int | m_nMiniBatchTraining |
mini-batch for training set More... | |
trf::CorpusRandSelect | m_TrainSelect |
random select the sequence from corpus More... | |
CorpusCache | m_TrainCache |
cache all the h of training sequences. More... | |
Vec< Prob > | m_samplePi |
the length distribution used for sample More... | |
Protected Attributes inherited from hrf::MLfunc | |
Model * | m_pModel |
HRF model. More... | |
CorpusBase * | m_pCorpusTrain |
training corpus More... | |
CorpusBase * | m_pCorpusValid |
valid corpus More... | |
CorpusBase * | m_pCorpusTest |
test corpus More... | |
Vec< PValue > | m_values |
Vec< Prob > | m_trainPi |
the length distribution in training corpus More... | |
Protected Attributes inherited from wb::Func | |
Solve * | m_pSolve |
Save the solve pointer. More... | |
int | m_nParamNum |
the parameter number More... | |
Friends | |
class | SAtrain |
Additional Inherited Members | |
Static Public Attributes inherited from wb::Func | |
static const int | cn_exvalue_max_num = 100 |
Definition at line 42 of file hrf-sa-train.h.
|
inline |
Definition at line 112 of file hrf-sa-train.h.
|
inline |
Definition at line 122 of file hrf-sa-train.h.
|
inline |
Definition at line 134 of file hrf-sa-train.h.
|
inline |
get the CH mat number
Definition at line 157 of file hrf-sa-train.h.
int hrf::SAfunc::GetEmpiricalExp | ( | VecShell< double > & | vExp, |
VecShell< double > & | vExp2, | ||
Array< int > & | aRandIdx | ||
) |
calculate the empirical expectation of given sequence
several times sampling
Definition at line 332 of file hrf-sa-train.cpp.
calculate the empirical expectation
Definition at line 396 of file hrf-sa-train.cpp.
void hrf::SAfunc::GetEmpiricalFeatExp | ( | Vec< double > & | vExp | ) |
calculate the empirical expectation of features
Definition at line 165 of file hrf-sa-train.cpp.
void hrf::SAfunc::GetEmpiricalFeatVar | ( | Vec< double > & | vVar | ) |
calculate the empirical variance of features
Count p[f^2]
Count p_l[f]. Saving p_l[f] for every length costs too much memory, so each p_l[f] is calculated separately.
find all the sequences with length nLen
calculate p[f^2] - * p_l[f]^2
output the zero number
save
Definition at line 199 of file hrf-sa-train.cpp.
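The steps listed for GetEmpiricalFeatVar can be sketched in a standalone form. This is only an illustration and not the code in hrf-sa-train.cpp: the type SeqFeat and the function EmpiricalFeatVar are hypothetical, and the factor that is missing from the last step above is assumed here to be the empirical length probability pi_l, giving var = p[f^2] - sum_l pi_l * p_l[f]^2.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in: one training sequence reduced to its length and
// its feature vector f(x). Not a type from the hrf library.
struct SeqFeat {
    int len;
    std::vector<double> f;
};

// Assumed formula: var[k] = p[f_k^2] - sum_l pi_l * (p_l[f_k])^2,
// where pi_l is the fraction of sequences that have length l.
std::vector<double> EmpiricalFeatVar(const std::vector<SeqFeat>& corpus,
                                     std::size_t nFeat) {
    assert(!corpus.empty());
    const double n = static_cast<double>(corpus.size());
    std::vector<double> var(nFeat, 0.0);

    // Count p[f^2] over the whole corpus and group the sequences by length.
    std::map<int, std::vector<const SeqFeat*> > byLen;
    for (const SeqFeat& s : corpus) {
        assert(s.f.size() == nFeat);
        for (std::size_t k = 0; k < nFeat; ++k) var[k] += s.f[k] * s.f[k] / n;
        byLen[s.len].push_back(&s);
    }

    // Handle one length at a time so only a single p_l[f] vector is alive.
    for (const auto& kv : byLen) {
        const double nl = static_cast<double>(kv.second.size());
        std::vector<double> mean(nFeat, 0.0);  // p_l[f] for this length
        for (const SeqFeat* s : kv.second)
            for (std::size_t k = 0; k < nFeat; ++k) mean[k] += s->f[k] / nl;
        const double pi = nl / n;  // empirical probability of this length
        for (std::size_t k = 0; k < nFeat; ++k) var[k] -= pi * mean[k] * mean[k];
    }
    return var;
}
```

Processing one length per pass mirrors the memory remark above: only one mean vector exists at a time instead of one per length.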
|
virtual |
calculate extra values which will be printed at each iteration
[in] | k | iteration number from 1 to ... |
[out] | pdValues | Returns the values to be output. The memory is allocated outside and the maximum size = cn_exvalue_max_num |
Reimplemented from hrf::MLfunc.
Definition at line 937 of file hrf-sa-train.cpp.
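The buffer contract described for pdValues (memory owned by the caller, at most cn_exvalue_max_num entries) can be illustrated with a small standalone function. Everything below is hypothetical: GetExtraValuesSketch is not the library method, the three flags only echo the m_bPrintTrain, m_bPrintValie and m_bPrintTest attributes, the written numbers are placeholders, and treating the return value as the count of written entries is an assumption.

```cpp
#include <cassert>

// Mirrors the documented static constant cn_exvalue_max_num = 100.
static const int kExValueMaxNum = 100;

// Hypothetical reporter: writes up to three values into the caller's buffer
// and returns how many were written (the return meaning is an assumption).
int GetExtraValuesSketch(int t, double* pdValues,
                         bool bTrain, bool bValid, bool bTest) {
    assert(t >= 1 && pdValues != nullptr);
    int n = 0;
    // Placeholder numbers stand in for the real per-set statistics.
    if (bTrain) pdValues[n++] = 1.0 / t;
    if (bValid) pdValues[n++] = 2.0 / t;
    if (bTest)  pdValues[n++] = 3.0 / t;
    assert(n <= kExValueMaxNum);  // never exceed the caller's buffer
    return n;
}
```

A caller would allocate an array of kExValueMaxNum doubles once and reuse it at every iteration.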
|
virtual |
calculate the gradient g(x)
Reimplemented from hrf::MLfunc.
Definition at line 816 of file hrf-sa-train.cpp.
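The virtual members SetParam, GetValue and GetGradient suggest the usual objective-function pattern: a solver pushes parameters in, then asks for f(x) and g(x). The sketch below illustrates that pattern only. FuncSketch is a hypothetical stand-in that copies the documented signatures, Quadratic is a toy objective, and Minimize is plain gradient descent rather than the solver used with this class.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in mirroring the documented virtuals; not wb::Func.
class FuncSketch {
public:
    explicit FuncSketch(int nParamNum = 0) : m_nParamNum(nParamNum) {}
    virtual ~FuncSketch() {}
    int GetParamNum() const { return m_nParamNum; }
    virtual void SetParam(double* pdParams) = 0;
    virtual double GetValue() = 0;
    virtual void GetGradient(double* pdGradient) = 0;
protected:
    int m_nParamNum;
};

// Toy objective f(x) = sum_i (x_i - 1)^2 with gradient 2 * (x_i - 1).
class Quadratic : public FuncSketch {
public:
    explicit Quadratic(int n) : FuncSketch(n), m_x(n, 0.0) {}
    void SetParam(double* pdParams) override {
        m_x.assign(pdParams, pdParams + m_nParamNum);
    }
    double GetValue() override {
        double v = 0.0;
        for (double x : m_x) v += (x - 1.0) * (x - 1.0);
        return v;
    }
    void GetGradient(double* pdGradient) override {
        for (int i = 0; i < m_nParamNum; ++i) pdGradient[i] = 2.0 * (m_x[i] - 1.0);
    }
private:
    std::vector<double> m_x;
};

// Plain gradient descent driver; returns the final function value.
double Minimize(FuncSketch& func, std::vector<double>& x, int nIter, double step) {
    assert(static_cast<int>(x.size()) == func.GetParamNum());
    std::vector<double> g(x.size());
    for (int t = 0; t < nIter; ++t) {
        func.SetParam(x.data());     // hand the current point to the function
        func.GetGradient(g.data());  // ask for g(x) at that point
        for (std::size_t i = 0; i < x.size(); ++i) x[i] -= step * g[i];
    }
    func.SetParam(x.data());
    return func.GetValue();
}
```

Keeping the parameter state inside the function object lets GetValue and GetGradient share any work done in SetParam.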
|
inline |
get the HH mat number
Definition at line 159 of file hrf-sa-train.h.
|
inline |
get the ngram feature number
Definition at line 153 of file hrf-sa-train.h.
void hrf::SAfunc::GetParam | ( | double * | pdParams | ) |
get the parameters
Definition at line 134 of file hrf-sa-train.cpp.
calculate the expectation of SA samples
sample hidden
save the length count
save current length count
Definition at line 405 of file hrf-sa-train.cpp.
double hrf::SAfunc::GetSampleLL | ( | CorpusBase * | pCorpus, |
int | nCalNum = -1, |
int | method = 0 |
) |
perform SAMS, and then select the training sequences of the same length.
Sample the most possible hidden and calculate the LL
Definition at line 756 of file hrf-sa-train.cpp.
|
inlinevirtual |
calculate the function value f(x)
Reimplemented from hrf::MLfunc.
Definition at line 200 of file hrf-sa-train.h.
|
inline |
get the VH mat number
Definition at line 155 of file hrf-sa-train.h.
|
inline |
get the bias mat number
get the number of all the weights in the exponent
Definition at line 163 of file hrf-sa-train.h.
|
inline |
get the zeta parameter number
Definition at line 165 of file hrf-sa-train.h.
void hrf::SAfunc::IterEnd | ( | double * | pFinalParams | ) |
do something at the end of the SA iteration
Definition at line 796 of file hrf-sa-train.cpp.
void hrf::SAfunc::PerfromCD | ( | VecShell< double > & | vEmpExp, |
VecShell< double > & | vSamExp, | ||
VecShell< double > & | vEmpExp2, | ||
VecShell< double > & | vLen | ||
) |
perform CD process and get the expectation
save the length count
save current length count
Definition at line 485 of file hrf-sa-train.cpp.
void hrf::SAfunc::PerfromSA | ( | VecShell< double > & | vEmpExp, |
VecShell< double > & | vSamExp, | ||
VecShell< double > & | vEmpExp2, | ||
VecShell< double > & | vLen | ||
) |
perform SA process and get the expectation
record the length of the training sequence
several times sampling
count
sample hidden
save the length count
save current length count
Definition at line 584 of file hrf-sa-train.cpp.
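PerfromSA returns an empirical expectation (vEmpExp) and a sample expectation (vSamExp). In stochastic approximation training such a pair typically drives a Robbins-Monro style update. The sketch below shows that generic update, not the update implemented in hrf-sa-train.cpp: SAStep and StepSize are hypothetical helpers, the 1/(t0 + t) schedule is just one common choice, and the sign assumes ascent on the log-likelihood.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of hrf::SAfunc: one Robbins-Monro style step.
//   theta  : current parameters, updated in place
//   empExp : feature expectation estimated on a training mini-batch
//   samExp : feature expectation estimated on model samples
//   gamma  : step size for this iteration
void SAStep(std::vector<double>& theta,
            const std::vector<double>& empExp,
            const std::vector<double>& samExp,
            double gamma) {
    assert(theta.size() == empExp.size() && theta.size() == samExp.size());
    for (std::size_t i = 0; i < theta.size(); ++i)
        theta[i] += gamma * (empExp[i] - samExp[i]);
}

// One common decaying schedule (an assumption): gamma_t = 1 / (t0 + t), t >= 1.
double StepSize(int t, double t0 = 0.0) {
    assert(t >= 1 && t0 >= 0.0);
    return 1.0 / (t0 + t);
}
```

The update vanishes when the two expectations agree, which is the fixed point such training looks for.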
void hrf::SAfunc::PrintInfo | ( | ) |
print information
Definition at line 87 of file hrf-sa-train.cpp.
void hrf::SAfunc::RandSeq | ( | Seq & | seq, |
int | nLen = -1 |
) |
get a random sequence
Definition at line 104 of file hrf-sa-train.cpp.
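One plausible reading of the nLen = -1 default is "draw the length from a length distribution such as m_samplePi, then fill the sequence with random words". That reading is an assumption and is not confirmed by this page. The standalone sketch below illustrates it with a hypothetical RandSeqSketch that represents a sequence as a vector of word ids and draws words uniformly.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Hypothetical sketch, not hrf::SAfunc::RandSeq.
//   pi        : length distribution, pi[l] is the weight of length l
//   vocabSize : word ids are drawn uniformly from [0, vocabSize)
//   nLen      : fixed length, or a negative value to draw the length from pi
std::vector<int> RandSeqSketch(const std::vector<double>& pi, int vocabSize,
                               std::mt19937& rng, int nLen = -1) {
    assert(vocabSize > 0 && !pi.empty());
    if (nLen < 0) {
        // discrete_distribution returns index l with probability
        // proportional to pi[l].
        std::discrete_distribution<int> lenDist(pi.begin(), pi.end());
        nLen = lenDist(rng);
    }
    std::uniform_int_distribution<int> wordDist(0, vocabSize - 1);
    std::vector<int> seq(nLen);
    for (int& w : seq) w = wordDist(rng);
    return seq;
}
```

Setting pi[0] to zero keeps empty sequences from being drawn.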
void hrf::SAfunc::Reset | ( | Model * | pModel, |
CorpusBase * | pTrain, |
CorpusBase * | pValid = NULL, |
CorpusBase * | pTest = NULL, |
int | nMinibatch = 100 |
) |
reset
Definition at line 22 of file hrf-sa-train.cpp.
|
virtual |
void hrf::SAfunc::WriteModel | ( | int | nEpoch | ) |
Write Model.
Definition at line 802 of file hrf-sa-train.cpp.
|
friend |
Definition at line 44 of file hrf-sa-train.h.
AISConfig hrf::SAfunc::m_AISConfigForP |
the AIS configuration for calculating the LL.
Definition at line 89 of file hrf-sa-train.h.
AISConfig hrf::SAfunc::m_AISConfigForZ |
the AIS configuration for normalization
Definition at line 88 of file hrf-sa-train.h.
bool hrf::SAfunc::m_bPrintTest |
output the LL on test set
Definition at line 109 of file hrf-sa-train.h.
bool hrf::SAfunc::m_bPrintTrain |
output the LL on training set
Definition at line 107 of file hrf-sa-train.h.
bool hrf::SAfunc::m_bPrintValie |
output the LL on valid set
Definition at line 108 of file hrf-sa-train.h.
bool hrf::SAfunc::m_bSAMSSample |
if using the sams sampling method
Definition at line 94 of file hrf-sa-train.h.
File hrf::SAfunc::m_fdbg |
output the sample pi/zeta information
Definition at line 97 of file hrf-sa-train.h.
File hrf::SAfunc::m_feat_mean |
output the empirical mean
Definition at line 104 of file hrf-sa-train.h.
File hrf::SAfunc::m_feat_var |
output the empirical variance
Definition at line 105 of file hrf-sa-train.h.
File hrf::SAfunc::m_fexp |
output the expectation of each iteration
Definition at line 101 of file hrf-sa-train.h.
File hrf::SAfunc::m_fgrad |
output the gradient of each iteration
Definition at line 99 of file hrf-sa-train.h.
File hrf::SAfunc::m_fparm |
output the parameters of each iteration
Definition at line 98 of file hrf-sa-train.h.
File hrf::SAfunc::m_fsamp |
output all the samples
Definition at line 102 of file hrf-sa-train.h.
File hrf::SAfunc::m_ftrain |
output all the training sequences
Definition at line 103 of file hrf-sa-train.h.
File hrf::SAfunc::m_fvar |
output the variance at each iteration
Definition at line 100 of file hrf-sa-train.h.
int hrf::SAfunc::m_nCDSampleTimes |
the CD-n: the sample number.
Definition at line 92 of file hrf-sa-train.h.
|
protected |
mini-batch for samples
Definition at line 46 of file hrf-sa-train.h.
|
protected |
mini-batch for training set
Definition at line 47 of file hrf-sa-train.h.
int hrf::SAfunc::m_nSampleHiddenSampleTimes |
the sample times for the hidden of samples
Definition at line 91 of file hrf-sa-train.h.
int hrf::SAfunc::m_nSASampleTimes |
the SA sample times
Definition at line 93 of file hrf-sa-train.h.
int hrf::SAfunc::m_nTrainHiddenSampleTimes |
the sample times for training sequence
Definition at line 90 of file hrf-sa-train.h.
the length distribution used for sample
Definition at line 51 of file hrf-sa-train.h.
|
protected |
cache all the h of training sequences.
Definition at line 49 of file hrf-sa-train.h.
|
protected |
random select the sequence from corpus
Definition at line 48 of file hrf-sa-train.h.
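Random selection of training sequences for a mini-batch is often done by shuffling an index array and walking through it. The sketch below illustrates that idea and makes no claim about how trf::CorpusRandSelect works internally. RandSelectSketch and its Next method are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical sketch, not trf::CorpusRandSelect.
class RandSelectSketch {
public:
    RandSelectSketch(int nSeq, unsigned seed)
        : m_idx(nSeq), m_pos(0), m_rng(seed) {
        assert(nSeq > 0);
        std::iota(m_idx.begin(), m_idx.end(), 0);  // 0, 1, ..., nSeq - 1
        std::shuffle(m_idx.begin(), m_idx.end(), m_rng);
    }

    // Fill `batch` with the next n sequence indices, reshuffling whenever
    // a full pass over the corpus has been consumed.
    void Next(int n, std::vector<int>& batch) {
        batch.clear();
        for (int i = 0; i < n; ++i) {
            if (m_pos == m_idx.size()) {
                std::shuffle(m_idx.begin(), m_idx.end(), m_rng);
                m_pos = 0;
            }
            batch.push_back(m_idx[m_pos++]);
        }
    }

private:
    std::vector<int> m_idx;  // shuffled sequence indices
    std::size_t m_pos;       // next unread position in m_idx
    std::mt19937 m_rng;
};
```

Every sequence is visited exactly once per pass, so no training sequence is starved by the random draws.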