TRF Language Model
hrf::SAfunc Class Reference
#include <hrf-sa-train.h>
Public Member Functions | |
| SAfunc () | |
| SAfunc (Model *pModel, CorpusBase *pTrain, CorpusBase *pValid=NULL, CorpusBase *pTest=NULL, int nMinibatch=100) | |
| ~SAfunc () | |
| void | Reset (Model *pModel, CorpusBase *pTrain, CorpusBase *pValid=NULL, CorpusBase *pTest=NULL, int nMinibatch=100) |
| reset | |
| void | PrintInfo () |
| print information | |
| int | GetNgramFeatNum () const |
| get the ngram feature number | |
| int | GetVHmatSize () const |
| get the VH mat number | |
| int | GetCHmatSize () const |
| get the CH mat number | |
| int | GetHHmatSize () const |
| get the HH mat number | |
| int | GetWeightNum () const |
| get the bias mat number | |
| int | GetZetaNum () const |
| get the zeta parameter number | |
| void | RandSeq (Seq &seq, int nLen=-1) |
| get a random sequence | |
| void | GetParam (double *pdParams) |
| get the parameters | |
| void | GetEmpiricalFeatExp (Vec< double > &vExp) |
| get the empirical expectation of features | |
| void | GetEmpiricalFeatVar (Vec< double > &vVar) |
| calculate the empirical variance of features | |
| int | GetEmpiricalExp (VecShell< double > &vExp, VecShell< double > &vExp2, Array< int > &aRandIdx) |
| calculate the empirical expectation of the given sequence | |
| int | GetEmpiricalExp (VecShell< double > &vExp, VecShell< double > &vExp2) |
| calculate the empirical expectation | |
| int | GetSampleExp (VecShell< double > &vExp, VecShell< double > &vLen) |
| calculate the expectation of SA samples | |
| void | PerfromCD (VecShell< double > &vEmpExp, VecShell< double > &vSamExp, VecShell< double > &vEmpExp2, VecShell< double > &vLen) |
| perform CD process and get the expectation | |
| void | PerfromSA (VecShell< double > &vEmpExp, VecShell< double > &vSamExp, VecShell< double > &vEmpExp2, VecShell< double > &vLen) |
| perform SA process and get the expectation | |
| double | GetSampleLL (CorpusBase *pCorpus, int nCalNum=-1, int method=0) |
| perform SAMS, and then select the training sequences of the same length. | |
| void | IterEnd (double *pFinalParams) |
| do something at the end of the SA iteration | |
| void | WriteModel (int nEpoch) |
| Write Model. | |
| virtual void | SetParam (double *pdParams) |
| set the parameter. | |
| virtual void | GetGradient (double *pdGradient) |
| calculate the gradient g(x) | |
| virtual double | GetValue () |
| calculate the function value f(x) | |
| virtual int | GetExtraValues (int t, double *pdValues) |
| calculate extra values which will be printed at each iteration | |
Public Member Functions inherited from hrf::MLfunc | |
| MLfunc () | |
| MLfunc (Model *pModel, CorpusBase *pTrain, CorpusBase *pValid=NULL, CorpusBase *pTest=NULL) | |
| void | Reset (Model *pModel, CorpusBase *pTrain, CorpusBase *pValid=NULL, CorpusBase *pTest=NULL) |
| Model * | GetModel () const |
| void | GetParam (double *pdParams) |
| virtual double | GetLL (CorpusBase *pCorpus, int nCalNum=-1) |
| calculate the log-likelihood on the corpus | |
Public Member Functions inherited from wb::Func | |
| Func (int nParamNum=0) | |
| void | SetParamNum (int n) |
| set the parameter number | |
| int | GetParamNum () const |
| get the parameter number | |
Public Attributes | |
| AISConfig | m_AISConfigForZ |
| the AIS configuration for normalization | |
| AISConfig | m_AISConfigForP |
| the AIS configuration for calculating the LL. | |
| int | m_nTrainHiddenSampleTimes |
| the sample times for training sequence | |
| int | m_nSampleHiddenSampleTimes |
| the sample times for the hidden of samples | |
| int | m_nCDSampleTimes |
| the CD-n: the sample number. | |
| int | m_nSASampleTimes |
| the SA sample times | |
| bool | m_bSAMSSample |
| whether to use the SAMS sampling method | |
| File | m_fdbg |
| output the sample pi/zeta information | |
| File | m_fparm |
| output the parameters of each iteration | |
| File | m_fgrad |
| output the gradient of each iteration | |
| File | m_fvar |
| output the variance at each iteration | |
| File | m_fexp |
| output the expectation of each iteration | |
| File | m_fsamp |
| output all the samples | |
| File | m_ftrain |
| output all the training sequences | |
| File | m_feat_mean |
| output the empirical mean | |
| File | m_feat_var |
| output the empirical variance | |
| bool | m_bPrintTrain |
| output the LL on training set | |
| bool | m_bPrintValie |
| output the LL on valid set | |
| bool | m_bPrintTest |
| output the LL on test set | |
Public Attributes inherited from hrf::MLfunc | |
| const char * | m_pathOutputModel |
| Write to model during iteration. | |
Protected Attributes | |
| int | m_nMiniBatchSample |
| mini-batch for samples | |
| int | m_nMiniBatchTraining |
| mini-batch for training set | |
| trf::CorpusRandSelect | m_TrainSelect |
| randomly select sequences from the corpus | |
| CorpusCache | m_TrainCache |
| cache all the h of training sequences. | |
| Vec< Prob > | m_samplePi |
| the length distribution used for sampling | |
Protected Attributes inherited from hrf::MLfunc | |
| Model * | m_pModel |
| HRF model. | |
| CorpusBase * | m_pCorpusTrain |
| training corpus | |
| CorpusBase * | m_pCorpusValid |
| valid corpus | |
| CorpusBase * | m_pCorpusTest |
| test corpus | |
| Vec< PValue > | m_values |
| Vec< Prob > | m_trainPi |
| the length distribution in training corpus | |
Protected Attributes inherited from wb::Func | |
| Solve * | m_pSolve |
| Save the solve pointer. | |
| int | m_nParamNum |
| the parameter number | |
Friends | |
| class | SAtrain |
Additional Inherited Members | |
Static Public Attributes inherited from wb::Func | |
| static const int | cn_exvalue_max_num = 100 |
Definition at line 42 of file hrf-sa-train.h.
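| hrf::SAfunc::SAfunc | ( | ) | inline |
Definition at line 112 of file hrf-sa-train.h.
| hrf::SAfunc::SAfunc | ( | Model * pModel, CorpusBase * pTrain, CorpusBase * pValid = NULL, CorpusBase * pTest = NULL, int nMinibatch = 100 | ) | inline |
Definition at line 122 of file hrf-sa-train.h.
| hrf::SAfunc::~SAfunc | ( | ) | inline |
Definition at line 134 of file hrf-sa-train.h.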
| int hrf::SAfunc::GetCHmatSize | ( | ) | const | inline |
get the CH mat number
Definition at line 157 of file hrf-sa-train.h.
| int hrf::SAfunc::GetEmpiricalExp | ( | VecShell< double > & | vExp, |
| VecShell< double > & | vExp2, | ||
| Array< int > & | aRandIdx | ||
| ) |
calculate the empirical expectation of the given sequence
sample several times
Definition at line 332 of file hrf-sa-train.cpp.
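| int hrf::SAfunc::GetEmpiricalExp | ( | VecShell< double > & | vExp, |
| VecShell< double > & | vExp2 | ||
| ) |
calculate the empirical expectation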
Definition at line 396 of file hrf-sa-train.cpp.
| void hrf::SAfunc::GetEmpiricalFeatExp | ( | Vec< double > & | vExp | ) |
calculate the empirical expectation of features
Definition at line 165 of file hrf-sa-train.cpp.
| void hrf::SAfunc::GetEmpiricalFeatVar | ( | Vec< double > & | vVar | ) |
calculate the empirical variance of features
count p[f^2]
count p_l[f]; saving p_l[f] for every length costs too much memory, so each p_l[f] is calculated separately
find all the sequences with length nLen
calculate p[f^2] - * p_l[f]^2
output the number of zero values
save
Definition at line 199 of file hrf-sa-train.cpp.
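The step comments above describe a two-pass computation: accumulate p[f^2] over the whole corpus, then handle one length at a time so that only a single p_l[f] buffer is held in memory. The following is a minimal sketch of that idea, not the library's implementation. It assumes that the factor missing from the step comment is the empirical length probability pi_l, and it uses dense `std::vector` features; `Sample` and `EmpiricalFeatVar` are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in for a training sequence: its length and dense feature counts.
struct Sample {
    int len;
    std::vector<double> feat;
};

// Returns var[i] = p[f_i^2] - sum_l pi_l * p_l[f_i]^2.
// The pi_l weighting is an assumption; the source comment elides that factor.
std::vector<double> EmpiricalFeatVar(const std::vector<Sample>& corpus, std::size_t nFeat) {
    assert(!corpus.empty());
    const double n = static_cast<double>(corpus.size());
    std::vector<double> var(nFeat, 0.0);

    // Pass 1: accumulate p[f^2] over the whole corpus and group sequence indices by length.
    std::map<int, std::vector<std::size_t> > byLen;
    for (std::size_t s = 0; s < corpus.size(); ++s) {
        assert(corpus[s].feat.size() == nFeat);
        for (std::size_t i = 0; i < nFeat; ++i)
            var[i] += corpus[s].feat[i] * corpus[s].feat[i] / n;
        byLen[corpus[s].len].push_back(s);
    }

    // Pass 2: process one length at a time, reusing a single p_l[f] buffer.
    std::vector<double> meanL(nFeat);
    for (const auto& kv : byLen) {
        const std::vector<std::size_t>& idx = kv.second;
        const double nl = static_cast<double>(idx.size());
        meanL.assign(nFeat, 0.0);
        for (std::size_t s : idx)
            for (std::size_t i = 0; i < nFeat; ++i)
                meanL[i] += corpus[s].feat[i] / nl;
        const double piL = nl / n;  // empirical probability of length l
        for (std::size_t i = 0; i < nFeat; ++i)
            var[i] -= piL * meanL[i] * meanL[i];
    }
    return var;
}
```

Under this assumption the result is the length-weighted average of the within-length variances, so a feature that is constant within every length gets variance zero.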
| int hrf::SAfunc::GetExtraValues | ( | int | t, |
| double * | pdValues | ||
| ) | virtual |
calculate extra values which will be printed at each iteration
| [in] | k | iteration number from 1 to ... |
| [out] | pdValues | Returns the values to be output. The memory is allocated outside and the maximum size = cn_exvalue_max_num |
Reimplemented from hrf::MLfunc.
Definition at line 937 of file hrf-sa-train.cpp.
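The parameter description above implies a caller-allocated output buffer bounded by cn_exvalue_max_num. A minimal sketch of that calling convention follows. It assumes that the int return value is the number of entries written, which this page does not confirm; `DemoFunc` and `CollectExtraValues` are hypothetical names, and the values written are placeholders.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the interface; only the buffer contract is modelled.
struct DemoFunc {
    static const int cn_exvalue_max_num = 100;  // same bound as listed for wb::Func

    // Writes extra values for iteration t into pdValues and returns how many
    // entries were written (assumed meaning of the return value).
    virtual int GetExtraValues(int t, double* pdValues) {
        assert(pdValues != nullptr);
        pdValues[0] = -1.0 * t;  // placeholder, e.g. a training-set LL
        pdValues[1] = -2.0 * t;  // placeholder, e.g. a validation-set LL
        return 2;
    }
    virtual ~DemoFunc() {}
};

// The caller owns the buffer and sizes it to the documented maximum.
std::vector<double> CollectExtraValues(DemoFunc& func, int t) {
    double buf[DemoFunc::cn_exvalue_max_num];
    const int n = func.GetExtraValues(t, buf);
    assert(n >= 0 && n <= DemoFunc::cn_exvalue_max_num);
    return std::vector<double>(buf, buf + n);
}
```

Sizing the buffer to the maximum up front means an override never needs to allocate, at the cost of a fixed upper limit on how many values can be reported per iteration.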
| void hrf::SAfunc::GetGradient | ( | double * | pdGradient | ) | virtual |
calculate the gradient g(x)
Reimplemented from hrf::MLfunc.
Definition at line 816 of file hrf-sa-train.cpp.
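SetParam, GetValue and GetGradient are virtual, which suggests that a solver sets the parameters and then queries f(x) and g(x) through this interface. The sketch below illustrates that pattern only. `FuncLike`, `QuadFunc` and `Minimize` are hypothetical names, the objective is a toy quadratic, and plain fixed-step gradient descent stands in for the actual SA solver, whose interface is not shown on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical mirror of the three virtual hooks listed for SAfunc.
class FuncLike {
public:
    virtual ~FuncLike() {}
    virtual void SetParam(double* pdParams) = 0;
    virtual double GetValue() = 0;
    virtual void GetGradient(double* pdGradient) = 0;
};

// Toy objective f(x) = 0.5 * sum_i (x_i - c_i)^2 with gradient g(x) = x - c.
class QuadFunc : public FuncLike {
public:
    explicit QuadFunc(const std::vector<double>& c) : m_c(c), m_x(c.size(), 0.0) {}
    void SetParam(double* pdParams) override {
        m_x.assign(pdParams, pdParams + m_x.size());
    }
    double GetValue() override {
        double f = 0.0;
        for (std::size_t i = 0; i < m_x.size(); ++i)
            f += 0.5 * (m_x[i] - m_c[i]) * (m_x[i] - m_c[i]);
        return f;
    }
    void GetGradient(double* pdGradient) override {
        for (std::size_t i = 0; i < m_x.size(); ++i)
            pdGradient[i] = m_x[i] - m_c[i];
    }
private:
    std::vector<double> m_c;  // minimiser of the toy objective
    std::vector<double> m_x;  // current parameters
};

// Fixed-step gradient descent: set the parameters, query the gradient, step.
// Returns the function value at the final parameters.
double Minimize(FuncLike& func, std::vector<double>& x, double step, int nIter) {
    assert(step > 0.0 && nIter >= 0);
    std::vector<double> g(x.size());
    for (int t = 0; t < nIter; ++t) {
        func.SetParam(x.data());
        func.GetGradient(g.data());
        for (std::size_t i = 0; i < x.size(); ++i)
            x[i] -= step * g[i];
    }
    func.SetParam(x.data());
    return func.GetValue();
}
```

Keeping the parameters inside the function object, as SetParam implies, lets the gradient and value calls share any state computed for the current parameters.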
| int hrf::SAfunc::GetHHmatSize | ( | ) | const | inline |
get the HH mat number
Definition at line 159 of file hrf-sa-train.h.
| int hrf::SAfunc::GetNgramFeatNum | ( | ) | const | inline |
get the ngram feature number
Definition at line 153 of file hrf-sa-train.h.
| void hrf::SAfunc::GetParam | ( | double * | pdParams | ) |
get the parameters
Definition at line 134 of file hrf-sa-train.cpp.
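| int hrf::SAfunc::GetSampleExp | ( | VecShell< double > & | vExp, |
| VecShell< double > & | vLen | ||
| ) |
calculate the expectation of SA samples
sample hidden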
save the length count
save current length count
Definition at line 405 of file hrf-sa-train.cpp.
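The step comments above mention saving a length count alongside the sample expectation. One way this could look is sketched below: average the feature vectors over a batch of samples and record the fraction of samples at each length. Whether the library stores raw counts or normalised fractions in vLen, and how vLen is indexed, are assumptions; `SampledSeq` and `SampleExpAndLen` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one sampled sequence: its length and dense features.
struct SampledSeq {
    int len;
    std::vector<double> feat;
};

// Fills vExp with the mean feature vector and vLen[l] with the fraction of
// samples that have length l. Returns the number of samples processed.
int SampleExpAndLen(const std::vector<SampledSeq>& samples,
                    std::vector<double>& vExp,
                    std::vector<double>& vLen) {
    assert(!samples.empty());
    vExp.assign(vExp.size(), 0.0);
    vLen.assign(vLen.size(), 0.0);
    const double w = 1.0 / static_cast<double>(samples.size());
    for (const SampledSeq& s : samples) {
        assert(s.feat.size() == vExp.size());
        assert(s.len >= 0 && static_cast<std::size_t>(s.len) < vLen.size());
        for (std::size_t i = 0; i < vExp.size(); ++i)
            vExp[i] += w * s.feat[i];  // running mean of the features
        vLen[s.len] += w;              // length tally, normalised by batch size
    }
    return static_cast<int>(samples.size());
}
```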
| double hrf::SAfunc::GetSampleLL | ( | CorpusBase * | pCorpus, |
| int | nCalNum = -1, |
||
| int | method = 0 |
||
| ) |
perform SAMS, and then select the training sequences of the same length.
Sample the most possible hidden and calculate the LL
Definition at line 756 of file hrf-sa-train.cpp.
| double hrf::SAfunc::GetValue | ( | ) | inline virtual |
calculate the function value f(x)
Reimplemented from hrf::MLfunc.
Definition at line 200 of file hrf-sa-train.h.
| int hrf::SAfunc::GetVHmatSize | ( | ) | const | inline |
get the VH mat number
Definition at line 155 of file hrf-sa-train.h.
| int hrf::SAfunc::GetWeightNum | ( | ) | const | inline |
get the bias mat number
get the number of all the weights up the exp
Definition at line 163 of file hrf-sa-train.h.
| int hrf::SAfunc::GetZetaNum | ( | ) | const | inline |
get the zeta parameter number
Definition at line 165 of file hrf-sa-train.h.
| void hrf::SAfunc::IterEnd | ( | double * | pFinalParams | ) |
do something at the end of the SA iteration
Definition at line 796 of file hrf-sa-train.cpp.
| void hrf::SAfunc::PerfromCD | ( | VecShell< double > & | vEmpExp, |
| VecShell< double > & | vSamExp, | ||
| VecShell< double > & | vEmpExp2, | ||
| VecShell< double > & | vLen | ||
| ) |
perform CD process and get the expectation
save the length count
save current length count
Definition at line 485 of file hrf-sa-train.cpp.
| void hrf::SAfunc::PerfromSA | ( | VecShell< double > & | vEmpExp, |
| VecShell< double > & | vSamExp, | ||
| VecShell< double > & | vEmpExp2, | ||
| VecShell< double > & | vLen | ||
| ) |
perform SA process and get the expectation
record the length of the training sequence
sample several times
count
sample hidden
save the length count
save current length count
Definition at line 584 of file hrf-sa-train.cpp.
| void hrf::SAfunc::PrintInfo | ( | ) |
print information
Definition at line 87 of file hrf-sa-train.cpp.
| void hrf::SAfunc::RandSeq | ( | Seq & | seq, |
| int | nLen = -1 |
||
| ) |
get a random sequence
Definition at line 104 of file hrf-sa-train.cpp.
| void hrf::SAfunc::Reset | ( | Model * | pModel, |
| CorpusBase * | pTrain, | ||
| CorpusBase * | pValid = NULL, |
||
| CorpusBase * | pTest = NULL, |
||
| int | nMinibatch = 100 |
||
| ) |
reset
Definition at line 22 of file hrf-sa-train.cpp.
| void hrf::SAfunc::SetParam | ( | double * | pdParams | ) | virtual |
set the parameter.
| void hrf::SAfunc::WriteModel | ( | int | nEpoch | ) |
Write Model.
Definition at line 802 of file hrf-sa-train.cpp.
| friend class SAtrain | friend |
Definition at line 44 of file hrf-sa-train.h.
| AISConfig hrf::SAfunc::m_AISConfigForP |
the AIS configuration for calculating the LL.
Definition at line 89 of file hrf-sa-train.h.
| AISConfig hrf::SAfunc::m_AISConfigForZ |
the AIS configuration for normalization
Definition at line 88 of file hrf-sa-train.h.
| bool hrf::SAfunc::m_bPrintTest |
output the LL on test set
Definition at line 109 of file hrf-sa-train.h.
| bool hrf::SAfunc::m_bPrintTrain |
output the LL on training set
Definition at line 107 of file hrf-sa-train.h.
| bool hrf::SAfunc::m_bPrintValie |
output the LL on valid set
Definition at line 108 of file hrf-sa-train.h.
| bool hrf::SAfunc::m_bSAMSSample |
whether to use the SAMS sampling method
Definition at line 94 of file hrf-sa-train.h.
| File hrf::SAfunc::m_fdbg |
output the sample pi/zeta information
Definition at line 97 of file hrf-sa-train.h.
| File hrf::SAfunc::m_feat_mean |
output the empirical mean
Definition at line 104 of file hrf-sa-train.h.
| File hrf::SAfunc::m_feat_var |
output the empirical variance
Definition at line 105 of file hrf-sa-train.h.
| File hrf::SAfunc::m_fexp |
output the expectation of each iteration
Definition at line 101 of file hrf-sa-train.h.
| File hrf::SAfunc::m_fgrad |
output the gradient of each iteration
Definition at line 99 of file hrf-sa-train.h.
| File hrf::SAfunc::m_fparm |
output the parameters of each iteration
Definition at line 98 of file hrf-sa-train.h.
| File hrf::SAfunc::m_fsamp |
output all the samples
Definition at line 102 of file hrf-sa-train.h.
| File hrf::SAfunc::m_ftrain |
output all the training sequences
Definition at line 103 of file hrf-sa-train.h.
| File hrf::SAfunc::m_fvar |
output the variance at each iteration
Definition at line 100 of file hrf-sa-train.h.
| int hrf::SAfunc::m_nCDSampleTimes |
the CD-n: the sample number.
Definition at line 92 of file hrf-sa-train.h.
| int hrf::SAfunc::m_nMiniBatchSample | protected |
mini-batch for samples
Definition at line 46 of file hrf-sa-train.h.
| int hrf::SAfunc::m_nMiniBatchTraining | protected |
mini-batch for training set
Definition at line 47 of file hrf-sa-train.h.
| int hrf::SAfunc::m_nSampleHiddenSampleTimes |
the sample times for the hidden of samples
Definition at line 91 of file hrf-sa-train.h.
| int hrf::SAfunc::m_nSASampleTimes |
the SA sample times
Definition at line 93 of file hrf-sa-train.h.
| int hrf::SAfunc::m_nTrainHiddenSampleTimes |
the sample times for training sequence
Definition at line 90 of file hrf-sa-train.h.
| Vec< Prob > hrf::SAfunc::m_samplePi | protected |
the length distribution used for sampling
Definition at line 51 of file hrf-sa-train.h.
| CorpusCache hrf::SAfunc::m_TrainCache | protected |
cache all the h of training sequences.
Definition at line 49 of file hrf-sa-train.h.
| trf::CorpusRandSelect hrf::SAfunc::m_TrainSelect | protected |
randomly select sequences from the corpus
Definition at line 48 of file hrf-sa-train.h.