Title: | Zero-Variance Control Variates |
---|---|
Description: | Stein control variates can be used to improve Monte Carlo estimates of expectations when the derivatives of the log target are available. This package implements a variety of such methods, including zero-variance control variates (ZV-CV, Mira et al. (2013) <doi:10.1007/s11222-012-9344-6>), regularised ZV-CV (South et al., 2018 <arXiv:1811.05073>), control functionals (CF, Oates et al. (2017) <doi:10.1111/rssb.12185>) and semi-exact control functionals (SECF, South et al., 2020 <arXiv:2002.00033>). ZV-CV is a parametric approach that is exact for (low order) polynomial integrands with Gaussian targets. CF is a non-parametric alternative that offers better than the standard Monte Carlo convergence rates. SECF has both a parametric and a non-parametric component and it offers the advantages of both for an additional computational cost. Functions for applying ZV-CV and CF to two estimators for the normalising constant of the posterior distribution in Bayesian statistics are also supplied in this package. The basic requirements for using the package are a set of samples, derivatives and function evaluations. |
Authors: | Leah F. South [aut, cre] |
Maintainer: | Leah F. South <[email protected]> |
License: | GPL (>= 2) |
Version: | 2.1.2 |
Built: | 2024-11-07 05:13:10 UTC |
Source: | https://github.com/leahprice/zvcv |
This function performs approximate semi-exact control functionals as described in South et al (2020). It uses a nystrom approximation and conjugate gradient to speed up SECF.
This is faster than SECF
for large . If you would like to choose
between different kernels using cross-validation, then you can use
aSECF_crossval
.
aSECF( integrands, samples, derivatives, polyorder = NULL, steinOrder = NULL, kernel_function = NULL, sigma = NULL, K0 = NULL, nystrom_inds = NULL, est_inds = NULL, apriori = NULL, conjugate_gradient = TRUE, reltol = 0.01, diagnostics = FALSE )
aSECF( integrands, samples, derivatives, polyorder = NULL, steinOrder = NULL, kernel_function = NULL, sigma = NULL, K0 = NULL, nystrom_inds = NULL, est_inds = NULL, apriori = NULL, conjugate_gradient = TRUE, reltol = 0.01, diagnostics = FALSE )
integrands |
An |
samples |
An |
derivatives |
An |
polyorder |
(optional) The order of the polynomial to be used in the parametric component, with a default of |
steinOrder |
(optional) This is the order of the Stein operator. The default is |
kernel_function |
(optional) Choose between "gaussian", "matern", "RQ", "product" or "prodsim". See below for further details. |
sigma |
(optional) The tuning parameters of the specified kernel. This involves a single length-scale parameter in "gaussian" and "RQ", a length-scale and a smoothness parameter in "matern" and two parameters in "product" and "prodsim". See below for further details. |
K0 |
(optional) The kernel matrix. One can specify either this or all of |
nystrom_inds |
(optional) The sample indices to be used in the Nystrom approximation. |
est_inds |
(optional) A vector of indices for the estimation-only samples. The default when |
apriori |
(optional) A vector containing the subset of parameter indices to use in the polynomial. Typically this argument would only be used if the dimension of the problem is very large or if prior information about parameter dependencies is known. The default is to use all parameters |
conjugate_gradient |
(optional) A flag for whether to perform conjugate gradient to further speed up the nystrom approximation (the default is true). |
reltol |
(optional) The relative tolerance for choosing when the stop conjugate gradient iterations (the default is 1e-02).
using |
diagnostics |
(optional) A flag for whether to return the necessary outputs for plotting or estimating using the fitted model. The default is |
A list with the following elements:
expectation
: The estimate(s) of the () expectations(s).
cond_no
: (Only if conjugate_gradient
= TRUE
) The condition number of the matrix being solved using conjugate gradient.
iter
: (Only if conjugate_gradient
= TRUE
) The number of conjugate gradient iterations
f_true
: (Only if est_inds
is not NULL
) The integrands for the evaluation set. This should be the same as integrands[setdiff(1:N,est_inds),].
f_hat
: (Only if est_inds
is not NULL
) The fitted values for the integrands in the evaluation set. This can be used to help assess the performance of the Gaussian process model.
a
: (Only if diagnostics
= TRUE
) The value of as described in South et al (2020), where predictions are of the form
for heldout K0 and Phi matrices and estimators using heldout samples are of the form
.
b
: (Only if diagnostics
= TRUE
) The value of as described in South et al (2020), where predictions are of the form
for heldout K0 and Phi matrices and estimators using heldout samples are of the form
.
ny_inds
: (Only if diagnostics
= TRUE
) The indices of the samples used in the nystrom approximation (this will match nystrom_inds if this argument was not NULL
).
, the kernel and the Stein orderThe kernel in Stein-based kernel methods is where
is a first or second order Stein operator in
and
is some generic kernel to be specified.
The Stein operators for distribution are defined as:
steinOrder=1
: (see e.g. Oates el al (2017))
steinOrder=2
: (see e.g. South el al (2020))
Here is the first order derivative wrt
and
is the Laplacian operator.
The generic kernels which are implemented in this package are listed below. Note that the input parameter sigma
defines the kernel parameters .
"gaussian"
: A Gaussian kernel,
"matern"
: A Matern kernel with ,
where ,
and
is the modified Bessel function of the second kind. Note that
is the length-scale parameter and
is the smoothness parameter (which defaults to 2.5 for
and 4.5 for
).
"RQ"
: A rational quadratic kernel,
"product"
: The product kernel that appears in Oates et al (2017) with
"prodsim"
: A slightly different product kernel with (see e.g. https://www.imperial.ac.uk/inference-group/projects/monte-carlo-methods/control-functionals/),
In the above equations, and
. For the last two kernels, the code only has implementations for
steinOrder
=1
. Each combination of steinOrder
and kernel_function
above is currently hard-coded but it may be possible to extend this to other kernels in future versions using autodiff. The calculations for the first three kernels above are detailed in South et al (2020).
Leah F. South
South, L. F., Karvonen, T., Nemeth, C., Girolami, M. and Oates, C. J. (2020). Semi-Exact Control Functionals From Sard's Method. https://arxiv.org/abs/2002.00033
See ZVCV for examples and related functions. See aSECF_crossval
for a function to choose between different kernels for this estimator.
This function chooses between a list of kernel tuning parameters (sigma_list
) or a list of K0 matrices (K0_list
) for
the approximate semi-exact control functionals method described in South et al (2020). The latter requires
calculating and storing kernel matrices using K0_fn
but it is more flexible
because it can be used to choose the Stein operator order and the kernel function, in addition
to its parameters. It is also faster to pre-specify K0_fn
.
For estimation with fixed kernel parameters, use aSECF
.
aSECF_crossval( integrands, samples, derivatives, polyorder = NULL, steinOrder = NULL, kernel_function = NULL, sigma_list = NULL, est_inds = NULL, apriori = NULL, num_nystrom = NULL, conjugate_gradient = TRUE, reltol = 0.01, folds = NULL, diagnostics = FALSE )
aSECF_crossval( integrands, samples, derivatives, polyorder = NULL, steinOrder = NULL, kernel_function = NULL, sigma_list = NULL, est_inds = NULL, apriori = NULL, num_nystrom = NULL, conjugate_gradient = TRUE, reltol = 0.01, folds = NULL, diagnostics = FALSE )
integrands |
An |
samples |
An |
derivatives |
An |
polyorder |
(optional) The order of the polynomial to be used in the parametric component, with a default of |
steinOrder |
(optional) This is the order of the Stein operator. The default is |
kernel_function |
(optional) Choose between "gaussian", "matern", "RQ", "product" or "prodsim". See below for further details. |
sigma_list |
(optional between this and |
est_inds |
(optional) A vector of indices for the estimation-only samples. The default when |
apriori |
(optional) A vector containing the subset of parameter indices to use in the polynomial. Typically this argument would only be used if the dimension of the problem is very large or if prior information about parameter dependencies is known. The default is to use all parameters |
num_nystrom |
(optional) The number of samples to use in the Nystrom approximation, with a default of ceiling(sqrt(N)). The nystrom indices cannot be passed in here because of the way the cross-validation has been set up. |
conjugate_gradient |
(optional) A flag for whether to perform conjugate gradient to further speed up the nystrom approximation (the default is true). |
reltol |
(optional) The relative tolerance for choosing when the stop conjugate gradient iterations (the default is 1e-02).
using |
folds |
(optional) The number of folds for cross-validation. The default is five. |
diagnostics |
(optional) A flag for whether to return the necessary outputs for plotting or estimating using the fitted model. The default is |
A list with the following elements:
expectation
: The estimate(s) of the () expectations(s).
mse
: A matrix of the cross-validation mean square prediction errors. The number of columns is the number of tuning options given and the number of rows is , the number of integrands of interest.
optinds
: The optimal indices from the list for each expectation.
cond_no
: (Only if conjugate_gradient
= TRUE
) The condition number of the matrix being solved using conjugate gradient.
iter
: (Only if conjugate_gradient
= TRUE
) The number of conjugate gradient iterations
f_true
: (Only if est_inds
is not NULL
) The integrands for the evaluation set. This should be the same as integrands[setdiff(1:N,est_inds),].
f_hat
: (Only if est_inds
is not NULL
) The fitted values for the integrands in the evaluation set. This can be used to help assess the performance of the Gaussian process model.
a
: (Only if diagnostics
= TRUE
) The value of as described in South et al (2020), where predictions are of the form
for heldout K0 and Phi matrices and estimators using heldout samples are of the form
.
b
: (Only if diagnostics
= TRUE
) The value of as described in South et al (2020), where predictions are of the form
for heldout K0 and Phi matrices and estimators using heldout samples are of the form
.
ny_inds
: (Only if diagnostics
= TRUE
) The indices of the samples used in the nystrom approximation (this will match nystrom_inds if this argument was not NULL
).
, the kernel and the Stein orderThe kernel in Stein-based kernel methods is where
is a first or second order Stein operator in
and
is some generic kernel to be specified.
The Stein operators for distribution are defined as:
steinOrder=1
: (see e.g. Oates el al (2017))
steinOrder=2
: (see e.g. South el al (2020))
Here is the first order derivative wrt
and
is the Laplacian operator.
The generic kernels which are implemented in this package are listed below. Note that the input parameter sigma
defines the kernel parameters .
"gaussian"
: A Gaussian kernel,
"matern"
: A Matern kernel with ,
where ,
and
is the modified Bessel function of the second kind. Note that
is the length-scale parameter and
is the smoothness parameter (which defaults to 2.5 for
and 4.5 for
).
"RQ"
: A rational quadratic kernel,
"product"
: The product kernel that appears in Oates et al (2017) with
"prodsim"
: A slightly different product kernel with (see e.g. https://www.imperial.ac.uk/inference-group/projects/monte-carlo-methods/control-functionals/),
In the above equations, and
. For the last two kernels, the code only has implementations for
steinOrder
=1
. Each combination of steinOrder
and kernel_function
above is currently hard-coded but it may be possible to extend this to other kernels in future versions using autodiff. The calculations for the first three kernels above are detailed in South et al (2020).
Leah F. South
South, L. F., Karvonen, T., Nemeth, C., Girolami, M. and Oates, C. J. (2020). Semi-Exact Control Functionals From Sard's Method. https://arxiv.org/abs/2002.00033
See ZVCV for examples and related functions. See aSECF_crossval
for a function to choose between different kernels for this estimator.
This function performs control functionals as described in Oates et al (2017).
To choose between different kernels using cross-validation, use CF_crossval
.
CF( integrands, samples, derivatives, steinOrder = NULL, kernel_function = NULL, sigma = NULL, K0 = NULL, est_inds = NULL, one_in_denom = FALSE, diagnostics = FALSE )
CF( integrands, samples, derivatives, steinOrder = NULL, kernel_function = NULL, sigma = NULL, K0 = NULL, est_inds = NULL, one_in_denom = FALSE, diagnostics = FALSE )
integrands |
An |
samples |
An |
derivatives |
An |
steinOrder |
(optional) This is the order of the Stein operator. The default is |
kernel_function |
(optional) Choose between "gaussian", "matern", "RQ", "product" or "prodsim". See below for further details. |
sigma |
(optional) The tuning parameters of the specified kernel. This involves a single length-scale parameter in "gaussian" and "RQ", a length-scale and a smoothness parameter in "matern" and two parameters in "product" and "prodsim". See below for further details. |
K0 |
(optional) The kernel matrix. One can specify either this or all of |
est_inds |
(optional) A vector of indices for the estimation-only samples. The default when |
one_in_denom |
(optional) Whether or not to include a |
diagnostics |
(optional) A flag for whether to return the necessary outputs for plotting or estimating using the fitted model. The default is |
A list with the following elements:
expectation
: The estimate(s) of the () expectation(s).
f_true
: (Only if est_inds
is not NULL
) The integrands for the evaluation set. This should be the same as integrands[setdiff(1:N,est_inds),].
f_hat
: (Only if est_inds
is not NULL
) The fitted values for the integrands in the evaluation set. This can be used to help assess the performance of the Gaussian process model.
a
: (Only if diagnostics
= TRUE
) The value of as described in South et al (2020), where predictions are of the form
for heldout K0 and estimators using heldout samples are of the form
.
b
: (Only if diagnostics
= TRUE
) The value of as described in South et al (2020), where predictions are of the form
for heldout K0 and estimators using heldout samples are of the form
.
ksd
: (Only if diagnostics
= TRUE
) An estimated kernel Stein discrepancy based on the fitted model that can be used for diagnostic purposes. See South et al (2020) for further details.
bound_const
: (Only if diagnostics
= TRUE
and est_inds
=NULL
) This is such that the absolute error for the estimator should be less than .
Solving the linear system in CF has complexity and is therefore not suited to large
. Using
will instead have an
cost in solving the linear system and an
cost in handling the remaining samples, where
is the length of
. This can be much cheaper for large
.
, the kernel and the Stein orderThe kernel in Stein-based kernel methods is where
is a first or second order Stein operator in
and
is some generic kernel to be specified.
The Stein operators for distribution are defined as:
steinOrder=1
: (see e.g. Oates el al (2017))
steinOrder=2
: (see e.g. South el al (2020))
Here is the first order derivative wrt
and
is the Laplacian operator.
The generic kernels which are implemented in this package are listed below. Note that the input parameter sigma
defines the kernel parameters .
"gaussian"
: A Gaussian kernel,
"matern"
: A Matern kernel with ,
where ,
and
is the modified Bessel function of the second kind. Note that
is the length-scale parameter and
is the smoothness parameter (which defaults to 2.5 for
and 4.5 for
).
"RQ"
: A rational quadratic kernel,
"product"
: The product kernel that appears in Oates et al (2017) with
"prodsim"
: A slightly different product kernel with (see e.g. https://www.imperial.ac.uk/inference-group/projects/monte-carlo-methods/control-functionals/),
In the above equations, and
. For the last two kernels, the code only has implementations for
steinOrder
=1
. Each combination of steinOrder
and kernel_function
above is currently hard-coded but it may be possible to extend this to other kernels in future versions using autodiff. The calculations for the first three kernels above are detailed in South et al (2020).
Leah F. South
Oates, C. J., Girolami, M. & Chopin, N. (2017). Control functionals for Monte Carlo integration. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79(3), 695-718.
South, L. F., Karvonen, T., Nemeth, C., Girolami, M. and Oates, C. J. (2020). Semi-Exact Control Functionals From Sard's Method. https://arxiv.org/abs/2002.00033
See ZVCV for examples and related functions. See CF_crossval
for a function to choose between different kernels for this estimator.
This function chooses between a list of kernel tuning parameters (sigma_list
) or a list of K0 matrices (K0_list
) for
the control functionals method described in Oates et al (2017). The latter requires
calculating and storing kernel matrices using K0_fn
but it is more flexible
because it can be used to choose the Stein operator order and the kernel function, in addition
to its parameters. It is also faster to pre-specify K0_fn
.
For estimation with fixed kernel parameters, use CF
.
CF_crossval( integrands, samples, derivatives, steinOrder = NULL, kernel_function = NULL, sigma_list = NULL, K0_list = NULL, est_inds = NULL, log_weights = NULL, one_in_denom = FALSE, folds = NULL, diagnostics = FALSE )
CF_crossval( integrands, samples, derivatives, steinOrder = NULL, kernel_function = NULL, sigma_list = NULL, K0_list = NULL, est_inds = NULL, log_weights = NULL, one_in_denom = FALSE, folds = NULL, diagnostics = FALSE )
integrands |
An |
samples |
An |
derivatives |
An |
steinOrder |
(optional) This is the order of the Stein operator. The default is |
kernel_function |
(optional) Choose between "gaussian", "matern", "RQ", "product" or "prodsim". See below for further details. |
sigma_list |
(optional between this and |
K0_list |
(optional between this and |
est_inds |
(optional) A vector of indices for the estimation-only samples. The default when |
log_weights |
(optional) A vector of length |
one_in_denom |
(optional) Whether or not to include a |
folds |
(optional) The number of folds for cross-validation. The default is five. |
diagnostics |
(optional) A flag for whether to return the necessary outputs for plotting or estimating using the fitted model. The default is |
A list with the following elements:
expectation
: The estimate(s) of the () expectation(s).
mse
: A matrix of the cross-validation mean square prediction errors. The number of columns is the number of tuning options given and the number of rows is , the number of integrands of interest.
optinds
: The optimal indices from the list for each expectation.
f_true
: (Only if est_inds
is not NULL
) The integrands for the evaluation set. This should be the same as integrands[setdiff(1:N,est_inds),].
f_hat
: (Only if est_inds
is not NULL
) The fitted values for the integrands in the evaluation set. This can be used to help assess the performance of the Gaussian process model.
a
: (Only if diagnostics
= TRUE
) The value of as described in South et al (2020), where predictions are of the form
for heldout K0 and estimators using heldout samples are of the form
.
b
: (Only if diagnostics
= TRUE
) The value of as described in South et al (2020), where predictions are of the form
for heldout K0 and estimators using heldout samples are of the form
.
ksd
: (Only if diagnostics
= TRUE
) An estimated kernel Stein discrepancy based on the fitted model that can be used for diagnostic purposes. See South et al (2020) for further details.
bound_const
: (Only if diagnostics
= TRUE
and est_inds
=NULL
) This is such that the absolute error for the estimator should be less than .
Solving the linear system in CF has complexity and is therefore not suited to large
. Using
will instead have an
cost in solving the linear system and an
cost in handling the remaining samples, where
is the length of
. This can be much cheaper for large
.
, the kernel and the Stein orderThe kernel in Stein-based kernel methods is where
is a first or second order Stein operator in
and
is some generic kernel to be specified.
The Stein operators for distribution are defined as:
steinOrder=1
: (see e.g. Oates el al (2017))
steinOrder=2
: (see e.g. South el al (2020))
Here is the first order derivative wrt
and
is the Laplacian operator.
The generic kernels which are implemented in this package are listed below. Note that the input parameter sigma
defines the kernel parameters .
"gaussian"
: A Gaussian kernel,
"matern"
: A Matern kernel with ,
where ,
and
is the modified Bessel function of the second kind. Note that
is the length-scale parameter and
is the smoothness parameter (which defaults to 2.5 for
and 4.5 for
).
"RQ"
: A rational quadratic kernel,
"product"
: The product kernel that appears in Oates et al (2017) with
"prodsim"
: A slightly different product kernel with (see e.g. https://www.imperial.ac.uk/inference-group/projects/monte-carlo-methods/control-functionals/),
In the above equations, and
. For the last two kernels, the code only has implementations for
steinOrder
=1
. Each combination of steinOrder
and kernel_function
above is currently hard-coded but it may be possible to extend this to other kernels in future versions using autodiff. The calculations for the first three kernels above are detailed in South et al (2020).
Leah F. South
Oates, C. J., Girolami, M. & Chopin, N. (2017). Control functionals for Monte Carlo integration. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79(3), 695-718.
South, L. F., Karvonen, T., Nemeth, C., Girolami, M. and Oates, C. J. (2020). Semi-Exact Control Functionals From Sard's Method. https://arxiv.org/abs/2002.00033
See ZVCV for examples and related functions. See CF
for a function to perform control functionals with fixed kernel specifications.
The functions evidence_CTI
and evidence_CTI_CF
can be used to improve upon the thermodynamic integration (TI) estimate of the normalising constant with ZV-CV and CF, respectively. The functions evidence_SMC
and evidence_SMC_CF
do the same thing for the sequential Monte Carlo (SMC) normalising constant identity.
evidence_CTI( samples, loglike, der_loglike, der_logprior, temperatures, temperatures_all, most_recent, est_inds, options, folds = 5 ) evidence_CTI_CF( samples, loglike, der_loglike, der_logprior, temperatures, temperatures_all, most_recent, est_inds, steinOrder, kernel_function, sigma_list, folds = 5 ) evidence_SMC( samples, loglike, der_loglike, der_logprior, temperatures, temperatures_all, most_recent, est_inds, options, folds = 5 ) evidence_SMC_CF( samples, loglike, der_loglike, der_logprior, temperatures, temperatures_all, most_recent, est_inds, steinOrder, kernel_function, sigma_list, folds = 5 )
evidence_CTI( samples, loglike, der_loglike, der_logprior, temperatures, temperatures_all, most_recent, est_inds, options, folds = 5 ) evidence_CTI_CF( samples, loglike, der_loglike, der_logprior, temperatures, temperatures_all, most_recent, est_inds, steinOrder, kernel_function, sigma_list, folds = 5 ) evidence_SMC( samples, loglike, der_loglike, der_logprior, temperatures, temperatures_all, most_recent, est_inds, options, folds = 5 ) evidence_SMC_CF( samples, loglike, der_loglike, der_logprior, temperatures, temperatures_all, most_recent, est_inds, steinOrder, kernel_function, sigma_list, folds = 5 )
samples |
An |
loglike |
An |
der_loglike |
An |
der_logprior |
An |
temperatures |
A vector of length |
temperatures_all |
An adjusted vector of length |
most_recent |
A vector of length |
est_inds |
(optional) A vector of indices for the estimation-only samples. The default when |
options |
A list of control variate specifications for ZV-CV. This can be a single list containing the elements below (the defaults are used for elements which are not specified). Alternatively, it can be a list of lists containing any or all of the elements below. Where the latter is used, the function |
folds |
The number of folds used in k-fold cross-validation for selecting the optimal control variate. For ZV-CV, this may include selection of the optimal polynomial order, regression type and subset of parameters depending on |
steinOrder |
(optional) This is the order of the Stein operator. The default is |
kernel_function |
(optional) Choose between "gaussian", "matern", "RQ", "product" or "prodsim". See below for further details. |
sigma_list |
(optional between this and |
The function evidence_CTI
returns a list, containing the following components:
log_evidence_PS1
: The 1st order quadrature estimate for the log normalising constant
log_evidence_PS2
: The 2nd order quadrature estimate for the log normalising constant
regression_LL
: The set of
zvcv
type returns for the 1st order quadrature expectations
regression_vLL
: The set of
zvcv
type returns for the 2nd order quadrature expectations
The function evidence_CTI_CF
returns a list, containing the following components:
log_evidence_PS1
: The 1st order quadrature estimate for the log normalising constant
log_evidence_PS2
: The 2nd order quadrature estimate for the log normalising constant
regression_LL
: The set of
CF_crossval
type returns for the 1st order quadrature expectations
regression_vLL
: The set of
CF_crossval
type returns for the 2nd order quadrature expectations
selected_LL_CF
: The set of selected tuning parameters from
sigma_list
for the 1st order quadrature expectations.
selected_vLL_CF
: The set of selected tuning parameters from
sigma_list
for the 2nd order quadrature expectations.
The function evidence_SMC
returns a list, containing the following components:
log_evidence
: The logged SMC estimate for the normalising constant
regression_SMC
: The set of
zvcv
type returns for the expectations
The function evidence_SMC_CF
returns a list, containing the following components:
log_evidence
: The logged SMC estimate for the normalising constant
regression_SMC
: The set of
CF_crossval
type returns for the expectations
selected_CF
: The set of selected tuning parameters from
sigma_list
for the expectations
, the kernel and the Stein orderThe kernel in Stein-based kernel methods is where
is a first or second order Stein operator in
and
is some generic kernel to be specified.
The Stein operators for distribution are defined as:
steinOrder=1
: (see e.g. Oates el al (2017))
steinOrder=2
: (see e.g. South el al (2020))
Here is the first order derivative wrt
and
is the Laplacian operator.
The generic kernels which are implemented in this package are listed below. Note that the input parameter sigma
defines the kernel parameters .
"gaussian"
: A Gaussian kernel,
"matern"
: A Matern kernel with ,
where ,
and
is the modified Bessel function of the second kind. Note that
is the length-scale parameter and
is the smoothness parameter (which defaults to 2.5 for
and 4.5 for
).
"RQ"
: A rational quadratic kernel,
"product"
: The product kernel that appears in Oates et al (2017) with
"prodsim"
: A slightly different product kernel with (see e.g. https://www.imperial.ac.uk/inference-group/projects/monte-carlo-methods/control-functionals/),
In the above equations, and
. For the last two kernels, the code only has implementations for
steinOrder
=1
. Each combination of steinOrder
and kernel_function
above is currently hard-coded but it may be possible to extend this to other kernels in future versions using autodiff. The calculations for the first three kernels above are detailed in South et al (2020).
Leah F. South
Mira, A., Solgi, R., & Imparato, D. (2013). Zero variance Markov chain Monte Carlo for Bayesian estimators. Statistics and Computing, 23(5), 653-662.
South, L. F., Oates, C. J., Mira, A., & Drovandi, C. (2019). Regularised zero variance control variates for high-dimensional variance reduction. https://arxiv.org/abs/1811.05073
See an example at VDP
and see ZVCV for more package details. See Expand_Temperatures
for a function that can be used to find stricter (or less stricter) temperature schedules based on the conditional effective sample size.
This function is used to adjust the temperature schedule so that it is more (or less) strict than the original.
Expand_Temperatures( temperatures, loglike, rho, bisec_tol = .Machine$double.eps^0.25 )
Expand_Temperatures( temperatures, loglike, rho, bisec_tol = .Machine$double.eps^0.25 )
temperatures |
A vector of length |
loglike |
An |
rho |
The tolerance for the new temperatures. Temperatures are selected so that the conditional effective sample size (CESS) at each temperature is |
bisec_tol |
The tolerance for the bisection method used in selecting temperatures. The default is |
A list is returned, containing the following components:
temperatures_all
: The new set of temperatures of length .
relevant_samples
: A vector of length containing indices to show which particle sets the new temperatures are based on.
logw
: An by
matrix of log normalised weights of the particles
Leah F. South
South, L. F., Oates, C. J., Mira, A., & Drovandi, C. (2019). Regularised zero variance control variates for high-dimensional variance reduction. https://arxiv.org/abs/1811.05073
See evidence
for functions to estimate the evidence, VDP
for an example and ZVCV for more package details.
The function getX
is used to get the matrix of covariates for the regression based on a specified polynomial order.
getX(samples, derivatives, polyorder)
getX(samples, derivatives, polyorder)
samples |
An |
derivatives |
An |
polyorder |
The order of the polynomial. |
The design matrix for the regression (except for the column of 1's for the intercept).
Phi_fn
for a very similar function for use in semi-exact control functionals. The function Phi_fn
essentially gets the same matrix but with a column of ones added.
This function calculates the full matrix, which is a first or second order Stein operator applied to
a standard kernel.
The output of this function can be used as an argument to
CF
, CF_crossval
,
SECF
, SECF_crossval
, aSECF
and aSECF_crossval
.
The kernel matrix is automatically computed in all of the above methods, but it is faster to calculate
in advance when using more than one of the above functions and when using any of the crossval functions.
K0_fn( samples, derivatives, sigma, steinOrder, kernel_function, Z = NULL, nystrom_inds = NULL )
K0_fn( samples, derivatives, sigma, steinOrder, kernel_function, Z = NULL, nystrom_inds = NULL )
samples |
An |
derivatives |
An |
sigma |
The tuning parameters of the specified kernel. This involves a single length-scale parameter in "gaussian" and "RQ", a length-scale and a smoothness parameter in "matern" and two parameters in "product" and "prodsim". See below for further details. |
steinOrder |
This is the order of the Stein operator. The default is |
kernel_function |
Choose between "gaussian", "matern", "RQ", "product" or "prodsim". See below for further details. |
Z |
(optional) An |
nystrom_inds |
(optional) The sample indices to be used in the Nystrom approximation (for when using aSECF). |
An by
kernel matrix (or
by
where
is the length of
nystrom_inds
).
, the kernel and the Stein orderThe kernel in Stein-based kernel methods is where
is a first or second order Stein operator in
and
is some generic kernel to be specified.
The Stein operators for distribution are defined as:
steinOrder=1
: (see e.g. Oates el al (2017))
steinOrder=2
: (see e.g. South el al (2020))
Here is the first order derivative wrt
and
is the Laplacian operator.
The generic kernels which are implemented in this package are listed below. Note that the input parameter sigma
defines the kernel parameters .
"gaussian"
: A Gaussian kernel,
"matern"
: A Matern kernel with ,
where ,
and
is the modified Bessel function of the second kind. Note that
is the length-scale parameter and
is the smoothness parameter (which defaults to 2.5 for
and 4.5 for
).
"RQ"
: A rational quadratic kernel,
"product"
: The product kernel that appears in Oates et al (2017) with
"prodsim"
: A slightly different product kernel with (see e.g. https://www.imperial.ac.uk/inference-group/projects/monte-carlo-methods/control-functionals/),
In the above equations, and
. For the last two kernels, the code only has implementations for
steinOrder
=1
. Each combination of steinOrder
and kernel_function
above is currently hard-coded but it may be possible to extend this to other kernels in future versions using autodiff. The calculations for the first three kernels above are detailed in South et al (2020).
Leah F. South
Oates, C. J., Girolami, M. & Chopin, N. (2017). Control functionals for Monte Carlo integration. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79(3), 695-718.
South, L. F., Karvonen, T., Nemeth, C., Girolami, M. and Oates, C. J. (2020). Semi-Exact Control Functionals From Sard's Method. https://arxiv.org/abs/2002.00033
The function logsumexp
is used for stable computation of log(sum(exp(x))), which is useful when summing weights for example.
logsumexp(x)
logsumexp(x)
x |
The values for which you want to compute log(sum(exp(x))) |
The stable result of log(sum(exp(x)))
See ZVCV for more package details.
This function calculates the median heuristic for use in e.g. the Gaussian, Matern and rational quadratic kernels.
medianTune(samples, Z = NULL)
medianTune(samples, Z = NULL)
samples |
An |
Z |
(optional) An NxN matrix of square norms, which can be calculated
using |
The median heuristic, which can then be used as the length-scale parameter in the Gaussian, Matern and rational quadratic kernels
Leah F. South
Garreau, D., Jitkrittum, W. and Kanagawa, M. (2017). Large sample analysis of the median heuristic. https://arxiv.org/abs/1707.07269
See medianTune
and K0_fn
for functions which use this.
This function finds the nearest symmetric positive definite matrix to the given matrix. It is used throughout the package to handle numerical issues in matrix inverses and cholesky decompositions.
nearPD(K0)
nearPD(K0)
K0 |
A square matrix |
The closest symmetric positive definite matrix to K0.
Adapted from Matlab code by John D'Errico
Higham, N. J. (1988). Computing a nearest symmetric positive semidefinite matrix. Linear Algebra and its Applications, 103, 103-118.
D'Errico, J. (2013). nearestSPD Matlab function. https://uk.mathworks.com/matlabcentral/fileexchange/42885-nearestspd.
This function calculates the matrix, which is a second order Stein operator applied
to a polynomial. See South et al (2020) for further details. This function is not required for
estimation but may be useful when evaluation samples are not initially available since
estimators using heldout samples are of the form
where
for heldout K0 and Phi matrices.
Phi_fn(samples, derivatives, polyorder = NULL, apriori = NULL)
Phi_fn(samples, derivatives, polyorder = NULL, apriori = NULL)
samples |
An |
derivatives |
An |
polyorder |
(optional) The order of the polynomial to be used in the parametric component, with a default of |
apriori |
(optional) A vector containing the subset of parameter indices to use in the polynomial. Typically this argument would only be used if the dimension of the problem is very large or if prior information about parameter dependencies is known. The default is to use all parameters |
An by
matrix (where Q is determined by the polynomial order and the apriori).
Leah F. South
South, L. F., Karvonen, T., Nemeth, C., Girolami, M. and Oates, C. J. (2020). Semi-Exact Control Functionals From Sard's Method. https://arxiv.org/abs/2002.00033
This function performs semi-exact control functionals as described in South et al (2020).
To choose between different kernels using cross-validation, use SECF_crossval
.
SECF( integrands, samples, derivatives, polyorder = NULL, steinOrder = NULL, kernel_function = NULL, sigma = NULL, K0 = NULL, est_inds = NULL, apriori = NULL, diagnostics = FALSE )
SECF( integrands, samples, derivatives, polyorder = NULL, steinOrder = NULL, kernel_function = NULL, sigma = NULL, K0 = NULL, est_inds = NULL, apriori = NULL, diagnostics = FALSE )
integrands |
An |
samples |
An |
derivatives |
An |
polyorder |
(optional) The order of the polynomial to be used in the parametric component, with a default of |
steinOrder |
(optional) This is the order of the Stein operator. The default is |
kernel_function |
(optional) Choose between "gaussian", "matern", "RQ", "product" or "prodsim". See below for further details. |
sigma |
(optional) The tuning parameters of the specified kernel. This involves a single length-scale parameter in "gaussian" and "RQ", a length-scale and a smoothness parameter in "matern" and two parameters in "product" and "prodsim". See below for further details. |
K0 |
(optional) The kernel matrix. One can specify either this or all of |
est_inds |
(optional) A vector of indices for the estimation-only samples. The default when |
apriori |
(optional) A vector containing the subset of parameter indices to use in the polynomial. Typically this argument would only be used if the dimension of the problem is very large or if prior information about parameter dependencies is known. The default is to use all parameters |
diagnostics |
(optional) A flag for whether to return the necessary outputs for plotting or estimating using the fitted model. The default is |
A list with the following elements:
expectation
: The estimate(s) of the () expectation(s).
f_true
: (Only if est_inds
is not NULL
) The integrands for the evaluation set. This should be the same as integrands[setdiff(1:N,est_inds),].
f_hat
: (Only if est_inds
is not NULL
) The fitted values for the integrands in the evaluation set. This can be used to help assess the performance of the Gaussian process model.
a
: (Only if diagnostics
= TRUE
) The value of as described in South et al (2020), where predictions are of the form
for heldout K0 and Phi matrices and estimators using heldout samples are of the form
.
b
: (Only if diagnostics
= TRUE
) The value of as described in South et al (2020), where predictions are of the form
for heldout K0 and Phi matrices and estimators using heldout samples are of the form
.
ksd
: (Only if diagnostics
= TRUE
) An estimated kernel Stein discrepancy based on the fitted model that can be used for diagnostic purposes. See South et al (2020) for further details.
bound_const
: (Only if diagnostics
= TRUE
and est_inds
=NULL
) This is such that the absolute error for the estimator should be less than .
Solving the linear system in SECF has complexity where
is the sample size and
is the number of terms in the polynomial.
Standard SECF is therefore not suited to large
. The method aSECF is designed for larger
and details can be found at
aSECF
and in South et al (2020).
An alternative would be to use which has
complexity in solving the linear system and
complexity in
handling the remaining samples, where
is the length of
. This can be much cheaper for small
but the estimation of the
Gaussian process model is only done using
samples and the evaluation of the integral only uses
samples.
, the kernel and the Stein orderThe kernel in Stein-based kernel methods is where
is a first or second order Stein operator in
and
is some generic kernel to be specified.
The Stein operators for distribution are defined as:
steinOrder=1
: (see e.g. Oates el al (2017))
steinOrder=2
: (see e.g. South el al (2020))
Here is the first order derivative wrt
and
is the Laplacian operator.
The generic kernels which are implemented in this package are listed below. Note that the input parameter sigma
defines the kernel parameters .
"gaussian"
: A Gaussian kernel,
"matern"
: A Matern kernel with ,
where ,
and
is the modified Bessel function of the second kind. Note that
is the length-scale parameter and
is the smoothness parameter (which defaults to 2.5 for
and 4.5 for
).
"RQ"
: A rational quadratic kernel,
"product"
: The product kernel that appears in Oates et al (2017) with
"prodsim"
: A slightly different product kernel with (see e.g. https://www.imperial.ac.uk/inference-group/projects/monte-carlo-methods/control-functionals/),
In the above equations, and
. For the last two kernels, the code only has implementations for
steinOrder
=1
. Each combination of steinOrder
and kernel_function
above is currently hard-coded but it may be possible to extend this to other kernels in future versions using autodiff. The calculations for the first three kernels above are detailed in South et al (2020).
Leah F. South
Oates, C. J., Girolami, M. & Chopin, N. (2017). Control functionals for Monte Carlo integration. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79(3), 695-718.
South, L. F., Karvonen, T., Nemeth, C., Girolami, M. and Oates, C. J. (2020). Semi-Exact Control Functionals From Sard's Method. https://arxiv.org/abs/2002.00033
See ZVCV for examples and related functions. See SECF_crossval
for a function to choose between different kernels for this estimator.
This function chooses between a list of kernel tuning parameters (sigma_list
) or a list of K0 matrices (K0_list
) for
the semi-exact control functionals method described in South et al (2020). The latter requires
calculating and storing kernel matrices using K0_fn
but it is more flexible
because it can be used to choose the Stein operator order and the kernel function, in addition
to its parameters. It is also faster to pre-specify K0_fn
.
For estimation with fixed kernel parameters, use SECF
.
SECF_crossval( integrands, samples, derivatives, polyorder = NULL, steinOrder = NULL, kernel_function = NULL, sigma_list = NULL, K0_list = NULL, est_inds = NULL, apriori = NULL, folds = NULL, diagnostics = FALSE )
SECF_crossval( integrands, samples, derivatives, polyorder = NULL, steinOrder = NULL, kernel_function = NULL, sigma_list = NULL, K0_list = NULL, est_inds = NULL, apriori = NULL, folds = NULL, diagnostics = FALSE )
integrands |
An |
samples |
An |
derivatives |
An |
polyorder |
(optional) The order of the polynomial to be used in the parametric component, with a default of |
steinOrder |
(optional) This is the order of the Stein operator. The default is |
kernel_function |
(optional) Choose between "gaussian", "matern", "RQ", "product" or "prodsim". See below for further details. |
sigma_list |
(optional between this and |
K0_list |
(optional between this and |
est_inds |
(optional) A vector of indices for the estimation-only samples. The default when |
apriori |
(optional) A vector containing the subset of parameter indices to use in the polynomial. Typically this argument would only be used if the dimension of the problem is very large or if prior information about parameter dependencies is known. The default is to use all parameters |
folds |
(optional) The number of folds for cross-validation. The default is five. |
diagnostics |
(optional) A flag for whether to return the necessary outputs for plotting or estimating using the fitted model. The default is |
A list with the following elements:
expectation
: The estimate(s) of the () expectation(s).
mse
: A matrix of the cross-validation mean square prediction errors. The number of columns is the number of tuning options given and the number of rows is , the number of integrands of interest.
optinds
: The optimal indices from the list for each expectation.
f_true
: (Only if est_inds
is not NULL
) The integrands for the evaluation set. This should be the same as integrands[setdiff(1:N,est_inds),].
f_hat
: (Only if est_inds
is not NULL
) The fitted values for the integrands in the evaluation set. This can be used to help assess the performance of the Gaussian process model.
a
: (Only if diagnostics
= TRUE
) The value of as described in South et al (2020), where predictions are of the form
for heldout K0 and Phi matrices and estimators using heldout samples are of the form
.
b
: (Only if diagnostics
= TRUE
) The value of as described in South et al (2020), where predictions are of the form
for heldout K0 and Phi matrices and estimators using heldout samples are of the form
.
ksd
: (Only if diagnostics
= TRUE
) An estimated kernel Stein discrepancy based on the fitted model that can be used for diagnostic purposes. See South et al (2020) for further details.
bound_const
: (Only if diagnostics
= TRUE
and est_inds
=NULL
) This is such that the absolute error for the estimator should be less than .
Solving the linear system in SECF has complexity where
is the sample size and
is the number of terms in the polynomial.
Standard SECF is therefore not suited to large
. The method aSECF is designed for larger
and details can be found at
aSECF
and in South et al (2020).
An alternative would be to use which has
complexity in solving the linear system and
complexity in
handling the remaining samples, where
is the length of
. This can be much cheaper for large
but the estimation of the
Gaussian process model is only done using
samples and the evaluation of the integral only uses
samples.
, the kernel and the Stein orderThe kernel in Stein-based kernel methods is where
is a first or second order Stein operator in
and
is some generic kernel to be specified.
The Stein operators for distribution are defined as:
steinOrder=1
: (see e.g. Oates el al (2017))
steinOrder=2
: (see e.g. South el al (2020))
Here is the first order derivative wrt
and
is the Laplacian operator.
The generic kernels which are implemented in this package are listed below. Note that the input parameter sigma
defines the kernel parameters .
"gaussian"
: A Gaussian kernel,
"matern"
: A Matern kernel with ,
where ,
and
is the modified Bessel function of the second kind. Note that
is the length-scale parameter and
is the smoothness parameter (which defaults to 2.5 for
and 4.5 for
).
"RQ"
: A rational quadratic kernel,
"product"
: The product kernel that appears in Oates et al (2017) with
"prodsim"
: A slightly different product kernel with (see e.g. https://www.imperial.ac.uk/inference-group/projects/monte-carlo-methods/control-functionals/),
In the above equations, and
. For the last two kernels, the code only has implementations for
steinOrder
=1
. Each combination of steinOrder
and kernel_function
above is currently hard-coded but it may be possible to extend this to other kernels in future versions using autodiff. The calculations for the first three kernels above are detailed in South et al (2020).
Leah F. South
Oates, C. J., Girolami, M. & Chopin, N. (2017). Control functionals for Monte Carlo integration. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79(3), 695-718.
South, L. F., Karvonen, T., Nemeth, C., Girolami, M. and Oates, C. J. (2020). Semi-Exact Control Functionals From Sard's Method. https://arxiv.org/abs/2002.00033
See ZVCV for examples and related functions. See SECF
for a function to perform semi-exact control functionals with fixed kernel specifications.
This function gets the matrix of square norms which is needed for all kernels. Calculating this can help to save time if you are also interested in calculating the median heuristic, handling multiple tuning parameters or trying other kernels.
squareNorm(samples, nystrom_inds = NULL)
squareNorm(samples, nystrom_inds = NULL)
samples |
An |
nystrom_inds |
The (optional) sample indices to be used in the Nystrom approximation (for when using aSECF). |
An by
matrix of squared norms between samples (or
by
where
is the length of
nystrom_inds
).
Leah F. South
See medianTune
and K0_fn
for functions which use this.
This example illustrates how ZV-CV can be used for post-processing of results from likelihood-annealing SMC. In particular, we use ZV-CV to estimate posterior expectations and the evidence for a single SMC run of this example based on the Van der Pol oscillatory differential equations (Van der Pol, 1926). Further details about this example and applications to ZV-CV can be found in Oates et al. (2017) and South et al. (2019).
Given that the focus of this R package is on ZV-CV, we assume that samples have already been obtained from SMC and put into the correct format. One could use the R package RcppSMC
or implement their own sampler in order to obtain results like this. The key is to make sure the derivatives of the log likelihood and log prior are stored, along with the inverse temperatures.
data(VDP)
data(VDP)
A list containing the following :
The size of the SMC population
The tolerance for the new temperatures, which are selected so that the CESS at each temperature is where
is the population size.
A vector of length of inverse power posterior temperatures
An by
by
matrix of samples from the
power posteriors, where
is the dimension of the target distribution. The samples are transformed to be on the log scale and all derivatives are with respect to log samples.
An by
matrix of log likelihood values corresponding to
samples
An by
matrix of log prior values corresponding to
samples
An by
by
matrix of the derivatives of the log likelihood with respect to the parameters, with parameter values corresponding to
samples
An by
by
matrix of the derivatives of the log prior with respect to the parameters, with parameter values corresponding to
samples
Oates, C. J., Girolami, M. & Chopin, N. (2017). Control functionals for Monte Carlo integration. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79(3), 695-718.
South, L. F., Oates, C. J., Mira, A., & Drovandi, C. (2019). Regularised zero-variance control variates for high-dimensional variance reduction.
Van der Pol, B. (1926). On relaxation-oscillations. The London, Edinburgh and Dublin Philosophical Magazine and Journal of Science, 2(11), 978-992.
See ZVCV for more package details.
set.seed(1) # Load the SMC results data(VDP) # Set up the list of control variates to choose from options <- list() # Vanilla Monte Carlo options[[1]] <- list(polyorder = 0) # Standard ZV-CV with polynomial order selected through cross-validation options[[2]] <- list(polyorder = Inf, regul_reg = FALSE) ############################## # Posterior expectation - The true expectation is 0.9852 to 4 decimal places ############################## # Note the exp() because samples and derivatives were stored on the log scale # but we are interested in the expectation on the original scale posterior <- zvcv(exp(VDP$samples[,,8]), VDP$samples[,,8], VDP$der_loglike[,,8] + VDP$der_logprior[,,8], options = options) posterior$expectation # The posterior expectation estimate posterior$polyorder # The selected polynomial order ############################## # Evidence estimation - The true logged evidence is 10.36 to 2 decimal places ############################## # Getting additional temperatures based on maintaing a CESS of 0.91N rather than 0.9N. # The value 0.91 is used for speed but South et al. (2019) use 0.99. temp <- Expand_Temperatures(VDP$temperatures, VDP$loglike, 0.91) VDP$temperatures_new <- temp$temperatures_all # the new temperatures VDP$most_recent <- temp$relevant_samples # the samples associated with the new temperatures n_sigma <- 3 # For speed, South et al. (2019) uses 15 sigma_list <- as.list( 10^(0.5*seq(-3,4,length.out=n_sigma)) ) # Evidence estimation using the SMC identity Z_SMC <- evidence_SMC(VDP$samples, VDP$loglike, VDP$der_loglike, VDP$der_logprior, VDP$temperatures, VDP$temperatures_new, VDP$most_recent, options = options) Z_SMC$log_evidence # Evidence estimation using the SMC identity Z_SMC_CF <- evidence_SMC_CF(VDP$samples, VDP$loglike, VDP$der_loglike, VDP$der_logprior, VDP$temperatures, VDP$temperatures_new, VDP$most_recent, steinOrder = 2, kernel_function = "gaussian", sigma_list = sigma_list, folds = 2) Z_SMC_CF$log_evidence # Evidence estimation using the CTI identity Z_CTI <- evidence_CTI(VDP$samples, VDP$loglike, VDP$der_loglike, VDP$der_logprior, VDP$temperatures, VDP$temperatures_new, VDP$most_recent, options = options) Z_CTI$log_evidence_PS2 # Evidence estimation using the CTI identity Z_CTI_CF <- evidence_CTI_CF(VDP$samples, VDP$loglike, VDP$der_loglike, VDP$der_logprior, VDP$temperatures, VDP$temperatures_new, VDP$most_recent, steinOrder = 2, kernel_function = "gaussian", sigma_list = sigma_list, folds = 2) Z_CTI_CF$log_evidence_PS2
set.seed(1) # Load the SMC results data(VDP) # Set up the list of control variates to choose from options <- list() # Vanilla Monte Carlo options[[1]] <- list(polyorder = 0) # Standard ZV-CV with polynomial order selected through cross-validation options[[2]] <- list(polyorder = Inf, regul_reg = FALSE) ############################## # Posterior expectation - The true expectation is 0.9852 to 4 decimal places ############################## # Note the exp() because samples and derivatives were stored on the log scale # but we are interested in the expectation on the original scale posterior <- zvcv(exp(VDP$samples[,,8]), VDP$samples[,,8], VDP$der_loglike[,,8] + VDP$der_logprior[,,8], options = options) posterior$expectation # The posterior expectation estimate posterior$polyorder # The selected polynomial order ############################## # Evidence estimation - The true logged evidence is 10.36 to 2 decimal places ############################## # Getting additional temperatures based on maintaing a CESS of 0.91N rather than 0.9N. # The value 0.91 is used for speed but South et al. (2019) use 0.99. temp <- Expand_Temperatures(VDP$temperatures, VDP$loglike, 0.91) VDP$temperatures_new <- temp$temperatures_all # the new temperatures VDP$most_recent <- temp$relevant_samples # the samples associated with the new temperatures n_sigma <- 3 # For speed, South et al. (2019) uses 15 sigma_list <- as.list( 10^(0.5*seq(-3,4,length.out=n_sigma)) ) # Evidence estimation using the SMC identity Z_SMC <- evidence_SMC(VDP$samples, VDP$loglike, VDP$der_loglike, VDP$der_logprior, VDP$temperatures, VDP$temperatures_new, VDP$most_recent, options = options) Z_SMC$log_evidence # Evidence estimation using the SMC identity Z_SMC_CF <- evidence_SMC_CF(VDP$samples, VDP$loglike, VDP$der_loglike, VDP$der_logprior, VDP$temperatures, VDP$temperatures_new, VDP$most_recent, steinOrder = 2, kernel_function = "gaussian", sigma_list = sigma_list, folds = 2) Z_SMC_CF$log_evidence # Evidence estimation using the CTI identity Z_CTI <- evidence_CTI(VDP$samples, VDP$loglike, VDP$der_loglike, VDP$der_logprior, VDP$temperatures, VDP$temperatures_new, VDP$most_recent, options = options) Z_CTI$log_evidence_PS2 # Evidence estimation using the CTI identity Z_CTI_CF <- evidence_CTI_CF(VDP$samples, VDP$loglike, VDP$der_loglike, VDP$der_logprior, VDP$temperatures, VDP$temperatures_new, VDP$most_recent, steinOrder = 2, kernel_function = "gaussian", sigma_list = sigma_list, folds = 2) Z_CTI_CF$log_evidence_PS2
The function zvcv
is used to perform (regularised) ZV-CV given a set of samples, derivatives and function evaluations.
zvcv( integrand, samples, derivatives, log_weights, integrand_logged = FALSE, est_inds, options = list(polyorder = 2, regul_reg = TRUE, alpha_elnet = 1, nfolds = 10, apriori = (1:NCOL(samples)), intercept = TRUE, polyorder_max = Inf), folds = 5 )
zvcv( integrand, samples, derivatives, log_weights, integrand_logged = FALSE, est_inds, options = list(polyorder = 2, regul_reg = TRUE, alpha_elnet = 1, nfolds = 10, apriori = (1:NCOL(samples)), intercept = TRUE, polyorder_max = Inf), folds = 5 )
integrand |
An |
samples |
An |
derivatives |
An |
log_weights |
(optional) A vector of length |
integrand_logged |
(optional) Sometimes it is better to input the integrand on the logged scale for stability. If the actual integrand is the exponential of |
est_inds |
(optional) A vector of indices for the estimation-only samples. The default when |
options |
A list of control variate specifications. This can be a single list containing the elements below (the defaults are used for elements which are not specified). Alternatively, it can be a list of lists containing any or all of the elements below. Where the latter is used, the function
|
folds |
The number of folds used in k-fold cross-validation for selecting the optimal control variate. Depending on the |
A list is returned, containing the following components:
expectation
: The estimates of the expectations.
num_select
: The number of non-zero coefficients in the polynomial.
mse
: The mean square error for the evaluation set.
coefs
: The estimated coefficients for the regression (columns are for the different integrands).
integrand_logged
: The integrand_logged
input stored for reference.
est_inds
: The est_inds
input stored for reference.
polyorder
: The polyorder
value used in the final estimate.
regul_reg
: The regul_reg
flag used in the final estimate.
alpha_elnet
: The alpha_elnet
value used in the final estimate.
nfolds
: The nfolds
value used in the final estimate.
apriori
: The apriori
vector used in the final estimate.
intercept
: The intercept
flag used in the final estimate.
polyorder_max
: The polyorder_max
flag used in the final estimate, if multiple options
are specified.
Leah F. South
Mira, A., Solgi, R., & Imparato, D. (2013). Zero variance Markov chain Monte Carlo for Bayesian estimators. Statistics and Computing, 23(5), 653-662.
South, L. F., Oates, C. J., Mira, A., & Drovandi, C. (2019). Regularised zero variance control variates for high-dimensional variance reduction. https://arxiv.org/abs/1811.05073
See ZVCV and VDP
for additional examples. See evidence
for functions which use zvcv
to estimate the normalising constant of the posterior.
# An example where ZV-CV can result in zero-variance estimators # Estimating some expectations when theta is bivariate normally distributed with: mymean <- c(-1.5,1.5) mycov <- matrix(c(1,0.5,0.5,2),nrow=2) # Perfect draws from the target distribution (could be replaced with # approximate draws from e.g. MCMC or SMC) N <- 30 require(mvtnorm) set.seed(1) samples <- rmvnorm(N, mean = mymean, sigma = mycov) # derivatives of Gaussian wrt x derivatives <- t( apply(samples,1,function(x) -solve(mycov)%*%(x - mymean)) ) # The integrands are the marginal posterior means of theta, the variances and the # covariance (true values are c(-1.5,1.5,1,2,0.5)) integrand <- cbind(samples[,1],samples[,2],(samples[,1] - mymean[1])^2, (samples[,2] - mymean[2])^2, (samples[,1] - mymean[1])*(samples[,2] - mymean[2])) # Estimates without ZV-CV (i.e. vanilla Monte Carlo integration) # Vanilla Monte Carlo sprintf("%.15f",colMeans(integrand)) # ZV-CV with fixed specifications # For this example, polyorder = 1 with OLS is exact for the first two integrands and # polyorder = 2 with OLS is exact for the last three integrands # ZV-CV with 2nd order polynomial, OLS and a polynomial based on only x_1. # For diagonal mycov, this would be exact for the first and third expectations. sprintf("%.15f",zvcv(integrand, samples, derivatives, options = list(polyorder = 2, regul_reg = FALSE, apriori = 1))$expectation) # ZV-CV with 1st order polynomial and OLS (exact for the first two integrands) sprintf("%.15f",zvcv(integrand, samples, derivatives, options = list(polyorder = 1, regul_reg = FALSE))$expectation) # ZV-CV with 2nd order polynomial and OLS (exact for all) sprintf("%.15f",zvcv(integrand, samples, derivatives, options = list(polyorder = 2, regul_reg = FALSE))$expectation) # ZV-CV with cross validation myopts <- list(list(polyorder = Inf, regul_reg = FALSE),list(polyorder = Inf, nfolds = 4)) temp <- zvcv(integrand,samples,derivatives,options = myopts, folds = 2) temp$polyorder # The chosen control variate order temp$regul_reg # Flag for if the chosen control variate uses regularisation # Cross-val ZV-CV to choose the polynomial order and whether to perform OLS or LASSO sprintf("%.15f",temp$expectation) # Estimate based on the chosen control variate
# An example where ZV-CV can result in zero-variance estimators # Estimating some expectations when theta is bivariate normally distributed with: mymean <- c(-1.5,1.5) mycov <- matrix(c(1,0.5,0.5,2),nrow=2) # Perfect draws from the target distribution (could be replaced with # approximate draws from e.g. MCMC or SMC) N <- 30 require(mvtnorm) set.seed(1) samples <- rmvnorm(N, mean = mymean, sigma = mycov) # derivatives of Gaussian wrt x derivatives <- t( apply(samples,1,function(x) -solve(mycov)%*%(x - mymean)) ) # The integrands are the marginal posterior means of theta, the variances and the # covariance (true values are c(-1.5,1.5,1,2,0.5)) integrand <- cbind(samples[,1],samples[,2],(samples[,1] - mymean[1])^2, (samples[,2] - mymean[2])^2, (samples[,1] - mymean[1])*(samples[,2] - mymean[2])) # Estimates without ZV-CV (i.e. vanilla Monte Carlo integration) # Vanilla Monte Carlo sprintf("%.15f",colMeans(integrand)) # ZV-CV with fixed specifications # For this example, polyorder = 1 with OLS is exact for the first two integrands and # polyorder = 2 with OLS is exact for the last three integrands # ZV-CV with 2nd order polynomial, OLS and a polynomial based on only x_1. # For diagonal mycov, this would be exact for the first and third expectations. sprintf("%.15f",zvcv(integrand, samples, derivatives, options = list(polyorder = 2, regul_reg = FALSE, apriori = 1))$expectation) # ZV-CV with 1st order polynomial and OLS (exact for the first two integrands) sprintf("%.15f",zvcv(integrand, samples, derivatives, options = list(polyorder = 1, regul_reg = FALSE))$expectation) # ZV-CV with 2nd order polynomial and OLS (exact for all) sprintf("%.15f",zvcv(integrand, samples, derivatives, options = list(polyorder = 2, regul_reg = FALSE))$expectation) # ZV-CV with cross validation myopts <- list(list(polyorder = Inf, regul_reg = FALSE),list(polyorder = Inf, nfolds = 4)) temp <- zvcv(integrand,samples,derivatives,options = myopts, folds = 2) temp$polyorder # The chosen control variate order temp$regul_reg # Flag for if the chosen control variate uses regularisation # Cross-val ZV-CV to choose the polynomial order and whether to perform OLS or LASSO sprintf("%.15f",temp$expectation) # Estimate based on the chosen control variate
This package can be used to perform post-hoc variance reduction of Monte Carlo estimators when the derivatives of the log target are available.
The main functionality is available through the following functions.
All of these use a set of
-dimensional samples along with the associated derivatives of the log target.
You can evaluate posterior expectations of
functions.
zvcv
: For estimating expectations using (regularised) zero-variance control variates (ZV-CV, Mira et al, 2013; South et al, 2018).
This function can also be used to choose between various versions of ZV-CV using cross-validation.
CF
: For estimating expectations using control functionals (CF, Oates et al, 2017).
SECF
: For estimating expectations using semi-exact control functionals (SECF, South et al, 2020).
aSECF
: For estimating expectations using approximate semi-exact control functionals (aSECF, South et al, 2020).
CF_crossval
: CF with cross-validation tuning.
SECF_crossval
: SECF with cross-validation tuning.
aSECF_crossval
: aSECF with cross-validation tuning.
ZV-CV is exact for polynomials of order at most polyorder
under Gaussian targets and is fast for large (although
setting a limit on
polyorder
through polyorder_max
is recommended for large ).
CF is a non-parametric approach that offers better than the standard Monte Carlo convergence rates.
SECF has both a parametric and a non-parametric component and it offers the advantages of both for an additional computational cost. The cost of
SECF is reduced in aSECF using nystrom approximations and conjugate gradient.
getX
: Calculates the design matrix for ZV-CV (without the column of 1's for the intercept)
medianTune
: Calculates the median heuristic for use in e.g. the Gaussian, Matern and rational quadratic kernels. Using the median heuristic is an alternative to cross-validation.
K0_fn
: Calculates the matrix. The output of this function can be used as an argument to
CF
, CF_crossval
,
SECF
, SECF_crossval
, aSECF
and aSECF_crossval
.
The kernel matrix is automatically computed in all of the above methods, but it is faster to calculate
in advance when using more than one of the above functions and when using any of the crossval functions.
Phi_fn
: Calculates the Phi matrix for SECF and aSECF (similar to getX
but with different arguments and it includes the column of 1's)
squareNorm
: Gets the matrix of square norms which is needed for all kernels.
Calculating this can help to save time if you are also interested in calculating the median heuristic, handling multiple tuning parameters or trying other kernels.
nearPD
: Finds the nearest symmetric positive definite matrix to the given matrix, for handling numerical issues.
logsumexp
: Performs stable computation of the log sum of exponential (useful when handling the sum of weights)
The following functions are used to estimate the evidence (the normalisiing constant of the posterior) as described in South et al (2018). They are relevant when sequential Monte Carlo with an annealing schedule has been used to collect the samples, and therefore are not of interest to those who are interested in variance reduction based on vanilla MCMC.
evidence_CTI
and evidence_CTI_CF
: Functions to estimate the evidence using thermodynamic integration (TI) with ZV-CV and CF, respectively
evidence_SMC
and evidence_SMC_CF
: Function to estimate the evidence using the SMC evidence identity with ZV-CV and CF, respectively.
The function Expand_Temperatures
can be used to adjust the temperature schedule so that it is more (or less) strict than the original schedule of temperatures.
Leah F. South
Mira, A., Solgi, R., & Imparato, D. (2013). Zero variance Markov chain Monte Carlo for Bayesian estimators. Statistics and Computing, 23(5), 653-662.
South, L. F., Karvonen, T., Nemeth, C., Girolami, M. and Oates, C. J. (2020). Semi-Exact Control Functionals From Sard's Method. https://arxiv.org/abs/2002.00033
South, L. F., Oates, C. J., Mira, A., & Drovandi, C. (2018). Regularised zero-variance control variates for high-dimensional variance reduction. https://arxiv.org/abs/1811.05073
Useful links:
Report bugs at https://github.com/LeahPrice/ZVCV/issues
# A real data example using ZV-CV is available at \link{VDP}. # This involves estimating posterior expectations and the evidence from SMC samples. # The remainder of this section is duplicating (albeit with a different random # seed) Figure 2a of South et al. (2020). N_repeats <- 2 # For speed, the actual code uses 100 N_all <- 25 # For speed, the actual code uses c(10,25,50,100,250,500,1000) sigma_list <- list(10^(-1.5),10^(-1),10^(-0.5),1,10^(0.5),10) nfolds <- 4 # For speed, the actual code uses 10 folds <- 2 # For speed, the actual code uses 5 d <- 4 integrand_fn <- function(x){ return (1 + x[,2] + 0.1*x[,1]*x[,2]*x[,3] + sin(x[,1])*exp(-(x[,2]*x[,3])^2)) } results <- data.frame() for (N in N_all){ # identify the largest polynomial order that can be fit without regularisation for auto ZV-CV max_r <- 0 while (choose(d + max_r + 1,d)<((folds-1)/folds*N)){ max_r <- max_r + 1 } MC <- ZV1 <- ZV2 <- ZVchoose <- rep(NaN,N_repeats) CF <- SECF1 <- aSECF1 <- SECF2 <- aSECF2 <- rep(NaN,N_repeats) CF_medHeur <- SECF1_medHeur <- aSECF1_medHeur <- rep(NaN,N_repeats) SECF2_medHeur <- aSECF2_medHeur <- rep(NaN,N_repeats) for (i in 1:N_repeats){ x <- matrix(rnorm(N*d),ncol=d) u <- -x f <- integrand_fn(x) MC[i] <- mean(f) ZV1[i] <- zvcv(f,x,u,options=list(polyorder=1,regul_reg=FALSE))$expectation # Checking if the sample size is large enough to accommodation a second order polynomial if (N > choose(d+2,d)){ ZV2[i] <- zvcv(f,x,u,options=list(polyorder=2,regul_reg=FALSE))$expectation } myopts <- list(list(polyorder=Inf,regul_reg=FALSE,polyorder_max=max_r), list(polyorder=Inf,nfolds=nfolds)) ZVchoose[i] <- zvcv(f,x,u,options=myopts,folds = folds)$expectation # Calculating the kernel matrix in advance for CF and SECF K0_list <- list() for (j in 1:length(sigma_list)){ K0_list[[j]] <- K0_fn(x,u,sigma_list[[j]],steinOrder=2,kernel_function="RQ") } CF[i] <- CF_crossval(f,x,u,K0_list=K0_list,folds = folds)$expectation SECF1[i] <- SECF_crossval(f,x,u,K0_list=K0_list,folds = folds)$expectation aSECF1[i] <- aSECF_crossval(f,x,u,steinOrder=2,kernel_function="RQ", sigma_list=sigma_list,reltol=1e-05,folds = folds)$expectation if (max_r>=2){ SECF2[i] <- SECF_crossval(f,x,u,polyorder=2,K0_list=K0_list,folds = folds)$expectation aSECF2[i] <- aSECF_crossval(f,x,u,polyorder=2,steinOrder=2,kernel_function="RQ", sigma_list=sigma_list,reltol=1e-05,folds = folds)$expectation } medHeur <- medianTune(x) K0_medHeur <- K0_fn(x,u,medHeur,steinOrder=2,kernel_function="RQ") CF_medHeur[i] <- CF(f,x,u,K0=K0_medHeur)$expectation SECF1_medHeur[i] <- SECF(f,x,u,K0=K0_medHeur)$expectation aSECF1_medHeur[i] <- aSECF(f,x,u,steinOrder=2,kernel_function="RQ", sigma=medHeur,reltol=1e-05)$expectation if (max_r>=2){ SECF2_medHeur[i] <- SECF(f,x,u,polyorder=2,K0=K0_medHeur)$expectation aSECF2_medHeur[i] <- aSECF(f,x,u,polyorder=2,steinOrder=2,kernel_function="RQ", sigma=medHeur,reltol=1e-05)$expectation } # print(sprintf("--%d",i)) } # Adding the results to a data frame MSE_crude <- mean((MC - 1)^2) results <- rbind(results,data.frame(N=N, order = "1 or NA", efficiency = 1, type = "MC")) results <- rbind(results,data.frame(N=N, order = "1 or NA", efficiency = MSE_crude/mean((ZV1 - 1)^2), type = "ZV")) results <- rbind(results,data.frame(N=N, order = "2", efficiency = MSE_crude/mean((ZV2 - 1)^2), type = "ZV")) results <- rbind(results,data.frame(N=N, order = "1 or NA", efficiency = MSE_crude/mean((ZVchoose - 1)^2), type = "ZVchoose")) results <- rbind(results,data.frame(N=N, order = "1 or NA", efficiency = MSE_crude/mean((CF - 1)^2), type = "CF")) results <- rbind(results,data.frame(N=N, order = "1 or NA", efficiency = MSE_crude/mean((SECF1 - 1)^2), type = "SECF")) results <- rbind(results,data.frame(N=N, order = "1 or NA", efficiency = MSE_crude/mean((aSECF1 - 1)^2), type = "aSECF")) if (((folds-1)/folds*N) > choose(d+2,d)){ results <- rbind(results,data.frame(N=N, order = "2", efficiency = MSE_crude/mean((SECF2 - 1)^2), type = "SECF")) results <- rbind(results,data.frame(N=N, order = "2", efficiency = MSE_crude/mean((aSECF2 - 1)^2), type = "aSECF")) } results <- rbind(results,data.frame(N=N, order = "1 or NA", efficiency = MSE_crude/mean((CF_medHeur - 1)^2), type = "CF_medHeur")) results <- rbind(results,data.frame(N=N, order = "1 or NA", efficiency = MSE_crude/mean((SECF1_medHeur - 1)^2), type = "SECF_medHeur")) results <- rbind(results,data.frame(N=N, order = "1 or NA", efficiency = MSE_crude/mean((aSECF1_medHeur - 1)^2), type = "aSECF_medHeur")) if (((folds-1)/folds*N) > choose(d+2,d)){ results <- rbind(results,data.frame(N=N, order = "2", efficiency = MSE_crude/mean((SECF2_medHeur - 1)^2), type = "SECF_medHeur")) results <- rbind(results,data.frame(N=N, order = "2", efficiency = MSE_crude/mean((aSECF2_medHeur - 1)^2), type = "aSECF_medHeur")) } # print(N) } ## Not run: # Plotting results where cross-validation is used for kernel methods require(ggplot2) require(ggthemes) a <- ggplot(data=subset(results,!(type %in% c("CF_medHeur","SECF_medHeur", "aSECF_medHeur","SECF_medHeur","aSECF_medHeur"))), aes(x=N,y=efficiency,col=type,linetype=order)) + scale_color_pander() + ggtitle("") + geom_line(size=1.5) + scale_x_log10() + scale_y_log10() + annotation_logticks(base=10) + labs(x="N",y="Efficiency",color="Method", linetype="Polynomial Order") + theme_minimal(base_size = 15) + theme(legend.key.size = unit(0.5, "cm"),legend.key.width = unit(1, "cm")) + guides(linetype = guide_legend(override.aes = list(size=1),title.position = "top"), color = guide_legend(override.aes = list(size=1),title.position = "top")) print(a) # Plotting results where the median heuristic is used for kernel methods b <- ggplot(data=subset(results,!(type %in% c("CF","SECF","aSECF","SECF","aSECF"))), aes(x=N,y=efficiency,col=type,linetype=order)) + scale_color_pander() + ggtitle("") + geom_line(size=1.5) + scale_x_log10() + scale_y_log10() + annotation_logticks(base=10) + labs(x="N",y="Efficiency",color="Method", linetype="Polynomial Order") + theme_minimal(base_size = 15) + theme(legend.key.size = unit(0.5, "cm"),legend.key.width = unit(1, "cm")) + guides(linetype = guide_legend(override.aes = list(size=1),title.position = "top"), color = guide_legend(override.aes = list(size=1),title.position = "top")) print(b) ## End(Not run)
# A real data example using ZV-CV is available at \link{VDP}. # This involves estimating posterior expectations and the evidence from SMC samples. # The remainder of this section is duplicating (albeit with a different random # seed) Figure 2a of South et al. (2020). N_repeats <- 2 # For speed, the actual code uses 100 N_all <- 25 # For speed, the actual code uses c(10,25,50,100,250,500,1000) sigma_list <- list(10^(-1.5),10^(-1),10^(-0.5),1,10^(0.5),10) nfolds <- 4 # For speed, the actual code uses 10 folds <- 2 # For speed, the actual code uses 5 d <- 4 integrand_fn <- function(x){ return (1 + x[,2] + 0.1*x[,1]*x[,2]*x[,3] + sin(x[,1])*exp(-(x[,2]*x[,3])^2)) } results <- data.frame() for (N in N_all){ # identify the largest polynomial order that can be fit without regularisation for auto ZV-CV max_r <- 0 while (choose(d + max_r + 1,d)<((folds-1)/folds*N)){ max_r <- max_r + 1 } MC <- ZV1 <- ZV2 <- ZVchoose <- rep(NaN,N_repeats) CF <- SECF1 <- aSECF1 <- SECF2 <- aSECF2 <- rep(NaN,N_repeats) CF_medHeur <- SECF1_medHeur <- aSECF1_medHeur <- rep(NaN,N_repeats) SECF2_medHeur <- aSECF2_medHeur <- rep(NaN,N_repeats) for (i in 1:N_repeats){ x <- matrix(rnorm(N*d),ncol=d) u <- -x f <- integrand_fn(x) MC[i] <- mean(f) ZV1[i] <- zvcv(f,x,u,options=list(polyorder=1,regul_reg=FALSE))$expectation # Checking if the sample size is large enough to accommodation a second order polynomial if (N > choose(d+2,d)){ ZV2[i] <- zvcv(f,x,u,options=list(polyorder=2,regul_reg=FALSE))$expectation } myopts <- list(list(polyorder=Inf,regul_reg=FALSE,polyorder_max=max_r), list(polyorder=Inf,nfolds=nfolds)) ZVchoose[i] <- zvcv(f,x,u,options=myopts,folds = folds)$expectation # Calculating the kernel matrix in advance for CF and SECF K0_list <- list() for (j in 1:length(sigma_list)){ K0_list[[j]] <- K0_fn(x,u,sigma_list[[j]],steinOrder=2,kernel_function="RQ") } CF[i] <- CF_crossval(f,x,u,K0_list=K0_list,folds = folds)$expectation SECF1[i] <- SECF_crossval(f,x,u,K0_list=K0_list,folds = folds)$expectation aSECF1[i] <- aSECF_crossval(f,x,u,steinOrder=2,kernel_function="RQ", sigma_list=sigma_list,reltol=1e-05,folds = folds)$expectation if (max_r>=2){ SECF2[i] <- SECF_crossval(f,x,u,polyorder=2,K0_list=K0_list,folds = folds)$expectation aSECF2[i] <- aSECF_crossval(f,x,u,polyorder=2,steinOrder=2,kernel_function="RQ", sigma_list=sigma_list,reltol=1e-05,folds = folds)$expectation } medHeur <- medianTune(x) K0_medHeur <- K0_fn(x,u,medHeur,steinOrder=2,kernel_function="RQ") CF_medHeur[i] <- CF(f,x,u,K0=K0_medHeur)$expectation SECF1_medHeur[i] <- SECF(f,x,u,K0=K0_medHeur)$expectation aSECF1_medHeur[i] <- aSECF(f,x,u,steinOrder=2,kernel_function="RQ", sigma=medHeur,reltol=1e-05)$expectation if (max_r>=2){ SECF2_medHeur[i] <- SECF(f,x,u,polyorder=2,K0=K0_medHeur)$expectation aSECF2_medHeur[i] <- aSECF(f,x,u,polyorder=2,steinOrder=2,kernel_function="RQ", sigma=medHeur,reltol=1e-05)$expectation } # print(sprintf("--%d",i)) } # Adding the results to a data frame MSE_crude <- mean((MC - 1)^2) results <- rbind(results,data.frame(N=N, order = "1 or NA", efficiency = 1, type = "MC")) results <- rbind(results,data.frame(N=N, order = "1 or NA", efficiency = MSE_crude/mean((ZV1 - 1)^2), type = "ZV")) results <- rbind(results,data.frame(N=N, order = "2", efficiency = MSE_crude/mean((ZV2 - 1)^2), type = "ZV")) results <- rbind(results,data.frame(N=N, order = "1 or NA", efficiency = MSE_crude/mean((ZVchoose - 1)^2), type = "ZVchoose")) results <- rbind(results,data.frame(N=N, order = "1 or NA", efficiency = MSE_crude/mean((CF - 1)^2), type = "CF")) results <- rbind(results,data.frame(N=N, order = "1 or NA", efficiency = MSE_crude/mean((SECF1 - 1)^2), type = "SECF")) results <- rbind(results,data.frame(N=N, order = "1 or NA", efficiency = MSE_crude/mean((aSECF1 - 1)^2), type = "aSECF")) if (((folds-1)/folds*N) > choose(d+2,d)){ results <- rbind(results,data.frame(N=N, order = "2", efficiency = MSE_crude/mean((SECF2 - 1)^2), type = "SECF")) results <- rbind(results,data.frame(N=N, order = "2", efficiency = MSE_crude/mean((aSECF2 - 1)^2), type = "aSECF")) } results <- rbind(results,data.frame(N=N, order = "1 or NA", efficiency = MSE_crude/mean((CF_medHeur - 1)^2), type = "CF_medHeur")) results <- rbind(results,data.frame(N=N, order = "1 or NA", efficiency = MSE_crude/mean((SECF1_medHeur - 1)^2), type = "SECF_medHeur")) results <- rbind(results,data.frame(N=N, order = "1 or NA", efficiency = MSE_crude/mean((aSECF1_medHeur - 1)^2), type = "aSECF_medHeur")) if (((folds-1)/folds*N) > choose(d+2,d)){ results <- rbind(results,data.frame(N=N, order = "2", efficiency = MSE_crude/mean((SECF2_medHeur - 1)^2), type = "SECF_medHeur")) results <- rbind(results,data.frame(N=N, order = "2", efficiency = MSE_crude/mean((aSECF2_medHeur - 1)^2), type = "aSECF_medHeur")) } # print(N) } ## Not run: # Plotting results where cross-validation is used for kernel methods require(ggplot2) require(ggthemes) a <- ggplot(data=subset(results,!(type %in% c("CF_medHeur","SECF_medHeur", "aSECF_medHeur","SECF_medHeur","aSECF_medHeur"))), aes(x=N,y=efficiency,col=type,linetype=order)) + scale_color_pander() + ggtitle("") + geom_line(size=1.5) + scale_x_log10() + scale_y_log10() + annotation_logticks(base=10) + labs(x="N",y="Efficiency",color="Method", linetype="Polynomial Order") + theme_minimal(base_size = 15) + theme(legend.key.size = unit(0.5, "cm"),legend.key.width = unit(1, "cm")) + guides(linetype = guide_legend(override.aes = list(size=1),title.position = "top"), color = guide_legend(override.aes = list(size=1),title.position = "top")) print(a) # Plotting results where the median heuristic is used for kernel methods b <- ggplot(data=subset(results,!(type %in% c("CF","SECF","aSECF","SECF","aSECF"))), aes(x=N,y=efficiency,col=type,linetype=order)) + scale_color_pander() + ggtitle("") + geom_line(size=1.5) + scale_x_log10() + scale_y_log10() + annotation_logticks(base=10) + labs(x="N",y="Efficiency",color="Method", linetype="Polynomial Order") + theme_minimal(base_size = 15) + theme(legend.key.size = unit(0.5, "cm"),legend.key.width = unit(1, "cm")) + guides(linetype = guide_legend(override.aes = list(size=1),title.position = "top"), color = guide_legend(override.aes = list(size=1),title.position = "top")) print(b) ## End(Not run)