Bagging cross-validated bandwidth selection in nonparametric regression estimation with applications to large-sized samples

Barreiro-Ures, D.; Cao, R.; Francisco-Fernández, M.

arXiv:2105.04134 (stat)

COVID-19 e-print

Important: e-prints posted on arXiv are not peer-reviewed by arXiv; they should not be relied upon without context to guide clinical practice or health-related behavior and should not be reported in news media as established information without consulting multiple experts in the field.

[Submitted on 10 May 2021]

Abstract: Cross-validation is a well-known and widely used bandwidth selection method in nonparametric regression estimation. However, this technique has two notable drawbacks: (i) the large variability of the selected bandwidths, and (ii) the inability to provide results in a reasonable time for very large sample sizes. To overcome these problems, bagging cross-validation bandwidths are analyzed in this paper. This approach consists of computing the cross-validation bandwidths for a finite number of subsamples and then rescaling the averaged smoothing parameters to the original sample size. Under a random-design regression model, asymptotic expressions up to second order are obtained for the bias and variance of the leave-one-out cross-validation bandwidth for the Nadaraya-Watson estimator. Subsequently, the asymptotic bias and variance and the limit distribution are derived for the bagged cross-validation selector. Suitable choices of the number of subsamples and the subsample size lead to an $n^{-1/2}$ rate for the convergence in distribution of the bagging cross-validation selector, outperforming the rate $n^{-3/10}$ of leave-one-out cross-validation. Several simulations and an illustration on a real dataset related to the COVID-19 pandemic show the behavior of the proposal and its better performance, in terms of statistical efficiency and computing time, when compared to leave-one-out cross-validation.
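The bagging procedure described in the abstract (cross-validate on subsamples, average, rescale) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. Several choices here are assumptions that do not come from the abstract: the Gaussian kernel, the grid search over candidate bandwidths, subsampling without replacement, the simulated data, and the function names `loo_cv_score`, `cv_bandwidth` and `bagged_cv_bandwidth`. The rescaling factor $(r/n)^{1/5}$ is also an assumption, based on the standard $n^{-1/5}$ rate of the optimal Nadaraya-Watson bandwidth.

```python
import numpy as np


def loo_cv_score(x, y, h):
    """Leave-one-out CV criterion for the Nadaraya-Watson estimator.

    Assumes a Gaussian kernel (an illustrative choice).
    """
    diffs = (x[:, None] - x[None, :]) / h
    w = np.exp(-0.5 * diffs ** 2)
    np.fill_diagonal(w, 0.0)  # leave observation i out of its own fit
    denom = w.sum(axis=1)
    if np.any(denom <= 0):  # bandwidth too small: some point has no neighbours
        return np.inf
    pred = (w @ y) / denom
    return np.mean((y - pred) ** 2)


def cv_bandwidth(x, y, grid):
    """Bandwidth on `grid` that minimises the leave-one-out CV criterion."""
    scores = [loo_cv_score(x, y, h) for h in grid]
    return grid[int(np.argmin(scores))]


def bagged_cv_bandwidth(x, y, r, n_subsamples, grid, rng):
    """Average CV bandwidths over subsamples of size r, rescaled to size n."""
    n = len(x)
    hs = []
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=r, replace=False)  # assumption: no replacement
        hs.append(cv_bandwidth(x[idx], y[idx], grid))
    # Assumed rescaling: optimal bandwidths shrink like (sample size)^(-1/5)
    return (r / n) ** 0.2 * np.mean(hs)


# Simulated example (hypothetical data, for illustration only)
rng = np.random.default_rng(0)
n, r, n_subsamples = 2000, 200, 20
x = rng.uniform(0.0, 1.0, size=n)
y = np.sin(2.0 * np.pi * x) + rng.normal(scale=0.3, size=n)
grid = np.geomspace(0.01, 1.0, 40)

h_bag = bagged_cv_bandwidth(x, y, r, n_subsamples, grid, rng)
print(f"bagged CV bandwidth: {h_bag:.4f}")
```

Each subsample costs on the order of $r^2$ kernel evaluations per candidate bandwidth rather than $n^2$, which is where the computational saving for large $n$ comes from in this sketch. Averaging over subsamples is what is meant to reduce the variability of the selected bandwidth.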

Subjects:	Methodology (stat.ME); Statistics Theory (math.ST); Applications (stat.AP); Computation (stat.CO)
Cite as:	arXiv:2105.04134 [stat.ME]
	(or arXiv:2105.04134v1 [stat.ME] for this version)
	https://doi.org/10.48550/arXiv.2105.04134

Submission history

From: Daniel Barreiro Ures
[v1] Mon, 10 May 2021 06:31:37 UTC (6,276 KB)
