Higher Criticism for Large-Scale Inference, Especially for Rare and Weak Effects

Donoho, David; Jin, Jiashun

doi:10.1214/14-STS506


arXiv:1410.4743 (math)

[Submitted on 17 Oct 2014 (v1), last revised 10 Apr 2015 (this version, v2)]


Abstract: In modern high-throughput data analysis, researchers perform a large number of statistical tests, expecting to find perhaps a small fraction of significant effects against a predominantly null background. Higher Criticism (HC) was introduced to determine whether there are any nonzero effects; more recently, it has been applied to feature selection, where it provides a method for selecting useful predictive features from a large body of potentially useful features, among which only a rare few will prove truly useful. In this article, we review the basics of HC in both the testing and feature selection settings. HC is a flexible idea that adapts easily to new situations; we point out simple adaptations to clique detection and bivariate outlier detection. HC, although still early in its development, is seeing increasing interest from practitioners; we illustrate this with worked examples. HC is computationally efficient, which gives it useful leverage in the increasingly relevant "Big Data" settings we see today. We also review the theoretical "ideology" underlying HC. The Rare/Weak (RW) model is a theoretical framework that simultaneously controls the size and prevalence of useful/significant items among the useless/null bulk. The RW model shows that HC has important advantages over better-known procedures such as False Discovery Rate (FDR) control and Family-Wise Error Rate (FWER) control, in particular certain optimality properties. We discuss the rare/weak phase diagram, which clearly visualizes the class of RW settings where the true signals are so rare or so weak that detection and feature selection are simply impossible, and which offers a way to understand the known optimality properties of HC.
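The abstract describes HC and the Rare/Weak model only in words. The sketch below (Python with NumPy) writes out the commonly used form of the HC statistic computed from sorted p-values, a simulation from an RW mixture with eps = n^(-beta) and tau = sqrt(2 r log n), and the standard detection boundary rho*(beta) that separates the detectable and undetectable regions of the phase diagram. The function names (hc_statistic, simulate_rw, one_sided_pvalues, detection_boundary), the default alpha0 = 0.5, the one-sided Gaussian p-values, and the clipping guard are illustrative assumptions, not the article's own code.

```python
import math

import numpy as np


def hc_statistic(pvalues, alpha0=0.5):
    """Higher Criticism statistic from a vector of p-values (hypothetical helper).

    Sorts the p-values p_(1) <= ... <= p_(n) and returns
        max over 1 <= i <= alpha0 * n of
        sqrt(n) * (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i))).
    alpha0 = 0.5 is an assumed default, not a value taken from the article.
    """
    p = np.sort(np.asarray(pvalues, dtype=float))
    n = p.size
    # Numerical guard (an assumption): keep p strictly inside (0, 1)
    # so the standardizing denominator is never zero.
    p = np.clip(p, 1e-15, 1 - 1e-15)
    i = np.arange(1, n + 1)
    z = np.sqrt(n) * (i / n - p) / np.sqrt(p * (1 - p))
    k = max(1, int(alpha0 * n))
    return float(np.max(z[:k]))


def simulate_rw(n, beta, r, rng):
    """Draw n observations from a Rare/Weak mixture (hypothetical helper).

    Each X_i is N(0, 1), except that with probability eps = n**(-beta)
    its mean is shifted to tau = sqrt(2 * r * log(n)).
    """
    eps = n ** (-beta)
    tau = math.sqrt(2 * r * math.log(n))
    x = rng.standard_normal(n)
    is_signal = rng.random(n) < eps
    x[is_signal] += tau
    return x


def one_sided_pvalues(x):
    """P(N(0, 1) > x_i) for each observation, via the complementary error function."""
    erfc = np.vectorize(math.erfc)
    return 0.5 * erfc(np.asarray(x, dtype=float) / math.sqrt(2))


def detection_boundary(beta):
    """Detection boundary rho*(beta) of the RW phase diagram, for 1/2 < beta < 1.

    Detection is possible (and HC attains it) when r > rho*(beta),
    and impossible when r < rho*(beta).
    """
    if 0.5 < beta <= 0.75:
        return beta - 0.5
    if 0.75 < beta < 1:
        return (1 - math.sqrt(1 - beta)) ** 2
    raise ValueError("beta must lie in (1/2, 1)")


rng = np.random.default_rng(0)
n = 100_000
hc_null = hc_statistic(one_sided_pvalues(rng.standard_normal(n)))
hc_alt = hc_statistic(one_sided_pvalues(simulate_rw(n, beta=0.6, r=0.5, rng=rng)))
print(f"HC under the null: {hc_null:.2f}, under RW(beta=0.6, r=0.5): {hc_alt:.2f}")
print(f"rho*(0.6) = {detection_boundary(0.6):.2f}")
```

HC standardizes the gap between the empirical distribution of the p-values and the uniform distribution at every threshold up to alpha0 * n, so it responds to a small excess of very small p-values wherever that excess appears. With beta = 0.6 and r = 0.5 the simulated setting lies above the boundary rho*(0.6) = 0.1, so hc_alt should typically be much larger than hc_null.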

Comments:	Published in Statistical Science by the Institute of Mathematical Statistics
Subjects:	Statistics Theory (math.ST)
Report number:	IMS-STS-STS506
Cite as:	arXiv:1410.4743 [math.ST]
	(or arXiv:1410.4743v2 [math.ST] for this version)
	https://doi.org/10.48550/arXiv.1410.4743
Journal reference:	Statistical Science 2015, Vol. 30, No. 1, 1-25
Related DOI:	https://doi.org/10.1214/14-STS506

Submission history

From: David Donoho [via VTEX proxy]
[v1] Fri, 17 Oct 2014 14:43:38 UTC (1,345 KB)
[v2] Fri, 10 Apr 2015 11:43:53 UTC (807 KB)