The reusable holdout: preserving validity in adaptive data analysis

22:10 06.08.2015
Posted by Moritz Hardt, Research Scientist Machine learning and statistical analysis play an important role at the forefront of scientific and technological progress. But with all data analysis, there is a danger that findings observed in a particular sample do not generalize to the underlying population from which the data were drawn. A popular XKCD cartoon illustrates that if you test sufficiently many different colors of jelly beans for correlation with acne,...
  1,296