Strategic sample selection

Di Tillio, Alfredo; Ottaviani, Marco; Sørensen, Peter Norman

Are the highest sample realizations selected from a larger presample more or less informative than the same amount of random data? Developing multivariate accuracy for interval dominance ordered preferences, we show that sample selection always benefits (respectively, always harms) a decision maker if the reverse hazard rate of the data distribution is log-supermodular (respectively, log-submodular), as in location experiments with normal noise. We find nonpathological conditions under which the information contained in the winning bids of a symmetric auction decreases in the number of bidders. Exploiting extreme value theory, we quantify the limiting amount of information revealed as the presample size (the number of bidders) goes to infinity. In a model of equilibrium persuasion with costly information, we derive implications for the optimal design of selected experiments when selection is made by an examinee, by a biased researcher, or by contending sides holding the right of peremptory challenge to eliminate a number of jurors.
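The normal-noise case can be checked numerically. In a location experiment with data x ~ N(θ, 1), the reverse hazard rate is r(x | θ) = φ(x − θ)/Φ(x − θ), and log-supermodularity means that the cross partial ∂²log r/∂x∂θ is nonnegative. Writing z = x − θ and λ = φ/Φ, a short calculation gives this cross partial as 1 − λ(z)(z + λ(z)), which is the variance of a standard normal truncated above at z and is therefore positive. The sketch below is a minimal illustration that assumes unit noise variance. It compares a finite-difference estimate with that closed form on a grid. The function names and the grid are hypothetical choices for this example, not taken from the paper.

```python
import math


def std_normal_pdf(z: float) -> float:
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z: float) -> float:
    """Standard normal CDF Phi(z); the erfc form stays accurate in the lower tail."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def log_reverse_hazard(x: float, theta: float) -> float:
    """log r(x | theta) = log f(x - theta) - log F(x - theta) for N(theta, 1) data (assumed noise model)."""
    z = x - theta
    return math.log(std_normal_pdf(z)) - math.log(std_normal_cdf(z))


def cross_partial(x: float, theta: float, h: float = 1e-3) -> float:
    """Central finite-difference estimate of d^2/(dx dtheta) log r(x | theta)."""
    return (
        log_reverse_hazard(x + h, theta + h)
        - log_reverse_hazard(x + h, theta - h)
        - log_reverse_hazard(x - h, theta + h)
        + log_reverse_hazard(x - h, theta - h)
    ) / (4.0 * h * h)


def truncated_variance(z: float) -> float:
    """Variance of N(0, 1) truncated to (-inf, z]: 1 - lam * (z + lam), with lam = phi(z) / Phi(z)."""
    lam = std_normal_pdf(z) / std_normal_cdf(z)
    return 1.0 - lam * (z + lam)


# Hypothetical grid of (x, theta) pairs in [-4, 4] x [-4, 4], so z = x - theta ranges over [-8, 8].
grid = [i / 2.0 for i in range(-8, 9)]
results = [(x, t, cross_partial(x, t)) for x in grid for t in grid]

# Log-supermodularity on the grid: every cross partial should be strictly positive.
all_positive = all(c > 0 for _, _, c in results)
# Largest discrepancy between the numerical cross partial and the closed form.
max_gap = max(abs(c - truncated_variance(x - t)) for x, t, c in results)

print(all_positive, max_gap)
```

Replacing `log_reverse_hazard` with the reverse hazard rate of another noise family and rerunning the grid check is one rough way to probe the sign condition for that family: a clearly negative cross partial at some grid point would indicate that log-supermodularity fails there, although a grid check alone cannot establish it everywhere.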