In a data set of a thousand or more cases, I’ve been told to conduct a test once, and once only, and to report the result of that test as the definitive result of the study. Apparently, conducting the test several times on different randomly drawn subsamples of the data set and checking whether the result replicates is unscientific.

I think what’s unscientific is refusing to acknowledge that a test result doesn’t replicate across randomly drawn subsamples when it indeed doesn’t. You’re insisting that something you chanced upon in the first instance is gospel, to however small an extent, when you have the means to show that it isn’t. But you’re scared to look, because science dictates you can’t.

That sucks.

And however much traditional science embraces an epistemology of empiricism, insofar as that empiricism depends on statistics to show that effects exist in the world, there will always be an intellectual community that treats the single-test philosophy as the only legitimate way to report an effect. And so, for many data sets, we can never really know whether such effects hold up, by way of replication, across multiple randomly drawn subsamples.

Well, that’s exactly what I’ve just done, and the results are variable. What you find first isn’t always the right answer.
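For the curious, a minimal sketch of the exercise. Everything here is illustrative, not my actual data: the group names, sample sizes, and effect size are made up, the test is Welch’s t with a normal approximation to the p-value (reasonable at these sample sizes), and the subsampling scheme is simply half-size draws without replacement.

```python
import math
import random
import statistics

def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))

def p_value(t):
    """Two-sided p-value via the normal approximation (fine for large n)."""
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(t) / math.sqrt(2.0))))

random.seed(1)
# Simulated data set: a small true difference buried in noise.
group_a = [random.gauss(0.00, 1.0) for _ in range(600)]
group_b = [random.gauss(0.15, 1.0) for _ in range(600)]

# The "one test, once only" result on the full data set.
full_p = p_value(welch_t(group_a, group_b))

# The same test on 20 randomly drawn half-size subsamples.
sub_ps = []
for _ in range(20):
    sa = random.sample(group_a, 300)
    sb = random.sample(group_b, 300)
    sub_ps.append(p_value(welch_t(sa, sb)))

significant = sum(p < 0.05 for p in sub_ps)
print(f"full-sample p = {full_p:.3f}")
print(f"subsamples significant at .05: {significant}/20")
```

Run it a few times with different seeds and you’ll see the point: the full-sample verdict is one number, while the subsample p-values scatter on both sides of .05.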

Rather, ‘truth’, insofar as science can ascertain it, should be built on repeated replication of an effect, not on a single instance of it. If an effect shows up in a data set only once, I’m just not inclined to believe it’s really there.

Or maybe I’ve just been staring at my data for too long.