an interesting exercise
This is an interesting exercise in statistical analysis. In cows, the incidence of polymelia (being born with extra limbs) is about 16,000 times higher than polycephaly (multiple heads), which might lead one to conclude that counting heads is a more reliable metric... https://t.co/ryS4FrdrxT HOWEVER, both are rare, with the former happening in approximately 1 in 25,000 births, and the latter 1 in 400,000,000.

We also need to factor in the incidence of cows which have FEWER limbs. I can't find statistics on the incidence of phocomelia (congenitally missing limbs) in cows. This leads me to assume that it is vanishingly rare and/or has such a low survival rate (naturally or because of human intervention) that we can consider it as irrelevant to herd statistics as cows with missing heads, i.e. there wouldn't be any at all, because... how?

What is NOT statistically irrelevant is cows losing limbs after birth. Injury, disease, and especially poor living conditions all contribute to loss of limb, and this varies tremendously across time, geography, practices, and industry (beef vs. dairy). I've seen estimates of 8%, 50%, <1%, etc. All over the map, so not especially useful for getting hard data. However, what is absolutely clear is that acquired loss of limbs FAR outweighs the impact of congenital effects.

In other words, counting limbs will always give you a statistically significant undercount of herd size. If the incidence is around 1 missing limb per 100 cows at any given moment, this so-called "fastest cow counter" will consistently report 399 cows for every 400.

Now, why did I go through all of this blather? What's the broader message?
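The undercount arithmetic can be sketched in a few lines. This is a toy model under my own simplifying assumptions: every affected cow is missing exactly one limb, and congenital extra/missing limbs are treated as negligible, per the reasoning above.

```python
def limb_count_estimate(herd_size: int, missing_limb_rate: float) -> float:
    """Estimate herd size by counting limbs and dividing by 4.

    Toy model: assumes each affected cow is missing exactly one limb,
    and ignores congenital polymelia/phocomelia as negligibly rare.
    """
    affected = herd_size * missing_limb_rate   # cows missing one limb each
    total_limbs = herd_size * 4 - affected     # limbs actually countable
    return total_limbs / 4

# With 1 in 100 cows missing a limb, a 400-cow herd is reported as 399.
print(limb_count_estimate(400, 0.01))  # → 399.0
```

A 1% incidence may sound trivial, but as a systematic bias it never averages out: every count comes in low, no matter how many times you repeat it.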
Well, mostly I'm just being silly, obviously.
But there actually is an important lesson here about how easily statistical measures are skewed when outliers go unconsidered. All too often, we hear top-level statistics reported in a vacuum, and we just assume all appropriate respect was given to the analytical process -- the data collection methods, the handling of confounding variables, the usefulness of the reporting format, etc.

Population-level studies are notoriously difficult, both in terms of raw data collection (how do you even find the people you're looking for? how do you avoid sampling bias?) and in terms of interpreting the results and drawing useful, meaningful conclusions. Even the process of compensating for variability and establishing error rates relies on other statistical information, which can itself be biased. And the very questions you ask in the first place structure EVERYTHING about your study, possibly making it useless from the start.

I'm not saying "trust nothing and no one!" But understand that there is nothing sacred about "hard data."
If you understand what you're doing, you are totally free to pick apart sources and methodologies, even if the source is Official™️ and Respected™️.