Friday, October 13, 2006
Iraqi Death Survey Part VI
The population data used to determine the number of clusters per Governorate, to set the probability of a particular Administrative Region's selection within the Governorate, and to extrapolate to the final 600,000+ figure all come from 2004 estimates of the Iraqi population.
There are several things to think about here.
First of all, how accurate are 2004 population estimates likely to be to begin with? Census data from the Saddam era has every reason to be suspect, and insofar as the 2004 estimates base their calculations on pre-2003 census data, they are likely to be flawed. Moreover, independent estimates made in 2004 are likely to suffer from many of the same uncertainties present in this study, namely logistical difficulties in conducting surveys, imprecise administrative records and the like. There's simply no way to know how good these 2004 estimates are, and every reason to believe that they are rough estimates at best.
Be that as it may, and assuming for the moment that they are accurate, what effects do massive displacement since that time have on the extrapolated numbers?
According to official reports, over 180,000 internally displaced refugees were reported just between the months of February and June of 2006. Those who did not register undoubtedly push the number much higher. As I pointed out below, the survey methodology means that these displaced refugees had very little chance of being surveyed. But in addition to that, their migration is sure to skew the analysis of the data.
The authors acknowledge as much in their paper:
"The population data used for cluster selection were at least 2 years old, and if populations subsequently migrated from areas of high mortality to those with low mortality, the sample might have over-represented the high-mortality
Well, not just over-represented in sampling, but also over-estimated in projections.
In addition, of course, emigration from Iraq entirely would cause the estimates to be overstated by a corresponding amount (if the population were only 22.5 million, rather than 25 million, for example, then the true extrapolated estimate would have to be revised downward by 10%).
The authors, however, also make this very misleading statement about internal migration:
“internal population movement would be less likely to affect results appreciably [than emigration from Iraq.]”
Perhaps less likely, but the effect could be considerable. Consider the following simplified (and exaggerated) example:
Suppose there are 2 regions, each with population = 1 million in 2004, and suppose that, from 2004 to 2006 one of the regions is subjected to extreme violence, while the other is not. Suppose that this causes 50% of the population (500,000) to move from the region of extreme violence to the region without.
Now, suppose a survey is done in the two regions, where we find that the violent death toll in the war-torn region is 10 per 1000, while in the more peaceful region it is 2 per 1000 (in the latest 1 year period.)
Assuming that the surveys are accurate, we would see that:
Actual deaths in war-torn region: 10 per 1000 x 500,000 = 5,000
Extrapolated deaths in war-torn region: 10 per 1000 x 1,000,000 = 10,000
Actual deaths in more peaceful region: 2 per 1000 x 1,500,000 = 3,000
Extrapolated deaths in more peaceful region: 2 per 1000 x 1,000,000 = 2,000
Total actual deaths: 8,000
Total extrapolated deaths: 12,000
Difference: 4,000 deaths, or an over-estimate of 50%.
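For anyone who wants to check the arithmetic or try other numbers, here is a minimal sketch of the same two-region example in Python. The function name and its parameters are my own for illustration, and the figures are the hypothetical ones used above, not survey data.

```python
# Two-region migration example; all figures are hypothetical.
# Death rates are given per 1,000 people so the arithmetic stays in integers.

def deaths(rate_per_1000, population):
    """Deaths implied by a rate per 1,000 applied to a population."""
    return rate_per_1000 * population // 1000

def death_totals(pop_2004, moved, rate_violent, rate_peaceful):
    """Return (actual, extrapolated) deaths for the two regions combined.

    actual:       surveyed rates applied to the 2006 populations (after migration)
    extrapolated: the same rates applied to the outdated 2004 populations
    """
    actual = (deaths(rate_violent, pop_2004 - moved)
              + deaths(rate_peaceful, pop_2004 + moved))
    extrapolated = (deaths(rate_violent, pop_2004)
                    + deaths(rate_peaceful, pop_2004))
    return actual, extrapolated

actual, extrapolated = death_totals(
    pop_2004=1_000_000,   # each region's 2004 population
    moved=500_000,        # people who left the war-torn region
    rate_violent=10,      # deaths per 1,000 in the war-torn region
    rate_peaceful=2,      # deaths per 1,000 in the more peaceful region
)
overestimate = (extrapolated - actual) / actual

print(actual, extrapolated, f"{overestimate:.0%}")
```

Setting `moved` to a smaller number shows how quickly the over-estimate shrinks as the migration becomes less extreme.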
Now, the true changes from 2004 to 2006 are liable to be far less than the 50% in the example. However, there is also the multiplier effect of overestimation in sampling combined with overestimation in extrapolation:
Suppose that a city had a population of 50,000 in 2004, but that a flare-up of violence drove half its people out by 2006. (This is not at all unlikely: there are reports of entire cities becoming ghost towns overnight due to the actions of the various militias and insurgent groups. See, for example, Fallujah.) Even though its actual 2006 population was 25,000, it would be twice as likely to be selected as its current population warrants. And, having suffered this tremendous out-migration, it would be far likelier to be in a very violent area, contributing higher-than-representative death rates, which in turn get multiplied by a higher-than-correct population factor.
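The "twice the likelihood" claim can be sketched the same way, assuming clusters are drawn with probability proportional to population. The governorate total below is a made-up figure, and it is held fixed on the assumption that the people who left stayed within the governorate.

```python
from fractions import Fraction

def selection_probability(area_population, total_population):
    """Chance that a single population-proportional draw lands on the area."""
    return Fraction(area_population, total_population)

GOVERNORATE_TOTAL = 1_000_000   # hypothetical, assumed unchanged since 2004

p_used = selection_probability(50_000, GOVERNORATE_TOTAL)     # 2004 figure
p_current = selection_probability(25_000, GOVERNORATE_TOTAL)  # 2006 figure

# How many times likelier the city is to be picked than it should be.
oversampling = p_used / p_current
print(oversampling)
```

If the departed residents left the governorate altogether, the total would shrink as well and the ratio would come out slightly under two.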
On the other hand, if it were in a very violent area, it's quite possible that the survey teams would simply have decided that it was too risky to get there and selected, at their whim, another, safer place. That, again, takes this survey completely out of the realm of statistical analysis, unless one psychoanalyzes the survey teams and attempts to massage the data in some way to compensate for their selection biases.
Again, the purpose of these analyses is not to show that the actual numbers are higher or lower than the survey's estimates, but rather to analyze the many flaws in both the methodology and interpretation of the survey which may lead it to be not especially meaningful.
Update: BBC has more on Iraqi displacement estimates here.
I think your point about the likelihood of the survey teams performing as they reported is worth exploring further.
I will post a link to your blog there, as it should be of interest to the discussion.