Thursday, November 02, 2006

 

Algebra Problems From 1822

I collect old math textbooks as a hobby, and have a fairly extensive assortment of Algebra, Probability and Statistics, and Calculus texts from 1820 - 1930.

In terms of no-nonsense brevity, my favorite is Percey Smith's "Elementary Calculus" from 1902: a complete one-semester Calculus course, with many exercises, plus a nice treatment of partial derivatives, in 89 pages of a 5" x 7" book. Most current Calculus texts take over 100 jumbo-sized pages just to get through the "preliminaries" chapter.

My newest acquisition is a copy of John Bonnycastle's "An Introduction to Algebra." The British edition came out in 1815; mine is a New York edition from "the twenty-eighth day of December in the forty-sixth year of the Independence of the United States of America," which, by my count, would be 1822.

From Bonnycastle, here are some of my favorite "Miscellaneous Questions producing Simple Equations:"

3. A shepherd being asked how many sheep he had in his flock, said, "If I had as many more, half as many more, and 7 sheep and a half, I should have just 500;" how many had he?

4. A post is one fourth of its length in the mud, one third in the water, and 10 feet above the water, what is its whole length?

14. A servant agreed to live with his master for 8 pounds a year, and a livery, but was turned away at the end of seven months and received only 2 pounds, 13 shillings and 4 pence and his livery; what was its value?

19. A person at a tavern borrowed as much money as he had about him, and out of the whole spent 1 shilling; he then went to a second tavern, where he also borrowed as much as he had now about him, and out of the whole spent 1 shilling; and going on, in this manner, to a third and a fourth tavern, he found, after spending his shilling at the latter, that he had nothing left; how much money had he at first?

27. A person has two horses, and a saddle, which itself is worth 50 pounds; now, if the saddle be put on the back of the first horse, it will make his value double that of the second, and if it be put on the back of the second, it will make his value triple that of the first; what is the value of each horse?

(My favorite) 34. A man and his wife usually drank out a cask of beer in 12 days, but when the man was from home it lasted the woman 30 days; how many days would the man alone be in drinking it?
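(Spoiler, for anyone who wants to check their answer to that last one; this is the standard rate argument, in my notation rather than Bonnycastle's:

Together, the couple empties 1/12 of the cask per day; the wife alone empties 1/30 per day. The man alone therefore drinks

1/12 - 1/30 = 5/60 - 2/60 = 3/60 = 1/20

of the cask per day, so he would be 20 days in drinking it.)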

and "Questions producing Quadratic Equations:"

13. A grazier bought as many sheep as cost him 60 pounds, and after reserving 15 out of their number, sold the remainder for 54 pounds, and gained 2 shillings a head by them; how many sheep did he buy?

16. A company at a tavern had 8 pounds, 15 shillings to pay for their reckoning; but before the bill was settled, two of them went away; in consequence of which those who remained had 10 shillings a piece more to pay than before; how many were there in the company?

21. Two detachments of foot being ordered to a station at the distance of 39 miles from their present quarters, begin their march at the same time; but one party arrive there an hour sooner; required their rates of marching?

And, from a nice section on "Indeterminate Analysis," involving Euclid's Algorithm and "Tower of Hanoi" type problems (no, he doesn't use these terms!):

10. I owe my friend a shilling, and have nothing about me but guineas, and he has nothing but louis d'ors; how must I contrive to acquit myself of the debt? (Fortunately, he adds the note that the louis are valued at 17 shillings a piece, and the guineas at 21 shillings.)
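(A spoiler sketch for that one, too, in case anyone wants to check: the problem amounts to finding small positive integers g and l with 21g - 17l = 1, that is, pay g guineas and take l louis d'ors in change. The little brute-force search below is my own illustration, not Bonnycastle's presentation, which, per the section title, would proceed by Euclid's Algorithm.)

# Spoiler sketch for problem 10 (my own brute force, not Bonnycastle's
# method): find the smallest positive integers g, l with
#   21*g - 17*l = 1,
# i.e., pay g guineas (21 shillings each) and take l louis d'ors
# (17 shillings each) in change, settling a 1-shilling debt.

def settle_debt(paid=21, change=17, debt=1):
    for g in range(1, change + 1):   # since gcd(21, 17) = 1, a solution appears within 17 tries
        if (paid * g - debt) % change == 0:
            return g, (paid * g - debt) // change
    return None

print(settle_debt())   # (13, 16): 13 * 21 = 273; 16 * 17 = 272; difference 1 shilling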

Notropis


Wednesday, October 18, 2006

 

Iraqi Death Survey Wrap-Up

After re-reading the articles by Steven E. Moore in the Wall Street Journal and by Iraq Body Count, as well as perusing the comments on many other blogs, I’ve decided that one last post might be in order on this topic. I’ll break my observations down into three broad areas:


1. The construction of the survey
2. The conduct of the survey
3. The analysis of the results


(I also add, here, that Iraq Body Count's criticism of the results of the study, based on what I would describe as face validity, seems to me to be very compelling. I won't address those issues, as I'm in no way competent to offer an analysis that could compete with that of Iraq Body Count.)


1. The construction of the survey:

A) Steven Moore makes much of the fact that 47 clusters were used, arguing that this is far too small a number, given the extremely non-uniform distribution of violent deaths in Iraq. He may well be right; he’s certainly more experienced in surveying techniques than am I. Several opposing voices have pointed out that Mr. Moore himself has used only 75 clusters in similar situations, and that others have used 150, or whatever. This sort of analysis quickly gets out of the realm of statistics and into polemics and name-calling. My problem with the number of clusters has to do with the assumed stratification of the population.


This survey was, at the top level, a stratified survey. Iraq was divided into its Governorates, and the number of clusters chosen per Governorate was decided by population. Evidently, the authors had reason to believe that there might be significant differences in death rates between Governorates (which was confirmed by their own results.) Unfortunately, in all but two Governorates, three or fewer clusters were selected from that Governorate. In several cases, only a single cluster was selected. How can one possibly control for the possibility of getting a very unrepresentative cluster, when a sample of a single cluster is used? The authors say that they did comparisons between clusters and within clusters. Within clusters, I’ll grant you. But between clusters? Evidently between clusters from different strata. This makes no sense. If you stratify a population, it is because you are assuming, a priori, that there may be significant differences between strata (Governorates.) You can’t then turn around and compare between strata to attempt to identify, or compensate for, a single cluster as being representative or unrepresentative of that particular stratum. In the famous words of Kwame Nkrumah upon his removal by coup as the first President of Ghana, “You can’t compare one thing.”


When a stratified sampling is used, it is common practice to use a large enough sample to get several draws (even if each draw consists of a cluster of individual samples) from each stratum.
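To make the problem concrete, here's a small simulation sketch, with invented numbers (not the study's data), of how badly a one-cluster-per-stratum estimate can behave when clusters within the stratum genuinely differ:

import random

random.seed(1)

# A hypothetical governorate: most clusters quiet, a couple very
# violent (all numbers invented, purely for illustration).
cluster_rates = [2] * 18 + [10, 40]        # deaths per 1000, per cluster
true_rate = sum(cluster_rates) / len(cluster_rates)

def estimate(n_clusters):
    # Estimate the governorate's rate from n_clusters randomly drawn clusters.
    draw = random.sample(cluster_rates, n_clusters)
    return sum(draw) / n_clusters

for n in (1, 3, 6):
    trials = [estimate(n) for _ in range(10_000)]
    rmse = (sum((t - true_rate) ** 2 for t in trials) / len(trials)) ** 0.5
    print(f"{n} cluster(s): typical error {rmse:4.1f} per 1000 (true rate {true_rate})")

# Worse: with a single cluster, the within-stratum variance can't even be
# estimated from the sample itself, so the error above stays invisible.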


B) The method of selecting named main streets, followed by named cross streets, is certainly not random, and quite possibly not representative.


I don’t know much about what proportion or which streets are named in Iraq. But I have had several experiences which lead me to question whether the distribution of officially named and recorded main streets and cross streets is uniform enough to use as a basis for a random selection procedure.


How many streets are named? In rural America, where I now live, the answer is “almost all.” But even ten years ago, the answer was “most in some places, none in others.” The change came about due to the 911 emergency calling system. Here in my town of 300, the locals laugh that a UPS guy can find your house from the address, but none of the citizens could. My official address is XYZ 3rd St. (has been officially so named for about 8 years), but if I want to tell anyone where I live, I have to say “the old Hoffman house.” I would suspect that much of Iraq still doesn’t have streets (main or cross) that would be listed in an official directory.


Back when I lived in Liberia, we conducted health and demographic surveys in conjunction with the national vaccination campaigns. I was just a foot soldier, working the villages of Grand Cape Mount County, and have no idea how the cluster selection process was done, but I can guarantee that it wasn’t done by street name. Outside of Robertsport, there wasn’t a named street in all of the county. At that time, I would guess that fewer than 50 communities in the entire nation had named streets, and that, even in the most advanced places, like Monrovia, fewer than 50% of the population lived on a named street, and those who did, were distinctly non-representative on many levels. On the urban side, the most densely populated part of Monrovia was an area called West Point. I lived there for about two months. I estimated the population at the time at about 30,000 – and there wasn’t a single street, named or otherwise, in the entire slum. I’m thinking Sadr City, Baghdad looks a lot like West Point, Monrovia in that respect.


My guess is that the systematic selection of only named cross streets to named main streets, as listed in an official directory, will systematically exclude broad segments of the Iraqi population, namely the rural, the urban poor, and the internally-displaced refugees. Whether this systematic bias skews the survey results up or down or is neutral, I don’t know, nor does anyone else (if they did, then why the hell would anyone be doing the survey?) The fact is that it’s bad statistical methodology to use a systematic selection method that consistently biases against particular large demographics.


Several posters to Tim Blair’s blog (see link below) brought up a further problem with the selection process. Although it’s a bit unclear from the description in the articles, it appears that all of the clusters were chosen from a named cross street, at a distance fairly close to a main street. The report says that “a house was chosen at random,” but does not specify how that random starting place was selected, or what the maximum distance from the main street the starting point would be. They (the blog posters) suggest that violence might be concentrated nearer the main streets and that therefore the incidence of violent deaths would be higher close to a main street. Again, there’s no way to know that, but, again, systematic bias is bad practice, and can lead to results far outside of the “margin of error” (which, of course, is constructed assuming an absolute absence of non-sampling, or “methodological,” error.)


2. The conduct of the survey:

Here’s where my analysis gets a bit dicey, but I’ve got to point out what I see. My experience of many years as a statistics teacher, supervising and grading student projects, leads me to grave doubts about what the survey teams actually did and didn’t do.


A) The response rates reported by the teams, in terms of their success at finding a head of household or spouse at home and willing to participate are just amazingly, extremely, insanely, unbelievably high, especially given the fact that the teams never once paid a second visit to a household, due to the dangers they were facing, working in a war zone, and apparently worked throughout the day, rather than confining their visits to times when respondents would be likely to be home (see the time constraint concern, below.)


The authors report that in 99%+ of the households, someone was at home. They also claim that in only one cluster were any empty households found among the 40 adjacent households surveyed. They phrase it so as to insinuate that they are minimizing the death estimates:


“Households where all members were dead or had gone away were reported in only one cluster in Ninewa and these deaths are not included in this report.”


This quote has become a favorite among some blogs as showing that, in fact, the real numbers must be much higher than those in the survey, since any annihilated households have been discounted. It's definitely true that the phrase “were dead or had gone away,” followed by “these deaths,” clearly implies that the researchers have reason to believe that the former occupants of the households in question were all dead (without explicitly stating so.) But what bothers me is the implication that vacant houses were supposedly encountered in only one cluster in all of Iraq, and, by insinuation, none vacated by emigration. With estimates of over a million recent emigrants from Iraq entirely, and up to that many again internally displaced persons (together pushing 7% of the total population of Iraq), one would have expected to see more, and more widely distributed, vacant houses – even if no entire households had been annihilated. The next question becomes: what’s the difference between a “household where all members were dead or had gone away” and the “16 (0.9%) dwellings [where] residents were absent”? The latter evidently includes households where the surveyors believed that someone was still living, but no one was home when they knocked (or so I’m guessing.)


In any event, the fact that 7% or more of Iraqis have vacated their homes (for other parts of Iraq, or other parts of the world, or Heaven), and yet that less than 1% of the households surveyed found no one at home, is very suspicious to me. If nothing else, it calls into question the representativeness of the sampling. The <1% "not at home," even absent the contributing concerns, raises all kinds of red flags for me.


B) Among those that were at home, only “15 (0.8%) households refused to participate.” Given the purported methodology of the survey, this figure must also include any households where some members were home, but not the head of household or spouse, since we’re assured that the head of household or spouse were the only ones questioned. So we’re left (combining A and B) with the astounding result that in more than 98% of the attempted contacts, the head of household or spouse was at home and willing to participate in the survey, and this without a single call-back, since attempting a second contact with a household was deemed too dangerous for the survey teams. What makes these results doubly surprising is that the surveys must have been conducted throughout the day, in order to accomplish 40 surveys per day (see below.) So, somehow, a total of 15 or fewer “Dad’s at work (or the Mosque or wherever), Mom’s at the market” responses in over 1700 attempts.


It’s quite possible that 15+ years of teaching Introductory Statistics and similar courses has left me a bit jaded, but I know that I’d be calling these survey teams into my office, with some serious questions about what they actually did or did not do, before accepting any of their results.
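Just to quantify my skepticism, here's a back-of-the-envelope sketch. The contact count (about 1,780) is back-solved from the reported percentages, the independence assumption is a simplification, and the candidate miss rates are my own guesses:

from math import comb

def binom_cdf(k, n, p):
    # P(X <= k) for X ~ Binomial(n, p)
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n = 1780   # approximate attempts, back-solved from "16 (0.9%) dwellings"
for p in (0.05, 0.10, 0.20):          # hypothetical true not-at-home rates
    print(f"true miss rate {p:.0%}: P(at most 16 misses) = {binom_cdf(16, n, p):.1e}")

# Even granting a 5% true miss rate -- far lower than anything I ever
# managed in Minnesota -- 16 or fewer misses in ~1780 single-pass
# attempts, with no call-backs, is an astronomically unlikely outcome.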


C) The time spent per survey belies the notion that great care was taken to ensure the interviewees' comprehension of the questions and the accuracy of the answers. According to the article, the survey teams “could typically complete a cluster of 40 households in 1 day.” The survey teams reportedly consisted of 4 members each, two male and two female. It is not stated how or whether the teams split up in conducting the interviews. From what I’ve heard about Islamic culture, it would seem likely that they would not have gone out individually, given that some women might be reluctant to speak to a single man, and vice versa. So, if we assume that they split up into two pairs of one male and one female doctor each, then each pair was interviewing 20 households in a day. Even assuming 8 hours per day for fieldwork, this leaves less than 24 minutes per interview (less than that, because it takes some time to walk from house to house.) Based on my own experiences with face-to-face interviewing, this would leave maybe 15 minutes for the actual survey questioning (there’s always a cushion for formalities, pleasantries, getting settled and whatnot.) Somewhere in there, also, the interview teams had the time to reassure the interviewees of their honesty and good intentions, and to double-check any questionable results. Read what the article claims went on at each interview:


“The survey purpose was explained to the head of household or spouse, and oral consent was obtained. Participants were assured that no unique identifiers would be gathered. No incentives were provided. The survey listed current household members by sex, and asked who had lived in this household on January 1, 2002. The interviewers then asked about births, deaths, and in-migration and out-migration, and confirmed that the reported inflow and exit of residents explained the differences in composition between the start and end of the recall period. …. Deaths were recorded only if the decedent had lived in the household continuously for 3 months before the event. Additional probing was done to establish the cause and circumstances of deaths to the extent feasible, taking into account family sensitivities. At the conclusion of household interviews where deaths were reported, surveyors requested to see a copy of any death certificate and its presence was recorded. Where differences between the household account and the cause mentioned on the certificate existed, further discussions were sometimes needed to establish the primary cause of death.”


Now, it’s tough to compare different cultures, but when I worked on the health surveys in Liberia, we’d figure on maybe 3 or 4 good interviews per day, by the time we were satisfied that the interviewees were understanding the questions correctly and we were understanding the answers correctly (and we always had at least one interviewer who was a native speaker of the dialect.) Canned political surveys in America tend to take over 5 minutes each, even though the interviewees pretty well know what to expect in terms of the questions, and the surveyors have no need to verify things like death certificates.


So, 15 minutes or so per survey? I guess it’s possible, since some might be very easy (“All six of us have lived here for many years, and no one has died”), but I’m suspicious whether the implied care was actually taken in the interview process.
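For the record, here's the whole time budget in one checkable place (the two-pair split, the 8-hour field day, and the per-house overhead are my assumptions, as stated above):

# Time budget per interview, under my stated assumptions.
households_per_team_per_day = 40   # from the article
pairs_per_team = 2                 # my assumption: each team split into pairs
field_minutes_per_day = 8 * 60     # my assumption: an 8-hour field day

interviews_per_pair = households_per_team_per_day / pairs_per_team    # 20
minutes_per_interview = field_minutes_per_day / interviews_per_pair   # 24.0
overhead = 9   # rough guess: walking, formalities, getting settled
print(f"{minutes_per_interview - overhead:.0f} minutes of actual questioning")  # ~15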


D) Due to safety concerns, the survey teams were apparently allowed great latitude in changing the pre-determined cluster to a more convenient one.


In terms of statistical validity, this point is crucial. The article states:


“Decisions on sampling sites were made by the field manager. The interview team were given the responsibility and authority to change to an alternate location if they perceived the level of insecurity or risk to be unacceptable.”


The authors give us no information about how often these changes were actually made, and absent that information, this survey is, simply, worthless. No amount of advanced statistical massaging can fix a sample of convenience. So, did the violence in Iraq force one change, two changes, forty changes? We don’t know. But what we do know is that there is a clear admission of selection bias in the sampling. Given the sectarian tensions in Iraq, even granting the alleged professionalism of the canvassing teams, it is impossible to tell the impact of these biases. The implication in the report is clearly that more deadly areas were underrepresented, but were more distant (possibly safe) areas also selected against because of the level of risk required to reach them? Were teams of Shia (resp. Sunni) doctors afraid to enter areas where they thought themselves unwelcome? Did coalition roadblocks or bombing campaigns lead to certain areas of the country being off-limits? I find it very troubling that while the authors of the article go out of their way to mention anecdotes like the fact that households where everyone was killed were not counted, and that some interviewees may have been afraid to admit that they have had family members killed, this essential bit of information (“how often were the survey teams forced to deviate from the pre-determined cluster, and what procedures did they implement to attempt to ensure that an equally representative cluster was selected?”) was left out of both the article and the appendices (in the versions I’ve found. I’d appreciate it greatly if someone could point me to this information, if it’s published.)


Once again, it would be easy to jump to the conclusion that any deviations would lead the estimates to be low (this is clearly the authors’ implication in their wording: “if they perceived the level of insecurity or risk to be unacceptable"), but any deviations of this sort remove the survey from the realm of statistical science, into the realm of conjecture, anecdote or advocacy.


E) Given the freedom apparently allowed the survey teams to deviate from pre-selected cluster sites, and to determine the starting point for the cluster (on the street), as well as the above-mentioned concerns about veracity, the fact that there were only two survey teams involved in the entire survey, and that these teams had only two days of training, means that any selection bias introduced by the survey teams will skew the results greatly, all in the same direction. If there were many teams, we might expect that some might be making selections that (consciously or unconsciously) minimize the reported number of deaths, while others might be making selections that maximize them, and others might be making selections that were essentially neutral. Given that there were only two teams, these biases have much less chance of canceling each other out, and much greater risk of increasing the actual margin of error.
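A toy simulation makes the point (the numbers here are invented; only the comparison between two teams and many matters):

import random

random.seed(2)

# Each team carries some unknown selection bias, fixed once and then
# applied to every cluster it surveys (all numbers invented).
def net_bias(n_teams, n_clusters=47):
    biases = [random.gauss(0, 1) for _ in range(n_teams)]
    # Clusters assigned to teams round-robin; return the survey-wide bias.
    return sum(biases[c % n_teams] for c in range(n_clusters)) / n_clusters

for teams in (2, 10, 47):
    runs = [net_bias(teams) for _ in range(10_000)]
    sd = (sum(b * b for b in runs) / len(runs)) ** 0.5
    print(f"{teams:2d} teams: typical survey-wide bias {sd:.2f} (one team's bias = 1.00)")

# With two teams, roughly 70% of a single team's bias survives into the
# overall result; with dozens of independent teams, most of it averages out.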


Dr. Donald Berry, the chairman of the Department of Biostatistics and Applied Mathematics at the University of Texas MD Anderson Cancer Center, put it this way:


“Selecting clusters and households that are representative and random is enormously difficult. Moreover, any bias on the part of the interviewers in the selection process would occur in every cluster and would therefore be magnified. The authors point out the possibility of bias, but they do not account for it in their report.” (see link below)


3. The analysis of the results:


Here’s where the article is apparently on its most solid ground, but since none of the methodology involved in the analysis has been published, it’s hard to say. I’m guessing that this would be what any peer review would concentrate on, and given the quality of modern statistical software, it’s hard to make computational mistakes in the analysis. I’d be surprised if I could uncover any grave analytical flaws made by a PhD in statistics, which I definitely am not. As Mark Chu-Carroll notes at GoodMath/BadMath, it’s surprising that most of the attacks have concentrated on precisely this area, and, consequently, that most of the attacks can be dismissed as failures on the critics’ part to understand statistics.


Be that as it may, the authors do provide enough detail for me to find one criticism in their analysis, which they themselves allude to, but attempt to minimize:


"The population data used for cluster selection were at least 2 years old, and if populations subsequently migrated from areas of high mortality to those with low mortality, the sample might have over-represented the high-mortality
areas."


“[I]nternal population movement would be less likely to affect results appreciably [than emigration from Iraq.]”


As I pointed out in an earlier post (Part VI), faulty population estimates (due to massive internal migration) can have considerable impact on the extrapolations, because the error enters twice: first in making some more violent areas more likely to be sampled than their current populations would warrant, and then again in calculating the estimates, since the same inflated numbers are used to multiply out the projected values.


Supporters of this study have latched onto this criticism, accepting it (for the sake of argument) and then pointing out that, even so, it would only lower the estimates by a few percent (even lowering them by 25% would still leave the estimates many times higher than others, after all). For that reason, I was a bit hesitant to bring it up, as it provides an opportunity to ignore the other concerns that, if valid, go to the heart of any legitimacy whatsoever for this study. But since I noticed it, and it seems a true potential source of error, I’m pointing it out again.


My purpose, throughout this critique, is not to claim that particular errors in the study would lead the reported results to be too high, or too low, or to balance out. As I always tell my students, if you really knew the effect of a bias, there’d be no need to do the study to begin with. You could just use your amazing reasoning powers and puzzle out how many deaths there really have been in Iraq, due to the Coalition’s actions, and then yell at everybody about how smart you are, and then they'd all believe you. My purpose is to point out the places where this study failed to use good statistical methodology, and to lay out the evidence that leads me to conclude that the survey teams’ reports, themselves, are suspect. These suspicions (about the survey teams) are not necessarily grounds to deduce intentional bias. From personal experience, I know that many amateur data collectors under-report the difficulties they have in obtaining responses, believing that a higher non-response rate reflects negatively on their own skills, and over-report things like how many deaths they were able to validate by certificate, for the same reason. Further, the less thorough the interview process (and I’ve pointed out how quickly the interviews must have been conducted), the more the interviewers’ biases influence the reported responses, even when the survey teams believe they are recording the results fairly. Finally, given the extremely high sectarian tensions in Iraq, it would seem unlikely that a mere two teams of 4 physicians each, given a high degree of selection autonomy, would produce unquestionably unbiased results under the hectic conditions they were experiencing in Iraq.


Two final notes about those death certificates: 1. The authors, it seems to me, do a good job of explaining why we would expect to see a high discrepancy between the number of death certificates that families can produce and the number of officially recorded deaths at the national level. What they do not address is why no attempt was made to double-check the totals locally, at least in areas of less chaos. This would have provided a good check on the representativeness of their sample. Steven Moore makes a similar point regarding basic demographic information (which a bunch of his critics in the blogs have misunderstood entirely.) Had the surveyors collected basic demographic data per household (men, women, old, young, whatever), this could have been compared with the other population estimates, to check whether their particular clusters were, at least in these respects, representative. Instead, they were left, far too often, with no legitimate means of checking the representativeness of a particular cluster.


2. The question must also be addressed of whether there would be any incentive for Iraqis to falsify death certificates. I don’t know the answer to this, but it is an important question. Given the corruption, chaos and confusion that are a fact of life in Iraq today, it would be very easy to forge death certificates, and if there is any market for such forged documents, we must assume that they exist in great numbers. Are coalition forces making cash payments for collateral deaths? Are families hiding members for their own safety by falsifying death records? I don’t know the answers to these questions, but it would be foolish to accept the validity of the certificates without an analysis of the incentive to falsify them. Again, I speak from experience with West African nations, where the levels of corruption and confusion are probably not as high as currently in Iraq, but where, if there’s a need for an official document, it can always be created, for the right price.


Tuesday, October 17, 2006

 

Iraqi Death Survey, Final

I had several more posts prepared, but why bother, when the Iraq Body Count website has done a much better, more thorough job than I could:


http://www.iraqbodycount.org/press/pr14.php


Thanks to all who looked over my material.


Notropis


Update: Here's more from a real-life statistical researcher, courtesy of the Wall Street Journal:


http://www.opinionjournal.com/editorial/feature.html?id=110009108


Friday, October 13, 2006

 

Iraqi Death Survey Part VI

The effects of migration on the extrapolated numbers


The data used to determine the number of clusters per Governorate and the probability of a particular Administrative Region's selection within the Governorate, and to extrapolate to the final 600,000+ figure, come from 2004 estimates of the Iraqi population.


There are several things to think about here.


First of all, how accurate are 2004 population estimates likely to be, to begin with? Census data from the Saddam era has every reason to be suspect, and insofar as the 2004 estimates base calculations on pre-2003 census data, they are likely to be flawed. Moreover, independent estimates made in 2004 are likely to suffer from many of the same uncertainties present in this study, namely logistical difficulties in conducting surveys, imprecise administrative records and the like. There's simply no way to know how good these 2004 estimates are, and every reason to believe that they are rough estimates, at best.


Be that as it may, and assuming for the moment that they are accurate, what effect does the massive displacement since that time have on the extrapolated numbers?


According to official reports, over 180,000 internally displaced refugees were registered just between the months of February and June of 2006. Undoubtedly, those not registering push the number much higher. As I pointed out below, the survey methodology means that these displaced refugees had very little chance of being surveyed. But in addition to that, their migration is sure to skew the analysis of the data.


The authors acknowledge as much in their paper:


"The population data used for cluster selection were at least 2 years old, and if populations subsequently migrated from areas of high mortality to those with low mortality, the sample might have over-represented the high-mortality
areas."
Well, not just over-represented in sampling, but also over-estimated in projections.


In addition, of course, emigration from Iraq entirely would cause the estimates to be overstated by a corresponding amount (if the population were only 22.5 million, rather than 25 million, for example, then the extrapolated estimate would have to be revised downward by 10%.)


The authors, however, also make this very misleading statement about internal migration:


“internal population movement would be less likely to affect results appreciably [than emigration from Iraq.]”


Perhaps less likely, but the effect could be considerable: Consider the following simplified (and exaggerated) example:


Suppose there are 2 regions, each with population = 1 million in 2004, and suppose that, from 2004 to 2006 one of the regions is subjected to extreme violence, while the other is not. Suppose that this causes 50% of the population (500,000) to move from the region of extreme violence to the region without.


Now, suppose a survey is done in the two regions, where we find that the violent death toll in the war-torn region is 10 per 1000, while in the more peaceful region it is 2 per 1000 (in the latest 1 year period.)


Assuming that the surveys are accurate, we would see that:


Actual deaths in war-torn region: 10 per 1000 x 500,000 = 5,000


Extrapolated deaths in war-torn region: 10 per 1000 x 1,000,000 = 10,000


Actual deaths in more peaceful region: 2 per 1000 x 1,500,000 = 3,000


Extrapolated deaths in more peaceful region: 2 per 1000 x 1,000,000 = 2,000


Total actual deaths: 8,000


Total extrapolated deaths: 12,000


Difference: 4000 deaths or an over-estimate of 50%.
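The same example in a few lines of code, so the reader can vary the numbers (these are the hypothetical figures from the example above, not real data):

# (name, 2004 population used for extrapolation, actual 2006 population,
#  violent deaths per 1000 in the survey year) -- invented example figures
regions = [
    ("war-torn", 1_000_000,   500_000, 10),
    ("peaceful", 1_000_000, 1_500_000,  2),
]

actual = sum(rate / 1000 * pop06 for _, _, pop06, rate in regions)
extrapolated = sum(rate / 1000 * pop04 for _, pop04, _, rate in regions)
print(f"actual {actual:,.0f}, extrapolated {extrapolated:,.0f}, "
      f"over-estimate {extrapolated / actual - 1:.0%}")
# actual 8,000, extrapolated 12,000, over-estimate 50%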


Now, the true changes from 2004 to 2006 are liable to be far less than the 50% in the example. However, there is also the multiplier effect of overestimation in sampling combined with overestimation in extrapolation:


Suppose that a city had a population of 50,000 in 2004, but due to a flare-up of violence, half the people left the city by 2006 (this is NOT at all unlikely; there are reports of entire cities becoming ghost towns overnight, due to the actions of the various militias and insurgent groups; see, for example, Fallujah.) Even though its actual 2006 population was 25,000, it would have twice the likelihood of being selected that its current population would warrant. And, given that it had suffered this tremendous out-migration, it would be far likelier to be in a very violent area, contributing higher-than-representative numbers, which in turn get multiplied by a higher-than-correct factor.


On the other hand, if it were in a very violent area, it's quite possible that the survey teams would simply have decided that it was too risky to get there, and selected, at their whim, another, safer place (which, again, takes this survey completely out of the realm of statistical analysis, short of doing a psychoanalysis of the survey teams and attempting to massage the data in some way to compensate for their selection biases.)


Again, the purpose of these analyses is not to show that the actual numbers are higher or lower than the survey's estimates, but rather to analyze the many flaws in both the methodology and interpretation of the survey which may lead it to be not especially meaningful.


Update: BBC has more on Iraqi displacement estimates here.


 

Iraqi Death Survey Part V

An opinion from an actual expert confirms my problems with this survey.

The Chairman of the Department of Biostatistics and Applied Mathematics at the University of Texas MD Anderson Cancer Center, Donald Berry, has this to say:


"Selecting clusters and households that are representative and random is enormously difficult. Moreover, any bias on the part of the interviewers in the selection process would occur in every cluster and would therefore be magnified. The authors point out the possibility of bias, but they do not account for it in their report."

and


"Incorporating the possibility of such biases would lead to a substantially wider range, the potential for bias being huge. Although there is no formal way to address bias short of having an ‘independent body assess the excess mortality,’ which the authors recommend, the lower end of this range could easily drop to the 100,000 level."


Sort of what I mentioned in my second post, items 4 - 6. But Dr. Berry notices a further problem with the teams' selection bias, and that is that, given that there were only two teams of surveyors, any selection bias will be consistent throughout the survey, skewing all results in the same direction and further widening the true confidence interval.


Also, I don't know if he's an expert or not (I know nothing about him), but Tim Blair had the same thought about the death certificates.


Thursday, October 12, 2006

 

Iraqi Death Survey Part IV

What about the death certificates?


According to the article, 80% of the deaths identified in the survey were confirmed by the presence of a death certificate. That's a good thing. That helps A) to make sure that some deaths aren't counted multiple times as being misremembered members of multiple households, B) to establish time of death, and C) to help determine cause of death, among other things.


But here's my question: If there's a death certificate, doesn't that make it an officially recorded death? That is (I'm assuming and asking), if the head of household has a copy of the death certificate, wasn't another one filed at the appropriate administrative office, by whoever made out the certificate? If this is the case, why doesn't it end up on the official Iraqi government death toll tally?


OK, I can think of several reasons why that might not happen. But the record should still be available locally. So wouldn't it have been (and still be) a good check on the survey to contact the local administrative office, look at the death certificates there, and see whether the actual numbers at the administrative office come within the expected 20% of the projected numbers from the survey of 40 households?


I'll grant that that might not be possible on a larger scale, or in every situation, but there's something troubling about a survey that claims to have discovered hundreds of thousands of unreported deaths by looking past official channels, yet also claims that 80% of the deaths it discovered have official death certificates (and so clearly were not unreported.) Again, I understand that in some individual circumstances administrative corruption or confusion might make this impossible, but you'd think it would make sense to double-check where possible.


A positive confirmation would go miles to validate the methodology of this survey.


 

Iraqi Death Survey, Part III

Summary of the first two posts:


I find some major reasons to be skeptical of the data collected, and the method of collection.


The first three are questions that lead me to doubt the credibility of the survey teams, in spite of the authors’ assurances:


1) 99% of residences had people at home?
2) 98% of residences had heads of household (or spouses) at home and willing and able to provide the requested information?
3) Thorough explanations and double-checking and proofs of information were possible in well under ½ hour per interview?


The next three lead me to question the inherent bias allowed in the data collection:


4) Samples of size 1 - 3 (the number of clusters chosen from all but two of the governorates) chosen “proportionally to population size” can’t possibly reflect the actual urban-rural makeup of an individual governorate, and, given the small number of governorates, won’t reflect that makeup even taken in aggregate. You can "bootstrap" all you want, but you can't bootstrap with a sample size of 1 (and yes, one cluster is, in many respects, just "1"; see the sketch following point 6, below.) Moreover, no amount of bootstrapping will make a demographic that's completely absent (like "rural Iraqis in X province") suddenly appear.


5) Persons living on non-officially-recognized and named streets were systematically ignored, biasing considerably against those living in urban slums, internally-displaced refugees, and rural Iraqis, and this influence was completely overlooked.


6) The field manager could select sampling sites at whim, and these could be changed at the “responsibility and authority” of the interview team.
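As promised under point 4, here's a small sketch of the bootstrap problem, with invented numbers (my own illustration, not anything from the study):

import random

random.seed(3)

# What "bootstrapping" a sample of one cluster buys you: nothing.
# Resampling with replacement from a single observed cluster mean
# can only ever return that same value.
one_cluster = [7.3]                 # the lone cluster from some governorate
boot = [random.choice(one_cluster) for _ in range(1000)]
print(len(set(boot)))               # 1 -- zero apparent variability

# With three clusters, at least some spread becomes visible (though
# still a thin basis for a variance estimate):
three_clusters = [2.0, 7.3, 41.0]
boot = [sum(random.choices(three_clusters, k=3)) / 3 for _ in range(1000)]
print(f"bootstrap means range from {min(boot):.1f} to {max(boot):.1f}")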


I'm sorry, but, even with the amazingly advanced statistical analysis done on the numbers, these fundamental questions about the worth of the data collected undercut any validity of the survey. "Garbage in = garbage out" remains as true as ever, as does "data of questionable validity in = conclusions of questionable validity out" (although that's not nearly so catchy.)


 

Iraqi Death Survey Part II

In my first post, I simply mentioned 3 items that, from my own experience conducting surveys, seemed surprising, at best, or downright unbelievable, at worst, regarding the apparent success and efficiency of the survey teams. These questions did not address the methodology of the survey; they simply cast aspersions on the integrity of the survey teams. Quite simply, I find it particularly hard to believe that you can achieve a 98%+ response rate under conditions as difficult as those in Iraq (I've never gotten that kind of result in Minnesota), and that you can conduct the careful, thorough, sensitive surveys that the article implies it conducted, in what must average well under 1/2 hour per survey (at 40 per day, it would probably actually add up to less than 15 minutes per household of actual survey time.)


But those are just skepticisms. In this post, I will bring up three problems that I find with the sampling methodology. In future posts I will look at some difficulties I find in the methods of extrapolation in the conclusions.


1. The second stage of the sampling is troubling:


“At the second stage of sampling, the Governorate's constituent administrative units were listed by population or estimated population, and location(s) were selected randomly proportionate to population size.”


Why should this be troubling? Well, if a few hundred selections were made per Governorate, it wouldn’t be; but given that, in all but two governorates, 3 or fewer clusters were selected (in many cases only 1), the chances that any smaller towns (or administrative units) anywhere in Iraq might be selected are vanishingly small. It would be interesting to compare the number of small-town or rural household clusters selected with the overall population of rural Iraqis, to find out whether this underrepresentation is substantial, as I suspect it is. I don’t have access to the raw data, so I can’t tell.
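A toy example of the point (my invented numbers, not Iraq's actual administrative breakdown): suppose a governorate holds one city of 700,000 and thirty towns of 10,000 each, and a single cluster is drawn proportional to population:

# Toy governorate: one city plus thirty small towns (invented numbers).
units = [700_000] + [10_000] * 30
total = sum(units)                                   # 1,000,000

p_any_small_town = sum(u for u in units if u < 100_000) / total
print(f"chance the single cluster falls outside the city: {p_any_small_town:.0%}")  # 30%

# So 70% of the time, this governorate's entire rural and small-town
# population contributes nothing at all to the sample -- and with only
# 1 to 3 draws, there is no second chance for it to show up.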


2. The sampling method used carries incredible inherent bias:


“The third stage consisted of random selection of a main street within the administrative unit from a list of all main streets. A residential street was then randomly selected from a list of residential streets crossing the main street. On the residential street, houses were numbered and a start household was randomly selected. From this start household, the team proceeded to the adjacent residence until 40 households were surveyed.”


This may seem fine, and it would be, in the suburban U.S. However, in Iraq, with hundreds of thousands of internally displaced persons, and many others living in the chaos of temporary housing and the squalor of newly-constructed unofficial slums, none of these people had any possibility of being surveyed. Neither did anyone living on an unnamed or unrecognized street. And again, it is quite possible that many smaller towns don’t even have an officially named main street, so wouldn’t show up on the list at all. What effect would this bias have? There’s no way to know, but it is clearly a built-in bias, systematically selecting against some rather large demographic groups.


3. In addition, it is clear that at the whim of the interview team, these pre-selected sites could be changed:


“Decisions on sampling sites were made by the field manager. The interview team were given the responsibility and authority to change to an alternate location if they perceived the level of insecurity or risk to be unacceptable.”


This is a clear admission of selection bias in the sampling. Given the sectarian tensions in Iraq, even granting the alleged professionalism of the canvassing teams, it is impossible to tell the impact of these biases, but their existence is unquestionable. The implication in the report is clearly that more deadly areas were underrepresented, but were more distant (possibly safe) areas also selected against because of the level of risk required to reach them?


To summarize, then, the sampling methods reported systematically select against three groups (that I can think of): A) rural Iraqis (both in the method of selecting administrative units, and in the method of selecting particular streets) B) internally displaced refugees (since many live in camps, and not on named streets) and C) urban slum dwellers, who may often also not live in organized households on named streets.


The fact that the previous survey, done via GPS location, mirrors the present one in the 2003-04 results merely indicates that, to the extent that the present method may be less representative than the previous one, the biases did not affect the death toll results from 2004; it says nothing about whether these biases are still unimportant. After all, the nature of the situation has changed significantly in Iraq; that's one of the main conclusions of the report. If the recent upsurge in violence has been predominantly urban rather than rural, these sampling methods might tend to overestimate the results. If it has been predominantly in the urban slums or refugee camps, the methods might underestimate them. There is simply no way to tell beyond educated guesses, which take us completely out of the realm of statistical science. In any event, these obvious biases are, in my view, important enough to raise serious doubts about the validity of the conclusions.


More troubling, and carrying us way beyond anything that can be "fixed" by more statistical analysis, is the third point; namely, that in spite of the authors' best intentions to randomize the clusters, what they ended up with was, in point of fact, a sample with a high risk of personal selection bias. As understandable as concerns for safety are, statistics has no compassion. If you modify your selection based on personal considerations, you lose your claim to statistical validity.


 

Iraqi Death Survey Part I

I'm a statistics teacher, with only limited experience conducting surveys, and by no means a statistician, but, in perusing the new Iraqi death survey, “Mortality after the 2003 invasion of Iraq: a cross-sectional cluster sample survey” by Burnham, et al., in the recent Lancet, I've come across several things that disturb me in the reports of data collected, the method of data collection, and the method of extrapolation, which I'd like to lay out in several posts here. Then, perhaps, I can entice some people who know more than I do, or just think more clearly than I do, to look at these questions that I have, and address them, either supporting my concerns or allaying them -- I don't much care which.


I don't really have a horse in this race, after all.


So, to the first, and probably most tenuous, set of concerns: the believability of several items regarding the data collection.


1. “In 16 (0.9%) dwellings, residents were absent.” When or where can you conduct a survey and find over 99% of the potential respondents at home? Especially in Iraq, where there are, according to NPR, two hundred thousand registered internally displaced persons (link), and an unknown, but presumably much higher, number of unregistered. So did the surveyors simply skip houses that looked “obviously vacant?” Doing so would be a gross breach of statistical process, allowing one’s own biases to influence which houses get sampled (though this happened anyway; see the next post.) Whether or not they did it is hard to tell from the reported procedure (“Empty houses or those that refused to participate were passed over until 40 households had been interviewed in all locations”): are the “empty houses” the same as the ones reported above where “residents were absent,” or does “residents were absent” refer to apparently occupied houses where they just didn’t find anyone at home? Either way, it still seems amazing to find over 99% of potential respondents at home (and all the more so, since they apparently had to canvass throughout the day – see below.)


(Further puzzling is the statement: “Households where all members were dead or had gone away were reported in only one cluster in Ninewa and these deaths are not included in this report.” Does this mean that in only one cluster were any vacant houses encountered? I can find more than that in upscale suburbs of Minneapolis.)


2. Only “15 (0.8%) households refused to participate.” Now this could be a sign that Iraqis are concerned to get the truth of their plight out, and that’s great. But putting this together with (1) above, we find that in a remarkable 98%+ of the potential households, the head of household or spouse was available and willing to answer the questions (according to the methodology, those were the only ones surveyed.) And this result was achieved, according to the article, on the first pass, without ever re-contacting a household, which the survey teams deemed "too dangerous."


3. In reading the methodology, the impression is given that the surveyors did an incredibly thorough, careful, and considerate job in their work. Yet we read that the teams each consisted of four individuals, who “could typically complete a cluster of 40 households in 1 day.” Now, it’s not clear whether the teams stuck together, or split up into 1s or 2s, but, given time for travel, and assuming 8 hours of surveying time available in a day, if they worked in pairs (which would make the most sense, one male and one female), we find that they spent less than half an hour (24 minutes), on average, per household, yet we’re assured that the following protocols were strictly observed:


“The survey purpose was explained to the head of household or spouse, and oral consent was obtained. Participants were assured that no unique identifiers would be gathered. No incentives were provided. The survey listed current household members by sex, and asked who had lived in this household on January 1, 2002. The interviewers then asked about births, deaths, and in-migration and out-migration, and confirmed that the reported inflow and exit of residents explained the differences in composition between the start and end of the recall period. …. Deaths were recorded only if the decedent had lived in the household continuously for 3 months before the event. Additional probing was done to establish the cause and circumstances of deaths to the extent feasible, taking into account family sensitivities. At the conclusion of household interviews where deaths were reported, surveyors requested to see a copy of any death certificate and its presence was recorded. Where differences between the household account and the cause mentioned on the certificate existed, further discussions were sometimes needed to establish the primary cause of death.”


And further on, we read that official death certificates were produced for 80% of the deaths recorded, all in an average of less than half an hour per interview. I'll have a few more questions about these death certificates in another post.

