FASE PATTERN CLASSIFICATION METHODOLOGY
A classification algorithm, or classifier, “learns” the functional relationship between a set of attributes (the independent variables) and the class labels of some binary or multi-categorical class variable (the dependent variable). The purpose of the classifier is to assign the correct class label based on the patterns identified between the class variable and the attributes observed in the training data. A classifier is considered effective if it can assimilate the “knowledge” from the training data to correctly assign the class label to other instances described by those attributes in the out-of-sample data. Because we rarely identify all the attributes that routinely cause some outcome of interest, there is always an element of uncertainty in this process.
It is prudent to approach uncertainty in the world with trepidation on the one hand and probability theory on the other. Probability is a mathematical theory whose axioms are tested, proven, and generally not disputed. The assumptions of probability theory, however, can be quite limiting, especially when it comes to aggregating probabilistic evidence across a range of factors. Bayesian classifiers, for instance, rely on the assumption of independence among factor attributes (i.e., independent variables), an assumption that rarely holds for phenomena of interest in the social world. This assumption, routinely violated in naïve Bayesian approaches, limits the number of mathematical tools available with which to aggregate evidence across multiple attributes.
Fuzzy Analysis of Statistical Evidence (FASE), developed by Chen (2000), is an extension of a theory Chen (1995) proposed in earlier work. FASE is a hybrid method that incorporates elements of statistics, possibility theory, and fuzzy logic. It is based on the principle of inverse inference and so has properties similar to Bayesian classifiers. The principal difference is that classification is performed on the possibility rather than the probability measure. Possibility theory parallels probability theory but is more flexible than its better-known counterpart: there is no need to specify a prior, and its softer scale of measurement is compensated for by the wider array of mathematical tools it can draw upon to aggregate evidence obtained from multiple indicators of the phenomenon of interest (see Dubois and Prade 1988; Chen 1995, 2000).
With FASE, the conditional probabilities from each attribute are normalized into possibilities, and the possibilities are combined with a t-norm from fuzzy set theory (Chen 2000). Given a set of observations across multiple factors, this allows us to compute the likelihood of a class outcome without violating probability theory’s restrictive assumptions. Once the attribute possibilities have been combined into overall possibilities for the class label, the class label possibilities can be normalized into probabilities for a more familiar interpretation. We illustrate the application and interpretation of FASE by example. Those interested in its mathematical foundations are encouraged to review Chen (1995, 2000).
FASE: An Illustrative Application
Consider the following illustration: Let D be some dependent variable with four classes (D1, D2, D3, D4). A set of attributes, A1, A2, A3, which can take a range of values, are thought to be correlated with D and are available in the form of a historical database. Given a set of observations on A1, A2, A3, for some Transaction (or case) T, evaluated in a historical context, what is the likelihood of each of the four classes of D? This question is analogous to predicting the correct instability intensity level from a set of country macro-structural attributes. Using FASE, we approach the problem as follows.
First, we divide the data set into training and test sets, using spatial or temporal rules, randomly, or by selection. Second, we split the data for each attribute in the training set by each class label of D and estimate the class probability distributions. For discrete variables, the probability is estimated by the relative frequency in each category. For continuous variables, we estimate the probability density function using an Average Shifted Histogram (ASH), a kernel method, or another suitable density estimator.[i] These probability distributions are the likelihood templates against which observations on the attributes in the test set are evaluated.
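The second step above (estimating per-class likelihood templates) can be sketched in Python; the plain histogram below is a simple stand-in for the ASH or kernel estimators named in the text, and all data are invented for illustration.

```python
import numpy as np

def fit_templates(X, y, bins=10):
    """Estimate class-conditional likelihood templates for one
    continuous attribute, using a shared set of bin edges so the
    classes' distributions are directly comparable."""
    edges = np.histogram_bin_edges(X, bins=bins)
    templates = {}
    for label in np.unique(y):
        counts, _ = np.histogram(X[y == label], bins=edges)
        # Relative frequencies stand in for the estimated density.
        templates[label] = counts / counts.sum()
    return edges, templates

# Hypothetical training data: one attribute (A1) and a four-class
# dependent variable D, with 200 training cases per class.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(m, 1.0, 200) for m in (0, 2, 4, 6)])
y = np.repeat(["D1", "D2", "D3", "D4"], 200)
edges, templates = fit_templates(X, y)
```

Each template row can then be read off, for any observed attribute value in the test set, as that attribute's likelihood of each class label.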
Third, using data in the test set, we evaluate how well the algorithm can classify on each of the class labels on D. To do that, for any observation on an attribute in the test set, we calculate the likelihood ratio across the class labels on D, based on the patterns observed in the class probability distributions from the training set. Suppose those probabilities are distributed across the class labels as they appear in the example in Table 5. Historically, in this example, Transactions with values on the order of magnitude comparable to those for A1 and A2 have been associated with the outcome D1 50% and 25% of the time respectively (note the data on A3 are completely missing for purposes of illustration).
If A1 and A2 are not independent (and this is the assumption here), then we cannot aggregate the evidence across them to compute the likelihood of the class labels without violating the assumptions of probability theory. Therefore, in the fourth step, we normalize the likelihood ratios for each attribute into a possibility measure, using the most likely class label as the base. The possibility measures appear in each cell in parentheses. Thus, by definition, the most likely class label for any observed attribute value will always have a possibility score of 1. A possibility score of 1 is also assigned in those instances in which the data on an attribute for some Transaction are missing. Since in such instances no class label is any more or less likely than any other, this ensures that missing data are treated as vacuous.[ii] The possibility measure is not neatly interpretable; one must consider the possibility and its conjugate (the belief measure) together (Dubois and Prade 1988; Chen 2000). A possibility of 1 does not represent complete certainty about an outcome, as a probability of 1 does, but is only an imprecise indication of our belief in that outcome relative to the other possibilities; the possibilities do, however, reflect ordinal properties consistent with probability theory.
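Step four, along with the vacuous treatment of missing data, reduces to a few lines; the likelihood values below are invented for illustration and are not the actual cells of Table 5.

```python
CLASSES = ("D1", "D2", "D3", "D4")

def to_possibility(likelihoods):
    """Normalize one attribute's class likelihoods into possibilities
    by dividing through by the largest likelihood; a missing
    attribute (None) is vacuous: every class gets possibility 1."""
    if likelihoods is None:
        return {c: 1.0 for c in CLASSES}
    top = max(likelihoods.values())
    return {c: likelihoods[c] / top for c in CLASSES}

# Hypothetical likelihoods for attribute A1 across the four classes.
a1 = to_possibility({"D1": 0.50, "D2": 0.25, "D3": 0.15, "D4": 0.10})
# By construction the most likely class, D1, has possibility 1,
# and a completely missing attribute (e.g., A3) yields all 1s.
a3 = to_possibility(None)
```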
In the fifth step, we aggregate the possibility measures across the attributes for each class label on D using the fuzzy set t-norm known as the Frank Rule (Frank 1979):[iii]
T(a, b, …, i) = log_s[1 + (s^a − 1)(s^b − 1) ⋯ (s^i − 1) / (s − 1)^(k−1)];
where k is the number of attributes and s is an adjustment parameter, set close to 0 if the independent variables are highly correlated and close to (but not equal to) 1 if they are independent (s is set to .01 here).[iv] This adjustment parameter allows us to deal with multicollinearity in a realistic manner, avoiding the need to drop from the model relevant indicators that may contribute to predictions on the dependent variable.
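A minimal sketch of the Frank Rule as written above; with s near 0 the rule behaves like the minimum t-norm, which the example checks.

```python
import math

def frank_tnorm(scores, s=0.01):
    """Frank-rule t-norm over k possibility scores. s near 0 suits
    highly correlated attributes; s near (but not equal to) 1
    approaches the independent, naive-Bayes-like case."""
    k = len(scores)
    prod = math.prod(s ** a - 1.0 for a in scores)
    return math.log(1.0 + prod / (s - 1.0) ** (k - 1), s)

# Aggregating a sure score (1.0) with a weaker one (0.5) returns
# roughly the weaker score when s is small, as with the min t-norm.
combined = frank_tnorm([1.0, 0.5])
```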
We apply the Frank Rule to the possibility scores for the attributes on each class label. This produces overall likelihood ratios, which, again, are normalized into overall possibility measures for each class label. The overall possibility measure of a class label indicates that class label’s likelihood, given a vector of observed (or forecast) attribute values. For ease of interpretation, these overall possibility scores for each class on the dependent variable can be transformed back into probability measures by straightforward normalization. Having applied this process to the example in Table 5, we see that D1, with a 60% probability, is the most likely outcome.[v]
Historical Application and Validation
In GBU, we validated 5- through 15-year global forecasts of country instability. The unrestricted FASE model used all of the independent variables described in that article. This was to distinguish it from the restricted FASE model, presented below, which relies on a limited set of non-collinear independent variables to facilitate a comparison with the multinomial logistic regression model. We used a split-sample validation design. To begin, the data for the 10-year period 1975-1984 were used as the training set. We applied the FASE procedure to the data in this training period to “learn” how different configurations and levels of country macro-structurals have been associated with different levels of instability. Then, using the data on country macro-structurals only for the period 1985-1999 (the test set), FASE classified the countries by their expected intensity levels of instability based on the historical patterns. We then compared how FASE classified each country with actual occurrences over the period 1985-1999 and computed some standard forecasting performance metrics. The performance metrics revealed how well FASE could learn the existing patterns and how robust the patterns were through time.
Tables 6 and 7 display output of this validation analysis for Colombia in 1992 and Kyrgyzstan in 1994. These forecasts were generated from patterns FASE identified over the period 1975-1984 and, therefore, represent 8- and 10-year validation forecasts, respectively. Tables 6 and 7 are designed to resemble the illustrative example in Table 5. The second column in each table displays the observed values for each of the macro-structural factors for that country in the year indicated. The possibility scores associated with each of the four class (or conflict type) outcomes are displayed in the cells.[vi]
The results in Table 6 suggest that, based on the macro-structural factors Colombia exhibited in 1992 and the decision rules articulated above, we would expect a moderate intensity instability (conflict type 2 or 3) to occur with an 83% probability. Colombia was engaged in a violent crisis (conflict type 3) in 1992--a counter-insurgency against the Revolutionary Armed Forces of Colombia (FARC) that began in 1964 and persists at present--and nothing more serious, so we would regard this forecast as a correct prediction. Kyrgyzstan is an interesting case because it did not exist as an independent entity in the period covered by the training set. Also, data were available on Kyrgyzstan for only half the test set years (1991-1999). Therefore, its % of history in state of conflict variable, which is calculated based only on years in the training set, is completely missing. KOSIMO does not record a conflict for Kyrgyzstan in 1994 or, for that matter, in any other year between 1991 and 1999. The FASE analysis does indicate that, based on the values of Kyrgyzstan's macro-structurals in 1994, a conflict type 1 (none) is somewhat likely (41% probability), but no more so than a conflict type 4 (war), which has a 44% probability. Nevertheless, the probabilities on no two adjacent conflict levels breach the 67% threshold. So, by the strict decision rules governing this analysis, we would conclude that we are uncertain about what level of instability Kyrgyzstan was likely to experience in 1994--a forecast that would be considered neither correct nor incorrect from an overall performance perspective.
ADDITIONAL MODEL VALIDATION ANALYSES
Assessing the Veracity of Probabilities Derived from FASE
In their re-analysis of the State Failure Project’s forecasts of state failure, King and Zeng (2001) offer a useful test to evaluate the accuracy of the probabilities of state failure derived from their models. The probabilities we estimate for different levels of intensity of country instability are normalized from the possibility measures, so their accuracy is a legitimate empirical question. Therefore, we apply the same test used by King and Zeng (2001) to evaluate the accuracy of these probabilities.
A probability that is accurate gives the fraction of times that a state with a given set of characteristics will experience a certain level of intensity of instability. To evaluate these probabilities, we placed them in bins of .1 width (e.g., 0 to .1, .1 to .2, .2 to .3, etc.) for each of the three conflict types used to construct the index of instability (or 4 categories if one counts the category reserved for cases in which no conflict occurred). Then, for observations falling within each bin, we compute the fraction of observations that experienced each of the three conflict types (or each of the 4 possible outcomes). If the probabilities are accurate, these quantities should closely correspond. For example, given all the country-year observations for which the model estimates a 30-40% probability of experiencing a crisis, we would expect 30-40% of the cases in that probability bin to actually experience a crisis as their maximum level of intensity of conflict; given all those observations for which the model estimates an 80-90% probability of war, we should expect 80-90% of them to experience a war. We then compare these expected probabilities to the actual outcomes to assess their accuracy and model fit.
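A sketch of this binning check, with hypothetical predicted probabilities and outcomes:

```python
import numpy as np

def calibration_table(probs, outcomes, width=0.1):
    """Group predicted probabilities of one conflict type into bins
    of the given width and report, per non-empty bin, the observed
    fraction of cases in which that conflict actually occurred."""
    probs = np.asarray(probs)
    outcomes = np.asarray(outcomes, dtype=float)
    rows = []
    for lo in np.arange(0.0, 1.0, width):
        mask = (probs >= lo) & (probs < lo + width)
        if mask.any():
            rows.append((round(lo, 2), round(lo + width, 2),
                         outcomes[mask].mean(), int(mask.sum())))
    return rows  # (bin_lo, bin_hi, observed_fraction, n_cases)

# Hypothetical: four forecasts of "crisis" with observed outcomes.
rows = calibration_table([0.35, 0.32, 0.38, 0.85], [1, 0, 0, 1])
```

For well-calibrated probabilities, each bin's observed fraction should fall inside the bin's own range, which is exactly the 45-degree-line comparison the figures below display.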
We performed this procedure for each conflict type in both the training (1975-1994) and test (1995-1999) sets to evaluate the accuracy of the probability estimates for no conflict, crises, violent crises, and war. We then repeated the exercise on both the training and test sets to evaluate the probability estimates for none/low, moderate, and high intensity levels of instability, which are derived by aggregating the probabilities across the individual conflict categories according to the six decision rules described in GBU. Recall that if the combined probabilities of no conflict or a crisis occurring in a given country are 67% or greater, we expect that country to experience a none/low intensity level of instability; if the combined probabilities of a crisis or violent crisis occurring are greater than 67%, we expect a moderate intensity level of instability to occur; and if the combined probabilities of a violent crisis and war occurring are greater than 67%, we expect a high intensity level of instability to occur.
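The three aggregation rules just restated can be sketched as follows (GBU describes six rules in all; the ordering used here when more than one pair clears the threshold is an assumption, and the probability splits in the examples are invented apart from the 83%, 41%, and 44% figures quoted in the text).

```python
THRESHOLD = 0.67

def intensity_forecast(p_none, p_crisis, p_violent, p_war):
    """Sum adjacent conflict-type probabilities and forecast the
    first intensity level whose pair clears the 67% threshold;
    if no pair does, the forecast is 'uncertain'."""
    if p_none + p_crisis >= THRESHOLD:
        return "none/low"
    if p_crisis + p_violent > THRESHOLD:
        return "moderate"
    if p_violent + p_war > THRESHOLD:
        return "high"
    return "uncertain"

# Colombia-like split: crisis + violent crisis = 0.83 -> moderate.
colombia_1992 = intensity_forecast(0.10, 0.40, 0.43, 0.07)

# Kyrgyzstan-like split (41% none, 44% war): no pair clears 67%.
kyrgyzstan_1994 = intensity_forecast(0.41, 0.08, 0.07, 0.44)
```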
Figures 3 and 4 display the results of these analyses for the probabilities estimated in the training and test sets respectively. In both figures, graph a shows the correspondence between the disaggregated probability estimates (e.g., probabilities by conflict type) and actual outcomes. Graph b shows the same thing for the aggregate probabilities.
The closer the probability estimates are to the 45-degree line, the better the fit of the model and the more accurate the probabilities. In the training set (graph a in figure 3), we see that the probability estimates for wars and violent crises, though not perfect, are quite accurate. The probabilities of no conflict and crises, by contrast, are rather inaccurate. The green line doubling back on itself reveals that higher probabilities of crises actually correspond to lower actual occurrences of crises. The line associated with the probabilities of no conflict (below the 45-degree line) shows that we are under-estimating those countries that are likely to experience no conflict, and, therefore, over-estimating the number of countries likely to experience a conflict of some type (specifically, crises). This is not surprising since the precision scores that measure the rate of false positives generated by the model were the weakest of the three performance measures across all the validation analyses in GBU.
Turning to the aggregate probabilities, we see in graph b in figure 3 that the in-sample correspondence between the instability predictions and actual occurrences is quite accurate for probability estimates below 75%. The lines for the probabilities of none/low and moderate intensity levels double back on themselves at high probability levels (above 75%). The out-of-sample aggregate probabilities (graph b in figure 4) are somewhat less accurate (as one might expect), but still informative. Taken together, these results demonstrate the challenge we face in generating reliable forecasts of the individual types of conflicts a country might experience. However, the approach we use here is very useful for generating accurate forecasts of the general level of intensity of instability a country is likely to experience.
Sensitivity of Forecast Performance Metrics to Probability Threshold
We forecast that a country will experience a none/low, moderate, or high intensity level of instability if the combined probabilities of experiencing no conflict or crisis, crisis or violent crisis, or violent crisis and war, respectively, exceed 67%. The 67% probability threshold gives us reasonably high confidence that only one of the three categories is more likely than the other two possibilities. If the probability threshold is set too high (say, 90%), then overall accuracy may improve, but the number of “uncertain” (and, therefore, uninformative) forecasts will also be high, because few observations will satisfy such a high threshold. On the other hand, if the probability threshold is set too low (say, 25%), then multiple outcomes will satisfy the threshold and each may be no more likely than the others, and perhaps less likely than all the other alternatives combined. Thus, the choice of this probability threshold is likely to have implications for model performance.
We therefore re-estimated the 1995-99 out-of-sample forecast using a truncated range of probability thresholds (from 52% to 82%) to determine how sensitive the forecast performance metrics are to the choice of this parameter. The results of this analysis are displayed in figure 5.
Figure 5 shows the overall accuracy, recall, and precision scores generated for the 1995-1999 out-of-sample forecast, broken down by probability thresholds (in 5% increments). For each probability threshold, we also compute the fraction of observations about whose likelihood of instability we are uncertain, due to the inability of the forecast to satisfy the threshold. Ideally, we wish to identify a probability threshold that generates good overall accuracy, recall and precision scores, but minimizes, to the extent possible, the fraction of uncertain (and, therefore, uninformative) forecasts. The results in figure 5 reveal that while the recall scores are stable across the range of probability thresholds, the overall accuracy and precision scores increase with each successive increase in the probability threshold. However, the fraction of uncertain forecasts also increases dramatically in correspondence with each successive increase in the probability threshold, illustrating the tradeoffs that operate. Above the 67% threshold, overall accuracy, recall, and precision do not appear to improve significantly enough to tolerate the substantially higher fraction of uncertain predictions. Below the 67% threshold, the fraction of uncertain forecasts is fairly low. However, we pay the price for this lower fraction of uncertain forecasts with diminished overall accuracy and precision scores. Thus, the 67% probability threshold we used for classification and forecasting in GBU is a reasonable choice.
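The tradeoff figure 5 illustrates can be sketched with a toy sweep; the decision logic is simplified to a single most-likely-level rule, and the two cases are invented.

```python
def sweep_thresholds(cases, thresholds):
    """For each candidate probability threshold, report accuracy on
    the decided cases and the fraction left 'uncertain' because no
    level's probability clears the threshold."""
    results = []
    for t in thresholds:
        n_uncertain = n_correct = 0
        for level_probs, actual in cases:
            best = max(level_probs, key=level_probs.get)
            if level_probs[best] >= t:
                n_correct += (best == actual)
            else:
                n_uncertain += 1
        decided = len(cases) - n_uncertain
        accuracy = n_correct / decided if decided else float("nan")
        results.append((t, accuracy, n_uncertain / len(cases)))
    return results

# Two invented country-years: raising the threshold improves
# accuracy on decided cases but leaves more forecasts uncertain.
cases = [({"none/low": 0.80, "moderate": 0.15, "high": 0.05}, "none/low"),
         ({"none/low": 0.50, "moderate": 0.40, "high": 0.10}, "moderate")]
results = sweep_thresholds(cases, [0.40, 0.70])
```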
Restricted FASE vs. Restricted Multinomial Logit Model
Next, for purposes of comparison, we show how the FASE model performs relative to the technique most often used in conflict studies to examine dependent variables with multiple classes: the multinomial logistic regression model.[vii] Many of the independent variables used in this analysis are collinear. The estimates FASE generates are unaffected by multicollinearity per se, since the t-norm used to aggregate the evidence across the independent variables contains a parameter that adjusts the aggregation for the degree of inter-factor dependence. Multicollinearity does, however, pose a challenge to techniques like multinomial logit, which assume the independent variables are not highly correlated. The presence of multicollinearity in a logit model has the practical effect of producing inefficient parameter estimates, depressed t-scores, and high standard errors for the regression parameters (Pindyck and Rubinfeld 1998, 95-98).[viii]
Thus, to circumvent this problem, we used correlation matrices, stepwise regression, and bivariate chi-square statistics to identify the best multinomial logit model. Each candidate model, comprising an alternative combination of the independent variables, was then evaluated on its ability to forecast out-of-sample. The best logit model contained the following variables: civil liberties index, youth bulge, GDP per capita, and % of history spent in conflict. Using these factors only, we repeated the 5- through 15-year validation exercise to compare a restricted FASE model with the multinomial logit model. These results appear in Table 8.
The restricted FASE and logit models perform almost identically well with respect to their average precision and overall accuracy scores (62% precision; 78% and 79% overall accuracy, respectively). However, the restricted FASE model is superior with respect to its ability to correctly forecast the level of intensity of instability countries experience (73% vs. 69% average recall score).
Table 9 shows the classification tables for the restricted FASE and multinomial logit models associated with the analysis reported in the lower right-hand cell of Table 8 (1975-94 training set, 1995-99 test set). The restricted FASE model used values only on the limited set of macro-structural factors to correctly identify 74% of the countries involved in war (37 / 50), 87% of those involved in violent crises (93 / 107), and 61% of those involved in non-violent crises (51 / 83) during the period 1995-1999. The corresponding figures for the multinomial logit model are 69%, 67%, and 57%, respectively.[ix] The FASE model does produce a somewhat higher proportion of uncertain predictions, cases in which the likelihood of instability is roughly equally distributed across the intensity levels (15.7% uncertain predictions vs. 6% for the logit model). However, an interesting characteristic of the logit model is that it generates very few predictions for moderate intensity levels of instability at all; indeed, it primarily estimates that countries will be either very stable or very unstable.[x]
The restricted FASE and multinomial logistic regression models perform comparably with respect to their ability to forecast levels of intensity of country instability using macro-structural attributes. Therefore, any choice between them should be determined primarily by the preference of the analyst. We prefer the FASE model in the present context, the unrestricted FASE model in particular, for several reasons. First, FASE treats missing data values as vacuous, avoiding the need to delete cases for which values on the independent variables are missing. Second, FASE is much less sensitive than the logit model to the presence of multicollinearity. Therefore, variables that are partially, or even highly, correlated can be modeled simultaneously and assessed to maximize predictive accuracy. Finally, and as a result, the unrestricted FASE model performs somewhat better than the multinomial logit model, specifically with respect to its ability to predict the correct intensity level of instability with precision.
FORECASTING COUNTRY INSTABILITY
We use the unrestricted FASE model, with the full complement of independent variables, to generate forecasts of country instability for every major country over the period 2000-2015. To do so, we use the entire historical database for the years 1975-1999 as the training set. Using the historical data as a baseline, the trend for each macro-structural factor is forecast for each country out to the year 2015. Based on the patterns FASE identifies in the training data, and given the values of the forecast macro-structural attributes, we estimate the likelihood that a given intensity level of instability will occur in each country over the next 15 years.
Of the 12 factors considered here, 3--life expectancy, youth bulge, and infant mortality--were obtained from the U.S. Census Bureau. The Bureau has already generated annual forecasts on these factors for some 227 countries out as far as the year 2050, and it has done so using a fairly sophisticated methodology that takes into account trends and projections of other indicators, as well as a myriad of country-specific censuses, circumstances, and expectations--a methodology that goes well beyond the resources at our disposal.[xi] We use the Bureau's forecasts on these factors for the country instability forecasts. Forecasts on the remaining factors are generated using Holt's Two Parameter Method (Makridakis et al. 1983),[xii] a double exponential smoother, which comprises the following three equations:
S_t = αX_t + (1 − α)(S_{t−1} + b_{t−1}),  (1.1)
b_t = γ(S_t − S_{t−1}) + (1 − γ)b_{t−1},  (1.2)
F_{t+m} = S_t + b_t m.  (1.3)
Here S_t is a smoothed value, F_{t+m} is a forecast value, m is the number of periods ahead to be forecast, and α and γ are the smoothing parameters, each bounded by 0 and 1. The technique is somewhat flexible because it allows the trend to be smoothed separately. Equation 1.1 adjusts S_t directly for the trend of the previous period, b_{t−1}, by adding it to the last smoothed value, S_{t−1}. Equation 1.2 updates the trend by taking the difference between the last two smoothed values.[xiii] A smooth, non-responsive forecast, one dominated by the series' initial trend, is achieved when α and γ are set low. When both parameters are set high, the last few observations receive greater weight. The result is an erratic forecast (oftentimes, too erratic) that responds quickly to changes in a series' trend.
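Equations 1.1-1.3 translate directly into code; the sanity check below exploits the fact that Holt's method tracks a perfectly linear series exactly.

```python
def holt_forecast(x, alpha, gamma, m):
    """Holt's two-parameter double exponential smoother, following
    equations 1.1-1.3, with S_1 = X_1 and b_1 = X_2 - X_1 as the
    initial values (endnote [xiii])."""
    s, b = x[0], x[1] - x[0]
    for xt in x[1:]:
        s_prev = s
        s = alpha * xt + (1 - alpha) * (s + b)       # eq. 1.1
        b = gamma * (s - s_prev) + (1 - gamma) * b   # eq. 1.2
    return s + b * m                                 # eq. 1.3

# On the linear series 1, 2, ..., 5 the 3-step-ahead forecast is 8.
forecast = holt_forecast([1.0, 2.0, 3.0, 4.0, 5.0], 0.5, 0.5, 3)
```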
For each country, we forecast through the year 2015 each of the nine macro-structural factors collected from sources other than the Census Bureau. This resulted in 1,749 separate forecast analyses (8 factors for each of 159 countries, plus 3 more series per country needed to compute trade openness). Each of these forecasts was graphed, inspected for reasonableness, and adjusted if circumstances warranted.[xv] The forecasts of the macro-structurals were generated, recorded as baseline values, and used to compute the likelihood of country instability. This “first cut” provides a scientifically derived guess about the trend each macro-structural factor might exhibit over the next 15 years, given its past performance over, in most cases, a fairly long period of time. The country instability forecasts should be viewed as a scientific guess about where and when conflicts are likely to occur in the world over each of the next several years, based largely on the patterns and trends observed over the past 25 years. Additional analysis is required before these forecasts could be used as a basis for sound decision making. It is the instability forecasting approach, and not the individual country forecasts, that is introduced here as one of several approaches that, taken together, may be necessary, useful, and effective in anticipating conflicts and country instabilities.
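The first reasonableness adjustment described in endnote [xv] amounts to a clamp-and-straight-line rule; the bounds and values below are invented for illustration.

```python
def clamp_forecast(path, lo, hi):
    """If a forecast trend crosses a sensible bound (e.g., negative
    calories consumed), pin it at the globally observed minimum or
    maximum and straight-line it through the periods thereafter."""
    out, pinned = [], None
    for value in path:
        if pinned is None and not lo <= value <= hi:
            pinned = lo if value < lo else hi
        out.append(pinned if pinned is not None else value)
    return out

# A caloric-intake trend drifting below a hypothetical floor of 1500.
adjusted = clamp_forecast([2100, 1900, -50, -200], lo=1500, hi=3800)
```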
Forecasts of Country Instability
The forecasts of country instability for the years 2005, 2010, and 2015 are graphically displayed in Figures 6 through 8.
These graphs depict regional trends and expected improvements in some countries’ prospects for stability. Most of the states expected to experience the highest intensity levels of instability are found in East Africa and South Asia. More specifically, the following countries are expected to spend a majority of the next 15 years in an environment conducive to high intensity instability:
Afghanistan, Angola, Burma, Burundi, Cambodia, Chad, Congo-Kinshasa, Ethiopia, India, Indonesia, Iraq, Laos, Nepal, Pakistan, Peru, Senegal, Somalia, Sudan, Tajikistan, Uganda, and Yemen.
Algeria, Argentina, Armenia, Azerbaijan, Bangladesh, China, Colombia, Comoros, Djibouti, Egypt, France, Georgia, Greece, Guatemala, Haiti, Iran, Israel, Kenya, Lebanon, Liberia, Libya, Morocco, Nicaragua, Nigeria, North Korea, Oman, Papua New Guinea, Philippines, Russia, South Africa, Spain, Sri Lanka, Syria, Tanzania, Thailand, Turkey, and United Kingdom.
Burkina Faso, Cameroon, Congo-Brazzaville, Malawi, Mali, Mozambique, Niger, Rwanda, Sierra Leone, Ukraine, Uzbekistan, and Yugoslavia.
Table 10 summarizes the results for the individual years 2005, 2010, and 2015. In particular, the table shows how the model, based on the forecasts of each country's macro-structurals, expects the good, the bad, and the ugly (as well as the uncertain) countries to be distributed over each of those years. Generally, it suggests that countries (compared to 1999) will become increasingly unstable at higher intensity levels out to the year 2005 or so. After 2005, a trend toward greater stability is expected, especially in the years between 2010 and 2015. In particular, 5 countries are expected to make consistent improvements and move from ugly or uncertain to bad between 2001 and 2015. They are listed below, along with the macro-structural rationales for the expected improvements:
· Bangladesh: Infant mortality rate falls nearly 50%; GDP per capita increases; political rights improve (though its commitment to civil liberties declines somewhat); life expectancy improves; youth bulge declines; trade openness improves slightly.
· Guatemala: Infant mortality rate declines; political rights and civil liberties improve; life expectancy increases; youth bulge declines; trade openness improves somewhat; however, caloric intake declines slightly.
· Indonesia: Caloric intake increases; infant mortality declines; civil liberties improve; life expectancy improves somewhat; youth bulge declines; however, trade openness also declines somewhat.
· Iran: Caloric intake improves; infant mortality declines; life expectancy increases; youth bulge declines; however, trade openness also declines.
· Lebanon: Caloric intake increases; infant mortality declines; GDP per capita increases; life expectancy increases; youth bulge declines; trade openness improves somewhat.
Eleven more countries are expected to move from bad or uncertain to good. They include:
· Bosnia: Significant decline in youth bulge; GDP per capita, trade openness, and caloric intake also decline slightly; political rights and civil liberties improve somewhat.
· Congo-Brazzaville: Infant mortality rate declines; political rights and civil liberties improve; life expectancy improves somewhat; youth bulge declines; trade openness improves; however, GDP per capita and level of democracy also decline.
· El Salvador: Caloric intake improves; infant mortality rate declines; life expectancy improves; GDP per capita increases slightly; political rights improve somewhat; youth bulge declines; trade openness improves.
· Egypt: Caloric intake improves; infant mortality declines; GDP per capita improves; youth bulge declines; however, trade openness is also expected to experience a slight decline.
· Greece: Caloric intake increases; GDP per capita increases; youth bulge declines; trade openness improves.
· Israel: Caloric intake increases; GDP per capita increases; life expectancy improves; youth bulge declines; trade openness improves.
· Mozambique: Caloric intake improves; infant mortality declines dramatically; political rights and civil liberties improve; GDP per capita improves; life expectancy improves; youth bulge declines; trade openness improves slightly.
· Nicaragua: Infant mortality declines; political rights and civil liberties improve; GDP per capita improves; life expectancy improves; youth bulge declines; and trade openness improves; however, caloric intake declines slightly.
· Spain: GDP per capita increases 50%; youth bulge declines; trade openness improves.
· Thailand: Caloric intake increases; infant mortality declines; GDP per capita increases; youth bulge declines; trade openness improves.
In addition, a few countries are expected to move in the opposite direction, toward higher intensity instability. These include Albania (from good to bad) and Djibouti (from uncertain to bad). Political rights and civil liberties are expected to worsen in Albania; civil liberties worsen and the youth bulge increases in Djibouti.
ENDNOTES
[i] FASE is sufficiently robust that the choice of density estimator has only marginal consequences for the results (Chen 2000). The window size, or bandwidth, is the more important choice; a narrow bandwidth can fit a training set with 100% accuracy. However, if the bandwidth is too narrow, the model will not forecast well out-of-sample.
[ii] This is a nice feature of FASE and an advantage over other approaches to handling missing data, such as listwise deletion, which can unnaturally skew a dataset. Density estimation is usually robust even if a small amount of data is missing from the training set.
[iii] See also Chen (2000) for an evaluation of two other t-norms, which are special cases of the Frank Rule.
[iv] When s is set very close to 1, the Frank Rule is nearly equivalent to a naïve Bayesian model.
[v] The current version of FASE is run using a C++ program. Earlier versions were run in a Microsoft Excel spreadsheet which, when used in tandem with a probability density estimator, is appropriate for small datasets. A moderately skilled analyst can easily program the spreadsheet to assess how sensitive the results are to changes in assumptions (e.g., what if an Asian economic recession occurs in 2005?) or to changes in the projected trends of individual macro-structural attributes.
[vi] Given an observed value on a macro-structural factor, recall that the conflict level most likely to occur is the one with a possibility score of 1.
[vii] The multinomial logistic regression model is estimated using SPSS 10.0.
[viii] Econometric Models and Economic Forecasts (4th edition). Irwin McGraw-Hill.
[ix] Note that these tabulations exclude the uncertain predictions.
[x] This was observed systematically across all of the logit classification tables not shown here.
[xi] These data can be found in the Bureau's International Data Base (IDB) at http://www.census.gov/ipc/idbnew.html. For a description of the methodology used to generate forecasts of fertility and mortality (among others), see McDevitt (n.d.) at http://www.census.gov/ipc/www/wp98.html.
[xii] Forecasting: Methods and Applications (2nd edition). New York: John Wiley and Sons.
[xiii] For the initial values, we set S_1 = X_1 and b_1 = X_2 − X_1.
[xiv] Note that this is relevant only to the true forecasts of country instability, and has no bearing on the validation analyses.
[xv] Two general types of cases warranted adjustment due to reasonableness considerations. One involved a resulting forecast trend that approached a nonsensical value (negative calories consumed, for instance). In such cases, the forecast was forced to the globally observed minimum (or maximum, as the case might be) value and straight-lined through the periods thereafter. The other involved cases in which only a few observations were available for a country with which to generate a forecast. This occurred most often in newly independent countries, especially in the former Soviet Union (FSU). In such cases, the most recently available observation or an averaged set of observations was used to develop a straight-line forecast through the forecast period.