CROSSTABS

The Bread and Butter of Survey Research


05-15-07

Cross tabulation of categorically coded data has been with us since before there were computers. The U.S. Census Bureau coded the responses it obtained on paper questionnaires and entered the data onto punch cards, which were the size of a dollar bill (at least at that time). The punch cards were then run through mechanical sorting and tabulating machines, which allowed for both single-variable frequency counts and joint frequency counts for two variables. The readings on the odometer-like counters of these tabulating machines were then copied down and ultimately presented as Crosstab Tables. IBM got its start making and selling such tabulating machines.

Things have changed a lot since then. However, they haven't really changed as much as the advances in computing technology would lead us to expect. Firms that offer Code and Tab services still tend to use the same old programs originally designed for mainframes (one might remark 'Why not... they still work'). They deliver 'tree killer' sized hard copy reports of crosstabs whose value appears to be based as much as anything else on their weight. Research departments analyze this voluminous output and then select a 'meaningful/significant' subset of these tables. They then 'pretty print' these tables along with textual analysis to present to their clients.

With the widespread availability of personal computers and the WEB, various adjuncts to Crosstabs are now available. These range from Chi-Square and other Contingency Table statistics and measures of association, to Log-Linear Analysis, and Correspondence Analysis. There are also various Graphics options for viewing bi-variate frequencies depicted in Crosstabs.

While most Statistical Analysis Packages include Crosstabs (some in fairly crude form), there are also stand-alone Crosstab Programs for doing all of this, that can get to be tens of MEGABYTES in size (they cover everything from questionnaire design to processing and analysis, to presentation of results).

All of these advances have had limited use and popularity in the Research Community. Market Research firms tend to be conservative almost to the point of being LUDDITES when it comes to PC and WEB based software, and tend to stick with the 'plain vanilla' crosstabs they've used for decades.

Just about ANYBODY can obtain FREE software to conduct WEB based surveys, and to use FREE software to create CROSSTABS and analyze the results. The 'Traditional' Code and Tab service providers have apparently been very MINIMALLY affected by this. So it goes.

The Survey Research community is apparently not a 'Progressive' or 'Leading Edge' factor when it comes to computing innovation.

X-TABS DRIVING (or following) SOCIOLOGICAL IDEOLOGY

To a significant degree, Sociology at least in its quantitative manifestations, has always been driven by an attempt to make a case for CLASS DIVISIONS determining everything else. CROSSTABS are the ideal tool for trying to make such a case. One can CROSSTAB just about anything by DEMOGRAPHICS.

Even if nothing is going on for REAL, if you grind out hundreds of X-TAB tables with DEMOGRAPHICS as one of the two variables, you're likely to find SOMETHING by pure chance (remember that at the usual .05 level, about 1 out of every 20 tests of truly unrelated variables will look SIGNIFICANT whether they REALLY are or NOT). This in turn leads to grandiose EXTRAPOLATIONS that DEMOGRAPHY IS DESTINY. And of course what's implicit in this is that CLASS DIFFERENCES DRIVE EVERYTHING ELSE. That's the 'Sociological Dialectic' and DOGMA.
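That 1-in-20 arithmetic is easy to check by simulation. The sketch below is not from the original article (and is in Python rather than the APL used later on); it grinds out chi-square tests on crosstabs of two variables that are independent by construction, and counts how often they look 'significant' anyway:

```python
import random

def chi_sq(table):
    """Pearson chi-square statistic for a 2-D table of counts."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    stat = 0.0
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            e = r * c / n          # expected count under independence
            stat += (table[i][j] - e) ** 2 / e
    return stat

random.seed(1)
CRIT = 3.841   # chi-square critical value, df = 1, alpha = .05
TABLES = 1000
false_hits = 0
for _ in range(TABLES):
    # two INDEPENDENT yes/no variables, 200 respondents per table
    table = [[0, 0], [0, 0]]
    for _ in range(200):
        table[random.randint(0, 1)][random.randint(0, 1)] += 1
    if chi_sq(table) > CRIT:
        false_hits += 1

print(false_hits / TABLES)   # roughly .05 -- about 1 table in 20
```

With nothing going on for REAL in any of the thousand tables, about fifty of them still clear the significance bar.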

There may well be some instances where this is valid. But there are MANY other instances where it's NOT. The CROSSTAB TOOL won't make it so one way or the other.

More on this later.


06-05-07

It's now later and here's more.

UNDER THE HOOD... COMPUTATIONAL DETAILS

While the basic mechanics of creating cross tabulations are fairly simple (after all, as mentioned previously, even mechanical tabulating and card sorting machines could do it), the procedure is hard to do as an APL 'one liner'. A 'five liner' using the 'dreaded explicit loop' (explicit loops are supposed to be bad form in APL) is what we've used instead. The advantage of this approach is, of course, that it doesn't gobble up as much of the user's WORKSPACE as a 'one liner' would. This is because a 'one liner' using INNER and OUTER array products takes up tremendous amounts of temporary space.

Even with FIVE LINES (as opposed to one line) of source code, one is still left wondering 'Is that all there is?' to a paradigm as important as CROSSTABS.

DATA REQUIREMENTS
The input for this procedure is stored in a data array consisting of two columns and as many rows as there are observations. We assume that the first column represents the coded entries for the ROW of the Crosstab Table, and the second column contains the coded entries for the COLUMN of the Crosstab Table. We further assume that the codes are integers and that there are no more than roughly 10 or 15 distinct values; with many more than that, the Crosstab table would have many zero or near-zero entries.

In the example shown below, there are SIX pairs of observations in the matrix labeled X. The first column (the ROW column) has numerical codes ranging from 1 to 4, while the second column (the COLUMN column) has codes ranging from 1 to 3. This will result in a Crosstab Table of FOUR ROWS and THREE COLUMNS.

The function XTABS takes the raw data and sorts it into a Crosstab table. Think of the numerical pairs of codes for each observation as subscripts of the Crosstab Table. Every time a particular pair of codes is encountered, they point to the cell of the table with those particular subscripts and that cell is incremented by one.
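The XTABS function itself is an APL routine that isn't reproduced here, but the subscript-and-increment idea is easy to sketch. The version below is in Python, not APL, and the six data pairs are hypothetical, chosen only to match the shape described above (row codes 1 to 4, column codes 1 to 3, no cell counted twice):

```python
def xtabs(data):
    """Cross-tabulate a two-column list of integer codes.

    Each (row_code, col_code) pair is treated as a subscript into
    the output table; the addressed cell is incremented by one.
    """
    n_rows = max(r for r, _ in data)
    n_cols = max(c for _, c in data)
    table = [[0] * n_cols for _ in range(n_rows)]
    for r, c in data:
        table[r - 1][c - 1] += 1   # codes are 1-based subscripts
    return table

# Hypothetical data matching the shape described in the text:
# SIX observations, row codes 1..4, column codes 1..3.
X = [(1, 1), (2, 3), (3, 2), (4, 1), (2, 2), (1, 3)]
for row in xtabs(X):
    print(row)
# Produces the FOUR ROW by THREE COLUMN table:
# [1, 0, 1]
# [0, 1, 1]
# [0, 1, 0]
# [1, 0, 0]
```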


The output table represents a table of counts, i.e. how many times each row and column subscript combination was encountered in the data. In the example, no cell has more than one count, but a larger data set would undoubtedly result in higher counts per cell.

What's shown here is but the barest rudimentary essence of what's involved in Crosstabs. There are also operations that produce row and column totals, Chi-Square and other Contingency Table statistics, Log-Linear analysis, Correspondence Analysis, as well as text labels for various components of the table, cleaning and pre-processing of the raw data to provide the necessary integer codes... etc. ad infinitum.
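As one small illustration of those extras, here is a sketch (again in Python, with made-up counts) of the row- and column-total operation, appending marginal totals to a table of counts:

```python
def with_margins(table):
    """Append a row-total column and a column-total row to a
    table of counts (a list of equal-length rows)."""
    body = [row + [sum(row)] for row in table]
    col_totals = [sum(col) for col in zip(*body)]
    return body + [col_totals]

# Hypothetical 4 x 3 table of counts.
counts = [[1, 0, 1],
          [0, 1, 1],
          [0, 1, 0],
          [1, 0, 0]]
for row in with_margins(counts):
    print(row)
# Last column holds row totals; last row holds column totals,
# with the grand total (6) in the corner.
```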

Additional information on the Contingency Table module (module #5) used by Spring-Stat(c), our Statistical System, can be found here:

SEE: SPRING-STAT(c) MODULES

There are several ways to implement such computations on your own computer. Link to our Array Processing Resources page to find out more.

SEE: ARRAY PROCESSING RESOURCES


