sort

5. Sorting and Classifying

"Bow, bow ye lower middle classes
Bow, bow ye tradesmen, bow ye masses"

W S Gilbert, 'Iolanthe'

References to this subject that have appeared from time to time are

"recognise geometrical features and properties, and use these to classify shapes"

"compare and order objects, and order events, without measuring."

"When pupils sort and classify objects, they are able to demonstrate the criterion they use."

2. Some uncertainty of intent and terminology is apparent - the word 'sorting' is usually taken to mean 'ordering' but here it might be meant as a synonym for 'classifying'. These are related ideas but differ in that ordering involves exact quantitative comparisons between objects and classifying, possession of qualitative similarities. Since references also occur in Level 1 it is possible that many of the children may have already been exposed to the topic in KS1 teaching but with the widespread introduction of computer-based methods for both classification and ordering, where precision is paramount and exhaustive verification impracticable, it is probably desirable to cover the subjects again to ensure that the basic issues have been understood and to illustrate them in relation to new ideas learned since.

3. Classification 7. Heirarchies

5. Criteria 8. Ordering

6. Applications 9. Techniques

3. Classification

Classification finds wide application in many diverse fields and is basically concerned with criteria for grouping objects or data which show similarities, often qualitative, into mutually exclusive categories. The motivation for this is usually (but not always) the expectation that members of the same category may have other features in common (ie other than those used for selection). Independent development in a variety of areas has created a somewhat confused terminology with different names being used for essentially the same technique or concept but this is unlikely to be a problem at Key Stage 2.

4. Two superficially similar techniques, dissection and identification, are sometimes included in Classification although generally regarded as distinct. Dissection (in this context) is division into categories by a superficial or arbitrary condition rather than by inherent properties eg Post Codes - most houses are closer to some houses with a different post-code than to most of those with the same code and there is no substantial reason to expect generally that houses with the same post code will share other characteristics to a marked degree. Identification relates to the assignment of new objects to previously defined categories, again with the hope that the new object may display properties analogous to those of the existing examples but without the assurance that they are necessarily comparable with the defining samples.

The quantitative techniques used in classification, mainly devoted to finding objective measures of similarity or dissimilarity between groups of objects and devising probabilistic decision criteria, are well beyond the range of Key Stage 2 pupils and to a large extent the requirement can be interpreted as calling mainly for identification ie an ability to understand a formal criterion and abstract the relevant properties of an object so as to recognise when it satisfies the criterion (and when it does not). The requirement to 'demonstrate the criteria' however suggests that they should be aware of some of the practical implications of classification and the basic issues involved in choosing criteria

5. Classification criteria

A basic property required of any effective classification criterion is that it should be comprehensive ie it must be applicable it to all objects to be classified, whether these are already known or an indefinite, as-yet unknown, collection. It must also provide definitive identification ie it should not be possible to assign any object to more that one class.

A common example is a set of coloured objects to be sorted into classes - red, yellow, green, and blue say. At first sight this is straightforward but reveals some of the basic problems of classification on closer inspection. There are are in fact wide divergences of subjective opinion (and maybe physiological perception) on where the boundaries between these colours lie, particularly in the blue/green and red/yellow regions. With a particular set of objects there may be no border-line cases but where colour is used as an identification criterion (ie applied an unspecified collection of as-yet unseen objects) ambiguities and inconsistencies arise. Another example is the old favourite 'Animal, mineral, or vegetable?' - the assignments of bacteria, coal, oil, heat, light, and sound for example are problematic.

The colour criterion also illustrates a second issue - if a purple/mauve object appears it cannot be fitted into the given classes (it is physically as well as subjectively different from the others in having a double spectral peak in the visible region) ie the classes are not comprehensive. A overlapping set of categories is illustrated by the seed-catalogue classification of garden plants into 'flowers, weeds, fruit, and vegetables' - where do trees, tomatoes, cucumbers, and marrows truly belong and what is the proper class for plants which are rampant in their native habitat and pampered garden inhabitants where they are exotic ?

Real-life criteria rarely satisfy these two conditions strictly and in practice classification schemes often need to specify how ambiguous or anomalous objects are to be treated eg the notorious 'miscellaneous','don't know', or 'other' category. One scheme which is rigorous is the classification of natural numbers into 'primes, squares, and rectangular numbers': although this is not mentioned specifically in the Number section, it is an unequivocal example and is useful in factoring numbers and testing primality.

6. Practical applications

The practical applications of classification tend to fall into one of two broad types - to organise a collection of objects or facts to facilitate retrieval or recall, or to explore possible underlying relationships and differences between members and between classes. Familiar examples of the first are the Classified Football Results and Classified Advertisement Section in the newspaper, while shopping would be much more time-consuming if the shelving in supermarkets was organised alphabetically, or by size or colour. The criteria in the Football case are clear cut but the other examples illustrate the importance of selecting criteria so that potential users can locate the items they are seeking easily. A factor sometimes overlooked is that the words and phrases users will choose to describe an item may not correspond with the classifier's choices of category names. With large 'knowledge' data bases for example there is often a need for a 'thesaurus' to list possible synonyms and related categories for the keywords.

The other type is illustrated by the biological classification of plants and animals (by Latin names) where the intention is to produce species groupings which are closely related in their characteristics and genetic structure, and in medicine, where classification of symptoms is used to diagnose diseases and classification of patients to search for common causes of disease.

The Contents List and Index of technical books are intermediate cases, where the intent is to indicate the scope of particular sections of the book and to flag areas where specific common concepts are invoked respectively.

7. Heirarchies

The geometrical requirement necessitates the introduction of the idea of a hierarchy, which is a common form of systematic classification with a tree structure comprising sub-categories at different levels characterising objects progressively more closely. It is not always possible however to produce a strictly hierarchical system since even though many individual objects may share common characteristics these may not be sufficiently universal to provide global discriminants. Sometimes, as here, there is no compelling natural order for the different levels and a variety of correct descriptions can be produced for the same object eg

trapezium quadrilateral with two parallel sides

parallelogram quadrilateral with two pairs of parallel sides

quadrilateral with two equal parallel sides

quadrilateral with opposite sides equal

rhombus semi-regular quadrilateral

equilateral parallelogram

equilateral trapezium

rectangle right-angled parallelogram

equiangular trapezium

square regular quadrilateral

equilateral rectangle

equiangular rhombus

Formal testing of a requirement such as this except as a multiple choice question poses a dilemma as to what form and level of classification should be accepted as answer eg to take a simple case, a particular figure can be described without error but varying degrees of precision as a 'quadrilateral', a 'trapezium', or an 'isoceles trapezium'. When the question does not specify what form and level of description is required is it justifiable to reject any of these answers? Questions that are in essence paraphrases of 'Guess what I am thinking' belong to the realm of parapsychology rather than mathematics.

In real life this particular question produced two unusual responses which give pause for thought

(a) it is half a hexagon - yes it is indeed (in the particular case a regular hexagon) but half a hexagon may be either a quadrilateral (divided corner to corner) or a pentagon (divided across opposite sides).

(b) would it still be a quadrilateral and a trapezium if the two non-parallel sides crossed: a moot point - is it a quadrilateral, a hexagon, or two triangles, and how many vertices does it have? This is also a timely reminder of the unstated assumptions that lurk in every dark corner ie the traditional classification applies only to plane figures comprising straight lines enclosing a simply connected space (ie a line, not necessarily straight can be drawn joining any two points in its interior without cutting the boundary)

8. Ordering

The idea of ordering will already be familiar to most pupils but practical implementation can be more time-consuming and error-prone than might be expected so that some practice and acquaintance with some of the simpler algorithms is probably worth while. The only significant theoretical point at this stage is recognition that a comparison operation may have four outcomes viz

(a) The comparison is not meaningful
(b) x is less than y
(c) x equals y
(d) x is greater than y

If case (a) applies to any pair of objects ordering is formally impossible, while if (c) occurs for any pair the set is said to be 'partially ordered' by the comparison rule - some additional rule as to the precedence of equal members is needed. If all pairs of objects result in (b) or (d) the set is said to be 'simply ordered' (or just 'ordered'). A third class 'well-ordered' exists, involving the existence or otherwise of a smallest member of any sub-set, but this is relevant only to infinite sets - finite sets which are simply-ordered are also well-ordered viz as far as finite sets of objects are concerned the crucial question is whether equality between any two members can occur and, if so, how they are to be treated.

9. Sorting techniques

For practice sorting playing cards provides a convenient example - say ace to ten of one suit or ace to ten of all suits with the conventional order clubs, diamonds, hearts, spades. A good understanding of the nature of sorting and algorithms generally would be encouraged by a class project to examine and compare (manually) the simpler methods that have been developed for computer use. The methods do not require the cards to be consecutive and work satisfactorily if some are duplicated - examples of these cases might also be included. Pencil-and-paper methods tracing each exchange with smaller numbers of objects can also prove enlightening.

Bubble sort
(a) Examine the cards from left to right - if an adjacent pair are in the wrong order exchange them, if not move on.
(b) After dealing with the last pair, return to the first and repeat the operation until a scan with no exchanges occurs.

Selection sort
(a) Pick up the left-most card
(b) Compare the card in hand with each of those to the right in turn. If the card in hand is greater, exchange them
(c) When the right-hand end is reached replace the card now in hand where the original was picked up and pick up the next card to the right.
(d) Repeat (b) and (c) until no more cards remain to pick up at the end of (c).

Insertion sort
(a) Starting with the second card from the left, examine each card in turn.
(b) If it is less than the card immediately to its left insert it among the cards to the left in its correct place ie so that its value is bracketed by its left and right neighbours, or at the start if less than all of them.
(c) Continue until all cards have been examined.

Radix sort (i)
(a) Examine each card in turn, placing it in a heap of cards of the same suit.
(b) Sort each of the four heaps by any of the other methods.

Radix sort (ii)
(a) Examine each card in turn, placing it in a heap with cards of equal face value
(b) Sort each of the ten heaps by any of the other methods.

Quicksort
(a) Select a card with a value near the middle of the expected range.
(b) Scan the cards from the left until one with a value greater than or equal to the selected one is found.
(c) Scan the cards from the right until one with a value less than or equal to the selected one is found
(d) Exchange the two
(e) Continue the scan alternately from the left and the right until the all the cards have been examined. The selected card is now in the correct position in the sequence.
(f) Repeat the algorithm from (a) for the segment to the left of the original selection and then for the segment to the right of the original selection.
(g) Repeat from (a) until all segments are sorted.

In performing Quicksort manually it is helpful to use two counters to mark the end of the scan at each step. These are placed before the first card and after the last card of the segment being ordered, and are moved along as each card is examined. When the markers reach the same card the partition of the segment is finished and the counters moved to another segment if any remain unsorted.

Pigeonhole sort
(a) Allocate a 'pigeonhole' for each possible value
(b) Examine the cards in turn, inserting each in the appropriate pigeonhole.
(c) Collect the cards from the pigeonholes in order

Approximate number of operations

Number of objects n=10 n=40 n=52

METHOD

Bubble Sort 100 1600 2704

Selection Sort 60 840 1404

Insertion Sort 38 600 1014

Radix (i) - 240 390

Radix (ii) - 120 156

Quicksort 67 426 593

Pigeonhole 10 40 52

Note that these comparisons are only approximate - in practice the relative merits of the different methods vary considerably for best, average, and worst cases.

Sorting in reverse order is achieved by reversing the comparison condition ie exchange 'more' and 'less' or 'smaller' and 'larger'. For small n, Selection sort is usually preferred since it produces a very simple program: Insertion sort is the one used by many card-players. For large n, Quicksort is the most popular choice, and for medium n a combination of Radix sorting with Selection or Insertion sort is the best choice. The Pigeonhole method is by far the most efficient in terms of speed where it can be applied but it requires that all possible values of the discriminant are known and, in computer terms, needs a large amount of temporary storage relative to the others.

3. Classification		7. Heirarchies
5. Criteria		8. Ordering
6. Applications		9. Techniques

trapezium	quadrilateral with two parallel sides
parallelogram	quadrilateral with two pairs of parallel sides
	quadrilateral with two equal parallel sides
	quadrilateral with opposite sides equal
rhombus	semi-regular quadrilateral
	equilateral parallelogram
	equilateral trapezium
rectangle	right-angled parallelogram
	equiangular trapezium
square	regular quadrilateral
	equilateral rectangle
	equiangular rhombus

Number of objects	n=10	n=40	n=52
METHOD
Bubble Sort	100	1600	2704
Selection Sort	60	840	1404
Insertion Sort	38	600	1014
Radix (i)	-	240	390
Radix (ii)	-	120	156
Quicksort	67	426	593
Pigeonhole	10	40	52