This was written December 1989 and published in Archive, Issue 3 No.5, and is still used in several of my progams. Having recently got to grips with ARM code, as an exercise, a friend asked me to write a sort routine for him. This is the resul. It may not the best example of coding but it does the job. The program was converted from BASIC to ARM code and I have tried to show how easily conversions can be done, by putting the BASIC as REM statements into the assembler. It is very fast, typically sorting 8000 strings in around 10 to 11 seconds, compared with 9 seconds, using RISCOS's Heap sort, which only moves the pointers. It can also sort from anywhere in the string, is case insensitive (if you want), and just uses 232 bytes once assembled. There are however some conditions:
1. The strings must be Indirection Strings i.e. $A and not A$(),
There are advantages to using indirection strings,
1. It can use less memory; BASIC uses 5 bytes for storing addresses. Indirection uses none.
The sort works from the second string (string 1) to the end of strings but does not include the first string (string zero). This is due to the original ShellMetzner routine. If used in a database, string zero can be used to store information; fieldnames, date of saving etc. The program was based on the following BASIC ShellMetzner routine which can be downloaded as a Zip with the other files..
1300 REM ShellMetzner sort algorithm in BASIC The GOTO's are the original program and have nothing to do with me! The only alterations I made were to use integer variables, use SWAP in line 1300 and use ELSE GOTO 1350 in line 1380. The latter was done in order to make how the conversion was done, easier to understand. This BASIC program is surprisingly fast, sorting 1000 strings in 14 seconds. How it works I leave up to you to work out, all I did was convert it to ARM code. Converting to ARM code The first thing was to pass the address, string length, number of strings and start point of the sort from BASIC. RO or A% is the address of the start of string storage RI or B% is the length of the string R2 or C% is the number of strings (Numstr) R3 or D% is the start point of the sort (1 st char = 0) In converting the BASIC routine, the first thing I did was designate all the variable names to registers: M% = R4, J% = R5, K% = R6, I% = R7, L% = R8 It was then just a matter of writing the appropriate code, with jumps to compare and swap subroutines. As BASIC passed four registers which are used throughout the program and there were several registers needed during the program, it was necessary to use registers RO to R10. It would be possible to stick to eight registers but this slows up the sort considerably. Before going to the assembler version of line 1380, registers 4, 5 and 6 are pushed on the stack. 1380 J$=J%+1: IF J$>K% THEN GOTO 1330 ELSE GOTO 1350 This caused a minor problem in pulling off the stack since, in the BASIC program, there are two GOTO's, hence two stack pulls and two return points. In both compare and swap routines, it was necessary to find the address of the strings. In BASIC this would be found by Address = StartAddr + (StringNumber * StringLength) In ARM code this was no different:
R5 = RO + R7 * Rl and where R7 and R8 are the string numbers, R2 is the length and RO is the start address of storage. I found that it was quicker to recalculate the addresses rather than storing and retrieving them. The compare is done a character at a time, checking for a Carriage Return (ASCII 13) before ANDing with 95 (&9F) which strips the case off the character. (This ANDing can be removed if you wish to use a case sensitive sort.) The bytes are then compared and if the first is greater than the second, a string swap is done. If they are equal, the next character is obtained and so on until we reach the carriage return. At this point the strings must be identical, so we continue with the next strings. As long as alphanumeric characters are used there is no problem with ANDing with 95, but some other characters when ANDed give the same result and can cause problems. If you wish to use ASCII 33 to 255 then you may have to lose case sensitivity by omitting lines 830 and 840. The swap is done four bytes (a word) at a time and a pointer increased by 4 bytes, until the pointer is the same length as the string, whereupon we return to the sort routine.
330DEFPROCass The most extreme test is to sort from character Included in the BASIC program is a demonstration number three, T, as the sort has to check all the of a sort which sets up C% strings of length 72 characters from the T of The to the t of sort before characters (line 1160) in the formatany differences are seen. Now use it in your own programs For those of you who want to use the sort in your own programs, the first thing you have to do is convert your multidimensioned arrays to single ones, with two arrays holding the length of each string and the start points.(DIM$) There are some procedures in the program "Convert" which will attempt to convert your array (if it is possible to convert). To use this procedure, first write the relevant lines of BASIC to load your array and then fill in what a and b are, i.e. the D IMensions of your strings as A$(a,b). The procedure will tell you the maximum length of each string and the spacing and gives you option of changing the length for a maximum string of 256 (including carriage return). The only problem you may have in running the program is lack of memory if you have a large array of strings. You can then decide the lengths you want and store them as DATA statements in your program or read them from the beginning of the store memory, where the length is stored and then the "DIM$" array. Any queries, write to me at the address in the REM statements. (Please send a SAE for replies.) It is also possible, by stating that the start address of strings is at the start of string 45 and that there are only 11 strings to sort part of the array. I hope you find this routine useful. Sort routines in RISCOS This is a sort using the Heapsort provided within RISCOS. In order to use, it the strings must be stored as indirection strings and the location of each string must also be stored as a block of 4 byte words (at least until I discover how to find where BASIC stores its strings). The sort is called using SYS"OS_HeapSort", RO , RI, R2 RO is the number of strings RI is a pointer to an array of word sized objects, which in this case is a list of 4 byte words holding the address of the strings. R2 is the type of object to be sorted, 4 is case insensitive strings and 5 is case sensitive strings For details of other types, see PRM page 819 (vo1.2)
0 REM>Heapsort4A To download the programs 6KB Zip click here>.
|