Article 68851 of comp.sys.cbm:
Path: news.acns.nwu.edu!newsfeed.acns.nwu.edu!math.ohio-state.edu!cs.utexas.edu!feeder.chicago.cic.net!EU.net!enews.sgi.com!decwrl!tribune.usask.ca!rover.ucs.ualberta.ca!news.ucalgary.ca!srv1.freenet.calgary.ab.ca!albrecht
From: "Alvin R. Albrecht" 
Newsgroups: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Subject: Re: Spectrum Emulator for C64 (LONG!)
Date: Fri, 6 Jun 1997 20:15:51 -0600
Organization: Calgary Free-Net
Lines: 402
Message-ID: <5nag9i$7ru@ds2.acs.ucalgary.ca>
References: <337C5E94.388@actcom.co.il> <01bc6f79$486f2220$04b8de8b@w9622136> <5mvksv$rif@news.acns.nwu.edu> <5n23b1$11pa@ds2.acs.ucalgary.ca> <5n4rci$fhc@news.acns.nwu.edu>
Reply-To: "Alvin R. Albrecht" 
NNTP-Posting-Host: albrecht@srv1.freenet.calgary.ab.ca
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
In-Reply-To: <5n4rci$fhc@news.acns.nwu.edu>
Xref: news.acns.nwu.edu comp.sys.sinclair:39411 comp.sys.cbm:68851 comp.emulators.cbm:21223



On 4 Jun 1997, Stephen Judd wrote:

> As to heeding my own advice, I do not see that I was giving any advice.

You must have forgotten what you said previously in the thread (go back a
couple of your posts).
 
> Since you did not comment on the substance of the comments, I can only
> assume that you agree with them?

:-).  No.  Although I don't doubt that many of you may be geniuses,
you don't know enough about anyone here to make those statements.
 
> >When you return, check your ego at the door.

> It is your prerogative to decide how you want to interpret my statements
> to infer my opinions.  My interest is in the statements themselves.  If
> a statement is true, then it is true.  I happily stand by the statements
> I have made to date.

OK, I withdraw what I said and apologize.

Here's one of your statements that I'd like you to explain:  the
6502 is significantly different from the 6809.  Please tell me 
what the 6502 has that the 6809 doesn't have and then show me what
they have in common so we can see just how different they are.

As for the questions you asked, let's start tackling them one at a time
with the first being "Is a z80 faster than a 6502"?  I'll be happy
to explain to you why software driven anything is better than hardware
driven stuff later ;) (and all those software sprites questions).
 
A serious look at this question is going to be dull & long.  For that,
I apologize, but you seem like you are genuinely interested in this
question, as am I.  So let's begin by taking a look at the on chip
resources of each processor in an attempt to determine relative strengths
and weaknesses.  (Registers for interrupts & refresh not included here).

I am very familiar with a z80, but not very familiar with a 6502.
Please make any corrections in the 6502 section below.

Z80
---

Eight bit registers:
A      : Eight bit accumulator
F      : Flags
B      : General purpose
C      :   "
D      :   "
E      :   "
H      :   "
L      :   "

Eight more in an alternate set:
A',F',B',C',D',E',H',L'

Only one set of registers is active at a time.
EXX (4 cycles) will swap the BCDEHL registers and
EX AF,AF' (4 cycles) swaps the AF/AF' registers.
 (* when two 8 bit registers are combined
    it means we treat the two as a single
    16 bit entity )

16bit registers:
IX     : Index register 1
IY     : Index register 2
SP     : Stack pointer
PC     : Program counter

In addition, the eight bit registers can be used in
combination as 16 bit registers:

AF/AF', BC/BC', DE/DE', HL/HL'

The instruction set is very asymmetric, a result of having
a lot of on chip resources and using just an 8 bit wide opcode.
This means that the general purpose registers aren't really
general at all as the instruction set favours registers with
certain purposes:

A          : 8 bit accumulator
             fast shifts, add, sub, load/store to memory,
             compares, logical operators, i/o
B          : 8 bit counter, general
C,D,E,H,L  : general
             slower shifts, hold data on chip rather than in memory
             for compares & arithmetic etc with accumulator,
             bit operations

AF         : Used solely for push/pop from stack (on z80, all stack
             operations are 16bits at a time), communication
             between alternate register sets
BC         : 16 bit counter, tertiary memory pointer (LD A,(BC)),
             16 bit i/o addresses
DE         : secondary memory pointer (LD A,(DE)); there's also a fast
             EX DE,HL instruction (4 cycles) that swaps DE and HL.
HL         : 16 bit accumulator and primary memory pointer.
             fast 16 bit add & sub, LD r,(HL), logical operations
             between A and memory, shifts on memory directly, etc.

The z80 was designed to run 100% of the 8080's software.  As such,
all the 8080 instructions were copied.  The Z80 is more than just
an 8080, though, with a lot of features added.  Unfortunately,
these features could not be added without resorting to two byte
opcodes, which adds another 4 cycles to their execution time.

One of those features is the two 16 bit index registers IX and IY.
They act as 16 bit memory pointers in LD r,(IX+d) (r=any 8 bit register
is loaded with contents of memory address IX+d where d is an 8 bit
two's complement offset).  The IX and IY registers are even more special
because they have a lot in common with the HL register pair: simply
precede an instruction with HL in it with a special opcode and you get
the same action on the IX/IY registers.  So they also function as slower
16 bit accumulators.
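
A minimal sketch of how the (IX+d) address is formed, written in Python
rather than Z80 code so it can be run anywhere.  The function name is
made up for illustration; the only facts used are that d is a single
byte read as two's complement and that addresses wrap at 64K.

```python
def indexed_address(ix: int, d: int) -> int:
    """Effective address for (IX+d); d is one byte read as two's complement."""
    offset = d - 0x100 if d & 0x80 else d   # bytes 0x80..0xFF mean -128..-1
    return (ix + offset) & 0xFFFF           # addresses wrap at 64K


print(hex(indexed_address(0x8000, 0x05)))   # offset +5
print(hex(indexed_address(0x8000, 0xFB)))   # offset -5
```

So one index register reaches 128 bytes below and 127 bytes above the
address it holds, which is what makes it handy for records and stack
frames.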

Another feature is the bit addressable mode.  For example, BIT 5,D,
which tests whether bit 5 is set or reset in D.  This is equivalent
to   LD A,#20; AND D which takes a total of 11 cycles (7+4).  BIT 5,D
has two opcode bytes and takes 8 cycles, so besides being slightly
faster, its main advantage is that the contents of the accumulator are
not lost.

Another feature added to the 8080 core is the set of complex instructions.
These were intended to make block memory moves, block compares, block
i/o, etc. very quick.  An example is the LDIR instruction which copies
BC bytes of memory pointed at by HL to the address in DE.  By using the
instruction rather than the equivalent without it, the cpu avoids
a lot of fetch cycles (LDIR is actually LDI - the R stands for repeat -
with the PC being decremented by 2 if BC!=0 after each iteration.  Thus,
the Z80 refetches LDIR after each iteration, something corrected in the
Z180/280/380 where opcodes can be 16 bits wide which does away with the
need for two opcode fetches).
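
As a rough model of what LDIR does, here is a Python sketch that treats
memory as a 64K bytearray.  The function name and the tuple it returns
are choices made for this illustration only.

```python
def ldir(mem: bytearray, hl: int, de: int, bc: int):
    """Repeat LDI until BC reaches 0; returns the final (HL, DE, BC)."""
    while True:
        mem[de] = mem[hl]          # copy one byte from (HL) to (DE)
        hl = (hl + 1) & 0xFFFF
        de = (de + 1) & 0xFFFF
        bc = (bc - 1) & 0xFFFF     # BC=0 on entry wraps, i.e. 65536 bytes
        if bc == 0:
            return hl, de, bc


mem = bytearray(65536)
mem[0x4000:0x4004] = b"ABCD"
print(ldir(mem, 0x4000, 0x5000, 4), bytes(mem[0x5000:0x5004]))
```

Because the copy runs one byte at a time in ascending order, pointing DE
one byte above HL turns the same instruction into a memory fill.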

The z80 is a hard-wired device and is not microcoded.  Zilog added
several instructions which are physically located on the edges of
the z80 die and are known as the undocumented instructions.  They
remained undocumented because Zilog could not guarantee their
reliability, but in practice, they turn out to be reliable and are
frequently used in games where speed is paramount.

OK, there's a bit of background on the Z80.  Now a look at the 6502.


6502
----

Eight bit registers:
A      : accumulator
F      : flags
X      : index register 1
Y      : index register 2
SP     : stack pointer

16 bit registers:
PC     : program counter


The 6502 opts for holding data in a fast area of memory: page 0
which is used as a repository of registers.  Its two 8 bit index
registers are used to create stack frames in page 0.  The 8 bit
stack pointer provides for a LIFO stack underneath the current
stack frame.  

What delivers the power in the 6502 is its diversity of addressing
modes.  This combined with the index registers gives the 6502 a 
set of 256 8 bit general purpose registers, exactly analogous with
the Z80's twelve or so on chip.  But access is done by a fast memory
read/write rather than a faster read/write on chip in the Z80.
Also, the stack and all the stack frames of subroutines have to
exist in the same 256 byte space (if you want to avoid severe
time penalties versus a z80 with its 16 bit pointers) so realistically,
you won't have all 256 at your disposal.

A major disadvantage the 6502 has vs the Z80 is that access to the
full 64k space is much slower.  The Z80 has 16 bit memory pointers
which can be used to read/write a single byte in a minimum of 7 cycles
(3.5 in 6502 time).  The 6502, on the other hand, has to use its
direct addressing mode and must specify the 16 bit address on every
read/write.  I don't know if the 6502 has PC relative addressing.

Another major disadvantage is the 8 bit stack pointer.  This is
especially crucial in recursive algorithms like flood fill and
quicksort which could use up a lot of stack space.  On the 6502,
you'd need to implement a 16 bit stack pointer variable in page 0
which will cause the 6502 to take a severe performance hit vs the z80
when it accesses the full 64k space.

Other comparisons:  the two A registers function pretty much
identically, as do the F registers.

The index registers have a lot in common in the indexed
addressing mode, with the z80's also able to behave as 16 bit
accumulators.


Now, let's look at some examples.  Remember the z80 does twice as
many cycles as the 6502 in the same time.  The Spectrum does 3.54
times as many as the C64 in the same time.

1.  First, one Stephen came up with:  multiply two 8 bit numbers and
get a 16 bit result.  The amount of time needed will depend on the 
algorithm selected.  I have no idea what algorithm you have used to
get your 30 cycle result, so I'm going to use a straightforward
shift and add algorithm.  Please provide the 6502 equivalent so we can
compare.  If you'd like, let's see your fast multiply so we can
write a z80 equivalent.  But don't ignore this version, please.

      This z80 version multiplies an 8 bit number with a 16 bit
      number and keeps the least significant 16 bits of the product.
      There is no advantage in dealing exclusively with 8 bit
      multiplicands.

      enter:  A=8 bit number
             DE=16 bit number
      exit:  HL=A*DE, least significant 16 bits

            LD HL,0		10
            LD B,8		7
      loop  ADD HL,HL		11
            RLA			4
'A'         JR NC,noadd		12/7	; if 0: 12, 1: 18
'A'         ADD HL,DE		11	;  avg of 15
      noadd DJNZ loop		13/8

      Z80 cycle times are on the right.  The a/b form for branches
      are cycle times if the branch is taken/not taken.  The two
      instructions marked A above take an average of 15 cycles to
      complete if the eight bit multiplicand has an equal number of
      1s and 0s.

      I count:  10+7+8*(11+4+15+13)-5=356 cycles.

      If I unroll the loop, I save 7+11+7*13+8=117 cycles, so the
      routine takes 356-117=239 cycles.  I think the z80 will benefit
      more from unrolling than the 6502.

      For a 6502 to beat the z80, it must do the above in
      356/2=178 cycles.  For a C64 to beat the Spectrum, it must
      do it in 356/3.54=101 cycles.
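
      The listing and its cycle count can be checked with a small
      Python model.  This is only a sketch: the function name is
      invented, and the T-state figures are the ones written next to
      each instruction above.

```python
def z80_shift_add_multiply(a: int, de: int):
    """Mirror the listing: returns (HL, T-states), HL = (A * DE) mod 65536."""
    hl = 0
    t_states = 10 + 7                      # LD HL,0 and LD B,8
    for b in range(8, 0, -1):              # B counts 8 down to 1
        hl = (hl << 1) & 0xFFFF            # ADD HL,HL
        carry = (a >> 7) & 1               # RLA moves bit 7 of A into carry
        a = (a << 1) & 0xFF                # the bit entering bit 0 is never read
        t_states += 11 + 4
        if carry:
            hl = (hl + de) & 0xFFFF        # JR NC not taken, then ADD HL,DE
            t_states += 7 + 11
        else:
            t_states += 12                 # JR NC taken
        t_states += 13 if b > 1 else 8     # DJNZ taken, or falling through
    return hl, t_states


print(z80_shift_add_multiply(0xF0, 1))     # four 1 bits, four 0 bits
print(z80_shift_add_multiply(0x00, 1), z80_shift_add_multiply(0xFF, 1))
```

      A multiplier with four 1 bits reproduces the 356 figure; all
      zeros and all ones give the best and worst cases.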

The z80 should do this faster for two reasons:  1) All the variables
needed can be kept in the z80's register set versus being accessed
from page 0 memory in the 6502.  2) The z80 can add 16 bit numbers.


2.  I want to look at exactly how much slower the 6502 is at 
accessing the full 64k compared to the z80.  Let's copy nn bytes
(16 bit) from a 16 bit source address to a 16 bit destination address.

     First, a z80 solution without making use of the complex 
     instruction LDIR to find out what the advantage is of
     having 16 bit memory pointers.

     enter: HL=source address
            DE=destination address
            BC=# bytes to transfer (0 means 65536)=nn

     loop   LD A,(HL)		7
            LD (DE),A		7
            INC HL		6
            INC DE		6
            DEC BC		6
            LD A,B		4
            OR C		4
            JP NZ,loop		10/10

     I count nn*(7+7+6+6+6+4+4+10)=50nn cycles.

     For the 6502 to beat the z80, you need to do it in 25nn cycles.
     For the C64 to beat the Spectrum, you need to do it in 14nn cycles.
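
     The same loop as a Python sketch, with the T-states from the
     listing added up per pass.  The function name is hypothetical and
     memory is modelled as a 64K bytearray.

```python
def copy_block(mem: bytearray, hl: int, de: int, bc: int) -> int:
    """Byte-at-a-time copy as in the listing; returns T-states used."""
    t_states = 0
    while True:
        mem[de] = mem[hl]                  # LD A,(HL) / LD (DE),A   7 + 7
        hl = (hl + 1) & 0xFFFF             # INC HL                  6
        de = (de + 1) & 0xFFFF             # INC DE                  6
        bc = (bc - 1) & 0xFFFF             # DEC BC (sets no flags)  6
        a = (bc >> 8) | (bc & 0xFF)        # LD A,B / OR C           4 + 4
        t_states += 7 + 7 + 6 + 6 + 6 + 4 + 4 + 10   # JP NZ is 10 either way
        if a == 0:
            return t_states


mem = bytearray(65536)
mem[0x4000:0x4003] = b"abc"
print(copy_block(mem, 0x4000, 0x5000, 3))  # 3 passes of 50 T-states
print(50 / 2, round(50 / 3.54, 1))         # per-byte budgets for 6502 and C64
```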

The z80 should do this faster because it has on chip 16 bit memory
pointers vs the 6502 which must use the direct addressing mode to
load from memory and must fiddle with 16 bit numbers a byte at a 
time.  The 6502 will gain time on the z80 because the 16 bit
decrement operation "DEC BC" above does not set a flag when the
result is zero.  Hence the 8 cycle penalty in each loop to test if
the result is zero.

     Now, let's have another look with the help of one of the Z80's
     complex instructions.

     enter:  HL=source address
             DE=destination address
             BC=byte count=nn (0 means 65536)

             LDIR		21/16

     Yep, that's it.  The cycle time for the LDIR is 21 cycles if
     BC!=0 after the iteration and 16 if BC=0.

     I count 21*nn-5 cycles.

     For the 6502 to beat the z80, you need to do it in 10.5nn-2.5
     For the C64 to beat the Spectrum, 5.9nn-1.4 cycles.
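
     The arithmetic behind those budgets, as a short Python sketch.
     The helper names are invented; the 21/16 timings and the factors
     of 2 and 3.54 are the ones used throughout this post.

```python
def ldir_t_states(nn: int) -> int:
    """T-states for LDIR moving nn bytes (nn = 0 means 65536)."""
    count = 65536 if nn == 0 else nn
    return 21 * count - 5          # every pass costs 21 except the last (16)


def budgets(nn: int):
    """Cycles a 6502 at half the clock, and a C64, may spend to tie."""
    t = ldir_t_states(nn)
    return t / 2, t / 3.54


print(ldir_t_states(256), budgets(256))
```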

I know the 6502 has no chance here.  I would like to see a 6502
version that does a block copy in page 0 (everything 8 bit) and
see how it compares.  XMikeX already wrote one, but he said it
wasn't necessarily optimized.  Here's a second chance.


3.  Just to underline how much an advantage the complex instructions
& the z80's 16 bit pointers are, let's try out this problem.  I want
to write a simple tokenizer.  The strings already tokenized are stored
sequentially in memory and are null terminated.  Following each
null terminated string, the token for the string is stored.  The
subroutine should try to match the string to be tokenized with a 
string already stored in the data structure and return its token
if found.

Eg:  Tokens  -> Hello0&There0&Everyone0&0
     Search  -> There0

The '&'s are distinct single byte numerical tokens.  For example, given
the search string above, this subroutine should return a positive match
and the token '&' stored at the end of 'There0&'.

Your subroutine would get a pointer to Tokens, the data structure
containing the stored strings, and a pointer to Search, the string
being sought.
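
A reference version of the problem in Python may help pin down what any
solution has to return.  It is a sketch under two assumptions taken from
the example above: each entry is a null terminated string followed by
one token byte, and a lone null byte ends the table.  The function name
is made up.

```python
def find_token(tokens: bytes, search: bytes):
    """Return the token stored after the entry equal to `search`, else None."""
    pos = 0
    while tokens[pos] != 0:                 # an empty string ends the table
        end = tokens.index(0, pos)          # null that terminates this entry
        if tokens[pos:end + 1] == search:   # compare including the terminator
            return tokens[end + 1]
        pos = end + 2                       # skip the null and the token byte
    return None


table = b"Hello\x00\x01There\x00\x02Everyone\x00\x03\x00"
print(find_token(table, b"There\x00"), find_token(table, b"The\x00"))
```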

To make the code easier to follow, I've added comments.  Cycle times
are on the left.

     Here's a z80 version:

     enter:  HL=Tokens
             DE=Search

     exit :  - If found, C flag set, A=token value
             - If not found, NC flag set, HL=address of
               last 0 in Tokens string, where the
               string being searched for can be added
               to the token list
             - DE=Search, unchanged

11   SEARCH  PUSH DE		; save head of string being sought
7            LD A,(HL)		; at this point, HL=address of
4            OR A		;   1st char in string in token list
12/7         JR Z,fail		; if 1st char=NULL, no match found

7    compare LD A,(DE)		; compare char of string being sought
7            CP (HL)		;   and token string being examined
10/10        JP NZ,notsame	; no match if different
4            OR A		; same - check if NULL char
12/7         JR Z,match		; if yes, we have match
6            INC DE		; increment both pointers
6            INC HL
10           JP compare		; and continue comparing strings

10   notsame POP DE		; recover pointer to string sought
4            XOR A		; A=NULL character
4            LD C,A		; BC=0, set up for CPIR
4            LD B,A
21/16*       CPIR		; move HL to point to 1st char after
				; NULL in string that last failed to
                                ; match (now points at token)
6            INC HL		; get past token
10           JP SEARCH		; check against next tokenized string

10    match  POP DE		; recover pointer to string sought
6            INC HL		; move HL to token
7            LD A,(HL)		; A=token value
4            SCF		; set carry flag to indicate success
10           RET		; return

10    fail   POP DE		; recover pointer to string sought
10           RET		; return with carry flag reset


* Again, with CPIR, 21 cycles if BC!=0 and 16 otherwise.  Since BC=0
before the CPIR, the entire 64k space will be searched before BC=0
again.  Thus, this instruction will end when it finds the NULL
character rather than when BC=0 (so 21 cycles*# chars it passes before
finding NULL).
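
For readers who don't know the instruction, here is a rough Python model
of CPIR.  The function name and return tuple are choices made for this
sketch; the timing charges 21 T-states for each byte passed over and 16
for the pass that stops the search.

```python
def cpir(mem: bytearray, a: int, hl: int, bc: int):
    """Compare A with (HL), step HL up and BC down, stop on match or BC=0."""
    t_states = 0
    while True:
        found = mem[hl] == a
        hl = (hl + 1) & 0xFFFF     # HL always ends one past the byte tested
        bc = (bc - 1) & 0xFFFF     # BC=0 on entry wraps to 65535
        if found or bc == 0:
            return hl, bc, found, t_states + 16
        t_states += 21


mem = bytearray(65536)
mem[0x4000:0x4007] = b"There\x00\x02"
hl, bc, found, t = cpir(mem, 0, 0x4002, 0)   # start at the 'e' after "Th"
print(hex(hl), found, t, mem[hl])
```

HL finishes one past the matching null, which is why the listing finds
it already pointing at the token.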

This one's a little harder to compare cycle times.  I'll let you solve
this problem on the 6502 and let you decide on how to compare.


4.  In this example, I would like to demonstrate how much a liability
an 8 bit stack pointer is in recursive algorithms.  If you are up
to it, I'd like to write a flood fill subroutine.  I can't do that
in the next few minutes, so I won't include a z80 version here.  Let
me know if you want to try this and I'll come up with an algorithm
to follow (or you can come up with one - your choice).



Now the ball's in your court.  You are welcome to try to come up with
problems that concentrate on the 6502's strengths and the Z80's
weaknesses.

To anyone wanting to reply to this post, please have mercy on us all
and edit out all irrelevant bits.


Alvin



Article 68961 of comp.sys.cbm:
Path: news.acns.nwu.edu!merle!judd
From: judd@merle.acns.nwu.edu (Stephen Judd)
Newsgroups: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Subject: Shootout at the 0K Corral (was various other things)
Date: 9 Jun 1997 08:38:16 GMT
Organization: Northwestern University, Evanston, IL
Lines: 426
Message-ID: <5ngfdo$79r@news.acns.nwu.edu>
References: <337C5E94.388@actcom.co.il> <5n23b1$11pa@ds2.acs.ucalgary.ca> <5n4rci$fhc@news.acns.nwu.edu> <5nag9i$7ru@ds2.acs.ucalgary.ca>
Reply-To: sjudd@nwu.edu (Stephen Judd)
NNTP-Posting-Host: merle.acns.nwu.edu
Xref: news.acns.nwu.edu comp.sys.sinclair:39501 comp.sys.cbm:68961 comp.emulators.cbm:21286



A dusty street is lined with the scattered bits of past code showdowns.
It is deserted but for two men, and men they are: pocket protectors,
glasses, not even a user manual nearby.  One handsome devil has a brown
breadbox at his side; the other, a small rubber chew-toy of some sort.

They choke softly on the dust which arises as they walk.

...They approach...

One draws and fires!

In article <5nag9i$7ru@ds2.acs.ucalgary.ca>,
Alvin R. Albrecht  wrote:
>
>Here's one of your statements that I'd like you to explain:  the
>6502 is significantly different from the 6809.  Please tell me 
>what the 6502 has that the 6809 doesn't have and then show me what
>they have in common so we can see just how different they are.

Beats the hell out of me :).

Last time I looked at a 6809, though (cuz I have a CoCo 3 which
is like your Dragon), it had extra GP registers, mult/div instructions
built-in, PC-relative addressing, and an overall programming/design
philosophy different than the 6502/6510.

BTW, my only statement on this subject was "Yep" :)

(You'll have to ask someone more familiar with both processors,
their design, and their history for a real answer to your question,
although I'd be happy to research it if you're really interested).

>As for the questions you asked, let's start tackling them one at a time
>with the first being "Is a z80 faster than a 6502"?  I'll be happy

Excellent!  I want to test my claim/guess that a Z80 is on average around
three times slower than a 6510 :).  (Or, to put it in words possibly more
acceptable to the Spectrum crowd, that you'd have to run a Z80 three times
faster than a 6510 to get similar performance).

One thing should be perfectly clear: a good algorithm can always
overcome hardware limitations.

Incidentally, I am completely unable to resist a coding challenge. :)

>I am very familiar with a z80, but not very familiar with a 6502.

Well, that's OK, since I'm the opposite :).

BTW, my area of expertise is in 6510 mathematical and graphics algorithms.
I don't know much about Computer-Science type programs (sorts, operating
systems, etc.) but I am of course happy to give things a stab.

Actually, they are the main reason I dropped out of Computer Science :).

Anyways, the web is littered with sites from the really kick-ass programmers,
e.g. Craig Bruce (Totally Kick Ass), Andre Fachat (Really kick-ass),
Daniel Dallman (Pretty kick-ass), and so on, down to the Merely kick-ass.
While I have written tiny compilers and simpleton programs they have
written multi-tasking multi-threaded operating systems, SLIP stacks,
115kbps terminal programs, etc.  So they would be the place to go
for actually decent algorithms.

Anyways...

>6502
>----
>
>Eight bit registers:
>A      : accumulator
>F      : flags
>X      : index register 1
>Y      : index register 2
>SP     : stack pointer
>
>16 bit registers:
>PC     : program counter
>
>
>The 6502 opts for holding data in a fast area of memory: page 0
>which is used as a repository of registers.  Its two 8 bit index

I would not say that zero page (ZP) is a repository for registers.
It's just an area of memory that offers faster access and a mode
of indexing (also smaller instruction size).  You could just
as easily say that non-ZP memory is a series of virtual 16-bit
(or 24-bit, or...) registers.

It's a kind of zen-thing programming mindset; maybe other people
think of it as a bunch of registers, but not me.  There's just
the one register, A, and the two index regs, X and Y.  There's
memory from $00-$FFFF.  Everything revolves around that.  It's
a very minimalist existence.

>What delivers the power in the 6502 is its diversity of addressing
>modes.  This combined with the index registers gives the 6502 a 
>set of 256 8 bit general purpose registers, exactly analogous with

Again, they really aren't GP regs.  For instance, I can do an

	ADC #32

on the accumulator, but not on a zero page location.  They are
just like other memory, except they have shorter cycle times
for identical operations.

>The index registers have a lot in common in the indexed
>addressing mode, with the z80's also able to behave as 16 bit
>accumulators.

I don't think so; the indexing difference is small on paper but
strikes me as being enormous for coding.  The cycle times are
quite different, too.

Z80 indexing looks to be 8+16 (8 bits memory, 16 bits index).
On a 6510, it's 16+8 -- 16-bit address, 8-bit memory.  Moreover,
the indexing is just as fast as a normal memory operation (sometimes
plus one cycle, if a page boundary is crossed, but it's not important).
This has enormous ramifications.  LDA $C000 is exactly as fast
as LDA $C000,X and LDA $C000,Y.
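
A small Python sketch of the 16+8 scheme, for anyone more used to the
Z80 side.  The function name is made up; the cycle figures are the 4
cycles of LDA absolute plus the one-cycle page-crossing penalty
mentioned above.

```python
def lda_absolute_indexed(base: int, index: int):
    """LDA base,X (or ,Y): returns (effective address, cycles)."""
    addr = (base + index) & 0xFFFF
    page_crossed = (addr & 0xFF00) != (base & 0xFF00)
    return addr, 4 + (1 if page_crossed else 0)


print(lda_absolute_indexed(0xC000, 0x10))   # same page
print(lda_absolute_indexed(0xC0F0, 0x20))   # crosses into the next page
```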

In the same way that you can't imagine getting along with just
a single 8-bit GP register, most 6510 programmers can't imagine
getting along without the indexing modes.

>Now, let's look at some examples.  Remember the z80 does twice as
>many cycles as the 6502 in the same time.  The Spectrum does 3.54
>times as many as the C64 in the same time.

The Z80 runs at 2MHz.  The Spectrum runs at 3.54 MHz.  Got it.

>1.  First, one Stephen came up with:  multiply two 8 bit numbers and
>get a 16 bit result.  The amount of time needed will depend on the 
>algorithm selected.  I have no idea what algorithm you have used to
>get your 30 cycle result, so I'm going to use a straightforward

Well, some day when you're feeling motivated just stop by my web page
and you can see all of my Zecretz 2 Coding :).

>      This z80 version multiplies an 8 bit number with a 16 bit
>      number and keeps the least significant 16 bits of the product.
>      There is no advantage in dealing exclusively with 8 bit
>      multiplicands.
>
>      enter:  A=8 bit number
>             DE=16 bit number
>      exit:  HL=A*DE, least significant 16 bits
>
>            LD HL,0		10
>            LD B,8		7
>      loop  ADD HL,HL		11
>            RLA			4
>'A'         JR NC,noadd		12/7	; if 0: 12, 1: 18
>'A'         ADD HL,DE		11	;  avg of 15
>      noadd DJNZ loop		13/8

Are you sure the above works?  I use A=128 and DE=1, and get an
answer of HL=2, after two loop iterations.  Actually if RLA
doesn't set Z, I get HL=0 (after a whole bunch of iterations).

I think with a DEC B at noadd, it will work more like you want it
to.  +4 cycles?

>      Z80 cycle times are on the right.  The a/b form for branches
>      are cycle times if the branch is taken/not taken.  The two
>      instructions marked A above take an average of 15 cycles to
>      complete if the eight bit multiplicand has an equal number of
>      1s and 0s.

Hmmm, that's no good, use best cases and worst cases.  Average comes
out in the wash.  Use a routine that works though :).

Using the modified +4 routine, I get 17 + 8*(11+4+12/18+13)=337/385,
so around 360 cycles (343 ignoring the 17 cycle start-up).

--

Hmmmm... OK, plain jane shift and add multiply, 8 bits * 8 bits,
easy to scale up to n bits * m bits.  I usually use a slightly
different routine which preserves some memory locations and scales
in a nice way, but the below is a copy of the above routine.
(My normal version, as well as the divide version, is on my
web page).

This routine is basically like yours; sometimes I used a modified
"stop when zero" routine which doesn't keep adding zero to itself,
but it's slower for large numbers as I recall. 

ACC and AUX are locations in zero page.  Cycle times are to the right.

* ACC*AUX -> [.A, EXT+1] (low,hi) 16 bit result

MULT
          LDA #0		2
          STA EXT+1		3
          LDY #8		2

]LOOP     ASL 			2
          ROL EXT+1		5
          ASL ACC		5
          BCC MUL2		3/2
          CLC			  2
          ADC AUX		  3
          BCC MUL2		  3/2
          INC EXT+1		    5
MUL2      DEY			2
          BNE ]LOOP		3/2

By using a simple little trick (DEC AUX) it is possible to eliminate
the CLC and save 2 cycles, but this routine is good enough.

The main loop takes either 20 cycles, 27 cycles, or 31 cycles, depending
on the numbers.  This gives times of 160/216/248 cycles, so around
a factor of two fewer cycles (the "averaging rule" doesn't really apply
here -- the cases don't appear with equal probability).
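
To make the three loop timings concrete, here is a Python sketch of the
routine.  The function name is invented, and the cycle total covers only
the eight loop passes (no setup, every BNE ]LOOP counted as taken), which
is the same convention as the 160/216/248 figures.

```python
def mult_6502(acc: int, aux: int):
    """Model of the MULT loop: returns (16-bit product, loop cycles)."""
    a = 0                                  # .A holds the low byte
    ext = 0                                # EXT+1 holds the high byte
    cycles = 0
    for _ in range(8):
        carry = a >> 7
        a = (a << 1) & 0xFF                # ASL
        ext = ((ext << 1) | carry) & 0xFF  # ROL EXT+1
        top = acc >> 7
        acc = (acc << 1) & 0xFF            # ASL ACC
        if not top:
            cycles += 20                   # BCC MUL2 taken, nothing added
            continue
        total = a + aux                    # CLC / ADC AUX
        a = total & 0xFF
        if total > 0xFF:
            ext = (ext + 1) & 0xFF         # INC EXT+1
            cycles += 31
        else:
            cycles += 27
    return (ext << 8) | a, cycles


print(mult_6502(0, 5), mult_6502(255, 0), mult_6502(200, 100))
```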

>2.  I want to look at exactly how much slower the 6502 is at 
>accessing the full 64k compared to the z80.  Let's copy nn bytes
>(16 bit) from a 16 bit source address to a 16 bit destination address.
>
>     First, a z80 solution without making use of the complex 
>     instruction LDIR to find out what the advantage is of
>     having 16 bit memory pointers.
>
>     loop   LD A,(HL)		7
>            LD (DE),A		7
>            INC HL		6
>            INC DE		6
>            DEC BC		6
>            LD A,B		4
>            OR C		4
>            JP NZ,loop		10/10
>
>     I count nn*(7+7+6+6+6+4+4+10)=50nn cycles.

This is the routine I use:

(I leave out the initialization of registers and memory source/dest)

:LOOP	LDA $8000,Y	4
	STA $8000,Y	4
	INY		2
	BNE :LOOP	3/2
	INC :LOOP+2	  6
	INC :LOOP+5	  6
	DEX		  2
	BNE :LOOP	  3/2

So, 13 (or 15) * nn cycles.  The extra 6+6+2+3 stuff adds a wholly
trivial amount to the execution time, but if you want to be pedantic
and call it 13.06 cycles it's OK by me :).
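
The 13.06 can be reproduced with a few lines of Python.  This sketch
takes the cycle figures in the listing at face value and amortizes the
page-advance code over one 256 byte page; the function name is made up.

```python
def cycles_per_byte() -> float:
    """Amortized cost per byte over one page, using the listing's figures."""
    taken = 4 + 4 + 2 + 3          # LDA, STA, INY, BNE taken
    last = 4 + 4 + 2 + 2           # same, but BNE falls through at Y=0
    advance = 6 + 6 + 2 + 3        # two INCs, DEX, outer BNE taken
    return (255 * taken + last + advance) / 256


print(cycles_per_byte())
```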

Like I said, 6510 programmers can't imagine getting by without
proper indexing modes.  Incidentally, the 6510 DEC sets things
like Z for you, although with 16-bits an OR is needed.

Now, on a 6510, people who need fast memory fills usually unroll
part of the loop, e.g.

	LDA $8000,Y
	STA $8000,Y
	LDA $9000,Y
	STA $9000,Y

or whatever.  This gets it closer to the maximum possible memory
transfer rate of 8*nn cycles.

>     Now, let's have another look with the help of one of the Z80's
>     complex instructions.
>
>             LDIR		21/16
>
>     I count 21*nn-5 cycles.

Nifty.  (On a 65816 it takes 7 cycles per byte, I believe, but no
such luck on a 6510).

>I know the 6502 has no chance here.  I would like to see a 6502
>version that does a block copy in page 0 (everything 8 bit) and

It won't be any different.

>3.  Just to underline how much an advantage the complex instructions
>& the z80's 16 bit pointers are, let's try out this problem.  I want
>to write a simple tokenizer.  The strings already tokenized are stored
>sequentially in memory and are null terminated.  Following each
>null terminated string, the token for the string is stored.  The
>subroutine should try to match the string to be tokenized with a 
>string already stored in the data structure and return its token
>if found.
>
>Eg:  Tokens  -> Hello0&There0&Everyone0&0
>     Search  -> There0
>
>The '&'s are distinct single byte numerical tokens.  For example, given
>the search string above, this subroutine should return a positive match
>and the token '&' stored at the end of 'There0&'.
>
>Your subroutine would get a pointer to Tokens, the data structure
>containing the stored strings, and a pointer to Search, the string
>being sought.

OK, here's my version:

	LDY #00							2
:LOOP1	LDX #00							2
:LOOP2
	LDA TOKENS,Y						4(5)
	CMP SEARCH,X						4(5)
	BNE :NEXT						3/2
	INY							  2
	INX							  2
	CMP #00							  2
	BNE :LOOP2	;Is token string zero?			  3/2
:FOUND			;All done
	RTS or whatever

:LOOP3	INY		;Find end of string			2
:NEXT	LDA TOKENS,Y						4(5)
	BNE :LOOP3						3/2
:CONT	INY		;Advance past null byte   		  2
	INY		;Advance past token			  2
	BNE :LOOP1	;Always taken				  3


	
- Since you didn't specify a terminating condition, I didn't use one,
  i.e. I always assume a match is found eventually.

- You could make it a little faster in :NEXT if e.g. tokens all have
  high bit set, or if last character of string has high bit set, etc.

- I assume strings aren't longer than 256 bytes.  If longer, then
  add a few cycles to :NEXT (TYA CLC ADC :LOOP+2 STA :LOOP+2 etc.)
  or else use a ZP-indirect instruction, i.e.  LDA (TOKENS),Y

Assuming TOKENS and SEARCH are in normal memory, that gives a total
routine size of 28 bytes.  Traversing the string takes 9 cycles
per byte to the next token, plus a few extra.  Comparing the strings 
takes 18-19 cycles when a match is made, 11 cycles otherwise.

(I'll go through your code later, I promise!  It just looks like it
will involve work on my part :)

>4.  In this example, I would like to demonstrate how much a liability
>an 8 bit stack pointer is in recursive algorithms.  If you are up

6510 solution: don't use a massively recursive algorithm.  (Especially
since recursion is slow :).

Steve's rule #31: There's always another way of doing it.

>to it, I'd like to write a flood fill subroutine.  I can't do that
>in the next few minutes, so I won't include a z80 version here.  Let
>me know if you want to try this and I'll come up with an algorithm
>to follow (or you can come up with one - your choice).
>
>
>Now the ball's in your court.  You are welcome to try to come up with
>problems that concentrate on the 6502's strengths and the Z80's
>weaknesses.

Good, now my turn. :)

When I think of things where processor speed is important, I think
of calculations.  Coarse shift-add was done above; fast multiplies
are a breeze and much more useful.  Here is how to do a fast multiply:

	let f(x)=x^2/4
	then a*b = f(a+b) - f(a-b)

Thus with a small, simple table of squares you can do multiplications
very rapidly.  By using an expanded set of tables, you can make it
even faster, but I won't worry about that below -- just the basic
algorithm.
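
The identity is easy to check in Python before committing it to
assembly.  This is a sketch, not the table layout used below: it keeps
one table of whole integers, with f(x) rounded down, and the names are
made up.  Rounding down is harmless because a+b and a-b are either both
even or both odd, so the dropped quarters cancel.

```python
F = [x * x // 4 for x in range(511)]       # f(x) = x^2/4 rounded down, x = 0..510


def fast_mul(a: int, b: int) -> int:
    """a*b for 8-bit a and b via f(a+b) - f(a-b); f is even, so |a-b| works."""
    return F[a + b] - F[abs(a - b)]


print(fast_mul(200, 100), fast_mul(255, 255), fast_mul(7, 0))
```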

On a 64: let ZP1-ZP4 be zero page pointers (2-bytes each), set up to
	point to the low (ZP1, ZP2) and high (ZP3, ZP4) bytes of
	the f(x) table (the table is page-aligned).

* Multiply A*Y, result in X,A = lo,hi

	STA ZP1		;Low byte contains A			3
	STA ZP2							3
	EOR #$FF						2
	CLC							2
	ADC #01							2
	STA ZP3		;Low byte contains -A			3
	STA ZP4							3
	LDA (ZP1),Y	;f(y+a), low byte			6(7)
	SEC							2
	SBC (ZP3),Y	;- f(y-a)				6(7)
	TAX							2
	LDA (ZP2),Y	;f(y+a), high byte			6(7)
	SBC (ZP4),Y	;- f(y-a)				6(7)
							Total: 46 cycles

The shift and add was some 180 cycles.  So just by using a superior
algorithm we have suddenly gained a speedup of a factor of 4 or so.

Faster with a few more tricks, but this is good enough.  How would
this be done on a Z80?

Next, a number of people have claimed that 3D programs will always be 
much faster on the Spectrum than on the 64.  In my experience, the
graphics totally overwhelm the mathematical calculations in terms
of cycle burn.  So, I'd be interested in seeing a simple line routine,
including plotting to the screen.  I won't make you write a filled polygon :),
even though I think it will be much slower on a Z80 (at least, my routine
will be a heck of a lot slower, due to the lack of indexing).

The line routine doesn't need to be very smart -- assume that endpoints
lie on the screen, etc.

I can post my line routine if you really want, but there are about
five different versions of it on my web page (or in the C= technical
journals).

Well, that's all for now, I am totally tired, it's like 5 hours past
my bedtime now :).

	evetS-

>Alvin


Article 69078 of comp.sys.cbm:
Path: news.acns.nwu.edu!merle!judd
From: judd@merle.acns.nwu.edu (Stephen Judd)
Newsgroups: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Subject: Re: Shootout at the 0K Corral (was various other things)
Date: 10 Jun 1997 15:20:19 GMT
Organization: Northwestern University, Evanston, IL
Lines: 97
Message-ID: <5njrbj$k7k@news.acns.nwu.edu>
References: <337C5E94.388@actcom.co.il> <5n4rci$fhc@news.acns.nwu.edu> <5nag9i$7ru@ds2.acs.ucalgary.ca> <5ngfdo$79r@news.acns.nwu.edu>
Reply-To: sjudd@nwu.edu (Stephen Judd)
NNTP-Posting-Host: merle.acns.nwu.edu
Xref: news.acns.nwu.edu comp.sys.sinclair:39601 comp.sys.cbm:69078 comp.emulators.cbm:21370

In article <5ngfdo$79r@news.acns.nwu.edu>, Stephen Judd  wrote:
>This is the routine I use:
>
>(I leave out the initialization of registers and memory source/dest)
>
>:LOOP	LDA $8000,Y	4
>	STA $8000,Y	4

I hope it's obvious that the $8000's above are set to SOURCE and DEST by
the setup routine, etc. ...

>	INY		2
>	BNE :LOOP	3/2
>	INC :LOOP+2	  6
>	INC :LOOP+5	  6

...and that these two INCs increment the high bytes of SOURCE and DEST.

>	DEX		  2
>	BNE :LOOP	  3/2
>
>So, 13 (or 15) * nn cycles.  The extra 6+6+2+3 stuff adds a wholly

--

Well, I went through your code and see you put some terminating
conditions and such in it (double zero marks end of string, etc.)
so I thought I'd add them in to my version.

Also, how does CPIR work?  I.e. what does it do?  I assume it will not
execute if the character is already zero?

>>Your subroutine would get a pointer to Tokens, the data structure
>>containing the stored strings, and a pointer to Search, the string
>>being sought.
>
>OK, here's my version:
>
>	LDY #00							2
>:LOOP1	LDX #00							2
>:LOOP2
>	LDA TOKENS,Y						4(5)
>	CMP SEARCH,X						4(5)
>	BNE :NEXT						3/2
>	INY							  2
>	INX							  2
>	CMP #00							  2
>	BNE :LOOP2	;Is token string zero?			  3/2
>:FOUND			;All done

	LDA TOKENS,Y
	SEC		;C->Successful

>	RTS or whatever
>
>:LOOP3	INY		;Find end of string			2
>:NEXT	LDA TOKENS,Y						4(5)
>	BNE :LOOP3						3/2
>:CONT	INY		;Advance past null byte   		  2
>	INY		;Advance past token			  2

	LDA TOKENS,Y	;Check if at end

>	BNE :LOOP1	;Always taken				  3

	CLC		;Token not found
	RTS
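For reference, a sketch of the same search in Python. It assumes the layout the routine appears to use: each entry is a name, a zero byte, then a one-byte token, and a zero byte where the next name would start marks the end of the table. The function name find_token and the sample table are made up for illustration.

```python
def find_token(tokens, search):
    """tokens: bytes laid out as name,0,token,name,0,token,...,0.
    search: the name to look for, without its terminator.
    Returns the token byte, or None if the name is absent."""
    target = bytes(search) + b"\x00"
    y = 0
    while tokens[y] != 0:                 # a zero here marks end of table
        end = tokens.index(0, y)          # terminator of this name
        if tokens[y:end + 1] == target:   # compare including the zero
            return tokens[end + 1]        # token follows the terminator
        y = end + 2                       # skip terminator and token
    return None

table = b"PRINT\x00\x99GOTO\x00\x89\x00"
print(find_token(table, b"GOTO"))         # 137
```

Comparing the terminator along with the name is what stops "GO" from matching "GOTO", just as the CMP #00 test does in the assembly.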


If the TOKENS string can be more than 256 bytes, then I would use
a ZP-indirect strategy.  That is, let TOKENS be a zero-page pointer
to the TOKENS string.  Change

	LDA TOKENS,Y -> LDA (TOKENS),Y

and change :NEXT to look like

	...
:CONT	INY
	INY
	BPL :CONT2	;Wait until Y > 127
	TYA		;Add Y to TOKENS pointer
	LDY #00		;and reset Y
	CLC
	ADC TOKENS
	STA TOKENS
	BCC :CONT2
	INC TOKENS+1
:CONT2	LDA (TOKENS),Y
	...

or something similar (could use CPY #240 or whatever to wait until Y>240, 
etc.).
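The pointer-folding idea can be sketched in Python as below. This is only a model: base stands in for the zero-page pointer, y for the Y register, and walk_entries is a made-up name. It assumes no single entry is longer than about 126 bytes, so that Y cannot wrap before it is folded back into the pointer.

```python
def walk_entries(tokens):
    """Yield the start offset of each name in a table that may exceed
    256 bytes, using a 16-bit base pointer plus an 8-bit index."""
    base, y = 0, 0                    # base ~ (TOKENS), y ~ Y register
    while tokens[base + y] != 0:      # zero at a name start ends the table
        yield base + y
        while tokens[base + y] != 0:  # find end of name
            y = (y + 1) & 0xFF
        y = (y + 2) & 0xFF            # skip terminator and token
        if y > 127:                   # BPL not taken: fold Y into pointer
            base, y = base + y, 0

# 100 six-byte entries, 601 bytes in total
table = b"".join(("N%03d" % i).encode() + b"\x00" + bytes([i + 1])
                 for i in range(100)) + b"\x00"
print(len(list(walk_entries(table))))   # 100
```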

Wow, I'm amazed that my 3 a.m. code actually seems to work :).

	evetS-


Article 68966 of comp.sys.cbm:
Path: news.acns.nwu.edu!newsfeed.acns.nwu.edu!news.ece.nwu.edu!news.cse.psu.edu!news3.cac.psu.edu!howland.erols.net!newsfeed.nacamar.de!peernews.paralex.co.uk!paralex!peernews.ftech.net!telehouse1.frontier-networks.co.uk!basilisk.pdc.nhs.gov.uk!yama.mcc.ac.uk!simonc
From: simonc@jumper.mcc.ac.uk (Cookie)
Newsgroups: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Subject: Re: Spectrum Emulator for C64
Followup-To: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Date: 9 Jun 1997 10:28:11 GMT
Organization: Sirius Cybernetics Corporation
Lines: 56
Distribution: world
Message-ID: <5nglrr$f4q@yama.mcc.ac.uk>
References: <337C5E94.388@actcom.co.il> <3389cd28.1468228@commodore64.com> 
NNTP-Posting-Host: jumper.mcc.ac.uk
X-Newsreader: TIN [version 1.2 PL2]
Xref: news.acns.nwu.edu comp.sys.sinclair:39506 comp.sys.cbm:68966 comp.emulators.cbm:21288

The Starglider (starglider@thespian.demon.co.uk) wrote:
: >If any of you guys are actually interested in learning about multiplication
: >routines, instead of gesticulating wildly in the air as you speak,
: >take a look at my web page (and read my articles in C=Hacking).
: >
: You are led to believe that computers have special multiplication
: routines, but at the end of the day, the computer takes your
: mathematics, and uses its own method to work out the results. You can't
: stop that.

Since when? Steve *does* know what he's talking about here:

When I write a routine, the computer does it the way that *I* tell it to.
Which means that if I put together a table of values to lookup from, it
doesn't decide to go off and do it in its own way, it uses the table of
values.

Multiplication routine in Z80:

Enter with: D & E = numbers to multiply together...

            LD H,tableofsquares/256
            LD A,D
            XOR E  ; find out difference between D and E
            LD L,A
            LD C,(HL)
            LD A,D
            ADD A,E
            LD L,A
            LD A,(HL)
            SUB C
            RET ;... result is in A

Besides, a computer can't only add, it can also OR, XOR and AND, as well
as setting, resetting bits, rotating bits and shifting them too. If you
want to multiply by 2, then the computer can do that immediately. It's
only for arbitrary values rather than powers of two that you need to do
anything special.

BTW: The routine above should work for any pair of D and E that sum to
<256, and which multiply together to give a value <1020 or so...

BTW: (2) - if you want to use it for negative numbers, you'll have to
fiddle around with it a bit -- pre and post-process the numbers to make
them both positive, and enforce the sign afterwards.

BTW: (3) - the table of squares is 256 bytes long, page aligned (256 bytes
per page) and is formed like this (pseudoC code follows)

  byte table[256];
  for (int i=0;i<256;i++) {
      table[i] = (i*i)/4;
  }


Si Cooke


Article 68993 of comp.sys.cbm:
Path: news.acns.nwu.edu!merle!judd
From: judd@merle.acns.nwu.edu (Stephen Judd)
Newsgroups: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Subject: Re: Spectrum Emulator for C64
Date: 9 Jun 1997 16:03:27 GMT
Organization: Northwestern University, Evanston, IL
Lines: 82
Message-ID: <5nh9gf$hd8@news.acns.nwu.edu>
References: <337C5E94.388@actcom.co.il> <3389cd28.1468228@commodore64.com>  <5nglrr$f4q@yama.mcc.ac.uk>
Reply-To: sjudd@nwu.edu (Stephen Judd)
NNTP-Posting-Host: merle.acns.nwu.edu
Xref: news.acns.nwu.edu comp.sys.sinclair:39537 comp.sys.cbm:68993 comp.emulators.cbm:21317

Hello Simon,

In article <5nglrr$f4q@yama.mcc.ac.uk>, Cookie  wrote:
>
>Since when? Steve *does* know what he's talking about here:

What do you mean, *I* know what I'm talking about?  I won't stand for
these kinds of slanderous remarks upon my character from the Spectrum
crowd.

:)

>Multiplication routine in Z80:
>
>Enter with: D & E = numbers to multiply together...
>
>            LD H,tableofsquares/256
>            LD A,D
>            XOR E  ; find out difference between D and E

Don't you need to INC here to get 2's complement?  So far it's just 255 + d-e.

>            LD L,A
>            LD C,(HL)
>            LD A,D
>            ADD A,E
>            LD L,A
>            LD A,(HL)
>            SUB C
>            RET ;... result is in A
>
>Besides, a computer can't only add, it can also OR, XOR and AND, as well
>as setting, resetting bits, rotating bits and shifting them too. If you
>want to multiply by 2, then the computer can do that immediately. It's
>only for arbitrary values rather than powers of two that you need to do
>anything special.
>
>BTW: The routine above should work for any pair of D and E that sum to
><256, and which multiply together to give a value <1020 or so...

Just out of curiosity, why this restriction?  The sum part I can see,
since you're just using the low byte of H (what would it take to get
around this?).  The <1020 restriction I don't see though, although I
can see that it is 255*4.

It looks like you're only using the low byte of f(x), i.e. a single
page table.  If so, then the maximum product value will be eight bits,
i.e. only numbers which multiply to <256 will work.

Does the LD A,(HL) do a 16-bit or 8-bit load?  Also, is the SUB 8 or 16 bits?

Just for completeness, the algorithm I posted last night (this morning? :)
doesn't have these restrictions.  (So Alvin doesn't EVEN get off the hook
by reposting the above code :).

>BTW: (2) - if you want to use it for negative numbers, you'll have to
>fiddle around with it a bit -- pre and post-process the numbers to make
>them both positive, and enforce the sign afterwards.

Nah, it's easier than that if you aren't limited by the a+b<256 thing.
All you have to do is put a copy of the table on top of itself.

Let's say you multiply 9*-1, i.e. a=9 and b=255.  Then f(a+b)=f(256+8)=f(8)
since the table sits on top of itself.  So it works equally well for
signed and unsigned numbers, only f(x) needs to be changed (so that
e.g. f(254) is actually f(2), etc.).
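A small Python sketch of the wrapped table, assuming 8-bit two's complement operands whose sum and difference both fit in -128..127. The names s8, F_WRAP and signed_mul are hypothetical.

```python
def s8(x):
    """Interpret the low 8 bits of x as a signed value (-128..127)."""
    x &= 0xFF
    return x - 256 if x > 127 else x

# Table indexed by an 8-bit value; entry i holds f of i read as signed,
# so that e.g. index 254 holds f(-2) = f(2).
F_WRAP = [(s8(i) ** 2) // 4 for i in range(256)]

def signed_mul(a, b):
    """a and b are bytes (0..255) holding signed values."""
    return F_WRAP[(a + b) & 0xFF] - F_WRAP[(a - b) & 0xFF]

print(signed_mul(9, 255))   # 9 * -1 = -9
```

The index arithmetic simply wraps at 8 bits, which is what an indexed lookup into a one-page table does anyway, so no sign handling is needed before or after.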

>BTW: (3) - the table of squares is 256 bytes long, page aligned (256 bytes
>per page) and is formed like this (pseudoC code follows)
>
>  byte table[256];
>  for (int i=0;i<256;i++) {
>      table[i] = (i*i)/4;
>  }

Don't forget to round (not truncate) the numbers :).

	evetS-

>Si Cooke




Article 69021 of comp.sys.cbm:
Path: news.acns.nwu.edu!newsfeed.acns.nwu.edu!math.ohio-state.edu!howland.erols.net!worldnet.att.net!europa.clark.net!disgorge.news.demon.net!demon!dispatch.news.demon.net!demon!peernews.ftech.net!telehouse1.frontier-networks.co.uk!basilisk.pdc.nhs.gov.uk!yama.mcc.ac.uk!simonc
From: simonc@jumper.mcc.ac.uk (Cookie)
Newsgroups: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Subject: Re: Spectrum Emulator for C64
Followup-To: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Date: 9 Jun 1997 22:49:18 GMT
Organization: Sirius Cybernetics Corporation
Message-ID: <5ni19e$ote@yama.mcc.ac.uk>
References: <337C5E94.388@actcom.co.il> <3389cd28.1468228@commodore64.com>  <5nglrr$f4q@yama.mcc.ac.uk> <5nh9gf$hd8@news.acns.nwu.edu>
NNTP-Posting-Host: jumper.mcc.ac.uk
X-Newsreader: TIN [version 1.2 PL2]
Lines: 90
Xref: news.acns.nwu.edu comp.sys.sinclair:39561 comp.sys.cbm:69021 comp.emulators.cbm:21338

Stephen Judd (judd@merle.acns.nwu.edu) wrote:
: >Since when? Steve *does* know what he's talking about here:

: What do you mean, *I* know what I'm talking about?  I won't stand for
: these kinds of slanderous remarks upon my character from the Spectrum
: crowd.

: :)

*grins* Don'tcha just lurve flamewars? Beats the old SU/YS/Crash rivalry
hands down ;)

: >Multiplication routine in Z80:
: >
: >Enter with: D & E = numbers to multiply together...
: >
: >            LD H,tableofsquares/256
: >            LD A,D
: >            XOR E  ; find out difference between D and E

: Don't you need to INC here to get 2's complement?  So far it's just 255 + d-e.

Nope - I just bollocked up completely. Try (instead of XOR E):

       SUB E
       JR NC,novf
       NEG
novf: ...
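To see what the corrected single-page routine yields, here is a rough Python model of it. It assumes the table holds only the low byte of i*i/4 and that D + E < 256; TABLE and mul8 are illustrative names. Under those assumptions the result is the low byte of the product, so it is exact whenever the product is below 256.

```python
# One page of table, low bytes only.
TABLE = [((i * i) // 4) & 0xFF for i in range(256)]

def mul8(d, e):
    """Model of the single-page routine; requires d + e < 256."""
    diff = (d - e) & 0xFF          # SUB E
    if d < e:                      # carry (borrow) set, JR NC not taken
        diff = (-diff) & 0xFF      # NEG gives e - d
    return (TABLE[d + e] - TABLE[diff]) & 0xFF   # SUB C, 8 bits only

print(mul8(12, 20))   # 240
```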

: >BTW: The routine above should work for any pair of D and E that sum to
: ><256, and which multiply together to give a value <1020 or so...

: Just out of curiosity, why this restriction?  The sum part I can see,
: since you're just using the low byte of H (what would it take to get
: around this?).  The <1020 restriction I don't see though, although I
: can see that it is 255*4.

: It looks like you're only using the low byte of f(x), i.e. a single
: page table.  If so, then the maximum product value will be eight bits,
: i.e. only numbers which multiply to <256 will work.

Yep, that's true... I had a brainstorm at work today - it's all that java
programming (also the multiplication routine had me thinking about
factoring primes the other day, and I can't get it out of my head now ;) )

: Does the LD A,(HL) do a 16-bit or 8-bit load?  Also, is the SUB 8 or 16 bits?

Both the LD A,(HL) and the SUB are 8 bits -- thus the limitation. At the
expense of a little speed, it could be expanded though.

: Just for completeness, the algorithm I posted last night (this morning? :)
: doesn't have these restrictions.  (So Alvin doesn't EVEN get off the hook
: by reposting the above code :).

I'll have to do a search for that one then ;)

: Nah, it's easier than that if you aren't limited by the a+b<256 thing.
: All you have to do is put a copy of the table on top of itself.

: Let's say you multiply 9*-1, i.e. a=9 and b=255.  Then f(a+b)=f(256+8)=f(8)
: since the table sits on top of itself.  So it works equally well for
: signed and unsigned numbers, only f(x) needs to be changed (so that
: e.g. f(254) is actually f(2), etc.).

Excellent! :) Now I've only got to fiddle the routine a bit to handle 8
and 16-bit multiplies correctly, and we're on to a winner ;)

: >BTW: (3) - the table of squares is 256 bytes long, page aligned (256 bytes
: >per page) and is formed like this (pseudoC code follows)
: >
: >  byte table[256];
: >  for (int i=0;i<256;i++) {
: >      table[i] = (i*i)/4;
: >  }

: Don't forget to round (not truncate) the numbers :).

*grins* re: your mail: surely it should be (excuse the superfluous
brackets): ((i*i)+2)/4 instead of (i*i)/4+0.5? I'd have thought that when
working out integer ... hang on, I'm thinking in terms of C here instead
of basic ;) (still, the term on the left of the 2nd line of this paragraph
is safer to use).

Simon

--
+- Email:Simon.Cooke@umist.ac.uk ---- Fidonet: 2:250/124.2 (Simon Cooke) -+
|  Snail: 26 Woodhouse Lane, Sale, Cheshire, M33 4JX   Tel: 0161 976 3426 |
|  Message Pager: 01426 208084 (55p per min peak, 35ppm offpeak)          |
+- WWW: http://jumper.mcc.ac.uk/~simonc ----------------------------------+


Article 69094 of comp.sys.cbm:
Path: news.acns.nwu.edu!merle!judd
From: judd@merle.acns.nwu.edu (Stephen Judd)
Newsgroups: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Subject: Re: Spectrum Emulator for C64
Date: 10 Jun 1997 17:17:29 GMT
Organization: Northwestern University, Evanston, IL
Lines: 62
Message-ID: <5nk279$muu@news.acns.nwu.edu>
References: <337C5E94.388@actcom.co.il> <5nglrr$f4q@yama.mcc.ac.uk> <5nh9gf$hd8@news.acns.nwu.edu> <5ni19e$ote@yama.mcc.ac.uk>
Reply-To: sjudd@nwu.edu (Stephen Judd)
NNTP-Posting-Host: merle.acns.nwu.edu
Xref: news.acns.nwu.edu comp.sys.sinclair:39619 comp.sys.cbm:69094 comp.emulators.cbm:21389

In article <5ni19e$ote@yama.mcc.ac.uk>, Cookie  wrote:
>Stephen Judd (judd@merle.acns.nwu.edu) wrote:
>
>: >            XOR E  ; find out difference between D and E
>
>: Don't you need to INC here to get 2's complement?  So far it's just 255 + d-e.
>
>Nope - I just bollocked up completely. Try (instead of XOR E):

Well, that's OK, I have no idea what I was thinking when I made that statement,
since it isn't even close to 255+d-e :).  (Actually I do know what I was
"thinking", I just don't know why I was thinking it :).

>
>       SUB E
>       JR NC,novf
>       NEG

NEG takes 2's complement of A?  Now there's an instruction I wouldn't
mind having!

You can get around the NEG business (and CLC/ADC #01 on 6510) by using
another table, but I rarely begrudge the extra four cycles.

Incidentally, for most programs (at least, most of my programs :) only
an eight bit result is retained, so with a little massaging of
the underlying algorithm the 8-bit result multiply (like the one
you posted) is often sufficient.  (But Alvin still has to do the
16-bit version :).

>: >  byte table[256];
>: >  for (int i=0;i<256;i++) {
>: >      table[i] = (i*i)/4;
>: >  }
>
>: Don't forget to round (not truncate) the numbers :).
>
>*grins* re: your mail: surely it should be (excuse the superfluous
>brackets): ((i*i)+2)/4 instead of (i*i)/4+0.5? I'd have thought that when
>working out integer ... hang on, I'm thinking in terms of C here instead
>of basic ;) (still, the term on the left of the 2nd line of this paragraph
>is safer to use).

Well, I can never keep straight which commands truncate and which round
and which square and whatever.  But yes, the second method ought to
always work :).  If it isn't rounded, though, my recollection is that
most but not all calculations will come out correctly.
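A quick check in Python suggests that, for the plain f(x) = x^2/4 table with integer x, the two ways of building the table give identical entries: x^2 leaves a remainder of 0 or 1 when divided by 4, so the fraction is never large enough to round up. Rounding would only start to matter for tables with an extra scale factor folded in. The names below are for illustration only.

```python
# Build the table both ways and compare.
trunc = [(i * i) // 4 for i in range(512)]
rounded = [(i * i + 2) // 4 for i in range(512)]

print(trunc == rounded)   # True
```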

Another thing to keep in mind is that it is easy to piggyback calculations
into the table.  For instance, often you really want to calculate c*x*y,
where c=some constant e.g. 32, 3.14159, 1/64, etc.  In terms of the
algorithm this is

	c*(f(x+y) - f(x-y))
	= g(x+y) - g(x-y)

where g(x) = c*x^2/4.  So with a modified table you can get the extra
computation for free.
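A sketch of the piggybacked table in Python, assuming unsigned 8-bit x and y and entries rounded to the nearest integer. make_g and scaled_mul are made-up names. Because each of the two entries can be off by up to half a unit, the result should be within about 1 of the true c*x*y.

```python
def make_g(c, size=511):
    """Table of g(x) = c*x^2/4, rounded to the nearest integer."""
    return [int(c * x * x / 4 + 0.5) for x in range(size)]

def scaled_mul(g, x, y):
    """Approximate c*x*y for unsigned x, y using the scaled table g."""
    return g[x + y] - g[abs(x - y)]

G = make_g(1 / 64)
print(scaled_mul(G, 100, 200), 100 * 200 / 64)
```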

	evetS-

>+- Email:Simon.Cooke@umist.ac.uk ---- Fidonet: 2:250/124.2 (Simon Cooke) -+


Article 69183 of comp.sys.cbm:
Path: news.acns.nwu.edu!newsfeed.acns.nwu.edu!news.ece.nwu.edu!news.cse.psu.edu!uwm.edu!newsfeeds.sol.net!europa.clark.net!disgorge.news.demon.net!demon!dispatch.news.demon.net!demon!peernews.ftech.net!telehouse1.frontier-networks.co.uk!basilisk.pdc.nhs.gov.uk!yama.mcc.ac.uk!simonc
From: simonc@jumper.mcc.ac.uk (Cookie)
Newsgroups: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Subject: Re: Spectrum Emulator for C64
Followup-To: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Date: 11 Jun 1997 09:52:07 GMT
Organization: Sirius Cybernetics Corporation
Message-ID: <5nlsg7$g0c@yama.mcc.ac.uk>
References: <3398550e.5045369@news.demon.co.uk> <5nif18$7p8$2@triglav.iwaynet.net> <339d8a2a.3224107@news.demon.co.uk> 
NNTP-Posting-Host: jumper.mcc.ac.uk
X-Newsreader: TIN [version 1.2 PL2]
Lines: 32
Xref: news.acns.nwu.edu comp.sys.sinclair:39674 comp.sys.cbm:69183 comp.emulators.cbm:21448

Nate_DAC (natedac@dfw.dfw.net) wrote:
: 3) I suspect even the Z80's stack sits at a fixed area of memory, and
:    simply builds up from there as it is used.  How big is the Z80's stack
:    (in bytes)?  What adress does it normally start at, and where does it
:    normally end (counting the entire useable stack in a contiguous block
:    of memory).

The stack normally starts at address 0, growing down (i.e. the first data is
pushed on to the stack at 65535,65534). However, using LD SP,nnnn or LD
SP,HL, and also INC SP, DEC SP, ADD HL,SP, EX (SP),HL, you can do whatever
you like to the stack. It'll wrap at the 64k boundary, but will suffer no
adverse effects through doing so.

Thus, you can have a screen copy routine somewhat like this:

        LD HL,buffadd   ; = buffer address (last byte to copy),
        LD SP,screenadd ; = place to copy it to (one past the last
                        ;   destination byte; PUSH pre-decrements SP)
loop:   LD D,(HL)       ; PUSH stores D at the higher address, so the
        DEC L           ; higher source byte goes into D
        LD E,(HL)
        DEC L
        LD B,(HL)
        DEC L
        LD C,(HL)
        DEC HL   ; ensure that we can place the buffer happily on any
                 ; 4byte boundary by doing this...
        PUSH DE
        PUSH BC
        ..loop round... (although to be honest, using LDI would be faster
        but you can still use the stack to clear the screen)
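A small Python model of the stack copy, to make the byte order explicit. It assumes the high register of each pair is loaded from the higher source address, since PUSH writes the high register at SP-1 and the low register at SP-2. The function name, the flat bytearray standing in for memory, and the addresses are all made up for illustration.

```python
def stack_copy(mem, src_last, dst_end, count):
    """Copy `count` bytes (a multiple of 4) ending at src_last so that
    they end just below dst_end, using a model of Z80 PUSH."""
    hl, sp = src_last, dst_end

    def push(hi, lo):
        nonlocal sp
        mem[sp - 1] = hi      # high register goes to the higher address
        mem[sp - 2] = lo
        sp -= 2

    for _ in range(count // 4):
        d, e, b, c = mem[hl], mem[hl - 1], mem[hl - 2], mem[hl - 3]
        hl -= 4
        push(d, e)            # PUSH DE
        push(b, c)            # PUSH BC

mem = bytearray(64)
mem[0:8] = bytes(range(1, 9))
stack_copy(mem, 7, 40, 8)
print(list(mem[32:40]))       # [1, 2, 3, 4, 5, 6, 7, 8]
```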

Simon Cooke


Article 69055 of comp.sys.cbm:
Path: news.acns.nwu.edu!newsfeed.acns.nwu.edu!news.ece.nwu.edu!news.cse.psu.edu!uwm.edu!news-peer.gsl.net!news-peer.sprintlink.net!news-pull.sprintlink.net!news-in-east.sprintlink.net!news.sprintlink.net!Sprint!194.159.255.23!disgorge.news.demon.net!demon!dispatch.news.demon.net!demon!peernews.ftech.net!telehouse1.frontier-networks.co.uk!basilisk.pdc.nhs.gov.uk!yama.mcc.ac.uk!simonc
From: simonc@jumper.mcc.ac.uk (Cookie)
Newsgroups: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Subject: Re: Spectrum Emulator for C64
Followup-To: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Date: 10 Jun 1997 09:54:27 GMT
Organization: Sirius Cybernetics Corporation
Message-ID: <5nj88j$6f9@yama.mcc.ac.uk>
References: <337C5E94.388@actcom.co.il>  <5maa2u$prl@news.acns.nwu.edu>  <5mli4f$m48@news.acns.nwu.edu>
NNTP-Posting-Host: jumper.mcc.ac.uk
X-Newsreader: TIN [version 1.2 PL2]
Lines: 65
Xref: news.acns.nwu.edu comp.sys.sinclair:39585 comp.sys.cbm:69055 comp.emulators.cbm:21357

Stephen Judd (judd@merle.acns.nwu.edu) wrote:

: If you are interested in how modern chips do divisions, there is an
: article in a recent Siam Journal of Applied Mathematics on the
: Pentium FDIV bug -- as I recall, it amounted to a slightly incorrect
: number in a table.

Any address to look for, or should I do a web search?

: Now, back to chips.  Since you (or whoever it was) never bothered to
: show me the simple comparison code between a Z80 and a 6510, I went
: and did a little research.  As near as I can tell, the typical cycle
: count for a load or a store is around 7 cycles.  On a 6510 it is 4.
: So, far, the Z80 seems around a factor of two slower (so that a 4MHz
: Z80 would indeed be faster).

That'd appear to be 
: 	INX
: which takes 2 cycles.  On the other hand,
: 	INC (HL)

But INC (HL) actually increments the contents of the memory location
pointed to by HL, rather than moving the pointer by 1.  (Moving the
pointer is INC HL, at 6 cycles.)


: takes a rather large 11 cycles -- now we're up to a factor of five
: speed difference.  The kicker, though, are the PC-relative
: instructions

: 	JR cond,e

: which take a whopping 12/7 cycles, and only test for Carry and Zero flags.
: On a 6510 the negative flag (high bit) may also be used as a branch
: condition; the instruction takes 3 cycles if the branch is taken,
: and 2 otherwise.

Alternatively, use the JP instructions which have a constant execution
time, and allow access to all of the flags. (Admittedly, that constant
execution time is 10 cycles)
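Since the two chips run at different clock rates, raw cycle counts are easier to compare once converted to time. A rough sketch in Python, using the branch timings quoted above and assuming a 3.5MHz Spectrum clock and a nominal 1MHz C64 clock (real machines differ slightly); the function and table names are illustrative.

```python
def microseconds(cycles, clock_mhz):
    """Convert a cycle count to microseconds at the given clock rate."""
    return cycles / clock_mhz

Z80_MHZ = 3.5    # assumed Spectrum clock
C64_MHZ = 1.0    # assumed nominal C64 clock

rows = [
    ("Z80  JR cc, taken",     12, Z80_MHZ),
    ("Z80  JR cc, not taken",  7, Z80_MHZ),
    ("6510 Bxx,   taken",      3, C64_MHZ),
    ("6510 Bxx,   not taken",  2, C64_MHZ),
]
for name, cycles, mhz in rows:
    print(f"{name}: {microseconds(cycles, mhz):.2f} us")
```

On these assumptions the not-taken cases come out equal at 2 microseconds, and the taken branch is a little slower on the Z80.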

: Finally, the ROR type instructions are a factor of four faster on
: the 6510 (when dealing with the accumulator).

: So, what does this tell me?  Not too much.  If I was writing for a Z80,
: I would certainly write to its strengths and try to avoid its weaknesses,
: like I do on the 6510.  But I would postulate that a Z80 is more than
: a factor of two slower than a 6510, and probably much closer to a factor
: of three on average.  For doing raw calculations I would therefore
: expect a stock Spectrum to beat a stock 6510 easily, but of course
: any type of data (i.e. graphical) manipulation will slow down the
: computer by quite a lot.

I reckon we might get more mileage out of this discussion if we start
writing equivalent routines for Z80 and 6510, and throw them out to the
audience... (with timings included, of course).

Anyone know where I can get manufacturers' datasheets on the 6510? Talking
hardware and programming info here.

: What happens to the tape when you have thirty files, delete the first,
: then add two more files?

Nah... just turn off your microdrive and plug in your +D instead ;)

Simon


Article 69061 of comp.sys.cbm:
Path: news.acns.nwu.edu!newsfeed.acns.nwu.edu!news.ece.nwu.edu!news.cse.psu.edu!news3.cac.psu.edu!howland.erols.net!ais.net!newsfeed.direct.ca!news.wildstar.net!news.ecn.uoknor.edu!munnari.OZ.AU!harbinger.cc.monash.edu.au!news.cs.su.oz.au!metro!metro!seagoon.newcastle.edu.au!BROLGA.NEWCASTLE.EDU.AU!ecbm
From: "Bruce R. McFarling" 
Newsgroups: comp.sys.cbm
Subject: Re: Why is a C64 harder to emulate, huh?
Date: Tue, 10 Jun 1997 18:36:05 +1000
Organization: The University of Newcastle
Lines: 57
Message-ID: 
References: <1997Jun3.115033.40948@ludens> <5n1g58$316@sf18.dseg.ti.com>
NNTP-Posting-Host: cc.newcastle.edu.au
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
In-Reply-To: <5n1g58$316@sf18.dseg.ti.com>

On 3 Jun 1997, Mike wrote:

> Trying to stay neutral here, but....
> 
> In article <1997Jun3.115033.40948@ludens>, sta@ludens.elte.hu says...
> >
> >You wanna tell me that the Spectrum is capable of displaying a LOT more
> >sprites out of software? Don't make me laugh! I don't believe that a
> >3.5MHz Z80 is even 1.5 times faster than a 1MHz 65xx CPU! Although the
> >C64 has a slightly inferior CPU, its sound and graphics chips are both
> >highly superior to the Spectrum.
> 
> I have a hard time believing that one.  I've programmed both.  The Z80 quite 
> frankly sucks from an instruction set standpoint and its inefficiency.  
> Given how slow CP/M is on a 2 MHz Z80, I'd guess a 3.5 MHz Z80 is probably 
> roughly equal in power to a 1MHz 6502/6510.

	In the old days, that's what I heard: divide by 4 to compare a Z80
rating to a 6502 rating: the 6502 is directly clocked to memory access,
while the 8080 style chips run through a number of cycles per memory
access.
	But of course, your mileage will vary. The Z80 will multiply faster
than a 'roughly equivalent speed' 6502, and move unusual sized pieces
of memory around more efficiently, while for variable length BCD math, the
6502 has an edge.

	At the top of my 'perfect the 6502 instruction set' wish list are
address modes like:

	CLC
	ADC TOS:STACK,X

that handle 16-bit data by toggling the low bit and keeping results in
the zero page:

	- fetch op
	- fetch zp0	(to be indexed)
	- fetch zp1, index zp0
	- load from zp1
	- adc from indexed zp0
	- store to zp1
	- load from zp1' (XOR zp1 #1)
	- adc from indexed zp0' (XOR zp0+X #1)
	- store to zp1'

Nine cycles and three bytes to handle the kind of processing that always
seems to take twice as many bytes and twice as many cycles as it should
when you're coding the 6502.  But do they ever listen to me? *Nooooo*.  Do
that with direct, indexed direct, and zero-page indirect addressing and a
1MHz 6502 would run circles around a 4MHz Z80 regardless of the task,
since it's the 16-bit processing tasks where the Z80 gains an edge.

Virtually,

Bruce R. McFarling, Newcastle, NSW
ecbm@cc.newcastle.edu.au



Article 69174 of comp.sys.cbm:
Path: news.acns.nwu.edu!newsfeed.acns.nwu.edu!news.ece.nwu.edu!news.cse.psu.edu!uwm.edu!news-peer.gsl.net!howland.erols.net!newshub2.home.com!newshub1.home.com!news.home.com!nick.arc.nasa.gov!news.Hawaii.Edu!munnari.OZ.AU!metro!metro!seagoon.newcastle.edu.au!BROLGA.NEWCASTLE.EDU.AU!ecbm
From: "Bruce R. McFarling" 
Newsgroups: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Subject: Re: Spectrum Emulator for C64
Date: Wed, 11 Jun 1997 16:14:54 +1000
Organization: The University of Newcastle
Lines: 139
Message-ID: 
References: <337C5E94.388@actcom.co.il> <5li6cc$mbv$8@gerry.cc.keele.ac.uk><33823F4D.5737@isis.sund.ac.uk> <33845f94.1768387@commodore64.com><3389cd28.1468228@commodore64.com>   <5mct38$ql1$2@gerry.cc.keele.ac.uk> <01bc71c8$048eda40$090000c0@pc-david> <01bc724c$cb265820$04b8de8b@w9622136> <01bc74ba$c7e2cd40$090000c0@pc-david> <5nhubi$13l0@ds2.acs.ucalgary.ca>
NNTP-Posting-Host: cc.newcastle.edu.au
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
In-Reply-To: <5nhubi$13l0@ds2.acs.ucalgary.ca>
Xref: news.acns.nwu.edu comp.sys.sinclair:39667 comp.sys.cbm:69174 comp.emulators.cbm:21436

On Mon, 9 Jun 1997, Alvin R. Albrecht wrote:

> This is probably our fault for not explaining this properly.
> 
> The cycle times to complete an instruction has very little to do with
> how fast a processor is - it has a lot to do with how the processor was
> put together.  The 6502 and z80 have two *very* different approaches in
> their designs, so you can't simply count cycles.
 
> A 6502 is a microcoded machine.  This means you've got a little state 
> machine in there running the show.  The edges on the clock are used to
> separate sequential steps in executing an instruction.

	The WDC spec sheets lay out how this is done, and the 6502 is not
microcoded.  Each clock cycle is a single action:

> For example:  sta $2000
> 
> To do this, you need to:
> 1) fetch opcode
> 2) decode opcode, increment pc
> 3) fetch msb of $2000, stick it in the msb of of the memory
>    address register (MAR), increment pc
> 4) fetch lsb of $2000, stick it in lsb of MAR, increment pc
> 5) latch A into memory data register (MDR)
> 6) put MAR on address bus, MDR on data bus, perform memory write

	The 6502 always fetches two bytes per instruction, so the opcode
can be parsed while the fetch of the second byte takes place. The
remaining cycles have the instruction register driven by the specific
instruction,

	LDA $2000

	fetch op code
	++PC, fetch $00 to MAR low byte, decode LDA absolute op-code
	++PC, fetch $20 to MAR high byte
	load (MAR) to A
	++PC fetch next op-code ...

	Also, an instruction that can be completed without a memory access
has completion pipelined with the fetch of the first byte of the next
op-code.  So CLC actually takes 3 cycles, but the third cycle overlaps
with the fetch of the next instruction:

	fetch CLC
	++PC, fetch following byte, decode CLC op code
	clear carry, fetch next op-code (PC is stalled at step 2)

> These steps have to be done in order and can't be overlapped.  This is
> what I mean by sequential goals (6 of them here).

	If it seems as if the 6502 is microcoded, it's because of the
occasional overlapping of final access and fetch of the next op-code, 
and because the first two bytes of an instruction are going to known
destinations, so the 6502 can be designed around direct memory bussing.

> You may have a 1MHz clock external to the CPU, but internally, it may be
> 8-10MHz?  Then again, maybe it's not - only Cliff Lawson can tell us.

	Internally, it would be 1MHz.  Remember that the modern design of
the 65C02 is done in fewer than 4,000 gates!

> The z80 is hardwired and lets stuff get done as fast as possible.  The
> clock is used to gate data at points in the chip where you have to wait
> for something to be finished before proceeding.  And again this is
> educated guesswork as I obviously didn't design the z80.

	The Z80 is not, AFAIR (and the last time I looked at a
Zilog data book was the mid-80's!) directly bussed to memory. The chip is
organised around registers talking on an internal bus, including the
data address and data memory registers that let it talk to the memory bus.


> The point is, the clock tells you zilch.  You can compare them
> by looking at their relative maximum clock rates when they use the same
> technology to manufacture (ie - they have the same on chip device
> geometries).  You just have to look back in history and you'll find this:
> 
> 6502 versions: 1MHz, 2MHz, 3MHz, 4MHz
> z80 versions: 2MHz, 4MHz, 6MHz, 8MHz
> 
> You can see that at the same level of technology, the z80 is always
> clocked twice as fast.  Thus, to compare the 6502 & z80 cycle time, you
> need to multiply 6502 cycles by 2 (on par with z80) or divide z80 cycles
> by 2 (on par with 6502).

	The 4:1 was the rule of thumb way back when.  Obviously, you can
find multiply intensive tasks that will take it closer to 2:1, just as you
can find arbitrary precision arithmetic that will take it closer to 6:1.
The problem with the 'state of technology' argument is that the 6502 is a
simpler chip than the Z80, and as a direct memory bussed chip, it was also
limited by the speed of memory. So today we have 20MHz 65C02's available,
but I'm not sure there are 40MHz Z80's available. Are there?  Meanwhile,
the biggest selling 65C02's presently are probably the 4MHz parts, since
those can use DRAMs instead of SRAMs.
 
> ... 
> Yes, but your hardware has determined for you how big the sprites are, how
> many colours they have and how many you can have.  The early part of this
> little war stated that the Spectrum can do what the C64 is doing in
> software because of its quick z80 processor.  There are certain
                          ^^^^^^^^^
> fundamental areas where the Spectrum lags behind the C64 and they have to
> do with colour & sound, although I don't view the SID as a huge leap over
> the AY chip (present in US Spectrum & 128k Spectrum but *not* the original
> UK Spectrum).

	Generically, a 3.4MHz Z80 and a 1MHz 6502 are not far apart.  The
devil is in the details: specifically, the lack of a 16-bit index
register in the 6502, and the slow handling of 16-bit arithmetic on the
6502's zero-page.
	But that ignores the primary advantage of the 6502 design. It's
simpler, therefore cheaper.  After all, that's why there are still so many
being produced today within ASIC chips: with less than 4,000 gates
required for the 6502 core, there is plenty of room left over to put other
things on the same chip mask, and the expense of multi-chip packages can
be avoided.  The C64 advantage was the same: with a cheap processor, it
could put more money into hardware support at a given price, which helped
it sell more, which gave it the user base, which convinced programmers to
make the effort to work around its weaknesses and turn out useful
programs, which built the user base, which got volumes up and prices down,
and around and around you go.  Litl Jo (my first C64) was around $1,000
for the main unit and a disk drive -- cheapest computer with a disk drive
I could find, and with 64K RAM to boot!  As the years went by, the price
just kept dropping.  Hell, if they had figured out a way to provide the
C128 80-column screen in a way that could be directly retrofitted to a
C64, they'd probably still be making them today. 

> > *SO*, the only method to compare 2 computers (not 2 cpus) should be:

	... which one do you like more?  And there's no accounting for
taste.

Virtually,

Bruce R. McFarling, Newcastle, NSW
ecbm@cc.newcastle.edu.au



Article 69176 of comp.sys.cbm:
Path: news.acns.nwu.edu!newsfeed.acns.nwu.edu!news.ece.nwu.edu!news.cse.psu.edu!uwm.edu!vixen.cso.uiuc.edu!howland.erols.net!newshub2.home.com!newshub1.home.com!news.home.com!nick.arc.nasa.gov!news.Hawaii.Edu!munnari.OZ.AU!metro!metro!seagoon.newcastle.edu.au!BROLGA.NEWCASTLE.EDU.AU!ecbm
From: "Bruce R. McFarling" 
Newsgroups: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Subject: Re: Spectrum Emulator for C64 (LONG!)
Date: Wed, 11 Jun 1997 16:16:28 +1000
Organization: The University of Newcastle
Lines: 12
Message-ID: 
References: <337C5E94.388@actcom.co.il> <5n23b1$11pa@ds2.acs.ucalgary.ca> <5n4rci$fhc@news.acns.nwu.edu> <5nag9i$7ru@ds2.acs.ucalgary.ca> <11188.imc@comlab.ox.ac.uk> <5nhvru$ote@yama.mcc.ac.uk> <5ni3ek$10am@ds2.acs.ucalgary.ca>
NNTP-Posting-Host: cc.newcastle.edu.au
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
In-Reply-To: <5ni3ek$10am@ds2.acs.ucalgary.ca>
Xref: news.acns.nwu.edu comp.sys.sinclair:39668 comp.sys.cbm:69176 comp.emulators.cbm:21439

On Mon, 9 Jun 1997, Alvin R. Albrecht wrote:

> Hey - let's not give 'em too much rope.  We can get away with EXX & EX
> AF,AF' to get at the alternate set.

	Are these re-entrant interrupts?

Virtually,

Bruce R. McFarling, Newcastle, NSW
ecbm@cc.newcastle.edu.au



Article 69223 of comp.sys.cbm:
Path: news.acns.nwu.edu!newsfeed.acns.nwu.edu!news.ece.nwu.edu!news.cse.psu.edu!uwm.edu!vixen.cso.uiuc.edu!howland.erols.net!feed1.news.erols.com!disgorge.news.demon.net!demon!dispatch.news.demon.net!demon!baron.netcom.net.uk!netcom.net.uk!server3.netnews.ja.net!nntpfeed.doc.ic.ac.uk!sunsite.doc.ic.ac.uk!lyra.csx.cam.ac.uk!news.ox.ac.uk!news
From: imc@ecs.ox.ac.uk (Ian Collier)
Newsgroups: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Subject: Re: Shootout at the 0K Corral (was various other things)
Date: 10 Jun 1997 14:34:16 GMT
Organization: Oxford University Computing Laboratory, UK
Message-ID: <11213.imc@comlab.ox.ac.uk>
References: <337C5E94.388@actcom.co.il> <5n4rci$fhc@news.acns.nwu.edu> <5nag9i$7ru@ds2.acs.ucalgary.ca> <5ngfdo$79r@news.acns.nwu.edu>
NNTP-Posting-Host: boothp2.ecs.ox.ac.uk
X-Local-Date: Tuesday, 10th June 1997 at 3:34pm BST
Lines: 219
Xref: news.acns.nwu.edu comp.sys.sinclair:39699 comp.sys.cbm:69223 comp.emulators.cbm:21471

In article <5ngfdo$79r@news.acns.nwu.edu>, sjudd@nwu.edu (Stephen Judd) wrote:
>Excellent!  I want to test my claim/guess that a Z80 is on average around
>three times slower than a 6510 :).  (Or, to put it in words possibly more
>acceptable to the Spectrum crowd, that you'd have to run a Z80 three times
>faster than a 6510 to get similar performance).

Although I think someone has already said here that you have to run a Z80 at
least two times faster than a 6502 to get the same internal clock speed.

>Z80 indexing looks to be 8+16 (8 bits memory, 16 bits index).
>On a 6510, it's 16+8 -- 16-bit address, 8-bit memory.

I think you didn't quite say what you meant there...

>                                  LDA $C000 is exactly as fast
>as LDA $C000,X and LDA $C000,Y.

Then it sounds like the former instruction is taking too long. :-)

>>      This z80 version multiplies an 8 bit number with a 16 bit
>>      number and keeps the least significant 16 bits of the product.
>>      There is no advantage in dealing exclusively with 8 bit
>>      multiplicands.
>>
>>      enter:  A=8 bit number
>>             DE=16 bit number
>>      exit:  HL=A*DE, least significant 16 bits
>>
>>            LD HL,0		10
>>            LD B,8		7
>>      loop  ADD HL,HL		11
>>            RLA			4
>>'A'         JR NC,noadd		12/7	; if 0: 12, 1: 18
>>'A'         ADD HL,DE		11	;  avg of 15
>>      noadd DJNZ loop		13/8

>Are you sure the above works?

>I think with a DEC B at noadd, it will work more like you want it
>to.  +4 cycles?

Nope - the DEC B is already in there: that's what the D in DJNZ stands
for.  And why it takes 13/8 cycles instead of 12/7 for a normal PC-relative
jump.
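
As a sanity check on the quoted multiply, here is a rough Python model of the
same shift-and-add loop.  The function name mul8x16 and the variable names
are made up for illustration; this is a sketch of the technique under the
entry/exit conditions stated in the quote, not a cycle-accurate emulation.

```python
def mul8x16(a, de):
    """Model of the quoted loop: returns the low 16 bits of A * DE."""
    hl = 0                               # LD HL,0
    for _ in range(8):                   # LD B,8 ... DJNZ loop (DJNZ does the DEC B)
        hl = (hl << 1) & 0xFFFF          # ADD HL,HL
        carry = (a >> 7) & 1             # RLA pushes the top bit of A into carry
        # The bit RLA rotates in at the bottom is never tested within
        # eight passes, so it is modelled here as 0.
        a = (a << 1) & 0xFF
        if carry:                        # JR NC,noadd skips the add on carry clear
            hl = (hl + de) & 0xFFFF      # ADD HL,DE
    return hl
```

Running it against a handful of operands and comparing with (a * de) & 0xFFFF
is a quick way to convince yourself that no extra DEC B is needed.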

>>     First, a z80 solution without making use of the complex 
>>     instruction LDIR to find out what the advantage is of
>>     having 16 bit memory pointers.

>>     loop   LD A,(HL)		7
>>            LD (DE),A		7
>>            INC HL		6
>>            INC DE		6
>>            DEC BC		6
>>            LD A,B		4
>>            OR C		4
>>            JP NZ,loop		10/10

>>     I count nn*(7+7+6+6+6+4+4+10)=50nn cycles.

>This is the routine I use:

>:LOOP	LDA $8000,Y	4
>	STA $8000,Y	4
>	INY		2
>	BNE :LOOP	3/2
>	INC :LOOP+2	  6
>	INC :LOOP+5	  6
>	DEX		  2
>	BNE :LOOP	  3/2

>So, 13 (or 15) * nn cycles.

Hey, self-modifying code. :-)  Well it is used on the Z80 but it is nowhere
near as common as this would appear to imply for the 6502.

We can shorten the Z80 version just by reassigning the BC register into two
8-bit counters like your X and Y.

  loop LD   A,(HL)   7
       LD   (DE),A   7
       INC  HL       6
       INC  DE       6
       DJNZ loop     13/8
       DEC  C        4
       JP   NZ,loop  10

Which makes about 39 cycles per byte.
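
The 39-cycle figure can be checked with a little arithmetic.  The sketch
below is only a tally of T-states for the loop above, assuming JP NZ costs
10 T-states whether or not it jumps and that B=0 on entry means 256 passes;
the function name djnz_copy_cycles is invented for the example.

```python
BODY = 7 + 7 + 6 + 6                 # LD A,(HL) / LD (DE),A / INC HL / INC DE
DJNZ_TAKEN, DJNZ_FALLTHROUGH = 13, 8
DEC_C, JP_NZ = 4, 10                 # assumption: JP cc is 10 T-states either way

def djnz_copy_cycles(b, c):
    """Return (bytes copied, total T-states) for initial counters B and C."""
    first = b or 256                 # B=0 on entry behaves as 256
    passes = c or 256                # likewise for C
    nbytes = first + (passes - 1) * 256
    cycles = nbytes * BODY
    cycles += (nbytes - passes) * DJNZ_TAKEN              # every byte but the last of a pass
    cycles += passes * (DJNZ_FALLTHROUGH + DEC_C + JP_NZ) # end-of-pass overhead
    return nbytes, cycles
```

For a single 256-byte pass this works out to 9993 T-states, a shade over 39
per byte; the end-of-pass overhead is what keeps it from being exactly 39.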

>Now, on a 6510, people who need fast memory fills usually unroll
>part of the loop, e.g.

>	LDA $8000,Y
>	STA $8000,Y
>	LDA $9000,Y
>	STA $9000,Y

>or whatever.  This gets it closer to the maximum possible memory
>transfer rate of 8*nn cycles.

You have to be shifting a whole lot of data to make 8*nn cycles with this
method, and the setup cost would be quite large too.  If you wanted to
move 496 bytes, for example, this technique would be useless.  Unrolling
the other loop would, however, move you closer to 10*nn cycles.

>>     Now, let's have another look with the help of one of the Z80's
>>     complex instructions.
>>
>>             LDIR		21/16
>>
>>     I count 21*nn-5 cycles.

Unrolling the loop here gives you

loop  LDI          16
      LDI          16
      ...
      LDI          16
      JP PE, loop  10

which gets you arbitrarily close to 16 cycles per byte even for relatively
small transfers.
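
To put a number on "arbitrarily close", here is a small Python tally of the
per-byte cost as a function of how many LDIs are unrolled.  It uses only the
cycle counts given above (16 per LDI, 10 for the JP PE); the function name
is made up for the example.

```python
LDI, JP_PE, LDIR_REPEAT = 16, 10, 21

def ldi_cycles_per_byte(unroll):
    """Average T-states per byte for `unroll` LDIs followed by one JP PE."""
    return (LDI * unroll + JP_PE) / unroll

# Per-byte cost for a few unroll factors, next to LDIR's 21 per byte.
table = {k: ldi_cycles_per_byte(k) for k in (1, 2, 4, 8, 16)}
```

Two LDIs per pass already match LDIR at 21 cycles per byte, and eight bring
it down to 17.25.  One caveat, as far as I can tell: the byte count has to
be a multiple of the unroll factor (or the loop entered part-way through),
otherwise BC passes through zero in mid-loop and the copy runs on.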

>>to it, I'd like to write a flood fill subroutine.  I can't do that
>>in the next few minutes, so I won't include a z80 version here.  Let
>>me know if you want to try this and I'll come up with an algorithm
>>to follow (or you can come up with one - your choice).

I did one about 14 years ago and it wasn't *that* recursive.

>                So, I'd be interested in seeing a simple line routine,
>including plotting to the screen.

What sort of screen are we talking about?

>                                   I won't make you write a filled polygon :),

I did one of those too (and circles and arcs), although it's far too large
to post here (this would only have been about 7 years ago).  Knocked spots
off the floodfill routine, obviously.  It might even be at NVG somewhere.

>The line routine doesn't need to be very smart -- assume that endpoints
>lie on the screen, etc.

The one in the Spectrum ROM is pretty good, but it doesn't deal with
plotting the points (that is handed out to another subroutine).  Handling
that could make the program quite a bit faster, since plotting from
co-ordinates isn't a particularly fast operation on the spectrum.  It's
something like this, but not exactly since I've taken out the range
check and optimised memory use.  Not that this is the best possible,
by any means.

Draw:  ; (L,H) is one end of the line.  If (M,N) is the other end then
       ; C = abs(N-H), E = sgn(N-H), B = abs(M-L) and D = sgn(M-L).

; The idea is that the line will consist of vertical (or horizontal)
; moves and diagonal moves.  Adding HL+DE will make a diagonal move and
; adding HL+BC will make a vertical move.  Strictly, we want to add
; H+D and L+E, etc, but assuming endpoints lie on the screen a 16-bit
; add will be correct providing we decrement D when E is negative.

      PUSH BC      ;11       these values will be moved over to the
                   ;         alternate register set.
      LD   B,D     ;4
      BIT  7,E     ;8
      JR   Z,L1    ;12/7
      DEC  D       ;4        decrement D if E is negative.
L1:   EXX          ;4        Switch to the alternate register set.
      POP  BC      ;10
      LD   A,C     ;4        Compare the absolute differences.  We will
      CP   B       ;4        move the greater one to B as a counter and
      JR   NC,Horz ;12/7     the lesser one to L.  Go if C is greater.
      LD   L,C     ;4
      EXX          ;4
      XOR  A       ;4        The vertical step is made by setting C=0.
      LD   C,A     ;4        B already contains the appropriate value
      JP   L2      ;10       from above.
Horz: LD   L,B     ;4
      LD   B,C     ;4
      EXX          ;4
      LD   C,E     ;4        The horizontal step is made by setting C=E
      XOR  A       ;4        and B=0.
      LD   B,A     ;4
      BIT  7,C     ;8
      JR   Z,L2    ;12/7
      DEC  B       ;4        Decrement B if C is negative.
L2:   EXX          ;4
      LD   H,B     ;4
      LD   A,B     ;4        A starts off with 1/2H and will have L added
      RRA          ;4        on each step.
Loop: ADD  L       ;4
      JR   C,Diag  ;12/7     If the result is greater than H then a diagonal
      CP   H       ;4        move is made and H is subtracted.  Otherwise
      JR   C,Vert  ;12/7     a vertical move is made.
Diag: SUB  H       ;4
      LD   C,A     ;4
      EXX          ;4
      ADD  HL,DE   ;11
      JP   Move    ;10
Vert: LD   C,A     ;4
      EXX          ;4
      ADD  HL,BC   ;11
Move: CALL PLOTHL
      EXX          ;4
      LD   A,C     ;4
      DJNZ Loop    ;13       B holds the count of pixels to do.
      RET

The main loop takes 95 cycles in the worst case (when every move is
diagonal), plus whatever it takes to plot.  If the screen is so arranged
that pixel (L,H) is located at HL (not impossible - it just requires an
upside-down 8-bit display 256 pixels wide) then this will take 10 cycles.

This routine is untested so it probably contains bugs.  Oh, and it doesn't
plot the first pixel in the line, because on the spectrum you draw a line
with "PLOT x,y: DRAW a,b".
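
For anyone who would rather follow the stepping logic without the register
shuffling, here is a rough Python model of the same idea: an accumulator
starts at half the larger difference, has the smaller difference added on
each step, and a diagonal move is made whenever it reaches the larger one.
The name line_steps and the tuple-based interface are invented for the
sketch; like the routine above, it does not plot the first pixel.

```python
def line_steps(x0, y0, x1, y1):
    """Return the pixels visited after (x0, y0), ending at (x1, y1)."""
    dx, dy = x1 - x0, y1 - y0
    sx = (dx > 0) - (dx < 0)            # sgn of each difference
    sy = (dy > 0) - (dy < 0)
    adx, ady = abs(dx), abs(dy)
    if adx >= ady:
        major, minor, straight = adx, ady, (sx, 0)   # mostly horizontal
    else:
        major, minor, straight = ady, adx, (0, sy)   # mostly vertical
    acc = major // 2                    # "A starts off with 1/2H"
    x, y = x0, y0
    points = []
    for _ in range(major):              # one pixel per pass, like the DJNZ
        acc += minor
        if acc >= major:                # diagonal move, subtract the larger
            acc -= major
            x, y = x + sx, y + sy
        else:                           # straight move along the major axis
            x, y = x + straight[0], y + straight[1]
        points.append((x, y))
    return points
```

For example, line_steps(0, 0, 4, 2) visits (1,1), (2,1), (3,2) and (4,2).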
-- 
---- Ian Collier : imc@comlab.ox.ac.uk : WWW page (including Spectrum section):
------ http://www.comlab.ox.ac.uk/oucl/users/ian.collier/imc.html


Article 69228 of comp.sys.cbm:
Path: news.acns.nwu.edu!merle!judd
From: judd@merle.acns.nwu.edu (Stephen Judd)
Newsgroups: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Subject: Re: Shootout at the 0K Corral (was various other things)
Date: 11 Jun 1997 21:33:41 GMT
Organization: Northwestern University, Evanston, IL
Lines: 147
Message-ID: <5nn5jl$34u@news.acns.nwu.edu>
References: <337C5E94.388@actcom.co.il> <5nag9i$7ru@ds2.acs.ucalgary.ca> <5ngfdo$79r@news.acns.nwu.edu> <11213.imc@comlab.ox.ac.uk>
Reply-To: sjudd@nwu.edu (Stephen Judd)
NNTP-Posting-Host: merle.acns.nwu.edu
Xref: news.acns.nwu.edu comp.sys.sinclair:39706 comp.sys.cbm:69228 comp.emulators.cbm:21477

In article <11213.imc@comlab.ox.ac.uk>, Ian Collier  wrote:
>In article <5ngfdo$79r@news.acns.nwu.edu>, sjudd@nwu.edu (Stephen Judd) wrote:
>
>>Z80 indexing looks to be 8+16 (8 bits memory, 16 bits index).
>>On a 6510, it's 16+8 -- 16-bit address, 8-bit memory.
>
>I think you didn't quite say what you meant there...

Heh, oops, 16-bit address, 8-bit index. :)

>>                                  LDA $C000 is exactly as fast
>>as LDA $C000,X and LDA $C000,Y.
>
>Then it sounds like the former instruction is taking too long. :-)

I left out the "and vice-versa" :).

>>Are you sure the above works?
>
>>I think with a DEC B at noadd, it will work more like you want it
>>to.  +4 cycles?
>
>Nope - the DEC B is already in there: that's what the D in DJNZ stands
>for.  And why it takes 13/8 cycles instead of 12/7 for a normal PC-relative
>jump.

Ahhh, gotcha, very cool.

>>This is the routine I use:
>
>>:LOOP	LDA $8000,Y	4
>>	STA $8000,Y	4
>>	INY		2
>>	BNE :LOOP	3/2
>>	INC :LOOP+2	  6
>>	INC :LOOP+5	  6
>>	DEX		  2
>>	BNE :LOOP	  3/2
>
>>So, 13 (or 15) * nn cycles.
>
>Hey, self-modifying code. :-)  Well it is used on the Z80 but it is nowhere
>near as common as this would appear to imply for the 6502.

Well, it's not THAT common on the 6502, but it comes in awfully handy
at times.

Some of the fresher CS-college types think it's bad coding form but
when your program has the whole machine to itself, well, who cares? :)

>>Now, on a 6510, people who need fast memory fills usually unroll
>>part of the loop, e.g.
>
>>	LDA $8000,Y
>>	STA $8000,Y
>>	LDA $9000,Y
>>	STA $9000,Y
>
>>or whatever.  This gets it closer to the maximum possible memory
>>transfer rate of 8*nn cycles.
>
>You have to be shifting a whole lot of data to make 8*nn cycles with this
>method, and the setup cost would be quite large too.  If you wanted to
>move 496 bytes, for example, this technique would be useless.  Unrolling
>the other loop would, however, move you closer to 10*nn cycles.

Well, I didn't mean for transferring an arbitrary block of memory,
but rather a fixed block from one place to another, i.e. blasting
things to the screen.  As you say, it won't ever really get down
to 8*nn cycles, but it can certainly come down from 13*nn.

>Unrolling the loop here gives you
>
>loop  LDI          16
>      LDI          16
>      ...
>      LDI          16
>      JP PE, loop  10
>
>which gets you arbitrarily close to 16 cycles per byte even for relatively
>small transfers.

OK, nifty.

>>                So, I'd be interested in seeing a simple line routine,
>>including plotting to the screen.
>
>What sort of screen are we talking about?

The Spectrum's screen :).  If there are different modes and such,
well, I don't care what mode is used, just as long as it's useful
for something, i.e. a routine you would use in a 3D game.

>>                                   I won't make you write a filled polygon :),
>
>I did one of those too (and circles and arcs), although it's far too large
>to post here (this would only have been about 7 years ago).  Knocked spots
>off the floodfill routine, obviously.  It might even be at NVG somewhere.

Yep, polygonamy is pretty straightforward theoretically and a real bear
implementation-wise (but it suuuure is fast :).  I just think almost
any routine will be too much code for posting, but if someone wants
to send me their routine, well, I'm happy to look at it!

Filled circles are really easy :).

>>The line routine doesn't need to be very smart -- assume that endpoints
>>lie on the screen, etc.
>
>The one in the Spectrum ROM is pretty good, but it doesn't deal with

Thanks, I'll go through it later on. :)

Does it plot stuff straight into the bitmap or does it OR it into
the map?  What would be the plot time for ORing it into the map?

My routine is a little more complicated in that it plots points in
chunks at a time, but it is something like

	17 cycles per normal main loop iteration
	50 cycles when column is passed (every eight pixels)
	50 cycles when Y coordinate is incremented

for lines with slope < 1.  That includes OR-plotting into an arbitrarily
sized character bitmap.  Obviously lines near 45 degrees are a little
slower but lines near the horizontal just blaze.  It involves a few more
cycles to plot points but it plots them less frequently than, e.g. my older
versions which plotted every point but averaged around 30-40 cycles/point.
(In my experience, this routine is MUCH faster for general use).

For lines with slope >1 (I am just reading this off of my dim4 code in The
Fridge, which has some other bells and whistles tacked on), it looks to
be around 28 cycles main loop, plus 13 cycles per step in the x-direction,
plus 28 or so cycles when a column is passed.

Since I think I wrote the routine up for C=Hacking Issue #13, I won't
explain it here :).

	-Steve

P.S. I'm still curious to see the fast 16-bit multiply :)


>---- Ian Collier : imc@comlab.ox.ac.uk : WWW page (including Spectrum section):
>------ http://www.comlab.ox.ac.uk/oucl/users/ian.collier/imc.html




Article 69298 of comp.sys.cbm:
Path: news.acns.nwu.edu!newsfeed.acns.nwu.edu!news.ece.nwu.edu!news.cse.psu.edu!uwm.edu!vixen.cso.uiuc.edu!ais.net!newsfeed.direct.ca!news.he.net!news.pagesat.net!decwrl!tribune.usask.ca!rover.ucs.ualberta.ca!news.ucalgary.ca!srv1.freenet.calgary.ab.ca!albrecht
From: "Alvin R. Albrecht" 
Newsgroups: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Subject: Re: Spectrum Emulator for C64 [additonal translation]
Date: Wed, 11 Jun 1997 14:20:49 -0600
Organization: Calgary Free-Net
Lines: 35
Message-ID: <5nn1cj$17j4@ds2.acs.ucalgary.ca>
References: <337C5E94.388@actcom.co.il> <01bc6c89$ae5d4a00$04b8de8b@w9622136> <338EE6D2.23A6@tbaytel.net> <01bc6d0c$b90244a0$04b8de8b@w9622136> <5mmvp4$b7r@news.acns.nwu.edu> <5mvqq7$7qk$6@gerry.cc.keele.ac.uk>
NNTP-Posting-Host: albrecht@srv1.freenet.calgary.ab.ca
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
In-Reply-To: <5mvqq7$7qk$6@gerry.cc.keele.ac.uk>
Xref: news.acns.nwu.edu comp.sys.sinclair:39785 comp.sys.cbm:69298 comp.emulators.cbm:21543



On 3 Jun 1997, Spike wrote:

> Stephen Judd (judd@merle.acns.nwu.edu) wrote:
> : I will again plug my own lowly dim4, available on my web page.  No Spectrum
> : is going to match it, just due to the lack of custom characters.  
 
> What do you mean by "custom" characters?
> If you mean redefinable, then it is quite simple to redefine characters on
> the computer.... In fact, several are set aside specifically for that
> purpose.....

He means the characters are drawn on the screen by the hardware.  So
you have a 40x25 (C64 text resolution) block of memory used to store
character bytes.  As the screen is drawn, the characters are drawn on the
screen for you by the display hardware with the help of some dma access
to the bitmap character definitions.

The Spectrum must draw the characters on the screen.  Here's some free
info:  the difference in speed will not be as large as Stephen is thinking
because of the organization of the Spectrum's display file.  We can
plot 8 pixels at a time.  So to draw a single character on the screen
would need 8 pokes vs 1 for the C64 and we will not be limited to
one character set per frame.  The C64 isn't either if it changes character
sets after each ~6 lines is drawn, but that's computationally wasteful
if other things need to be done besides the graphics (the C64 *must*
do this on every frame @ 50/60Hz).  The Spectrum can do it at its leisure
at a more sane pace of say 12 frames/sec.
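
The display-file organization mentioned above can be sketched in a few lines
of Python.  The layout used here is the standard 48K Spectrum one (pixel
data from 0x4000 to 0x57FF, 32 columns by 24 character rows); the function
names are invented for the example, and attribute bytes are left out.

```python
def cell_row_address(col, row, line):
    """Address of pixel row `line` (0-7) of character cell (col, row)."""
    assert 0 <= col < 32 and 0 <= row < 24 and 0 <= line < 8
    high = 0x40 | (row & 0x18) | line    # screen third, then pixel row in the cell
    low = ((row & 0x07) << 5) | col      # character row within the third, column
    return (high << 8) | low

def cell_addresses(col, row):
    """The eight addresses that have to be written to draw one character."""
    return [cell_row_address(col, row, line) for line in range(8)]
```

The point of the odd-looking layout is that the eight bytes of a character
cell are exactly 0x100 apart, so stepping from one pixel row to the next is
a single INC H on the pointer rather than a 16-bit add.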


Alvin





Article 69309 of comp.sys.cbm:
Path: news.acns.nwu.edu!newsfeed.acns.nwu.edu!news.ece.nwu.edu!news.cse.psu.edu!uwm.edu!newsfeeds.sol.net!europa.clark.net!disgorge.news.demon.net!demon!dispatch.news.demon.net!demon!mail2news.demon.co.uk!not-for-mail
From: Jason 
Newsgroups: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Subject: Re: Spectrum Emulator for C64 [additonal translation]
Followup-To: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Date: Thu, 12 Jun 97 19:47:44 GMT
Organization: Cosine Systems
Message-ID: <9706121947.AA00g8r@cosine.demon.co.uk>
References: <337C5E94.388@actcom.co.il> <01bc6c89$ae5d4a00$04b8de8b@w9622136> <338EE6D2.23A6@tbaytel.net> <01bc6d0c$b90244a0$04b8de8b@w9622136> <5mmvp4$b7r@news.acns.nwu.edu> <5mvqq7$7qk$6@gerry.cc.keele.ac.uk> <5nn1cj$17j4@ds2.acs.ucalgary.ca>
X-Mail2News-User: tmr@cosine.demon.co.uk
X-Mail2News-Path: relay-1.mail.demon.net!gate.demon.co.uk!cosine.demon.co.uk
X-Newsreader: TIN [AMIGA 1.3 950726BETA PL0]
Lines: 33
Xref: news.acns.nwu.edu comp.sys.sinclair:39802 comp.sys.cbm:69309 comp.emulators.cbm:21558

Alvin R. Albrecht:
> The Spectrum must draw the characters on the screen.  Here's some free
> info:  the difference in speed will not be as large as Stephen is thinking
> because of the organization of the Spectrum's display file.  We can
> plot 8 pixels at a time.  So to draw a single character on the screen
> would need 8 pokes vs 1 for the C64 and we will not be limited to
> one character set per frame.  The C64 isn't either if it changes character
> sets after each ~6 lines is drawn, but that's computationally wasteful
> if other things need to be done besides the graphics (the C64 *must*
> do this on every frame @ 50/60Hz).  The Spectrum can do it at its leisure
> at a more sane pace of say 12 frames/sec.

The C64 *also* has dynamic redefinition, so if you update the contents of
the charset for the character A, all references onscreen will update
immediately.  Dim4, his demo, uses multiple copies of the same sections of
a charset (in different colours) to give a large number of vector objects
all over the screen (it's running now, with about forty objects, some
filled) at a minimum of frame overhead.  The Speccy would have to refresh
*all* of these objects one at a time and that is a *lot* more processing.

Imagine just shifting forty filled vector objects (all in the same position)
at anything approaching a fast framerate, then remember that Steve's code
does more than one object movement.
--
Jason  =-)
     _______________________________________________________________________
TMR /     /     /     /  /     /     /                                     /\
   /  /__/  /  /  /__/  /  /  /  /__/    Email: tmr@cosine.demon.co.uk    / /
  /  /\_/  /  /__   /  /  /  /  __//          Cosine Homepage:           / /
 /  /__/  /  /  /  /  /  /  /  /  /    http://www.cosine.demon.co.uk    / /
/_____/_____/_____/__/__/__/_____/_____________________________________/ /
\_____\_____\_____\__\__\__\_____\_____________________________________\/



Article 69321 of comp.sys.cbm:
Path: news.acns.nwu.edu!merle!judd
From: judd@merle.acns.nwu.edu (Stephen Judd)
Newsgroups: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Subject: Re: Spectrum Emulator for C64 [additonal translation]
Date: 12 Jun 1997 21:38:54 GMT
Organization: Northwestern University, Evanston, IL
Lines: 15
Message-ID: <5npq9e$5hj@news.acns.nwu.edu>
References: <337C5E94.388@actcom.co.il> <5mvqq7$7qk$6@gerry.cc.keele.ac.uk> <5nn1cj$17j4@ds2.acs.ucalgary.ca> <9706121947.AA00g8r@cosine.demon.co.uk>
Reply-To: sjudd@nwu.edu (Stephen Judd)
NNTP-Posting-Host: merle.acns.nwu.edu
Xref: news.acns.nwu.edu comp.sys.sinclair:39810 comp.sys.cbm:69321 comp.emulators.cbm:21567

In article <9706121947.AA00g8r@cosine.demon.co.uk>,
Jason   wrote:
>Alvin R. Albrecht:
>> The Spectrum must draw the characters on the screen.  Here's some free
>
>The C64 *also* has dynamic redefinition, so if you update the contents of
>the charset for the character A, all references onscreen will update
>immediately.  Dim4, his demo, uses multiple copies of the same sections of

Folks could also check out Jason's demo, from the same competition, to
see more uses of custom characters :).

-S

>Jason  =-)


Article 69307 of comp.sys.cbm:
Path: news.acns.nwu.edu!merle!judd
From: judd@merle.acns.nwu.edu (Stephen Judd)
Newsgroups: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Subject: Re: Spectrum Emulator for C64 [additonal translation]
Date: 12 Jun 1997 19:50:37 GMT
Organization: Northwestern University, Evanston, IL
Lines: 125
Message-ID: <5npjud$2rv@news.acns.nwu.edu>
References: <337C5E94.388@actcom.co.il> <5mmvp4$b7r@news.acns.nwu.edu> <5mvqq7$7qk$6@gerry.cc.keele.ac.uk> <5nn1cj$17j4@ds2.acs.ucalgary.ca>
Reply-To: sjudd@nwu.edu (Stephen Judd)
NNTP-Posting-Host: merle.acns.nwu.edu
Xref: news.acns.nwu.edu comp.sys.sinclair:39801 comp.sys.cbm:69307 comp.emulators.cbm:21554

In article <5nn1cj$17j4@ds2.acs.ucalgary.ca>,
Alvin R. Albrecht  wrote:
>On 3 Jun 1997, Spike wrote:
>> Stephen Judd (judd@merle.acns.nwu.edu) wrote:
>> : I will again plug my own lowly dim4, available on my web page.  No Spectrum
>> : is going to match it, just due to the lack of custom characters.  
> 
>> What do you mean by "custom" characters?
>> If you mean redefinable, then it is quite simple to redefine characters on
>> the computer.... In fact, several are set aside specifically for that
>> purpose.....
>
>He means the characters are drawn on the screen by the hardware.  So
>you have a 40x25 (C64 text resolution) block of memory used to store
>character bytes.  As the screen is drawn, the characters are drawn on the

More importantly, there are 256 independent 8x8 characters. 

>The Spectrum must draw the characters on the screen.  Here's some free
>info:  the difference in speed will not be as large as Stephen is thinking

Oh yes it will!  (Or do you know what I'm thinking? :)

>because of the organization of the Spectrum's display file.  We can
>plot 8 pixels at a time.  So to draw a single character on the screen

I'm not sure if you're attempting to imply anything here, but the 64 bitmap
is of course also arranged in byte fashion, so that "we" can of course
"plot" 8 pixels at a time as well.

>would need 8 pokes vs 1 for the C64 and we will not be limited to

Not just eight pokes, eight memory reads and eight memory writes --
sixteen memory accesses.  You would also have to compute a pointer to
the data for the particular character, after figuring out which character
it is to be displayed, and presumably each memory access would also
involve adjusting this pointer.  A "mere" factor of eight is bad enough:
that's the difference between 30 frames per second and 4 frames/second; an
appalling factor of twenty is more likely in this case.  Spread it over 1000
memory locations, and you're talking about an awful lot of work.

Of course, if there's some sneaky way of doing this, then feel free to
post the code.

Moreover, if a character is in place on the 64's screen, and that character's
data is changed -- well, you're done, the character is still in place.
It seems to me (who knows, maybe I'm wrong) that the entire Spectrum
screen would need to be redrawn in this case.

To update the bitmap is equivalent to moving around 8k of data.  Even if
we just did this as a block transfer, my recollection is that the Z80
LDI would require 16 cycles/byte => 128,000 cycles.  At 3.54MHz that
gives me around 0.0362 seconds, which is around 2 PAL frames.  My guess
is the overhead will about double that.  So 3-4 frames, just to update
the screen, with no other calculations going on.
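
The arithmetic behind that estimate, as a small Python sketch.  The 16
cycles per byte and the 3.54MHz clock are the figures used above; the
8000-byte case is the C64-sized bitmap that gives 128,000 cycles.  The
6144-byte case is added only for comparison, on the assumption that just the
Spectrum's 256x192 pixel area (no attributes) is being copied.

```python
Z80_HZ = 3_540_000          # clock figure used in the text
PAL_FRAME_S = 1 / 50        # one PAL frame
LDI_CYCLES = 16             # per byte, ignoring all loop overhead

def block_copy_cost(nbytes):
    """Return (T-states, seconds, PAL frames) for a raw LDI block copy."""
    cycles = nbytes * LDI_CYCLES
    seconds = cycles / Z80_HZ
    return cycles, seconds, seconds / PAL_FRAME_S

c64_sized = block_copy_cost(8000)             # the "around 8k" case
spectrum_pixels = block_copy_cost(256 * 192 // 8)   # assumption: pixel area only
```

The first case comes to roughly 1.8 PAL frames and the second to roughly 1.4,
in both cases before any overhead is counted.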

(I am starting to understand why no high-speed terminal programs were
written for the Speccy).

Since nobody seems willing to take a look at dim4 I guess I'll just describe
it.  What dim4 does is to draw one 4D object and four solid 3D objects into 
a series of character bitmaps spanning a single character set (all five bitmaps
use less than 256 characters total).  These are then placed on the screen
at random, so they start to pile up on top of each other, etc.  So at
some point you've got maybe a hundred 3D objects, layered on top of one
another, all rotating.  I use two character sets as a double-buffer, so
the updating is very smooth.  Numbers seem to impress people so I'll
also mention that some of the objects are very complex -- if I remember
correctly one of the 4D objects has 32 points and 96 connections (i.e.
lines to be drawn), and its corresponding 3D objects involve 22 points
and 24 line connections.  4D points require two projections, and each
point involves three rotations.  I should probably mention that a
three-voice tune is playing while all this is going on, and that the
total compressed program size is 4095 bytes.  Not only do the custom
characters offer a much more convenient memory layout, but they mean
the program can spend its time doing calculations and such instead of
trying to update the screen.

Not all custom character sets are used as general bitmaps of course.  
Usually they are just used as characters -- a font, a background, etc.
The Ultima series and Laser Squad are examples of games which use
custom characters in small groups for the display (groups of 2x2 sets
of characters to make 16x16 pixels).  Some games use them as a background
display -- for instance, a ship sailing on some water.  To make the
water move on the entire screen is a matter of changing the character 
data -- eight bytes.  Anytime there is data which is repeated in any
way on the screen, custom characters are extremely useful, and very
flexible.  How many programs use data which appears simultaneously
in more than one place on the screen?

Moreover, custom character sets use 256*8=2k of memory each.  Compare
with 8k for a regular bitmap.
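
A toy Python model may make the point clearer.  It is only a sketch of the
idea of a character-mapped display (a 40x25 matrix of codes indexing a table
of 8-byte glyphs), not of the VIC-II itself; all names are made up for the
example.

```python
COLS, ROWS, GLYPH_BYTES = 40, 25, 8

charset = bytearray(256 * GLYPH_BYTES)   # 2048 bytes of glyph data
screen = bytearray(COLS * ROWS)          # 1000 character codes, all 0 here

def displayed_byte(cell, line):
    """What the display shows for pixel row `line` of screen cell `cell`."""
    code = screen[cell]
    return charset[code * GLYPH_BYTES + line]

def redefine(code, rows):
    """Rewrite one glyph; returns the number of bytes written."""
    assert len(rows) == GLYPH_BYTES
    start = code * GLYPH_BYTES
    charset[start:start + GLYPH_BYTES] = bytes(rows)
    return len(rows)

written = redefine(0, [0xAA, 0x55] * 4)  # the "moving water" trick
changed_cells = sum(1 for cell in range(COLS * ROWS)
                    if displayed_byte(cell, 0) == 0xAA)
bitmap_bytes = 320 * 200 // 8            # size of a full bitmap, for comparison
```

Eight bytes written change every one of the 1000 cells showing that
character, against 8000 bytes for a full-bitmap update.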

Are custom characters the end-all and be-all of everything?  Of course
not.  What they are is just another tool available to 64 programmers
which can be used in a wide variety of circumstances to solve particular
problems (and in a very simple and efficient, not to mention extremely
flexible, way).

>one character set per frame.  The C64 isn't either if it changes character
>sets after each ~6 lines is drawn, but that's computationally wasteful

Not really: changing character sets involves moving a pointer.  Interrupt
overhead is 13 or so cycles in the worst case.  The net overhead involved
is wholly trivial -- less than a single raster line.  Every 6 rows is
the absolute worst-case scenario (unique custom characters completely 
covering the screen), and translates to a 1/50th decrease in speed.
This is supposed to be more computationally wasteful than using the
processor to update the entire screen bitmap?

Of course, the C64 could also just use a normal bitmap, too.  I suppose it
would depend on the problem being solved.

>if other things need to be done besides the graphics (the C64 *must*
>do this on every frame @ 50/60Hz).  The Spectrum can do it at its leisure
>at a more sane pace of say 12 frames/sec.

Note that the 64 can also change modes in the middle of the screen --
use character sets in one place, use bitmap in another, etc.  The
right tool for the right job.

Well, if anyone thinks my numbers and predictions are way off, please feel
free to post/email some demonstration code.  I'd be happy to take a look at it.

-S


Article 69055 of comp.sys.cbm:
Path: news.acns.nwu.edu!newsfeed.acns.nwu.edu!news.ece.nwu.edu!news.cse.psu.edu!uwm.edu!news-peer.gsl.net!news-peer.sprintlink.net!news-pull.sprintlink.net!news-in-east.sprintlink.net!news.sprintlink.net!Sprint!194.159.255.23!disgorge.news.demon.net!demon!dispatch.news.demon.net!demon!peernews.ftech.net!telehouse1.frontier-networks.co.uk!basilisk.pdc.nhs.gov.uk!yama.mcc.ac.uk!simonc
From: simonc@jumper.mcc.ac.uk (Cookie)
Newsgroups: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Subject: Re: Spectrum Emulator for C64
Followup-To: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Date: 10 Jun 1997 09:54:27 GMT
Organization: Sirius Cybernetics Corporation
Message-ID: <5nj88j$6f9@yama.mcc.ac.uk>
References: <337C5E94.388@actcom.co.il>  <5maa2u$prl@news.acns.nwu.edu>  <5mli4f$m48@news.acns.nwu.edu>
NNTP-Posting-Host: jumper.mcc.ac.uk
X-Newsreader: TIN [version 1.2 PL2]
Lines: 65
Xref: news.acns.nwu.edu comp.sys.sinclair:39585 comp.sys.cbm:69055 comp.emulators.cbm:21357

Stephen Judd (judd@merle.acns.nwu.edu) wrote:

: If you are interested in how modern chips do divisions, there is an
: article in a recent Siam Journal of Applied Mathematics on the
: Pentium FDIV bug -- as I recall, it amounted to a slightly incorrect
: number in a table.

Any address to look for, or should I do a web search?

: Now, back to chips.  Since you (or whoever it was) never bothered to
: show me the simple comparison code between a Z80 and a 6510, I went
: and did a little research.  As near as I can tell, the typical cycle
: count for a load or a store is around 7 cycles.  On a 6510 it is 4.
: So, far, the Z80 seems around a factor of two slower (so that a 4MHz
: Z80 would indeed be faster).

That'd appear to be 
: 	INX
: which takes 2 cycles.  On the other hand,
: 	INC (HL)

But INC (HL) actually increments the contents of the memory location
pointed to by HL; it does not move the pointer by 1.
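
The distinction can be sketched in a few lines of Python. This is only a toy
model, not an emulator: `mem`, `hl` and the two helper functions are names
made up for the illustration.

```python
# Toy model of two Z80 increments; all names here are illustrative only.
mem = bytearray(65536)   # 64K address space
hl = 0x8000              # HL register pair used as a pointer
mem[hl] = 0x41

def inc_at_hl(mem, hl):
    """INC (HL): add 1 to the byte HL points at; HL itself is unchanged."""
    mem[hl] = (mem[hl] + 1) & 0xFF
    return hl

def inc_hl(mem, hl):
    """INC HL: advance the 16-bit pointer; memory is unchanged."""
    return (hl + 1) & 0xFFFF

hl_after_mem_inc = inc_at_hl(mem, hl)   # byte at 0x8000 goes from 0x41 to 0x42
hl_after_ptr_inc = inc_hl(mem, hl)      # pointer goes from 0x8000 to 0x8001
```

So the fair comparison with the 6510's INX is a register or register-pair
increment such as INC HL, not INC (HL).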


: takes a rather large 11 cycles -- now we're up to a factor of five
: speed difference.  The kicker, though, are the PC-relative
: instructions

: 	JR cond,e

: which take a whopping 12/7 cycles, and only test for Carry and Zero flags.
: On a 6510 the negative flag (high bit) may also be used as a branch
: condition; the instruction takes 3 cycles if the branch is taken,
: and 2 otherwise.

Alternatively, use the JP instructions which have a constant execution
time, and allow access to all of the flags. (Admittedly, that constant
execution time is about 12 cycles)

: Finally, the ROR type instructions are a factor of four faster on
: the 6510 (when dealing with the accumulator).

: So, what does this tell me?  Not too much.  If I was writing for a Z80,
: I would certainly write to its strengths and try to avoid its weaknesses,
: like I do on the 6510.  But I would postulate that a Z80 is more than
: a factor of two slower than a 6510, and probably much closer to a factor
: of three on average.  For doing raw calculations I would therefore
: expect a stock Spectrum to beat a stock 6510 easily, but of course
: any type of data (i.e. graphical) manipulation will slow down the
: computer by quite a lot.

I reckon we might get more mileage out of this discussion if we start
writing equivalent routines for Z80 and 6510, and throw them out to the
audience... (with timings included, of course).

Anyone know where I can get manufacturer's datasheets on the 6510? Talking
hardware and programming info here.

: What happens to the tape when you have thirty files, delete the first,
: then add two more files?

Nah... just turn off your microdrive and plug in your +D instead ;)

Simon


Article 69323 of comp.sys.cbm:
Path: news.acns.nwu.edu!merle!judd
From: judd@merle.acns.nwu.edu (Stephen Judd)
Newsgroups: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Subject: Re: Spectrum Emulator for C64
Date: 12 Jun 1997 21:46:34 GMT
Organization: Northwestern University, Evanston, IL
Lines: 52
Message-ID: <5npqnq$5o6@news.acns.nwu.edu>
References: <337C5E94.388@actcom.co.il>  <5mli4f$m48@news.acns.nwu.edu> <5nj88j$6f9@yama.mcc.ac.uk>
Reply-To: sjudd@nwu.edu (Stephen Judd)
NNTP-Posting-Host: merle.acns.nwu.edu
Xref: news.acns.nwu.edu comp.sys.sinclair:39812 comp.sys.cbm:69323 comp.emulators.cbm:21569

Hello Simon,

In article <5nj88j$6f9@yama.mcc.ac.uk>, Cookie  wrote:
>Stephen Judd (judd@merle.acns.nwu.edu) wrote:
>
>: If you are interested in how modern chips do divisions, there is an
>: article in a recent Siam Journal of Applied Mathematics on the
>: Pentium FDIV bug -- as I recall, it amounted to a slightly incorrect
>: number in a table.
>
>Any address to look for, or should I do a web search?

Heh, I finally found it.  Siam Review, v39 no.1 (March 1997),
"The Mathematics of the Pentium Division Bug", Alan Edelman.

(Note: NOT the Siam J of Applied Math as I previously thought :).

>But INC (HL) actually increments the contents of the memory location
>pointed to by HL; it does not move the pointer by 1.

Ah, of course, how silly of me :)

>Alternatively, use the JP instructions which have a constant execution
>time, and allow access to all of the flags. (Admittedly, that constant
>execution time is about 12 cycles)

What's the difference between JPZ and JRZ (assuming those are
legal instructions :), in practical terms?  Which is generally
used/more useful?

>I reckon we might get more mileage out of this discussion if we start
>writing equivalent routines for Z80 and 6510, and throw them out to the
>audience... (with timings included, of course).

Yep :).

>Anyone know where I can get manufacturer's datasheets on the 6510? Talking
>hardware and programming info here.

Well, some places to start are Marko Makela's homepage

	http://www.hut.fi/~msmakela/cbm/

which has lots of technical info and links to more technical info, and
the Driven homepage

	http://soho.ios.com/~coolhnd/

which has links to assembly tutorials, which presumably would contain
programming info.

-Steve


Article 69499 of comp.sys.cbm:
Path: news.acns.nwu.edu!newsfeed.acns.nwu.edu!news.ece.nwu.edu!news.cse.psu.edu!news3.cac.psu.edu!howland.erols.net!cpk-news-hub1.bbnplanet.com!su-news-feed4.bbnplanet.com!news.bbnplanet.com!enews.sgi.com!decwrl!tribune.usask.ca!rover.ucs.ualberta.ca!news.ucalgary.ca!srv1.freenet.calgary.ab.ca!albrecht
From: "Alvin R. Albrecht" 
Newsgroups: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Subject: Re: Spectrum Emulator for C64 [additonal translation]
Date: Sat, 14 Jun 1997 12:18:25 -0600
Organization: Calgary Free-Net
Lines: 187
Message-ID: <5nunbg$t3e@ds2.acs.ucalgary.ca>
References: <337C5E94.388@actcom.co.il> <5mmvp4$b7r@news.acns.nwu.edu> <5mvqq7$7qk$6@gerry.cc.keele.ac.uk> <5nn1cj$17j4@ds2.acs.ucalgary.ca> <5npjud$2rv@news.acns.nwu.edu>
Reply-To: "Alvin R. Albrecht" 
NNTP-Posting-Host: albrecht@srv1.freenet.calgary.ab.ca
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
In-Reply-To: <5npjud$2rv@news.acns.nwu.edu>
Xref: news.acns.nwu.edu comp.sys.sinclair:39945 comp.sys.cbm:69499 comp.emulators.cbm:21678



On 12 Jun 1997, Stephen Judd wrote:


> >because of the organization of the Spectrum's display file.  We can
> >plot 8 pixels at a time.  So to draw a single character on the screen
 
> I'm not sure if you're attempting to imply anything here, but the 64 bitmap
> is of course also arranged in byte fashion, so that "we" can of course
> "plot" 8 pixels at a time as well.

I wasn't sure if the colour was separate or not.
 
> >would need 8 pokes vs 1 for the C64 and we will not be limited to
 
> Not just eight pokes, eight memory reads and eight memory writes --
> sixteen memory accesses.  You would also have to compute a pointer to

A little oversight on my part ;).

> the data for the particular character, after figuring out which character
> it is to be displayed, and presumably each memory access would also
> involve adjusting this pointer.  A "mere" factor of eight is bad enough:
> that's the difference between 30 frames per second and 4 frames/second; an

It doesn't take you a full frame to poke 40x25 characters, does it?

> appalling factor of twenty is more likely in this case.  Spread it over 1000
> memory locations, and you're talking about an awful lot of work.
 
> Of course, if there's some sneaky way of doing this, then feel free to
> post the code.

It doesn't have to be sneaky:

1. Calculate bitmap location for char in A register (DE=address of
bitmaps):

LD L,A
LD H,0
ADD HL,HL  ; HL=8*A if chars are 8 pixels tall
ADD HL,HL
ADD HL,HL
ADD HL,DE

4+6+4*11=54 cycles

2. Draw char on screen at screen address in DE, HL=bitmap

     LD B,8
loop LD A,(HL)
     LD (DE),A
     INC D        ; because of display file organization,
                  ; vertical bytes of 8x8 chars on screen
                  ; are 256 bytes apart
     INC HL
     DJNZ loop

6+8*(7+7+4+6+13)-5=297 cycles.

3.  To put these together doesn't require a lot of glue.  But
keep in mind the calculations below don't include the glue.

To do the entire 32x24 screen:

32*24*(297+54)=269568 cycles or @3.54MHz=0.0837s=12 fps.
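
A quick recount of the two cycle totals, as a Python sketch. The T-state
figures in the table are the ones commonly listed in Z80 references and are
an assumption here, not something measured on hardware. In particular, LD r,n
(used by LD H,0 and LD B,8) is usually listed at 7 T-states. On those figures
the two routines come to 55 and 298 T-states, and the whole 32x24 screen to
about 271,000 T-states, or roughly 13 fps at the 3.54 MHz clock quoted in
this thread. The glue code is still left out.

```python
# T-state counts as commonly documented for the Z80 (assumed, not measured).
T = {
    "LD r,r'": 4, "LD r,n": 7, "ADD HL,ss": 11,
    "LD A,(HL)": 7, "LD (DE),A": 7, "INC r": 4, "INC ss": 6,
    "DJNZ taken": 13, "DJNZ not taken": 8,
}

# Step 1: LD L,A / LD H,0 / three ADD HL,HL / ADD HL,DE
address_calc = T["LD r,r'"] + T["LD r,n"] + 4 * T["ADD HL,ss"]

# Step 2: LD B,8, then 8 passes of the loop (DJNZ falls through on the last)
loop_body = T["LD A,(HL)"] + T["LD (DE),A"] + T["INC r"] + T["INC ss"]
draw_char = (T["LD r,n"]
             + 7 * (loop_body + T["DJNZ taken"])
             + (loop_body + T["DJNZ not taken"]))

cells = 32 * 24
total_cycles = cells * (address_calc + draw_char)
seconds = total_cycles / 3.54e6      # clock figure used in the thread
fps = 1 / seconds
```

Either way the result lands in the low teens of frames per second, so the
conclusion drawn from the estimate is not sensitive to a cycle or two per
routine.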

This is not how things are normally done on the Spectrum, it's just
a direct translation of what you've done in dim4.
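
For readers who do not read Z80, steps 1 and 2 can be modelled in Python.
This is a sketch under stated assumptions: the function names are invented
here, the screen is assumed to be the standard 48K display file starting at
0x4000, and `font_base` is whatever address the font table happens to live at.

```python
def glyph_address(char_code, font_base):
    """Step 1: HL = A, three doublings (HL = 8*A), then add DE."""
    hl = char_code & 0xFF                # LD L,A / LD H,0
    for _ in range(3):                   # ADD HL,HL three times
        hl = (hl + hl) & 0xFFFF
    return (hl + font_base) & 0xFFFF     # ADD HL,DE

def cell_address(row, col):
    """Address of the top pixel line of character cell (row, col),
    assuming the standard display-file layout at 0x4000."""
    return 0x4000 | ((row & 0x18) << 8) | ((row & 0x07) << 5) | col

def draw_char(mem, char_code, font_base, row, col):
    """Step 2: copy 8 glyph bytes; successive pixel lines of one cell are
    256 bytes apart, which is what INC D does in the Z80 loop."""
    src = glyph_address(char_code, font_base)
    dst = cell_address(row, col)
    for _ in range(8):                   # LD B,8 ... DJNZ loop
        mem[dst] = mem[src]              # LD A,(HL) / LD (DE),A
        dst += 0x100                     # INC D
        src += 1                         # INC HL

# Example: a made-up font table at 0xC000 holding one test glyph for code 1.
mem = bytearray(65536)
mem[0xC008:0xC010] = bytes(range(1, 9))
draw_char(mem, 1, 0xC000, row=8, col=3)
```

The 256-byte stride only holds within a single character cell, which is all
this routine needs.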
 
> Moreover, if a character is in place on the 64's screen, and that character's
> data is changed -- well, you're done, the character is still in place.
> It seems to me (who knows, maybe I'm wrong) that the entire Spectrum
> screen would need to be redrawn in this case.

Anything that needs to be changed has to be redrawn.  Fortunately,
once it's up there, it's up there forever.  So I can, for example,
print in one font on the screen, change fonts and print in another
without worrying that the first font changes.
 
> To update the bitmap is equivalent to moving around 8k of data.  Even if

6144 bytes in monochrome in the UK 48K Spectrum case.

> we just did this as a block transfer, my recollection is that the Z80
> LDI would require 16 cycles/byte => 128,000 cycles.  At 3.54MHz that
> gives me around 0.0362 seconds, which is around 2 PAL frames.  My guess

1/0.0362=27 frames per second.  To make things look smooth I
probably need ~10 frames per second.  On the original UK 48k Spectrum
which does not have double-buffering ability, to avoid flicker on
a fully drawn screen, I need to update the entire screen in about
2 frames = 2x1/50=0.04seconds.
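
The block-copy arithmetic can be laid out explicitly. This is a sketch using
the figures quoted in this exchange (16 cycles per byte for LDI, a 3.54 MHz
clock, 50 Hz PAL frames); the function name and constants are just for the
illustration.

```python
CLOCK_HZ = 3.54e6        # clock figure used in the thread
PAL_FRAME_S = 1 / 50     # one PAL frame

def copy_time(n_bytes, cycles_per_byte=16):
    """Seconds to move n_bytes at a fixed cost per byte (16 = one LDI)."""
    return n_bytes * cycles_per_byte / CLOCK_HZ

t_8000 = copy_time(8000)     # the "around 8k" estimate: 128,000 cycles
t_6144 = copy_time(6144)     # monochrome display file only: 98,304 cycles

frames_8000 = t_8000 / PAL_FRAME_S
frames_6144 = t_6144 / PAL_FRAME_S
```

On these numbers the 8000-byte estimate takes a little under two PAL frames
and the 6144-byte display file about 1.4, so a plain copy fits inside the
0.04 second budget mentioned above, before any loop overhead is counted.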

I can simulate double buffering by keeping a 2nd screen in memory (often 
there isn't enough RAM for full screen, so it is sometimes
applied to a portion of the screen instead) and drawing on
it.  Then I don't need to worry about drawing on screen within
0.04seconds: I draw at leisure on the background screen and then do a copy
to main screen within 0.04seconds.  The US 48k Spectrum and the 128k
Spectrums have a 2nd display file which eliminates that last copy.

This is full screen capable.  Usually only the parts of the screen
that need updating are updated.

> is the overhead will about double that. So 3-4 frames, just to update 
> the screen, with no other calculations going on.

Don't forget I don't *have* to do this on every frame.  I can reduce
the frame rate to 10 fps rather than the maximum ~27 fps found above.
That leaves a lot of time for other calculations.

> (I am starting to understand why no high-speed terminal programs were
> written for the Speccy).

Hmmm.  You need a 50Hz frame rate for your vt100 emulator?  The Spectrum
can plot in bold, underline, unlimited number of character sets and
sizes, etc.  using a method similar to that given above.  The char is
drawn once only, not at 50fps.

The lack of high-speed terminal programs is due solely to a lack of
hardware.

> Since nobody seems willing to take a look at dim4 I guess I'll just describe
> it.  What dim4 does is to draw one 4D object and four solid 3D objects into 

Very impressive :-).  I only wish I had a C64 emulator to see it - and
I'm using a lowly 386sx pc right now so I know it's futile to look for
one :(.

Your program is directly transferrable to the Spectrum, but it would
be drawn at a reduced frame rate of say 10-12fps (example above).

> Not all custom character sets are used as general bitmaps of course.  
> Usually they are just used as characters -- a font, a background, etc.

Limited to 256 chars.  With ~50 chars per character set, that's 5
independent character sets of one cell size.  The Spectrum draws 'em,
so there is no upper limit on size and number of character sets on
screen.  The C64 can go this route as well.  I hope the above has
convinced you that the text generator, though a nice tool, isn't
really an advantage over a Spectrum which does the same in software
at decent frame rates.

> The Ultima series and Laser Squad are examples of games which use
> custom characters in small groups for the display (groups of 2x2 sets
> of characters to make 16x16 pixels).  Some games use them as a background

A very Spectrum way of doing things ;).

> >one character set per frame.  The C64 isn't either if it changes character
> >sets after each ~6 lines is drawn, but that's computationally wasteful

> Not really: changing character sets involves moving a pointer. Interrupt
> overhead is 13 or so cycles in the worst case.  The net overhead involved
> is wholly trivial -- less than a single raster line.  Every 6 rows is
> the absolute worst-case scenario (unique custom characters completely 
> covering the screen), and translates to a 1/50th decrease in speed.
> This is supposed to be more computationally wasteful than using the
> processor to update the entire screen bitmap?

Sorry, I misspoke.  I meant you will be "busy-waiting", to borrow a
term from someone else.  That can take up a significant amount of time
given that you must do this on every single frame.  The time doesn't
have to be wasted, of course.  You can use it to do fixed cycle length
stuff, but that's not always easy or practical.
 
> Of course, the C64 could also just use a normal bitmap, too.  I suppose it
> would depend on the problem being solved.

Yes, but the 6502 @ 1MHz (with its slower access to full 64k) and the Z80
@ 3.54MHz give the Spectrum a large advantage in that department.
The C64 special hardware makes up for that but you can only do
what the hardware lets you.
 
> Note that the 64 can also change modes in the middle of the screen --
> use character sets in one place, use bitmap in another, etc.  The
> right tool for the right job.

Software is always more flexible than hardware and if you've got the
speed, you don't need the hardware.
 

Alvin




Article 69569 of comp.sys.cbm:
Path: news.acns.nwu.edu!newsfeed.acns.nwu.edu!news.ece.nwu.edu!news.cse.psu.edu!rutgers!news.sgi.com!nntprelay.mathworks.com!news.mathworks.com!out2.nntp.cais.net!news2.cais.com!news
From: postmaster@[127.0.0.1] (Rick "The Notes Guy" Dickinson)
Newsgroups: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Subject: Re: Spectrum Emulator for C64 [additonal translation]
Date: Mon, 16 Jun 1997 14:43:54 GMT
Organization: Enterprise ArchiTechs
Lines: 31
Message-ID: <33a64fa5.59106111@news.pacificnet.net>
References: <337C5E94.388@actcom.co.il> <5mmvp4$b7r@news.acns.nwu.edu> <5mvqq7$7qk$6@gerry.cc.keele.ac.uk> <5nn1cj$17j4@ds2.acs.ucalgary.ca> <5npjud$2rv@news.acns.nwu.edu> <5nunbg$t3e@ds2.acs.ucalgary.ca>
Reply-To: rtd-at-notesguy.com
NNTP-Posting-Host: 207.171.22.42
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Newsreader: Forte Agent 1.01/32.397
Xref: news.acns.nwu.edu comp.sys.sinclair:39987 comp.sys.cbm:69569 comp.emulators.cbm:21725

Sharing the wisdom of the ages with those of us reading
comp.emulators.cbm, "Alvin R. Albrecht"
 wrote:

>On 12 Jun 1997, Stephen Judd wrote:

>> Not really: changing character sets involves moving a pointer. Interrupt
>> overhead is 13 or so cycles in the worst case.  The net overhead involved
>> is wholly trivial -- less than a single raster line.  Every 6 rows is
>> the absolute worst-case scenario (unique custom characters completely 
>> covering the screen), and translates to a 1/50th decrease in speed.
>> This is supposed to be more computationally wasteful than using the
>> processor to update the entire screen bitmap?
>
>Sorry, I misspoke.  I meant you will be "busy-waiting", to borrow a
>term from someone else.  That can take up a significant amount of time
>given that you must do this on every single frame.  The time doesn't
>have to be wasted, of course.  You can use it to do fixed cycle length
>stuff, but that's not always easy or practical.

Not really - on the 64, you just need to set the VIC-II chip (the
video processor) to generate raster interrupts, and then do your
character set swapping in an interrupt routine.  No busy-waiting
involved -- Your main program can make use of every available cycle,
confident that, when it needs to switch character sets, it will
happen.

 - Rick "Still have my copy of 'Mapping the 64' "Dickinson

---
  "Maybe Godel was just ignorant." - Michael Craft, in AFU


Article 69565 of comp.sys.cbm:
Path: news.acns.nwu.edu!merle!judd
From: judd@merle.acns.nwu.edu (Stephen Judd)
Newsgroups: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Subject: Re: Spectrum Emulator for C64 [additonal translation]
Date: 16 Jun 1997 23:41:19 GMT
Organization: Northwestern University, Evanston, IL
Lines: 105
Message-ID: <5o4iuv$a5s@news.acns.nwu.edu>
References: <337C5E94.388@actcom.co.il> <5nn1cj$17j4@ds2.acs.ucalgary.ca> <5npjud$2rv@news.acns.nwu.edu> <5nunbg$t3e@ds2.acs.ucalgary.ca>
Reply-To: sjudd@nwu.edu (Stephen Judd)
NNTP-Posting-Host: merle.acns.nwu.edu
Xref: news.acns.nwu.edu comp.sys.sinclair:39985 comp.sys.cbm:69565 comp.emulators.cbm:21722

In article <5nunbg$t3e@ds2.acs.ucalgary.ca>,
Alvin R. Albrecht  wrote:
>On 12 Jun 1997, Stephen Judd wrote:
>> I'm not sure if you're attempting to imply anything here, but the 64 bitmap
>> is of course also arranged in byte fashion, so that "we" can of course
>> "plot" 8 pixels at a time as well.
>
>I wasn't sure if the colour was separate or not.

It is, but it is stored in a colormap, the practical effect of which
is that you don't have to update it if you don't want to change
the color.

>> Of course, if there's some sneaky way of doing this, then feel free to
>> post the code.
>
>It doesn't have to be sneaky:
>
>1. Calculate bitmap location for char in A register (DE=address of
>bitmaps):
>2. Draw char on screen at screen address in DE, HL=bitmap
>3.  To put these together doesn't require a lot of glue.  But
>keep in mind the calculations below don't include the glue.
>
>To do the entire 32x24 screen:
>
>32*24*(297+54)=269568 cycles

*Puts eyes back in sockets*
*Shakes his head*
*Closes eyes and shakes head again after looking again*
*Places garlic around neck to ward off evil effects of the statement*

Let me put it this way: do you at least understand that OTHER people
might regard a reduction from 270,000 cycles down to 6-8 cycles as an advantage?

>or @3.54MHz=0.0837s=12 fps.

This is not only a useless number, it is a misleading one.  For instance,
what would be the frame rate if the rest of the calculations took 12 frames
to complete?  How about if they could be done at 4fps?

(Hence the use of frames and not fps to describe execution time)

>> Moreover, if a character is in place on the 64's screen, and that character's
>> data is changed -- well, you're done, the character is still in place.
>
>once it's up there, it's up there forever.  So I can, for example,
>print in one font on the screen, change fonts and print in another
>without worrying that the first font changes.

Beep!  Beep!  Beep!

***Irrelevancy alert***

>Your program is directly transferrable to the Spectrum, but it would
>be drawn at a reduced frame rate of say 10-12fps (example above).

Oh dear...

>...
>screen.  The C64 can go this route as well.  I hope the above has
>convinced you that the text generator, though a nice tool, isn't
>really an advantage over a Spectrum which does the same in software
>at decent frame rates.

Oh dear...

>> >one character set per frame.  The C64 isn't either if it changes character
>> >sets after each ~6 lines is drawn, but that's computationally wasteful
>
>> Not really: changing character sets involves moving a pointer. Interrupt
>
>> covering the screen), and translates to a 1/50th decrease in speed.
>> This is supposed to be more computationally wasteful than using the
>> processor to update the entire screen bitmap?
>
>Sorry, I misspoke.  I meant you will be "busy-waiting", to borrow a
>term from someone else.  That can take up a significant amount of time
>given that you must do this on every single frame.  The time doesn't
>have to be wasted, of course.  You can use it to do fixed cycle length
>stuff, but that's not always easy or practical.

I detect a major lack of comprehension.  You can set VIC to trigger an
interrupt on a particular scan line.  I won't belittle your intelligence
by pedantically explaining the significance of that statement.

I did make an error in my calculation above though.  Let's give the
interrupt an extremely conservative full raster line, i.e. we lose
one line every six rows, or 4 lines (5 if you want the last row too)
each frame.  A PAL frame has 312 lines.  Thus the computer is slowed
down to an underwhelming 308/312=.99 times its full speed potential.
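
The same estimate as a Python sketch, assuming (very conservatively, as in
the paragraph above) that every character-set switch costs one whole raster
line. The constant and function names are invented for the example.

```python
PAL_LINES = 312          # raster lines per PAL frame, as stated above
TEXT_ROWS = 25
ROWS_PER_CHARSET = 6     # worst case discussed: a new character set every 6 rows

def speed_fraction(lines_lost):
    """Fraction of CPU time left if each switch costs one whole raster line."""
    return (PAL_LINES - lines_lost) / PAL_LINES

switches = TEXT_ROWS // ROWS_PER_CHARSET      # 4 switches within the frame
fraction_4 = speed_fraction(switches)         # 308/312
fraction_5 = speed_fraction(switches + 1)     # 307/312, counting the last row too
```

Both cases leave better than 98% of the processor for the main program.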

>> Of course, the C64 could also just use a normal bitmap, too.  I suppose it
>> would depend on the problem being solved.
>
>Yes, but the 6502 @ 1MHz (with its slower access to full 64k) and the Z80

I can't tell you how much it pains me to hear an undoubtedly intelligent
man continuously chant such a patently absurd and meaningless statement.

By the way, have you about finished the Z80 version of the fast
multiply, and the Spectrum version of the string print routine?

-S


Article 69493 of comp.sys.cbm:
Path: news.acns.nwu.edu!newsfeed.acns.nwu.edu!news.ece.nwu.edu!news.cse.psu.edu!news3.cac.psu.edu!howland.erols.net!ais.net!su-news-hub1.bbnplanet.com!su-news-feed4.bbnplanet.com!news.bbnplanet.com!enews.sgi.com!decwrl!tribune.usask.ca!rover.ucs.ualberta.ca!news.ucalgary.ca!srv1.freenet.calgary.ab.ca!albrecht
From: "Alvin R. Albrecht" 
Newsgroups: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Subject: Re: Spectrum Emulator for C64
Date: Sat, 14 Jun 1997 11:05:50 -0600
Organization: Calgary Free-Net
Lines: 28
Message-ID: <5nuj3d$m28@ds2.acs.ucalgary.ca>
References: <337C5E94.388@actcom.co.il>  <5mli4f$m48@news.acns.nwu.edu> <5nj88j$6f9@yama.mcc.ac.uk> <5npqnq$5o6@news.acns.nwu.edu>
NNTP-Posting-Host: albrecht@srv1.freenet.calgary.ab.ca
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
In-Reply-To: <5npqnq$5o6@news.acns.nwu.edu>
Xref: news.acns.nwu.edu comp.sys.sinclair:39942 comp.sys.cbm:69493 comp.emulators.cbm:21675



On 12 Jun 1997, Stephen Judd wrote:


> What's the difference between JPZ and JRZ (assuming those are
> legal instructions :), in practical terms?  Which is generally
> used/more useful?

A JP (jump) loads the PC (program counter) with a 16 bit address.
A JR (jump relative) adds an 8bit 2's complement offset to PC.

The former takes 3 bytes, the latter 2 bytes.

Timing-wise, a conditional jump always takes 10 cycles.  A conditional
relative jump takes 12 cycles if taken, 7 if not.  So it's faster to
use a conditional relative jump if the branch is taken with
probability < 0.6 (assuming the branches don't rejoin later on in
which case you have to consider the code fragments in each branch).
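
The 0.6 figure follows from comparing average costs: JR cc averages
12p + 7(1-p) cycles when the branch is taken with probability p, against a
flat 10 for JP cc. A small Python sketch of that comparison (the names are
made up here):

```python
JP_COST = 10   # JP cc costs the same whether or not the jump is taken

def jr_expected(p_taken):
    """Average cycles of JR cc: 12 when taken, 7 when not."""
    return 12 * p_taken + 7 * (1 - p_taken)

def jr_is_faster(p_taken):
    """True when the relative jump is cheaper on average than JP cc."""
    return jr_expected(p_taken) < JP_COST

# 12p + 7(1-p) = 10  ->  5p = 3  ->  p = 0.6
break_even = (JP_COST - 7) / (12 - 7)
```

For a loop-closing branch that is nearly always taken, this points to JP cc;
for a rarely taken error check, JR cc wins on both time and size.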

Conditional JPs also have more conditions that can be tested.  JRs
help to make things more relocatable.


Alvin





Article 69402 of comp.sys.cbm:
Path: news.acns.nwu.edu!merle!judd
From: judd@merle.acns.nwu.edu (Stephen Judd)
Newsgroups: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Subject: Re: Spectrum Emulator for C64 (LONG!)
Date: 14 Jun 1997 01:52:54 GMT
Organization: Northwestern University, Evanston, IL
Lines: 23
Message-ID: <5nsthm$dag@news.acns.nwu.edu>
References: <337C5E94.388@actcom.co.il> <33a317c0.5107866@news.blarg.net> <5nmcjt$lk2@news.acns.nwu.edu> <33A0A197.65AD@erols.com>
Reply-To: sjudd@nwu.edu (Stephen Judd)
NNTP-Posting-Host: merle.acns.nwu.edu
Xref: news.acns.nwu.edu comp.sys.sinclair:39860 comp.sys.cbm:69402 comp.emulators.cbm:21613

In article <33A0A197.65AD@erols.com>, Greg King   wrote:
>Stephen Judd wrote:
>> An RTI (return from interrupt) removes the processor status and PC
>> from the stack, and takes six cycles to execute.
>
>The REAL meanings of some 6502 mnemonics:
>
>JSR -> Jump and Save Return-address
>RTS -> Return To Saved-address
>RTI -> Return To Interrupted-... ("..." is either "address" or "op-code,"
>       I don't know which)
>
>

Heh, interesting... where did you get these, btw?  The PRG says
RTI is Return From Interrupt, and I thought they just copied the
data sheets.

(JSR, which I always read as Jump To Subroutine, is in fact listed
as "Jump to new location saving return address", but RTS is listed
as Return From Subroutine)

-Steve


Article 69424 of comp.sys.cbm:
Path: news.acns.nwu.edu!newsfeed.acns.nwu.edu!news.ece.nwu.edu!news.cse.psu.edu!uwm.edu!chi-news.cic.net!howland.erols.net!newsfeed.internetmci.com!in1.uu.net!194.119.128.129!news.u-net.com!not-for-mail
From: jmacb@medusa.u-net.com (Jim MacBrayne)
Newsgroups: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Subject: Re: Spectrum Emulator for C64 (LONG!)
Date: Sat, 14 Jun 1997 08:32:31 GMT
Organization: U-NET Ltd
Lines: 36
Message-ID: <33a24f16.55123964@news.u-net.com>
References: <337C5E94.388@actcom.co.il> <33a317c0.5107866@news.blarg.net> <5nmcjt$lk2@news.acns.nwu.edu> <33A0A197.65AD@erols.com> <5nsthm$dag@news.acns.nwu.edu>
Reply-To: jmacb@medusa.u-net.com
NNTP-Posting-Host: medusa.u-net.com
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Newsreader: Forte Agent 1.0/32.390
Xref: news.acns.nwu.edu comp.sys.sinclair:39878 comp.sys.cbm:69424 comp.emulators.cbm:21629

On 14 Jun 1997 01:52:54 GMT, judd@merle.acns.nwu.edu (Stephen Judd)
uttered these words of wisdom:

>Heh, interesting... where did you get these, btw?  The PRG says
>RTI is Return From Interrupt, and I thought they just copied the
>data sheets.
>
>(JSR, which I always read as Jump To Subroutine, is in fact listed
>as "Jump to new location saving return address", but RTS is listed
>as Return From Subroutine)

Those were the days! I had a 32K Commodore PET away back in 1980, and
my happiest memories are of diving below the bonnet and rewriting the
ROMs. There was quite a lot of free space, and you could patch in your
own routines by burning EPROMs - I remember I added 32 extra commands
to the built-in BASIC, including a fast multiple array tagged sort.
(Goes all nostalgic).

Anyway, that's all BTW. I no longer program in assembly, but I still
have a few of my old books. In "Programming the 64" by Rae West,  JSR
is indeed listed as "Jump to new location saving return address."
Like yourself, however, to me it was always "Jump to subroutine," and
I'm sure that was what it was listed as in my original Commodore book.
Unfortunately I no longer seem to have this.

RTI is unquestionably "return from interrupt" and RTS is "return from
subroutine."

Jim

-----------------------------------------
Jim MacBrayne
jmacb@medusa.u-net.com
http://www.medusa.u-net.com/jmacb.htm
CIS 100411,461
-----------------------------------------


Article 69597 of comp.sys.cbm:
Path: news.acns.nwu.edu!newsfeed.acns.nwu.edu!news.ece.nwu.edu!news.cse.psu.edu!uwm.edu!news.he.net!news.maxwell.syr.edu!howland.erols.net!newsfeed.internetmci.com!news.telstra.net!act.news.telstra.net!news.interact.net.au!not-for-mail
From: Spit@Spam-Free.UUCP
Newsgroups: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Subject: Re: Spectrum Emulator for C64 (LONG!)
Followup-To: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Date: Tue, 17 Jun 97 05:17:30
Organization: Square-eyed keyboard jockeys inc.
Lines: 22
Message-ID: <19970617.119920.5038@Spam-Free.UUCP>
References: <337C5E94.388@actcom.co.il> <5nhvru$ote@yama.mcc.ac.uk> <11212.imc@comlab.ox.ac.uk> <33a317c0.5107866@news.blarg.net> <5nmcjt$lk2@news.acns.nwu.edu> <33A0A197.65AD@erols.com>
NNTP-Posting-Host: ts5-11.interact.net.au
X-Newsreader: TIN [AMIGA 1.3 950726BETA PL0]
Xref: news.acns.nwu.edu comp.sys.sinclair:40008 comp.sys.cbm:69597 comp.emulators.cbm:21743

Greg King (gngking@erols.com) wrote:
> Stephen Judd wrote:
> > An RTI (return from interrupt) removes the processor status and PC
> > from the stack, and takes six cycles to execute.
> The REAL meanings of some 6502 mnemonics:
> 
> JSR -> Jump and Save Return-address
> RTS -> Return To Saved-address
> RTI -> Return To Interrupted-... ("..." is either "address" or "op-code,"
>        I don't know which)
The *REALLY REAL* meanings of some 6502 mnemonics:

JSR -> 00100000
RTS -> 01100000
RTI -> 01000000
--

+-\___  ___  ______   __ __/=\=/=\=/=\=/=\=/=\=/=\=/=\=/=\=/=\=/=\=/=\-+
: / __)| _ \||_   _| /__/_/ "Bunch of savages in this town..." - Dante :
|:__  \:  _:: :: :   @# '') Spammers, E-mail me for my real address... |
`(____/|_|><|_||_|><><\__3- - -* <><><><><><><><><><><><><><><><><><><>'



Article 69574 of comp.sys.cbm:
Path: news.acns.nwu.edu!newsfeed.acns.nwu.edu!news.ece.nwu.edu!news.cse.psu.edu!rutgers!news.columbia.edu!panix!howland.erols.net!agate!nntpfeed.doc.ic.ac.uk!sunsite.doc.ic.ac.uk!lyra.csx.cam.ac.uk!news.ox.ac.uk!news
From: imc@ecs.ox.ac.uk (Ian Collier)
Newsgroups: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Subject: Re: Spectrum Emulator for C64 (LONG!)
Date: 16 Jun 1997 15:59:04 GMT
Organization: Oxford University Computing Laboratory, UK
Lines: 10
Message-ID: <11278.imc@comlab.ox.ac.uk>
References: <337C5E94.388@actcom.co.il> <11212.imc@comlab.ox.ac.uk> <33a317c0.5107866@news.blarg.net> <5nmcjt$lk2@news.acns.nwu.edu>
NNTP-Posting-Host: boothp2.ecs.ox.ac.uk
X-Local-Date: Monday, 16th June 1997 at 4:59pm BST
Xref: news.acns.nwu.edu comp.sys.sinclair:39989 comp.sys.cbm:69574 comp.emulators.cbm:21726

In article <5nmcjt$lk2@news.acns.nwu.edu>, sjudd@nwu.edu (Stephen Judd) wrote:
>What do the EXX type instructions do?  Exchange the registers with
>an alternate set

Yes.  So if you are not using the alternate set then you can do EXX to save
the registers and another EXX to get them back.  Obviously, if you try to
do nested interrupts you lose.
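
A toy Python model of why nesting breaks this trick. The class and its
fields are invented for the illustration, and it covers only BC, DE and HL,
which are the registers EXX swaps (AF has its own exchange, EX AF,AF').

```python
class Z80Regs:
    """Toy register file: main BC/DE/HL plus the alternate set."""
    def __init__(self):
        self.main = {"BC": 0, "DE": 0, "HL": 0}
        self.alt = {"BC": 0, "DE": 0, "HL": 0}

    def exx(self):
        """EXX: swap BC, DE, HL with the alternate set in one step."""
        self.main, self.alt = self.alt, self.main

regs = Z80Regs()
regs.main["HL"] = 0x1234      # main program state
regs.exx()                    # outer interrupt entry: park it in the alternate set
regs.main["HL"] = 0xBEEF      # outer handler uses the registers freely
regs.exx()                    # nested interrupt entry: parked state swaps back in
regs.main["HL"] = 0xDEAD      # nested handler overwrites it
regs.exx()                    # nested handler exit
regs.exx()                    # outer handler exit
clobbered = regs.main["HL"]   # no longer 0x1234
```

With a single level of interrupts the two EXX instructions pair up and the
main program's registers come back intact; with nesting, there is only one
alternate set to park them in.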
-- 
---- Ian Collier : imc@comlab.ox.ac.uk : WWW page (including Spectrum section):
------ http://www.comlab.ox.ac.uk/oucl/users/ian.collier/imc.html


Article 69640 of comp.sys.cbm:
Path: news.acns.nwu.edu!newsfeed.acns.nwu.edu!news.ece.nwu.edu!news.cse.psu.edu!uwm.edu!vixen.cso.uiuc.edu!ais.net!newsfeed.direct.ca!dispatch.news.demon.net!demon!peernews.ftech.net!telehouse1.frontier-networks.co.uk!Aladdin!aladdin.net!ns2.aladdin.net!RMplc!rmplc.co.uk!yama.mcc.ac.uk!keele!not-for-mail
From: u5a77@teach.cs.keele.ac.uk (Spike)
Newsgroups: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Subject: Re: Spectrum Emulator for C64
Followup-To: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Date: 9 Jun 1997 13:28:08 GMT
Distribution: world
Message-ID: <5nh0d8$k0b$5@gerry.cc.keele.ac.uk>
References: <337C5E94.388@actcom.co.il> <5li6cc$mbv$8@gerry.cc.keele.ac.uk> <9705261105.AA00fms@cosine.demon.co.uk> <01bc6a65$46f2f520$04b8de8b@w9622136> <9705280224.AA00fp5@cosine.demon.co.uk> <338e6df1.3779653@news.demon.co.uk> <9705282217.AA00fpp@cosine.demon.co.uk> <338fbfc3.1385636@news.demon.co.uk> <5mkj88$iuc$1@gerry.cc.keele.ac.uk> <339955b8.5215718@news.demon.co.uk> <4888b69647@hallas.demon.co.uk>
NNTP-Posting-Host: bilbo.teach.cs.keele.ac.uk
X-Newsreader: TIN [UNIX 1.3 950515BETA PL0]
Lines: 33
Xref: news.acns.nwu.edu comp.sys.sinclair:40037 comp.sys.cbm:69640 comp.emulators.cbm:21769

Richard G. Hallas (Richard@hallas.demon.co.uk) wrote:
: > Lots. No way could you do it full screen. Maybe in an eighth of it
: > :)
: 
: On a Microdrive cartridge somewhere I've got a very simple Basic demo
: I wrote which made use of the Rainbow Processor at full size. It
: prints "Who needs a C64?" on the screen, and scrolls colour bars both
: behind and in front of the text.

Yes, we know. Most speccy users have Rainbow Processor.
The point is, that the Colour Bars are HORIZONTAL.
What was proposed here was pixel mapped colour, not just scan-line
mapped.....

: I'll have to see if I can retrieve it at some point. It was quite
: pretty. But the point is, of course, that the Rainbow effect could be
: applied to a surprisingly large portion of the screen - at least a
: third.

It could be applied to ALL the screen, but as has been said, that's just
horizontal lines....

-- 
______________________________________________________________________________
|u5a77@teach.cs.keele.ac.uk| "I'm alive!!! I can touch! I can taste!         |
|Andrew Halliwell          |  I can SMELL!!!  KRYTEN!!! Unpack Rachel and    |
|Principal subjects in:-   |  get out the puncture repair kit!"              |
|Comp Sci & Electronics    |     Arnold Judas Rimmer- Red Dwarf              |
------------------------------------------------------------------------------
|GCv3.1 GCS/EL>$ d---(dpu) s+/- a- C++ U N++ o+ K- w-- M+/++ PS+++ PE- Y t+  |
|5++ X+/++ R+ tv+ b+ D G e>PhD h/h+ !r! !y-|I can't say F**K either now! :(  |
------------------------------------------------------------------------------



Article 69580 of comp.sys.cbm:
Path: news.acns.nwu.edu!newsfeed.acns.nwu.edu!news.ece.nwu.edu!news.cse.psu.edu!rutgers!newsin.iconnet.net!www.nntp.primenet.com!nntp.primenet.com!cs.utexas.edu!news.maxwell.syr.edu!howland.erols.net!agate!nntpfeed.doc.ic.ac.uk!sunsite.doc.ic.ac.uk!lyra.csx.cam.ac.uk!news.ox.ac.uk!news
From: imc@ecs.ox.ac.uk (Ian Collier)
Newsgroups: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Subject: Re: Shootout at the 0K Corral (was various other things)
Date: 16 Jun 1997 16:33:38 GMT
Organization: Oxford University Computing Laboratory, UK
Lines: 16
Message-ID: <11280.imc@comlab.ox.ac.uk>
References: <337C5E94.388@actcom.co.il> <5nag9i$7ru@ds2.acs.ucalgary.ca> <5ngfdo$79r@news.acns.nwu.edu> <5njrbj$k7k@news.acns.nwu.edu>
NNTP-Posting-Host: boothp2.ecs.ox.ac.uk
X-Local-Date: Monday, 16th June 1997 at 5:33pm BST
Xref: news.acns.nwu.edu comp.sys.sinclair:39994 comp.sys.cbm:69580 comp.emulators.cbm:21733

In article <5njrbj$k7k@news.acns.nwu.edu>, sjudd@nwu.edu (Stephen Judd) wrote:
>Also, how does CPIR work?  I.e. what does it do?  I assume it will not
>execute if the character is already zero?

CPIR =
 compare A with contents of location pointed to by HL
 increment HL
 decrement BC
 if the comparison gave "equal" or BC=0 then stop, else repeat from start.

At the end of this the flags are set according to the result of the
comparison operation, except that the parity/overflow flag is used to
indicate whether or not BC=0.
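
The steps above can be sketched in Python as follows. The function name cpir and the tuple it returns are illustrative assumptions, not part of any Z80 toolchain; memory is modelled as a simple bytearray.

```python
def cpir(mem, a, hl, bc):
    """Model of CPIR. Returns (hl, bc, z, pv) as they stand afterwards.

    z  is True if the last comparison found A equal to the byte at (HL).
    pv is True if BC is non-zero on exit, False if the count ran out.
    """
    while True:
        equal = mem[hl] == a        # compare A with the byte at (HL)
        hl = (hl + 1) & 0xFFFF      # increment HL
        bc = (bc - 1) & 0xFFFF      # decrement BC
        if equal or bc == 0:        # stop on a match or when BC hits zero
            return hl, bc, equal, bc != 0


# Search a 16-byte buffer for the zero byte that terminates "HELLO".
mem = bytearray(16)
mem[:5] = b"HELLO"
print(cpir(mem, 0, 0, 16))          # -> (6, 10, True, True)
```

Because the comparison comes before the test, the instruction always examines at least one byte, and HL ends up pointing one past the matching byte.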
-- 
---- Ian Collier : imc@comlab.ox.ac.uk : WWW page (including Spectrum section):
------ http://www.comlab.ox.ac.uk/oucl/users/ian.collier/imc.html


Article 69548 of comp.sys.cbm:
Path: news.acns.nwu.edu!newsfeed.acns.nwu.edu!news.ece.nwu.edu!news.cse.psu.edu!uwm.edu!news.he.net!news.pagesat.net!decwrl!tribune.usask.ca!rover.ucs.ualberta.ca!news.ucalgary.ca!srv1.freenet.calgary.ab.ca!albrecht
From: "Alvin R. Albrecht" 
Newsgroups: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Subject: Re: Spectrum Emulator for C64
Date: Sun, 15 Jun 1997 12:24:19 -0600
Organization: Calgary Free-Net
Lines: 148
Message-ID: <5o1c2n$12lc@ds2.acs.ucalgary.ca>
References: <33845f94.1768387@commodore64.com><3389cd28.1468228@commodore64.com>   <5mct38$ql1$2@gerry.cc.keele.ac.uk> <01bc71c8$048eda40$090000c0@pc-david> <01bc724c$cb265820$04b8de8b@w9622136> <01bc74ba$c7e2cd40$090000c0@pc-david> <5nhubi$13l0@ds2.acs.ucalgary.ca> 
Reply-To: "Alvin R. Albrecht" 
NNTP-Posting-Host: albrecht@srv1.freenet.calgary.ab.ca
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
In-Reply-To: 
Xref: news.acns.nwu.edu comp.sys.sinclair:39966 comp.sys.cbm:69548 comp.emulators.cbm:21701



On Wed, 11 Jun 1997, Bruce R. McFarling wrote:

> 	The 6502 always fetches two bytes per instruction, so the opcode

Pre-fetch, very nice.

> 	If it seems as if the 6502 is microcoded, it's because of the
> occasional overlapping of final access and fetch of the next op-code, 
> and because the first two bytes of an instruction are going to known
> destinations, so the 6502 can be designed around direct memory bussing.

Yep.

> > The z80 is hardwired and lets stuff get done as fast as possible.  The
> > clock is used to gate data at points in the chip where you have to wait
 
> 	The Z80 is not, AFAIR (and the last time I looked at a
> Zilog data book was the mid-80's!) directly bussed to memory. The chip is
> organised around registers talking on an internal bus, including the
> data address and data memory registers that let it talk to the memory bus.

I agree but that doesn't mean the z80 isn't microcoded.  The biggest
suggestion otherwise is the presence of undocumented instructions.
A microcoded machine won't (shouldn't?) have them.
 
> > 6502 versions: 1MHz, 2MHz, 3MHz, 4MHz
> > z80 versions: 2MHz, 4MHz, 6MHz, 8MHz

> > You can see that at the same level of technology, the z80 is always
> > clocked twice as fast.  Thus, to compare the 6502 & z80 cycle time, you

> 	The 4:1 was the rule of thumb way back when.  Obviously, you can
> find multiply intensive tasks that will take it closer to 2:1, just as you
> can find arbitrary precision arithmetic that will take it closer to 6:1.

I'm expecting a 2:1 cycle ratio for tasks (z80:6502) most of the time and
at other times as low as 1.5:1 or maybe less.

> The problem with the 'state of technology' argument is that the 6502 is a
> simpler chip than the Z80, and as a direct memory bussed chip, it was also
> limited by the speed of memory. So today we have 20MHz 65C02's available,

Perhaps, but the speed of contemporary memory does not have an effect
on maximum clock rates recorded on data sheets.  It may have been
a consideration in the early '80s when you had to make a selection for
some application, but to compare the merits of each processor I think
it's completely logical to compare the processors when using the same
state of technology.

> but I'm not sure there are 40MHz Z80's available. Are there?  While the
> biggest selling 65C02's presently are probably 4MHz, since that can use
> DRAMs instead of SRAMs. 

The fastest plain Z80 is 20MHz.  I see your point:  the z80 may not
be pursued as vigorously as before since Zilog has its new range of
improved z80s (z180/380) and the 6502 may not have been pursued as
vigorously in the past because of the memory problem so how do we
know when the two chips are using the same on chip device geometries?

I still believe it's 2:1, but I won't hold it as gospel.
  
> > little war stated that the Spectrum can do what the C64 is doing in
> > software because of its quick z80 processor.  There are certain
 
> 	Generically, a 3.4MHz Z80 and a 1MHz 6502 are not far apart.  The

Well, at a 2:1 cycle ratio, a 3.54MHz z80 will be 1.77 times faster than a
1MHz 6502.  77% faster is significant.  I'm not so sure about the 4:1 and
6:1 ratios you gave: I haven't seen the apps yet to prove this.  Arbitrary
precision arithmetic is one that I believe will be more than 2:1 but not
much more and I think you'd be hard-pressed to find many apps that fall
into this category.
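
The arithmetic behind these figures can be written out as a small Python sketch. The function name relative_speed is hypothetical, and "cycle ratio" is taken to mean the number of Z80 clock cycles needed to do the work of one 6502 cycle, which is how the ratios are used in this thread.

```python
def relative_speed(z80_mhz, m6502_mhz, cycle_ratio):
    """Z80 throughput relative to the 6502 for a given cycle ratio.

    cycle_ratio is the assumed number of Z80 cycles per equivalent
    6502 cycle (2, 4 or 6 in the figures quoted in this thread).
    """
    return (z80_mhz / cycle_ratio) / m6502_mhz


for ratio in (2, 4, 6):
    print(ratio, round(relative_speed(3.54, 1.0, ratio), 3))
```

At 2:1 this gives the 1.77 quoted above; at 4:1 or 6:1 the same clock speeds leave the 3.54MHz Z80 slower than the 1MHz 6502, so the whole comparison hinges on which ratio holds for a given task.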

In graphics applications, I expect the z80 to be even faster than 2:1
because these apps require fast access to the full 64k in a way that's
not necessarily regular.

> devil is in the details.  To be specific, the lack of a 16-bit index
> register in the 6502, and the slow handling of 16-bit arithmetic on the
> 6502's zero-page.

That's what I call the merits of each processor.  The 6502 has a few as
well:  prefetch, many pseudo-registers in zero page, fast branching
(related to prefetch), cheaper.

> 	But that ignores the primary advantage of the 6502 design. It's
> simpler, therefore cheaper.  After all, that's why there are still so many
> being produced today within ASIC chips: with less than 4,000 gates
> required for the 6502 core, there is plenty of room left over to put other
> things on the same chip mask, and the expense of multi-chip packages can

I agree, the 6502 is still around for good reasons.

> be avoided.  The C64 advantage was the same: with a cheap processor, it
> could put more money into hardware support at a given price, which helped

Agreed there, but the extra hardware wasn't so extraordinary that it
couldn't be partly or mostly emulated in software on other machines in
the C64's class.

> it sell more, which gave it the user base, which convinced programmers to
> make the effort to work around its weaknesses and turn out useful
> programs, which built the user base, which got volumes up and prices down,
> and around and around you go.  Litl Jo (my first C64) was around $1,000

Commodore was very sharp: they understood early on that the home computer
market would become a games market and incorporated hardware in the C64 to
take advantage of that.  When the market turned to games, the C64 easily
beat out other competitors w/o the hardware because it was difficult to
make games for those machines (which required time).  In the UK, the
Spectrum had a significant market share before the C64 arrived there so
it was able to hang on long enough for the software writers to develop
the skills needed to write games w/o hardware available.

> for the main unit and a disk drive -- cheapest computer with a disk drive
> I could find, and with 64K RAM to boot!  As the years went by, the price

I remember the C64 as one of the most expensive rigs around and in
particular, in the UK, the price was almost double the Spectrum's.

For example, my US 48K Spectrum (ts2068) has the following features:
1. Cartridge dock (up to 64k)
2. Two joysticks
3. Four video modes
   - 256x192 pixel, 32x24 colour (UK Spectrum mode)
   - Dual Spectrum screen for double-buffering
   - 256x192 pixel, 32x192 colour (called Rainbow processing when done
     in software)
   - 512x192 pixel monochrome
4. 24k ROM, 48K RAM
5. Hardware for unlimited memory expansion
6. AY-3-8912 chip for sound (Atari ST, 128k Spectrum, early arcade
   machines)
7. z80 @ 3.53 MHz

It retailed for $200 at a time (1983) when the C64 was selling for $300.

There were many machines w/ disk drives less expensive than a C64 w/
disk drive.  For $400 I could get a ts2068+disk interface using IBM
PC drives.


Alvin





Article 69566 of comp.sys.cbm:
Path: news.acns.nwu.edu!merle!judd
From: judd@merle.acns.nwu.edu (Stephen Judd)
Newsgroups: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Subject: Re: Spectrum Emulator for C64
Date: 17 Jun 1997 00:01:42 GMT
Organization: Northwestern University, Evanston, IL
Lines: 49
Message-ID: <5o4k56$aht@news.acns.nwu.edu>
References: <33845f94.1768387@commodore64.com> <5nhubi$13l0@ds2.acs.ucalgary.ca>  <5o1c2n$12lc@ds2.acs.ucalgary.ca>
Reply-To: sjudd@nwu.edu (Stephen Judd)
NNTP-Posting-Host: merle.acns.nwu.edu
Xref: news.acns.nwu.edu comp.sys.sinclair:39986 comp.sys.cbm:69566 comp.emulators.cbm:21723

In article <5o1c2n$12lc@ds2.acs.ucalgary.ca>,
Alvin R. Albrecht  wrote:
>On Wed, 11 Jun 1997, Bruce R. McFarling wrote:

[a.k.a. Bruce The Wise And Experienced]

>
>> 	The 4:1 was the rule of thumb way back when.  Obviously, you can
>> find multiply intensive tasks that will take it closer to 2:1, just as you
>> can find arbitrary precision arithmetic that will take it closer to 6:1.
>
>I'm expecting a 2:1 cycle ratio for tasks (z80:6502) most of the time and
>at other times as low as 1.5:1 or maybe less.
>...
>
>I still believe it's 2:1, but I won't hold it as gospel.

Heavens.

Normally I'd be happy to argue, but we have a nice way of demonstrating this
in a rough approximation to real-world applications: the coding challenges.

There's still the fast multiply and string print routine to be done,
not to mention the (*LAUGH* :) "software sprite routine".  I don't
really expect the third to ever make an appearance, but I'd certainly
like to see the other two, which are very short programs, very
useful programs, and very easy to write programs.

The claims about the merits of the Z80 over the 6510 and the Spectrum over
the C64 are so repetitious and insistent that I would think you'd jump at 
the chance to conclusively demonstrate their true merit, in a form plain
for people all over the world to see.

I will now engage in silent contemplation on the wisdom and insight
of some remaining statements.

>[arbitrary precision arithmetic might be faster than 2:1 but not]
>much more and I think you'd be hard-pressed to find many apps that fall
>into this category.

>In graphics applications, I expect the z80 to be even faster than 2:1
>because these apps require fast access to the full 64k in a way that's
>not necessarily regular.

>Agreed there, but the extra hardware wasn't so extraordinary that it
>couldn't be partly or mostly emulated in software on other machines in
>the C64's class.

-S


Article 69638 of comp.sys.cbm:
Path: news.acns.nwu.edu!newsfeed.acns.nwu.edu!news.ece.nwu.edu!news.cse.psu.edu!news3.cac.psu.edu!howland.erols.net!news.maxwell.syr.edu!eerie.fr!cnusc.fr!ciril.fr!univ-angers.fr!jussieu.fr!rain.fr!easynet-fr!easynet-buggy!usenet
From: "David Virebayre" 
Newsgroups: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Subject: Re: Spectrum Emulator for C64
Date: 17 Jun 1997 08:06:44 GMT
Organization: [posted via] Easynet SA
Lines: 18
Message-ID: <01bc7af4$077be420$090000c0@pc-david>
References: <33845f94.1768387@commodore64.com> <5nhubi$13l0@ds2.acs.ucalgary.ca>  <5o1c2n$12lc@ds2.acs.ucalgary.ca> <5o4k56$aht@news.acns.nwu.edu>
NNTP-Posting-Host: k-m-net.easynet.fr
X-Newsreader: Microsoft Internet News 4.70.1155
Xref: news.acns.nwu.edu comp.sys.sinclair:40033 comp.sys.cbm:69638 comp.emulators.cbm:21765

Stephen Judd wrote in article
<5o4k56$aht@news.acns.nwu.edu>...
> There's still the fast multiply and string print routine to be done,
> not to mention the (*LAUGH* :) "software sprite routine".  I don't

That routine will have an advantage over the c64 hardware sprites: on the
c64, you're INCREDIBLY faster than anything, *UP TO 8 SPRITES*

And what if someone challenges you to display 32 sprites ? You'll have to
do 24 of them by software, do you think the C64 will still be faster ?

Sometimes, a hardware advantage may also be a limitation !

-- 

David
viro@easynet.fr



Article 69642 of comp.sys.cbm:
Path: news.acns.nwu.edu!newsfeed.acns.nwu.edu!math.ohio-state.edu!howland.erols.net!cpk-news-hub1.bbnplanet.com!news.bbnplanet.com!news-peer.sprintlink.net!news-pull.sprintlink.net!news-in-east.sprintlink.net!news.sprintlink.net!Sprint!206.172.150.11!news1.bellglobal.com!bellglobal.com!not-for-mail
From: Robin Harbron 
Newsgroups: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Subject: Re: Spectrum Emulator for C64
Date: Tue, 17 Jun 1997 10:00:22 -0400
Organization: Arkanix Labs
Lines: 25
Message-ID: <33A69876.4E02@tbaytel.net>
References: <33845f94.1768387@commodore64.com> <5nhubi$13l0@ds2.acs.ucalgary.ca>  <5o1c2n$12lc@ds2.acs.ucalgary.ca> <5o4k56$aht@news.acns.nwu.edu> <01bc7af4$077be420$090000c0@pc-david>
Reply-To: macbeth@tbaytel.net
NNTP-Posting-Host: 206.47.150.198
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Mailer: Mozilla 3.01Gold (Win95; I)
Xref: news.acns.nwu.edu comp.sys.sinclair:40043 comp.sys.cbm:69642 comp.emulators.cbm:21772

David Virebayre wrote:

> That routine will have an advantage over the c64 hardware sprites: on the
> c64, you're INCREDIBLY faster than anything, *UP TO 8 SPRITES*
> 
> And what if someone challenges you to display 32 sprites ? You'll have to
> do 24 of them by software, do you think the C64 will still be faster ?

32 hardware sprites are no problem _IF_ no more than 8 have to be
on the same horizontal line... this isn't a problem often.  With
a minimal overhead (we're talking a few percent) the hardware
sprites can be reused as the screen is redrawn from top to bottom.
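
A rough Python sketch of the bookkeeping behind that reuse. The names (assign_slots, SPRITE_HEIGHT, HW_SPRITES) are hypothetical, and the model is simplified: it assumes a slot can be reprogrammed on the raster line immediately after its previous sprite ends, whereas a real multiplexer needs some raster time for each reuse.

```python
SPRITE_HEIGHT = 21   # C64 hardware sprites are 21 raster lines tall
HW_SPRITES = 8       # the VIC-II provides 8 hardware sprites


def assign_slots(ys):
    """Map each sprite (given by its top raster line) to a hardware slot.

    Sprites are handled top to bottom; a slot is reused once the sprite
    that last occupied it has been fully displayed. Raises ValueError if
    more than HW_SPRITES sprites would share a raster line.
    """
    free_at = [0] * HW_SPRITES          # first line at which each slot is free
    slots = {}
    for idx in sorted(range(len(ys)), key=lambda i: ys[i]):
        y = ys[idx]
        for s in range(HW_SPRITES):
            if free_at[s] <= y:
                slots[idx] = s
                free_at[s] = y + SPRITE_HEIGHT
                break
        else:
            raise ValueError("more than 8 sprites on one raster line")
    return slots


# 32 sprites arranged as four rows of eight, 25 raster lines apart.
ys = [row * 25 for row in range(4) for _ in range(8)]
print(len(assign_slots(ys)))
```

Sorting by vertical position first is what keeps the per-frame overhead small: each hardware slot is simply handed its next sprite as the raster moves down the screen.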

> Sometimes, a hardware advantage may also be a limitation !

No, as has been stressed MANY times, the hardware advantage may
HAVE limitations, but that in itself ISN'T a limitation.  The
hardware is merely one more tool in the toolbox that can be used
if it fits the job.  And take a real good look at C64 games...
go play Mayhem in Monsterland if you want to see what a 64 can
do, and I'll gladly look at anything on a Speccy that you claim
can compare.

Robin Harbron
macbeth@tbaytel.net


Article 69702 of comp.sys.cbm:
Path: news.acns.nwu.edu!newsfeed.acns.nwu.edu!news.ece.nwu.edu!news.cse.psu.edu!rutgers!usenet.logical.net!news.mathworks.com!europa.clark.net!dispatch.news.demon.net!demon!mail2news.demon.co.uk!not-for-mail
From: Jason 
Newsgroups: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Subject: Re: Spectrum Emulator for C64
Followup-To: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Date: Tue, 17 Jun 97 21:40:24 GMT
Organization: Cosine Systems
Message-ID: <9706172140.AA00ge1@cosine.demon.co.uk>
References: <33845f94.1768387@commodore64.com> <5nhubi$13l0@ds2.acs.ucalgary.ca>  <5o1c2n$12lc@ds2.acs.ucalgary.ca> <5o4k56$aht@news.acns.nwu.edu> <01bc7af4$077be420$090000c0@pc-david>
X-Mail2News-User: tmr@cosine.demon.co.uk
X-Mail2News-Path: relay-1.mail.demon.net!gate.demon.co.uk!cosine.demon.co.uk
X-Newsreader: TIN [AMIGA 1.3 950726BETA PL0]
Lines: 56
Xref: news.acns.nwu.edu comp.sys.sinclair:40070 comp.sys.cbm:69702 comp.emulators.cbm:21805

Stephen Judd:
> There's still the fast multiply and string print routine to be done,
> not to mention the (*LAUGH* :) "software sprite routine".

David Virebayre:
> That routine will have an advantage over the c64 hardware sprites: on the
> c64, you're INCREDIBLY faster than anything, *UP TO 8 SPRITES*

Well, the limit is eight on a *rasterline*.  After that...

> And what if someone challenges you to display 32 sprites ? You'll have to
> do 24 of them by software, do you think the C64 will still be faster ?

Yes.  I'd like to introduce you to my thirty two sprite presorted multiplex.

Now it's not the most efficient or fastest plex on the face of the planet,
but when this "discussion" began I sat down and coded it to prove a point.
On my screen now I have the Wizball loading picture, from the Speccy
version.  Over that I have thirty two sprites, all 24*21 pixels, spinning
around using a couple of cosine (I prefer them to sine) curves.  There are
about twelve different sprite colours in use, no two consecutive sprites
have the same colour, there's no clash between the sprites and the picture
or indeed between sprites, the whole screen is at 320*200 resolution (much
the same as a Speccy, but slightly higher) and it all runs at full framerate,
50Hz; although I haven't tested it yet, it should be fine at 60Hz too.

I've said this about three or four times now.  If *you* think that software
sprites are better than hardware, then *beat* this routine.  Put thirty
two on the screen, don't have colour clash (and the sprites are 24*21
pixels in size, but being a generous person I'll let you get away with
16*16) and do it at *full* framerate, 50Hz.  So far nobody has even
*commented* on it, let alone taken my offer...

Basically, come and have a go if you think you're hard enough. =-)

Oh, if anyone *should* manage that, and if they do I won't believe them
until I *see* the thing running on an emu (my code will work on C64S, as
well) then I'll happily up it to 64 sprites.

> Sometimes, a hardware advantage may also be a limitation !

Limitations are just there to allow us to figure out ways around them.
The record, BTW, stands at 120 sprites over a bitmap and 144 in all borders.
Both done by Crossbow/Crest.  My 32 sprite code is crap by comparison, and
I don't claim it to be otherwise, but it proves this point, which is what
it was coded to do.
--
Jason  =-)
     _______________________________________________________________________
TMR /     /     /     /  /     /     /                                     /\
   /  /__/  /  /  /__/  /  /  /  /__/    Email: tmr@cosine.demon.co.uk    / /
  /  /\_/  /  /__   /  /  /  /  __//          Cosine Homepage:           / /
 /  /__/  /  /  /  /  /  /  /  /  /    http://www.cosine.demon.co.uk    / /
/_____/_____/_____/__/__/__/_____/_____________________________________/ /
\_____\_____\_____\__\__\__\_____\_____________________________________\/



Article 69484 of comp.sys.cbm:
Path: news.acns.nwu.edu!merle!judd
From: judd@merle.acns.nwu.edu (Stephen Judd)
Newsgroups: comp.sys.cbm,comp.sys.sinclair,comp.emulators.cbm
Subject: That Z80 line routine
Date: 16 Jun 1997 04:06:11 GMT
Organization: Northwestern University, Evanston, IL
Lines: 131
Message-ID: <5o2e3j$f5p@news.acns.nwu.edu>
References: <5o00p9$gi9@news.acns.nwu.edu>
Reply-To: sjudd@nwu.edu (Stephen Judd)
NNTP-Posting-Host: merle.acns.nwu.edu
Xref: news.acns.nwu.edu comp.sys.cbm:69484 comp.sys.sinclair:39929 comp.emulators.cbm:21670

Hola,

	Well, I finally got around to checking out Ian's line
routine, and I have a few comments/questions:

> From: imc@ecs.ox.ac.uk (Ian Collier)
> Newsgroups: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
> Subject: Re: Shootout at the 0K Corral (was various other things)
> Date: 10 Jun 1997 14:34:16 GMT
> 
> 
> co-ordinates isn't a particularly fast operation on the spectrum.  It's
> something like this, but not exactly since I've taken out the range
> check and optimised memory use.  Not that this is the best possible,
> by any means.

Well, it looks pretty good to me.  I'll give a freebie and suggest
a few changes to make it faster, too :).  But I like the way it
uses HL to contain the x and y coordinates, and the use of the
(handy) alternate registers.  It's also nice to be able to take
care of the four cases of sign in one swell foop (I use separate
routines for, e.g. lines with slope > 1 and slope < 1).

How much would it change if the x coordinate could be 16 bits, i.e. could
address the full width of the bitmap?

> 
> Draw:  ; (L,H) is one end of the line.  If (M,N) is the other end then
>        ; C = abs(N-H), E = sgn(N-H), B = abs(M-L) and D = sgn(M-L).
> 
> ; The idea is that the line will consist of vertical (or horizontal)
> ; moves and diagonal moves.  Adding HL+DE will make a diagonal move and
> ; adding HL+BC will make a vertical move.  Strictly, we want to add
> ; H+D and L+E, etc, but assuming endpoints lie on the screen a 16-bit
> ; add will be correct providing we decrement D when E is negative.
> 
>       PUSH BC      ;11       these values will be moved over to the
>                    ;         alternate register set.
>       LD   B,D     ;4
>       BIT  7,E     ;8
>       JR   Z,L1    ;12/7
>       DEC  D       ;4        decrement D if E is negative.
> L1:   EXX          ;4        Switch to the alternate register set.
>       POP  BC      ;10
>       LD   A,C     ;4        Compare the absolute differences.  We will
>       CP   B       ;4        move the greater one to B as a counter and
>       JR   NC,Horz ;12/7     the lesser one to L.  Go if C is greater.

This looks backwards to me.  At least, on a 6510, if C>B then carry will
be set, not clear.  (Or does NC not stand for "No Carry Set"?  The
fact that you have both a flag and a register named C confuses me to
no end :).

So if dy>dx, the below sets L to dy, the greater of the two, and
as you say you really want the lesser of the two.

>       LD   L,C     ;4
>       EXX          ;4
>       XOR  A       ;4        The vertical step is made by setting C=0.
>       LD   C,A     ;4        B already contains the appropriate value
>       JP   L2      ;10       from above.
> Horz: LD   L,B     ;4
>       LD   B,C     ;4
>       EXX          ;4
>       LD   C,E     ;4        The horizontal step is made by setting C=E
>       XOR  A       ;4        and B=0.
>       LD   B,A     ;4
>       BIT  7,C     ;8
>       JR   Z,L2    ;12/7
>       DEC  B       ;4        Decrement B if C is negative.
> L2:   EXX          ;4
>       LD   H,B     ;4
>       LD   A,B     ;4        A starts off with 1/2H and will have L added
>       RRA          ;4        on each step.
> Loop: ADD  L       ;4

You can remove the overflow check and CP H by first negating A, or else 
using SUB instead of ADD, and then check solely for underflow/overflow.

>       JR   C,Diag  ;12/7     If the result is greater than H then a diagonal
>       CP   H       ;4        move is made and H is subtracted.  Otherwise
>       JR   C,Vert  ;12/7     a vertical move is made.
> Diag: SUB  H       ;4
>       LD   C,A     ;4
>       EXX          ;4
>       ADD  HL,DE   ;11
>       JP   Move    ;10
> Vert: LD   C,A     ;4
>       EXX          ;4
>       ADD  HL,BC   ;11
> Move: CALL PLOTHL
>       EXX          ;4
>       LD   A,C     ;4
>       DJNZ Loop    ;13       B holds the count of pixels to do.
>       RET
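
The packed-coordinate trick from the routine's opening comments (one 16-bit add moving both coordinates, with the high byte of the step decremented when the low-byte step is negative) can be checked with a short Python sketch. The function name step and its signature are hypothetical; it assumes, as the routine does, that the point stays on screen so the low byte never wraps.

```python
def step(hl, d, e):
    """Move a packed point with a single 16-bit add.

    hl packs the two coordinates as (H << 8) | L. d is the signed step
    for H and e the signed step for L, each -1, 0 or +1.
    """
    if e < 0:
        # Adding 0xFF to a non-zero L carries into H, so take one off
        # the high byte of the step in advance to cancel that carry.
        d -= 1
    de = ((d & 0xFF) << 8) | (e & 0xFF)
    return (hl + de) & 0xFFFF


hl = 0x2010                        # H = 0x20, L = 0x10
print(hex(step(hl, 1, -1)))        # H + 1, L - 1
print(hex(step(hl, -1, 1)))        # H - 1, L + 1
```

Without the adjustment, a step of -1 in L would leave H one too high whenever L is non-zero, which is why the routine tests bit 7 of the low-byte step before each setup.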

So the only thing left is the plot routine, which I am not seeing right
now for two reasons.

The first is that I don't understand the layout of the spectrum bitmap
in memory: a) what do consecutive bytes correspond to on the screen,
and b) what is the screen resolution/how many bytes per row/column?

The second is that I'm not seeing how to get from HL, which contains
the x,y coordinates of the point, to the actual pixel location which
I assume is embedded in a byte somewhere.  (At least on a 64,
coordinates 0,0 1,0 2,0 ... 7,0 are all contained in memory location
zero, etc.).

(Presumably the plot routine will either OR or EOR (XOR) into the
bitmap, since this is a routine we might use in a game).

So, once that and the above optimizations are taken care of we can 
count cycles for real.

> The main loop takes 95 cycles in the worst case (when every move is
> diagonal), plus whatever it takes to plot.  If the screen is so arranged
> that pixel (L,H) is located at HL (not impossible - it just requires an
> upside-down 8-bit display 256 pixels wide) then this will take 10 cycles.
> 
> This routine is untested so it probably contains bugs.  Oh, and it doesn't
> plot the first pixel in the line, because on the spectrum you draw a line
> with "PLOT x,y: DRAW a,b".
> -- 
> ---- Ian Collier : imc@comlab.ox.ac.uk : WWW page (including Spectrum section):

Anyways, thanks for the routine!

Little by little I'm beginning to learn how to read this nutty Z80 notation :).

	evetS-


Article 69773 of comp.sys.cbm:
Path: news.acns.nwu.edu!newsfeed.acns.nwu.edu!news.ece.nwu.edu!news.cse.psu.edu!rutgers!news.sgi.com!howland.erols.net!news.mathworks.com!rill.news.pipex.net!pipex!server1.netnews.ja.net!warwick!bham!bhamcs!news.ox.ac.uk!news
From: imc@ecs.ox.ac.uk (Ian Collier)
Newsgroups: comp.sys.cbm,comp.sys.sinclair,comp.emulators.cbm
Subject: Re: That Z80 line routine
Date: 18 Jun 1997 16:00:53 GMT
Organization: Oxford University Computing Laboratory, UK
Lines: 49
Message-ID: <11304.imc@comlab.ox.ac.uk>
References: <5o00p9$gi9@news.acns.nwu.edu> <5o2e3j$f5p@news.acns.nwu.edu>
NNTP-Posting-Host: boothp2.ecs.ox.ac.uk
X-Local-Date: Wednesday, 18th June 1997 at 5:00pm BST
Xref: news.acns.nwu.edu comp.sys.cbm:69773 comp.sys.sinclair:40108 comp.emulators.cbm:21847

In article <5o2e3j$f5p@news.acns.nwu.edu>, sjudd@nwu.edu (Stephen Judd) wrote:
>>       LD   A,C     ;4        Compare the absolute differences.  We will
>>       CP   B       ;4        move the greater one to B as a counter and
>>       JR   NC,Horz ;12/7     the lesser one to L.  Go if C is greater.

>This looks backwards to me.  At least, on a 6510, if C>B then carry will
>be set, not clear.  (Or does NC not stand for "No Carry Set"?  The
>fact that you have both a flag and a register named C confuses me to
>no end :).

I have A holding the C register and execute CP B, which calculates C-B and
therefore sets the carry flag if B is greater.  On the Z80, the carry flag
is also the borrow flag, whereas on the 6502 the carry flag is the opposite
of the borrow flag.  And NC stands for "not carry".  So this jumps when C is
greater.
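
The two conventions can be put side by side in a small Python sketch. The function names are hypothetical and model only the carry flag after an unsigned 8-bit compare.

```python
def z80_cp_carry(a, operand):
    """Carry after Z80 'CP operand': set when a borrow occurs (a < operand)."""
    return a < operand


def m6502_cmp_carry(a, operand):
    """Carry after 6502 'CMP operand': set when no borrow occurs (a >= operand)."""
    return a >= operand


# With A holding 9 and the operand 5, no borrow occurs.
print(z80_cp_carry(9, 5))      # False, so JR NC is taken
print(m6502_cmp_carry(9, 5))   # True, so BCS would be taken
```

So "jump if no carry" after a Z80 compare corresponds to "branch if carry set" after a 6502 compare.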

If after that you still think I have made a mistake then it is entirely
probable that I have. :-)

>> Loop: ADD  L       ;4
>
>You can remove the overflow check and CP H by first negating A, or else 
>using SUB instead of ADD, and then check solely for underflow/overflow.
>
>>       JR   C,Diag  ;12/7     If the result is greater than H then a diagonal
>>       CP   H       ;4        move is made and H is subtracted.  Otherwise
>>       JR   C,Vert  ;12/7     a vertical move is made.

Not sure what you meant here.

>So the only thing left is the plot routine, which I am not seeing right
>now for two reasons.

>The first is that I don't understand the layout of the spectrum bitmap
>in memory: a) what do consecutive bytes correspond to on the screen,
>and b) what is the screen resolution/how many bytes per row/column?

I was completely ignoring this, as you will have seen from my other posting.
I mentioned a hypothetical machine on which the address of a pixel happens
to equal its co-ordinates because it is an 8-bit screen 256 pixels wide.

>So, once that and the above optimizations are taken care of we can 
>count cycles for real.

Well I think if I really tried to optimise a line draw for the Spectrum
screen then people would start wondering why I wasn't doing any real
work...

imc


Article 69696 of comp.sys.cbm:
Path: news.acns.nwu.edu!newsfeed.acns.nwu.edu!news.ece.nwu.edu!news.cse.psu.edu!uwm.edu!newsfeeds.sol.net!news.maxwell.syr.edu!news.he.net!news.pagesat.net!decwrl!tribune.usask.ca!rover.ucs.ualberta.ca!news.ucalgary.ca!srv1.freenet.calgary.ab.ca!albrecht
From: "Alvin R. Albrecht" 
Newsgroups: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Subject: Re: Spectrum Emulator for C64 [additonal translation]
Date: Mon, 16 Jun 1997 20:44:33 -0600
Organization: Calgary Free-Net
Lines: 15
Message-ID: <5o4tos$kho@ds2.acs.ucalgary.ca>
References: <337C5E94.388@actcom.co.il> <5mmvp4$b7r@news.acns.nwu.edu> <5mvqq7$7qk$6@gerry.cc.keele.ac.uk> <5nn1cj$17j4@ds2.acs.ucalgary.ca> <5npjud$2rv@news.acns.nwu.edu> <5nunbg$t3e@ds2.acs.ucalgary.ca> <33a64fa5.59106111@news.pacificnet.net>
NNTP-Posting-Host: albrecht@srv1.freenet.calgary.ab.ca
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
In-Reply-To: <33a64fa5.59106111@news.pacificnet.net>
Xref: news.acns.nwu.edu comp.sys.sinclair:40068 comp.sys.cbm:69696 comp.emulators.cbm:21800



On Mon, 16 Jun 1997, Rick The Notes Guy Dickinson wrote:

> Not really - on the 64, you just need to set the VIC-II chip (the
> video processor) to generate raster interrupts, and then do your

>  - Rick "Still have my copy of 'Mapping the 64' "Dickinson

I seem to have misplaced my copy which may explain away my error :).


Alvin




Article 69707 of comp.sys.cbm:
Path: news.acns.nwu.edu!merle!judd
From: judd@merle.acns.nwu.edu (Stephen Judd)
Newsgroups: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Subject: Re: Spectrum Emulator for C64 [additonal translation]
Date: 18 Jun 1997 01:06:03 GMT
Organization: Northwestern University, Evanston, IL
Lines: 28
Message-ID: <5o7c9r$d3g@news.acns.nwu.edu>
References: <337C5E94.388@actcom.co.il> <5nunbg$t3e@ds2.acs.ucalgary.ca> <33a64fa5.59106111@news.pacificnet.net> <5o4tos$kho@ds2.acs.ucalgary.ca>
Reply-To: sjudd@nwu.edu (Stephen Judd)
NNTP-Posting-Host: merle.acns.nwu.edu
Xref: news.acns.nwu.edu comp.sys.sinclair:40071 comp.sys.cbm:69707 comp.emulators.cbm:21807

In article <5o4tos$kho@ds2.acs.ucalgary.ca>,
Alvin R. Albrecht  wrote:
>
>
>On Mon, 16 Jun 1997, Rick The Notes Guy Dickinson wrote:
>
>> Not really - on the 64, you just need to set the VIC-II chip (the
>> video processor) to generate raster interrupts, and then do your
>
>>  - Rick "Still have my copy of 'Mapping the 64' "Dickinson
>
>I seem to have misplaced my copy which may explain away my error :).

You could buy a new one from CMD :)

...or...

You could send me your address and I could buy you one for Christmas 8^)

Many times now I've felt that I should own four copies instead of my
tattered old one: one to keep at work, one to keep in a backpack,
one to keep at home, and one to keep sealed in my safe-deposit box :).

You know, for those Commodore 64 emergencies that always come up
during the course of a day, sitting at home in the evenings, out
on dates, etc.

-S


Article 69579 of comp.sys.cbm:
Path: news.acns.nwu.edu!newsfeed.acns.nwu.edu!news.ece.nwu.edu!news.cse.psu.edu!uwm.edu!vixen.cso.uiuc.edu!howland.erols.net!agate!nntpfeed.doc.ic.ac.uk!sunsite.doc.ic.ac.uk!lyra.csx.cam.ac.uk!news.ox.ac.uk!news
From: imc@ecs.ox.ac.uk (Ian Collier)
Newsgroups: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Subject: Re: Shootout at the 0K Corral (was various other things)
Date: 16 Jun 1997 16:27:48 GMT
Organization: Oxford University Computing Laboratory, UK
Lines: 71
Message-ID: <11279.imc@comlab.ox.ac.uk>
References: <337C5E94.388@actcom.co.il> <5ngfdo$79r@news.acns.nwu.edu> <11213.imc@comlab.ox.ac.uk> <5nn5jl$34u@news.acns.nwu.edu>
NNTP-Posting-Host: boothp2.ecs.ox.ac.uk
X-Local-Date: Monday, 16th June 1997 at 5:27pm BST
Xref: news.acns.nwu.edu comp.sys.sinclair:39993 comp.sys.cbm:69579 comp.emulators.cbm:21732

In article <5nn5jl$34u@news.acns.nwu.edu>, sjudd@nwu.edu (Stephen Judd) wrote:
[drawing lines]
>>The one in the Spectrum ROM is pretty good, but it doesn't deal with

>Thanks, I'll go through it later on. :)

>Does it plot stuff straight into the bitmap or does it OR it into
>the map?  What would be the plot time for ORing it into the map?

As I mentioned, it calls a separate subroutine to plot co-ordinates.
This turns out to be quite slow.  The routine looks something like...

LD  A,B   ; y co-ordinate
AND A     ; insert on the left the digits 010
RRA
SCF
RRA
AND A
RRA
XOR B     ; overlay the last three bits of y co-ordinate
AND F8
XOR B
LD  H,A   ; this is the high byte of the address
LD  A,C   ; x co-ordinate
RLCA      ; rotate left
RLCA
RLCA
XOR B     ; overlay bits 3-5 of the y co-ordinate
AND C7
XOR B
RLCA      ; now the top 3 bits of A are bits 3-5 of the y co-ordinate;
RLCA      ; the rest is the x co-ordinate divided by 8
LD  L,A   ; this is the low byte of the address
LD  A,C
AND 7     ; this is the pixel number within the byte
INC A
LD  B,A   ; prepare to shift a '1' to the appropriate position
LD  A,1
RRCA
DJNZ *-3
OR  (HL)  ; plot the point.
LD  (HL),A

This could be made a tiny bit faster by using a table instead of doing the
loop at the end.  Oh, and I haven't mentioned setting the colour of the
plotted point.  This slows it down still further, obviously.

>My routine is a little more complicated in that it plots points in
>chunks at a time,

Yes, this would obviously make the routine a lot faster.  I didn't bother
to do this because it would be too much work. :-)

PS.  To understand the above routine, you need to know that the spectrum
screen is organised in a slightly strange fashion.  It is in three thirds,
each 8 characters high.  All the top pixel lines of the characters in the
first third come first, then the second pixel lines, and so on until the
third is complete.  After that come the other two thirds in similar fashion.
This makes it quicker to print characters since pixel lines of one character
are 256 bytes apart.  A screen address therefore looks like

0 1 0 t1 t0 p2 p1 p0   l2 l1 l0 c4 c3 c2 c1 c0

where t1 t0 is the third number, p2 p1 p0 is the pixel line number within
the character, l2 l1 l0 is the line number of the character within the
third, and c4 c3 c2 c1 c0 is the column number of the character.  This
corresponds to character co-ordinates (t1 t0 l2 l1 l0, c4 c3 c2 c1 c0)
or pixel co-ordinates (t1 t0 l2 l1 l0 p2 p1 p0, c4 c3 c2 c1 c0 0 0 0).
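
For anyone who finds the bit shuffling easier to follow in a high-level
language, here is a rough Python sketch of the same address calculation.
The names (SCREEN_BASE, PIXEL_MASK, screen_address) are made up for this
illustration, and y is assumed to count pixel rows from the top of the
screen.  The mask table stands in for the RRCA/DJNZ loop at the end of
the routine.

```python
SCREEN_BASE = 0x4000  # the fixed "0 1 0" prefix in the high byte

# Lookup table replacing the RRCA/DJNZ loop: pixel 0 is the leftmost (bit 7).
PIXEL_MASK = [0x80 >> n for n in range(8)]


def screen_address(x, y):
    """Return (address, mask) for pixel column x (0-255) and row y (0-191)."""
    third = (y >> 6) & 0x03      # t1 t0
    char_row = (y >> 3) & 0x07   # l2 l1 l0
    pixel_row = y & 0x07         # p2 p1 p0
    column = (x >> 3) & 0x1F     # c4 c3 c2 c1 c0
    high = (SCREEN_BASE >> 8) | (third << 3) | pixel_row
    low = (char_row << 5) | column
    return (high << 8) | low, PIXEL_MASK[x & 7]
```

Plotting then amounts to ORing the returned mask into the byte at the
returned address.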

>>---- Ian Collier : imc@comlab.ox.ac.uk : WWW page (including Spectrum section):
>>------ http://www.comlab.ox.ac.uk/oucl/users/ian.collier/imc.html


Article 69715 of comp.sys.cbm:
Path: news.acns.nwu.edu!merle!judd
From: judd@merle.acns.nwu.edu (Stephen Judd)
Newsgroups: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Subject: Re: Shootout at the 0K Corral (was various other things)
Date: 18 Jun 1997 03:15:15 GMT
Organization: Northwestern University, Evanston, IL
Lines: 109
Message-ID: <5o7js3$fuq@news.acns.nwu.edu>
References: <337C5E94.388@actcom.co.il> <11213.imc@comlab.ox.ac.uk> <5nn5jl$34u@news.acns.nwu.edu> <11279.imc@comlab.ox.ac.uk>
Reply-To: sjudd@nwu.edu (Stephen Judd)
NNTP-Posting-Host: merle.acns.nwu.edu
Xref: news.acns.nwu.edu comp.sys.sinclair:40074 comp.sys.cbm:69715 comp.emulators.cbm:21810

In article <11279.imc@comlab.ox.ac.uk>, Ian Collier  wrote:
>In article <5nn5jl$34u@news.acns.nwu.edu>, sjudd@nwu.edu (Stephen Judd) wrote:
>[drawing lines]
>
>As I mentioned, it calls a separate subroutine to plot co-ordinates.
>This turns out to be quite slow.  The routine looks something like...

[eeeeew :) ]

>>My routine is a little more complicated in that it plots points in
>>chunks at a time,
>
>Yes, this would obviously make the routine a lot faster.  I didn't bother
>to do this because it would be too much work. :-)

Heh heh, I understand very well :)

By the way, I didn't understand before that the Spectrum's horizontal
resolution was only 256 pixels, so a 16-bit wide x-coordinate
is pointless (in a different posting I asked how the line routine
would change if x could be 16-bits).

>PS.  To understand the above routine, you need to know that the spectrum
>screen is organised in a slightly strange fashion.  It is in three thirds,
>each 8 characters high.  All the top pixel lines of the characters in the

Strange indeed! :)  But wait until I explain the C64 bitmap layout :)

Now that I've finally figured it out though, it seems to me that
given an X,Y coordinate the memory address is given by

address = 16384 + 256*((Y AND 192)/8 + (Y AND 7)) + (Y AND 56)*4 + X/8

Right?  Bleah.  So clearly it's better to just update a bitmap pointer,
and forget about the actual X,Y screen coordinates.  Ugh, I've just
spent an hour or two trying to figure out a decent way to do it,
and I'm just not seeing it :(.  That is, as long as you remain
in the third of the screen its not TOO bad, but moving between
thirds seems to be a cast-iron bitch.
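
One possible way to do the pointer update, sketched in Python rather
than Z80 so it can be checked against the coordinate formula (with the
16384 screen base added in).  The function names are invented for this
sketch, and I am only guessing that real Spectrum code does something
along these lines: bump the high byte, and fix things up only when the
pixel line wraps from 7 back to 0.

```python
def address_of(x, y):
    """Coordinate formula, including the 16384 (0x4000) screen base."""
    return 16384 + 256 * ((y & 192) // 8 + (y & 7)) + (y & 56) * 4 + x // 8


def next_row(addr):
    """Move a screen pointer down by one pixel row."""
    high, low = addr >> 8, addr & 0xFF
    high += 1                      # next pixel line within the character
    if high & 7 == 0:              # ran past pixel line 7
        low = (low + 32) & 0xFF    # next character row
        if low >= 32:              # low byte did not wrap: same third
            high -= 8              # undo the carry into the third bits
        # if it did wrap, high already points at the next third
    return (high << 8) | low
```

Seven times out of eight only the first increment runs, so the crossing
between thirds costs nothing extra in the common case.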

So, I guess if I were writing a routine for high speed, I'd have
three separate versions, one for each third of the screen.  I'd
rotate a bit to keep track of the x-coordinate, and OR that bit
straight into the memory pointed to by HL.  I'd either update
the Y pointer (but that doesn't look like so much fun, testing
for passing through each 8th row and such), so maybe a separate
table for each third could be used... oh man, bleah and bleah
again.

Pardon my dubious nature, but how in the world does one do high
speed lines on a Spectrum?  Surely there is something I'm missing
here.

Well, fair is fair, so I ought to tell you about the C64's kooky
bitmap memory layout, and why character bitmaps are sometimes used
instead.

The normal text screen is 40x25 chars, and memory increases as you
increase the column (base+1 is immediately to the right of base, etc.).
Well, the bitmap is the same way, except that there are now eight
bytes per cell.  So memory looks like:

0	8	16	...
1	9	17
2	10	18
...
7	15	23
320	328	336
321	329	337
322	330	338
...
327

So, if the bitmap is located at BASE, coordinates (0,0) through (7,0)
are located at BASE, coordinates (0,1)-(7,1) are at BASE+1, the
coord (9,2) is located at BASE+10, the coordinate (2,9) is at location
BASE+321, and so on.  So to increment in the X-direction through a
column means adding 8 to the pointer.  Incrementing Y means incrementing
the pointer by one, unless boundaries are passed, etc. etc.

On the 64 what I do is to keep a list of 25 pointers into the start
of each bitmap row, and index off of that pointer using the Y
register say (where Y=0..7).  So, when Y passes through 8, I just
add 320 to the pointer, etc.
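
In Python terms (a sketch only; ROW_START and bitmap_offset are names
I'm making up here, and the offset is relative to BASE), the row-pointer
idea looks like this:

```python
# One entry per character row: 25 rows of 40 cells, 8 bytes per cell.
ROW_START = [row * 320 for row in range(25)]


def bitmap_offset(x, y):
    """Offset from BASE of the byte holding pixel (x, y), 320x200 bitmap."""
    return ROW_START[y // 8] + (x // 8) * 8 + (y % 8)
```

The (y % 8) term is what would sit in the Y register, and (x // 8) * 8
is the "add 8 per column" step.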

The nice thing about a character bitmap is that you can order it in
any memory convenient way.  The usual way is to make consecutive
memory locations move downwards, i.e. a 16x16 (128 pixels x 128 pixels)
character map might look like

0	128	256	...
1	129	257
2	130	...
...	...
125	253
126	254
127	255

The practical consequence of this is that you can just keep the Y-coordinate
in the Y register, always, and index directly into the column pointers.
Since it doesn't have to carry the extra baggage that a full-bitmap does
it tends to be a little faster.
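
The same thing as a small Python sketch, assuming the 16x16 character
grid from the table above, where each character column spans 128
consecutive bytes.  COLUMN_START and charmap_offset are invented names.

```python
# One pointer per character column; each column is 16 chars * 8 rows tall.
COLUMN_START = [col * 128 for col in range(16)]


def charmap_offset(x, y):
    """Offset of the byte holding pixel (x, y) in a column-ordered char map."""
    return COLUMN_START[x // 8] + y
```

There is no row arithmetic at all here: y indexes straight into the
column, which is where the speed advantage comes from.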

	evetS-

>>>---- Ian Collier : imc@comlab.ox.ac.uk : WWW page (including Spectrum section):
>>>------ http://www.comlab.ox.ac.uk/oucl/users/ian.collier/imc.html




Article 69607 of comp.sys.cbm:
Path: news.acns.nwu.edu!newsfeed.acns.nwu.edu!news.ece.nwu.edu!news.cse.psu.edu!uwm.edu!chi-news.cic.net!newsfeed.internetmci.com!news-peer.sprintlink.net!news.sprintlink.net!Sprint!news.maxwell.syr.edu!news.he.net!news.pagesat.net!decwrl!tribune.usask.ca!rover.ucs.ualberta.ca!news.ucalgary.ca!srv1.freenet.calgary.ab.ca!albrecht
From: "Alvin R. Albrecht" 
Newsgroups: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Subject: Re: Shootout at the 0K Corral (was various other things)
Date: Sun, 15 Jun 1997 23:13:55 -0600
Organization: Calgary Free-Net
Lines: 329
Message-ID: <5o2i4q$req@ds2.acs.ucalgary.ca>
References: <337C5E94.388@actcom.co.il> <5ngfdo$79r@news.acns.nwu.edu> <5njrbj$k7k@news.acns.nwu.edu> <5nn1le$104g@ds2.acs.ucalgary.ca> <5npebm$ir@news.acns.nwu.edu>
NNTP-Posting-Host: albrecht@srv1.freenet.calgary.ab.ca
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
In-Reply-To: <5npebm$ir@news.acns.nwu.edu>
Xref: news.acns.nwu.edu comp.sys.sinclair:40021 comp.sys.cbm:69607 comp.emulators.cbm:21751


On 9 Jun 1997, Stephen Judd wrote:

> One draws and fires!

:-) You do know who the Earps are right?  Guess not, otherwise
you would have skipped town by now :-).

> >Here's one of your statements that I'd like you to explain:  the
> >6502 is significantly different from the 6809.  Please tell me 
> >what the 6502 has that the 6809 doesn't have and then show me what
> >they have in common so we can see just how different they are.

> BTW, my only statement on this subject was "Yep" :)

The 6502 instruction set seems to be a subset of the 6809's with 
the 6809 possibly lacking the 6502's prefetch.  The 6809 has
two 16 bit index registers rather than two 8 bit ones, two 8 bit
accumulators rather than one and a few more addressing modes.
I'll let someone familiar with both chips comment.  

> Excellent!  I want to test my claim/guess that a Z80 is on average around
> three times slower than a 6510 :).  (Or, to put it in words possibly more
> acceptable to the Spectrum crowd, that you'd have to run a Z80 three times
> faster than a 6510 to get similar performance).

I'm expecting something less than 2:1 clock ratios on the average with
ratios of 1.5:1 achieved in some classes of problems.  I do concede
clock ratios of up to 2.5:1 (maybe 3:1) on rare occasions.
 
> One thing should be perfectly clear: a good algorithm can always
> overcome hardware limitations.

I'll have to file this one for the graphics hardware discussion :).
 

> >6502
> >----

> >Eight bit registers:
> >A      : accumulator
> >F      : flags
> >X      : index register 1
> >Y      : index register 2
> >SP     : stack pointer

> >16 bit registers:
> >PC     : program counter

> >The 6502 opts for holding data in a fast area of memory: page 0
> >which is used as a repository of registers.  Its two 8 bit index
 
> >What delivers the power in the 6502 is its diversity of addressing
> >modes.  This combined with the index registers gives the 6502 a 
> >set of 256 8 bit general purpose registers, exactly analogous with

> Again, they really aren't GP regs.  For instance, I can do an

Neither are the z80 registers.
 
> 	ADC #32
 
> on the accumulator, but not on a zero page location.  They are
> just like other memory, except they have shorter cycle times
> for identical operations.

Same with the z80 registers.  I can ADD A,r (r=8 bit register) in
4 cycles, but I can't add to any register r.  I can also ADC A,#32
(7 cycles) but I can't do the same with any other 8 bit register.

I think a very strong analogy (though not perfect) can be drawn between
page 0 and the z80's 8 bit registers.  Examples:

*SUB A,r  	SBC d,x
*ADD A,r 	ADC d,x
LD A,r  	LDA d,x
LD r,A   	STA d,x
*AND r   	AND d,x
*OR r      	ORA d,x
*XOR r     	EOR d,x

* can't be done on any z80 register besides A

> >The index registers have a lot in common in the indexed
> >addressing mode, with the z80's also able to behave as 16 bit
> >accumulators.

> I don't think so; the indexing difference is small on paper but
> strikes me as being enormous for coding.  The cycle times are
> quite different, too.

Yes.  Indexing is not used for the same reasons on the z80.  A z80
programmer attempts to keep variables in the on chip registers just
as the 6502 programmer attempts to keep them in page 0.

In another article, Bruce was trying to say that cycle times 
can occur in ratios of 4:1 and 6:1 (z80:6502) for some tasks, but
I can only see this happening if someone were trying to program the
z80 as if it were a 6502.  He chose a good example that aims at
the 6502's strengths and the z80's weaknesses: arbitrary precision
arithmetic.  I'd expect cycle ratios to be >2:1 but only as high
as 5:1 if the z80 version used index registers like the
6502 version.

All indexing instructions have about a 5:1 ratio in cycle times in
comparison with the 6502 (8 bit displacements).  The 16 bit displacements
can be done on a z80 by adding to the IX/IY registers or, as you
did with the byte transfer routine, the high byte of IX/IY can
be reloaded (LD IXh,n = 10 cycles, LD IXh,r=8 cycles).

> Z80 indexing looks to be 8+16 (8 bits memory, 16 bits index).
> On a 6510, it's 16+8 -- 16-bit address, 8-bit memory.  Moreover,
> the indexing is just as fast as a normal memory operation (sometimes
> plus one cycle, if a page boundary is crossed, but it's not important).
> This has enormous ramifications.  LDA $C000 is exactly as fast
> as LDA $C000,X and LDA $C000,Y.

Agreed.  On the z80, rather than using 16 bit indexing, one would opt
for 16 bit pointers where memory reads/writes can be performed in 
7 cycles.  In my experience, IX & IY are rarely used as index registers
and then only when speed is not important or all the other registers
are being used.

That said, they are commonly used by compilers for stack frames.
It is not uncommon to get a 2:1 or 3:1 speed improvement if compiled
code is rewritten by hand.  The 6502 is so simple and uniform that
a compiler should be able to generate very optimal code.


> >      Z80 version

> >      enter:  A=8 bit number
> >             DE=16 bit number
> >      exit:  HL=A*DE, least significant 16 bits
> >
> >            LD HL,0		10
> >            LD B,8		7
> >      loop  ADD HL,HL	11
> >            RLA		4
> >'A'         JR NC,noadd	12/7	; if 0: 12, 1: 18
> >'A'         ADD HL,DE	11	;  avg of 15
> >      noadd DJNZ loop	13/8
 
> Are you sure the above works?  I use A=128 and DE=1, and get an
> answer of HL=2, after two loop iterations.  Actually if RLA
> doesn't set Z, I get HL=0 (after a whole bunch of iterations).

Oh yeah, it works :)  I tried it on my Spectrum.  You have to complete the
8 iterations of the loop : notice HL is being doubled on each iteration
(shifted left), so naturally after going through two times, you've
only shifted the bit one time when it needs to be shifted to bit 7 (seven
shifts in total).

RLA is rotate left A (C is carry flag):

   .-----------------.
   C <- |76543210| <-'
 
> I think with a DEC B at noadd, it will work more like you want it
> to.  +4 cycles?

The DJNZ instruction decrements B and does a relative jump if B!=0, so
no need for a DEC B (DJNZ=Decrement, Jump Not Zero).
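
If it helps, here is a rough Python model of what the loop does.  This
is only a sketch of the logic (mul8x16 is a name invented for it), not a
cycle-accurate emulation: the bit that RLA rotates into the bottom of A
is ignored because it never comes back out within 8 iterations.

```python
def mul8x16(a, de):
    """Model of the shift-and-add loop: low 16 bits of a * de."""
    hl = 0
    for _ in range(8):                   # DJNZ runs the body 8 times
        hl = (hl << 1) & 0xFFFF          # ADD HL,HL
        carry = (a >> 7) & 1             # RLA: bit 7 of A goes to carry
        a = (a << 1) & 0xFF
        if carry:                        # JR NC,noadd skips this when clear
            hl = (hl + de) & 0xFFFF      # ADD HL,DE
    return hl
```

With a=128 and de=1 the single set bit is added on the first pass and
then doubled seven more times, giving 128.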
 
> Hmmm, that's no good, use best cases and worst cases.  Average comes
> out in the wash.

OK, here's something exact.  Let n=# set bits in A.  Then the time
to complete is:  10+7+8*(11+4+13)-5+18n+12(8-n)=6n+332 cycles.

The average time is    ( SUM[n=0..8] of (8Cn*(6n+332)) )/256=356 cycles
where "aCb" is # combinations = a!/((a-b)!b!) and specifically, 8Cn is
the number of unique bytes with n set bits in them.  So setting n=4
is ok to find the average.

The extremes are a minimum of 332 cycles and a maximum of 380 cycles.
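
For the sceptical, the arithmetic can be checked with a few lines of
Python.  This is just a sanity check using the cycle counts listed
beside the routine; z80_mul_cycles is a name made up for the purpose.

```python
from math import comb


def z80_mul_cycles(n):
    """Cycles for the looped multiply when A has n set bits."""
    setup = 10 + 7                      # LD HL,0 ; LD B,8
    shifts = 8 * (11 + 4)               # ADD HL,HL ; RLA, every iteration
    branches = 18 * n + 12 * (8 - n)    # JR NC, plus ADD HL,DE when bit set
    djnz = 7 * 13 + 8                   # taken 7 times, falls through once
    return setup + shifts + branches + djnz


# Weight each n by the number of bytes that have n set bits.
average = sum(comb(8, n) * z80_mul_cycles(n) for n in range(9)) / 256
```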

> ACC and AUX are locations in zero page.  Cycle times are to the right.
> 
> * ACC*AUX -> [.A, EXT+1] (low,hi) 16 bit result
> 
> MULT
>           LDA #0		2
>           STA EXT+1		3
>           LDY #8		2
> 
> ]LOOP     ASL 			2
>           ROL EXT+1		5
>           ASL ACC		5

Section 'A':
>           BCC MUL2		3/2
>           CLC			  2
>           ADC AUX		  3
> 	  BCC MUL2		  3/2
> 	  INC EXT+1		    5

> MUL2      DEY			2
>           BNE ]LOOP		3/2

In the part marked A above, if the C flag is clear, 3 cycles are used.
If it's set, then either 10 or 14 depending on whether the most
significant byte of the running product needs to be incremented.  Let
p=#times this has to be done (0-n).  Again, n=#set bits in ACC.

I count 2+3+2+8*(2+5+5+2+3)-1+10n+4p+3*(8-n)=166+7n+4p 6502 cycles.
The extremes are a min of 166 cycles and a max of 226 cycles.

The z80 used 6n+332 cycles for the same or a little less than a
2:1 cycle ratio (z80:6502)

By unrolling, the z80 saves 7+11+7*13+8=117 cycles, so the unrolled
version takes 6n+215 cycles.

By unrolling, the 6502 saves 2+2+5+7*(2+3)+2+2=48 cycles, so the
unrolled version takes 118+7n+4p cycles.

The ratio is moving closer to 1.8:1.

> (My normal version, as well as the divide version, is on my
> web page).

Divide.  That was what I was going to do next.  I wrote this version
many years ago, so it's no effort to put here.  It'll give you
something to chew on while I find time to write a fast draw.

It's a simple shift and subtract algorithm that depends on the
following fact:

            a/b=c+d/b  ; d=remainder
and multiplying both sides by b:
          b*(a/b)=c*b+d
Treat c as a binary number composed of sums of powers of 2 and I can
find a/b by repeatedly subtracting b*powers of 2 from a, starting with the
largest power of 2.  Every time a subtraction is possible, set the
corresponding bit in c (in the case below, I'm incrementing the quotient
and shifting it as I go along). At the end, a-everything subtractable will be
the remainder.
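
As a rough model of the idea, here it is in Python.  The list plays the
part of the stack of shifted divisors, and the 16 bit limit is assumed
to match the register width.  divide16 and its return convention (None
for divide by zero) are invented for this sketch.

```python
def divide16(dividend, divisor):
    """Shift-and-subtract division; returns (quotient, remainder) or None."""
    if divisor == 0:
        return None                      # divide by 0 error
    shifted = []
    value = divisor
    while True:                          # keep doubling until 16 bits overflow
        shifted.append(value)
        value <<= 1
        if value > 0xFFFF:
            break
    quotient = 0
    remainder = dividend
    while shifted:
        value = shifted.pop()            # largest multiple comes off first
        quotient <<= 1
        if remainder >= value:           # subtraction possible: set the bit
            remainder -= value
            quotient += 1
    return quotient, remainder
```

For example, divide16(100, 7) stacks 7, 14, 28, ... and ends up with a
quotient of 14 and a remainder of 2.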

Here's a z80 version:

	enter:  DE=dividend
	        HL=divisor

	exit:   AC=DE div HL (all 16 bit divide)
	        HL=DE mod HL (remainder)
	        C flag set if divide OK
	        NC flag if divide by 0 error

4	DIVIDE  LD A,H			; check for divide by 0
4	        OR L
11/5	        RET Z			; return with NC if error
4	        XOR A			; A=B=C=0
4	        LD C,A
4	        LD B,A
11	shiftlp PUSH HL		; keep pushing divisor
4	        INC B			; shifted left on stack on each
11	        ADD HL,HL		; iteration until carry set.
10/10	        JP NC,shiftlp		; B counts how many are on stack
4	        EX DE,HL		; HL=dividend and DE=divisor
8	sublp   SLA C			; shift quotient AC left
4	        RLA			; C flag always reset here -
	                 		;  important for SBC (sub w carry) below
10	        POP DE 		; grab divisor*power of 2
	                  		; largest # is popped first
15	        SBC HL,DE		; is dividend > divisor*power2?
12/7	        JR C,cantsub		; if no, have to add it back
4	        INC C			; increment quotient AC
7	        ADC A,0
13/8	        DJNZ sublp		; continue until all divisors
4	        SCF			; *power2 are off stack and
10	        RET			; set carry to indicate success 
	               		; when done & return
11	cantsub ADD HL,DE		; add back divisor*power2
13/8	        DJNZ sublp		; continue until stack is empty
4	        SCF			; set carry to indicate successful
10	        RET			; divide and return


If you have a 6502 version of this algorithm, let's have a look and
compare them.

> >2.  I want to look at exactly how much slower the 6502 is at 
> >accessing the full 64k compared to the z80.  Let's copy nn bytes
> >(16 bit) from a 16 bit source address to a 16 bit destination address.

> This is the routine I use:
 
> (I leave out the initialization of registers and memory source/dest)
 
> :LOOP	LDA $8000,Y	4
> 	STA $8000,Y	4
> 	INY		2
> 	BNE :LOOP	3/2
> 	INC :LOOP+2	  6
> 	INC :LOOP+5	  6
> 	DEX		  2
> 	BNE :LOOP	  3/2

Self modifying code?  You sneeze all over the tablecloth the first
chance you get :).  I suppose this is going to be standard fare,
so I'll just have to be more careful in choosing examples where
you won't be able to do this. Then we'll see some larger cycle
gaps.

> and call it 13.06 cycles it's OK by me :).

Well, I'm not *that* pedantic (close though :).  Ian's routine took
39 cycles by the same token.

> Now, on a 6510, people who need fast memory fills usually unroll
> part of the loop, e.g.
> or whatever.  This gets it closer to the maximum possible memory
> transfer rate of 8*nn cycles.

It approaches 8nn *very* slowly.  In contrast, unrolling the z80 LDIR
instruction (to LDI) reduces the transfer rate to 16nn cycles very quickly.

Summary:  The cycle ratio for the non-complex instruction version of
the byte transfer is 3:1 (z80:6502).  The cycle ratio of the fastest
(not unrolled) version is 1.6:1.  Unrolling makes the ratio approach
2:1 from 1.3:1 but very slowly.


Phew!  That's enough for today:  this thread is a full time job.


Alvin



Article 69728 of comp.sys.cbm:
Path: news.acns.nwu.edu!merle!judd
From: judd@merle.acns.nwu.edu (Stephen Judd)
Newsgroups: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Subject: Re: Shootout at the 0K Corral (was various other things)
Date: 18 Jun 1997 05:03:57 GMT
Organization: Northwestern University, Evanston, IL
Lines: 167
Message-ID: <5o7q7t$iba@news.acns.nwu.edu>
References: <337C5E94.388@actcom.co.il> <5nn1le$104g@ds2.acs.ucalgary.ca> <5npebm$ir@news.acns.nwu.edu> <5o2i4q$req@ds2.acs.ucalgary.ca>
Reply-To: sjudd@nwu.edu (Stephen Judd)
NNTP-Posting-Host: merle.acns.nwu.edu
Xref: news.acns.nwu.edu comp.sys.sinclair:40076 comp.sys.cbm:69728 comp.emulators.cbm:21816

Alvin R. Albrecht  wrote:
>On 9 Jun 1997, Stephen Judd wrote:
>>
>I'm expecting something less than 2:1 clock ratios on the average with
>ratios of 1.5:1 achieved in some classes of problems.  I do concede

And I expect, and see, the opposite.  As far as I can tell, the Z80
has three strengths:

	- Multiple internal registers
	- 16-bit operations
	- Specialized instructions like CPIR

So, as a practical matter, that means that algorithms which can fit
entirely within the registers, or involve many 16-bit operations,
will do very well.  Otherwise, they choke very quickly.  Any program
with multiple variables, tables, etc. gets very complicated, because
to access memory practically the Z80 needs to give up a register.
Here is a little piece of code from a music player I wrote:

	LDA DIV8T7,X	;Table of (X/8)*7
	CLC
	ADC CURFIELD	;Variable in memory, gives an offset
	TAY
	LDA BITP,X	;Table of bit values
	ORA MACBYTE1,Y	;Activate bit in table
	STA MACBYTE1,Y

Or how about, from BLARG,

LINEPLOT                  ;Plot the line chunk
	LDA CX
	ORA CY
	BMI :SKIP

	LDA (POINT),Y    ;Otherwise plot
	EOR BITMASK
	ORA CHUNK
	AND OLDCHUNK
	EOR CHUNK
	EOR (POINT),Y
	STA (POINT),Y
:SKIP

See how the instructions like ORA and ADC can access memory directly?
All of the major 6510 instructions -- compare, add/subtract, logical
operators -- can use _all_ of the 6510 addressing modes: zero page,
direct memory, indirect, indexed offset, and so on.

Also, instructions like INC/DEC and the rotate instructions can
operate directly on memory:

	ROR $FA
	INC $D020

these are features essential to e.g. my graphics routines, since
they mean all sorts of counters, bit positions, etc. can sit in
memory.

Since most programs of any magnitude involve a lot of variables
and memory accesses, I think Bruce's 4:1 rule of thumb is probably
more accurate than my 3:1 guesstimate.

>> One thing should be perfectly clear: a good algorithm can always
>> overcome hardware limitations.
>
>I'll have to file this one for the graphics hardware discussion :).

You just better make sure your algorithms are better than mine ;^).

>> I don't think so; the indexing difference is small on paper but
>> strikes me as being enormous for coding.  The cycle times are
>> quite different, too.
>
>Yes.  Indexing is not used for the same reasons on the z80.  A z80
>programmer attempts to keep variables in the on chip registers just
>as the 6502 programmer attempts to keep them in page 0.

Perhaps I wasn't clear: Z80 indexing doesn't look very useful.
6510 indexing, on the other hand, is the lifeblood which flows
through a program.  The "indexing difference" above is the
difference between a Z80 and a 6510 -- slow and cumbersome
on a Z80, fast and extraordinarily useful on a 6510.

Browse through the code on my web page sometime, and see how
many programs use indexing, and the wide variety of tasks it
is used for.

>z80 as if it were a 6502.  He chose a good example that aims at
>the 6502's strengths and the z80's weaknesses: arbitrary precision
>arithmetic.

Why use those examples?  I think the substring search is a fine
example, as is the fast multiply, as is drawing lines, as is...

A good example of something that aims at the 6502's strengths and
the Z80's weaknesses is a program which needs to access a number
of variables and tables scattered around memory -- if you really
want them I can give them to you in scads; I prefer to focus in on
either common or time-critical tasks that are fairly compact.

>> Are you sure the above works?  I use A=128 and DE=1, and get an
>
>Oh yeah, it works :)  I tried it on my Spectrum.  You have to complete the

Yep, I didn't understand the DJNZ :).

>> Hmmm, that's no good, use best cases and worst cases.  Average comes
>> out in the wash.
>
>The extremes are a minimum of 332 cycles and a maximum of 380 cycles.

That's all that is necessary (although an idea of distribution can
be useful).  There's no need to complicate matters with lots of
numbers and other gobbledygook.

>
>> ACC and AUX are locations in zero page.  Cycle times are to the right.
>> 
>> * ACC*AUX -> [.A, EXT+1] (low,hi) 16 bit result
>
>The extremes are a min of 166 cycles and a max of 226 cycles.

Correct.

>The z80 used 6n+332 cycles for the same or a little less than a
>2:1 cycle ratio (z80:6502)

So, as I also said, around 2:1, and for an algorithm ideally suited 
to the Z80: totally internal, and involving many 16-bit operations.

>> (My normal version, as well as the divide version, is on my
>> web page).
>
>Divide.  That was what I was going to do next.  I wrote this version
>many years ago, so it's no effort to put here.  It'll give you
>something to chew on while I find time to write a fast draw.

The divide routine (at least my divide routine) is identical to
the multiply in operation and offers no new insight.  As I said,
though, it's on my web page if you want to take a look at it.

>[self-modifying code]
>so I'll just have to be more careful in choosing examples where
>you won't be able to do this. Then we'll see some larger cycle

Why?  Why not just pick some common tasks that a program needs to
perform, and see what happens?  That's what I'm doing.

>Well, I'm not *that* pedantic (close though :).  Ian's routine took
>39 cycles by the same token.

Yep, factor of three already, and that's with an algorithm that
sits entirely within the registers.  That tells me that the Z80 
isn't quite so great at accessing memory rapidly; luckily of
course the Z80 has a special instruction designed for doing
just this task.

So far, we've found that toy problems ideally suited to the Z80
run at around 2:1 on a 6510 -- about what one might expect.

The string compare which you suggested runs at around 3:1 in
favor of the 6510.  Let's see how the time-critical applications
of the fast multiply and line routine compare, and toss in
some text routines (string print) for fun, and see what happens, eh?

	evetS-


Article 69799 of comp.sys.cbm:
Path: news.acns.nwu.edu!merle!judd
From: judd@merle.acns.nwu.edu (Stephen Judd)
Newsgroups: comp.sys.cbm,comp.sys.sinclair,comp.emulators.cbm
Subject: Re: That Z80 line routine
Date: 19 Jun 1997 00:36:23 GMT
Organization: Northwestern University, Evanston, IL
Lines: 75
Message-ID: <5o9uu7$d8f@news.acns.nwu.edu>
References: <5o00p9$gi9@news.acns.nwu.edu> <5o2e3j$f5p@news.acns.nwu.edu> <11304.imc@comlab.ox.ac.uk>
Reply-To: sjudd@nwu.edu (Stephen Judd)
NNTP-Posting-Host: merle.acns.nwu.edu
Xref: news.acns.nwu.edu comp.sys.cbm:69799 comp.sys.sinclair:40122 comp.emulators.cbm:21861

In article <11304.imc@comlab.ox.ac.uk>, Ian Collier  wrote:
>In article <5o2e3j$f5p@news.acns.nwu.edu>, sjudd@nwu.edu (Stephen Judd) wrote:
>>>       LD   A,C     ;4        Compare the absolute differences.  We will
>>>       CP   B       ;4        move the greater one to B as a counter and
>>>       JR   NC,Horz ;12/7     the lesser one to L.  Go if C is greater.
>
>>This looks backwards to me.  At least, on a 6510, if C>B then carry will
>
>I have A holding the C register and execute CP B, which calculates C-B and
>therefore sets the carry flag if B is greater.  On the Z80, the carry flag

OK, I'll take your word for it :).

>>> Loop: ADD  L       ;4
>>
>>You can remove the overflow check and CP H by first negating A, or else 
>>using SUB instead of ADD, and then check solely for underflow/overflow.
>>
>>>       JR   C,Diag  ;12/7     If the result is greater than H then a diagonal
>>>       CP   H       ;4        move is made and H is subtracted.  Otherwise
>>>       JR   C,Vert  ;12/7     a vertical move is made.
>
>Not sure what you meant here.

Well, you are basically doing

	A = H/2
loop	A = A + dx
	if A>dy then A=A-dy : y=y+1

which is exactly the same as doing

	A = H/2
loop	A = A - dx
	if A<0 then A=A+dy : y=y+1

The 6510 signals subtraction underflow by clearing the carry flag, and
I assume the Z80 does something similar, so the CP H gets taken care of
automatically by the subtraction.  And since you're subtracting, you never
have problems with overflow, so the JR C,Diag isn't necessary either.

The NEG comment refers to the fact that the algorithm could also
be written as

	A = -H/2
loop	A = A + dx
	if A>0 ...
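
Here is a small Python sketch of the subtract-only form, to show that a
single sign test is enough.  The names major and minor (the larger and
smaller of the two differences) and the function minor_steps are mine,
and this is plain integer arithmetic rather than an 8-bit register with
a carry flag.

```python
def minor_steps(major, minor):
    """Return the major-axis step numbers at which the minor axis advances."""
    err = major // 2                 # start half way, as in A = H/2
    steps = []
    for i in range(major):
        err -= minor                 # the only arithmetic in the loop
        if err < 0:                  # underflow: time for a diagonal move
            err += major
            steps.append(i)
    return steps
```

As long as minor <= major, err stays between 0 and major-1 after each
pass, so the minor axis advances exactly minor times over the whole
line.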

>>The first is that I don't understand the layout of the spectrum bitmap
>>in memory: a) what do consecutive bytes correspond to on the screen,
>>and b) what is the screen resolution/how many bytes per row/column?
>
>I was completely ignoring this, as you will have seen from my other posting.
>I mentioned a hypothetical machine on which the address of a pixel happens
>to equal its co-ordinates because it is an 8-bit screen 256 pixels wide.

Got it.  NP :)

>>So, once that and the above optimizations are taken care of we can 
>>count cycles for real.
>
>Well I think if I really tried to optimise a line draw for the Spectrum
>screen then people would start wondering why I wasn't doing any real
>work...

What, not willing to give up the job/wife/family/health for the 8-bit
computer?  Priorities, man, priorities!

Long have I yearned to write "Writes really bitchin C64 code" on my
resume, along with "Can beat the Cyber Hordes in Laser Squad" and "Can 
keep a hacky-sack aloft for extended periods of time".

(The optimization in question was just to remove the JR and CP H, BTW :)

	evetS-


Article 69815 of comp.sys.cbm:
Path: news.acns.nwu.edu!newsfeed.acns.nwu.edu!news.ece.nwu.edu!news.cse.psu.edu!rutgers!usenet.logical.net!news.mathworks.com!europa.clark.net!dispatch.news.demon.net!demon!mail2news.demon.co.uk!not-for-mail
From: Jason 
Newsgroups: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Subject: Re: Spectrum Emulator for C64 [additonal translation]
Followup-To: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Date: Wed, 18 Jun 97 21:04:17 GMT
Organization: Cosine Systems
Message-ID: <9706182104.AA00gg9@cosine.demon.co.uk>
References: <337C5E94.388@actcom.co.il> <5nunbg$t3e@ds2.acs.ucalgary.ca> <33a64fa5.59106111@news.pacificnet.net> <5o4tos$kho@ds2.acs.ucalgary.ca> <5o7c9r$d3g@news.acns.nwu.edu>
X-Mail2News-User: tmr@cosine.demon.co.uk
X-Mail2News-Path: relay-1.mail.demon.net!gate.demon.co.uk!cosine.demon.co.uk
X-Newsreader: TIN [AMIGA 1.3 950726BETA PL0]
Lines: 30
Xref: news.acns.nwu.edu comp.sys.sinclair:40131 comp.sys.cbm:69815 comp.emulators.cbm:21870

Stephen Judd:
> Many times now I've felt that I should own four copies instead of my
> tattered old one: one to keep at work, one to keep in a backpack,
> one to keep at home, and one to keep sealed in my safe-deposit box :).

I've got about seven copies of the PRG, one I take to work, one by the
C64C, one by the C128D, one on the bedside table and one *in* the bed.
I know that's only five, but I can't *find* the others!  I *do* know
that if you keep a PRG in the bed, make sure it's not a ring-bound copy,
you'll find out why when you wake up with minor lacerations up one arm...
=-)

> You know, for those Commodore 64 emergencies that always come up
> during the course of a day, sitting at home in the evenings, out
> on dates, etc.

All the others yeah, but if I *ever* start talking about C64's when my
girlfriend is about she hits me.  Hard.

I tend to talk about them quite a lot! =-)
--
Jason  =-)
     _______________________________________________________________________
TMR /     /     /     /  /     /     /                                     /\
   /  /__/  /  /  /__/  /  /  /  /__/    Email: tmr@cosine.demon.co.uk    / /
  /  /\_/  /  /__   /  /  /  /  __//          Cosine Homepage:           / /
 /  /__/  /  /  /  /  /  /  /  /  /    http://www.cosine.demon.co.uk    / /
/_____/_____/_____/__/__/__/_____/_____________________________________/ /
\_____\_____\_____\__\__\__\_____\_____________________________________\/



Article 69772 of comp.sys.cbm:
Path: news.acns.nwu.edu!newsfeed.acns.nwu.edu!news.ece.nwu.edu!news.cse.psu.edu!rutgers!newsin.iconnet.net!www.nntp.primenet.com!nntp.primenet.com!news.mathworks.com!cam-news-hub1.bbnplanet.com!cpk-news-hub1.bbnplanet.com!news.bbnplanet.com!dispatch.news.demon.net!demon!peernews.ftech.net!telehouse1.frontier-networks.co.uk!Aladdin!aladdin.net!ns2.aladdin.net!RMplc!rmplc.co.uk!yama.mcc.ac.uk!nntpfeed.doc.ic.ac.uk!sunsite.doc.ic.ac.uk!lyra.csx.cam.ac.uk!news.ox.ac.uk!news
From: imc@ecs.ox.ac.uk (Ian Collier)
Newsgroups: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Subject: Re: Shootout at the 0K Corral (was various other things)
Date: 18 Jun 1997 15:45:08 GMT
Organization: Oxford University Computing Laboratory, UK
Message-ID: <11302.imc@comlab.ox.ac.uk>
References: <337C5E94.388@actcom.co.il> <5nn5jl$34u@news.acns.nwu.edu> <11279.imc@comlab.ox.ac.uk> <5o7js3$fuq@news.acns.nwu.edu>
NNTP-Posting-Host: boothp2.ecs.ox.ac.uk
X-Local-Date: Wednesday, 18th June 1997 at 4:45pm BST
Lines: 55
Xref: news.acns.nwu.edu comp.sys.sinclair:40106 comp.sys.cbm:69772 comp.emulators.cbm:21846

In article <5o7js3$fuq@news.acns.nwu.edu>, sjudd@nwu.edu (Stephen Judd) wrote:
>By the way, I didn't understand before that the Spectrum's horizontal
>resolution was only 256 characters,

Or pixels, as we call them. :-)

>Now that I've finally figured it out though, it seems to me that
>given an X,Y coordinate the memory address is given by

>address = 256*((Y AND 192)/8 + (Y AND 7)) + (Y AND 56)*4 + X/8

Something like that...
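
As a sketch only, the quoted formula can be written out in Python.  The formula as quoted gives an offset into the display file; the base address of 16384 (0x4000) is added here as an assumption, and the names SCREEN_BASE and pixel_address are made up for illustration.

```python
SCREEN_BASE = 0x4000  # assumed start of the Spectrum display file (16384)


def pixel_address(x, y):
    """Byte address holding pixel (x, y), for 0 <= x < 256 and 0 <= y < 192."""
    offset = 256 * ((y & 192) // 8 + (y & 7)) + (y & 56) * 4 + x // 8
    return SCREEN_BASE + offset
```

With this, pixel_address(0, 1) is 0x4100, pixel_address(0, 8) is 0x4020 and pixel_address(0, 64) is 0x4800: moving down one pixel line normally adds 256, moving down one character row adds 32, and each third of the screen starts 2048 bytes after the previous one.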

>Right?  Bleah.  So clearly it's better to just update a bitmap pointer,
>and forget about the actual X,Y screen coordinates.  Ugh, I've just
>spent an hour or two trying to figure out a decent way to do it,
>and I'm just not seeing it :(.  That is, as long as you remain
>in the third of the screen it's not TOO bad, but moving between
>thirds seems to be a cast-iron bitch.

Here is some code that I usually use to move down one pixel line from
wherever we are (address in HL).

      INC H
      LD  A,H
      AND 7
      JR  NZ,end
      LD  A,L
      ADD 32
      LD  L,A
      JR  C,end
      LD  A,H
      SUB 8
      LD  H,A
end:

When this is done 191 times (because 192 is how tall the screen is), it takes
27 cycles in 168 cases, 49 cycles in 2 cases and 59 cycles in 21 cases.
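
The cycle breakdown can be reproduced with a small Python model of the routine.  This is a sketch under assumptions: the T-state figures in the comments are the usual published Z80 timings (4 for register loads and INC, 7 for immediate arithmetic, 12/7 for JR cc taken/not taken), the walk starts at the top-left byte 0x4000, and the names down_one_line and hist are invented for the example.

```python
from collections import Counter


def down_one_line(h, l):
    """Model of the routine above.  Returns (h, l, t_states)."""
    h = (h + 1) & 0xFF           # INC H            4
    t = 4 + 4 + 7                # LD A,H  4 ; AND 7  7
    if h & 7:                    # JR NZ,end taken  12
        return h, l, t + 12
    t += 7 + 4 + 7 + 4           # JR not taken 7 ; LD A,L ; ADD 32 ; LD L,A
    total = l + 32
    l = total & 0xFF
    if total > 0xFF:             # JR C,end taken   12 (crossed into next third)
        return h, l, t + 12
    t += 7 + 4 + 7 + 4           # JR not taken 7 ; LD A,H ; SUB 8 ; LD H,A
    return (h - 8) & 0xFF, l, t


# Walk from pixel line 0 to pixel line 191 in the leftmost column.
h, l = 0x40, 0x00
hist = Counter()
for _ in range(191):
    h, l, t = down_one_line(h, l)
    hist[t] += 1
```

After the loop, hist holds 168 moves of 27 cycles, 2 of 49 and 21 of 59, and HL ends at 0x57E0, the first byte of the last pixel line.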

>Pardon my dubious nature, but how in the world does one do high
>speed lines on a Spectrum?

I wouldn't know as I don't really specialise in fast code.

>Well, fair is fair, so I ought to tell you about the C64's kooky
>bitmap memory layout,

[snip description]

Looks a bit like some modes of the BBC micro, though I've never seriously
used one of those.  I did once recode a Spectrum BASIC program on to a BBC
which involved the use of a small machine code routine to scroll upwards the
first 32 columns of pixels in mode 2 (or was it 5?).

imc


Article 69803 of comp.sys.cbm:
Path: news.acns.nwu.edu!newsfeed.acns.nwu.edu!news.ece.nwu.edu!news.cse.psu.edu!news.ems.psu.edu!news3.cac.psu.edu!howland.erols.net!ais.net!newsfeed.direct.ca!news.he.net!news.pagesat.net!decwrl!tribune.usask.ca!rover.ucs.ualberta.ca!news.ucalgary.ca!srv1.freenet.calgary.ab.ca!albrecht
From: "Alvin R. Albrecht" 
Newsgroups: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Subject: Re: Spectrum Emulator for C64
Date: Tue, 17 Jun 1997 17:13:06 -0600
Organization: Calgary Free-Net
Lines: 65
Message-ID: <5o75oi$14hc@ds2.acs.ucalgary.ca>
References: <33845f94.1768387@commodore64.com> <5nhubi$13l0@ds2.acs.ucalgary.ca>  <5o1c2n$12lc@ds2.acs.ucalgary.ca> <5o4k56$aht@news.acns.nwu.edu>
Reply-To: "Alvin R. Albrecht" 
NNTP-Posting-Host: albrecht@srv1.freenet.calgary.ab.ca
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
In-Reply-To: <5o4k56$aht@news.acns.nwu.edu>
Xref: news.acns.nwu.edu comp.sys.sinclair:40124 comp.sys.cbm:69803 comp.emulators.cbm:21863



On 17 Jun 1997, Stephen Judd wrote:

In comparing cycle ratios for typical tasks:

> >> 	The 4:1 was the rule of thumb way back when.  Obviously, you can
> >> find multiply intensive tasks that will take it closer to 2:1, just as you
> >> can find arbitrary precision arithmetic that will take it closer to 6:1.

> >I'm expecting a 2:1 cycle ratio for tasks (z80:6502) most of the time and
> >at other times as low as 1.5:1 or maybe less.

In finding a way to compare different processors, on the maximum clock
rates of the z80 and 6502 when using the same device geometries:

> >I still believe it's 2:1, but I won't hold it as gospel.

We're speaking of two different kinds of numbers here unless I've
misunderstood Bruce (BTW - is that a 40MHz 6502 in a discrete package or
on an ASIC?  This is the only thing that makes me doubt the 2:1
maximum clock ratios).

The first number has to do with typical cycle ratios for tasks.  This
last number has to do with maximum clock rates and is the basis for
the comparison of a generic z80 and generic 6502.

I think the evidence is quite strong to suggest a 2:1 maximum clock
speed for the z80:6502 when using the same device geometries.

As for the coding competition, that deals with the first numbers.
So far it's <2:1.  And in case you think so, I didn't just pick a 2:1
ratio for tasks when this thread started.

> There's still the fast multiply and string print routine to be done,
> not to mention the (*LAUGH* :) "software sprite routine".  I don't
> really expect the third to ever make an appearance, but I'd certainly
> like to see the other two, which are very short programs, very
> useful programs, and very easy to write programs.

Stephen, your doubt that software sprites exist is very questionable
(ludicrous actually, but I'll bite my tongue :) given that many many games
have been written with moving images that don't disturb the background.

But you have asked for a very non-trivial task.  If you wait around long
enough, it will appear.

> The claims about the merits of the Z80 over the 6510 and the Spectrum over
> the C64 are so repetitious and insistent that I would think you'd jump at 
> the chance to conclusively demonstrate their true merit, in a form plain
> for people all over the world to see.

It is being shown here slowly.  So far, everything is being done at <2:1
clock ratios.  I can also see advantages/disadvantages in the
architecture.  From that I can conclude what tasks will be done faster on
each processor.  The z80 wasn't the most popular 8 bit because of its
price (it was expensive), its ease of programming (the 6502/6809 are
much simpler for a beginner) or its simple design (it's the most complex
of all the 8 bits) - it was because of its speed and ease of incorporating
in a system (vs the 8080 on the latter).


Alvin




Article 69847 of comp.sys.cbm:
Path: news.acns.nwu.edu!newsfeed.acns.nwu.edu!news.ece.nwu.edu!news.cse.psu.edu!news.math.psu.edu!chi-news.cic.net!feeder.chicago.cic.net!feed1.news.erols.com!news.ecn.uoknor.edu!munnari.OZ.AU!metro!metro!seagoon.newcastle.edu.au!cc.newcastle.edu.au!ecbm
From: "Bruce R. McFarling" 
Newsgroups: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Subject: Re: Spectrum Emulator for C64
Date: Thu, 19 Jun 1997 16:11:02 +1000
Organization: The University of Newcastle
Lines: 57
Message-ID: 
References: <33845f94.1768387@commodore64.com> <5nhubi$13l0@ds2.acs.ucalgary.ca>  <5o1c2n$12lc@ds2.acs.ucalgary.ca> <5o4k56$aht@news.acns.nwu.edu> <5o75oi$14hc@ds2.acs.ucalgary.ca>
NNTP-Posting-Host: cc.newcastle.edu.au
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
In-Reply-To: <5o75oi$14hc@ds2.acs.ucalgary.ca>
Xref: news.acns.nwu.edu comp.sys.sinclair:40148 comp.sys.cbm:69847 comp.emulators.cbm:21891

On Tue, 17 Jun 1997, Alvin R. Albrecht wrote:

> ... (BTW - is that a 40MHz 6502 in a discrete package or
> on an ASIC?  This is the only thing that makes me doubt the
> 2:1 maximum clock ratios).

	I must not have been clear.  The maximum a Z80 can do something is
50% the speed of the 6502, so I asked where was the 40MHz Z80.  20MHz
65C02's are available.  Chips, ASIC, design rules, whatever.
	2:1 is based on the state machine of the two processors.  Sure it
assumes that the chips are both accessing data in RAM, but a RAM that you
can hook a 20MHz 6502 to, you can hook a 40MHz Z80 to.
	But the 6502 instruction set is very RISC-ish, if you look on zero
page locations as a bank of 256 8-bit registers that can be accessed as a
stack or directly, with words not requiring alignment, which is the reason
that for the tasks that people were thinking of in the early 80's, a 1MHz
6502 was considered roughly equivalent to a 4MHz Z80.  You do something
like check which one of four strings is matched first, and the string
locations are redirectable, and the type of

  -	lda (source),y
	cmp (target1),y
	beq +
	cmp (target2),y
	beq ++
	cmp (target3),y
	beq +++
	cmp (target4),y
	beq ++++
	iny
	bne -

inner loop speeds things up.

	OTOH, the reason that it is still being used today is because of
its efficiency. <4,000 gates leaves a hell of a lot of real estate on a
chip mask: plenty for ample ROM, a good chunk of RAM for some zero page
and stack page locations, a couple three special support circuits on the
level of a VIA.

> It is being shown here slowly.  So far, everything is being done at <2:1
> clock ratios.  I can also see advantages/disadvantages in the
> architecture.  From that I can conclude what tasks will be done faster on
> each processor.  The z80 wasn't the most popular 8 bit because of its
> price (it was expensive), its ease of programming (the 6502/6809 are
> much simpler for a beginner) or its simple design (it's the most complex
> of all the 8 bits) - it was because of its speed and ease of incorporating
> in a system (vs the 8080 on the latter).

	The first portable 8-bit microcomputer disk operating system had
nothing to do with it?

Virtually,

Bruce R. McFarling, Newcastle, NSW
ecbm@cc.newcastle.edu.au



Article 70010 of comp.sys.cbm:
Path: news.acns.nwu.edu!newsfeed.acns.nwu.edu!news.luc.edu!uchinews!cbgw2.lucent.com!uunet!in3.uu.net!206.154.70.8!news.webspan.net!feed1.news.erols.com!news-xfer.netaxs.com!newshub2.home.com!newshub1.home.com!news.home.com!enews.sgi.com!decwrl!tribune.usask.ca!rover.ucs.ualberta.ca!news.ucalgary.ca!srv1.freenet.calgary.ab.ca!albrecht
From: "Alvin R. Albrecht" 
Newsgroups: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Subject: Re: Spectrum Emulator for C64
Date: Thu, 19 Jun 1997 22:41:29 -0600
Organization: Calgary Free-Net
Lines: 169
Message-ID: <5od1oj$bgg@ds2.acs.ucalgary.ca>
References: <33845f94.1768387@commodore64.com> <5nhubi$13l0@ds2.acs.ucalgary.ca>  <5o1c2n$12lc@ds2.acs.ucalgary.ca> <5o4k56$aht@news.acns.nwu.edu> <5o75oi$14hc@ds2.acs.ucalgary.ca> 
Reply-To: "Alvin R. Albrecht" 
NNTP-Posting-Host: albrecht@srv1.freenet.calgary.ab.ca
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
In-Reply-To: 
Xref: news.acns.nwu.edu comp.sys.sinclair:40265 comp.sys.cbm:70010 comp.emulators.cbm:21970



On Thu, 19 Jun 1997, Bruce R. McFarling wrote:

> 	I must not have been clear.  The maximum a Z80 can do something is
> 50% the speed of the 6502, so I asked where was the 40MHz Z80.  20MHz

I don't think 50% is the ceiling: this is the result when all variables
are kept in registers.  It'll be faster, for example,  in recursive
algorithms (and their iterative cousins which are also stack intensive).

> 65C02's are available.  Chips, ASIC, design rules, whatever.

As already mentioned the fastest plain z80 is 20MHz.  But I'm not so
sure that Zilog pursues the original z80 as vigorously anymore.

Here's a brief description of the z180 which is closer to the z80 than the
z380:

- binary compatible with the z80
- more instructions (what these are I don't know - on the z380, which is
binary compatible with the z180, they include stack pointer relative loads
and stores, multiply, divide, 8/16/24 bit indexed offsets, etc.)
- registers/data bus 16 bits, address bus 32 bits
- more alternate register sets
- two DMA channels
- on chip interrupt controllers
- on chip wait state generators
- an MMU
- clocked serial port
- two 16bit counter/timers
- two UARTS (to 512kps)
- power down modes

As you can see, this is a much more complicated chip than the z80 but
it can be clocked at 33MHz - a faster clock than the max 20MHz z80.  You
may argue that this is a redesigned z80 from the ground up (hasn't the
6502 been redesigned since its appearance?), but:

If I look back in time, I do see the 2:1 max clock ratios very clearly at 
the same times.  An example of this is the Spectrum using a Z80A (4MHz
max) and the C64 using a 2MHz 6502; keep in mind the C64 arrived a little
later in the US than the Spectrum in the UK.

> 	But the 6502 instruction set is very RISC-ish, if you look on zero
> page locations as a bank of 256 8-bit registers that can be accessed as a
> stack or directly, with words not requiring alignment, which is the reason

Yes, this is how I view page 0 as well and why I compare page 0 to the on
chip z80 registers.

But how many subroutines that you write need all 256 registers?  There is 
a point of diminishing returns. In my experience, for
general tasks, the variables needed within loops (where most time is
spent) and within computationally intensive subroutines can usually be
kept in the z80's registers.  An example where this is not possible was in
the divide subroutine shown here - in this case, the missing variables
were stored in the stack in a sequential manner, something that can be
quickly accessed by a z80.  As long as variables can be stored in memory
sequentially, you won't see the 4:1 cycle ratio you expect and will likely
see ratios of ~2:1 (Keep in mind stack operations are 16 bit: cost of
recovery and storage per byte is 5.5 cycles vs the 6502's zero page store
of 5 cycles - 1:1, and 8 bit memory operations using HL are 7 cycles).

In the cases where a page 0 is needed, the z80 can also emulate a page
0 anywhere in RAM that is 256byte aligned using the HL register:

LD H,high byte of bottom of page 0
then L serves the same function as the 6502's index register x/y

STA d,x      LD (HL),r   (7)  ; r could be A
LDA d,x      LD r,(HL)   (7)
ORA d,x      OR (HL)     (7)
AND d,x      AND (HL)    (7)
ASL d,x      SLA (HL)    (15)  - arithmetic shift left; all the shift
                                 instructions have similar cycle times
CMP d,x      CP (HL)     (7)
ADC d,x      ADD A,(HL)  (7)
INX          INC L       (4)
DEX          DEC L       (4)

The displacements 'd' can be done with:  LD A,n; ADD A,L; LD L,A
(14 cycles) creating a ratio of ~4.2:1 for the instructions above.
This is a brute force exact emulation of the 6502 page 0.  This
is your 4:1 number popping up if the z80 acts like a 6502.

On a z80, it would be smarter to arrange variables so that they are
accessed sequentially.  Then the expense of the 'd' is a DEC L/INC L
leading to a ratio of ~2.2:1.  If this is inside the loop, at the end of
the loop to move back to d=0, add a -ve number to HL: 11 cycles.

When using page 0 as a random access table (ie when displacements 'd' are
calculated as in the fast multiply), the ratio is more like
1.4:1 (for the memory access alone; other things like calculating
displacement will cause the result to move toward 2:1 - the validity of
this will be tested in comparing the two versions).
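
The three ratios quoted above can be reproduced with simple arithmetic.  The sketch below is a reconstruction, not the poster's stated method: it assumes 7 cycles for a Z80 access through (HL), 5 cycles for the matching 6502 zero-page access, and the quoted displacement costs.  The names are invented for the example.

```python
HL_ACCESS = 7     # Z80: LD r,(HL), OR (HL), CP (HL), ...
ZP_ACCESS = 5     # assumed 6502 figure; the ratios only match with 5


def cycle_ratio(displacement_cost):
    """Z80:6502 cycle ratio for one emulated page-0 access."""
    return (displacement_cost + HL_ACCESS) / ZP_ACCESS


brute_force = cycle_ratio(14)   # LD A,n ; ADD A,L ; LD L,A as quoted -> 4.2
sequential = cycle_ratio(4)     # one INC L or DEC L                  -> 2.2
random_access = cycle_ratio(0)  # L already holds the index           -> 1.4

# Summing the usual per-instruction timings (7 + 4 + 4) gives 15 rather
# than 14 for the displacement sequence, which would make this 4.4.
brute_force_15 = cycle_ratio(15)
```

The result is sensitive to the 6502 figure.  Published timing tables list 4 cycles for LDA and STA with zero page,X addressing, and with ZP_ACCESS = 4 all three ratios come out higher.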

You can also use DE as a second independent page 0 or in the same 256 byte
area to emulate the 6502's other index register since the z80 has a quick
EX DE,HL (4 cycles) instruction.

When does the 6502's page 0 have significant advantage over a z80 (ie
achieve 3:1 or higher cycle ratios)? When there are a lot of variables
inside a loop or when there is a non-sequential and non-random small
table (<256 bytes) lookup.  I don't think this applies to a lot of
problems.

> that for the tasks that people were thinking of in the early 80's, a 1MHz
> 6502 was considered roughly equivalent to a 4MHz Z80.  You do something
> like check which one of four strings is matched first, and the string 
> locations are redirectable, and the type of

(not page 0 related):

>   -	lda (source),y          LD A,L; EXX; LD L,A; LD A,(HL); EXX 23

a=contents of memory location (y+source)?

 
> 	cmp (target1),y         LD H,D; CP (HL)     11
> 	beq +                   JR Z,+              12/7
> 	cmp (target2),y         LD H,E; CP (HL)     11
> 	beq ++                  JR Z,++             12/7
> 	cmp (target3),y         LD H,B; CP (HL)     11
> 	beq +++                 JR Z,+++            12/7
> 	cmp (target4),y         LD H,C; CP (HL)     11
> 	beq ++++                JR Z,++++           12/7
> 	iny                     INC L               4
> 	bne -                   JP NZ,-             10/10

The above requires that source, targetn all start on a 256 byte
boundary and are also subject to the same restrictions as the 
6502: max 256 byte length.  This has 5 sources involved, hence the
use of the alternate registers to store the 5th source.  Using 
the alternate registers, up to 8 sources can be involved at which time
the cost of the LD A,L; EXX; LD L,A; EXX becomes small.

The compares & failed branch occur in ratios of 18:7? = 2.6:1
neglecting the requirement to switch to the alternate set for the
5th source.  The major component in the 2.6:1 ratio is the time for 
a failed branch - a hint that when there is a lot of decision making, the
6502 has an advantage.
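
For reference, the per-iteration totals behind these ratios can be added up as follows.  The Z80 figures are the ones in the listing above; the 6502 figures (5 for lda/cmp (zp),y with no page crossing, 2 for a branch not taken, 3 for a branch taken, 2 for iny) are the usual published timings and are assumptions as far as this thread is concerned.

```python
# Cycles for one pass through the loop when no target matches.
Z80 = {
    "fetch": 4 + 4 + 4 + 7 + 4,   # LD A,L ; EXX ; LD L,A ; LD A,(HL) ; EXX
    "compare": 4 + 7,             # LD H,r ; CP (HL)
    "branch_fail": 7,             # JR Z not taken
    "next": 4 + 10,               # INC L ; JP NZ
}
MOS6502 = {
    "fetch": 5,                   # lda (source),y
    "compare": 5,                 # cmp (targetN),y
    "branch_fail": 2,             # beq not taken
    "next": 2 + 3,                # iny ; bne taken
}


def per_iteration(cpu, targets=4):
    """Total cycles for one loop pass with the given number of targets."""
    per_target = cpu["compare"] + cpu["branch_fail"]
    return cpu["fetch"] + targets * per_target + cpu["next"]


compare_ratio = (Z80["compare"] + Z80["branch_fail"]) / (
    MOS6502["compare"] + MOS6502["branch_fail"])           # 18 / 7
loop_ratio = per_iteration(Z80) / per_iteration(MOS6502)   # 109 / 38
```

Under these assumptions the compare-plus-failed-branch ratio is 18:7 (about 2.6:1) and the whole loop is 109:38 (about 2.9:1) once the fetch through the alternate register set is included.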

I'm not sure what this is used for, but I would likely have combined
the four strings into a single deterministic automaton so I would only
need to do one compare rather than 4 simultaneously.  The overhead would
have to be compensated for and that would likely be the case as #
strings>8 (max with alternate register set).  Then the even 256 byte
boundary restriction would also be lost.

> 	OTOH, the reason that it is still being used today is because of
> its efficiency. <4,000 gates leaves a hell of a lot of real estate on a
> chip mask: plenty for ample ROM, a good chunk of RAM for some zero page
> and stack page locations, a couple three special support circuits on the
> level of a VIA.

Efficiency?  Low gate count you mean.  The z80 is around because it is the
favoured discrete choice for controllers.

> 	The first portable 8-bit microcomputer disk operating system had
> nothing to do with it?

:-) Maybe a little.  But the 8080 bandwagon was already rolling before 
CP/M arrived.


Alvin




Article 70019 of comp.sys.cbm:
Path: news.acns.nwu.edu!merle!judd
From: judd@merle.acns.nwu.edu (Stephen Judd)
Newsgroups: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Subject: Re: Spectrum Emulator for C64 [additonal translation]
Date: 21 Jun 1997 18:12:59 GMT
Organization: Northwestern University, Evanston, IL
Lines: 100
Message-ID: <5oh5jb$62i@news.acns.nwu.edu>
References: <337C5E94.388@actcom.co.il> <5nunbg$t3e@ds2.acs.ucalgary.ca> <5o4iuv$a5s@news.acns.nwu.edu> <5o73r9$mrs@ds2.acs.ucalgary.ca>
Reply-To: sjudd@nwu.edu (Stephen Judd)
NNTP-Posting-Host: merle.acns.nwu.edu
Xref: news.acns.nwu.edu comp.sys.sinclair:40272 comp.sys.cbm:70019 comp.emulators.cbm:21977

In article <5o73r9$mrs@ds2.acs.ucalgary.ca>,
Alvin R. Albrecht  wrote:
>On 16 Jun 1997, Stephen Judd wrote:
>
>> Let me put it this way: do you at least understand that OTHER people
>> might find a reduction of 270,000 cycles down to 6-8 cycles as an advantage?
>
>Oh yes, don't get me wrong here.  All the C64 hardware features will
>take much, much longer when done in software on another machine.  But
>there seems to be a block in your thinking: I don't *need* it done that 
>fast.  At a 256x192 resolution, anything done faster than
>~10 fps isn't really required to fool the eye into thinking everything is
>done smoothly (as it's done for real on a C64 with its 50/60 fps).  What

First and foremost, you have just said, "Who cares if the C64 is five 
times faster, you don't really need that speed."  I accept the
acknowledgement.

As to why more speed/free cycles is/are useful, I take it as a wholly 
obvious concept which needs no commentary, unless you think that an 
18 MHz Spectrum is no more useful than a 3.5 MHz one.

More to the point, though, you and several others have made a number of
assertions concerning the relative merits of the Z80 and 6510, as
well as the Spectrum and C64.  There are certainly valid questions
underlying them, so since you've brought them up we ought to
investigate them in more detail.  They can be summarized as:

	1. A Z80 is much faster/more powerful than a 6510
	2. Programs running on the Spectrum's 3.54 MHz Z80 are always
	   much faster than the C64's 1MHz 6510.
	3. Because of this enormous speed advantage, the Spectrum can
	   more than make up for the C64's graphics hardware.
	4. Because of this enormous speed advantage, things like
	   3D graphics are much faster on the Spectrum.

The support for these statements has, to date, been entirely underwhelming.
I think you would do better to code up a fast multiply and a string
print routine and go from there.

>were the frame rates in old movies? 12fps?

18 fps.

Once again, we may meditate in silence upon the wisdom and insight of some
more statements:

>Do you honestly doubt us when we say that there are thousands of arcade
>games written for the Spectrum with frame rates at minimum ~10fps that
>look good?!

>But I fail
>to see how you can continue to suggest that software sprites aren't
>capable of doing the same things as the C64's hardware sprites given
>the wealth of evidence in the software archives.

>Another point I'd like you to consider is what is all that hardware useful
>for?

>If I only need
>a 10fps rate, what's the use in having an 8 cycle version versus a 
>270000 cycle version?

>> This is not only a useless number, it is a misleading one.  For instance,
>> what would be the frame rate if the rest of the calculations took 12 frames
>> to complete?  How about if they could be done at 4fps?
>
>I don't think so, though.  You can tell me:  I understood that all the
>text character bitmaps were predrawn before the animation (and if
>not, why not?).  Then the 10-12 fps is quite certain.  If not, how
>many 6502 cycles to do this?

>> I detect a major lack of comprehension.  You can set VIC to trigger an
>> interrupt on a particular scan line.  I won't belittle your intelligence
>> by pedantically explaining the significance of that statement.
>
>I know very well what all those features are and what they can do: 

If you know what an interrupt is, why did you think the 64 was busy-waiting?

>Let's not let this discussion, which had become quite pleasant and
>enlightening, degrade into name calling and questioning of "intelligence".

As enormously frustrating and annoying as you can be, the statement
was a sincere one, and no offense was intended.

>anything I say is open for constructive criticism and correction of
>errors.

Now why should I sit around correcting your errors when with a small
investment of time and thought you could have avoided them in the
first place?

>Spectrum, using a disputable 1.5:1 cycle ratio for the task, the
>3.54MHz z80 will be 2.36 times faster than the 1MHz 6502.  That's
>the difference between an acceptable 10fps and an unacceptable 4.2fps.

Talk is cheap, bub: a tool wielded by geniuses and fools alike.

-S


Article 70020 of comp.sys.cbm:
Path: news.acns.nwu.edu!merle!judd
From: judd@merle.acns.nwu.edu (Stephen Judd)
Newsgroups: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Subject: Re: Shootout at the 0K Corral (was various other things)
Date: 21 Jun 1997 18:18:51 GMT
Organization: Northwestern University, Evanston, IL
Lines: 14
Message-ID: <5oh5ub$676@news.acns.nwu.edu>
References: <337C5E94.388@actcom.co.il> <11279.imc@comlab.ox.ac.uk> <5o7js3$fuq@news.acns.nwu.edu> <11302.imc@comlab.ox.ac.uk>
Reply-To: sjudd@nwu.edu (Stephen Judd)
NNTP-Posting-Host: merle.acns.nwu.edu
Xref: news.acns.nwu.edu comp.sys.sinclair:40273 comp.sys.cbm:70020 comp.emulators.cbm:21978

In article <11302.imc@comlab.ox.ac.uk>, Ian Collier  wrote:
>In article <5o7js3$fuq@news.acns.nwu.edu>, sjudd@nwu.edu (Stephen Judd) wrote:
>>By the way, I didn't understand before that the Spectrum's horizontal
>>resolution was only 256 characters,
>
>Or pixels, as we call them. :-)

Actually I just didn't mention that these characters were one pixel
wide :).

Heh heh, luckily my code is a little better than my grammatical skills
late at night (I hope!).

-S


Article 70022 of comp.sys.cbm:
Path: news.acns.nwu.edu!merle!judd
From: judd@merle.acns.nwu.edu (Stephen Judd)
Newsgroups: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Subject: Re: Spectrum Emulator for C64
Date: 21 Jun 1997 18:39:07 GMT
Organization: Northwestern University, Evanston, IL
Lines: 33
Message-ID: <5oh74b$6k5@news.acns.nwu.edu>
References: <33845f94.1768387@commodore64.com> <5o75oi$14hc@ds2.acs.ucalgary.ca>  <5od1oj$bgg@ds2.acs.ucalgary.ca>
Reply-To: sjudd@nwu.edu (Stephen Judd)
NNTP-Posting-Host: merle.acns.nwu.edu
Xref: news.acns.nwu.edu comp.sys.sinclair:40275 comp.sys.cbm:70022 comp.emulators.cbm:21981

In article <5od1oj$bgg@ds2.acs.ucalgary.ca>,
Alvin R. Albrecht  wrote:
>
>But how many subroutines that you write need all 256 registers?

Sigh...

>There is 
>a point of diminishing returns. In my experience, for

Ah, an interesting point: what does your experience in these matters
consist of?  That is, what kinds of programs have you written, and
how large/involved were they?  Z80 assembly programs of course.

>This
>is your 4:1 number popping up if the z80 acts like a 6502.

You just don't seem to get it: Bruce is saying, "Based on the
general experience from a wide variety of people who spent a lot
of time programming both architectures, the general rule was 4:1".

This is very different from your "Based on a few glances at a spec
sheet and these simple examples and thought problems which exactly
prove my point, it's clear that the Z80 ought to be <2:1".
(How very Aristotelian of you!)

It is also, of course, different from my "Based on a few glances
at a spec sheet and my experience in writing substantial programs
and algorithms, I expect between 3:1 and 4:1, so let's write some
code to see what's what".

-S



Article 70021 of comp.sys.cbm:
Path: news.acns.nwu.edu!merle!judd
From: judd@merle.acns.nwu.edu (Stephen Judd)
Newsgroups: comp.sys.sinclair,comp.sys.cbm,comp.emulators.cbm
Subject: Re: Spectrum Emulator for C64
Date: 21 Jun 1997 18:25:26 GMT
Organization: Northwestern University, Evanston, IL
Lines: 26
Message-ID: <5oh6am$6dd@news.acns.nwu.edu>
References: <33845f94.1768387@commodore64.com> <5o1c2n$12lc@ds2.acs.ucalgary.ca> <5o4k56$aht@news.acns.nwu.edu> <5o75oi$14hc@ds2.acs.ucalgary.ca>
Reply-To: sjudd@nwu.edu (Stephen Judd)
NNTP-Posting-Host: merle.acns.nwu.edu
Xref: news.acns.nwu.edu comp.sys.sinclair:40274 comp.sys.cbm:70021 comp.emulators.cbm:21980

In article <5o75oi$14hc@ds2.acs.ucalgary.ca>,
Alvin R. Albrecht  wrote:
>On 17 Jun 1997, Stephen Judd wrote:
>
>Stephen, your doubt that software sprites exist is very questionable

The concept that they can and do exist is obvious, and not the
object of contention.

>> The claims about the merits of the Z80 over the 6510 and the Spectrum over
>> the C64 are so repetitious and insistent that I would think you'd jump at 
>> the chance to conclusively demonstrate their true merit, in a form plain
>> for people all over the world to see.
>
>It is being shown here slowly.  So far, everything is being done at <2:1

Oh dear...

Two decontextualized toy problems, one of which happens to have an
instruction specifically designed for it, constitute an "everything"?
You ought to go through the third problem, which is the first one
to come close to being a genuine application, and count some few cycles.

But first you should write the fast multiply and string print routine.

-S


Article 69845 of comp.sys.cbm:
Path: news.acns.nwu.edu!newsfeed.acns.nwu.edu!news.ece.nwu.edu!news.cse.psu.edu!uwm.edu!vixen.cso.uiuc.edu!ais.net!europa.clark.net!feed1.news.erols.com!news.ecn.uoknor.edu!munnari.OZ.AU!metro!metro!seagoon.newcastle.edu.au!cc.newcastle.edu.au!ecbm
From: "Bruce R. McFarling" 
Newsgroups: comp.sys.cbm,comp.sys.sinclair,comp.emulators.cbm
Subject: Re: That Z80 line routine
Date: Thu, 19 Jun 1997 16:33:33 +1000
Organization: The University of Newcastle
Lines: 35
Message-ID: 
References: <5o00p9$gi9@news.acns.nwu.edu> <5o2e3j$f5p@news.acns.nwu.edu> <11304.imc@comlab.ox.ac.uk> <5o9uu7$d8f@news.acns.nwu.edu>
NNTP-Posting-Host: cc.newcastle.edu.au
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
In-Reply-To: <5o9uu7$d8f@news.acns.nwu.edu>
Xref: news.acns.nwu.edu comp.sys.cbm:69845 comp.sys.sinclair:40147 comp.emulators.cbm:21888

On 19 Jun 1997, Stephen Judd wrote:
 
> The 6510 signals subtraction underflow by clearing the carry flag, and
> I assume the Z80 does something similar, so the CP H gets taken care of
> automatically by the subtraction.  And since you're subtracting, you never
> have problems with overflow, so the JR C,Diag isn't necessary either.

	Is this right? It's been a decade, but I would have sworn that the
Z80 used a straight carry as its borrow flag, rather than the 6502's
inverted carry.  One of those things you can do when you let the processor
tick over faster than your external memory access.

	NEGATE is (XOR #$FF) + 1

so if you add without a carry:

	XOR #$FF alone is 1 less than the negative (same as a borrow)

and if you add with a carry:

	(XOR #$FF) + 1 from the carry

is the negative of the original.  So to subtract in 2's complement, you
can just invert all of the appropriate signals, use the add-with-carry
circuit, and invert the carry for a borrow.  Hence, inverted carry as
borrow for subtract.

	But, AFAIR, the Z80 doesn't do this: it does it the long way
around (hence the high gate count).
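
To make the two carry conventions concrete, here is a minimal Python
sketch (not part of the original posts).  The function names sbc_6502
and sub_z80 are made up for illustration.  sbc_6502 follows the
invert-and-add scheme described above; sub_z80 models only what the
programmer sees in the carry flag after a Z80 SUB, and makes no claim
about how the Z80's ALU arrives at it.

```python
def sbc_6502(a, m, carry_in):
    """6502-style SBC: add the inverted operand plus the carry-in.

    Returns (result, carry_out).  carry_out == 1 means NO borrow.
    """
    total = a + (m ^ 0xFF) + carry_in
    return total & 0xFF, total >> 8


def sub_z80(a, m):
    """Z80-style SUB as seen by the programmer: carry is SET on borrow."""
    result = (a - m) & 0xFF
    carry = 1 if a < m else 0
    return result, carry


# On the 6502 you SEC before SBC: a carry-in of 1 means "no borrow pending".
print(sbc_6502(5, 3, 1))   # (2, 1)    carry set   = no borrow
print(sbc_6502(3, 5, 1))   # (254, 0)  carry clear = borrow
print(sub_z80(5, 3))       # (2, 0)    carry clear = no borrow
print(sub_z80(3, 5))       # (254, 1)  carry set   = borrow
```

Both give the same 8-bit result; only the sense of the carry differs,
which is why a branch on carry after a subtract or compare has to be
flipped when moving a routine between the two processors.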

Virtually,

Bruce R. McFarling, Newcastle, NSW
ecbm@cc.newcastle.edu.au



Article 70023 of comp.sys.cbm:
Path: news.acns.nwu.edu!merle!judd
From: judd@merle.acns.nwu.edu (Stephen Judd)
Newsgroups: comp.sys.cbm,comp.sys.sinclair,comp.emulators.cbm
Subject: Re: That Z80 line routine
Date: 21 Jun 1997 18:43:09 GMT
Organization: Northwestern University, Evanston, IL
Lines: 16
Message-ID: <5oh7bt$6kn@news.acns.nwu.edu>
References: <5o00p9$gi9@news.acns.nwu.edu> <11304.imc@comlab.ox.ac.uk> <5o9uu7$d8f@news.acns.nwu.edu> 
Reply-To: sjudd@nwu.edu (Stephen Judd)
NNTP-Posting-Host: merle.acns.nwu.edu
Xref: news.acns.nwu.edu comp.sys.cbm:70023 comp.sys.sinclair:40276 comp.emulators.cbm:21982

In article ,
Bruce R. McFarling  wrote:
>On 19 Jun 1997, Stephen Judd wrote:
> 
>> The 6510 signals subtraction underflow by clearing the carry flag, and
>> I assume the Z80 does something similar, so the CP H gets taken care of
>> automatically by the subtraction.  And since you're subtracting, you never
>> have problems with overflow, so the JR C,Diag isn't necessary either.
>
>	Is this right?

Since it was a question based on my deep and profound general ignorance
on most matters, it is probably not, indeed is not, right. :)

>Bruce R. McFarling, Newcastle, NSW
>ecbm@cc.newcastle.edu.au
Source: geocities.com/tutorman_2000/z80
