Why Assembly?


Upload: carolyn-mcmahon

Post on 31-Dec-2015


DESCRIPTION

Why Assembly? Speed; not affected by compiler optimization. Registers that can be used without saving: r0, r18-r25, r26-r27 (X), r30-r31 (Z), r1 (must be cleared before returning). Assembler function arguments: allocated left to right (r25 to r18), even register aligned.

TRANSCRIPT

Page 1: Why Assembly?

Speed

Not affected by compiler optimization

Page 2: Why Assembly?

Registers that can be used without saving

r0

r18-r25

r26-r27 (X)

r30-r31 (Z)

r1 (must be cleared before returning)

Page 3: Why Assembly?

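Assembler function arguments

Arguments allocated left to right (r25 to r18)

Even register aligned: each argument starts in an even-numbered register, so an 8-bit argument still takes up a register pair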
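The allocation rule above can be sketched as a small host-side C model (plain C, not AVR code). The function name assign_arg_registers and both constants are hypothetical names chosen for illustration. The lower bound of r18 follows the slide, and treating every later argument as unavailable once one no longer fits is an assumption about the compiler's behaviour, not something the slides state.

```c
#include <assert.h>
#include <stddef.h>

#define FIRST_FREE 26   /* one above r25, where allocation starts */
#define LOWEST_REG 18   /* lower bound as given on the slide (assumption) */

/* For each argument size in bytes, store in low_reg[i] the number of the
 * register holding that argument's least significant byte, or -1 if the
 * argument does not fit in registers under this model. */
void assign_arg_registers(const size_t *sizes, size_t count, int *low_reg)
{
    int next = FIRST_FREE;

    for (size_t i = 0; i < count; i++) {
        assert(sizes[i] > 0 && sizes[i] <= 8);

        /* Round the size up to an even number of registers. */
        int slot = (int)((sizes[i] + 1) & ~(size_t)1);

        if (next >= 0 && next - slot >= LOWEST_REG) {
            next -= slot;
            low_reg[i] = next;      /* argument occupies next .. next+size-1 */
        } else {
            low_reg[i] = -1;
            next = -1;              /* assumption: later arguments do not fit either */
        }
    }
}
```

For a call such as subit(uint32_t, uint8_t), with sizes 4 and 1, the model gives low registers 22 and 20, that is r25:r22 for the first argument and r20 for the second, which agrees with the register comments in the subit listing on Page 6.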

Page 4: Why Assembly?

Argument Registers

8-bit r24

16-bit r25:r24

32-bit r25:r24:r23:r22

64-bit r25:r24:r23:r22:r21:r20:r19:r18

Page 5: Why Assembly?

Data length registers

8-bit r24

16-bit r25:r24

32-bit r25:r24:r23:r22

64-bit r25:r24:r23:r22:r21:r20:r19:r18

Return size Registers

8-bit r24

16-bit r25:r24

32-bit r25:r24:r23:r22

64-bit r25:r24:r23:r22:r21:r20:r19:r18

Page 6: Why Assembly?

#include <avr/io.h>

        .text
        .global subit

subit:
        sub     r22, r20        ; subtract b (r20) from ul (r25-r22)
        sbc     r23, r1         ; .. NOTE: gcc makes sure r1 is always 0
        sbc     r24, r1         ; ..
        sbc     r25, r1         ; ..
        ret
        .end

uint32_t subit(uint32_t ul, uint8_t b)
{
    return ul - b;
}
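To make the borrow chain in the assembly version easier to follow, here is a minimal host-side C sketch that mimics the sub/sbc sequence one byte at a time. The name subit_model is hypothetical, and the array r[] merely stands in for registers r22 to r25; it is an illustration of the carry propagation, not code for the AVR itself.

```c
#include <assert.h>
#include <stdint.h>

/* Byte-wise model of: sub r22,r20 / sbc r23,r1 / sbc r24,r1 / sbc r25,r1 */
uint32_t subit_model(uint32_t ul, uint8_t b)
{
    /* r[0]..r[3] stand in for r22..r25 (least significant byte first). */
    uint8_t r[4] = {
        (uint8_t)(ul & 0xFF),
        (uint8_t)((ul >> 8) & 0xFF),
        (uint8_t)((ul >> 16) & 0xFF),
        (uint8_t)((ul >> 24) & 0xFF)
    };

    /* sub r22, r20: subtract b and remember whether a borrow occurred. */
    unsigned borrow = r[0] < b;
    r[0] = (uint8_t)(r[0] - b);

    /* sbc rN, r1 with r1 == 0: subtract only the incoming borrow. */
    for (int i = 1; i < 4; i++) {
        unsigned next_borrow = r[i] < borrow;
        r[i] = (uint8_t)(r[i] - borrow);
        borrow = next_borrow;
    }

    uint32_t result = (uint32_t)r[0]
                    | ((uint32_t)r[1] << 8)
                    | ((uint32_t)r[2] << 16)
                    | ((uint32_t)r[3] << 24);

    /* The byte-wise result should match the plain C expression. */
    assert(result == (uint32_t)(ul - b));
    return result;
}
```

Calling subit_model(0x10000, 1) shows the borrow rippling through two bytes: the low byte wraps to 0xFF, the next byte does the same, and the third byte drops from 1 to 0.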

Page 7: Why Assembly?

#include <avr/io.h>

; defines the # of cpu cycles of overhead
; (includes the ldi r16,byte0; ldi r17,byte1; ldi r18, byte2,
; ldi r19, byte3, and the call _delay_cycles)
OVERHEAD = 24

; some register aliases
cycles0 = 22
cycles1 = 23
cycles2 = 24
cycles3 = 25
temp    = 19

        .text
        .global delay_cycles

delay_cycles:
;; subtract the overhead
        subi    cycles0,OVERHEAD        ; subtract the overhead
        sbc     cycles1,r1              ; ..
        sbc     cycles2,r1              ; ..
        sbc     cycles3,r1              ; ..
        brcs    dcx                     ; return if required delay too short

;; delay the lsb
        mov     r30,cycles0             ; Z = jtable offset to delay 0-7 cycles
        com     r30                     ; ..
        andi    r30,7                   ; ..
        clr     r31                     ; ..
        subi    r30,lo8(-(gs(jtable)))  ; add the table offset
        sbci    r31,hi8(-(gs(jtable)))  ; ..
        ijmp                            ; vector into table for partial delay
jtable:
        nop
        nop
        nop
        nop
        nop
        nop
        nop

;; delay the remaining delay
loop:
        subi    cycles0,8               ; decrement the count (8 cycles per loop)
        sbc     cycles1,r1              ; ..
        sbc     cycles2,r1              ; ..
        sbc     cycles3,r1              ; ..
        brcs    dcx                     ; exit if done
        nop                             ; .. add delay to make 8 cycles per loop
        rjmp    loop                    ; ..
dcx:
        ret
        .end

void delay_cycles(uint32_t cpucycles);
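The way the routine splits the delay between the jump table and the loop can be checked with a small host-side C sketch. It counts only the variable part of the delay: one cycle per nop reached through the jump table, plus 8 cycles per full pass through the loop. The name delay_model_variable_cycles is hypothetical; the sketch takes OVERHEAD = 24 from the listing as given and assumes the usual AVR timings (1 cycle for nop, subi, sbc and an untaken brcs, 2 cycles for rjmp). It does not verify that 24 really covers all of the fixed costs.

```c
#include <assert.h>
#include <stdint.h>

#define OVERHEAD 24u    /* fixed cost, taken from the listing as-is */

/* Returns the number of cycles spent in the variable part of delay_cycles:
 * nops executed via the jump table plus 8 per full loop pass. */
uint32_t delay_model_variable_cycles(uint32_t cpucycles)
{
    /* subi/sbc/brcs dcx: return at once if the request is below the overhead. */
    if (cpucycles < OVERHEAD)
        return 0;

    uint32_t remaining = cpucycles - OVERHEAD;
    uint32_t requested = remaining;

    /* com r30 / andi r30,7: offset into the 7-entry nop table.
     * Jumping to jtable+offset executes (7 - offset) nops,
     * which is the same as remaining & 7. */
    uint32_t offset = (~remaining) & 7u;
    uint32_t spent = 7u - offset;

    /* loop: each pass without a borrow costs 8 cycles. */
    while (remaining >= 8u) {
        remaining -= 8u;
        spent += 8u;
    }

    /* Low three bits from the table + multiples of 8 from the loop. */
    assert(spent == requested);
    return spent;
}
```

For a request of 100 cycles, the model leaves 76 after the overhead: 4 nops from the table (76 & 7) and 9 loop passes of 8 cycles, for 76 variable cycles in total.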