![Page 1: Design Exploration of a Human-machine Interface (HMI) Application](https://reader033.vdocument.in/reader033/viewer/2022051021/5681682a550346895dddbda9/html5/thumbnails/1.jpg)
Design Exploration of a Human-machine Interface (HMI) Application
Francis Li
Sam Madden
![Page 2: Design Exploration of a Human-machine Interface (HMI) Application](https://reader033.vdocument.in/reader033/viewer/2022051021/5681682a550346895dddbda9/html5/thumbnails/2.jpg)
The Application
• Data glove interface
  – Wired, bulky
• SmartDust scenario
  – A mote on each fingertip
• Investigate implementations
• Explore design alternatives
![Page 3: Design Exploration of a Human-machine Interface (HMI) Application](https://reader033.vdocument.in/reader033/viewer/2022051021/5681682a550346895dddbda9/html5/thumbnails/3.jpg)
Proof-of-Concept Prototype
• By SmartDust group
  – Atmel AVR microprocessor
  – RFM TR1000 radio
  – 6 accelerometers
  – Host PC performs processing
• Analysis
  – Power: 45 mW measured
  – Continuous operation of processor, accelerometers, communication with host
![Page 4: Design Exploration of a Human-machine Interface (HMI) Application](https://reader033.vdocument.in/reader033/viewer/2022051021/5681682a550346895dddbda9/html5/thumbnails/4.jpg)
Application Analysis
• Processing (on PC)
  – Do 20 times per second, for each accelerometer
    • Read in X and Y samples (10 bits each)
    • Compute rolling average to smooth input data
    • Convert averages to polar coordinates
      – Dominant cost: sqrt, acos, atan
      – Secondary cost: floating point operations
  – Periodically, calculate gesture via simple template matching (static hand positions)
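The processing steps above can be sketched in a few lines. This is only an illustration under assumptions: the window size, the use of `atan2`/`hypot` in place of the slide's sqrt/acos/atan formulation, the squared-error metric, and all names (`RollingAverage`, `to_polar`, `closest_gesture`) are hypothetical and not taken from the prototype code.

```python
import math
from collections import deque


class RollingAverage:
    """Moving average over the most recent `window` samples (window size assumed)."""

    def __init__(self, window):
        self.samples = deque(maxlen=window)

    def update(self, value):
        # Append the new sample; the deque drops the oldest one when full.
        self.samples.append(value)
        return sum(self.samples) / len(self.samples)


def to_polar(x, y):
    """Return (magnitude, angle in radians) for a smoothed X/Y pair."""
    return math.hypot(x, y), math.atan2(y, x)


def closest_gesture(angles, templates):
    """Pick the template with the smallest squared error (assumed metric)."""
    def error(name):
        return sum((a - t) ** 2 for a, t in zip(angles, templates[name]))
    return min(templates, key=error)


# Hypothetical static hand positions, one angle per sensor axis.
templates = {"fist": [0.0, 1.6], "open": [1.5, 0.0]}
smooth_x = RollingAverage(window=4)
print(to_polar(3.0, 4.0)[0])                  # magnitude of the (3, 4) vector
print(closest_gesture([0.1, 1.5], templates))
```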
![Page 5: Design Exploration of a Human-machine Interface (HMI) Application](https://reader033.vdocument.in/reader033/viewer/2022051021/5681682a550346895dddbda9/html5/thumbnails/5.jpg)
Application Analysis (cont)
• Communication (from Atmel to PC)
  – 20 samples/sec × 6 accelerometers × 4 bytes/sample = 480 bytes/sec
  – 115.6 kb/sec RF link
  – Radio = 12 mA @ 3 V, when transmitting → 1.2 mW for radio alone
• Real-world power >> 1.2 mW, due to software and analog overhead (real-world analysis later)
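One reading that reproduces the 1.2 mW figure is to scale the radio's transmit power by the fraction of time the link is busy. The slide does not state how the number was derived, so the duty-cycle interpretation below is an assumption, as is reading "115.6 kb/sec" as 115,600 bits/s.

```python
SAMPLE_RATE_HZ = 20
ACCELEROMETERS = 6
BYTES_PER_SAMPLE = 4
LINK_RATE_BPS = 115_600      # assumed reading of "115.6 kb/sec"
RADIO_CURRENT_A = 0.012      # 12 mA while transmitting
SUPPLY_V = 3.0

bytes_per_sec = SAMPLE_RATE_HZ * ACCELEROMETERS * BYTES_PER_SAMPLE
bits_per_sec = bytes_per_sec * 8

# Fraction of each second the radio must spend transmitting (assumed model).
duty_cycle = bits_per_sec / LINK_RATE_BPS

tx_power_mw = RADIO_CURRENT_A * SUPPLY_V * 1000   # power while transmitting
avg_radio_mw = tx_power_mw * duty_cycle           # averaged over time

print(bytes_per_sec, round(duty_cycle, 4), round(avg_radio_mw, 2))
```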
![Page 12: Design Exploration of a Human-machine Interface (HMI) Application](https://reader033.vdocument.in/reader033/viewer/2022051021/5681682a550346895dddbda9/html5/thumbnails/12.jpg)
Optimization Process
• Match Application to HW
  – Local computation to reduce communication
  – Floating point → Fixed point
• Match Hardware to Application
  – Distributed vs. Centralized
  – TI vs. Atmel
  – DSP
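The floating-point to fixed-point item can be illustrated with a small Q-format sketch. The Q8 layout (8 fractional bits) and the helper names are assumptions chosen for illustration; the slides do not say which format, if any, was used.

```python
FRAC_BITS = 8            # assumed Q8 format: real value = raw / 256
ONE = 1 << FRAC_BITS


def to_fixed(x):
    """Convert a real number to its Q8 integer representation."""
    return int(round(x * ONE))


def to_float(raw):
    """Convert a Q8 integer back to a real number."""
    return raw / ONE


def fixed_mul(a, b):
    """Multiply two Q8 values; the product carries 16 fractional bits, so shift back."""
    return (a * b) >> FRAC_BITS


def fixed_avg(raws):
    """Integer-only average of Q8 values (floor division)."""
    return sum(raws) // len(raws)


print(to_float(fixed_mul(to_fixed(1.5), to_fixed(2.25))))
print(to_float(fixed_avg([to_fixed(1.0), to_fixed(2.0)])))
```

Integer-only arithmetic like this avoids software floating-point emulation on a microcontroller without an FPU, at the price of limited range and precision.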
![Page 14: Design Exploration of a Human-machine Interface (HMI) Application](https://reader033.vdocument.in/reader033/viewer/2022051021/5681682a550346895dddbda9/html5/thumbnails/14.jpg)
Communication vs. Computation
• Estimates of local processing cost on Atmel (via simulation of GCC program)
  – Loop 6 × 20 / sec
    • Average: 2223 instr. × 2
    • CalcPolar: 19017 instr.
    → 2.83 × 10^6 instructions
  – Report gesture once per second
    • FindGestureError: 5444 instr.
    • 10 gestures, 6 accelerometers: 5444 × 60 → 3.26 × 10^5 instr.
• Memory operations are 2 cycles/instruction
• Total cycles ~ 3.7M → 4 MHz → 13.5 mW
• Communication = 8 bits/sec → negligible cost
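The instruction budget can be checked with a short calculation. Reading "× 2" as one Average call each for X and Y is an assumption. The products come out at about 2.82 × 10^6 and 3.27 × 10^5, within rounding of the slide's figures. The share of two-cycle memory operations is not given, so the sketch stops at instructions rather than cycles.

```python
AVERAGE_INSTR = 2223             # per call; assumed to run once each for X and Y
CALC_POLAR_INSTR = 19017
LOOPS_PER_SEC = 6 * 20           # 6 accelerometers, 20 samples/sec
FIND_GESTURE_ERROR_INSTR = 5444
GESTURES = 10
ACCELEROMETERS = 6

per_sample = 2 * AVERAGE_INSTR + CALC_POLAR_INSTR
sampling_instr = per_sample * LOOPS_PER_SEC

# Template matching runs once per second over every gesture and accelerometer.
matching_instr = FIND_GESTURE_ERROR_INSTR * GESTURES * ACCELEROMETERS

total_instr = sampling_instr + matching_instr
print(sampling_instr, matching_instr, total_instr)
```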
![Page 15: Design Exploration of a Human-machine Interface (HMI) Application](https://reader033.vdocument.in/reader033/viewer/2022051021/5681682a550346895dddbda9/html5/thumbnails/15.jpg)
Communication vs. Computation 2
• Cost of communication to Host PC (measured)
  – 4317 nJ/bit
  – From Culler, Hill, Szewczyk, Woo, “System Architecture For Networked Sensors.”
  – 4317 nJ/bit × 480 bytes/sec × 8 = 16.57 mW
• Processor still sucks power
  – Current implementation requires 13.5 mW
  – Using sleep, only 1.17 mW → 17.74 mW total
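The arithmetic behind these figures, as a minimal sketch using only the numbers on this slide (variable names are illustrative):

```python
ENERGY_PER_BIT_NJ = 4317     # measured cost per transmitted bit
BYTES_PER_SEC = 480          # raw samples streamed to the host PC
CPU_SLEEP_MW = 1.17          # processor power when sleep mode is used

bits_per_sec = BYTES_PER_SEC * 8

# nJ/bit * bits/s gives nW; convert to mW.
radio_mw = ENERGY_PER_BIT_NJ * bits_per_sec * 1e-6
total_mw = radio_mw + CPU_SLEEP_MW

print(round(radio_mw, 2), round(total_mw, 2))
```

The results land within rounding of the slide's 16.57 mW and 17.74 mW.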
![Page 17: Design Exploration of a Human-machine Interface (HMI) Application](https://reader033.vdocument.in/reader033/viewer/2022051021/5681682a550346895dddbda9/html5/thumbnails/17.jpg)
Distributed vs. Centralized
• Move some processing to each sensor
  – 6 processors
    • Each computing average, polar transform
    • Transmitting 4 × 8 = 32 bits once/second
• Using Atmel processor on each mote
  – Computation
    • ~0.5M cycles/sec → 2 mA @ 2.7 V → 5.4 mW
  – Communication
    • Very small: 4317 nJ/bit × 32 bits/sec = 0.13 mW
  – 5.53 mW/mote → 33.2 mW total (Bad idea!)
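A quick check of the per-mote budget, using only the figures above (names are illustrative):

```python
MOTES = 6
CPU_CURRENT_A = 0.002        # 2 mA at roughly 0.5M cycles/sec
CPU_VOLTAGE_V = 2.7
ENERGY_PER_BIT_NJ = 4317
BITS_PER_SEC = 4 * 8         # 32 bits sent once per second

compute_mw = CPU_CURRENT_A * CPU_VOLTAGE_V * 1000
radio_mw = ENERGY_PER_BIT_NJ * BITS_PER_SEC * 1e-6   # nW to mW
per_mote_mw = compute_mw + radio_mw
total_mw = MOTES * per_mote_mw

print(round(compute_mw, 2), round(radio_mw, 3), round(total_mw, 1))
```

Computation dominates: moving work onto six Atmel processors nearly eliminates the radio cost but multiplies the processor cost by six.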
![Page 19: Design Exploration of a Human-machine Interface (HMI) Application](https://reader033.vdocument.in/reader033/viewer/2022051021/5681682a550346895dddbda9/html5/thumbnails/19.jpg)
TI Microcontroller Evaluation
• A microcontroller with better specs
  – MSP430P112
    • 330 µA/MHz active mode
    • 1.5 µA standby (6 ns wakeup)
• Used IAR Systems compiler, profiler, development environment
• Analysis
  – Centralized, 3.3 V, 4 MHz: 3.8 mW
  – Distributed, 2.5 V, 1 MHz: 0.48 mW per mote
    • Six processors → 2.9 mW
![Page 21: Design Exploration of a Human-machine Interface (HMI) Application](https://reader033.vdocument.in/reader033/viewer/2022051021/5681682a550346895dddbda9/html5/thumbnails/21.jpg)
TI DSP Evaluation
• TMS320C54x
• Used TI Code Composer Studio, compiler, simulator
• Power
  – Active mode, 3.3 V, 10 MHz: 33 mW
  – IDLE1: 0.36 mW
• Analysis
  – Centralized: 7.8 mW
  – Distributed: 1.6 mW per mote
    • Six processors = 9.6 mW total
![Page 22: Design Exploration of a Human-machine Interface (HMI) Application](https://reader033.vdocument.in/reader033/viewer/2022051021/5681682a550346895dddbda9/html5/thumbnails/22.jpg)
TI DSP Evaluation Part 2
• TMS320C55x (two parallel MACs)
• Same tools, with C55x compiler, simulator
• Power: No details available...
  – Advertised: 0.9 V, 0.05 mW/MHz
• Analysis
  – Centralized: 1170240 cycles (vs. 2290440 on C54x)
    • 2 MHz: 0.1 mW
  – Distributed: 195040 cycles (vs. 381740 on C54x)
    • 1 MHz: 0.05 mW
    • Six processors: 0.3 mW total
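The C54x power figures on the previous slide are consistent with a simple duty-cycle model: run at full power for the cycles needed, idle the rest of the time. The slides do not state this model, so it is an inference, and it assumes the cycle counts are per second of operation. The C55x numbers follow directly from the advertised mW/MHz rating.

```python
def duty_cycled_power_mw(cycles_per_sec, clock_hz, active_mw, idle_mw):
    """Average power when the core is active for the needed cycles and idle otherwise."""
    duty = cycles_per_sec / clock_hz
    return duty * active_mw + (1 - duty) * idle_mw


# C54x: 33 mW active at 10 MHz, 0.36 mW in IDLE1 (cycle counts assumed per second).
c54x_central_mw = duty_cycled_power_mw(2_290_440, 10e6, 33.0, 0.36)
c54x_dist_mw = duty_cycled_power_mw(381_740, 10e6, 33.0, 0.36)

# C55x: advertised 0.05 mW/MHz, clocked just fast enough for the workload.
C55X_MW_PER_MHZ = 0.05
c55x_central_mw = C55X_MW_PER_MHZ * 2        # 2 MHz clock
c55x_dist_total_mw = C55X_MW_PER_MHZ * 1 * 6 # six motes at 1 MHz each

print(round(c54x_central_mw, 1), round(c54x_dist_mw, 1))
print(round(c55x_central_mw, 2), round(c55x_dist_total_mw, 2))
```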
![Page 23: Design Exploration of a Human-machine Interface (HMI) Application](https://reader033.vdocument.in/reader033/viewer/2022051021/5681682a550346895dddbda9/html5/thumbnails/23.jpg)
Other Explorations
• Hand-optimized code
  – Possible to massively reduce computation cost
  – FP/transcendentals conspicuously painful
  – Outside scope of our exploration
• Radio hardware
  – Bluetooth ~100 times more efficient
• Reconfigurable computing
• Other circuitry (e.g. accelerometers)
![Page 24: Design Exploration of a Human-machine Interface (HMI) Application](https://reader033.vdocument.in/reader033/viewer/2022051021/5681682a550346895dddbda9/html5/thumbnails/24.jpg)
Results Summary
• Cost, in mW, of various implementations

| | PC | Centralized | Distributed |
|---|---|---|---|
| Atmel | 17.74/28 | 13.5 | 33.2 |
| TI | - | 3.8 | 2.9 |
| DSP 1 | - | 7.8 | 9.6 |
| DSP 2 | - | 0.1 | 0.3 |

• PC column: 17.74 using sleep mode, 28 without
• 31/104% improvement with same hardware
• 170x improvement with new hardware
![Page 25: Design Exploration of a Human-machine Interface (HMI) Application](https://reader033.vdocument.in/reader033/viewer/2022051021/5681682a550346895dddbda9/html5/thumbnails/25.jpg)
Conclusions
• By finding better mappings among SW, HW, and application, big performance gains are possible.
• Effective use of local processor resources can reduce communication overheads, which are significant.
• DSPs and other specialized processors can be a big win and don’t require hand-coded assembly or reconfigurable design.