8/7/2007 ASM 2007 Tutorial Part 1 1
WELCOME
The 16th IASTED International Conference on
Applied Simulation and Modeling ~ ASM 2007
The Tutorial on
Modeling Cognitive Radio to Improve the Performance of Communications
by Y. B. Reddy
Grambling State University, USA
Contents
Part 1:
– Fundamentals of Wireless Communications
– Cognitive Radios
– Reinforcement Learning
– Genetic Algorithms
Part 2:
– Modeling with Reinforcement Learning
– Application of Genetic Algorithms
Part 3:
– Cognitive Radio Performance Analysis
  • Conventional Techniques
  • Game Theory Models
  • Case Study
Part 4:
– Future Research and Discussions
Part 1 Contents
Fundamentals of Wireless Communications
– Brief History of Communications
– Basics of Communications
– Concepts – Data Communications
– Concepts – Cellular Networks
Cognitive Radios
– History and Definition
– Main Functions
Reinforcement Learning
– History and Definition
– Elements of RL
– Markov Decision Process
– Q-Learning
– Channel Allocation Example
Genetic Algorithms
– History and Definition
– Operators
– Structure of GA
– Wireless Communication Example
Fundamentals of Wireless Communications: Brief History of Communications
1880 – 1929 Visionary Period
1930 – 1959 Golden Age
1960 – 1989 Wired, Zapped, and Beamed
1990 – current Digitally Networked
1835 Samuel Morse invents Morse code
1843 First long-distance electric telegraph line
1894 Guglielmo Marconi improves wireless telegraphy
1914 First cross-continental telephone call made
1934 Joseph Begun invents the first tape recorder for broadcasting
1938 Broadcasts able to be taped and edited, rather than only live
1939 Television broadcasts begin
1979 First cellular phone communication network started in Japan
1994 American Gov. releases control of the Internet and the WWW is born
Fundamentals of Wireless Communications: Brief History of Communications
Starting
– 1895: invention of the radio by Marconi
– 1901: trans-Atlantic communication
– 1928: Nyquist converted the continuous-time problem to a discrete-time problem. He showed that an infinite number of bits can be communicated in one continuous-valued symbol
– 1948: Claude Shannon thought of both information sources and channels as random and used probability models for them
Source → encoder → channel → decoder → destination
• Showed the universality of a digital interface between the source and the channel.
• Every source has an entropy rate H bits per second.
• Every channel has a capacity C bits per second.
• Reliable communication is possible if and only if H < C.
Fundamentals of Wireless Communications: Basics of Communications
Bit: binary digit. The smallest unit of data (0 or 1)
Bit rate: The number of bits transmitted per second
Signal element: The shortest section of a signal (time-wise) that represents a data element
Signal rate: The number of signal elements sent in one second
Baud rate: Another name for the signal rate, i.e., the number of signal elements transmitted per second. A signal element carries one or more bits.
Bandwidth: A range within a band of frequencies or wavelengths
– For an antenna: the range of frequencies within which the performance of the antenna, with respect to some characteristic, conforms to a specified standard (a 2.4–2.5 GHz antenna has 100 MHz bandwidth).
– For a channel: a measure of the capacity of a communications channel. The higher a channel's bandwidth, the more information it can carry.
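The relation between the bit rate and the baud rate defined above can be sketched as follows. This is a minimal illustration; the function name and the sample numbers are assumptions, not part of the tutorial.

```python
def bit_rate(baud_rate, bits_per_element):
    """Bit rate (bits/s) = signal elements per second x bits carried per element."""
    return baud_rate * bits_per_element


# Hypothetical link: 1000 signal elements per second, 4 bits per element.
print(bit_rate(1000, 4))
```

When every signal element carries exactly one bit, the bit rate and the baud rate coincide.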
Fundamentals of Wireless Communications: Basics of Communications
Analog signal: A continuous waveform that changes smoothly over time
Digital signal: A discrete signal with a limited number of values
Fundamentals of Wireless Communications: Basics of Communications
Phase: The relative position of a signal in time
Amplitude: The strength of a signal, usually measured in volts
Frequency: The number of cycles per second of a periodic signal
[Figure: Two signals with the same phase and frequency, but different amplitudes]
Frequency (f) and period (T) are the inverse of each other: $f = 1/T$ and $T = 1/f$
Fundamentals of Wireless Communications: Basics of Communications
Period: The amount of time required to complete one full cycle
[Figure: Two signals with the same amplitude and phase, but different frequencies]
[Figure: Three sine waves with the same amplitude and frequency, but different phases]
Fundamentals of Wireless Communications: Basics of Communications
Modulation: The process by which some characteristic of a higher-frequency wave is varied in accordance with the amplitude of a lower-frequency wave
Periodic signal: A signal that exhibits a repeating pattern
Nonperiodic signal: A signal with no repeating pattern (non-repetitive); used for digital signals
Digital signature: A method to authenticate the sender of a message
Attenuation: The loss of a signal's energy due to the resistance of the medium
Distortion: Any change in the signal due to noise, attenuation, or other interference
Noise: Random electrical signals that can be picked up by the transmission medium and cause degradation or distortion of the data
Fundamentals of Wireless Communications: Basics of Communications
General Information:
– Most communication systems are analog
– Analog signals of bandwidth W can be represented by 2W samples/s
– Channels of bandwidth W support transmission of 2W symbols/s
– Engineering designs are ad hoc, tailored for each specific application
Current Systems
– Current communication infrastructure is going fully digital
– Most modern communication systems are designed according to the principles laid down by Shannon
– Current communication architectures combine coding with signal processing algorithms
Questions:
– Is there a general methodology for designing communication systems?
– Is there a limit to how fast one can communicate?
Fundamentals of Wireless Communications: Concepts – Data Communications
If the composite signal is periodic, the decomposition gives a series of signals with discrete frequencies; if the composite signal is nonperiodic, the decomposition gives a combination of sine waves with continuous frequencies.
[Figure: A composite periodic signal]
Fundamentals of Wireless Communications: Concepts – Data Communications
[Figure: Decomposition of a composite periodic signal in the time and frequency domains]
Fundamentals of Wireless Communications: Concepts – Data Communications
[Table: Units of period and frequency]
Fundamentals of Wireless Communications: Concepts – Data Communications
Example: The power we use at home has a frequency of 60 Hz. The period of this sine wave can be determined as follows:
$T = 1/f = 1/60\ \text{s} \approx 0.0167\ \text{s} = 16.7\ \text{ms}$
Example: Express a period of 100 ms in microseconds. From the table of units of period and frequency we find the equivalents of 1 ms (1 ms is $10^{-3}$ s) and 1 s (1 s is $10^{6}$ µs). We make the following substitutions:
$100\ \text{ms} = 100 \times 10^{-3}\ \text{s} = 100 \times 10^{-3} \times 10^{6}\ \mu\text{s} = 10^{5}\ \mu\text{s}$
Frequency is the rate of change with respect to time. Change in a short span of time means high frequency. Change over a long span of time means low frequency.
• If a signal does not change at all, its frequency is zero.
• If a signal changes instantaneously, its frequency is infinite.
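The two unit calculations on this slide can be checked with a few lines of Python. This is a small sketch; the helper names are illustrative assumptions.

```python
def period_seconds(frequency_hz):
    """Period T = 1 / f, in seconds."""
    return 1.0 / frequency_hz


def ms_to_us(milliseconds):
    """Convert milliseconds to microseconds (1 ms = 1000 microseconds)."""
    return milliseconds * 1000


# 60 Hz household power: the period in milliseconds.
print(period_seconds(60) * 1000)
# A period of 100 ms expressed in microseconds.
print(ms_to_us(100))
```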
Fundamentals of Wireless Communications: Concepts – Data Communications
The bandwidth of a composite signal is the difference between the highest and the lowest frequencies contained in that signal
Fundamentals of Wireless Communications: Concepts – Data Communications
Example:
• If a periodic signal is decomposed into five sine waves with frequencies of 100, 300, 500, 700, and 900 Hz, what is its bandwidth? Draw the spectrum, assuming all components have a maximum amplitude of 10 V.
Solution
Let $f_h$ be the highest frequency, $f_l$ the lowest frequency, and $B$ the bandwidth. Then
$B = f_h - f_l = 900 - 100 = 800\ \text{Hz}$
The spectrum has only five spikes, at 100, 300, 500, 700, and 900 Hz.
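The bandwidth calculation in this example amounts to a one-line computation. A sketch, with an illustrative function name:

```python
def bandwidth_hz(component_frequencies_hz):
    """Bandwidth of a composite signal: highest minus lowest component frequency."""
    return max(component_frequencies_hz) - min(component_frequencies_hz)


# The five sine-wave components from the example above.
print(bandwidth_hz([100, 300, 500, 700, 900]))
```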
Fundamentals of Wireless Communications: Concepts – Data Communications
Representation of information through digital signals: a 1 can be encoded as a positive voltage and a 0 as zero voltage. A digital signal can have more than two levels. In this case, we can send more than 1 bit for each level.
Fundamentals of Wireless Communications: Concepts – Data Communications
Example: A digital signal has eight levels. How many bits are needed per level? We calculate the number of bits from the formula
$\text{number of bits per level} = \log_2 8 = 3$
Each signal level is represented by 3 bits.
Example: A digital signal has nine levels. How many bits are needed per level? Using the same formula, $\log_2 9 \approx 3.17$, so each signal level would be represented by 3.17 bits. However, this answer is not realistic: the number of bits sent per level needs to be an integer. For this example, 4 bits can represent one level.
A digital signal is a composite analog signal with an infinite bandwidth.
In networking, we use the term bandwidth in two contexts:
1. Bandwidth in hertz refers to the range of frequencies in a composite signal or the range of frequencies that a channel can pass.
2. Bandwidth in bits per second refers to the speed of bit transmission in a channel or link.
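The bits-per-level examples on this slide can be reproduced as below. This is a sketch; rounding up to the next whole bit follows the nine-level example, and the function name is an assumption.

```python
import math


def bits_per_level(levels):
    """Return (exact log2 value, whole number of bits needed) for a given number of levels."""
    exact = math.log2(levels)
    return exact, math.ceil(exact)


print(bits_per_level(8))  # eight levels
print(bits_per_level(9))  # nine levels: the exact value is not an integer
```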
Fundamentals of Wireless Communications: Concepts – Data Communications
Signals travel through transmission media, which are not perfect. The imperfection causes signal impairment. This means that the signal at the beginning of the medium is not the same as the signal at the end of the medium. What is sent is not what is received. Three causes of impairment are attenuation, distortion, and noise.
Fundamentals of Wireless Communications: Concepts – Data Communications
Decibel (dB): A measure of the relative strength of two signals, or of one signal at two different points.
Example: Suppose a signal travels through a transmission medium and its power (P) is reduced to one-half. This means that $P_2 = \frac{1}{2} P_1$. In this case, the attenuation (loss of power) can be calculated as
$10 \log_{10}(P_2 / P_1) = 10 \log_{10}(0.5) \approx -3\ \text{dB}$
A loss of 3 dB (–3 dB) is equivalent to losing one-half the power.
Example: Consider an extremely noisy channel in which the value of the signal-to-noise ratio is almost zero. In other words, the noise is so strong that the signal is faint. For this channel the capacity C is calculated as
$C = B \log_2(1 + \text{SNR}) = B \log_2(1 + 0) = B \log_2 1 = 0$
This means that the capacity of this channel is zero regardless of the bandwidth. In other words, we cannot receive any data through this channel.
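Both calculations on this slide, the decibel change and the Shannon capacity, can be sketched in Python. The function names and the 3000 Hz bandwidth are illustrative assumptions.

```python
import math


def power_change_db(p1, p2):
    """Relative strength in dB of power p2 with respect to power p1."""
    return 10 * math.log10(p2 / p1)


def shannon_capacity(bandwidth_hz, snr):
    """Channel capacity in bits/s for a bandwidth in Hz and a linear (not dB) SNR."""
    return bandwidth_hz * math.log2(1 + snr)


# Power halved: roughly a 3 dB loss.
print(power_change_db(1.0, 0.5))
# SNR of zero: capacity is zero whatever the bandwidth.
print(shannon_capacity(3000, 0))
```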
Fundamentals of Wireless Communications: Concepts – Cellular Network
Cellular Network Organization
• Areas divided into cells
– Each served by its own antenna(s)
– Band of frequencies allocated
– Cells set up such that antennas of all neighbors are equidistant (hexagonal pattern)
• Architecture
– PSTN (Public Switched Telephone Network): the international telephone system based on copper wires carrying analog voice data
– MTSO (Mobile Telephone Switching Office): the central switch that controls the entire operation of a cellular system
– Base Station and Antenna
  Base Station: the central radio transmitter/receiver that maintains communications with a mobile radio telephone within a given range
  Antenna: a device that radiates and/or receives radio signals
Fundamentals of Wireless Communications: Concepts – Cellular Network
Frequency Reuse
• Adjacent cells assigned different frequencies to avoid interference or crosstalk
• Objective is to reuse frequency in nearby cells
– 10 to 50 frequencies assigned to each cell
– Transmission power controlled to limit power at that frequency escaping to adjacent cells
– The issue is to determine how many cells must intervene between two cells using the same frequency
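The question of how far apart co-channel cells must be is usually answered with the standard hexagonal-geometry relations, which the slide does not state: cluster size $N = i^2 + ij + j^2$ and reuse distance $D = R\sqrt{3N}$, where R is the cell radius. A minimal sketch under that assumption, with illustrative function names:

```python
import math


def cluster_size(i, j):
    """Number of cells in a reuse pattern for non-negative integer shifts i and j."""
    return i * i + i * j + j * j


def reuse_distance(cell_radius, n_cells):
    """Distance between centers of cells using the same frequencies (hexagonal layout)."""
    return cell_radius * math.sqrt(3 * n_cells)


n = cluster_size(1, 2)           # a 7-cell reuse pattern
print(n, reuse_distance(1.0, n))
```

A larger cluster size pushes co-channel cells further apart and lowers interference, but leaves fewer frequencies per cell.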
Fundamentals of Wireless Communications: Concepts – Cellular Network
Approaches to Cope with Increasing Capacity
• Adding new channels
• Frequency borrowing – frequencies are taken from adjacent cells by congested cells
• Cell splitting – cells in areas of high usage can be split into smaller cells
• Cell sectoring – cells are divided into a number of wedge-shaped sectors, each with its own set of channels
• Microcells – antennas move to buildings, hills, and lamp posts
Fundamentals of Wireless Communications: Concepts – Cellular Network
Cellular Systems Terms
• Base Station (BS) – includes an antenna, a controller, and a number of receivers
• Mobile telecommunications switching office (MTSO) – connects calls between mobile units
• Two types of channels available between mobile unit and BS
– Control channels – used to exchange information having to do with setting up and maintaining calls
– Traffic channels – carry voice or data connection between users
Fundamentals of Wireless Communications: Concepts – Cellular Network
Steps in an MTSO-Controlled Call between Mobile Users
• Mobile unit initialization
• Mobile-originated call
• Paging
• Call accepted
• Ongoing call
• Handoff
Additional Functions in an MTSO-Controlled Call
• Call blocking
• Call termination
• Call drop
• Calls to/from fixed and remote mobile subscribers
Fundamentals of Wireless Communications: Concepts – Cellular Network
Mobile Radio Propagation Effects
Signal strength
– Must be strong enough between base station and mobile unit to maintain signal quality at the receiver
– Must not be so strong as to create too much co-channel interference with channels in another cell using the same frequency band
Fading
– Signal propagation effects may disrupt the signal and cause errors
Rayleigh fading
– Rayleigh fading is caused by multipath reception. The mobile antenna receives a large number, say N, of reflected and scattered waves. Because of wave cancellation effects, the instantaneous received power seen by a moving antenna becomes a random variable, dependent on the location of the antenna.
Fundamentals of Wireless Communications: Concepts – Cellular Network
Handoff Performance Metrics
• Cell blocking probability – probability of a new call being blocked
• Call dropping probability – probability that a call is terminated due to a handoff
• Call completion probability – probability that an admitted call is not dropped before it terminates
• Probability of unsuccessful handoff – probability that a handoff is executed while the reception conditions are inadequate
• Handoff blocking probability – probability that a handoff cannot be successfully completed
• Handoff probability – probability that a handoff occurs before call termination
• Rate of handoff – number of handoffs per unit time
• Interruption duration – duration of time during a handoff in which a mobile is not connected to either base station
• Handoff delay – distance the mobile moves from the point at which the handoff should occur to the point at which it does occur
Fundamentals of Wireless Communications: Concepts – Cellular Network
Handoff Strategies Used to Determine Instant of Handoff
• Relative signal strength
• Relative signal strength with threshold
• Relative signal strength with hysteresis
• Relative signal strength with hysteresis and threshold
• Prediction techniques
Hysteresis: a property of systems (usually physical systems) that do not instantly react to the forces applied to them, but react slowly, or do not return completely to their original state.
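The first four strategies differ only in the conditions placed on the measured signal strengths, so they can be sketched as one decision function. The function name, parameter names, and dBm figures below are illustrative assumptions, not values from the tutorial.

```python
def should_handoff(current_dbm, candidate_dbm, threshold_dbm=None, hysteresis_db=0.0):
    """Decide whether to hand off from the current base station to a candidate.

    threshold_dbm: if given, hand off only once the current signal falls below it.
    hysteresis_db: margin by which the candidate must exceed the current signal.
    """
    if threshold_dbm is not None and current_dbm >= threshold_dbm:
        return False  # current link is still good enough
    return candidate_dbm > current_dbm + hysteresis_db


# Relative signal strength only.
print(should_handoff(-90, -88))
# With a 3 dB hysteresis margin the same measurements do not trigger a handoff.
print(should_handoff(-90, -88, hysteresis_db=3))
# With both a threshold and hysteresis.
print(should_handoff(-90, -80, threshold_dbm=-85, hysteresis_db=3))
```

The hysteresis margin suppresses repeated back-and-forth handoffs when two base stations are received at nearly equal strength.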
Fundamentals of Wireless Communications: Concepts – Cellular Network
Power Control
• Design issues making it desirable to include dynamic power control in a cellular system
– Received power must be sufficiently above the background noise for effective communication
– Desirable to minimize power in the transmitted signal from the mobile
  • Reduce co-channel interference, alleviate health concerns, save battery power
– In spread spectrum (SS) systems using CDMA, it is desirable to equalize the received power level from all mobile units at the BS
FDMA (Frequency Division Multiple Access): an access method in which multiple sources use assigned bandwidth in a data communication band
TDMA (Time Division Multiple Access): a multiple access method in which the bandwidth is just one time-shared channel
CDMA (Code Division Multiple Access): a multiple access method in which one channel carries all transmissions simultaneously
Fundamentals of Wireless Communications: Concepts – Cellular Network
Types of Power Control
• Open-loop power control
– Depends solely on mobile unit
– No feedback from BS
– Not as accurate as closed-loop, but can react more quickly to fluctuations in signal strength
• Closed-loop power control
– Adjusts signal strength in reverse channel based on a metric of performance
– BS makes power adjustment decision and communicates it to the mobile on a control channel
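A closed-loop scheme of this kind can be sketched as a simple up/down feedback loop. This is a toy model: the 1 dB step, the power limits, the fixed path loss, and all names are assumptions made for illustration, not parameters of any particular cellular standard.

```python
def power_command(received_dbm, target_dbm):
    """BS side: +1 asks the mobile to raise power, -1 asks it to lower power."""
    return 1 if received_dbm < target_dbm else -1


def apply_command(tx_power_dbm, command, step_db=1.0, min_dbm=-50.0, max_dbm=23.0):
    """Mobile side: adjust transmit power by one step, within its power limits."""
    return max(min_dbm, min(max_dbm, tx_power_dbm + command * step_db))


def simulate(path_loss_db, target_dbm, steps, tx_power_dbm=0.0):
    """Run the feedback loop over a channel with a fixed path loss."""
    for _ in range(steps):
        received_dbm = tx_power_dbm - path_loss_db  # measured at the BS
        tx_power_dbm = apply_command(tx_power_dbm, power_command(received_dbm, target_dbm))
    return tx_power_dbm


# With 100 dB path loss and a -90 dBm target, power settles near 10 dBm.
print(simulate(path_loss_db=100.0, target_dbm=-90.0, steps=30))
```

An open-loop variant would drop `power_command` and let the mobile choose its power from its own measurement of the forward channel.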
Fundamentals of Wireless Communications: End of Concepts 1
Cognitive Radios: Definition and Brief History
• A software-defined radio (SDR) system is a radio communication system that can tune to any frequency band and receive any modulation across a large frequency spectrum by means of programmable hardware controlled by software.
• Software radio: Joseph Mitola coined the term "software radio" in 1991 to signal the shift from digital radio to multiband, multimode software-defined radios, where "80%" of the functionality is provided in software, versus the "80%" hardware of the 1990s
• Cognitive radio is drawn from SDR (the term was first coined by Joseph Mitola)
• Nov 13, 2003 – Cognitive Radio the next step for SDR (Bruce Fate)
• August 12, 2004 – PC World – Intel Shows Wireless Transceivers
• Sept 20, 2005 – Virginia Tech to Smarten up Cognitive Radio
• Feb 17, 2007 – Scientific American – Cognitive Radio is arriving on the heels of SDR technology and building on it
What about cognitive radio? In Joseph Mitola's words: "Cognitive Radio is a hard research topic within the realm of software radio. Since this was my doctoral research area and my area of current research focus, instead of providing an overview with pointers as for SDR", see the dissertation of Joseph Mitola, available on the Internet (Google).
Cognitive Radios: Definition and Brief History
IEEE Definition:
• A cognitive radio is a radio frequency transmitter/receiver that is designed to intelligently detect whether a particular segment of the radio spectrum is currently in use, and to jump into (and out of, as necessary) the temporarily unused spectrum very rapidly, without interfering with the transmissions of other authorized users
Other Definition
• Cognitive radio is a form of wireless communication in which a transceiver can intelligently detect which communication channels are in use and which are not, and instantly move into vacant channels while avoiding occupied ones. This optimizes the use of available radio-frequency spectrum while minimizing interference to other users.
Cognitive Radios: Definition and Brief History
Expanded Definition:
• Radios that automatically find and access unused spectrum across different networks (licensed and unlicensed)
– Optimization: find the best link (in space, time) based on user requirements, e.g., cost per unit throughput, latency, etc.
– Continuously adapt: seamlessly roam across the networks, always maintaining the "best link" possible
• This definition leads to: cognitive radio opportunistically uses the "best" available spectrum. This is possible only if primary users and secondary users submit their requests for spectrum through cognitive radio
Primary User: has a unique ID (licensed), does not need to provide special signaling to access the frequency band (F-Band), can tolerate a specified level (∆t seconds) of interference, and is unaware of cognitive radio
Secondary User: possesses cognitive radio capability, monitors for the presence of primary users at least every ∆t seconds, does not disrupt primary user activity, facilitates the communication and coordination of SUs, and uses logical sub-channels to intercommunicate (universal control channels, group control channels)
Cognitive Radios: Spectrum is unused now, but not available
Large segments of spectrum are unused in space and time (FCC data)
Cognitive Radios: Cognitive Radio and Software-Defined Radio
Requirement to combine cognitive radio and software-defined radio
• Cognitive radio for spectrum efficiency
– Analyzing user application
– Definition of wireless requirements
– Spectrum scanning
– Definition of radio characteristics
• Software radio
– Adjusts transmitter and receiver algorithms
– Transforms algorithms to an applicable architecture
– Maps the architecture onto the available processor platform
– Balances between different radios operating in parallel
• To achieve efficient receiver implementations, software radio requires
– Strong flexibility in terms of
  • Algorithms
  • Power consumption
Cognitive Radios: Main Functions of Cognitive Radio
Spectrum Sensing (detecting unused spectrum)
– Transmitter detection
  • Matched filter detection
  • Energy detection
  • Cyclostationary feature detection
– Cooperative detection
– Interference-based detection
Spectrum Management (capturing the best possible spectrum to meet user requirements)
– Spectrum analysis
– Spectrum decision
Spectrum Mobility – CR uses the spectrum in a dynamic manner by allowing the radio terminals to operate in the best available frequency band
Spectrum Sharing – providing a fair spectrum scheduling method
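Of the sensing methods listed, energy detection is the simplest to illustrate: the radio averages the squared samples in a band and compares the result with a threshold. The sketch below uses synthetic data; the threshold of 2.0, the unit-variance noise, and the test sinusoid are assumptions chosen only to make the example self-contained.

```python
import math
import random


def average_energy(samples):
    """Mean squared amplitude of the received samples."""
    return sum(x * x for x in samples) / len(samples)


def channel_occupied(samples, threshold):
    """Energy detector: declare the band in use if the average energy exceeds the threshold."""
    return average_energy(samples) > threshold


rng = random.Random(0)
n = 2000
# Noise only (unit variance) versus noise plus a sinusoidal primary-user signal.
noise_only = [rng.gauss(0.0, 1.0) for _ in range(n)]
with_signal = [2.0 * math.sin(0.3 * k) + rng.gauss(0.0, 1.0) for k in range(n)]

print(channel_occupied(noise_only, threshold=2.0))
print(channel_occupied(with_signal, threshold=2.0))
```

In practice the threshold would be set from an estimate of the noise floor and a target false-alarm rate; that calibration is outside this sketch.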
Cognitive Radios: Cognitive Radio Enhancement
– Each receiver includes an option to ask for low receiver complexity
  • Transmit-power increase
  • High-quality channel selection
– Transmit-power increase
  • Other transmitters reduce power
  • Other receivers increase complexity
– High-quality channel selection
  • Find a better-fitting free channel
  • Exchange already allocated channels
Cognitive Radios
Channel State Estimation
• Channel-state estimation to judge channel capacity
• Semi-blind training
– Supervised training mode via short training sequences
– Tracking via data feedback
• Rate feedback to transmitter to set up
– Data rate
– Transmit-power control
Cognitive Radios
• Transmit Power Control
– Initialize power
– Allocate N channels
– Investigate rate
– Adjust the transmit power of each transmitter to the optimal level to achieve the data rate
Cognitive Radios: End of Concepts 2
Reinforcement Learning (RL): History and Definition
History
• Roots in the psychology of animal learning (Thorndike, 1911)
• Another independent thread was the problem of optimal control, and its solution using dynamic programming (Bellman, 1957)
• Idea of temporal difference learning (on-line method), e.g., playing board games (Samuel, 1959)
• A major breakthrough was the discovery of Q-learning (Watkins, 1989)
Definition
• Reinforcement learning is a way of programming agents by reward and punishment without needing to specify how the task is to be achieved
Reinforcement Learning: Examples of RL
Examples of RL
• How should a robot behave so as to optimize its "performance"? (Robotics)
• How to make a good chess-playing program? (Artificial Intelligence)
• How to automate the motion of a helicopter? (Control Theory)
What is special about RL
• RL is learning how to map states to actions, so as to maximize a numerical reward over time
• Unlike other forms of learning, it is a multistage decision-making process (often Markovian)
• An RL agent must learn by trial and error (not entirely supervised, but interactive)
• Actions may affect not only the immediate reward but also subsequent rewards (delayed effect)
Reinforcement Learning: Elements of RL
• A policy
– A map from state space to action space
– May be stochastic
• A reward function
– It maps each state (or state-action pair) to a real number, called the reward
• A value function
– The value of a state (or state-action pair) is the total expected reward, starting from that state (or state-action pair)
The Precise Goal of RL
• To find a policy that maximizes the value function
• There are different approaches to achieve this goal in various situations
• Q-learning and A-learning are just two different approaches to this problem. But essentially both are temporal-difference methods
Reinforcement Learning: Distinct Features of RL
• Formulate the problem as a Markov Decision Process (MDP)
• Use Q-learning to solve the MDP
• Prior knowledge not required
• Can handle several classes of traffic. Each class has several bandwidth levels, state spaces, and action sets
• Hand-off dropping probability and average allocated bandwidth are considered as quality-of-service measures
• Apply the model to optimize the power allocation
• Learning policy is given by Q-learning
Note:
• RL can generate near-optimal solutions to large and complex MDPs. In other words, RL is able to make inroads into problems that suffer from one or both of the two curses of dynamic programming (the curse of dimensionality and the curse of modeling) and cannot be solved by dynamic programming.
Reinforcement Learning: The Promise of RL
• Specify what to do, but not how to do it
– Through the reward function
– Learning "fills in the details"
• Better final solutions
– Based on actual experiences, not programmer assumptions
• Less (human) time needed for a good solution
• Background material needed
– Some simple decision theory
– Markov Decision Processes
– Dynamic programming
Reinforcement Learning: RL Background Material
Single Decisions
Single decisions to be made
– Multiple discrete actions
– Each action has a reward associated with it
Goal is to maximize reward
– Not hard: just pick the action with the largest reward
State 0 has a value of 2
– Sum of rewards from taking the best action from the state
Reinforcement Learning: Markov Decision Processes (MDP)
• MDP has multiple sequential decisions
– Each decision affects subsequent decisions
This is formally modeled by a Markov Decision Process (MDP)
Reinforcement Learning: Markov Decision Processes
Formally, an MDP is
• A set of states, $S = \{s_1, s_2, \ldots, s_n\}$
• A set of actions, $A = \{a_1, a_2, \ldots, a_m\}$
• A reward function, $R: S \times A \times S \to \mathbb{R}$
• A transition function, $P(s_{t+1} = s' \mid s_t = s, a_t = a)$
– Sometimes deterministic, $T: S \times A \to S$
We want to learn a policy, $\pi: S \to A$
• Maximize the sum of rewards we see over our lifetime
Reinforcement Learning: Policies
There are 3 policies for this MDP
1. 0 → 1 → 3 → 5
2. 0 → 1 → 4 → 5
3. 0 → 2 → 4 → 5
Which is the best one?
Comparing Policies: Order policies by how much reward they see:
1. 0 → 1 → 3 → 5 = 1 + 1 + 1 = 3
2. 0 → 1 → 4 → 5 = 1 + 1 + 10 = 12
3. 0 → 2 → 4 → 5 = 2 – 1000 + 10 = –988
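The comparison can be reproduced in code. The MDP figure itself is not in the transcript, so the per-transition rewards below are an assumption, read off the three sums above in the order the terms appear.

```python
# Reward for each transition (from_state, to_state), inferred from the sums above.
REWARDS = {
    (0, 1): 1, (1, 3): 1, (3, 5): 1,
    (1, 4): 1, (4, 5): 10,
    (0, 2): 2, (2, 4): -1000,
}

POLICIES = [
    [0, 1, 3, 5],
    [0, 1, 4, 5],
    [0, 2, 4, 5],
]


def total_reward(path):
    """Sum of rewards collected along a sequence of states."""
    return sum(REWARDS[(s, s_next)] for s, s_next in zip(path, path[1:]))


for path in POLICIES:
    print(path, total_reward(path))

best = max(POLICIES, key=total_reward)
print("best:", best)
```

Note that the policy starting with the largest immediate reward (0 → 2, reward 2) is by far the worst overall, which is the delayed-effect point made earlier.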
Reinforcement Learning: Q-Learning
Q-Learning iteratively approximates the state-action value function, Q
• Again, we are not going to estimate the MDP directly
• Learns the value function and policy simultaneously
Keep an estimate of Q(s,a) in a table
• Update these estimates as we gather more experience
• Estimates do not depend on exploration policy
• Q-learning is an off-policy method
Reinforcement Learning: Q-Learning Algorithm
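The algorithm listing from this slide is not in the transcript. As a stand-in, here is a minimal sketch of the standard tabular Q-learning update, $Q(s,a) \leftarrow Q(s,a) + \alpha\,[r + \gamma \max_{a'} Q(s',a') - Q(s,a)]$, applied to the small six-state MDP from the Policies slide. The transition rewards, the learning rate, the discount factor, and the purely random exploration policy are all assumptions made for illustration.

```python
import random

# TRANSITIONS[state][next_state] = reward; an action is "move to next_state".
# State 5 is terminal. Rewards are inferred from the Policies slide.
TRANSITIONS = {
    0: {1: 1, 2: 2},
    1: {3: 1, 4: 1},
    2: {4: -1000},
    3: {5: 1},
    4: {5: 10},
    5: {},
}


def q_learning(episodes=500, alpha=0.5, gamma=1.0, seed=0):
    """Learn a Q-table from episodes generated by a random exploration policy."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s, actions in TRANSITIONS.items() for a in actions}
    for _ in range(episodes):
        s = 0
        while TRANSITIONS[s]:
            a = rng.choice(list(TRANSITIONS[s]))  # explore at random (off-policy)
            r, s_next = TRANSITIONS[s][a], a
            best_next = max((q[(s_next, a2)] for a2 in TRANSITIONS[s_next]), default=0.0)
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s_next
    return q


def greedy_path(q, start=0):
    """Follow the action with the highest Q-value from each state."""
    path = [start]
    while TRANSITIONS[path[-1]]:
        s = path[-1]
        path.append(max(TRANSITIONS[s], key=lambda a: q[(s, a)]))
    return path


q = q_learning()
print(greedy_path(q))
```

Because the update uses the maximum over next actions rather than the action actually taken, the learned values do not depend on the exploration policy, which is what makes Q-learning off-policy.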
Reinforcement Learning: RL Model
Reinforcement Learning: Example – Cell Phone Channel Allocation
Learns channel allocations for cell phones
• Channels are limited
• Allocations affect adjacent cells
• Want to minimize dropped and blocked calls
[Figure: bad and good allocations of 2 channels across neighboring cells]
Reinforcement Learning: States
State consists of two elements
• Occupied and unoccupied channels for each cell
– Exponential in the number of cells
• Last event (arrival, departure, handoff)
This is too large to use directly
• $70^{49}$ states for the example in the paper
State space actually used has two components
• Availability: number of free channels in the cell
• Packing: number of times each channel is used within the interference radius
Reinforcement Learning: Actions
Call arrival
– Evaluate possible next channels
– Assign the one with the highest value
Call termination
– Free the channel
– Consider reassigning each ongoing call to the just-released channel
– Perform the reassignment (if any) with the highest value
Reinforcement Learning: Rewards and Values
Reward is the number of on-going calls
Again, this is a continuous-time system
Value is $\int_0^\infty e^{-\beta t}\, c(t)\, dt$, where $c(t)$ is the number of on-going calls at time $t$ and $\beta$ is the discount rate
Value Function Approximation
The value function is represented by an artificial neural network
– Linear units
– Evaluates state and returns value
Trained using the TD algorithm
Genetic Algorithms: Brief History
1948 Turing proposes "genetical or evolutionary search"
1962 Bremermann: optimization through evolution and recombination
1964 Rechenberg introduces evolution strategies
1965 L. Fogel, Owens, and Walsh introduce evolutionary programming
1975 Holland introduces genetic algorithms
1992 Koza introduces genetic programming
Genetic Algorithms: What are Genetic Algorithms
• A class of probabilistic optimization algorithms
• Ability to efficiently guide a search through a large solution space
• Search algorithms based on the mechanics of natural selection and natural genetics
• Inspired by the biological evolution process
• Particularly well suited for hard problems where little is known about the underlying search space
• Ability to adapt solutions to changing environments
• They combine survival of the fittest among string structures with a structured yet randomized information exchange to form a search algorithm with some of the innovative flair of human search
• "Emergent" behavior is the goal
– "The hoped-for emergent behavior is the design of high-quality solutions to difficult problems and the ability to adapt these solutions in the face of a changing environment" (Melanie Mitchell, An Introduction to Genetic Algorithms)
Genetic Algorithms: Biological Background
• Chromosomes – Chromosomes are strings of DNA (deoxyribonucleic acid) and serve as a model for the whole organism.
• Genes – A chromosome consists of genes, blocks of DNA. Each gene encodes a particular protein.
• Alleles – Each gene encodes a trait, for example eye color. Possible settings for a trait (e.g., blue, brown) are called alleles.
• Locus – Each gene has its own position (called its locus) in the chromosome.
• Genome – The complete set of genetic material (all chromosomes) is called the genome. A particular set of genes in the genome is called the genotype.
• Phenotype – The genotype, together with later development after birth, is the basis for the organism's phenotype: its physical and mental characteristics, such as eye color, intelligence, etc.
• Reproduction
– Crossover (or recombination) occurs during reproduction. Genes from the parents combine to form a whole new chromosome.
– Mutation – elements of DNA are slightly changed
– Fitness – measured by success in survival
– Reproduction – copying "fit" strings, based on objective (fitness) function values
Genetic Algorithms
Elements and Operators of Genetic Algorithms
• Elements
– Chromosomes, Genes, Alleles
• Operators
– Fitness, Selection
– Crossover, Mutation
– Cardinality, Defining length, Fitness function
– Genotype, Phenotype, Hamming distance, Population size
– Schema, Schema length, Schemata, Schema order, Variation
Chromosomes are made of units called genes (features, characters, or decoders) arranged in linear succession.
Each chromosome consists of “genes” (e.g., bits).
Every gene controls the inheritance of one or several characters.
Each gene is an instance of a particular “allele” (e.g., 0 or 1).
Genes for certain characters are located at certain places on the chromosome, which are called loci (string positions).
Any character of an individual (such as hair color) can manifest itself differently; the gene is then said to be in one of several states, called alleles (feature values).
Genetic Algorithms
Search Space
• Search space – the space of all feasible solutions (each represented by a separate point)
• With a GA we look for the best among a number of possible solutions
• The search space is large and complicated, so special methods are needed to find good solutions (hill climbing, tabu search, simulated annealing, and genetic algorithms)
NP-hard Problems (NP: nondeterministic polynomial time)
An example is the decision problem SUBSET-SUM: given a set of integers, does any non-empty subset of them add up to zero? This yes/no question is NP-complete, and therefore also NP-hard.
• NP-hard problems have no known polynomial-time exact algorithm (e.g., the traveling salesman problem)
• NP-complete: it is possible to guess a solution and then check it quickly
• Trying all possible solutions is a very slow process (e.g., O(2^n))
• It is difficult to find algorithms that provide exact answers for such problems
• Efficient exact algorithms may not exist; genetic algorithms are an alternative
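The "guess and check" and "try all solutions" points can be made concrete with SUBSET-SUM. The sketch below is illustrative only and is not part of the tutorial; the function names are assumptions. Checking one guessed subset is cheap, while the exhaustive search may visit all 2^n - 1 non-empty subsets.

```python
from itertools import combinations

def check_subset(subset):
    """Verify a guessed solution: it must be non-empty and sum to zero."""
    return len(subset) > 0 and sum(subset) == 0

def brute_force_subset_sum(numbers):
    """Try every non-empty subset, smallest first (up to 2**n - 1 candidates)."""
    for size in range(1, len(numbers) + 1):
        for subset in combinations(numbers, size):
            if check_subset(subset):
                return subset
    return None  # no non-empty subset sums to zero

print(brute_force_subset_sum([3, -7, 1, 8, -4]))
```

Each extra integer doubles the number of subsets, which is why exhaustive search stops being practical and heuristic methods such as GAs become attractive.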
Genetic Algorithms
Basic Operators
• Crossover – Recombination operator
– Mimics biological recombination
• Some portion of genetic material is swapped between chromosomes
• Typically the swapping produces offspring
– Mechanism for the dissemination of “building blocks” (schemas)
i.e., takes two individuals (chromosomes), cuts their chromosome strings at some randomly chosen position, and swaps the tail portions.
Original       Crossover (single point)
xxx|xxxxx      xxx00000
000|00000      000xxxxx
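A minimal sketch of single-point crossover on strings follows. The function name and the optional `point` and `rng` parameters are assumptions made for illustration; the cut is drawn strictly inside the string so that both children receive material from both parents.

```python
import random

def single_point_crossover(parent_a, parent_b, point=None, rng=random):
    """Cut both strings at the same position and swap the tails."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have equal length")
    if point is None:
        # Random cut position between 1 and length - 1 (inclusive).
        point = rng.randint(1, len(parent_a) - 1)
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b

# Reproduces the slide's picture: cut after the third position.
print(single_point_crossover("xxxxxxxx", "00000000", point=3))
```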
Genetic Algorithms
Mutation
Selects a random locus (gene location) with some probability and alters the allele at that locus.
The intuitive mechanism for the preservation of variety in the population.
i.e., substitute one or more bits of an individual randomly by a new value (0 or 1):
011101101100 → 011001101100 (mutate 4th bit)
Other operators are not discussed here.
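The mutation step can be sketched as below. This version flips bits, which is one common variant and an assumption here (the slide describes substituting a new value, which may leave a bit unchanged); `flip_bit` and `mutate` are hypothetical names.

```python
import random

def flip_bit(chromosome, position):
    """Return a copy with the bit at a 1-based position inverted."""
    i = position - 1
    flipped = "1" if chromosome[i] == "0" else "0"
    return chromosome[:i] + flipped + chromosome[i + 1:]

def mutate(chromosome, rate, rng=random):
    """Flip each bit independently with probability `rate`."""
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < rate else bit
        for bit in chromosome
    )

# The slide's example: mutate the 4th bit.
print(flip_bit("011101101100", 4))
```

In practice the rate is kept small so that mutation preserves variety without destroying good building blocks.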
Genetic Algorithms
Fitness
• A measure of the goodness of the organism
• Expressed as the probability that the organism will live another cycle (generation)
• Basis for the natural selection simulation
– Organisms are selected to mate with probabilities proportional to their fitness
• Probabilistically, better solutions have a better chance of conferring their building blocks to the next generation (cycle)
Elitism:
• At least one of a generation’s best solutions is copied without changes to the new population.
Encoding:
• A data structure for representing candidate solutions
– Often takes the form of a bit string
– Usually has internal structure; i.e., different parts of the string represent different aspects of the solution
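Fitness-proportional ("roulette wheel") selection and elitism can be sketched as follows. The helper names are assumptions, and the sketch relies on Python's standard `random.choices` for the weighted draw; it assumes at least one individual has positive fitness.

```python
import random

def roulette_select(population, fitness, rng=random):
    """Pick one individual with probability proportional to its fitness."""
    weights = [fitness(individual) for individual in population]
    return rng.choices(population, weights=weights, k=1)[0]

def elite(population, fitness, count=1):
    """Return the `count` fittest individuals, to be copied unchanged."""
    return sorted(population, key=fitness, reverse=True)[:count]

def square_fitness(bits):
    """Example fitness: decode the bit string and square it."""
    return int(bits, 2) ** 2

population = ["01101", "11000", "01000", "10011"]
print(elite(population, square_fitness))
print(roulette_select(population, square_fitness))
```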
Genetic Algorithms
Simple Example
Maximize f(x) = x^2 for 0 <= x <= 2^5 - 1
Encode solution: just use 5 bits (1 or 0)
Generate initial population:
A  0 1 1 0 1
B  1 1 0 0 0
C  0 1 0 0 0
D  1 0 0 1 1
Genetic Algorithms
Example
Evaluate each solution against the objective
Maximize f(x) = x^2 for 0 <= x <= 31
Encode solution: just use 5 bits (1 or 0)

String   Randomly     x-value      Fitness        fi/Σf   % of    Expected     Actual count
No.      generated    (unsigned    fi = x^2               total   count        (roulette
         string       integer)                                    fi/f_avg     wheel)
A        01101        13           169            0.144   14.4    0.58         1
B        11000        24           576            0.492   49.2    1.97         2
C        01000         8            64            0.055    5.5    0.22         0
D        10011        19           361            0.309   30.9    1.23         1
Sum (Σf)                           1170           1.0     100.0   4.0          4.0
Average                            1170/4 ≈ 293   0.25     25.0   1.0          1.0
Max                                576            0.49     49.2   1.97         2.0
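The table's columns can be recomputed directly from the four strings. The short script below is an illustrative check, with variable names chosen here rather than taken from the tutorial.

```python
population = {"A": "01101", "B": "11000", "C": "01000", "D": "10011"}

def fitness(bits):
    """Decode an unsigned binary string and square it: f(x) = x**2."""
    return int(bits, 2) ** 2

scores = {name: fitness(bits) for name, bits in population.items()}
total = sum(scores.values())      # sum of all fitness values
average = total / len(scores)     # mean fitness of the population

for name, bits in population.items():
    f = scores[name]
    share = f / total             # selection probability fi / sum(f)
    expected = f / average        # expected number of copies fi / f_avg
    print(name, bits, int(bits, 2), f, round(share, 3), round(expected, 2))
```

The expected counts sum to the population size, so string B is expected to receive about two copies and string C almost none.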
Genetic Algorithms
A Simple GA
1. [Start] Generate a random population of n chromosomes (suitable solutions for the problem)
2. [Fitness] Evaluate the fitness f(x) of each chromosome x in the population
3. [New population] Create a new population by repeating the following steps until the new population is complete
   a. [Selection] Select two parent chromosomes from the population according to their fitness (the better the fitness, the bigger the chance of being selected)
   b. [Crossover] With a crossover probability, cross over the parents to form new offspring (children). If no crossover is performed, the offspring are exact copies of the parents.
   c. [Mutation] With a mutation probability, mutate the new offspring at each locus (position in the chromosome)
   d. [Accepting] Place the new offspring in the new population
4. [Replace] Use the newly generated population for a further run of the algorithm
5. [Test] If the end condition is satisfied, stop and return the best solution in the current population
6. [Loop] Go to step 2
Note: The algorithm begins with a set of chromosomes called the population. A new population is formed using the GA operators, and the best solutions are selected according to fitness.
The questions are how to form the chromosome, how to select parents, and how to save the best solutions while generating the new population (elitism).
Elitism: At least one of a generation’s best solutions is copied without changes to the new population.
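The steps above can be sketched as one small program for the f(x) = x^2 example. This is a minimal sketch under stated assumptions, not the tutorial's implementation: the parameter defaults, the fixed generation count used as the end condition, the single-elite policy, and the +1 added to the selection weights (to avoid an all-zero wheel) are all choices made here.

```python
import random

def fitness(bits):
    """f(x) = x**2 for the unsigned integer encoded by the bit string."""
    return int(bits, 2) ** 2

def run_ga(pop_size=4, length=5, generations=20,
           p_crossover=0.7, p_mutation=0.01, seed=0):
    rng = random.Random(seed)
    # 1. [Start] random population of bit strings
    population = ["".join(rng.choice("01") for _ in range(length))
                  for _ in range(pop_size)]
    for _ in range(generations):
        # 2. [Fitness] selection weights; +1 keeps the wheel valid if all are 0
        weights = [fitness(ind) + 1 for ind in population]
        # Elitism: copy the best individual unchanged
        new_population = [max(population, key=fitness)]
        # 3. [New population]
        while len(new_population) < pop_size:
            # a. [Selection] fitness-proportional choice of two parents
            mum, dad = rng.choices(population, weights=weights, k=2)
            # b. [Crossover] single point, applied with probability p_crossover
            if rng.random() < p_crossover:
                cut = rng.randint(1, length - 1)
                mum, dad = mum[:cut] + dad[cut:], dad[:cut] + mum[cut:]
            for child in (mum, dad):
                # c. [Mutation] flip each locus with probability p_mutation
                child = "".join(
                    ("1" if b == "0" else "0") if rng.random() < p_mutation else b
                    for b in child)
                # d. [Accepting]
                if len(new_population) < pop_size:
                    new_population.append(child)
        # 4. [Replace]
        population = new_population
    # 5. [Test] end condition here is simply the generation budget
    return max(population, key=fitness)

best = run_ga()
print(best, int(best, 2), fitness(best))
```

Because of elitism, the best fitness in the population never decreases from one generation to the next.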
Genetic Algorithms
Structure of GA
[Figure: structure of a GA]
Genetic Algorithms
Pros and Cons of GA
• Works well for global optimization with discontinuities or with several local maxima
• Slow convergence rate for well-behaved objective functions
• Can be used for unconstrained and constrained optimization problems, nonlinear programming, stochastic programming, combinatorial optimization problems
Genetic Algorithms
GA & Wireless Communications
• Optimal multicast route discovery
The problem of (optimal) multicast route discovery is NP-hard when the network state information is inaccurate, which is common in the wireless domain. In general, this makes it difficult to determine multicast routes on demand, and hence the network resources are never used to their potential. GA methods help in such situations.
• Trade-off between bit error rate (BER) and bandwidth
Wireless links suffer from high BER and fading (which result in packet loss and, in turn, packet delay and jitter); these can be reduced at the cost of extra bandwidth allocation. So there is a trade-off between BER and bandwidth for a fixed wireless spectrum. But because wireless bandwidth is scarce, the uncertainty of information in this domain cannot be fully eradicated, so GA methods are worth considering.
• Mobility and hand-off
Mobility and hand-off introduce uncertainty. As a mobile station moves and hands off from one access point to another, the wireless network resources change, which can give rise to uncertainty in network conditions. We may end up with the problem of finding paths that satisfy end-to-end delay and bandwidth constraints (an NP-hard problem). Computations to solve such cases are expensive; GA methods are an alternative.
Genetic Algorithms
Problems - Spectral Awareness
• Sensing high priority user
– Adaptive frequency – find a frequency
– Adaptive TDMA – find an unused time slot in between a periodic user (TDMA – Time division multiple access)
• Spectral Reuse – Beam steering and null steering
• Small spectral holes can be filled by one or a few carriers that fit the time
• Multi-User Decomposition
• Adaptive power control (APC)
• Ad Hoc Networking (Shortest hop routing with APC)
Genetic Algorithms
How GA helps in Adaptive Power
• Sensing high priority user
– Adaptive frequency – find a frequency
– Adaptive TDMA – find an unused time slot in between a periodic user
• Spectral Reuse – Beam steering and null steering
• Small spectral holes can be filled by one or a few carriers that fit the time
• Multi-User Decomposition
• Adaptive power control (APC)
• Ad Hoc Networking (Shortest hop routing with APC)
Genetic Algorithms
Current status and Future work
• The basic problems involved, and the research still needed, in using GAs are:
– Create chromosomes with the help of wireless scenarios (preparation of data sets)
– Represent chromosomes with respect to the scenario data
– Identify the information required for decisions
– Select only the required attributes from the scenario data; call these the axis attributes
– Identify when environmental data must be incorporated into the learning process (this helps optimize the system better)
– Develop meta-GAs that learn to adjust the axis attributes (the GA parameters used for rapid solution) based on the current state of the system
– Use the bucket brigade algorithm, a reinforcement learning system (learning guided by rewards coming from the environment), to solve incumbent interference
Genetic Algorithms
Appendix - Terms
cardinality: The cardinality is the number of characters in the alphabet. For example, the binary alphabet {0, 1} has cardinality 2; adding the do-not-care symbol * (which can stand for 0 or 1) gives 3 symbols. For a string of length 5, such as 10101, 3^5 = 243 different similarity templates (schemata) can be created.
defining length: The distance between the first and last specific string positions. d(011*1**) = 4 because the last specific position is 5 and the first is 1.
fitness function: A fitness function must be devised for each problem; given a particular chromosome, the fitness function returns a single numerical fitness value, which is proportional to the ability, or utility, of the individual represented by that chromosome.
genotype: In natural systems, one or more chromosomes combine to form the total genetic prescription for the construction and operation of some organism. The total genetic package is called the genotype.
hamming distance: The simplest distance calculation between two sequences. If two sequences of characters have equal length (number of characters in each sequence), then the hamming distance is the number of positions at which the characters differ.
For example:
Sequence s               SMITH   JONE
Sequence t               SMELL   JOHN
hamming distance (s,t)   3       2
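The Hamming-distance examples and the schema count can be checked with a few lines. The function names below are assumptions chosen for illustration.

```python
def hamming_distance(s, t):
    """Count the positions at which two equal-length sequences differ."""
    if len(s) != len(t):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(s, t))

def schema_count(alphabet_size, length):
    """Number of similarity templates: each position is a symbol or '*'."""
    return (alphabet_size + 1) ** length

print(hamming_distance("SMITH", "SMELL"))
print(hamming_distance("JONE", "JOHN"))
print(schema_count(2, 5))
```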
Genetic Algorithms
Appendix - Terms
phenotype: The organism formed by the interaction of the total genetic package with its environment (the product of the interaction of all genes).
population size: A population of size n contains n strings (chromosomes). For example, a population of size 5 is: 10011 10001 00111 11100 01010
variation: Change the bits in such a way that the number encoded by them is slightly incremented or decremented.
1001.0010.0101.1100   9 2 5 12
1001.0001.0101.1100   9 1 5 12
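One way to read the variation operator is as a small arithmetic step on the encoded number, as in the hedged sketch below. The function name `vary` and the wrap-around behavior at the ends of the range are assumptions, since the slide does not say what happens at the boundaries.

```python
def vary(bits, delta):
    """Shift the unsigned value encoded by `bits` by `delta`, keeping the width."""
    width = len(bits)
    # Assumption: values wrap around modulo 2**width at the range boundaries.
    value = (int(bits, 2) + delta) % (2 ** width)
    return format(value, "0{}b".format(width))

# Decrement the second 4-bit gene, as in the slide's example (2 becomes 1).
genes = "1001.0010.0101.1100".split(".")
genes[1] = vary(genes[1], -1)
print(".".join(genes))
```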
Genetic Algorithms
Appendix - Terms
schema: A similarity template. For example, a schema H of length 7 is H (b1b2b3b4b5b6b7) = *11*0**; the string A (a1a2a3a4a5a6a7) = 0110000 is an instance of it.
schema length: The defining length of schema H, δ(H), is the distance between the first and last specific string positions.
The schema 011*1** has defining length δ = 4 because the last specific position is 5 and the first specific position is 1, so the distance between them is δ(H) = 5 – 1 = 4.
The schema 0****** has defining length δ(H) = 0.
schemata: A schema is a string over an extended alphabet, {0, 1, *}, where the 0 and the 1 retain their normal meaning and the * is a wild card or do-not-care symbol. This notational device greatly simplifies the analysis of the genetic algorithm method because it explicitly recognizes all the possible similarities in a population of strings.
schema order: The number of fixed positions present in the similarity template, e.g. o(011*1**) = 4, o(0******) = 1.
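The schema definitions translate directly into code. The sketch below uses hypothetical helper names and treats a schema as a plain string over {0, 1, *}.

```python
def order(schema):
    """Schema order: the number of fixed (non-'*') positions."""
    return sum(symbol != "*" for symbol in schema)

def defining_length(schema):
    """Distance between the first and last fixed positions (0 if fewer than two)."""
    fixed = [i for i, symbol in enumerate(schema) if symbol != "*"]
    return fixed[-1] - fixed[0] if fixed else 0

def matches(schema, string):
    """True if the string agrees with the schema at every fixed position."""
    return len(schema) == len(string) and all(
        s == "*" or s == c for s, c in zip(schema, string))

print(order("011*1**"), defining_length("011*1**"))
print(order("0******"), defining_length("0******"))
print(matches("*11*0**", "0110000"))
```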
Data Communications, Reinforcement Learning, and Genetic Algorithms
Conclusions
• Provided basic concepts on
– Data Communications and Cellular Networks
– Cognitive Radios
– Reinforcement Learning
– Genetic Algorithms
• The concepts will help in understanding the research topics during the second and third hours
Note:
• Part 1 of the tutorial was prepared from books, tutorials, lectures, and some conference papers of various authors
The 16th IASTED International Conference on Applied Simulation and Modeling ~ ASM 2007
Tutorial Part 1
Questions
? ? ?