preprocessing, compiling, assembling, and linking introduction
TRANSCRIPT
Preprocessing, Compiling, Assembling, and Linking
Introduction In this lesson we will examine
Architecture of C program Introduce C preprocessor and preprocessor directives How to use preprocessor’s directives to manage program
Have examined process Creating program
Have developed program Written in C
Source code
Next step Translate into something computer can use
Called object code Things to think about along the way
How to accommodate different • Versions
Called localization • Features • Targets - Machines • Operating Systems
The First Step
Model the process Examine at three levels
Each with increasing detail
Start with top level Begin with source file End with object or machine code
Also called object file or machine code file Machine code will be unique to specific computer or microprocessor
Transformation from source to object Called compilation or compiling
Top Level
[Figures: Level 1 and Level 2 diagrams of the translation process]
The Pieces
Let’s begin with preprocessor Preprocessor
Simple and handy tool Its job is to process C source code
Before compiler Reads source program
Translates it into machine code Question might be
Why do we have to do this Overview
Many of useful features and capabilities of C Not implemented by compiler Rather
Selected by user Brought in on demand by preprocessor
When program written
User includes various directives to preprocessor Preprocessor
Reads source file Interprets directives Effects operations specified by directives
Example directives tell the preprocessor
• Which library files to include • Which user written files to include • Which portions of the program to include or exclude
We may want slightly different versions of program For different applications
May want to conditionally include debug code • Specify certain constant identifiers
Called symbolic constants Make reading and managing program easier
Structure
From above discussion we see Preprocessor has
Input C source code
Containing embedded preprocessor directives Output
Preprocessed C source file Input to compiler
Implementation
Separate Program
Reads original C source file Looks for lines beginning with # symbol
Evaluates each such line Writes out C source to compiler
Based upon directives included in line
Single Program Performs
Preprocessing Compilation
In single pass
Preprocessor Language Preprocessor language specified as set of directives Directives typically begin in column 1 (caution) of source file
Depends upon version of preprocessor As it goes through source file line by line
Preprocessor looks for lines beginning with the special character # Syntax
Completely independent of the C language Number of directives
Approximately 12 – 15 Shown in following table
Preprocessor in Action
Examine each line in program source Those that do not begin with #
Viewed as source text Not treated as directives Sent to output after macro substitution
Those that begin with # Expand Transform
As directed by the command
Assuming process runs correctly Must result in a C program Preprocessor does not correct
User design errors Syntactic errors Grammatical errors
Directive        Definition
#define          Define a preprocessor macro or symbolic constant
#undef           Undefine or remove a preprocessor macro
#include         Insert contents of another source file
#if              Conditionally include the source lines that follow, based on the value of a constant expression
#ifdef           Conditionally include the source lines that follow if macro name is defined
#ifndef          Conditionally include the source lines that follow if macro name is not defined
#elif            Conditionally include the source lines that follow if its constant expression is nonzero and the previous #if, #ifdef, #ifndef, or #elif failed
#else            Alternative action if preceding #if, #ifdef, #ifndef, or #elif directive fails
#endif           Closes #if, #ifdef, #ifndef, or #elif construct
#line            Set the line number (and optionally the file name) reported in compiler messages
defined name
defined(name)    Operator, used in #if and #elif expressions, that yields 1 if name is defined as preprocessor macro and 0 otherwise
# operator       Operator that replaces a macro parameter with a string constant containing the text of the corresponding argument
## operator      Operator that creates a single token from two adjacent tokens
#pragma          Specify implementation-defined information to the compiler
#error           Produce a compile time error with associated message
Lexical Conventions Line beginning with #
Preprocessor command Name of command must follow # ISO C - International Organization for Standardization
White space can precede or follow # on the same source line
Older versions do not permit
Line with only # ISO C
Null directive Treated as blank line
Older versions May be different
Remainder of the line following the command
May contain command args Args subject to macro replacement
If no args required Remainder of line should be empty
White space and comments allowed Often old compilers will ignore
Preprocessor lines are recognized
Before macro expansion Will talk about macros shortly
If macro expands into something that looks like preprocessor directive Directive not recognized
Example
#define STRLIB #include<string.h>
STRLIB
1. #define processed
STRLIB is interpreted to mean #include<string.h>
2. STRLIB substitution executed
Based upon the definition in previous line
3. Token sequence #include<string.h> passed to compiler as code
The preprocessor recognizes the line continuation character
Commands can extend to multiple lines with the \ character Example
#define DOLLAR $
#define BACKSLASH \
#define MODULUS |
Results in 2 lines not 3 as might be expected
Line 2 continued on to line 3, these interpreted as single line
Example
#define SWAP(a, b) { \
a ^= b; \
b ^= a; \
a ^= b; \
}
Preprocessor Directives File Inclusion
Directive #include Simplest preprocessor directive Has two forms
Either form
Replaces the current line with Entire contents of the named file
If complete path not given
Search determined by form used < >
Search in certain standard places System type places
syntax #include <fileName> #include "fileName"
Determined by implementation Defined by search rules Specific location set at time of compiler installation
" " First search some local places
Current directory Second
Certain standard places General intent < >
Standard implementation files " "
Programmer written files Included file
May contain #include commands Number
Implementation dependent ANSI C requires support for 8 minimum Error if included file cannot be found
Third form of #include recognized
The tokens undergo normal macro expansion
Result must match one of the first two forms Example
#define COMMS "G:/mySystem/include/comms.h"
#include COMMS
Causes the preprocessor to look in directory and for the file specified
G:/mySystem/include/comms.h
Note: the forward slash / or back slash \ used to separate directories along a directory path depends upon operating system. Typically UNIX or LINUX derivatives use the forward slash and Windows derivatives use the back slash
syntax #include preprocessor tokens
Macro Substitution
Directives #define #undef
#define i. The first form of #define directive
Causes name To be defined as a macro to the preprocessor Instructs the preprocessor To replace all (unquoted) occurrences of name with text
name
Must be an identifier as defined by the C language U/L case letters.....
text Called the body of the macro
Process called macro substitution
Simple macros Common use
Symbolic constants Example
#define MAXSIZE 2048
#define PI 3.14
Alternative definitions of TWOPI (use only one)
#define TWOPI 6.28
#define TWOPI (3.14*2.0)
#define TWOPI (PI + PI)
int myArray [MAXSIZE];
circumference = TWOPI * radius;
area = PI * pow(r,2);
syntax i. #define name text ii. #define name(arg1, arg2, ..., argn) text iii. #undef name
Macros with Parameters Second form of #define directive
Declares a formal parameter list
Parameter list Immediately follows macro name
No intervening whitespace If whitespace
Definition assumed to be macro with no args Enclosed in () Separated by commas
Args in the parameter list Must be identifiers No two the same Need not be used in macro body
Parameter list may be empty
Using a Parameterized Macro Macro invoked
Writing name Left parenthesis 1 actual arg for each formal parameter Separated by commas Right parenthesis
If no formal parameters
Must include empty arg list Whitespace may appear
Between Name Left parenthesis
Actual arg
May contain Properly balanced parentheses Commas
If within set of parentheses Braces and subscripting brackets
Cannot contain commas Do not have to balance
Example
#define sum(x,y) ((x) + (y))
x = sum(2*a, b) / sum (c,d);
x = sum(2 * g(a,b), h(a,b)) / sum (c,d);
Example
#define getModem() getc(modemIn)
while ((c = getModem()) != EOF)
Example
Can define a macro that takes arbitrary statement as its argument
#define assign(anyStatement) anyStatement
assign( {a = 1 ; b = 2;})
assign (c = 0; d = 1; e = 2;)
Example
#define max(a,b) ((a) > (b) ? (a) : (b))
max (3, 4);
max (6, 5);
Potential problems
Consider max(i++, j++);
Appears to be simple use of max ()
Observe
((a) > (b)) replaced by ((i++) > (j++))
(a) : (b) replaced by (i++) : (j++)
The larger variable is incremented twice, the other once
#undef The #undef directive Companion to #define
Used to make name
No longer defined Causes preprocessor to forget Macro definition of name
Once name is undefined Can be given new definition
Using #define Not an error
To undefine a name that is not defined
Macro expansion Not performed within #undef directive
Conditional Compilation
Directives #if #else, #elif #endif
Conditional Compilation directives
Based upon computed condition Allow lines of source code to be
Passed through Eliminated
Used to control the way the source code
Assembled Compiled Semantics
syntax #if constant-expression #else, #elif constant-expression #endif
syntax #undef name
As expected
#if constant-expression
constant-expression Must evaluate to constant arithmetic value May include macro substitution
if constant-expression nonzero
Subsequent C code lines Intended to be included in program All C source lines
Sent to preprocessor output Until #else, #elif, or #endif
Expression encountered
#else, #elif constant-expression
#else Like familiar if - else
If previous conditions fail Lines following #else are included
#elif constant-expression Equivalent to else if
Like if constant-expression evaluated Consequences are the same as #if
#endif Closes the #if sequence
Example
Let the variable SYSTEM identify the host system LINUX OSX UNIX WIN7
Want different header file included depending upon system Each defines system specific information
#if LINUX
#define HDR "linuxHeader.h"
#elif OSX
#define HDR "osxHeader.h"
#elif UNIX
#define HDR "unixHeader.h"
#else
#define HDR "win7Header.h"
#endif
#include HDR
Conditional Directives
Directives #ifdef #ifndef
Conditional directives Test if an identifier
Defined Not defined
#ifdef
Equivalent to #if 1
if the identifier is defined #if 0
if the identifier is not defined #ifndef
Equivalent to #if 0
if the identifier is defined #if 1
if the identifier is not defined or has been undefined
syntax #ifdef name #ifndef name
Example Want different debug code included depending upon system
Conditionally include debug code Don’t want to
Include in the final version Take out
For future upgrades
Each defines system specific information
#define OSX
Define only the name of the target system
#ifdef tests whether a name is defined, not its value
A name defined as 0 is still defined
#ifdef LINUX
Linux debug code
#endif
#ifdef WIN7
Win 7 debug code
#endif
#ifdef UNIX
Unix debug code
#endif
#ifdef OSX
osx debug code
#endif
Example
Program Multiple files
Several files share Common .h file
May want to Debug separately Use for multiple targets Use for different programs
In final build Will have multiple definitions for variable if .h file included multiple times
Example
May have added debug code to source For use during development
Want to remove for release Bad style to individually comment out
Each line of debug code Preprocessor can help
Example preproc0.c
#include <string.h>
#include <stdio.h>
#define DEBUG // commenting out this line will
              // prevent debug code from inclusion in final build
int main()
{
    char* myString = "Hello";
#ifdef DEBUG
    printf ("The string length is %zu\n", strlen(myString));
#endif
    return 0;
}
Miscellaneous Directives
Directives #line #error #pragma #line
If program built from
Multiple other files Sometimes useful to annotate
With line numbers from original file Instead of normal sequential numbering
Info provided by #line directive
Used to set the values of __LINE__ __FILE__
__LINE__ Line number of current source line Decimal integer constant
__FILE__ Name of current source file String constant
Example
preproc1.c
#include <stdio.h>
#include <string.h>
int main()
{
    char* myString = "Hello";
#line 123 "myFile"
    printf ("This line is %d from %s\n", __LINE__, __FILE__);
    printf ("The string length is %zu\n", strlen(myString));
    return 0;
}
syntax #line line-number "fileName" #line line-number
#error Used to write Compile time error message
In ISO C error-message is not subject to macro expansion Typically used in conditionals
Warn of inconsistencies Constraint violations Incomplete information
Example
preproc2.c
#include <stdio.h>
#include <string.h>
#define SYSTEM Linux
#ifndef SYSTEM
#error "You must specify the system type"
#endif
int main()
{
    char* myString = "Hello";
    printf ("The string length is %zu\n", strlen(myString));
    return 0;
}
#pragma Used to
Add new preprocessor or compiler functionality Provide implementation defined information to the compiler
No restrictions on tokens Compilers should ignore what they do not understand
syntax #error error-message
syntax #pragma tokens
args to directive Subject to macro expansion
No agreement on standard pragmas
Example #pragma pagesize (number of lines)
MS Visual C++ Sets the number of lines desired per page of source listing
#pragma pages (<pages>)
Generate <pages> (formfeeds) in source listing Default value is 1
#pragma inline
Compile with fast calling convention
Typedef Names C provides facility for creating new data type names called typedef
Typedef creates alias or synonym for existing type
After declaration identifier becomes synonym for typeName
Caution
Typedef does not create a new type It is merely a synonym or alias for an existing type
Cannot redefine the built-in meaning of a type
typedef int double; is illegal
Example typedef int* INTPTR;
INTPTR is now a synonym for pointer to int It may be used wherever an int* can be used
INTPTR myPtr;
myPtr is now a pointer to an integer
syntax typedef typeName identifier
typedef
Very useful in simplifying complicated declarations Thus
Helps to simplify program Makes intent more obvious
Use carefully
Rather than clarify Overuse can serve to confuse
The Compiler Compiler is a tool for translating programs
Into variety of forms One such form
Assembly language – the instruction set for the machine
As we saw in level 2 diagram above Top level program
Can be made up of number of modules Module can be
C source file Standard library file
Defined as part of the language such as Math library String library Library that manages all input and output
Custom library Under the Hood
As program compiled Compiler has a lot of record keeping to do
Translation Unit
As compilation process proceeds Each .c or source file compiled individually
Called translation unit Symbol Table
As each source file compiled Table of identifiers or symbols within program created
Called symbol table
How compiler keeps track of
All identifiers used Where in memory variables placed
Allocate Memory – Yes or No
Each symbol name entered into symbol table Declaration – brings name into name space
No memory allocated Definition – brings name into name space
Sufficient memory allocated to hold variable If definition appears in different translation unit
Identify as extern Want only single definition – memory allocation For each variable or function body in system
Prior to this stage Program did not depend upon machine Now program in form that will execute only on particular machine
The Assembler Assembler is tool we use for converting
Assembly language into machine language Program expressed as collection of 0’s and 1’s machine understands
The Linker Although program now in machine language Not ready to be executed Problem
All variables and data structures we use Must reside in computer memory Each needs an address in memory
Question
Which address should we use
Unfortunately Cannot always use the same address What if someone else wants to use same address
To solve problem assembler generates Relocatable code
Code that can be placed anywhere in memory Second question arises at this time
We’d like to be able to use existing code Our own Other people's
How do we get this into our program without typing in each time
Tool called linker/loader can help with both problems Does two jobs
1. Links collection of program modules together 2. Resolves address problems
Summary In this lesson examined
Architecture of C program Introduced C preprocessor and preprocessor directives How to use preprocessor's directives to manage program Should now be comfortable working with basic C preprocessor directives Know when and how to use Aware of tools compiler, assembler, and linker and role they play in building C program