Programming in standard C and C++

Table of Contents

Introduction to programming in standard C and C++ ... 1
    Summary of contents ... 1
        Creating an executable ... 1
        Advanced programming utilities ... 2
        Program analysis ... 2
        C and C++ compilation ... 2
    C language ... 4
        Modular programming in C ... 4
    C++ language ... 4
    Libraries and header files ... 5
        How C and C++ programs communicate with the shell ... 5
    Other tools ... 6

C and C++ compilation system ... 7
    Compiling and linking ... 7
        Components of the C compilation system ... 7
        Components of the C++ compilation system ... 9
        Basic cc and CC command line syntax ... 10
        Commonly used command line options ... 14
        Link editing ... 17
        Checking for run-time compatibility ... 26
        Dynamic linking programming interface ... 27
        Guidelines for building shared objects ... 27
        Library maintenance ... 30
        C++ and dynamic linking ... 31
        C++ external function name encoding ... 32
        Accessing C functions from C++ ... 34
        Quick-reference guide ... 35
    Libraries and header files ... 36
        Header files ... 37
        C++ precompiled headers ... 38
        How to use library functions ... 42
        C library (libc) ... 43
        Math library (libm) ... 46
        General purpose library (libgen) ... 49
        Standard I/O ... 50
        Reentrant libraries ... 53
        BSD system libraries and header files ... 54

C language compilers ... 55
    Compilation modes ... 55
        Feature test macros ... 56
        Global behavior ... 56
        Phases of translation ... 57
    Source files and tokenization ... 58
        Tokens ... 58
        Identifiers ... 58
        Keywords ... 58
        Constants ... 58
        String literals ... 61
        Wide string literals ... 61
        Comments ... 61
    Preprocessing ... 61
        Trigraph sequences ... 62
        Preprocessing tokens ... 62
        Preprocessing directives ... 63
    Declarations and definitions ... 69
        Basic types ... 69
        Scope ... 71
        Storage duration ... 72
        Storage class specifiers ... 72
        Declarators ... 73
        Function definitions ... 75
    Conversions and expressions ... 75
        Implicit conversions ... 75
        Expressions ... 77
        Operators ... 77
        Associativity and precedence of operators ... 84
        Constant expressions ... 84
        Initialization ... 84
    Statements ... 87
        Expression statement ... 87
        Compound statement ... 87
        Selection statements ... 87
        Iteration statements ... 88
        Jump statements ... 89
    Portability considerations ... 90

Complying with standard C ... 92
    Mixing old and new style functions ... 92
        Writing new code ... 92
        Updating existing code ... 92
        Mixing considerations ... 93
        Examples ... 93
    Functions with varying arguments ... 94
        Example ... 94
    Promotions: unsigned vs. value preserving ... 96
        Background ... 96
        Compilation behavior ... 96
        First example: using a cast ... 97
        Bit-fields ... 97
        Second example: result is the same ... 98
        Integral constants ... 98
        Third example: integral constants ... 98
    Tokenization and preprocessing ... 99
        ANSI C translation phases ... 99
        Old C translation phases ... 100
        Logical source lines ... 100
        Macro replacement ... 100
        Stringizing ... 101
        Token pasting ... 101
    Using const and volatile ... 102
        Types for lvalues ... 102
        Type qualifiers in derived types ... 102
        Using const to read character values ... 103
        When to use volatile ... 104
    Multibyte characters and wide characters ... 105
        ``Asianization'' means multibyte characters ... 105
        Encoding variations ... 105
        Wide characters ... 105
        Conversion functions ... 106
        C language features ... 106
    Standard headers and reserved names ... 107
        Balancing process ... 107
        Standard headers ... 108
        Names reserved for implementation use ... 108
        Names reserved for expansion ... 109
        Names safe to use ... 109
    Internationalization ... 109
        Locales ... 109
        The setlocale function ... 110
        Changed functions ... 111
        New functions ... 111
    Grouping and evaluation in expressions ... 112
        Definitions ... 112
        The Kernighan and Ritchie C rearrangement license ... 112
        The ANSI C rules ... 113
        Parentheses grouping and evaluation ... 113
        The ``As If'' rule ... 113
    Incomplete types ... 114
        Types ... 114
        Completing incomplete types ... 114
        Declarations ... 114
        Expressions ... 115
        Justification ... 115
        Examples ... 115
    Compatible and composite types ... 116
        Multiple declarations ... 116
        Separate compilation compatibility ... 116
        Single compilation compatibility ... 116
        Compatible pointer types ... 117
        Compatible array types ... 117
        Compatible function types ... 117
        Special cases ... 117
        Composite type ... 118

C++ language ... 119
    Compilation modes ... 119
    C++ dialect accepted ... 119
        Normal C++ mode ... 119
        Extensions accepted in normal C++ mode ... 120
        Anachronisms accepted ... 121
        Extensions accepted in cfront transition mode ... 122

Instantiating C++ templates ... 127
    The instantiation problem ... 127
    Coding standards for template definitions ... 128
    Manual instantiation ... 128
        Pragma interface ... 129
        Instantiation via the command line ... 130
        Single files ... 130
    Automatic instantiation ... 130
        Dependency management ... 131
        Performance ... 132
        What can go wrong ... 132
    Other considerations ... 132
        Inlines ... 132
        Specializations ... 132
        Libraries ... 133
        More on linking template code into archives ... 133
        The one instantiation per object scheme ... 135
        Special symbols ... 138

Using C++ exception handling ... 139
    Performance implications ... 139
    Mixed language programming ... 140
    Other implementation dependencies ... 141

Compiler diagnostics ... 142
    C compiler diagnostics ... 142
        Message types and applicable options ... 143
        Operator names in messages ... 143
    Messages ... 145
    Other error messages ... 214
    C++ compiler diagnostics ... 214

Object files ... 216
    File format ... 216
    Data representation ... 217
    ELF header ... 218
        ELF identification ... 221
    Sections ... 225
        Rules for linking unrecognized sections ... 232
        Special sections ... 233
    String table ... 236
    Symbol table ... 237
        Symbol values ... 243
    Relocation ... 243
    Program header ... 245
        Base address ... 248
        Segment permissions ... 248
    Program loading (Processor specific) ... 251
    Dynamic linking ... 252
        Program interpreter ... 252
        Dynamic linker ... 252
        Dynamic section ... 253
        Shared object dependencies ... 259
        Global offset table ... 261
        Procedure linkage table ... 261
        Hash table ... 262
        Initialization and termination functions ... 262

Floating point operations ... 265
    IEEE arithmetic ... 265
        Data types and formats ... 265
        Normalized numbers ... 267
        Denormalized numbers ... 267
        Special-case values ... 268
        NaNs and infinities ... 269
        Rounding control ... 270
        Exceptions, sticky bits, and trap bits ... 270
    Single-precision floating point operations ... 271
    Double-extended-precision ... 272
    IEEE requirements ... 273
        Conversion of floating point formats to integer ... 273
        Square root ... 273
        Compares and unordered condition ... 273

Analyzing your code with lint ... 275
    Why lint is an important tool ... 275
        Options and directives ... 275
        lint and the compiler ... 275
        Message formats ... 276
    What lint does ... 276
        Consistency checks ... 276
        Portability checks ... 276

        Suspicious constructs
    Usage
        lint libraries
        lint filters
        Options and directives listed
    lint-specific messages
m4 macro processor
    Defining macros
    Quoting
    Arguments
    Arithmetic built-ins
    File inclusion
    Diversions
    System command
    Conditionals
    String manipulation
    Printing
Linking with the mapfile option
    Using the mapfile option
        Mapfile structure and syntax
        Segment declarations
        Mapping directives
        Size-symbol declarations
    Mapping example
    Mapfile option defaults
    Internal map structure
    Error messages
        Warnings
        Fatal errors
Enhanced asm facility
    Definition of terms
    Example
        Definition
        Use
    Using asm macros
        Definition
    Writing asm macros


Introduction to programming in standard C and C++

Programming in standard C and C++ describes the C and C++ compilation system for UnixWare 7 and discusses how to transition to ANSI/ISO C. In addition, this topic includes material that most programmers will find invaluable, but that does not lend itself to the reference manual format, such as the diagnostic messages found in ``Compiler diagnostics''.

Programming in standard C and C++ does not attempt to teach you how to program in C or C++, nor does it cover every system tool you might conceivably use in creating a C or C++ program. For details concerning other compilation system tools, see Software development tools and Programming with System Calls and Libraries. Tools for debugging, analyzing, and profiling your programs are discussed in Debugging and analyzing C and C++ programs.

The following two texts are recommended for programmers new to the C language:

• Kernighan and Ritchie, The C Programming Language, Second Edition, 1988, Prentice-Hall;
• Harbison and Steele, C: A Reference Manual, Second Edition, 1987, Prentice-Hall.

For programmers new to the C++ language, the following texts are recommended:

• Stroustrup, The C++ Programming Language, Third Edition, Addison-Wesley, 1997.

For implementation-specific details not covered in this book, refer to the Application Binary Interface for your machine.

Summary of contents

The programming support tools covered in this topic can be grouped by function:

• Creating an executable
• Advanced programming utilities
• Program analysis

In addition to the topics discussed here, Programming in standard C and C++ includes discussions of assembly language escapes that use the keyword asm, and of mapfiles, a facility for mapping object file input sections to executable file output segments.

Creating an executable

``C and C++ compilation system'' describes the C and C++ compilation system, the set of software tools that you use to generate an executable program from C or C++ language source files. It contains material that is of interest to the novice and expert programmer alike. It has been broken into two main subtopics:

1. ``Compiling and linking'' details the command line syntax that is used to produce a binary representation of a program -- an executable object file. It also describes the options that let you tailor the link editor's behavior to your needs. A discussion of the advantages and disadvantages of the static and dynamic linking models is included.


2. ``Libraries and header files'' focuses on the standard C library and the functions you use for standard I/O. It also describes the math library and libgen. The header files that you need to include in your program if you call a function in these libraries are listed.

``C language compilers'' provides a reference guide to the C language accepted by the C compilation system. ``Compiler diagnostics'' lists the warning and error messages produced by the compilers. Check the code examples given in ``C compiler diagnostics'' when you need to clarify your understanding of the rules of syntax and semantics summarized in ``C language compilers''.

``C++ language'' provides an introduction to the C++ language and discusses the C++ dialect accepted by the compiler. A description of error and warning messages produced by the C++ compiler can be found in ``Compiler diagnostics''. ``Instantiating C++ templates'' describes manual and automatic template instantiation.

Advanced programming utilities

``Object files'' describes the executable and linking format (ELF) of the object code produced by the C and C++ compilation system. Strictly speaking, this discussion is required reading only for programmers who need to access and manipulate object files. However, because it provides a larger perspective on the workings of the compilation system, especially the dynamic linking mechanism, it may prove useful to readers who seek to widen their understanding of the material presented in other topics.

``Floating point operations'' details the standard single-precision, double-precision, and long-double data types, operations, and conversions for floating point arithmetic that are generated by the C and C++ compilers. It also describes the low-level library functions that are provided to programmers who need the full range of floating point support. Most users will not need to call low-level functions to use floating point operations in their programs.

``m4 macro processor'' describes m4, a general purpose macro processor that can be used to preprocess C and assembly language programs.

Program analysis

The lint program, described in ``Analyzing your code with lint'', checks for code constructs that may cause your C program not to compile, or to execute with unexpected results. lint issues every error and warning message produced by the C compiler. It also issues ``lint-specific'' warnings about inconsistencies in definition and use across files and about potential portability problems. A list of these warnings, with examples of source code that would elicit them, is included.

Additional tools for program analysis and debugging are described in Debugging and analyzing C and C++ programs. These tools include debug, prof, lprof, and cscope.

NOTE: Use lint to check your program for portability and cross-file consistency, and to assure it will compile. Use debug to locate a bug.

C and C++ compilation

The most important of the tools discussed in these pages is the C and C++ compilation system, which translates C or C++ source code into the machine instructions of the computer your program is to run on. On the UnixWare operating system, the command to do this for C source code is cc:

$ cc mycode.c

If your program is in multiple source files, then the command is

$ cc file1.c file2.c file3.c

and so on. As the examples suggest, the source files to be compiled must have names that end in the characters .c.

Similarly for C++ source code, the command is CC:

$ CC mycode.C

There are other things going on invisibly in these command lines that you will want to read about in ``C and C++ compilation system''. For now it's enough to note that either of these commands will create an executable program in a file called a.out in your current directory. The second command will also create in your current directory object files that correspond to each of your source files:

$ ls -1
a.out
file1.c
file1.o
file2.c
file2.o
file3.c
file3.o

Each .o file contains a binary representation of the C language code in the corresponding source file. The cc command creates and then links these object files to produce the executable object file a.out. The standard C library functions that you have called in your program -- printf, for example -- are automatically linked with the executable at run time. You can, of course, avoid these default arrangements by using the command line options to cc that we describe in ``C and C++ compilation system''.

You execute the program by entering its name after the system prompt:

$ a.out

Because the name a.out is only of temporary usefulness, you will probably want to rename your executable:

$ mv a.out myprog

You can also give your program a different name when you compile it -- with a cc command line option:

$ cc -o myprog file1.c file2.c file3.c

Here, too, you execute the program by entering its name after the prompt:

$ myprog


C language

The C language was developed on the UNIX operating system, and is used to code the UnixWare operating system kernel. Most UNIX application programs are also written in C. However, the UnixWare operating system supports many programming languages, and C compilers are available on many different operating systems.

``C language compilers'' provides a complete reference guide to the C language. Here are some features of thelanguage:

• Basic data types -- characters, integers of various sizes, and floating point numbers;
• Derived data types -- functions, arrays, pointers, structures, and unions;
• A rich set of operators, including bit-wise operators;
• Flow of control -- if, if-else, switch, while, do-while, and for statements.

Application programs written in C usually can be transported to other machines without difficulty. Programs written in ANSI standard C (conforming to standards set down by the American National Standards Institute) have an even higher degree of portability.

Programs that require direct interaction with the UnixWare operating system kernel for low-level I/O, memory management, and interprocess communication can be written efficiently in C using the calls to system functions contained in the standard C library. See cpio(1), umask(1) and stat(2) for more information.

Modular programming in C

C is a language that lends itself readily to modular programming. Because the functions of a C program can be compiled separately, the next logical step is to put each function, or group of related functions, in its own file. Each file can then be treated as a component, or a module, of your program.

``C language compilers'' describes how to write C code so that the modules of your program can communicate with each other. Coding a program in small pieces eases the job of making changes because you only need to recompile the revised modules. It also makes it easier to build programs from code you have written already. As you write functions for one program, you will find that many can be called into another.

C++ language

The C++ language was developed at AT&T Bell Laboratories as a general-purpose programming language based on C. In addition to the features provided by C, C++ provides data abstraction and support for object-oriented design and programming through classes, multiple inheritance, virtual functions and templates. C++ also provides operator and function name overloading, reference types, memory management operators and inline functions. Features of C++ such as constant types and function argument checking and type conversions have become part of the ANSI C programming language.

As with C, application programs written in C++ can usually be transported to a variety of machines. It is important to remember, however, that C++ was standardized more recently than C, and as a result different C++ compilers still may differ in how closely they track the standard. Implementations may also differ in which anachronisms of the language they allow. Users are discouraged from using anachronisms unless they occur in existing code that is difficult to change.


Libraries and header files

The standard libraries supplied by the C and C++ compilation system contain functions that you can use in your program to perform input/output, string handling, and other high-level operations that are not explicitly provided by the C language. Header files contain definitions and declarations that your program will need if it calls a library function. The functions that perform standard I/O, for example, use the definitions and declarations in the header file stdio.h. Using the line:

#include <stdio.h>

in your program ensures that the interface between your program and the standard I/O library agrees with the interface that was used to build the library.

``C and C++ compilation system'' describes some of the more important standard libraries and lists the header files that you need to include in your program if you call a function in those libraries. It also shows you how to use library functions in your program and how to include a header file. You can, of course, create your own libraries and header files, following the examples of modular programming described in ``C and C++ compilation system''.

C standard library header files have the file extension .h and can be used by both C and C++ programs. C++ standard library header files have no file extension and can be used only by C++ programs.

How C and C++ programs communicate with the shell

Information or control data can be passed to a C or C++ program as an argument on the shell command line. For example, you can invoke the cc command with the names of your source files as arguments:

$ cc file1.c file2.c file3.c

When you execute either a C or C++ program, command line arguments are made available to the function main in two parameters, an argument count, conventionally called argc, and an argument vector, conventionally called argv. (Every C and C++ program is required to have an entry point named main.) argc is the number of arguments with which the program was invoked. argv is an array of pointers to character strings that contain the arguments, one per string. Because the command name itself is considered to be the first argument, or argv[0], the count is always at least one. Here is the declaration for main:

int main(int argc, char *argv[])

For two examples of how you might use run-time parameters in your program, see ``C and C++ compilation system''.

The shell, which makes arguments available to your program, considers an argument to be any sequence of non-blank characters. Characters enclosed in single quotes ('abc def') or double quotes ("abc def") are passed to the program as one argument even if blanks or tabs are among the characters. You are responsible for error checking and otherwise making sure that the argument received is what your program expects it to be.

In addition to argc and argv, you can use a third argument, envp. envp is an array of pointers to environment variables. See envp(1), exec(2), and environ(5) for more information.


C and C++ programs exit voluntarily, returning control to the operating system, by returning from main or by calling the exit function. A return(n) from main is equivalent to the call exit(n). (Remember that main has type ``function returning int.'')

Your program should return a value to the operating system to say whether it completed successfully or not. The value gets passed to the shell, where it becomes the value of the $? shell variable if you executed your program in the foreground. By convention, a return value of zero denotes success, and a non-zero return value means an error occurred. Use the macros EXIT_SUCCESS and EXIT_FAILURE, defined in the header file stdlib.h, as return values from main or argument values for exit.

Other tools

There are other programming support tools that do not receive extended treatment in the programming series. See cflow(1), ctrace(1), cxref(1), cof2elf(1), dis(1), dump(1), lorder(1), mcs(1), nm(1), size(1), and strip(1) for more information.

Tools for analyzing source code:

• cflow produces a chart of the external references in C, lex, yacc, and assembly language files. Use it to check program dependencies.
• ctrace prints out variables as each program statement is executed. Use it to follow the execution of a C program statement by statement.
• cxref analyzes a group of C source files and builds a cross-reference table for the automatic, static, and global symbols in each file. Use it to check program dependencies and to expose program structure.

Tools for reading and manipulating object files:

• cof2elf translates object files in the common object file format (COFF) to the executable and linking format (ELF).
• dis disassembles object files.
• dump prints out selected parts of object files.
• lorder generates an ordered listing of object files.
• mcs manipulates the comment sections of an object file.
• nm prints the symbol table of an object file.
• size reports the number of bytes in an object file's sections or loadable segments.
• strip removes symbolic debugging information and symbol tables from an object file.


C and C++ compilation system

This topic describes the UNIX® operating system tools used to generate an executable program from C and C++ language source files.

``Compiling and linking'' details the command line syntax used to produce a binary representation of a program -- an executable object file. It concentrates on the options to the cc(1) and CC(1C++) commands that control the process in which object files are created from source files, then linked with each other and with the library functions that are called in your program. As outlined in ``Introduction to programming in standard C and C++'', the major focus of this topic is on static vs. dynamic linking: how each model is implemented and invoked, and its relative merits.

Standard libraries are the focus of ``Libraries and header files''. Because the C language contains no intrinsic input/output facility, I/O must be carried out by explicitly called functions. On the UNIX® operating system, the functions that perform these and other high-level tasks have been standardized and grouped in libraries. They are convenient, portable, and optimized for your machine.

Likewise, C++ input/output is supported by the iostream library, which is a component in the C++ Standard Libraries. These libraries are described in Programming with the pre-standard C++ libraries.

Header files contain definitions and declarations that serve as the interface between your program and the functions in these libraries. They also contain several functions, such as getc and putc, that actually are defined as macros. The manual page will generally tell you whether what you are using is a macro or a function. Macros and functions are both used in the same way in your program. The descriptions of standard libraries in ``Libraries and header files'' show the header files that you need to include in your program if you call a function in those libraries. The manual page for each function also lists the required header files. ``How to use library functions'' shows how to use library functions in your program and how to include header files.

Compiling and linking

Once you have created the C or C++ source program, your next task is to transform this source into an executable file. The cc command provides the compilation and linking capabilities, among others, for C source programs. The CC command provides the same capabilities for C++ source programs.

Components of the C compilation system

The C compilation system consists of a preprocessor, compiler, assembler, and link editor. The cc command invokes each of these components automatically unless you use command line options to specify otherwise. An executable C program is created by exposing the source code to these components via the cc command.

C preprocessor

The preprocessor component of the compiler reads lines in your source files that direct it to replace a name with a token string (#define), perhaps conditionally (#if, for example). It also accepts directives in your source files to include the contents of a named file in your program (#include). Included header files for the most part consist of #define directives and declarations of external symbols, definitions and declarations that you want to make available to more than one source file. See ``Libraries and header files'' for details.


C compiler

The compiler proper translates the C language code in your source files, which now contain the preprocessed contents of any included header files, into assembly language code.

C assembler

The assembler translates the assembly language code into the machine instructions of the computer your program is to run on. As indicated in ``Introduction to programming in standard C and C++'', these instructions are stored in object files that correspond to each of your source files. Each object file contains a binary representation of the C language code in the corresponding source file. Object files are made up of sections, of which there are usually at least two. The text section consists mainly of program instructions. Text sections normally have read and execute, but not write, permissions. Data sections normally have read, write, and execute permissions. See ``Object files'' for details of the object file format.

C linker

The link editor links these object files with each other and with any library functions that you have called in your program. When it links with the library functions depends on the link editing model you have chosen.

• An archive, or statically linked library, is a collection of object files each of which contains the code for a function or a group of related functions in the library. When you use a library function in your program, and specify a static linking option on the cc command line, a copy of the object file that contains the function is incorporated in your executable at link time.
• A shared object, or dynamically linked library, is a single object file that contains the code for every function in the library. When you call a library function in your program, and specify a dynamic linking option on the cc command line, the entire contents of the shared object are mapped into the virtual address space of your process at run time. As its name implies, a shared object contains code that can be used simultaneously by different programs at run time.

See ``Link editing'' for more information on the two ways in which libraries are implemented, and details on how to combine the static and dynamic linking approaches in different ways according to your needs.

Organization of the C compilation system

``Organization of C Compilation System'' shows the organization of the C compilation system. Details concerning the optimizer are omitted here because it is optional. See ``Commonly used command line options'' for more information concerning invoking the optimizer.


[Figure: Organization of C Compilation System]

Components of the C++ compilation system

The organization of the C++ compilation system is similar to that of the C compilation system. It consists of a preprocessor, compiler, assembler and link editor. In fact, the code generation portion of the compiler, the assembler and link editor are those used in a C compilation. There is also a C++ name filter (demangler) and a C++ prelink phase. The CC command invokes each of these components automatically, unless otherwise directed, to create an executable C++ program from the source code.

C++ preprocessor

The C++ preprocessor performs the same function as the C preprocessor. See ``C preprocessor'' for details.

C++ compiler

Like the C compiler, the C++ compiler proper translates the C++ language code in your source files, whichnow contain the preprocessed contents of any included header files, into assembly language code. Thecompiler does not use C language as an intermediate form.

C++ assembler

The C++ assembler is the same as that used for C, and performs the same functions for your C++ program.See ``C assembler'' for details.

C++ prelinker

The C++ prelinker is a "linker feedback" mechanism which supports user−requested automatic templateinstantiation. The CC command invokes the prelinker once a complete set of object files has been generatedto determine whether any new instantiations are required or if any existing instantiations are no longer needed.Recompilation of source files is done as needed to generate any required instantiation. See ``Instantiating C++templates'' for an explanation of how C++ templates are instantiated.

C++ linker

The C++ linker is the same as that used for C, and performs the same functions for your C++ program. See ``C linker'' for details.

C++ name filter

To support C++ operator functions and overloaded functions, enable type-safe linkage, implement namespaces, provide unique function names for the link editor, and so on, the C++ compiler encodes many of the names it uses. These encoded names can be quite cryptic.

The c++filt demangling name filter is used by the CC command to display diagnostics from the code generator, prelinker and linker in a format corresponding to the declaration or definition in your C++ source program. In addition, c++filt(1C++) is available to decode output of other utilities that may work with C++ object files or libraries.

Organization of the C++ compilation system

``Organization of C++ Compilation System'' shows the organization of the C++ compilation system. Details concerning the optimizer are omitted here because it is optional. See ``Commonly used command line options'' for more information concerning invoking the optimizer.

Organization of C++ Compilation System

Basic cc and CC command line syntax

Now let's look at how this process works for a C language program to print the words hello, world.

hello world program

Here is the source code for the program, which we have written in the file hello.c:

#include <stdio.h>

int main()
{
    printf("hello, world\n");
    return 0;
}

The code above is accepted by both the C and C++ compilers.

An alternative C++ source file that makes use of the C++ input/output classes would be:

#include <iostream>

int main()
{
    std::cout << "hello, world" << std::endl;
    return 0;
}

Creating the executable

As indicated in ``Introduction to programming in standard C and C++'', the UNIX® operating system command to create an executable program from C language source files is cc:

$ cc hello.c

The source files to be compiled by the C compiler must have names that end in the characters .c. C++ source files are compiled with the CC command:

$ CC hello.c

The source files to be compiled by the C++ compiler must have names that end in any of the following:

.c .C .cpp .CPP .cxx .CXX .CC .c++ .C++

Because there aren't any syntactic or semantic errors in the source code, either of the above commands will create an executable program in the file a.out in the current directory:

$ ls -1
a.out
hello.c

Note that a .o file is not created when you compile a single source file.

Executing the program

Execute the program by entering its name after the system prompt:

$ a.out
hello, world

Because the name a.out is only of temporary usefulness, we'll rename the executable:

$ mv a.out hello

then, invoke the program using this new name:

$ hello
hello, world

Specifying a different executable name

You can also give the program the name hello when you compile it, with the -o option to the cc command:

$ cc -o hello hello.c

or

$ CC -o hello hello.c

Execute the program by entering hello after the system prompt:

$ hello
hello, world

Invoking the preprocessor only

Now let's look at how the cc and CC commands control the steps in the process described in ``Components of the C compilation system''. When you specify the -P option to cc or CC, only the preprocessor component of the compiler is invoked:

$ cc -P hello.c

The preprocessor's output -- the source code plus the preprocessed contents of the header file -- is left in the file hello.i in the current directory:

$ ls -1
hello.c
hello.i

That output could be useful if, for example, you received a compiler error message for the undefined symbol a in the following fragment of source code:

if (i > 4)
{
    /* declaration follows
    int a;
    /* end of declaration */
    a = 4;
}

The unterminated comment on the third line will cause the compiler to treat the declaration that follows it as part of a comment.

NOTE: The C++ compiler will identify the bad comment and tell you about it.

Because the preprocessor removes comments, its output

if (i > 4)
{

    a = 4;
}

will clearly show the effect of the unterminated comment on the declaration. You can also use the preprocessed output to examine the results of conditional compilation and macro expansion.

Invoking the preprocessor and compiler only

If you specify the -S option to the cc or CC command, only the preprocessor and compiler phases are invoked:

$ cc -S hello.c

The output -- the assembly language code for the compiled source -- is left in the file hello.s in the current directory. That output could be useful if you were writing an assembly language routine and wanted to see how the compiler went about a similar task.

Suppressing the linker phase

If, finally, you specify the -c option to cc or CC, all the components but the link editor are invoked:

$ cc -c hello.c

The output -- the assembled object code for the program -- is left in the object file hello.o in our current directory. You would typically want this output when using make.

NOTE: See Software development tools for further information regarding make.

Enter the command

$ cc hello.o

to create the executable object file a.out. By default, the link editor arranges for the standard C or C++ library function that was called in your program -- printf or cout -- to be linked with the executable at run time.

NOTE: If you are compiling C++ source files, they must be linked with CC. C source files can be linked with either cc or CC.

Passing files to cc and CC

The outputs described above are, of course, inputs to the components of the compilation system. They are not the only inputs, however. The link editor, for example, will supply code that runs just before and just after your program to do startup and cleanup tasks. This code is automatically linked with your program only when the link editor is invoked through cc or CC. Also, CC automatically runs the prelinker to instantiate templates. Therefore, cc hello.o was specified in the previous example rather than ld hello.o. For similar reasons, you should invoke the assembler through cc or CC rather than as:

$ cc hello.s

if you want to assemble and link hello.s.

Compiling multiple source files

As indicated in ``Introduction to programming in standard C and C++'', the compilation process is largely identical if your program is in multiple source files. The only difference is that the default cc or CC command line will create object files, as well as the executable object file a.out, in your current directory:

$ cc file1.c file2.c file3.c
$ ls -1
a.out
file1.c
file1.o
file2.c
file2.o
file3.c
file3.o

If one of your source files fails to compile, you need not recompile the others. If you receive a compiler error diagnostic for file1.c in the above command line, your current directory will look like this:

$ ls -1
file1.c
file2.c
file2.o
file3.c
file3.o

Compilation proceeds but linking is suppressed. Assuming you have fixed the error, the following command

$ cc file1.c file2.o file3.o

will create the object file file1.o and link it with file2.o and file3.o to produce the executable program a.out. As the example suggests, source files are compiled separately and independently. To create an executable program, the link editor must connect the definition of a symbol in one source file with external references to it in another.

A note on command line options

Not all the command line options discussed are compiler options. The -o option is actually an ld option that is accepted by the cc or CC command and passed to the link editor which creates the executable program. See the cc(1), CC(1C++) and ld(1) manual pages for more information.

Commonly used command line options

cc and CC command line options let you

• specify the order in which directories are searched for included header files
• prepare your program for symbolic debugging or profiling
• optimize your program
• administer your software

Searching for a header file

Recall that the first line of our sample C program was

#include <stdio.h>

The format of that directive is the one you should use to include any of the standard header files that are supplied with the C and C++ compilation systems. The angle brackets (< >) tell the preprocessor to search for the header file in the standard place for header files on your system. For C, this is usually the /usr/include directory. For C++ the standard place is usually /usr/include/CC, or, if the file is not found there, /usr/include. The format is different for header files that you have stored in your own directories:

#include "header.h"

The quotation marks (``" "'') tell the preprocessor to search for header.h first in the directory of the file containing the #include line, which will usually be your current directory, then in the standard place.

If your header file is not in the current directory, specify the path of the directory in which it is stored with the -I option to cc or CC. For instance, if you have included both stdio.h and header.h in the source file test.c:

#include <stdio.h>
#include "header.h"

and header.h is stored in the directory ../defs, the command

$ cc -I../defs test.c

will direct the preprocessor to search for header.h first in the current directory, then in the directory ../defs, and finally in the standard place. It will also direct the preprocessor to search for stdio.h first in ../defs, then in the standard place. The only difference is that the current directory is searched only for header files whose name you have enclosed in quotation marks.

You can specify the -I option more than once on the cc or CC command line. The preprocessor will search the specified directories in the order they appear on the command line. You can also specify multiple options to cc or CC on the same command line:

$ cc -o prog -I../defs test.c

The C++ compiler provides a mechanism which may allow you to avoid recompiling a set of header files, producing an improvement in compilation time. See ``C++ precompiled headers'' for an explanation of this mechanism.

Preparing your program for debugging

When you specify the -g option to cc or CC

$ cc -g test.c

you arrange for the compiler to generate information about program variables and statements that will be used by the debugger debug. The information supplied to debug will allow you to use it to trace function calls, display the values of variables, and set breakpoints. See ``Using the command line interface of debug'' and ``Using the graphical interface of debug'' for further information.

Preparing your program for profiling

To use one of the profilers that are supplied with the C and C++ compilation systems, you must do two things:

1. Compile and link your program with a profiling option:

   prof:   $ cc -qp test.c
   lprof:  $ cc -ql test.c
   fprof:  $ cc -qf test.c

2. Run the profiled program:

   $ a.out

At the end of execution, data about your program's run-time behavior are written to a file in your current directory:

   prof:   mon.out
   lprof:  prog.cnt

where prog is the name of the profiled program. The files are inputs to the profilers. See ``Analyzing run-time behavior'' for further information.

Optimizing your program

The -O option to either cc or CC invokes the optimizer:

$ cc -O test.c

The optimizer improves the efficiency of the assembly language code generated by the compiler. That, in turn, will speed the execution time of your object code. Use the optimizer when you have finished debugging and profiling your program.

If you know the processor your code will normally be run on, you can specify that the compilation process should generate code that is optimized specifically for that processor. For more information, see the -K option on the cc(1) and CC(1C++) manual pages.

Software administration

The -Qy option causes identification information about each invoked compilation tool to be added to the output file. This information can be useful for software administration. The information is placed in the .comment section of the resulting object file and can be printed by using the mcs command. For example:

mcs -p object_file | grep acomp

will find the identification information associated with the compiler, acomp.

Link editing

This topic examines the link editing process in detail. It starts with the default arrangement, and with the basics of linking your program with the standard libraries supplied by the C and C++ compilation system. It also details the implementation of the dynamic linking mechanism, and looks at some coding guidelines and maintenance tips for shared library development.

NOTE: Because this topic tries to cover the widest possible audience, it may provide more background than many users will need to link their programs with C and C++ language libraries. If you are interested only in the how-to, and are comfortable with a purely formal presentation that scants motivation and background alike, you may want to skip to the quick-reference guide in the last subsection.

Link editing refers to the process in which a symbol referenced in one module of your program is connected with its definition in another -- more concretely, the process by which the symbol printf in our sample source file hello.c is connected with its definition in the standard C library. Whichever link editing model you choose, static or dynamic, the link editor will search each module of your program, including any libraries you have used, for definitions of undefined external symbols in the other modules. If it does not find a definition for a symbol, the link editor will report an error by default, and fail to create an executable program. Multiply defined symbols are treated differently, however, under each approach. For details, see ``Handling multiply defined symbols''. The principal difference between static and dynamic linking lies in what happens after this search is completed.

Static linking

Under static linking, copies of the archive library object files that satisfy still unresolved external references in your program are incorporated in your executable at link time. External references in your program are connected with their definitions -- assigned addresses in memory -- when the executable is created.

Dynamic linking

Under dynamic linking, the contents of a shared object are mapped into the virtual address space of your process at run time. External references in your program are connected with their definitions when the program is executed.

You might prefer dynamic to static linking for the following reasons:

• Dynamically linked programs save disk storage and system process memory by sharing library code at run time.

• Dynamically linked code can be fixed or enhanced without having to relink applications that depend on it.

Default arrangement

The default cc command line

$ cc file1.c file2.c file3.c

creates object files corresponding to each of your source files, and links them with each other to create an executable program. These object files are called relocatable object files because they contain references to symbols that have not yet been connected with their definitions -- have not yet been assigned addresses in memory.

This command line arranges for the standard C library functions that you have called in your program to be linked with your executable automatically. By default, the linker looks for these functions in the file libc.so.

NOTE: The standard C library, libc.so, is not a pure shared object library, as it contains both run-time loadable and statically linkable functions. If you use one of these static functions in your program, the code for the function will be statically bound to your executable at link time.

The standard C library contains the system calls described in Section 2 and the C language functions described in Section 3, Subsections 3C and 3S.

NOTE: See ``Libraries and header files'' for details.

The C++ Standard Library is contained within the dynamic library libC.so and the archive library libC.a. Its contents are documented in the 3C++std man pages (see Intro(3C++std)) and the 3C++ man pages (see Intro(3C++)).

Now let's look at the formal basis for this arrangement:

1. By convention, shared objects, or dynamically linked libraries, are designated by the prefix lib and the suffix .so; archives, or statically linked libraries, are designated by the prefix lib and the suffix .a with the exception of libc.so as explained in the previous NOTE section.

2. These conventions are recognized, in turn, by the -l option to the cc and CC commands.

   $ cc file1.c file2.c file3.c -lx

   directs the link editor to search the shared object libx.so or the archive library libx.a. The cc command automatically passes -lc to the link editor. The CC command automatically passes -lC -lc to the link editor.

3. By default, the link editor chooses the shared object implementation of a library, libx.so, in preference to the archive library implementation, libx.a, in the same directory.

4. By default, the link editor searches for libraries in the standard places on your system, /usr/ccs/lib and /usr/lib, in that order. The standard libraries supplied by the compilation system normally are kept in /usr/ccs/lib.

Therefore, the default cc command line will direct the link editor to search /usr/ccs/lib/libc.so rather than its archive library counterpart.

In ``Creating and linking with archive libraries'' we'll show you how to link your program with the archive version of libc to avoid the dynamic linking default. Of course, you can link your program with libraries that perform other tasks as well. Finally, you can create your own shared objects and archive libraries.

Under the default arrangement the cc command creates and then links relocatable object files to generate an executable program, then arranges for the executable to be linked with the shared C library at run time. If you are satisfied with this arrangement, you need make no other provision for link editing on the cc command line.

Linking with standard libraries

A shared object is a single object file that contains the code for every function in a given library. When you call a function in that library, and dynamically link your program with it, the entire contents of the shared object are mapped into the virtual address space of your process at run time.

Archive libraries are configured differently. Each function, or small group of related functions (typically, the related functions that you will sometimes find on the same manual page), is stored in its own object file. These object files are then collected in archives that are searched by the link editor when you specify the necessary options on the cc command line. The link editor makes available to your program only the object files in these archives that contain a function you have called in your program.

Turning off dynamic linking

As noted, libc.a is the archive version of the standard C library. The cc command will direct the link editor to search libc.a if you turn off the dynamic linking default with the -dn option:

$ cc -dn file1.c file2.c file3.c

Copies of the object files in libc.a that resolve still unresolved external references in your program will be incorporated in your executable at link time.

Linking with other standard libraries

If you need to point the link editor to standard libraries that are not searched automatically, you specify the -l option explicitly on the cc command line. As shown previously, -lx directs the link editor to search the shared object libx.so or the archive library libx.a. So if your program calls the function sin, for example, in the standard math library libm, the command

$ cc file1.c file2.c file3.c -lm

will direct the link editor to search for /usr/ccs/lib/libm.so, and if it does not find it, /usr/ccs/lib/libm.a, to satisfy references to sin in your program. Because the compilation system does not supply a shared object version of libm, the above command will direct the link editor to search libm.a unless you have installed a shared object version of libm in the standard place. Note that because the dynamic linking default was not turned off with the -dn option, the above command will direct the link editor to search libc.so rather than libc.a. You would use the same command with the -dn option to link your program statically with libm.a and libc.a. The contents of libm are described in ``Math library (libm)''.

NOTE: Because the link editor searches an archive library only to resolve undefined external references it has previously seen, the placement of the -l option on the cc command line is important. The command

$ cc -dn file1.c -lm file2.c file3.c

will direct the link editor to search libm.a only for definitions that satisfy still unresolved external references in file1.c. As a rule, then, it is best to put -l at the end of the command line.

Creating and linking with archive libraries

This topic describes the basic mechanisms by which archives and shared objects are built. The idea is to give you some sense of where these libraries come from, as a basis for understanding how they are implemented and linked with your programs.

The following commands

$ cc -c function1.c function2.c function3.c
$ ar -r libfoo.a function1.o function2.o function3.o

will create an archive library, libfoo.a, that consists of the named object files.

NOTE: See ar(1) for details of usage.

When you use the -l option to link your program with libfoo.a

$ cc -Ldir file1.c file2.c file3.c -lfoo

the link editor will incorporate in your executable only the object files in this archive that contain a function you have called in your program. Note, again, that because the dynamic linking default was not turned off with the -dn option, the above command will direct the link editor to search libc.so as well as libfoo.a.

Creating and linking with C shared object libraries

Create a shared object library by specifying the -G option to the link editor:

$ cc -G -o libfoo.so -K PIC function1.c function2.c function3.c

That command will create the shared object libfoo.so consisting of the object code for the functions contained in the named files.

NOTE: See ``Implementation'' for details of compiler option -K PIC.

When you use the -l option to link your program with libfoo.so

$ cc -Ldir file1.c file2.c file3.c -lfoo

the link editor will record in your executable the name of the shared object and a small amount of bookkeeping information for use by the system at run time. Another component of the system -- the dynamic linker -- does the actual linking.

NOTE: Because shared object code is not copied into your executable object file at link time, a dynamically linked executable normally will use less disk space than a statically linked executable. For the same reason, shared object code can be changed without breaking executables that depend on it. Even if the shared C library were enhanced in the future, you would not have to relink programs that depended on it as long as the enhancements were compatible with your code. The dynamic linker would simply use the definitions in the new version of the library to resolve external references in your executables at run time. See ``Checking for run-time compatibility'' for more information.

Naming your shared object

You can specify the name of the shared object that you want to create under the -G option. The following command, for example, will create a shared object called a.out:

$ cc -G function1.o function2.o function3.o

You can then rename the shared object:

$ mv a.out libfoo.so

As noted, you use the lib prefix and the .so suffix because they are conventions recognized by -l, just as are lib and .a for archive libraries. So while it is legitimate to create a shared object that does not follow the naming convention, and to link it with your program

$ cc -G -o sharedob function1.o function2.o function3.o
$ cc file1.c file2.c file3.c /path/sharedob

we recommend against it. Not only will you have to enter a path name on the cc command line every time you use sharedob in a program, that path name will be hard-coded in your executables.

The command line

$ cc -Ldir file1.c file2.c file3.c -lfoo

directs the link editor to record in your executable the name of the shared object with which it is to be linked at run time.

NOTE: cc links the name of the shared object, not its path name.

When you use the -l option to link your program with a shared object library, not only must the link editor be told which directory to search for that library, so must the dynamic linker (unless the directory is the standard place, which the dynamic linker searches by default). See ``Specifying directories to be searched by the dynamic linker'' for more information about pointing to the dynamic linker. However, as long as the path name of a shared object is not hard-coded in your executable, you can move the shared object to a different directory without breaking your program. You should avoid using path names of shared objects on the cc command line. Those path names will be hard-coded in your executable. They won't be if you use -l.

Linking a shared object with another library

Finally, the cc -G command will not only create a shared object, it will accept a shared object or archive library as input. When you create libfoo.so, you can link it with a library you have already created such as libsharedob.so:

$ cc -G -o libfoo.so -Ldir function1.o function2.o \
    function3.o -lsharedob

That command will arrange for libsharedob.so to be linked with libfoo.so when, at run time, libfoo.so is linked with your program. It will also arrange for ld to search libsharedob.so for unresolved symbols when you link a program with libfoo.so. Note that here you will have to point the dynamic linker to the directories in which both libfoo.so and libsharedob.so are stored.

In order to link your program with libfoo.so, you will have to point the link editor to the directories in which libfoo.so and libsharedob.so are stored. In the following discussions, libsharedob.so will be referred to as the needed library.

Specifying directories to be searched by the link editor

In the previous section you created the archive library libfoo.a and the shared objects libsharedob.so and libfoo.so. For this example, all three of these libraries are stored in the directory /home/mylibs, and the executable is being created in a different directory. This example reflects the way most programmers organize their work on the UNIX® operating system.

In order to link your program with any of these libraries, the link editor must access the /home/mylibs directory. Specify the directory's path name with the -L option:

$ cc -L/home/mylibs file1.c file2.c file3.c -lfoo

The -L option directs the link editor to search for the libraries named with -l and the needed libraries first in the specified directory, then in the standard places. In this example, having found the directory /home/mylibs, the link editor will search libfoo.so rather than libfoo.a. As shown earlier, when the link editor encounters otherwise identically named shared object and archive libraries in the same directory, it searches the library with the .so suffix by default. For the same reason, it will search libc.so here rather than libc.a. Note that you must specify -L if you want the link editor to search for libraries in your current directory. You can use a period (.) to represent the current directory.

To direct the link editor to search libfoo.a, you can turn off the dynamic linking default:

$ cc -dn -L/home/mylibs file1.c file2.c file3.c -lfoo

Under -dn, the link editor will not accept shared objects as input. Here, then, it will search libfoo.a rather than libfoo.so, and libc.a rather than libc.so. Note that libsharedobj.so will not be searched because libfoo.a is an archive library.

To link your program statically with libfoo.a and dynamically with libc.so, you can do either of two things. First, you can move libfoo.a to a different directory -- /home/archives, for example -- then specify /home/archives with the -L option:

$ cc -L/home/archives -L/home/mylibs file1.c file2.c \
    file3.c -lfoo

As long as the link editor encounters the /home/archives directory before it encounters the /home/mylibs directory, it will search libfoo.a rather than libfoo.so. When otherwise identically named .so and .a libraries exist in different directories, the link editor will search the first one it finds. The same thing is true, by the way, for identically named libraries of either type. If you have different versions of libfoo.a in your directories, the link editor will search the first one it finds.

A better alternative might be to leave libfoo.a where you had it in the first place and use the -Bstatic and -Bdynamic options to turn dynamic linking off and on. The following command will link your program statically with libfoo.a and dynamically with libc.so:

$ cc -L/home/mylibs file1.c file2.c file3.c -Bstatic \
    -lfoo -Bdynamic


When you specify -Bstatic, the link editor will not accept a shared object as input until you specify -Bdynamic. You can use these options as toggles as often as needed on the cc command line:

$ cc -L/home/mylibs file1.c file2.c -Bstatic -lfoo \
    file3.c -Bdynamic -lsharedobj

That command will direct the link editor to search the following libraries:

1. libfoo.a to resolve still unresolved external references in file1.c and file2.c;
2. libsharedobj.so to resolve still unresolved external references in all three files and in libfoo.a;
3. libc.so to resolve still unresolved external references in all three files and the preceding libraries.

Files, including libraries, are searched for definitions in the order they are listed on the cc command line. The standard C library is always searched after the libraries named with -l and before the needed libraries.

You can add to the list of directories to be searched by the link editor by using the environment variable LD_LIBRARY_PATH. LD_LIBRARY_PATH must be a list of colon-separated directory names. An optional second list is separated from the first by a semicolon. Quote the value so that the shell does not treat the semicolon as a command separator:

$ LD_LIBRARY_PATH="dir:dir;dir:dir" export LD_LIBRARY_PATH

The directories specified before the semicolon are searched, in order, before the directories specified with -L; the directories specified after the semicolon are searched, in order, after the directories specified with -L. Note that you can use LD_LIBRARY_PATH in place of -L altogether. In that case the link editor will search for libraries named with -l and the needed libraries first in the directories specified before the semicolon, next in the directories specified after the semicolon, and last in the standard places. You should use absolute path names when you set this environment variable.
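The search order just described can be sketched in C. This is only an illustration under stated assumptions, not the link editor's actual code: the names build_search_order and struct search_order, the fixed array sizes, and the single standard_dir argument are all invented for the example.

```c
#include <string.h>

#define MAX_DIRS 32
#define MAX_DIR_LEN 256

/* Ordered list of directories the link editor would search (hypothetical type). */
struct search_order {
    char dirs[MAX_DIRS][MAX_DIR_LEN];
    size_t count;
};

/* Append the first `len` bytes of `dir`; skip empty or oversized entries. */
static void add_dir(struct search_order *so, const char *dir, size_t len)
{
    if (len == 0 || len >= MAX_DIR_LEN || so->count >= MAX_DIRS)
        return;
    memcpy(so->dirs[so->count], dir, len);
    so->dirs[so->count][len] = '\0';
    so->count++;
}

/* Append each colon-separated entry found in list[0..len). */
static void add_colon_list(struct search_order *so, const char *list, size_t len)
{
    size_t start = 0;
    for (size_t i = 0; i <= len; i++) {
        if (i == len || list[i] == ':') {
            add_dir(so, list + start, i - start);
            start = i + 1;
        }
    }
}

/*
 * Build the order described in the text:
 *   1. LD_LIBRARY_PATH entries before the semicolon
 *   2. directories given with -L
 *   3. LD_LIBRARY_PATH entries after the semicolon
 *   4. the standard place (supplied by the caller; an assumption here)
 */
void build_search_order(struct search_order *so,
                        const char *ld_library_path,
                        const char *const *l_dirs, size_t n_l_dirs,
                        const char *standard_dir)
{
    const char *semi = NULL;

    so->count = 0;
    if (ld_library_path != NULL) {
        semi = strchr(ld_library_path, ';');
        size_t first_len = semi ? (size_t)(semi - ld_library_path)
                                : strlen(ld_library_path);
        add_colon_list(so, ld_library_path, first_len);
    }
    for (size_t i = 0; i < n_l_dirs; i++)
        add_dir(so, l_dirs[i], strlen(l_dirs[i]));
    if (semi != NULL)
        add_colon_list(so, semi + 1, strlen(semi + 1));
    if (standard_dir != NULL)
        add_dir(so, standard_dir, strlen(standard_dir));
}
```

With LD_LIBRARY_PATH set to /a:/b;/c and a single -L/home/mylibs option, this sketch would yield /a, /b, /home/mylibs, /c, and then whatever standard directory the caller passes in.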

NOTE: LD_LIBRARY_PATH is also used by the dynamic linker. If LD_LIBRARY_PATH exists in your environment, the dynamic linker will search the directories named in it for shared objects to be linked with your program at execution. In using LD_LIBRARY_PATH with the link editor or the dynamic linker, then, you should keep in mind that any directories you give to one you are also giving to the other.

Specifying directories to be searched by the dynamic linker

When you use the -l option, you must point the dynamic linker to the directories of the shared objects that are to be linked with your program at execution. The environment variable LD_RUN_PATH lets you do that at link time. To set LD_RUN_PATH, list the absolute path names of the directories you want searched in the order you want them searched. Separate path names with a colon, as shown in the following example:

$ LD_RUN_PATH=/home/mylibs1:/home/mylibs2 export LD_RUN_PATH

The command

$ cc -o prog -L/home/mylibs file1.c file2.c file3.c -lfoo

will direct the dynamic linker to search for libfoo.so in /home/mylibs1 then /home/mylibs2 when you execute your program:

$ prog


The dynamic linker searches the standard place by default, after searching the directories you have assigned to LD_RUN_PATH (/home/mylibs1 and /home/mylibs2). Note that as far as the dynamic linker is concerned, the standard place for libraries is /usr/lib. Any executable versions of libraries supplied by the compilation system are kept in /usr/lib.

The environment variable LD_LIBRARY_PATH lets you do the same thing at run time. Suppose you have moved libfoo.so to /home/sharedobs. It is too late to replace /home/mylibs with /home/sharedobs in LD_RUN_PATH, at least without link editing your program again. You can, however, assign the new directory to LD_LIBRARY_PATH, as follows:

$ LD_LIBRARY_PATH=/home/sharedobs export LD_LIBRARY_PATH

Now when you execute your program

$ prog

the dynamic linker will search for libfoo.so first in /home/mylibs1 then in /home/mylibs2 and, not finding it in either directory, in /home/sharedobs. The directories assigned to LD_RUN_PATH are searched before the directories assigned to LD_LIBRARY_PATH. The important point is that because the path name of libfoo.so is not hard-coded in prog, you can direct the dynamic linker to search a different directory when you execute your program. You can move a shared object without breaking your application.

You can set LD_LIBRARY_PATH without first having set LD_RUN_PATH. The main difference between them is that once you have used LD_RUN_PATH for an application, the dynamic linker will search the specified directories every time the application is executed (unless you have relinked the application in a different environment). In contrast, you can assign different directories to LD_LIBRARY_PATH each time you execute the application. LD_LIBRARY_PATH directs the dynamic linker to search the assigned directories before it searches the standard place. Directories, including those in the optional second list, are searched in the order listed.

NOTE: For security, the dynamic linker ignores LD_LIBRARY_PATH for set-user and set-group ID programs and for privileged processes. It does, however, search LD_RUN_PATH directories and /usr/lib.
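The run-time search rules above (LD_RUN_PATH directories first, then LD_LIBRARY_PATH unless the process is privileged, then /usr/lib) can be modeled with a short C sketch. It is not the dynamic linker's implementation. The function names, the exists_fn callback, and the demo predicate only_in_sharedobs are hypothetical, introduced so that the lookup can be exercised without touching the file system.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Caller-supplied predicate: nonzero if `path` names an existing file. */
typedef int (*exists_fn)(const char *path);

/* Try each directory in a ':'- or ';'-separated list; return 0 on success. */
static int search_list(const char *list, const char *name, exists_fn exists,
                       char *out, size_t outlen)
{
    if (list == NULL)
        return -1;
    while (*list != '\0') {
        size_t len = strcspn(list, ":;");
        if (len > 0) {
            int n = snprintf(out, outlen, "%.*s/%s", (int)len, list, name);
            if (n > 0 && (size_t)n < outlen && exists(out))
                return 0;
        }
        list += len;
        if (*list != '\0')
            list++;             /* skip the separator */
    }
    return -1;
}

/*
 * Model of the search order in the text. `run_path` stands for the
 * LD_RUN_PATH value recorded at link time, `library_path` for the current
 * LD_LIBRARY_PATH, and `privileged` for a set-ID or privileged process.
 */
int locate_shared_object(const char *name, const char *run_path,
                         const char *library_path, int privileged,
                         exists_fn exists, char *out, size_t outlen)
{
    if (search_list(run_path, name, exists, out, outlen) == 0)
        return 0;
    if (!privileged &&
        search_list(library_path, name, exists, out, outlen) == 0)
        return 0;
    return search_list("/usr/lib", name, exists, out, outlen);
}

/* Demo predicate mirroring the example: libfoo.so now lives in /home/sharedobs. */
int only_in_sharedobs(const char *path)
{
    return strcmp(path, "/home/sharedobs/libfoo.so") == 0;
}
```

Called with the run path /home/mylibs1:/home/mylibs2 and the library path /home/sharedobs, the sketch finds /home/sharedobs/libfoo.so for an ordinary process and fails for a privileged one, since the library path is skipped in that case.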

Implementation

The following lists the basic implementation of the static and dynamic linking mechanisms:

• When you use an archive library function, a copy of the object file that contains the function is incorporated in your executable at link time. External references to the function are assigned virtual addresses when the executable is created.

• When you use a shared library function, the entire contents of the library are mapped into the virtual address space of your process at run time. External references to the function are assigned virtual addresses when you execute the program. The link editor records in your executable only the name of the shared object and a small amount of bookkeeping information for use by the dynamic linker at run time.

There are one or two cases in which you might not want to use dynamic linking. Because shared object code is not copied into your executable object file at link time, a dynamically linked executable normally will use less disk space than a statically linked executable. If your program calls only a few small library functions, however, the bookkeeping information to be used by the dynamic linker may take up more space in your executable than the code for those functions. You can use the size command to determine the difference. See size(1) for more information.

In a similar way, using a shared object may occasionally add to the memory requirements of a process. Although a shared object's text is shared by all processes that use it, its writable data typically is not. See ``Guidelines for building shared objects'' for details. Every process that uses a shared object usually gets a private copy of its entire data segment, regardless of how much of the data is needed. If an application uses only a small portion of a shared library's text and data, executing the application might require more memory with a shared object than without one. For example, it would waste memory to use the standard C shared object library to access only strcmp. Although sharing strcmp saves space on your disk and memory on the system, the memory cost to your process of having a private copy of the C library's data segment would make the archive version of strcmp the more appropriate choice.

Now let's consider dynamic linking in a bit more detail. First, each process that uses a shared object references a single copy of its code in memory. That means that when other users on your system call a function in a shared object library, the entire contents of that library are mapped into the virtual address space of their processes as well. If they have called the same function as you, external references to the function in their programs will, in all likelihood, be assigned different virtual addresses. Because the function may be loaded at a different virtual address for each process that uses it, the system cannot calculate absolute addresses in memory until run time.

Second, the memory management scheme underlying dynamic linking shares memory among processes at the granularity of a page. Memory pages can be shared as long as they are not modified at run time. If a process writes to a shared page while relocating a reference to a shared object, it gets a private copy of that page and loses the benefits of code sharing without affecting other users of the page.

Third, to create programs that require the least possible amount of page modification at run time, the compiler generates position-independent code under the -K PIC option. Whereas executable code normally must be tied to a fixed address in memory, position-independent code can be loaded anywhere in the address space of a process. Because the code is not tied to specific addresses, it will execute correctly -- without page modification -- at a different address in each process that uses it. As we have indicated, you should specify -K PIC when you create a shared object:

$ cc -K PIC -G -o libfoo.so function1.c function2.c \
    function3.c

Relocatable references in your object code will be moved from its text segment to tables in the data segment. See ``Object files'' for details.

Handling multiply defined symbols

Multiply defined symbols -- except for different-sized initialized data objects -- are not reported as errors under dynamic linking. The link editor will not report an error for multiple definitions of a function or a same-sized data object when each such definition resides within a different shared object or within a dynamically linked executable and different shared objects. The dynamic linker will use the definition in whichever object occurs first on the cc command line. You can, however, specify -Bsymbolic when you create a shared object

$ cc -K PIC -G -Bsymbolic -o libfoo.so function1.c \
    function2.c function3.c

to ensure that the dynamic linker will use the shared object's definition of one of its own symbols, rather than a definition of the same symbol in an executable or another library.

In contrast, multiply defined symbols are generally reported as errors under static linking. We say "generally" because definitions of so-called weak symbols can be hidden from the link editor by a definition of a global symbol. If a defined global symbol exists, the appearance of a weak symbol with the same name will not cause an error.

To illustrate this, let's look at our own implementation of the standard C library. This library provides services that users are allowed to redefine and replace. At the same time, however, ANSI C defines standard services that must be present on the system and cannot be replaced in a strictly conforming program. fread, for example, is an ANSI C library function; the system function read is not. So a conforming program may redefine read and still use fread in a predictable way.

The problem with this is that read underlies the fread implementation in the standard C library. A program that redefines read could ``confuse'' the fread implementation. To guard against this, ANSI C states that an implementation cannot use a name that is not reserved to it. Therefore _read -- note the leading underscore -- is used to implement fread in the standard C library.

Now suppose that a program you have written calls read. If your program is going to work, the definition for read has to exist in the C library. It is identical to the definition for _read and is contained in the same object file.

Suppose further that another program you have written redefines read, and that this same program calls fread. Because you get our definitions of both _read and read when you use fread, we would expect the link editor to report the multiply defined symbol read as an error, and fail to create an executable program. To prevent that, we use the #pragma directive in our source code for the library as follows:

#pragma weak read = _read

Because our read is defined as a weak symbol, your own definition of read will override the definition in the standard C library. You can use the #pragma directive in the same way in your own library code.
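The same pattern can be sketched for a library of your own. The names impl_answer, answer, and library_uses_answer are hypothetical, and the sketch assumes a compiler that accepts the alias form of #pragma weak for a function defined in the same translation unit (GCC and Clang on ELF systems do; other compilers may differ).

```c
/* Library-internal implementation; plays the role of _read. */
int impl_answer(void)
{
    return 42;
}

/* Public name that applications may call or redefine; plays the role of read. */
int answer(void);

/* Make `answer` a weak alias for `impl_answer`. */
#pragma weak answer = impl_answer

/*
 * Library code calls the internal name, so it keeps working even if an
 * application supplies its own strong definition of `answer`.
 * Plays the role of fread.
 */
int library_uses_answer(void)
{
    return impl_answer() + 1;
}
```

If no other object file defines answer, a call to answer() resolves to impl_answer. If an application does define answer, the link editor should pick the application's global definition without reporting a multiply defined symbol, and library_uses_answer is unaffected.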

There's a second use for weak symbols that you ought to know about:

#pragma weak read

tells the link editor not to complain if it does not find a definition for the weak symbol read. References to the symbol use the symbol value if defined, 0 otherwise. The link editor does not extract archive members to resolve undefined weak symbols. The mechanism is intended to be used primarily with functions. Although it will work for most data objects, it should not be used with uninitialized global data (``common'' symbols) or with shared library data objects that are exported to executables.
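Because an undefined weak symbol evaluates to 0, a program can test for an optional function before calling it. The following is a minimal sketch; optional_hook and run_hook_or_default are hypothetical names, and the behavior shown assumes an ELF toolchain that supports #pragma weak.

```c
/* Hypothetical optional function; no definition is required at link time. */
extern int optional_hook(int value);
#pragma weak optional_hook

/* Call the hook if one was linked in; otherwise return the value unchanged. */
int run_hook_or_default(int value)
{
    if (optional_hook != 0)     /* address is 0 when no definition was linked */
        return optional_hook(value);
    return value;
}
```

When no object file or library defines optional_hook, the program still links and run_hook_or_default simply returns its argument.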

Checking for run−time compatibility

Suppose you have been supplied with an updated version of a shared object. You have already compiled your program with the previous version; the link editor has checked it for undefined symbols, found none, and created an executable. Therefore, you should not have to link your program again. The dynamic linker will simply use the definitions in the new version of the shared object to satisfy unresolved external references in the executable.


Suppose further that this is a database update program that takes several days to run. You want to be sure that your program does not fail in a critical section because a symbol that was defined by the previous version of the shared object is no longer defined by the new version. You want the information that the link editor gives you -- that your executable is compatible with the shared library -- without having to link edit it again.

There are two ways you can check for run-time compatibility. The command ldd (``list dynamic dependencies'') directs the dynamic linker to print the path names of the shared objects on which your program depends:

$ ldd prog

When you specify the -d option to ldd, the dynamic linker prints a diagnostic message for each unresolved data reference it would encounter if prog were executed. When you specify the -r option, it prints a diagnostic message for each unresolved data or function reference it would encounter if prog were executed.

You can do the same thing when you execute your program. Whereas the dynamic linker resolves data references immediately at run time, it normally delays resolving function references until a function is invoked for the first time. Normally, then, the lack of a definition for a function will not be apparent until the function is invoked. By setting the environment variable LD_BIND_NOW

$ LD_BIND_NOW=1 export LD_BIND_NOW

before you execute your program, you direct the dynamic linker to resolve all references immediately. In that way, you can learn before execution of main begins that the functions invoked by your process actually are defined.

Dynamic linking programming interface

You can use the programming interface to the dynamic linking mechanism to attach a shared object to the address space of your process during execution, look up the address of a function in the library, call that function, and then detach the library when it is no longer needed. See the following manual pages for more information:

• dlopen(3C)
• dlclose(3C)
• dlsym(3C)
• dlerror(3C)
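The attach, look up, call, and detach sequence might look like the following C sketch. The helper name call_in_shared_object and its int (*)(int) function type are assumptions made for illustration. The dlopen, dlsym, dlerror, and dlclose calls are the standard interface; on some systems the program must also be linked with -ldl.

```c
#include <dlfcn.h>
#include <stdio.h>

/*
 * Load `lib_path`, look up `symbol` as a function of type int (*)(int),
 * call it with `arg`, store the result in *result, and unload the library.
 * Returns 0 on success, -1 on failure (after reporting the dlerror() text).
 */
int call_in_shared_object(const char *lib_path, const char *symbol,
                          int arg, int *result)
{
    int (*fn)(int) = NULL;
    const char *err;
    void *handle = dlopen(lib_path, RTLD_LAZY);

    if (handle == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return -1;
    }

    dlerror();                                /* clear any stale error */
    *(void **)(&fn) = dlsym(handle, symbol);  /* object-to-function pointer idiom */
    err = dlerror();
    if (err != NULL || fn == NULL) {
        fprintf(stderr, "dlsym: %s\n", err ? err : "symbol has a null value");
        dlclose(handle);
        return -1;
    }

    *result = fn(arg);
    dlclose(handle);                          /* detach when no longer needed */
    return 0;
}
```

Passing RTLD_NOW instead of RTLD_LAZY asks the dynamic linker to resolve all references when the object is loaded, which is the per-call counterpart of setting LD_BIND_NOW.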

Guidelines for building shared objects

This topic gives coding guidelines and maintenance tips for shared library development. Before getting down to specifics, we should emphasize that if you plan to develop a commercial shared library, you ought to consider providing a compatible archive as well. As noted previously, some users may not find a shared library appropriate for their applications. Others may want their applications to run on UNIX® operating system releases without shared object support. Shared object code is completely compatible with archive library code. You can use the same source files to build archive and shared object versions of a library. Some of the reasons you might want to use shared objects are:

• Because library code is not copied into the executables that use it, they require less disk space.
• Because library code is shared at run time, the dynamic memory needs of systems are reduced.
• Because symbol resolution is put off until run time, shared objects can be updated without having to relink applications that depend on them.
• As long as its path name is not hard-coded in an executable, a shared object can be moved to a different directory without breaking an application.

To enhance shared library performance, you should:

1. Minimize the library's data segment.
2. Minimize paging activity.

Minimize the library's data segment.

As noted, only a shared object's text segment is shared by all processes that use it; its data segment typically is not. Every process that uses a shared object usually gets a private memory copy of its entire data segment, regardless of how much data is needed. You can cut down the size of the data segment in several ways:

• Try to use automatic (stack) variables. Don't use permanent storage if automatic variables will work.

• Use functional interfaces rather than global variables. Generally speaking, that will make library interfaces and code easier to maintain. Moreover, defining functional interfaces often eliminates global variables entirely, which in turn eliminates global copy data. The ANSI C function strerror(3C) illustrates these points.

In previous implementations, system error messages were made available to applications only through two global variables:

extern int sys_nerr;
extern char *sys_errlist[];

sys_errlist[X] gives a character string for the error X, if X is a nonnegative value less than sys_nerr. Now if the current list of messages were made available to applications only through a lookup table in an archive library, applications that used the table obviously would not be able to access new messages as they were added to the system unless they were relinked with the library. Errors might occur for which these applications could not produce useful diagnostics. Something similar happens when you use a global lookup table in a shared library:

1. The compilation system sets aside memory for the table in the address space of each executable that uses it, even though it does not know yet where the table will be loaded.
2. After the table is loaded, the dynamic linker copies it into the space that has been set aside.
3. Each process that uses the table gets a private copy of the library's data segment, including the table, and an additional copy of the table in its own data segment.
4. Each process pays a performance penalty for the overhead of copying the table at run time.
5. Because the space for the table is allocated when the executable is built, the application will not have enough room to hold any new messages you might want to add in the future.

A functional interface overcomes these difficulties.

strerror might be implemented as follows:

static const char *msg[] = {
    "Error 0",
    "Not owner",
    "No such file or directory",
    ...
};

char *
strerror(int err)
{
    /* Reject values outside the table; only the library knows its size. */
    if (err < 0 || (size_t)err >= sizeof(msg)/sizeof(msg[0]))
        return "Unknown error";
    return (char *)msg[err];
}

The message array is static, so no application space is allocated to hold a separate copy. Because no application copy exists, the dynamic linker does not waste time moving the table. New messages can be added, because only the library knows how many messages exist. Finally, note the use of the type qualifier const to identify data as read-only. Whereas writable data is stored in a shared object's data segment, read-only data is stored in its text segment. For more on const, see ``C language compilers''.

In a similar way, you should try to allocate buffers dynamically -- at run time -- instead of defining them at link time. That will save memory because only the processes that need the buffers will get them. It will also allow the size of the buffers to change from one release of the library to the next without affecting compatibility as shown below:

#include <stdlib.h>

char *buffer()
{
    static char *buf = 0;

    /* Allocate the buffer on first use only. */
    if (buf == 0) {
        if ((buf = malloc(BUFSIZE)) == 0)
            return 0;
    }
    ...
    return buf;
}

• Exclude functions that use large amounts of global data if you cannot rewrite them in the ways described previously. If an infrequently used routine defines large amounts of static data, it probably does not belong in a shared library.

• Make the library self-contained. If a shared object imports definitions from another shared object, each process that uses it will get a private copy not only of its data segment, but of the data segment of the shared object from which the definitions were imported. In cases of conflict, this guideline should probably take precedence over the preceding one.

Minimize paging activity.

Although processes that use shared libraries will not write to shared pages, they still may incur page faults. To the extent they do, their performance will degrade. You can minimize paging activity in the following ways:

• Organize to improve locality of reference.

1. Exclude infrequently used routines on which the library itself does not depend. Traditional a.out files contain all the code they need at run time. So if a process calls a function, it may already be in memory because of its proximity to other text in the process. If the function is in a shared library, however, the surrounding library code may be unrelated to the calling process. Only rarely, for example, will any single executable use everything in the shared C library. If a shared library has unrelated functions, and if unrelated processes make random calls to those functions, locality of reference may be decreased, leading to more paging activity. The point is that functions used by only a few a.out files do not save much disk space by being in a shared library, and can degrade performance.

2. Try to improve locality of reference by grouping dynamically related functions. If every call to funcA generates calls to funcB and funcC, try to put them in the same page. See ``Analyzing run-time behavior'' for a description of the fprof and fur tools that you can use to tune locality of reference.


• Align for paging. Try to arrange the shared library's object files so that frequently used functions do not unnecessarily cross page boundaries. First, determine where the page boundaries fall. You can use the nm(1) command to determine how symbol values relate to page boundaries. After grouping related functions, break them up into page-sized chunks. Although some object files and functions are larger than a page, many are not. Then use the less frequently called functions as glue between the chunks. Because the glue between pages is referenced less frequently than the page contents, the probability of a page fault is decreased. You can put frequently used, unrelated functions together because they will probably be called randomly enough to keep the pages in memory.
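Relating symbol values to page boundaries is simple arithmetic, sketched below in C. The helper names are hypothetical. The sketch assumes that you have taken each function's starting value and size from nm(1) output and that you know the system page size (4096 bytes is used here only as an example).

```c
/*
 * Return 1 if a function occupying [addr, addr + size) spans more than
 * one page, 0 otherwise. `page_size` must be nonzero.
 */
int crosses_page_boundary(unsigned long addr, unsigned long size,
                          unsigned long page_size)
{
    if (size == 0)
        return 0;
    return (addr / page_size) != ((addr + size - 1) / page_size);
}

/* Number of bytes from `addr` to the end of the page that contains it. */
unsigned long bytes_left_in_page(unsigned long addr, unsigned long page_size)
{
    return page_size - (addr % page_size);
}
```

For instance, with a 4096-byte page, a 32-byte function at value 0x0ff0 has only 16 bytes left in its page and therefore straddles a boundary, whereas the same function placed at 0x1000 does not.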

Library maintenance

We have already seen how allocating buffers dynamically can ease the job of library maintenance. As a general rule, you want to be sure that updated versions of a shared object are compatible with its previous versions so that users will not have to recompile their applications. You should avoid changing the names of library symbols from one release to the next.

However, there may be instances in which you need to release a library version that is incompatible with its predecessor. On the one hand, you will want to maintain the older version for dynamically linked executables that depend on it. On the other hand, you will want newly created executables to be linked with the updated version. Moreover, you will probably want both versions to be stored in the same directory. In this situation, you could give the new release a different name, rewrite your documentation, and so forth. A better alternative would be to plan for the contingency in the first instance by using the following sequence of commands when you create the original version of the shared object:

$ cc -K PIC -G -h libfoo.1 -o libfoo.1 function1.c \
    function2.c function3.c
$ ln libfoo.1 libfoo.so

In the first command -h stores the name given to it, libfoo.1, in the shared object itself. You then use the UNIX® operating system command ln(1) to create a link between the name libfoo.1 and the name libfoo.so. libfoo.so is the name the link editor will look for when users of your library specify

$ cc -Ldir file1.c file2.c file3.c -lfoo

However, the link editor will record in the user's executable the name you gave to -h, libfoo.1, rather than the name libfoo.so. That means that when you release a subsequent, incompatible version of the library, libfoo.2, executables that depend on libfoo.1 will continue to be linked with it at run time.

As shown earlier, the dynamic linker uses the shared object name that is stored in the executable to satisfy unresolved external references at run time.

You use the same sequence of commands when you create libfoo.2:

$ cc -K PIC -G -h libfoo.2 -o libfoo.2 function1.c \
    function2.c function4.c
$ ln libfoo.2 libfoo.so

When users specify

$ cc -Ldir file1.c file2.c file3.c -lfoo

the name libfoo.2 will be stored in their executables, and their programs will be linked with the new library version at run time.

C++ and dynamic linking

Shared objects may be created from C++ source as well as from C source. The foregoing discussion of creating and using C shared objects applies to C++ shared objects also, with two limitations:

• Using a shared object does not avoid the problem that any code using a class defined in the external interface to the library must be recompiled if the class definition changes. Executables and other libraries that depend on the class definition must be recompiled even if the change is in the private part of the class, since that will affect the offsets of the public and protected members and the offsets of any members of classes derived from that class.

• The programming interface to the dynamic linking mechanism is not available for any function with C++ linkage. If you intend for your library to be loaded with the dlopen(3C) interface, then the external interfaces must have C linkage, which precludes the use of C++ features. C++ linkage may still be used for symbols internal to the library.

Note also that when you have file scope objects of the same name and type in both an a.out and a shared library, the space for the objects is shared (so that there's really only one object), but the constructor for that object is called twice during startup, once from the a.out, once from the .so. (And similarly for the destructor.)

An example would be this:

so.h:

#include <stdio.h>

class A {
public:
    A(const char* s) { printf("ctor %s, object %x\n", s, this); }
};

so1.C:

#include "so.h"

A a("from so1.C");

so2.C:

#include "so.h"

A a("from so2.C");

int main() { }

$ CC -G -KPIC -o libx.so so1.C
$ CC so2.C libx.so
$ a.out
ctor from so1.C, object 8049840
ctor from so2.C, object 8049840

Since this kind of duplicate construction is probably not what was intended, such use of same-named file scope objects is best avoided. (The -Bsymbolic option can be used to create two separate spaces for the objects, but this probably is not what was intended either.)

Building a C++ shared object

The options to the CC command to create and locate shared objects are the same as the options to the cc command. The following command, for example, will create a shared object named libfoo.so from two C++ source files:

$ CC -G -KPIC -o libfoo.so function1.C function2.C

To avoid dangling references, you should ensure that all templates referenced within a library are instantiated within the library.

Static constructors and destructors

With a shared object, unlike with an archive library, the linker cannot determine what subset of functions and objects within a library are actually needed by the program. Therefore, all the static constructors (destructors) in the shared object will be executed when the shared object is loaded (unloaded). The static constructors will be called in the reverse of the order the object files appear on the command line used to create the object. For example, given the following command line,

$ CC -G function1.o function2.o function3.o

the static constructors will be called in the order function3.o function2.o function1.o. When the object is unloaded, the static destructors will be called in the reverse order of the calls to the static constructors, that is, in the order the object files appear on the command line.

When multiple C++ shared objects are loaded, if shared object a depends on shared object b, the static constructors in shared object b will be executed first. There is no guarantee of any ordering of calls to static constructors between unrelated shared objects. When the program exits, the static destructors are called in the reverse order of the calls to the static constructors. If dlclose(3C) is used to explicitly close a shared object, static destructors may be executed out of sequence with respect to other static destructors in shared objects that are still open. (The ISO C++ standard considers that dynamic libraries are outside the scope of the standard, which means that behavior such as this does not constitute a violation of the standard.)
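The rule that destructors run in the reverse order of construction can be observed within a single source file, where ISO C++ fixes the construction order as the order of definition. The following is a minimal sketch only: the names event_log and Tracer are hypothetical, and it does not reproduce the link-line ordering between object files described above, which is specific to this compilation system.

```cpp
#include <assert.h>
#include <stdio.h>
#include <string>
#include <vector>

// Hypothetical helper: the log is created on first use, so it exists
// before any Tracer constructor records into it.
std::vector<std::string>& event_log()
{
    static std::vector<std::string> log;
    return log;
}

// Hypothetical class that reports its own construction and destruction.
class Tracer {
public:
    explicit Tracer(const char* name) : name_(name)
    {
        assert(name != 0);
        event_log().push_back(std::string("ctor ") + name_);
    }
    ~Tracer()
    {
        // Runs after main() returns, in reverse order of construction.
        fprintf(stderr, "dtor %s\n", name_);
    }
private:
    const char* name_;
};

// File scope objects in one source file are constructed in the order
// in which they are defined.
Tracer first("first");
Tracer second("second");
```

By the time main() starts, event_log() holds the two construction records in definition order; the destructor messages appear on standard error at exit, second before first.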

Diagnostics from the dynamic linker

The dynamic linker does not demangle any names in the diagnostics it produces. (See ``C++ external function name encoding''.)

If a diagnostic appears somewhat cryptic:

$ a.out
dynamic linker: a.out: symbol not found: g__Fv
Killed

capture the standard error output and demangle it with the c++filt(1C++) filter program:

$ a.out 2>/var/tmp/a.out.err
Killed
$ c++filt </var/tmp/a.out.err
dynamic linker: a.out: symbol not found: g(void)

C++ external function name encoding

In C++, you can have multiple functions which have the same name but differ in the number or types of the formal parameters. The C++ compiler will call the correct function based on the best match for the number and types of the actual arguments. Functions with the same name cannot differ in return data type alone. To implement support for overloading function names, the C++ compiler encodes the parameter data types into the name of the function as it appears in the generated object code.

Use the nm(1) command to look at the symbol names in the object file generated for the C++ source program overload.C:

    extern void foo(void);
    extern void foo(int);
    extern void foo(const char *);

    void dummy ()
    {
        int i = 1;
        const char * pchar = "string.....";
        foo();
        foo(i);
        foo(pchar);
    }

CC -c overload.C
nm overload.o

The nm output will show the following four entries for the functions either defined or declared in the code.

Symbols from overload.o:

[Index] Value Size Type Bind Other Shndx Name

[6] | 0| 0|NOTY |GLOB |0 |UNDEF |foo__FPCc
[7] | 0| 0|NOTY |GLOB |0 |UNDEF |foo__Fi
[8] | 0| 0|NOTY |GLOB |0 |UNDEF |foo__Fv
[9] | 0| 51|FUNC |GLOB |0 |1 |dummy__Fv

By using the -C option of nm, the names will be displayed in a format that resembles the function declaration. Note the absence of a function return data type.

nm -C overload.o

Symbols from overload.o:

[Index] Value Size Type Bind Other Shndx Name

[6] | 0| 0|NOTY |GLOB |0 |UNDEF |foo(const char*)
[7] | 0| 0|NOTY |GLOB |0 |UNDEF |foo(int)
[8] | 0| 0|NOTY |GLOB |0 |UNDEF |foo(void)
[9] | 0| 51|FUNC |GLOB |0 |1 |dummy(void)

The same output will be produced by running the output of nm through the c++ name filter:

nm overload.o | c++filt
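The overload selection that this name encoding supports can be sketched as follows. This is an illustration only: unlike the declarations in overload.C, these hypothetical versions of foo are defined here and return a string naming the overload that was chosen, so that the selection can be checked.

```cpp
#include <assert.h>
#include <string.h>

// Hypothetical definitions modelled on the foo() declarations in
// overload.C. Each returns a tag naming the overload that was selected.
const char* foo(void)
{
    return "foo(void)";
}

const char* foo(int)
{
    return "foo(int)";
}

const char* foo(const char* s)
{
    assert(s != 0);
    return "foo(const char*)";
}
```

A call such as foo(1) selects the int version and foo("string") selects the const char* version. Each of the three functions gets its own encoded name in the object file, so the link editor sees three distinct symbols.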

C++ class names are encoded in the function name when used as a parameter type or when the function is a class member function. Look at the nm output for the object file produced from the following code.

string.C:

    class String {
    public:
        String ();
        String (const char *);
        String (String &);
        ~String();
        String operator=(String &);
    };

    String str1;
    String str2("abcdef...");

    void foo(String & refStr)
    {
        String str3;

        str3 = refStr;
    }

CC -c string.C
nm string.o

This example contains usage of two constructors, a destructor, and an assignment operator member function for the String class. The nm output also shows the presence of a "static initialization" function which will be called (automatically) to initialize the global objects str1 and str2. There is a corresponding "termination" function to call the necessary class destructors for objects at the file scope.

Note that externally visible variables are not encoded and appear in the object file exactly like a C global data object.

Symbols from string.o:

[Index] Value Size Type Bind Other Shndx Name

[8] | 0| 0|NOTY |GLOB |0 |UNDEF |__as__6StringFR6String
[9] | 84| 69|FUNC |GLOB |0 |1 |foo__FR6String
[10] | 0| 0|NOTY |GLOB |0 |UNDEF |__dt__6StringFv
[11] | 0| 0|NOTY |GLOB |0 |UNDEF |__ct__6StringFPCc
[12] | 0| 0|NOTY |GLOB |0 |UNDEF |__ct__6StringFv
[13] | 40| 41|FUNC |GLOB |0 |1 |__std__string_C_Sun_Apr_24
[14] | 0| 40|FUNC |GLOB |0 |1 |__sti__string_C_Sun_Apr_24
[15] | 1| 1|OBJT |GLOB |0 |2 |str2
[16] | 0| 1|OBJT |GLOB |0 |2 |str1

Once again, with the -C option the nm output is easier to understand:

nm -C string.o

Symbols from string.o:

[Index] Value Size Type Bind Other Shndx Name

[8] | 0| 0|NOTY |GLOB |0 |UNDEF |String::operator=(String&)
[9] | 84| 69|FUNC |GLOB |0 |1 |foo(String&)
[10] | 0| 0|NOTY |GLOB |0 |UNDEF |String::~String(void)
[11] | 0| 0|NOTY |GLOB |0 |UNDEF |String::String(const char*)
[12] | 0| 0|NOTY |GLOB |0 |UNDEF |String::String(void)
[13] | 40| 41|FUNC |GLOB |0 |1 |string_C_Sun_Apr_24_:__std
[14] | 0| 40|FUNC |GLOB |0 |1 |string_C_Sun_Apr_24_:__sti
[15] | 1| 1|OBJT |GLOB |0 |2 |str2
[16] | 0| 1|OBJT |GLOB |0 |2 |str1

Accessing C functions from C++

Because the C++ compiler encodes the parameter data type into the function name (see ``C++ external function name encoding''), the same function declaration in both a C and C++ program will refer to differently named functions in the object files. The following C and C++ modules cannot be linked together as they are currently written.

callee.c:

    #include <stdio.h>

    void subr (void)
    {
        printf("C subroutine\n");
    }

caller.C:

    extern void subr (void);

    int main ()
    {
        subr();
        return 0;
    }

cc -c callee.c
CC callee.o caller.C

Undefined                  first referenced
 symbol                        in file
subr(void)                     caller.o
ld: a.out: fatal error: Symbol referencing errors. No output written to a.out

The linker has reported an undefined function subr(void). In reality the C compiler has generated a function in callee.o by the name subr. The C++ compiler has generated a function reference to an external function which it expects to be named subr__Fv. To tell the C++ compiler that function subr() is an external C function and that all references to subr() should not reference an encoded function name, change the external function declaration in the C++ program to:

extern "C" void subr(void);

The extern "C" notation can be stated on each external C function that a C++ program will reference. This can be quite cumbersome. Alternatively the external C function declarations can be grouped together and enclosed in this:

extern "C" { ..... }

The commonly used header files in /usr/include have been configured such that when included in a C++ compilation, the C function declarations are nested inside an extern "C" block. A C++ program, therefore, should have access to all the publicly available functions in the various libraries that are part of the release.
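A header that is meant to be included by both C and C++ source files usually wraps its declarations in an extern "C" block that only the C++ compiler sees. The sketch below shows that widely used idiom; it is not taken from the system headers themselves, and the function name subr_length and the idea of a shared header file are assumptions made for illustration. __cplusplus is a macro predefined by C++ compilers.

```cpp
#include <assert.h>
#include <string.h>

/* Hypothetical header contents; in practice these lines would live in
   a header file included by both C and C++ source files. */
#ifdef __cplusplus
extern "C" {
#endif

size_t subr_length(const char *s);   /* hypothetical C function */

#ifdef __cplusplus
}
#endif

/* Definition. Because of the declaration above it has C linkage, so
   its name in the object file is not encoded with parameter types. */
size_t subr_length(const char *s)
{
    assert(s != 0);
    return strlen(s);
}
```

When the same header is compiled as C, the two extern "C" lines disappear and only the plain declaration remains, so one header serves both languages.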

Quick-reference guide

The following are general link editing conventions that you must be familiar with:

1. By convention, shared objects, or dynamically linked libraries, are designated by the prefix lib and the suffix .so; archives, or statically linked libraries, are designated by the prefix lib and the suffix .a. libc.so, then, is the shared object version of the standard C library; libc.a is the archive version.

2. These conventions are recognized, in turn, by the -l option to the cc command. -lx directs the link editor to search the shared object libx.so or the archive library libx.a. The cc command automatically passes -lc to the link editor. The CC command automatically passes both -lc and -lC to the link editor. Therefore, the compilation system arranges for the standard libraries to be linked with your program transparently.

3. By default, the link editor chooses the shared object implementation of a library, libx.so, in preference to the archive library implementation, libx.a, in the same directory.

4. By default, the link editor searches for libraries in the standard places on your system, /usr/ccs/lib and /usr/lib, in that order. The standard libraries supplied by the compilation system normally are kept in /usr/ccs/lib.

In this arrangement, then, C programs are dynamically linked with libc.so automatically:

$ cc file1.c file2.c file3.c

To link your program statically with libc.a, turn off the dynamic linking default with the -dn option:

$ cc -dn file1.c file2.c file3.c

Specify the -l option explicitly to link your program with any other library. If the library is in the standard place, the command

$ cc file1.c file2.c file3.c -lx

will direct the link editor to search for libx.so, then libx.a in the standard place. Note that, as a rule, it's best to place -l at the end of the command line.

If the library is not in the standard place, specify the path of the directory in which it is stored with the -L option

$ cc -Ldir file1.c file2.c file3.c -lx

or the environment variable LD_LIBRARY_PATH

$ LD_LIBRARY_PATH=dir export LD_LIBRARY_PATH
$ cc file1.c file2.c file3.c -lx

If the library is a shared object and is not in the standard place, you must also specify the path of the directory in which it is stored with either the environment variable LD_RUN_PATH at link time, or the environment variable LD_LIBRARY_PATH at run time:

$ LD_RUN_PATH=dir export LD_RUN_PATH
$ LD_LIBRARY_PATH=dir export LD_LIBRARY_PATH

It's best to use an absolute path when you set these environment variables. Note that LD_LIBRARY_PATH is read both at link time and at run time.

To direct the link editor to search libx.a where libx.so exists in the same directory, turn off the dynamic linking default with the -dn option:

$ cc -dn -Ldir file1.c file2.c file3.c -lx

That command will direct the link editor to search libc.a as well as libx.a. To link your program statically with libx.a and dynamically with libc.so, use the -Bstatic and -Bdynamic options to turn dynamic linking off and on:

$ cc -Ldir file1.c file2.c file3.c -Bstatic -lx -Bdynamic

Files, including libraries, are searched for definitions in the order they are listed on the cc command line. The standard C library is always searched last.

Libraries and header files

The standard libraries supplied by the C compilation system contain functions that you can use in your program to perform input/output, string handling, and other high-level operations that are not explicitly provided by the C language. Header files contain definitions and declarations that your program will need if it calls a library function. They also contain function-like macros that you can use in your program as you would a function.

``Link editing'' showed you how to link your program with these standard libraries and how to include a header file. This topic illustrates the use of header files and library functions in your program, and discusses the C++ precompiled header facility. It also describes the contents and location of some of the more important standard libraries, along with a brief discussion of standard I/O.

Header files

Header files serve as the interface between your program and the libraries supplied by the C compilation system. Because the functions that perform standard I/O often use the same definitions and declarations, the system supplies a common interface to the functions in the header file stdio.h. If you have definitions or declarations that you want to make available to several source files, you can create a header file with any editor, store it in a convenient directory, and include it in your program.

Header files traditionally are designated by the suffix .h, and are brought into a program at compile time. The preprocessor component of the compiler interprets the #include statement in your program as a directive. The two most commonly used directives are #include and #define. The #include directive is used to call in and process the contents of the named file. The #define directive is used to define the replacement token string for an identifier. For example,

#define NULL 0

defines the macro NULL to have the replacement token sequence 0. See ``C language compilers'' for the complete list of preprocessing directives. The most commonly used .h files are listed in ``Header Files'' to illustrate the range of tasks you can perform with header files and library functions. When you use a library function in your program, the manual page will tell you which header file, if any, needs to be included. If a header file is mentioned, it should be included before you use any of the associated functions or declarations in your program. Put the #include right at the top of a source file.

assert.h    assertion checking
ctype.h     character handling
errno.h     error conditions
float.h     floating point limits
limits.h    other data type limits
locale.h    program's locale
math.h      mathematics
setjmp.h    nonlocal jumps
signal.h    signal handling
stdarg.h    variable arguments
stddef.h    common definitions
stdio.h     standard input/output
stdlib.h    general utilities
string.h    string handling
time.h      date and time
unistd.h    system calls

Header Files
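As a small illustration of including a header before using what it declares, the following sketch uses isdigit from ctype.h, which an implementation may provide as a function, a function-like macro, or both. The helper name count_digits is hypothetical and is not part of any library.

```cpp
#include <assert.h>
#include <ctype.h>

// Hypothetical helper: count the decimal digits in a string.
int count_digits(const char *s)
{
    assert(s != 0);
    int n = 0;
    for (; *s != '\0'; ++s) {
        // Convert to unsigned char first: passing a negative char
        // value to isdigit() is undefined.
        if (isdigit((unsigned char)*s))
            ++n;
    }
    return n;
}
```

The call reads the same whether isdigit is a macro or a function, which is the sense in which a function-like macro is used as you would a function.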

C++ precompiled headers

It is often desirable to avoid recompiling a set of header files, especially when they introduce many lines of code and the primary source files that #include them are relatively small. The C++ compiler provides a mechanism for, in effect, taking a snapshot of the state of the compilation at a particular point and writing it to a disk file before completing the compilation; then, when recompiling the same source file or compiling another file with the same set of header files, it can recognize the ``snapshot point,'' verify that the corresponding precompiled header (``PCH'') file is reusable, and read it back in. Under the right circumstances, this can produce a dramatic improvement in compilation time; the trade-off is that PCH files can take a lot of disk space.

Automatic precompiled header processing

When -Rauto appears on the CC command line, automatic precompiled header processing is enabled. This means the front end will automatically look for a qualifying precompiled header file to read in and/or will create one for use on a subsequent compilation.

The PCH file will contain a snapshot of all the code preceding the ``header stop'' point. The header stop point is typically the first token in the primary source file that does not belong to a preprocessing directive, but it can also be specified directly by #pragma hdrstop (see below) if that comes first. For example:

    #include "xxx.h"
    #include "yyy.h"
    int i;

The header stop point is int (the first non-preprocessor token) and the PCH file will contain a snapshot reflecting the inclusion of xxx.h and yyy.h. If the first non-preprocessor token or the #pragma hdrstop appears within a #if block, the header stop point is the outermost enclosing #if. To illustrate, here's a more complicated example:

    #include "xxx.h"
    #ifndef YYY_H
    #define YYY_H 1
    #include "yyy.h"
    #endif
    #if TEST
    int i;
    #endif

Here, the first token that does not belong to a preprocessing directive is again int, but the header stop point is the start of the #if block containing it. The PCH file will reflect the inclusion of xxx.h and conditionally the definition of YYY_H and inclusion of yyy.h; it will not contain the state produced by #if TEST.

A PCH file will be produced only if the header stop point and the code preceding it (mainly, the header files themselves) meet certain requirements:

• The header stop point must appear at file scope; it may not be within an unclosed scope established by a header file. For example, a PCH file will not be created in this case:

    // xxx.h
    class A {

    // xxx.C
    #include "xxx.h"
    int i; };

• The header stop point may not be inside a declaration started within a header file, nor may it be part of a declaration list of a linkage specification. For example, in the following case the header stop point is int, but since it is not the start of a new declaration, no PCH file will be created:

    // yyy.h
    static

    // yyy.C
    #include "yyy.h"
    int i;

• The processing preceding the header stop must not have produced any errors.

  NOTE: Warnings and other diagnostics will not be reproduced when the PCH file is reused.

• No references to predefined macros __DATE__ or __TIME__ may have appeared.
• No use of the #line preprocessing directive may have appeared.
• #pragma no_pch (see below) must not have appeared.
• The code preceding the header stop point must have introduced a sufficient number of declarations to justify the overhead associated with precompiled headers.

When a precompiled header file is produced, it contains, in addition to the snapshot of the compiler state, some information that can be checked to determine under what circumstances it can be reused. This includes:

• The compiler version, including the date and time the compiler was built.
• The current directory (i.e., the directory in which the compilation is occurring).
• The command line options.
• The initial sequence of preprocessing directives from the primary source file, including #include directives.
• The date and time of the header files specified in #include directives.

This information comprises the PCH ``prefix.'' The prefix information of a given source file can be compared to the prefix information of a PCH file to determine whether the latter is applicable to the current compilation.

As an illustration, consider two source files:

    // a.C
    #include "xxx.h"
    ... // Start of code

    // b.C
    #include "xxx.h"
    ... // Start of code

When a.C is compiled with CC -Rauto, a precompiled header file named a.pch is created. Then, when b.C is compiled (or when a.C is recompiled), the prefix section of a.pch is read in for comparison with the current source file. If the command line options are identical, if xxx.h has not been modified, and so forth, then, instead of opening xxx.h and processing it line by line, the front end reads in the rest of a.pch and thereby establishes the state for the rest of the compilation.

It may be that more than one PCH file is applicable to a given compilation. If so, the largest (i.e., the one representing the most preprocessing directives from the primary source file) is used. For instance, consider a primary source file that begins with

    #include "xxx.h"
    #include "yyy.h"
    #include "zzz.h"

If there is one PCH file for xxx.h and a second for xxx.h and yyy.h, the latter will be selected (assuming both are applicable to the current compilation). Moreover, after the PCH file for the first two headers is read in and the third is compiled, a new PCH file for all three headers may be created.

When a precompiled header file is created, it takes the name of the primary source file, with the suffix replaced by .pch. Unless -Rdir is specified (see below), it is created in the directory of the primary source file.

When the -v option to CC is used, a message such as

"test.C": creating precompiled header file "test.pch"

is issued if a precompiled header file is created or used.

In automatic mode (i.e., when -Rauto is used) the front end will deem a precompiled header file obsolete and delete it under the following circumstances:

• if the precompiled header file is based on at least one out-of-date header file but is otherwise applicable for the current compilation; or
• if the precompiled header file has the same base name as the source file being compiled (e.g., xxx.pch and xxx.C) but is not applicable for the current compilation (e.g., because of different command-line options).

This handles some common cases; other PCH file clean-up must be dealt with by other means (e.g., by the user).

Manual precompiled header processing

The CC command-line option -Rcreate=filename specifies that a precompiled header file of the specified name should be created. The CC command-line option -Ruse=filename specifies that the indicated precompiled header file should be used for this compilation; if it is invalid (i.e., if its prefix does not match the prefix for the current primary source file), a warning will be issued and the PCH file will not be used.

When either of these options is used in conjunction with -Rdir=directory-name, the indicated file name (which may be a path name) is tacked on to the directory name, unless the file name is an absolute path name.

The manual-mode PCH options may not be used in conjunction with automatic mode. In other words, these options are not compatible with -Rauto. Nevertheless, most of the description of automatic PCH processing applies to one or the other of these modes: header stop points are determined the same way, PCH file applicability is determined the same way, and so forth.

Other ways to control precompiled headers

There are several ways in which the user can control and/or tune how precompiled headers are created and used.

• #pragma hdrstop may be inserted in the primary source file at a point prior to the first token that does not belong to a preprocessing directive. It enables the user to specify where the set of header files subject to precompilation ends. For example,

    #include "xxx.h"
    #include "yyy.h"
    #pragma hdrstop
    #include "zzz.h"

  Here, the precompiled header file will include processing state for xxx.h and yyy.h but not zzz.h. (This is useful if the user decides that the information added by what follows the #pragma hdrstop does not justify the creation of another PCH file.)

• #pragma no_pch may be used to suppress precompiled header processing for a given source file.
• Command-line option -Rdir=directory-name is used to specify the directory in which to search for and/or create a PCH file.

Performance issues

The relative overhead incurred in writing out and reading back in a precompiled header file is quite small for reasonably large header files.

In general, it doesn't cost much to write a precompiled header file out even if it does not end up being used, and if it is used it almost always produces a significant speedup in compilation. The problem is that the precompiled header files can be quite large (from a minimum of about 250K bytes to several megabytes or more), and so one probably doesn't want many of them sitting around.

Thus, despite the faster recompilations, precompiled header processing is not likely to be justified for an arbitrary set of files with nonuniform initial sequences of preprocessing directives. Rather, the greatest benefit occurs when a number of source files can share the same PCH file. The more sharing, the less disk space is consumed. With sharing, the disadvantage of large precompiled header files can be minimized, without giving up the advantage of a significant speedup in compilation times.

Consequently, to take full advantage of header file precompilation, users should expect to reorder the #include sections of their source files and/or to group #include directives within a commonly used header file.

A common idiom is this:

    #include "common_head.h"
    #pragma hdrstop
    #include ...

where common_head.h pulls in, directly and indirectly, the header files used throughout all the source files that make up the program. The #pragma hdrstop is inserted to get better sharing with fewer PCH files. Another idiom is to partition related header files into groups:

    #include "common_head.h"
    #include "group1.h"
    #pragma hdrstop
    #include ...

If disk space were at a premium, one could decide to make one header file that pulls in all the header files used; then, a single PCH file could be used in building the whole program.

Different environments and different projects will have different needs, but in general, users should be aware that making the best use of the precompiled header support will require some experimentation and probably some minor changes to source code.

How to use library functions

The manual page for each function describes how you should use the function in your program. As an example, we'll look at the strcmp routine, which compares character strings. See string(3C) for more information. Related functions are described there as well, but only the sections relevant to strcmp are shown in ``Excerpt from string(3C) Manual Page''.

NAME
    string: strcat, strdup, strncat, strcmp, strncmp, strcpy, strncpy, strlen, strchr, strrchr, strpbrk, strspn, strcspn, strtok, strtok_r, strstr, strlist - string operations.

SYNOPSIS
    #include <string.h>

    ...

    int strcmp(const char *s1, const char *s2);

    ...

DESCRIPTION
    ...

    strcmp compares its arguments and returns an integer less than, equal to, or greater than 0, based upon whether s1 is lexicographically less than, equal to, or greater than s2.

    ...

Excerpt from string(3C) Manual Page

The DESCRIPTION section tells you what the function or macro does. The SYNOPSIS section contains the critical information about how you use the function or macro in your program.

Note that the first line in the SYNOPSIS is

#include <string.h>

That means that you should include the header file string.h in your program because it contains useful definitions or declarations relating to strcmp. string.h contains the line

    extern int strcmp(const char *, const char *);

that describes the kinds of arguments expected and returned by strcmp. This line is called a function prototype. Function prototypes afford a greater degree of argument type checking than old-style function declarations, so you lessen your chance of using the function incorrectly. By including string.h, you ensure that the compiler checks calls to strcmp against the official interface. You can, of course, examine string.h in the standard place for header files on your system, usually the /usr/include directory.

The next thing in the SYNOPSIS section is the formal declaration of the function. The formal declaration tells you:

• the type of value returned by the function;
• the arguments the function expects to receive when called, if any;
• the argument types.

By way of illustration, let's look at how you might use strcmp in your own code.

``How strcmp() Is Used in a Program'' shows a program fragment that will find the bird of your choice in an array of birds.

    #include <string.h>

    /* birds must be in alphabetical order */
    char *birds[] = { "albatross", "canary", "cardinal", "ostrich", "penguin" };

    /* Return the index of the bird in the array. */
    /* If the bird is not in the array, return -1 */

    int is_bird(const char *string)
    {
        int low, high, midpoint;
        int cmp_value;

        /* use a binary search to find the bird */
        low = 0;
        high = sizeof(birds)/sizeof(char *) - 1;
        while(low <= high) {
            midpoint = (low + high)/2;
            cmp_value = strcmp(string, birds[midpoint]);
            if (cmp_value < 0)
                high = midpoint - 1;
            else if (cmp_value > 0)
                low = midpoint + 1;
            else /* found a match */
                return midpoint;
        }
        return -1;
    }

How strcmp() Is Used in a Program
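The binary search above relies only on the sign of the value returned by strcmp, not on its magnitude, which the manual page leaves unspecified. A minimal sketch of that convention follows; the helper name ordering is hypothetical and is not part of the C library.

```cpp
#include <assert.h>
#include <string.h>

// Hypothetical helper: reduce strcmp()'s result to -1, 0, or 1.
// Only the sign of strcmp()'s return value is meaningful.
int ordering(const char *s1, const char *s2)
{
    assert(s1 != 0 && s2 != 0);
    int cmp_value = strcmp(s1, s2);
    if (cmp_value < 0)
        return -1;      // s1 sorts before s2
    if (cmp_value > 0)
        return 1;       // s1 sorts after s2
    return 0;           // the strings are equal
}
```

For instance, ordering("canary", "cardinal") yields -1 because the strings first differ at 'n' versus 'r', which is why canary precedes cardinal in the birds array.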

C library (libc)

This topic discusses some of the more important routines in the standard C library. libc contains the system calls described in Section 2 and the C language functions described in Section 3, Subsections 3C and 3S.

Subsection 3S routines

Subsection 3S of the manual pages contains the standard I/O library for C programs. Frequently, one manual page describes several related functions or macros. In ``Standard I/O Functions and Macros'', the left-hand column contains the name that appears at the top of the manual page; the other names in the same row are related functions or macros described on the same manual page. Programs that use these routines should include the header file stdio.h.

fclose    fflush                  Close or flush a stream.
ferror    feof clearerr fileno    Stream status inquiries.
fopen     freopen fdopen          Open a stream.
fread     fwrite                  Input/output.
fseek     rewind ftell            Reposition a file pointer in a stream.
getc      getchar fgetc getw      Get a character or word from a stream.
gets      fgets                   Get a string from a stream.
popen     pclose                  Begin or end a pipe to/from a process.
printf    fprintf sprintf         Print formatted output.
putc      putchar fputc putw      Put a character or word on a stream.
puts      fputs                   Put a string on a stream.
scanf     fscanf sscanf           Convert formatted input.
setbuf    setvbuf                 Assign buffering to a stream.
system                            Issue a command through the shell.
tmpfile                           Create a temporary file.
tmpnam    tempnam                 Create a name for a temporary file.
ungetc                            Push character back into input stream.
vprintf   vfprintf vsprintf       Print formatted output of a varargs argument list.

Standard I/O Functions and Macros
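Several of the routines in the table are typically used together. The following sketch, under the assumption that a temporary file can be created on the system, writes a string with fputs, repositions the stream with rewind, and reads the text back with fgets. The helper name round_trip and its return convention are hypothetical.

```cpp
#include <assert.h>
#include <stdio.h>
#include <string.h>

// Hypothetical helper: write text to a temporary file, read one line
// back into buf (of size buflen), and return 0 on success or -1 on
// failure.
int round_trip(const char *text, char *buf, size_t buflen)
{
    assert(text != 0 && buf != 0 && buflen > 0);

    FILE *fp = tmpfile();           // removed automatically when closed
    if (fp == NULL)
        return -1;

    int status = -1;
    if (fputs(text, fp) != EOF) {
        rewind(fp);                 // reposition to the start of the stream
        if (fgets(buf, (int)buflen, fp) != NULL)
            status = 0;
    }
    fclose(fp);
    return status;
}
```

Repositioning with rewind between the write and the read is required on a stream opened for update; without it, the behavior of the following fgets is undefined.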

Subsection 3C routines

Subsection 3C contains functions and macros that perform a variety of tasks:

• string manipulation
• character classification
• character conversion
• environment management
• memory management
• multibyte/wide character conversion

Here we'll look at functions and macros that perform the first three tasks.

``String Operations'' lists most of the string-handling functions that appear in string(3C). Programs that use these functions should include the header file string.h.

Most of these string-handling functions have corresponding functions that handle multibyte/wide-character objects. These functions are designated by the prefix wcs in place of the str prefix of the string functions.

strcat Append a copy of one string to the end of another.

strncat Append no more than a given number of characters from one string to the end of another.

strcmp Compare two strings. Returns an integer less than, greater than, or equal to 0 to show that one is lexicographically less than, greater than, or equal to the other.

strncmp Compare no more than a given number of characters from the two strings. Results are otherwise identical to strcmp.

strcpy Copy a string.

strncpy Copy a given number of characters from one string to another. The copy is truncated, and left without a terminating null character, if the source string is longer than the given number of characters; it is padded with null characters if the source string is shorter.

strdup Return a pointer to a newly allocated string that is a duplicate of a string pointed to.

strchr Return a pointer to the first occurrence of a character in a string, or a null pointer if the character is not in the string.

strrchr Return a pointer to the last occurrence of a character in a string, or a null pointer if the character is not in the string.

strlen Return the number of characters in a string.

strpbrk Return a pointer to the first occurrence in one string of any character from the second, or a null pointer if no character from the second occurs in the first.

strspn Return the length of the initial segment of one string that consists entirely of characters from the second string.

strcspn Return the length of the initial segment of one string that consists entirely of characters not from the second string.

strstr Return a pointer to the first occurrence of the second string in the first string, or a null pointer if the second string is not found.

strtok Break up the first string into a sequence of tokens, each of which is delimited by one or more characters from the second string. Return a pointer to the token, or a null pointer if no token is found.

strlist Concatenate an indefinite number of null-terminated strings into the array pointed to by its first argument.

String Operations
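As a brief illustration of the functions listed in ``String Operations'', the following sketch shows two points that are easy to get wrong: strncpy does not always terminate its result, and strtok writes into the string it scans. The function names copy_bounded and count_fields are hypothetical and are not part of any library.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy src into dst, whose capacity is dstsize (at least 1).
 * strncpy adds no terminating null character when src holds
 * dstsize - 1 or more characters, so one is stored explicitly.
 */
char *copy_bounded(char *dst, const char *src, size_t dstsize)
{
    strncpy(dst, src, dstsize - 1);
    dst[dstsize - 1] = '\0';
    return dst;
}

/*
 * Count the tokens in line that are separated by any character in
 * delims.  strtok overwrites delimiters with null characters, so
 * line must be a writable array, not a string literal.
 */
int count_fields(char *line, const char *delims)
{
    int n = 0;
    char *tok;

    for (tok = strtok(line, delims); tok != NULL; tok = strtok(NULL, delims))
        n++;
    return n;
}
```

Because strtok treats a run of delimiters as a single separator, a line such as robin:wren::finch yields three tokens, not four.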

``Classifying 8-Bit Character-Coded Integer Values'' lists most of the functions and macros that classify 8-bit character-coded integer values. These routines appear in conv(3C) and ctype(3C). Programs that use these routines should include the header file ctype.h.

isalpha Is c a letter?

isupper Is c an uppercase letter?

islower Is c a lowercase letter?

isdigit Is c a digit [0−9]?

isxdigit Is c a hexadecimal digit [0−9], [A−F], or [a−f]?

isalnum Is c alphanumeric (a letter or digit)?

isspace Is c a space, horizontal tab, carriage return, new−line, vertical tab, or form−feed?

ispunct Is c a punctuation character (neither control nor alphanumeric)?

isprint Is c a printing character?

isgraph Same as isprint except false for a space.

iscntrl Is c a control character or a delete character?

isascii Is c an ASCII character?

toupper Change lower case to upper case.

_toupper Macro version of toupper.

tolower Change upper case to lower case.

_tolower Macro version of tolower.

toascii Turn off all bits that are not part of a standard ASCII character; intended for compatibility with other systems.

Classifying 8−Bit Character−Coded Integer Values
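The classification and conversion routines are usually applied one character at a time while walking a string. The sketch below assumes nothing beyond ctype.h; the function name upcase_and_count_digits is hypothetical. The cast to unsigned char matters because these routines are defined only for values representable as unsigned char (and for EOF), while plain char may be signed.

```c
#include <ctype.h>

/*
 * Convert the lowercase letters of s to upper case in place and
 * return the number of decimal digits found in s.
 */
int upcase_and_count_digits(char *s)
{
    int digits = 0;

    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;

        if (isdigit(c))
            digits++;
        else if (islower(c))
            *s = (char)toupper(c);
    }
    return digits;
}
```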

``Converting Characters, Integers, or Strings'' lists functions and macros in Subsection 3C that are used to convert characters, integers, or strings from one representation to another. The left-hand column contains the name that appears at the top of the manual page; the other names in the same row are related functions or macros described on the same manual page. Programs that use these routines should include the header file stdlib.h.

a64l l64a Convert between long integer and base−64 ASCII string.

strtod atof strtold Convert string to double−precision number.

strtol atol atoi Convert string to integer.

strtoul Convert string to unsigned long.

Converting Characters, Integers, or Strings
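Of the conversion routines in ``Converting Characters, Integers, or Strings'', strtol reports where conversion stopped and whether the value overflowed, which atoi and atol cannot do. The following is a minimal sketch of checked conversion; the name parse_long and its return convention are hypothetical.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Parse a decimal integer from text.  Returns 0 and stores the value
 * in *out on success; returns -1 if text holds no digits, has
 * characters after the number, or is out of range for long.
 */
int parse_long(const char *text, long *out)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(text, &end, 10);
    if (end == text || *end != '\0')
        return -1;      /* no digits, or junk after the number */
    if (errno == ERANGE)
        return -1;      /* value does not fit in a long */
    *out = value;
    return 0;
}
```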

System calls

UNIX® operating system calls are the interface between the kernel and the user programs that run on top of it. read(2), write(2), and the other system calls define the UNIX® operating system. Everything else is built on their foundation. Strictly speaking, they are the only way to access such facilities as the file system, interprocess communication primitives, and multitasking mechanisms.

Of course, most programs do not need to invoke system calls directly to gain access to these facilities. If you are performing input/output, for example, you can use the standard I/O functions described earlier. When you use these functions, the details of their implementation on the UNIX® operating system (for example, that the system call read underlies the fread implementation in the standard C library) are transparent to the program. Therefore, the program will generally be portable to any system with a conforming C implementation.

In contrast, programs that invoke system calls directly are portable only to other systems similar to the UNIX® operating system. Therefore, you would not use read in a program that performed a simple I/O operation. Other operations, however, including most multitasking mechanisms, do require direct interaction with the UNIX® operating system kernel. These operations are discussed in detail in Programming with system calls and libraries.
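The difference between the two levels can be sketched by writing the same task twice. The first function below uses only the standard I/O library and should compile on any conforming C implementation; the second calls open(2), read(2), and close(2) directly and so assumes a UNIX-like system. Both function names are hypothetical.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Portable: count the bytes in a file with the standard I/O library. */
long count_bytes_stdio(const char *path)
{
    FILE *fp = fopen(path, "r");
    char buf[512];
    size_t n;
    long total = 0;

    if (fp == NULL)
        return -1;
    while ((n = fread(buf, 1, sizeof(buf), fp)) > 0)
        total += (long)n;
    (void)fclose(fp);
    return total;
}

/* UNIX-specific: the same job done with system calls directly. */
long count_bytes_syscall(const char *path)
{
    int fd = open(path, O_RDONLY);
    char buf[512];
    ssize_t n;
    long total = 0;

    if (fd == -1)
        return -1;
    while ((n = read(fd, buf, sizeof(buf))) > 0)
        total += (long)n;
    (void)close(fd);
    return n < 0 ? -1 : total;  /* a negative n means read failed */
}
```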

Math library (libm)

The math library, libm, contains the mathematics functions supplied by the C compilation system. These appear in Subsection 3M. ``Math Functions'' describes some of the major functions, organized by the manual page on which they appear. Note that functions whose names end with the letter f are single-precision versions, which means that their argument and return types are float. Functions whose names end with the letter l are long double-precision versions, which means that their argument and return types are long double. The header file math.h should be included in programs that use math functions.

bessel(3M)      j0 j1 jn                            Bessel functions
                y0 y1 yn                            Bessel functions
erf(3M)         erfc erfcf erfcl                    return complementary error function of x
                erf erff erfl                       return error function of x
exp(3M)         cbrt cbrtf cbrtl                    cube root function
                exp expf expl                       exponential, mod, power, square root functions
                exp2 exp2f exp2l                    exponential function
                expm1 expm1f expm1l                 exponential function
                frexp frexpf frexpl                 exponential function
                ldexp ldexpf ldexpl                 exponential function
                modf modff modfl                    modulus function
                pow powf powl                       power function
                pown pownf pownl                    power function
                scalbln scalblnf scalblnl           scalbln function
                scalbn scalbnf scalbnl              scalbn function
                sqrt sqrtf sqrtl                    square root function
fdim(3M)        fdim fdimf fdiml                    return positive difference between two arguments
floor(3M)       ceil ceilf ceill                    return smallest integer not smaller than x
                copysign copysignf copysignl        return x with the sign of y
                fabs fabsf fabsl                    return the absolute value of x, |x|
                floor floorf floorl                 return largest integer not greater than x
                fmod fmodf fmodl                    return the floating point remainder of the division of x by y
                nan nanf nanl                       return a quiet NaN, if available, with content indicated by tagp
                nearbyint nearbyintf nearbyintl     return nearest integer value to floating point argument x
                nextafter nextafterf nextafterl     return the next representable value in the specified format after x in the direction of y
                nexttoward nexttowardf nexttowardl  return the next representable value in the specified format after x in the direction of y
fma(3M)         fma fmaf fmal                       return rounded result of (x * y) + z
fmax(3M)        fmax fmaxf fmaxl                    return maximum numeric value of two arguments
                fmin fminf fminl                    return minimum numeric value of two arguments
fpclassify(3M)  isfinite                            determines whether x has a finite value
                isgreater                           determines whether x is greater than y
                isgreaterequal                      determines whether x is greater than or equal to y
                isinf                               determines whether x is positive or negative infinity
                isless                              determines whether x is less than y
                islessequal                         determines whether x is less than or equal to y
                islessgreater                       determines whether x is less than or greater than y
                isnan                               determines whether x is a NaN
                isnormal                            determines whether x is not zero, subnormal, infinite, or a NaN
                isunordered                         determines whether x and y are unordered
                signbit                             determines whether the sign of x is negative
gamma(3M)       gamma gammaf gammal                 log gamma function
                lgamma lgammaf lgammal              log gamma function
                signgam                             storage for sign for gamma and lgamma
                tgamma tgammaf tgammal              gamma function
hypot(3M)       hypot hypotf hypotl                 return result of sqrt((x * x) + (y * y))
log(3M)         ilogb ilogbf ilogbl                 logarithm function
                log logf logl                       logarithm function
                log10 log10f log10l                 logarithm function
                log1p log1pf log1pl                 logarithm function
                log2 log2f log2l                    logarithm function
                logb logbf logbl                    logarithm function
matherr(3M)     matherr matherrl                    error-handling function
remainder(3M)   remainder remainderf remainderl     return floating point remainder of division of x by y
                remquo remquof remquol              return floating point remainder of division of x by y
rint(3M)        llrint llrintf llrintl              return nearest integer value to floating point argument x
                llround llroundf llroundl           return the rounded integer value of x
                lrint lrintf lrintl                 return nearest integer value to floating point argument x
                lround lroundf lroundl              return the rounded integer value of x
                rint rintf rintl                    return nearest integer value to floating point argument x
                round roundf roundl                 return the rounded integer value of x
                trunc truncf truncl                 return the truncated integer value of x
sinh(3M)        acosh acoshf acoshl                 return inverse hyperbolic cosine of argument
                asinh asinhf asinhl                 return inverse hyperbolic sine of argument
                atanh atanhf atanhl                 return inverse hyperbolic tangent of argument
                cosh coshf coshl                    return hyperbolic cosine of argument
                sinh sinhf sinhl                    return hyperbolic sine of argument
                tanh tanhf tanhl                    return hyperbolic tangent of argument
trig(3M)        acos acosf acosl                    return the arccosine of an argument in radians
                asin asinf asinl                    return the arcsine of an argument in radians
                atan atanf atanl                    return the arctangent of an argument in radians
                atan2 atan2f atan2l                 return the arctangent of y/x in radians
                cos cosf cosl                       return the cosine of an argument in radians
                sin sinf sinl                       return the sine of an argument in radians
                tan tanf tanl                       return the tangent of an argument in radians

Math Functions

General purpose library (libgen)

libgen contains general purpose functions, and functions designed to facilitate internationalization. These appear in Subsection 3G. ``libgen Functions'' describes some of the functions in libgen. The header files libgen.h and, occasionally, regexp.h should be included in programs that use these functions.

NOTE: libc now contains a better set of regular expression functions, as defined by XPG4/POSIX.

advance step Execute a regular expression on a string.

basename Return a pointer to the last element of a path name.

bgets Read a specified number of characters into a buffer from a stream until a specified character is reached.

bufsplit Split the buffer into fields delimited by tabs and new-lines.

compile Return a pointer to a compiled regular expression that uses the same syntax as ed.

copylist Copy a file into a block of memory, replacing new-lines with null characters. It returns a pointer to the copy.

dirname Return a pointer to the parent directory name of the file path name.

eaccess Determine if the effective user ID has the appropriate permissions on a file.

gmatch Check if name matches shell file name pattern.

isencrypt Use heuristics to determine if contents of a character buffer are encrypted.

mkdirp Create a directory and its parents.

p2open p2close p2open is similar to popen(3S). It establishes a two-way connection between the parent and the child. p2close closes the pipe.

pathfind Search the directories in a given path for a named file with given mode characteristics. If the file is found, a pointer is returned to a string that corresponds to the path name of the file. A null pointer is returned if no file is found.

regcmp Compile a regular expression and return a pointer to the compiled form.

regex Compare a compiled regular expression against a subject string.

rmdirp Remove the directories in the specified path.

strccpy strcadd strccpy copies the input string to the output string, compressing any C-like escape sequences to the real character. strcadd is a similar function that returns the address of the null byte at the end of the output string.

strecpy Copy the input string to the output string, expanding any non-graphic characters with the C escape sequence. Characters in a third argument are not expanded.

strfind Return the offset of the first occurrence of the second string in the first string. -1 is returned if the second string does not occur in the first.

strrspn Trim trailing characters from a string. It returns a pointer to the last character in the string not in a list of trailing characters.

strtrns Return a pointer to the string that results from replacing any character found in two strings with a character from a third string. This function is similar to the tr command.

libgen Functions
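As a small example of the path name routines in ``libgen Functions'', the sketch below tests the last element of a path with basename. It assumes the POSIX form of basename declared in libgen.h, which is permitted to modify the string passed to it; for that reason the function works on a private copy. The name last_element_is and the 1024-byte limit are hypothetical choices for illustration. On systems where these routines live in a separate libgen library rather than in libc, the program must also be linked against that library.

```c
#include <libgen.h>
#include <string.h>

/*
 * Return 1 if the last element of path is name, otherwise 0.
 * The caller's path is left unmodified.
 */
int last_element_is(const char *path, const char *name)
{
    char copy[1024];

    /* basename may write into its argument, so give it a copy. */
    strncpy(copy, path, sizeof(copy) - 1);
    copy[sizeof(copy) - 1] = '\0';
    return strcmp(basename(copy), name) == 0;
}
```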

Standard I/O

The functions in Subsection 3S constitute the standard I/O library for C programs. I/O involves:

• reading information from a file or device to your program;
• writing information from your program to a file or device;
• opening and closing files that your program reads from or writes to.

Standard files

Programs automatically start off with three open files: standard input, standard output, and standard error. These files with their associated buffering are designated stdin, stdout, and stderr, respectively. The shell associates all three files with your terminal by default.

You can use functions and macros that deal with stdin, stdout, or stderr without having to open or close files. gets, for example, reads a string from stdin; puts writes a string to stdout. Other functions and macros read from or write to files in different ways: character at a time, getc and putc; formatted, scanf and printf; and so on. You can specify that output be directed to stderr by using a function such as fprintf. fprintf works the same way as printf except that it delivers its formatted output to a named stream, such as stderr.

Named files

Any file other than standard input, standard output, and standard error must be explicitly opened before your program can read from or write to the file. You open a file with the standard library function fopen. fopen takes a path name, asks the system to keep track of the connection between your program and the file, and returns a pointer that you can then use in functions that perform other I/O operations. This bookkeeping mechanism associated with a file is referred to as a stdio stream.

The data structure associated with a stream, FILE, is defined in stdio.h. The FILE structure members are intended for use only by the stdio subsystem; your program refers to a stream only through a pointer. For example, your program must have a declaration such as

    FILE *fin;

which says that fin is a pointer to a FILE. The statement

    fin = fopen("filename", "r");

associates a structure of type FILE with filename, the path name of the file to open, and returns a pointer to it. The ``"r"'' means that the file is to be opened for reading. This argument is known as the mode. There are modes for reading, writing, and both reading and writing.

In practice, the file open function is often included in an if statement:

    if ((fin = fopen("filename", "r")) == NULL)
        (void)fprintf(stderr, "Cannot open input file %s\n", "filename");

which relies on fopen returning a NULL pointer if it cannot open the file. To avoid falling into the immediately following code on failure, you can call exit, which causes your program to quit:

    if ((fin = fopen("filename", "r")) == NULL) {
        (void)fprintf(stderr, "Cannot open input file %s\n", "filename");
        exit(1);
    }

Once you have opened the file, you use the pointer fin in functions or macros to refer to the stream associated with the opened file:

    int c;
    c = getc(fin);

brings in one character from the stream into an integer variable called c. The variable c is declared as an integer even though it is reading characters because getc returns an integer. Getting a character is often incorporated in some flow-of-control mechanism such as

while ((c = getc(fin)) != EOF) . . .

that reads through the file until EOF is returned. EOF, NULL, and the macro getc are all defined in stdio.h.

Your program may have multiple files open simultaneously, 20 or more depending on system configuration. If your program needs to open more files than it is permitted to have open simultaneously, you can use the standard library function fclose to free up a stdio stream for reuse by your program. fclose first flushes the output buffer for write streams and frees any memory allocated by the stdio subsystem associated with the stream. The stdio stream is then available for reuse. exit closes all open files for you, but it also gets you completely out of your process, so you should use it only when you are sure you are finished.
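The pieces described in this section (fopen with a check for NULL, a getc loop that runs until EOF, and fclose to release the stream) fit together as in the following sketch. The function name count_lines is hypothetical, and returning -1 instead of calling exit is a choice made here so that the caller can decide how to proceed.

```c
#include <stdio.h>

/*
 * Count the new-line characters in the named file.
 * Returns -1, after reporting on stderr, if the file cannot be opened.
 */
long count_lines(const char *path)
{
    FILE *fin;
    int c;              /* int, not char, so that EOF can be represented */
    long lines = 0;

    if ((fin = fopen(path, "r")) == NULL) {
        (void)fprintf(stderr, "Cannot open input file %s\n", path);
        return -1;
    }
    while ((c = getc(fin)) != EOF)
        if (c == '\n')
            lines++;
    (void)fclose(fin);  /* release the stream for reuse */
    return lines;
}
```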

Passing command line arguments

As indicated in ``Introduction to programming in standard C and C++'', information or control data can be passed to a C program as an argument on the command line. When you execute the program, command line arguments are made available to the function main in two parameters, an argument count, conventionally called argc, and an argument vector, conventionally called argv. argc is the number of arguments with which the program was invoked. argv is an array of pointers to character strings that contain the arguments, one per string. Because the command name itself is considered to be the first argument, or argv[0], the count is always at least one.

If you plan to accept run-time parameters in your program, you need to include code to deal with the information.

``Using argv[1] to Pass a File Name'' and ``Using Command Line Arguments to Set Flags'' show program fragments that illustrate two common uses of run-time parameters:

• ``Using argv[1] to Pass a File Name'' shows how you provide a variable file name to a program, such that a command of the form

      $ prog filename

  will cause prog to attempt to open the specified file.

• ``Using Command Line Arguments to Set Flags'' shows how you set internal flags that control the operation of a program, such that a command of the form

      $ prog -opr

  will cause prog to set the corresponding variables for each of the options specified. The getopt function used in the example is the most common way to process arguments in UNIX® operating system programs. See getopt(3C) for more information.

#include <stdio.h>

int
main(int argc, char *argv[])
{
    FILE *fin;
    int ch;

    switch (argc) {
    case 2:
        if ((fin = fopen(argv[1], "r")) == NULL) {
            /* First string (%s) is program name (argv[0]). */
            /* Second string (%s) is name of file that could */
            /* not be opened (argv[1]). */

            (void)fprintf(stderr,
                "%s: Cannot open input file %s\n",
                argv[0], argv[1]);
            return(2);
        }
        break;
    case 1:
        fin = stdin;
        break;

    default:
        (void)fprintf(stderr, "Usage: %s [file]\n", argv[0]);
        return(2);
    }

    while ((ch = getc(fin)) != EOF)
        (void)putchar(ch);

    return (0);
}

Using argv[1] to Pass a File Name

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>     /* declares getopt on POSIX systems */

int
main(int argc, char *argv[])
{
    int oflag = 0;
    int pflag = 0;      /* Function flags */
    int rflag = 0;
    int ch;

    while ((ch = getopt(argc, argv, "opr")) != -1) {
        /* For options present, set flag to 1. */
        /* If unknown options present, print error message. */

        switch (ch) {
        case 'o':
            oflag = 1;
            break;
        case 'p':
            pflag = 1;
            break;
        case 'r':
            rflag = 1;
            break;
        default:
            (void)fprintf(stderr, "Usage: %s [-opr]\n", argv[0]);
            return(2);
        }
    }
    /* Do other processing controlled by oflag, pflag, rflag. */
    return(0);
}

Using Command Line Arguments to Set Flags

Reentrant libraries

The traditional UNIX® process could be characterized as having a single flow of control. Global data structures were guaranteed to be consistent, provided that no signal handler modified the global data and that inter-process shared memory was not used. Global variables could be counted on to retain their most recently set value throughout the process, since no other flow of control could be changing them "behind your back".

Multi-threaded applications allow for multiple flows of control through a process. Each flow of control, or thread, shares access to the global variables and data space of the overall process. This includes global and static scope variables as well as the heap storage managed by the C library memory allocation routines. There is nothing that magically prevents multiple threads within a process from interfering with each other in a way that causes data corruption. Reentrant or thread safe libraries, therefore, must guarantee synchronized access to their own global data or functions.

The following libraries are reentrant:

• libc
• libm
• libnsl

When building a multithreaded application, you must pass -K thread to the cc command. This enables your application to see thread-safe interfaces from the system headers.

For details on multithreaded programming, see Programming with system calls and libraries. The C++ standard library, libC, is reentrant (except for the old, pre-standard iostreams classes kept for compatibility, which are not).

BSD system libraries and header files

If you are migrating to UNIX® System V from a BSD System environment, or want to run BSD system applications, you may need to install and access the BSD libraries and header files included in the BSD Compatibility Package.

Accessing BSD libraries and header files

Once the BSD Compatibility Package is installed, the compatibility package header files and libraries used by the C compiler (cc) and linker (ld) are located in /usr/ucbinclude and /usr/ucblib. To access these header files and libraries, set your PATH variable so that /usr/ucb comes before the default UNIX® System V path directories /sbin, /usr/sbin, /usr/bin, and /usr/ccs/bin.

For more information on BSD application compatibility, see Appendix B of Programming with System Calls and Libraries.

C language compilers

This topic is a guide to the C language compilers. It assumes that you have some experience with C, and are familiar with fundamental programming concepts.

The compilers are compatible with the C language described in the American National Standards Institute (ANSI) American National Standard for Information Systems--Programming Language C, document number X3.159-1989. The standard language is referred to as ``ANSI C'' in this document. The compilers are also compatible with the C language described in the International Organization for Standardization (ISO) Programming Languages--C, document number ISO/IEC 9899:1990 (E), referred to as ``ISO C'', as well as the updated C standard (C99), ISO/IEC 9899:1999, with two major exceptions and two minor ones. The main unimplemented features are complex arithmetic and variable length arrays. The former also means that there is no <complex.h> header. The minor items are the return value for snprintf(3S) when the destination array is not long enough (the C99-conforming functions are provided, but are spelled _xsnprintf() and _xvsnprintf()), and some header name space issues. The latter means that many of the new names added by C99 to existing headers will not be visible when using -Xc.

You can use this topic either as a quick reference guide, or as a comprehensive summary of the language as implemented by the compilation system. Many topics are grouped according to their place in the ANSI-specified phases of translation, which describe the steps by which a source file is translated into an executable program. The phases of translation are explained in ``Phases of translation''.

Compilation modes

The compilation system has four compilation modes that correspond to degrees of compliance with ANSI C. The modes are:

-Xa
    ANSI C mode. Specifies standards conformance except that some required warnings are omitted and the name space is expanded to include names that are not specified by the standards. All C constructions behave as specified in the standards. All implemented language and library extensions beyond the standards are also available. This is the default.

-Xb
    Backward compatibility mode. Like -Xa, except that restrict and inline are not taken as C99 keywords.

-Xc
    Conformance mode. Specifies strict standards conformance. Since the name space of the language and headers is reduced from that of -Xa, certain extensions, such as the asm keyword, and some commonly expected header file declarations are not available.

-Xt
    Transition mode. Specifies standards conformance except where the semantics differ from classic C. Under this option, the compiler provides new ANSI C features and supports all extensions that were provided in C Issue 4 (CI4), which was the release made available with System V Release 3. Where the interpretation of a construct differs between CI4 and the Standard, the compiler issues a warning and follows the CI4 behavior.

Feature test macros

There are two feature test macros that can be used with the compilation modes:

• _POSIX_SOURCE
• _XOPEN_SOURCE

NOTE: _POSIX_SOURCE is the feature test macro for POSIX.1.

By default, neither feature test macro is defined. The default compilation mode is either -Xt or -Xa, depending on which compilation release is being used. With these default settings, every header makes visible all declarations and definitions.

As explained previously, the -Xc mode specifies visibility according to the C standard. When using -Xc with no feature test macros defined, the headers owned by the C standard (assert.h, ctype.h, errno.h, float.h, limits.h, locale.h, math.h, setjmp.h, signal.h, stdarg.h, stddef.h, stdio.h, stdlib.h, string.h, and time.h), as well as the additional C99 headers (complex.h, fenv.h, inttypes.h, iso646.h, stdbool.h, stdint.h, tgmath.h, wchar.h, and wctype.h), are minimized.

When a feature test macro is defined in combination with the -Xc compilation mode, those headers owned by the associated [pseudo] standard are extended to make more declarations and definitions visible. For example, when using the -Xc compilation mode, if _POSIX_SOURCE is defined:

    cc -Xc -D_POSIX_SOURCE

then the declaration

    FILE *fdopen(int, const char *);

would be visible in <stdio.h>. However, it would not be visible if the -Xc compilation mode is used without _POSIX_SOURCE defined.

Alternatively, when a feature test macro is defined in conjunction with -Xa or -Xt, the set of visible declarations and definitions in the associated headers is reduced. For example, <stdlib.h> will make getopt() visible if it is used with the -Xt compilation mode without _POSIX_SOURCE defined, but if _POSIX_SOURCE is defined, it will not be visible.
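A feature test macro can also be defined in the source file itself rather than on the cc command line, provided the definition comes before the first #include. The following sketch shows that arrangement for the fdopen case discussed above; the function name stream_for_writing is hypothetical.

```c
/* Must precede every #include to have any effect. */
#define _POSIX_SOURCE 1

#include <stdio.h>

/*
 * Wrap an open file descriptor in a stdio stream.  fdopen belongs to
 * POSIX.1 rather than to the C standard, so in a strict conformance
 * mode its declaration in <stdio.h> is visible only because
 * _POSIX_SOURCE is defined above.
 */
FILE *stream_for_writing(int fd)
{
    return fdopen(fd, "w");
}
```

Defining the macro on the command line, as in the cc example earlier in this section, has the same effect and guarantees that no header is read before the definition.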

NOTE: See The C Programming Language, Kernighan and Ritchie (First Edition).

Global behavior

A program that depends on unsigned-preserving arithmetic conversions will behave differently. This is considered to be the most serious change made by ANSI C to a widespread current practice.

In The C Programming Language, Kernighan and Ritchie (First Edition), unsigned specified exactly one type; there were no unsigned chars, unsigned shorts, or unsigned longs, but most C compilers added these very soon thereafter.

In previous System V C compilers, the ``unsigned preserving'' rule is used for promotions: when an unsigned type needs to be widened, it is widened to an unsigned type; when an unsigned type mixes with a signed type, the result is an unsigned type.

The other rule, specified by ANSI C, came to be called ``value preserving,'' in which the result type depends on the relative sizes of the operand types. When an unsigned char or unsigned short is ``widened,'' the result type is int if an int is large enough to represent all the values of the smaller type. Otherwise the result type would be unsigned int. The ``value preserving'' rule produces the ``least surprise'' arithmetic result for most expressions.

Only in the transition (-Xt) mode will the compiler use the unsigned preserving promotions; in the other modes, including conformance (-Xc) and ANSI (-Xa), the value preserving promotion rules will be used. If you use the -v option to the cc command, the compiler will warn about each expression whose behavior might depend on the promotion rules used.

Phases of translation

The compiler processes a source file into an executable in eight conceptual steps, which are called phases of translation. While some of these phases may actually be folded together, the compiler behaves as if they occur separately, in sequence.

1. Trigraph sequences are replaced by their single-character equivalents. Trigraph sequences are explained in ``Preprocessing''.

2. Any source lines that end with a backslash and new-line are spliced together with the next line by deleting the backslash and new-line.

3. The source file is partitioned into preprocessing tokens and sequences of white-space characters. Each comment is, in effect, replaced by one space character. Preprocessing tokens are explained in ``Preprocessing''.

4. Preprocessing directives are executed, and macros are expanded. Any files named in #include statements are processed from phase 1 through phase 4, recursively.

5. Escape sequences in character constants and string literals are converted to their character equivalents.

6. Adjacent character string literals, and wide character string literals, are concatenated.

7. Each preprocessing token is converted into a token. The resulting tokens are syntactically and semantically analyzed and translated. Tokens are explained in ``Source files and tokenization''.

8. All external object and function references are resolved. Libraries are linked to satisfy external references not defined in the current translation unit. All translator output is collected into a program image which contains information needed for execution.

Output from certain phases may be saved and examined by specifying option flags on the cc command line.

The preprocessing token sequence resulting from Phase 4 can be saved by using the following options:

1. -P leaves preprocessed output in a file with a .i extension.
2. -E sends preprocessed output to the standard output.

Output from Phase 7 can be saved in a file with a .o extension by using the -c option to cc. The output of Phase 8 is the compilation system's final output (a.out).


Source files and tokenization

Tokens

A token is a series of contiguous characters that the compiler treats as a unit. Translation phase 3 partitions a source file into a sequence of tokens. Tokens fall into seven classes:

• Identifiers
• Keywords
• Numeric Constants
• Character Constants
• String literals
• Operators
• Other separators and punctuators

Identifiers

• Identifiers are used to name things such as variables, functions, data types, and macros.
• Identifiers are made up of a combination of letters, digits, or underscore ( _ ) characters.
• First character may not be a digit.

Keywords

The following identifiers are reserved for use as keywords and may not be used otherwise:

asm        default    _Imaginary  static
__asm      do         inline      struct
auto       double     int         switch
_Bool      else       long        typedef
break      enum       register    union
case       extern     restrict    unsigned
char       float      return      void
_Complex   for        short       volatile
const      goto       signed      while
continue   if         sizeof

The keyword asm is reserved in all compilation modes except -Xc. The keyword __asm is a synonym for asm and is available under all compilation modes, although a warning will be issued when it is used under the -Xc mode.

The new C99 keywords _Complex and _Imaginary are not recognized as keywords for -Xc and -Xb.

Constants


Integral constants

Decimal
♦ Digits 0-9.
♦ First digit may not be 0 (zero).

Octal
♦ Digits 0-7.
♦ First digit must be 0 (zero).

Hexadecimal
♦ Digits 0-9 plus letters a-f or A-F. Letters correspond to decimal values 10-15.
♦ Prefixed by 0x or 0X (digit zero).

Suffixes

All of the above can be suffixed to indicate type, as follows:

Suffix                  Type
u or U                  unsigned
l or L                  long
both                    unsigned long
ll or LL                long long
u or U with ll or LL    unsigned long long

Floating point constants

Consist of integer part, decimal point, fraction part, an e or E, an optionally signed integer exponent, and a type suffix, one of f, F, l, or L. Each of these elements is optional; however, one of the following must be present for the constant to be a floating point constant:

♦ A decimal point (preceded or followed by a number).
♦ An e with an exponent.
♦ Any combination of the above. Examples:

xxx e exp
xxx.
.xxx

A hexadecimal floating point constant consists of 0x or 0X followed by a hexadecimal digit sequence with an optional radix character (.), followed by a p or P, an optional sign, and an exponent.

Type determined by suffix; f or F indicates float, l or L indicates long double, otherwise type is double.

Character constants

• One or more characters enclosed in single quotes, as in 'x'.
• All character constants have type int.
• Value of a character constant is the numeric value of the character in the ASCII character set.
• A multiple-character constant that is not an escape sequence (see below) has a value derived from the numeric values of each character. For example, the constant '123' has a value of

0 '3' '2' '1'

or 0x333231 on the 3B2. On the Intel386 microprocessor the value is

0 '1' '2' '3'

or 0x313233.

Character constants may not contain the character ' or new-line. To represent these characters, and some others that may not be contained in the source character set, the compiler provides the following escape sequences:

Escape sequences

new-line              NL (LF)    \n
horizontal tab        HT         \t
vertical tab          VT         \v
backspace             BS         \b
carriage return       CR         \r
formfeed              FF         \f
single quote          '          \'
audible alert         BEL        \a
question mark         ?          \?
double quote          "          \"
octal escape          ooo        \ooo
hexadecimal escape    hh         \xhh
backslash             \          \\

If the character following a backslash is not one of those specified, the compiler will issue a warning and treat the backslash-character sequence as the character itself. Thus, '\q' will be treated as 'q'. However, if you represent a character this way, you run the risk that the character may be made into an escape sequence in the future, with unpredictable results. An explicit new-line character is invalid in a character constant and will cause an error message.

• The octal escape consists of one to three octal digits.
• The hexadecimal escape consists of one or more hexadecimal digits.

Wide characters and multibyte characters

• A wide character constant is a character constant prefixed by the letter L.
• A wide character has an external encoding as a multibyte character and an internal representation as the integral type wchar_t, defined in stddef.h.
• A wide character constant has the integral value for the multibyte character between single quote characters, as defined by the locale-dependent mapping function mbtowc.


String literals

• One or more characters surrounded by double quotes, as in ``"xyz"''.
• Initialized with the characters contained in the double quotes.
• Have static storage duration and type ``array of characters.''
• The escape sequences described in ``Character Constants'' may also be used in string literals. A double quote within the string must be escaped with a backslash. New-line characters are not valid within a string.
• Adjacent string literals are concatenated into a single string. A null character, \0, is appended to the result of the concatenation, if any.
• String literals are also known as ``string constants.''

Wide string literals

• A wide-character string literal is a string literal immediately prefixed by the letter L.
• Wide-character string literals have type ``array of wchar_t.''
• Wide string literals may contain escape sequences, and they may be concatenated, like ordinary string literals.

Comments

Comments begin with the characters /* and end with the next */.

/* this is a comment */

Comments do not nest.

If a comment appears to begin within a string literal or character constant, it will be taken as part of the literal or constant, as specified by the phases of translation.

char *p = "/* this is not a comment */"; /* but this is */

In addition, //−style comments are accepted.

Preprocessing

• Preprocessing handles macro substitution, conditional compilation, and file inclusion.
• Lines beginning with # indicate a preprocessing control line. Spaces and tabs may appear before and after the #.
• Lines that end with a backslash character \ and new-line are joined with the next line by deleting the backslash and the new-line characters. This occurs (in translation phase 2) before input is divided into tokens.
• Each preprocessing control line must appear on a line by itself.


Trigraph sequences

Trigraph sequences are three-character sequences that are replaced by a corresponding single character in Translation Phase 1, as follows:

??=  #      ??(  [      ??<  {
??/  \      ??)  ]      ??>  }
??'  ^      ??!  |      ??-  ~

Digraph sequences behave the same as their corresponding single characters in all respects except for their spelling. Thus

<: :> <% %> %: %:%:

are equivalent to

[ ] { } # ##

respectively.

No other such sequences are recognized. The trigraph sequences provide a way to specify characters that are missing on some terminals, but that the C language uses.

Preprocessing tokens

A token is the basic lexical unit of the language. All source input must be formed into valid tokens by translation phase seven. Preprocessing tokens (pp-tokens) are a superset of regular tokens. Preprocessing tokens allow the source file to contain non-token character sequences that constitute valid preprocessing tokens during translation. There are four categories of preprocessing tokens:

• Header file names, meant to be taken as a single token.
• Preprocessing numbers (discussed in the following section).
• All other single characters that are not otherwise (regular) tokens. See the example in ``Preprocessing numbers''.
• Identifiers, numeric constants, character constants, string literals, operators, and punctuators.

Preprocessing numbers

A preprocessing number is made up of a digit, optionally preceded by a period, and may be followed by letters, underscores, digits, periods, and any one of e+ e- E+ E- p+ p- P+ P-.

Preprocessing numbers include all valid number tokens, plus some that are not valid number tokens. For example, in the macro definition:

#define R 2e ## 3

the preprocessing number 2e is not a valid number. However, the preprocessing operator ## will ``paste'' it together with the preprocessing number 3 when R is replaced, resulting in the preprocessing number 2e3, which is a valid number.


Preprocessing directives

Preprocessing operators

The preprocessing operators are evaluated left to right, without any defined precedence.

#
A macro parameter preceded by the # preprocessing operator has its corresponding unexpanded argument tokens converted into a string literal. (Any double quotes and backslashes contained in character constants or part of string literals are escaped by a backslash.) The # character is sometimes referred to as the ``stringizing'' operator. This rule applies only within function-like macros.

##
If a replacement token sequence contains a ## operator, the ## and any surrounding white space are deleted and adjacent tokens are concatenated, creating a new token. This occurs only when the macro is expanded. See ``Macro definition and expansion''.

Macro definition and expansion

An object−like macro is defined with a line of the form:

#define identifier token−sequence[opt]

where identifier will be replaced with token−sequence wherever identifier appears in regular text.

A function−like macro is defined with a line of the form:

#define identifier( identifier−list[opt] ) token−sequence[opt]

where the macro parameters are contained in the comma-separated identifier-list. The token-sequence following the identifier list determines the behavior of the macro, and is referred to as the replacement list. There can be no space between the identifier and the ( character. For example:

#define FLM(a,b) a+b

The replacement−list a+b determines that the two parameters a and b will be added.

A function−like macro can also be defined with a line of the form:

#define identifier( identifier−list[opt], ... ) token−sequence[opt]

where if the identifier-list is not present, there is also no comma. This form of macro definition accepts zero or more arguments matching the ellipsis. Those arguments, if any, that match the ellipsis in an invocation of this macro replace the special identifier __VA_ARGS__ optionally present in the replacement token sequence.

A function-like macro is invoked in normal text by using its identifier, followed by a ( token, a list of token sequences separated by commas, and a ) token. For example:

FLM(3,2)


The arguments in the invocation (comma-separated token sequences) may be expanded, and they then replace the corresponding parameters in the replacement token sequence of the macro definition. Macro arguments in the invocation are not expanded if they are operands of # or ## operators in the replacement string. Otherwise, expansion does take place. For example:

Assume that M1 is defined as 3:

#define M1 3

When the function-like macro FLM is used, use of the # or ## operators will affect expansion (and the result), as follows:

Definition   Invocation   Result    Expansion?
a+b          FLM(M1,2)    3+2       Yes, Yes
#a           FLM(M1,2)    "M1"      No
a##b         FLM(M1,2)    M12       No, No
a+#a         FLM(M1,2)    3+"M1"    Yes, No

In the last example line, the first a in a+#a is expanded, but the second a is not expanded because it is an operand of the # operator.

• The number of arguments in the invocation must match the number of parameters in the definition.
• A macro's definition, if any, can be eliminated with a line of the form:

#undef identifier

There is no effect if the definition doesn't exist.

File inclusion

A line of the form:

#include "filename"

causes the entire line to be replaced with the contents of filename. The following directories are searched, in order.

♦ The current directory (of the file containing the #include line).
♦ Any directories named in -I options to the compiler, in order.
♦ A list of standard places, typically, but not necessarily, /usr/include.

A line of the form:

#include <filename>

causes the entire line to be replaced with the contents of filename. The angle brackets surrounding filename indicate that filename is not searched for in the current directory.

A third form allows an arbitrary number of preprocessing tokens to follow the #include, as in:

#include preprocessing−tokens


The preprocessing tokens are processed the same way as when they are used in normal text. Any defined macro name is replaced with its replacement list of preprocessing tokens. The preprocessing tokens must expand to match one of the first two forms ( ``< . . . >'' or ``". . ."'' ).

• A file name beginning with a slash / indicates the absolute pathname of a file to include, no matter which form of #include is used.
• Any #include statements found in an included file cause recursive processing of the named file(s).

Conditional compilation

Different segments of a program may be compiled conditionally. Conditional compilation statements must observe the following sequence:

1. One of: #if or #ifdef or #ifndef.
2. Any number of optional #elif lines.
3. One optional #else line.
4. One #endif line.

#if integral−constant−expression

Is true if integral−constant−expression evaluates to nonzero.

If true, tokens following the if line are included.

The integral-constant-expression following the if is evaluated by following this sequence of steps:

1. Any preprocessing tokens in the expression are expanded. Any use of the defined operator evaluates to 1 or 0 if its operand is, respectively, defined, or not.

2. If any identifiers remain, they evaluate to 0.

3. The remaining integral constant expression is evaluated. The constant expression must be made up of components that evaluate to an integral constant. In the context of a #if, the integral constant expression may not contain the sizeof operator, casts, or floating point constants.

The following table shows how various types of constant expressions following a #if would be evaluated. Assume that name is not defined.

Constant expression    Step 1      Step 2    Step 3
__STDC__               1           1         1
!defined(__STDC__)     !1          !1        0
3||name                3||name     3||0      1
2 + name               2 + name    2 + 0     2

#ifdef identifier

Is true if identifier is currently defined by #define or by the −D option to the cc command line.

#ifndef identifier

Is true if identifier is not currently defined by #define (or has been undefined).

#elif constant-expression

Indicates alternate if-condition when all preceding if-conditions are false.

#else

Indicates alternate action when no preceding if or elif conditions are true. A comment may follow the else, but a token may not.

#endif

Terminates the current conditional. A comment may follow the endif, but a token may not.

Line control

• Useful for programs that generate C programs.
• A line of the form

#line constant "filename"

causes the compiler to believe, for the purposes of error diagnostics and debugging, that the line number of the next source line is equal to constant (which must be a decimal integer) and the current input file is filename (enclosed in double quotes). The quoted file name is optional. constant must be a decimal integer in the range 1 to MAXINT. MAXINT is defined in limits.h.

Assertions

A line of the form

#assert predicate (token−sequence)

associates the token-sequence with the predicate in the assertion name space (separate from the space used for macro definitions). The predicate must be an identifier token.

#assert predicate

asserts that predicate exists, but does not associate any token sequence with it.

The compiler provides the following predefined predicates by default on the 3B2:

#assert machine ( u3b2 )
#assert system ( unix )
#assert cpu ( M32 )

The following defaults apply to the Intel386 microprocessor:

#assert machine ( i386 )
#assert system ( unix )
#assert cpu ( i386 )

Any assertion may be removed by using #unassert, which uses the same syntax as #assert. Using #unassert with no argument deletes all assertions on the predicate; specifying an assertion deletes only that assertion.


An assertion may be tested in a #if statement with the following syntax:

#if #predicate(non−empty token−list)

For example, the predefined predicate system can be tested with the following line:

#if #system(unix)

which will evaluate true.

Version control

The #ident directive is used to help administer version control information.

#ident "version"

puts an arbitrary string in the .comment section of the object file. The .comment section is not loaded into memory when the program is executed.

Pragmas

Preprocessing lines of the form

#pragma pp−tokens

specify implementation−defined actions.

Three #pragmas are recognized by the compilation system:

#pragma ident "version"

which is identical in function to #ident "version".

#pragma weak identifier

which identifies identifier as a weak global symbol,

or

#pragma weak identifier = identifier2

which identifies identifier as a weak global symbol whose value is the same as identifier2. identifier should otherwise be undefined. See ``Handling multiply defined symbols'' in ``C and C++ compilation system'' for more information on weak global symbols.

#pragma int_to_unsigned identifier

which identifies identifier as a function whose type was int in previous releases of the compilation system, but whose type is unsigned int in this release. The declaration for identifier must precede the #pragma.

unsigned int strlen(const char*);
#pragma int_to_unsigned strlen


#pragma int_to_unsigned makes it possible for the compiler to identify expressions in which the function's changed type may affect the evaluation of the expression. In the -Xt mode the compiler treats the function as if it were declared to return int rather than unsigned int.

The Intel386 microprocessor has a fourth #pragma:

#pragma pack(n)

which controls the layout of structure offsets. n is a number, 1, 2, or 4, that specifies the strictest alignment desired for any structure member. If n is omitted, the alignment reverts to the default, which may have been set by the -Zp option to cc.

A value of 4 is the default.

The compiler ignores unrecognized pragmas.

Error generation

A preprocessing line consisting of

#error token−sequence

causes the compiler to produce a diagnostic message containing the token−sequence, and stop.

Predefined names

The following identifiers are predefined as object−like macros:

__LINE__
The current line number as a decimal constant.

__FILE__
A string literal representing the name of the file being compiled.

__DATE__
The date of compilation as a string literal in the form ``Mmm dd yyyy.''

__TIME__
The time of compilation, as a string literal in the form ``hh:mm:ss.''

__STDC__
The constant 1 under compilation mode -Xc, otherwise 0.

__USLC__
A positive integer constant; its definition signifies a USL C compilation system.

With the exception of __STDC__, these predefined names may not be undefined or redefined. Under compilation mode -Xt, __STDC__ may be undefined (#undef __STDC__) to cause a source file to think it is being compiled by a previous version of the compiler.


Declarations and definitions

A declaration describes an identifier in terms of its type and storage duration. The location of a declaration (usually, relative to function blocks) implicitly determines the scope of the identifier.

Basic types

The basic types and their sizes are:

• char (1 byte)
• short int (2 bytes)
• int (4 bytes)
• long int (4 bytes)
• long long int (8 bytes)

Each of char, short, int, and long may be prefixed with signed or unsigned. A type specified with signed is the same as the type specified without signed except for signed char on the 3B2. (char on the 3B2 has only non-negative values.)

• float (4 bytes)
• double (8 bytes)
• long double (12 bytes)
• void

Integral and floating types are collectively referred to as ``arithmetic types''. Arithmetic types and pointer types make up the ``scalar types''.

NOTE: See ``Pointer declarators'' for more information.

Type qualifiers

const

The compiler may place an object declared const in read-only memory. The program may not change its value and no further assignment may be made to it. An explicit attempt to assign to a const object will provoke an error.

volatile

volatile advises the compiler that unexpected, asynchronous events may affect the object so declared, and warns it against making assumptions. An object declared volatile is protected from optimization that might otherwise occur.

restrict

This keyword has no code generation effects in this implementation.


Structures and unions

Structures

A structure is a type that consists of a sequence of named members. The members of a structure may have different object types (as opposed to an array, whose members are all of the same type). To declare a structure is to declare a new type. A declaration of an object of type struct reserves enough storage space so that all of the member types can be stored simultaneously.

A structure member may consist of a specified number of bits, called a bit-field. The number of bits (the size of the bit-field) is specified by appending a colon and the size (an integral constant expression, the number of bits) to the declarator that names the bit-field. The declarator name itself is optional; a colon and integer will declare the bit-field. A bit-field must have integral type. The size may be zero, in which case the declaration name must not be specified, and the next member starts on a boundary of the type specified. For example:

char :0

means ``start the next member (if possible) on a char boundary.''

A named bit-field that is declared with an explicitly unsigned type holds values in the range

0 to 2^n - 1

where n is the number of bits. A bit-field declared with an explicitly signed type holds values in the range

-2^(n-1) to 2^(n-1) - 1

A bit-field declared neither explicitly signed nor unsigned will hold values in one of the two ranges, depending on the machine. Consult the specific Application Binary Interface Processor Supplement to determine which of the two ranges is correct for a specific processor.

An optional structure tag identifier may follow the keyword struct. The tag names the type of structure described, and may then be used with struct as a shorthand name for the declarations that make up the body of the structure. For example:

struct t {
    int x;
    float y;
} st1, st2;

Here, st1 and st2 are structures, each made up of x, an int, and y, a float. The tag t may be used to declare more structures identical to st1 and st2, as in:

struct t st3;

A structure may include a pointer to itself as a member; this is known as a ``self-referential structure''.

struct n {
    int x;
    struct n *left;
    struct n *right;
};

As a special case, the last member of a struct can be an array with no length.

Unions

A union is an object that may contain one of several different possible member types. A union may have bit-field members. Like a structure, declaring a union declares a new type. Unlike a structure, a union stores the value of only one member at a given time. A union does, however, reserve enough storage to hold its largest member.

Enumerations

An enumeration is a unique type that consists of a set of constants called enumerators. The enumerators are declared as constants of type int, and optionally may be initialized by an integral constant expression separated from the identifier by an = character.

Enumerations consist of two parts:

• The set of constants.
• An optional tag.

For example:

enum color {red, blue=5, yellow};

color is the tag for this enumeration type. red, blue, and yellow are its enumeration constants. If the first enumeration constant in the set is not followed by an =, its value is 0. Each subsequent enumeration constant not followed by an = is determined by adding 1 to the value of the previous enumeration constant. Thus yellow has the value 6.

enum color car_color;

declares car_color to be an object of type enum color.

Scope

The use of an identifier is limited to an area of program text known as the identifier's scope. The four kinds of scope are function, file, block, and function prototype.

• The scope of every identifier (other than label names) is determined by the placement of its declaration (in a declarator or type specifier).

• The scope of structure, union and enumeration tags begins just after the appearance of the tag in a type specifier that declares the tag. Each enumeration constant has scope that begins just after the appearance of its defining enumerator in an enumerator list. Any other identifier has scope that begins just after the completion of its declarator.

• If the declarator or type specifier appears outside a function or parameter list, the identifier has file scope, which terminates at the end of the file (and all included files).

• If the declarator or type specifier appears inside a block or within the list of parameter declarations in a function definition, the identifier has block scope, which ends at the end of the block (at the } that closes that block).

• If the declarator or type specifier appears in the list of parameter declarations in a function prototype declaration, the identifier has function prototype scope, which ends at the end of the function declarator (at the ) that ends the list).

• Label names always have function scope. A label name must be unique within a function.

Storage duration

Automatic Storage Duration

Storage is reserved for an automatic object, and is available for the object on each entry (by any means) into the block in which the object is declared. On any type of exit from the block, storage is no longer reserved.

Static Storage Duration

An object declared outside any block, or declared with the keywords static or extern, has storage reserved for it for the duration of the entire program. The object retains its last-stored value throughout program execution.
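The difference between the two storage durations shows up when a function is called repeatedly. The sketch below uses two hypothetical functions, next_static and next_automatic, that differ only in the storage duration of their counter.

```c
/* Counts calls with an object that has static storage duration. */
int next_static(void)
{
    static int calls = 0;   /* initialized once; keeps its last-stored value */
    return ++calls;
}

/* The same code with an automatic object: storage is reserved afresh on
   every entry to the block, so the count never accumulates. */
int next_automatic(void)
{
    int calls = 0;          /* initialized again on each call */
    return ++calls;
}
```

Successive calls to next_static return 1, 2, 3 and so on, while next_automatic returns 1 every time.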

Storage class specifiers

auto

An object may be declared auto only within a function. It has block scope and the defined object has automatic storage duration.

register

A register declaration is equivalent to an auto declaration. It also advises the compiler that the object will be accessed frequently. It is not possible to take the address of a register object.

static

static gives a declared object static storage duration.

NOTE: See ``Storage duration'' for more information.

The object may be defined inside or outside functions. An identifier declared static with file scope has internal linkage. A function may be declared or defined with static. If a function is defined to be static, the function has internal linkage. A function may be declared with static at block scope; the function should be defined with static as well.

extern

extern gives a declared object static storage duration. An object or function declared with extern has the same linkage as any visible declaration of the identifier at file scope. If no file scope declaration is visible, the identifier has external linkage.

typedef

Using typedef as a storage class specifier does not reserve storage. Instead, typedef defines an identifier that names a type. See the section on derived types for a discussion of typedef.


Declarators

A brief summary of the syntax of declarators:

declarator:
    pointer[opt] direct-declarator

direct-declarator:
    identifier
    ( declarator )
    direct-declarator [ constant-expression[opt] ]
    direct-declarator ( parameter-type-list )
    direct-declarator ( identifier-list[opt] )

pointer:
    * type-qualifier-list[opt]
    * type-qualifier-list[opt] pointer

Pointer declarators

Pointer to a type:

char *p;

p is a pointer to type char. p contains the address of a char object.

Care should be taken when pointer declarations are qualified with const:

const int *pci;

declares a pointer to a const−qualified (``read−only'') int.

int *const cpi;

declares a pointer−to−int that is itself ``read−only.''
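The two const placements permit opposite operations. The following sketch (the function name const_pointer_demo is invented here) marks which assignments are accepted and which a compiler rejects.

```c
int const_pointer_demo(void)
{
    int a = 1, b = 2;

    const int *pci = &a;    /* pointer to a read-only int */
    pci = &b;               /* allowed: the pointer itself may change */
    /* *pci = 5; */         /* rejected: the int is read-only through pci */

    int *const cpi = &a;    /* read-only pointer to int */
    *cpi = 5;               /* allowed: the int may change */
    /* cpi = &b; */         /* rejected: the pointer itself is read-only */

    return *pci + a;        /* b + a, that is 2 + 5 */
}
```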

Pointer to a pointer:

char **t;

t points to a character pointer.

Pointer to a function:

int (*f)();

f is a pointer to a function that returns an int.

Pointer to void:

void *

A pointer to void may be converted to or from a pointer to any object or incomplete type, without loss of information. This ``generic pointer'' behavior was previously carried out by char *; a pointer to void has the same representation and alignment requirements as a pointer to a character type.
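The standard library function qsort uses both a pointer to a function and pointers to void. As a minimal sketch (compare_ints and sort_ints are names chosen for this example), a comparison callback receives generic pointers and converts them back to the element type:

```c
#include <stdlib.h>

/* Callback for qsort(): the arguments arrive as pointers to void and are
   converted back to pointers to the real element type. */
int compare_ints(const void *a, const void *b)
{
    const int *x = a;       /* no cast is needed when converting from void * */
    const int *y = b;
    return (*x > *y) - (*x < *y);
}

void sort_ints(int *v, size_t n)
{
    /* cmp is a pointer to a function returning int */
    int (*cmp)(const void *, const void *) = compare_ints;

    qsort(v, n, sizeof v[0], cmp);
}
```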


Array declarators

One−dimensional array:

int ia[10];

ia is an array of 10 integers.

Two−dimensional array:

char d[4][10];

d is an array of 4 arrays of 10 characters each.

Array of pointers:

char *p[7];

p is an array of seven character pointers.

An array type of unknown size is known as an ``incomplete type''.

Function declarators

A function declaration includes the return type of the function, the function identifier, and an optional list of parameters.

• Function prototype declarations include declarations of parameters in the parameter list.

• If the function takes no arguments, the keyword void may be substituted for the parameter list in a prototype.

• A parameter type list may end with an ellipsis ``, ...'' to indicate that the function may take more arguments than the number described. The comma is necessary only if it is preceded by an argument.

• The parameter list may be omitted, which indicates that no parameter information is being provided.

Examples:

void srand(unsigned int seed);

The function srand returns nothing; it has a single parameter which is an unsigned int. The name seed goes out of scope at the ) and as such serves solely as documentation.

int rand(void);

The function rand returns an int; it has no parameters.

int strcmp(const char *, const char *);

The function strcmp returns an int; it has two parameters, both of which are pointers to character strings that strcmp does not change.

void (*signal(int, void (*)(int)))(int);

The function signal returns a pointer to a function that itself returns nothing and has an int parameter; the function signal has two parameters, the first of which has type int and the second has the same type as signal returns.


int fprintf(FILE *stream, const char *format, ...);

The function fprintf returns an int; FILE is a typedef name declared in stdio.h; format is a const-qualified character pointer; note the use of ellipsis (...) to indicate an unknown number of arguments.
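Two of the declarator forms above are easier to follow in use. The sketch below is not part of any library: handler_t, install, current and sum are hypothetical names. It declares a function with the same shape as signal, once as a raw declarator and once through a typedef, and defines a function that takes a variable number of int arguments.

```c
#include <stdarg.h>

typedef void (*handler_t)(int);     /* pointer to function taking int, returning nothing */

/* Raw declarator with the same shape as signal() ... */
void (*install(int, void (*)(int)))(int);

static handler_t current = 0;       /* no handler installed yet */

/* ... and a compatible definition written with the typedef. */
handler_t install(int sig, handler_t h)
{
    handler_t old = current;

    (void)sig;                      /* unused in this sketch */
    current = h;
    return old;
}

/* The ellipsis admits extra arguments after the named parameter count. */
int sum(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);
    while (count-- > 0)
        total += va_arg(ap, int);   /* each extra argument is read as an int */
    va_end(ap);
    return total;
}
```

The compiler accepts both declarations of install because they denote the same type; the typedef only changes how the declarator reads.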

Function definitions

A function definition includes the body of the function after the declaration of the function. As with declarations, a function may be defined as a function prototype definition or defined in the old style. The function prototype style includes type declarations for each parameter in the parameter list. This example shows how main would be defined in each style:

/* Function Prototype Style */      /* Old Style */

int                                 int
main(int argc, char *argv[])        main(argc, argv)
{                                   int argc;
    ...                             char *argv[];
}                                   {
                                        ...
                                    }

Some important rules that govern function definitions:

• Function definitions can include the inline function specifier. This serves as a hint to the compiler that it might want to replace calls to this function with an in-line replacement (with identical semantics). An inline function cannot contain a definition of a modifiable object with static storage duration, nor can it refer to an identifier with internal linkage.

  NOTE: No inlining will occur unless the -O option of cc(1) is specified.

• An old style definition names its parameters in an identifier list, and their declarations appear between the function declarator and the ``{'' that begins the function body.

• Under the old style, if the type declaration for a parameter was absent, the type defaulted to int. In the new style, all parameters in the parameter list must be type-specified and named. The exception to this rule is the use of ellipsis, explained in ``Function declarators''.

• A function definition serves as a declaration.

• Incomplete types are not allowed in the parameter list or as the return type of a function definition. They are allowed in other function declarations.

Conversions and expressions

Implicit conversions

Characters and integers

Any of the following may be used in an expression where an int or unsigned int may be used.

• char.
• short int.
• A char, short, or int bit-field.
• The signed or unsigned varieties of any of the above types.
• An object or bit-field that has enumeration type.

If an int can represent all values of the original type, the value is converted to an int; otherwise it is converted to an unsigned int. This process is called ``integral promotion''.

NOTE: The promotion rules for ANSI C are different from previous releases. Use the -Xt mode to get the older behavior.

Compilation mode dependencies that affect unsigned types

• Under compilation mode -Xt, unsigned char and unsigned short are promoted to unsigned int.
• Under compilation modes -Xa and -Xc, unsigned char and unsigned short are promoted to int.
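The effect of the two promotion rules is visible when a small unsigned value goes negative. This sketch assumes an ANSI-conforming compilation (the behavior described for -Xa and -Xc) and an int wider than unsigned char; the function names are invented for the example.

```c
int promoted_is_negative(void)
{
    unsigned char uc = 1;

    /* With ANSI promotion uc becomes the int 1, so uc - 2 is the int -1
       and the test is true.  With the older rules uc would become an
       unsigned int, the subtraction would wrap around to a large positive
       value, and the test would be false. */
    return (uc - 2) < 0;
}

int promoted_size_matches_int(void)
{
    char c = 'a';

    return sizeof(c + c) == sizeof(int);    /* both operands are promoted to int */
}
```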

Signed and unsigned integers

When an integer is converted to another integral type, the value is unchanged if the value can be represented by the new type.

If a negative signed integer is converted to an unsigned integer with greater size, the signed integer is first promoted to the signed integer corresponding to the unsigned integer.

Integral and floating

When a floating type is converted to any integral type, any fractional part is discarded.
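The integer and floating conversion rules above can be checked with a few assignments. A minimal sketch, in which conversion_checks is a name made up for illustration:

```c
#include <limits.h>

int conversion_checks(void)
{
    short s = -1;
    unsigned long ul = s;       /* -1 is widened to long, then converted: all bits set */
    long small = 42;
    int narrowed = (int)small;  /* 42 is representable in int, so it is unchanged */
    int whole = (int)3.9;       /* fractional part discarded */
    int negative = (int)-3.9;   /* discarded as well, leaving -3 */

    return ul == ULONG_MAX && narrowed == 42 && whole == 3 && negative == -3;
}
```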

Float and double

A float is promoted to double or long double, or a double is promoted to long double without a change in value.

The actual rounding behavior that is used when a floating point value is converted to a smaller floating point value depends on the rounding mode in effect at the time of execution. The default rounding mode is ``round to nearest.'' See ``Floating point operations'' and the IEEE Standard for Binary Floating-Point Arithmetic (ANSI/IEEE Std 754-1985) for a more complete discussion of rounding modes.

Usual arithmetic conversions

Some binary operators convert the types of their operands in order to yield a common type, which is also the type of the result. These are called the ``usual arithmetic conversions'':

• If either operand is type long double, the other operand is converted to long double.

• Otherwise, if either operand has type double, the other operand is converted to double.

• Otherwise, if either operand has type float, the other operand is converted to float.

• Otherwise, the integral promotions are performed on both operands, such that the smaller of the two is promoted to type of the larger, where both are of at least size int. The sizes of the types are considered as follows, from largest to smallest:

  long double > double > float > unsigned long long > long long > unsigned long > long > unsigned int > int

  However, if one operand is an unsigned type and the other is a potentially larger signed type, and they are the same size, they are converted to the unsigned version of the signed type. For example, in the case of long + unsigned int, both operands are converted to unsigned long. In the case of unsigned long + long long, the unsigned long is converted to long long.
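A common surprise from these rules is a comparison between a signed and an unsigned operand. The following sketch uses invented function names and relies only on the conversions listed above.

```c
int mixed_compare(void)
{
    int i = -1;
    unsigned int u = 1;

    /* i is converted to unsigned int and becomes a very large value, so
       the comparison is false although -1 is mathematically less than 1. */
    return i < u;
}

int double_wins(void)
{
    /* int + double: the int operand is converted to double */
    return sizeof(1 + 1.0) == sizeof(double);
}

int float_stays_float(void)
{
    float f = 1.0f;

    /* int + float: the result type is float, not double */
    return sizeof(1 + f) == sizeof(float);
}
```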

Expressions

Objects and lvalues

An object is a manipulatable region of storage. An lvalue is an expression referring to an object. An obvious example of an lvalue expression is an identifier. There are operators that yield lvalues: for example, if E is an expression of pointer type, then *E is an lvalue expression referring to the object to which E points.

An lvalue is ``modifiable'' if:

• it does not have array type,
• it does not have an incomplete type,
• it does not have a const-qualified type,
• and, if it is a structure or union, it does not have any member (including, recursively, any member of all contained structures or unions) with a const-qualified type.

The name ``lvalue'' comes from the assignment expression E1 = E2 in which the left operand E1 must be an lvalue expression.

Primary expressions

• Identifiers, constants, string literals, and parenthesized expressions are primary expressions.

• An identifier is a primary expression, provided it has been declared as designating an object (which makes it an lvalue) or a function (which makes it a function designator).

• A constant is a primary expression; its type depends on its form and value.

• A string literal is a primary expression; it is an lvalue.

• A parenthesized expression is a primary expression. Its type and value are identical to those of the unparenthesized version. It is an lvalue, a function designator, or a void expression, according to the type of the unparenthesized expression.

• A compound literal takes the form (type){init-list}. It behaves in the same way as an object of the specified type, with a value provided by the initialization list.
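A compound literal is convenient for passing a temporary structure or array without declaring a named object. A small sketch, using a hypothetical struct point and helper manhattan:

```c
struct point { int x, y; };

int manhattan(struct point p)
{
    return (p.x < 0 ? -p.x : p.x) + (p.y < 0 ? -p.y : p.y);
}

int compound_literal_demo(void)
{
    /* an unnamed struct point, built in place as the argument */
    int d = manhattan((struct point){ 3, -4 });

    /* an unnamed array of three ints; it converts to a pointer to its
       first element like any other array */
    int *p = (int[]){ 10, 20, 30 };

    return d + p[2];    /* 7 + 30 */
}
```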

Operators

A table of operator associativity and precedence appears in the next section.


Unary operators

Expressions with unary operators group right to left.

* e
    Indirection operator. Returns the object or function pointed to by its operand. If the type of the expression is ``pointer to ...,'' the type of the result is ``....''

& e
    Address operator. Returns a pointer to the object or function referred to by the operand. Operand must be an lvalue or function type, and not a bit-field or an object declared register. Where the operand has type ``type,'' the result has type ``pointer to type.''

- e
    Negation operator. The operand must have arithmetic type. Result is the negative of its operand. Integral promotion is performed on the operand, and the result has the promoted type. The negative of an unsigned quantity is computed by subtracting its value from 2^n, where n is the number of bits in the result type.

+ e
    Unary plus operator. The operand must have arithmetic type. Result is the value of its operand. Integral promotion is performed on the operand, and the result has the promoted type.

! e
    Logical negation operator. The operand must have arithmetic or pointer type. Result is one if the value of its operand is zero, zero if the value of its operand is nonzero. The type of the result is int.

~ e
    The ~ operator yields the one's complement (all bits inverted) of its operand, which must have integral type. Integral promotion is performed on the operand, and the result has the promoted type.

++e
    The object referred to by the lvalue operand of prefix ++ is incremented. The value is the new value of the operand but is not an lvalue. The expression ++x is equivalent to x += 1. The type of the result is the type of the operand.

--e
    The modifiable lvalue operand of prefix -- is decremented analogously to the prefix ++ operator.

e++
    When postfix ++ is applied to a modifiable lvalue, the result is the value of the object referred to by the lvalue. After the result is noted, the object is incremented in the same manner as for the prefix ++ operator. The type of the result is the same as the type of the lvalue.

e--
    When postfix -- is applied to an lvalue, the result is the value of the object referred to by the lvalue. After the result is noted, the object is decremented in the same manner as for the prefix -- operator. The type of the result is the same as the type of the lvalue.

sizeof e
    The sizeof operator yields the size in bytes of its operand. When applied to an object with array type, the result is the total number of bytes in the array. (The size is determined from the declarations of the objects in the expression.) This expression is semantically an unsigned constant (of type size_t, a typedef) and may be used anywhere a constant is required (except in a #if preprocessing directive line). One major use is in communication with routines like storage allocators and I/O systems.

sizeof (type)
    The sizeof operator may also be applied to a parenthesized type name. In that case it yields the size in bytes of an object of the indicated type.
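Several of the unary operators can be exercised in a few lines. The sketch below (unary_demo is an invented name) checks the prefix and postfix results, the negation of an unsigned value, and the two common uses of sizeof on an array.

```c
#include <limits.h>

int unary_demo(void)
{
    int x = 5, y, z;
    unsigned int u = 1;
    int ia[10];

    y = ++x;        /* x becomes 6; y receives the new value 6 */
    z = x++;        /* z receives the old value 6; x then becomes 7 */

    return x == 7 && y == 6 && z == 6
        && -u == UINT_MAX                       /* 2^n - 1 for an unsigned operand */
        && (!x) == 0 && (!0) == 1               /* logical negation yields 0 or 1 */
        && sizeof ia == 10 * sizeof(int)        /* total bytes in the array */
        && sizeof ia / sizeof ia[0] == 10;      /* number of elements */
}
```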

Cast operators − explicit conversions

(type) e
    Placing a parenthesized type name before an expression converts the value of the expression to that type. Both the operand and type must be pointer type or an arithmetic type.

Multiplicative operators

The multiplicative operators *, /, and % group left to right. The usual arithmetic conversions are performed, and that is the type of the result.

e*e
    Multiplication operator. The * operator is commutative.

e/e
    Division operator. When positive integers are divided, truncation is toward 0. If exactly one operand is negative, the quotient is negative or zero. Operands must be arithmetic types.

e%e
    Remainder operator. Yields the remainder from the division of the first expression by the second. The operands must have integral type. The sign of the remainder is that of the first operand. It is always true that (a/b)*b + a%b is equal to a (if a/b is representable).
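The relationship between / and % can be confirmed with negative operands. In this sketch div_identity_holds is a hypothetical helper; b must be nonzero and a/b must be representable.

```c
/* Returns 1 when (a/b)*b + a%b reproduces a. */
int div_identity_holds(int a, int b)
{
    return (a / b) * b + a % b == a;
}
```

For example, -7 / 2 is -3 and -7 % 2 is -1, so (-3)*2 + (-1) gives back -7; 7 % -2 is 1 because the remainder takes the sign of the first operand.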

Additive operators

The additive operators + and - group left to right. The usual arithmetic conversions are performed. There are some additional type possibilities for each operator.

e+e
    Result is the sum of the operands. A pointer to an object in an array and an integral value may be added. The latter is always converted to an address offset by multiplying it by the size of the object to which the pointer points. The result is a pointer of the same type as the original pointer that points to another object in the same array, appropriately offset from the original object. Thus if P is a pointer to an object in an array, the expression P+1 is a pointer to the next object in the array. No further type combinations are allowed for pointers.

The + operator is commutative.

The valid operand type combinations for the + operator are:

a + a

p + i or i + p

where a is an arithmetic type, i is an integral type, and p is a pointer.

e-e
    Result is the difference of the operands. The operand combinations are the same as for the + operator, except that a pointer type may not be subtracted from an integral type.

    Also, if two pointers to objects of the same type are subtracted, the result is converted (by division by the size of the object) to an integer that represents the number of objects separating the pointed-to objects. This conversion will in general give unexpected results unless the pointers point to objects in the same array, because pointers, even to objects of the same type, do not necessarily differ by a multiple of the object size. The result type is ptrdiff_t (defined in stddef.h). ptrdiff_t is a typedef for int in this implementation. It should be used ``as is'' to ensure portability. Valid type combinations are

a - a

p - i

p - p
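Pointer addition and subtraction both count in elements rather than bytes. A minimal sketch, with pointer_demo as an invented name:

```c
#include <stddef.h>

ptrdiff_t pointer_demo(void)
{
    double a[5] = { 0.0, 1.5, 3.0, 4.5, 6.0 };
    double *p = &a[1];
    double *q = p + 3;          /* moves by 3 * sizeof(double) bytes, to &a[4] */

    ptrdiff_t gap = q - p;      /* number of elements between them: 3 */

    /* i + p is as valid as p + i */
    return (*q == 6.0 && *(1 + p) == 3.0) ? gap : -1;
}
```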

Bitwise shift operators

The bitwise shift operators << and >> take integral operands.

e1 << e2
    Shifts e1 left by e2 bit positions. Vacated bits are filled with zeros.

e1 >> e2
    Shifts e1 right by e2 bit positions. Vacated bits are filled with zeros on the 3B2. On the Intel386 microprocessor, vacated bits are filled with zeros if the promoted type of e1 is an unsigned type. Otherwise they are filled with copies of the sign bit of the promoted value of e1.

The result types of the bitwise shift operators are compilation−mode dependent, as follows:

-Xt
    The result type is unsigned if either operand is unsigned.

-Xa, -Xc
    The result type is the promoted type of the left operand. Integral promotion occurs before the shift operation.
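Because the fill for a right shift of a negative value depends on the machine, portable code usually shifts unsigned or known non-negative values. The helpers below, low_bits and high_nibble, are illustrative names only.

```c
/* Keeps the low count bits of value; count must be smaller than the
   number of bits in an unsigned int. */
unsigned int low_bits(unsigned int value, unsigned int count)
{
    return value & ((1u << count) - 1u);
}

/* byte is promoted to int before the shift; its value is non-negative,
   so the vacated bits are zeros on any machine. */
unsigned int high_nibble(unsigned char byte)
{
    return (byte >> 4) & 0xFu;
}
```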

Relational operators

a relop a
p relop p

• The relational operators < (less than), > (greater than), <= (less than or equal to), and >= (greater than or equal to) yield 1 if the specified relation is true and 0 if it is false.
• The result has type int.
• Both operands:
    ♦ have arithmetic type; or
    ♦ are pointers to qualified or unqualified versions of the same object or incomplete types.

Equality operators

a eqop a
p eqop p
p eqop 0
0 eqop p

The == (equal to) and != (not equal to) operators are analogous to the relational operators; however, they have lower precedence.

Bitwise AND operator

ie1 & ie2

• Bitwise ``and'' of ie1 and ie2.
• Value contains a 1 in each bit position where both ie1 and ie2 contain a 1, and a 0 in every other position.
• Operands must be integral; the usual arithmetic conversions are applied, and that is the type of the result.

Bitwise exclusive OR operator

ie1 ^ ie2

• Bitwise exclusive ``or'' of ie1 and ie2.
• Value contains a 1 in each position where there is a 1 in either ie1 or ie2, but not both, and a 0 in every other bit position.
• Operands must be integral; the usual arithmetic conversions are applied, and that is the type of the result.

Bitwise OR operator

ie1 | ie2

• Bitwise inclusive ``or'' of ie1 and ie2.
• Value contains a 1 in each bit position where there is a 1 in either ie1 or ie2, and a 0 in every other bit position.
• Operands must be integral; the usual arithmetic conversions are applied, and that is the type of the result.

Logical AND operator

e1 && e2

• Logical ``and'' of e1 and e2.
• e1 and e2 must be scalars.
• e1 is evaluated first, and e2 is evaluated only if e1 is nonzero.
• Result is 1 if both e1 and e2 are nonzero, otherwise 0.
• Result type is int.

Logical OR operator

e1 || e2

• Logical ``or'' of e1 and e2.
• e1 and e2 must be scalars.
• e1 is evaluated first, and e2 is evaluated only if e1 is zero.
• Result is 0 only if both e1 and e2 are false, otherwise 1.
• Result type is int.
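The guarantee that the right operand of && or || may be skipped is what makes guarded tests safe. In this sketch, counted, evaluation_count and first_char_is are hypothetical helpers; counted records how many times it is actually called.

```c
#include <stddef.h>

static int evaluations = 0;

int counted(int value)
{
    ++evaluations;              /* records that this operand was evaluated */
    return value;
}

int evaluation_count(void)
{
    return evaluations;
}

int first_char_is(const char *s, char c)
{
    /* *s is read only when s is not a null pointer */
    return s != NULL && *s == c;
}
```

Evaluating counted(0) && counted(1) calls counted once, and counted(1) || counted(0) also calls it once, because in each case the left operand already decides the result.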

Conditional operator

e ? e1 : e2

• If e is nonzero, then e1 is evaluated; otherwise e2 is evaluated. The value is e1 or e2.
• The first operand must have scalar type.
• For the second and third operands, one of the following must be true:
    ♦ Both must be arithmetic types. The usual arithmetic conversions are performed to make them a common type and the result has that type.
    ♦ Both must have compatible structure or union type; the result is that type.
    ♦ Both operands have void type; the result has void type.
    ♦ Both operands are pointers to qualified or unqualified versions of compatible types. The result type is the composite type.
    ♦ One operand is a pointer and the other is a null pointer constant. The result type is the pointer type.
    ♦ One operand is a pointer to an object or incomplete type and the other is a pointer to a qualified or unqualified version of void. The result type is a pointer to void.
  For the pointer cases (the last three), the result is a pointer to a type qualified by all the qualifiers of the types pointed to by the operands.

Assignment expressions

• Assignment operators are: = *= /= %= += -= <<= >>= &= |= ^=
• An expression of the form e1 op= e2 is equivalent to e1 = e1 op (e2) except that e1 is evaluated only once.
• The left operand:
    ♦ must be a modifiable lvalue.
    ♦ must have arithmetic type, or, for += and -=, must be a pointer to an object type and the right operand must have integral type.
    ♦ of an = operator, if the operand is a structure or union, must not have any member or submember qualified with const.
• Result type is the type of the (unpromoted) left operand.
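The single evaluation of the left operand matters when that operand has a side effect. A short sketch (compound_assign_demo is an invented name):

```c
int compound_assign_demo(void)
{
    int a[3] = { 10, 20, 30 };
    int i = 0;
    unsigned char c = 1;

    a[i++] += 5;    /* a[0] becomes 15; i++ is evaluated exactly once */

    /* The result of c += 1 has the unpromoted type of c.  sizeof does not
       evaluate its operand, so c is unchanged here. */
    return a[0] == 15 && a[1] == 20 && i == 1
        && sizeof(c += 1) == sizeof(unsigned char);
}
```

Writing a[i++] = a[i++] + 5 instead would not be equivalent, since i++ would appear twice.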

Comma operator

e1 , e2

• e1 is evaluated first, then e2.
• The result has the type and value of e2 and is not an lvalue.

Structure operators

su.mem

Indicates member mem of structure or union su.

sup −> mem

Indicates member mem of structure or union pointed to by sup. Equivalent to (*sup).mem.


Associativity and precedence of operators

Operators                              Associativity

() [] -> .                             left to right
! ~ ++ -- + - * & (type) sizeof        right to left
* / %                                  left to right
+ -                                    left to right
<< >>                                  left to right
< <= > >=                              left to right
== !=                                  left to right
&                                      left to right
^                                      left to right
|                                      left to right
&&                                     left to right
||                                     left to right
?:                                     right to left
= += -= *= /= %= &= ^= |= <<= >>=      right to left
,                                      left to right

Unary +, −, and * have higher precedence than their binary versions.

Postfix ++ and -- have higher precedence than their prefix versions.
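A few expressions show how precedence and associativity decide the grouping when no parentheses are written. This is an illustrative sketch; precedence_demo is a name chosen for the example.

```c
int precedence_demo(void)
{
    int a[2] = { 1, 2 };
    int *p = a;
    int first = *p++;           /* grouped as *(p++): reads a[0], then advances p */
    int x, y;

    x = y = 3;                  /* assignment groups right to left: x = (y = 3) */

    return first == 1 && *p == 2
        && x == 3 && y == 3
        && 2 + 3 * 4 == 14      /* * binds tighter than + */
        && 10 - 4 - 3 == 3;     /* - groups left to right: (10 - 4) - 3 */
}
```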

Constant expressions

A constant expression is evaluated during compilation (rather than at run time). As a result, a constant expression may be used any place that a constant is required.

Constant expressions must not contain assignment, ++, --, function-call, or comma operators, except when they appear within the operand of a sizeof operator.

Initialization

Scalars (all arithmetic types and pointers):

Scalar types with static or automatic storage duration are initialized with a single expression, optionally enclosed in braces. Example:

int i = 1;

Additionally, scalar types (with automatic storage duration only) may be initialized with a nonconstant expression.

Unions:

An initializer for a union with static storage duration must be enclosed in braces, and initializes the first member in the declaration list of the union. The initializer must have a type that can be converted to the type of the first union member. Example:

union { int i; float f;} u = {1}; /* initialize u.i */

For a union with automatic storage duration, if the initializer is enclosed in braces, it must consist of constant expressions that initialize the first member of the union. If the initializer is not enclosed in braces, it must be an expression that has the matching union type.

Structures:

The members of a structure may be initialized by initializers that can be converted to the type of the corresponding member.

struct s { int i; char c; char *s;} st = { 3, 'a', "abc" };

This example illustrates initialization of all three members of the structure. If initialization values are missing, as in

struct s st2 = {5};

then the first member is initialized (in this case, member i is initialized with a value of 5), and any uninitialized member is initialized with 0 for arithmetic types and a null pointer constant for pointer types.

For a structure with automatic storage duration, if the initializer is enclosed in braces, it must consist of constant expressions that initialize the respective members of the structure. If the initializer is not enclosed in braces, it must be an expression that has the matching structure type.

Arrays:

The number of initializers for an array must not exceed the dimension (the declared number of elements), but there may be fewer initializers than the number of elements. When the number of initializers is less than the size of the array, the first array elements are initialized with the values given, until the supply of initializers is exhausted. Any remaining array elements are initialized with the value 0 or a null pointer constant, as explained above in the discussion of structures. Example:

int ia[5] = { 1, 2 };

In this example, an array of five ints is declared, but only the first two members are initialized explicitly. The first member, ia[0], is initialized with a value of 1; the second member, ia[1], is initialized with a value of 2. The remaining members are initialized with a value of 0.

When no dimensions are given, the array is sized to hold exactly the number of initializers supplied.

A character array may be initialized with a string literal, as in:


char ca[] = { "abc" }; /*curly braces are optional*/

where the size of the array is four (three characters with a null byte appended). The following:

char cb[3] = "abc";

is valid; however, in this case the null byte is discarded. But:

char cc[2] = "abc";

is erroneous because there are more initializers than the array can hold.

Arrays may be initialized similarly with wide characters:

wchar_t wc[] = L"abc";

Initializing subaggregates (for example, arrays of arrays) requires the proper placement of braces. For example,

int ia [4][2] ={ 1, 2, 3, 4};

initializes the first two rows of ia (ia[0][0], ia[0][1], ia[1][0], and ia[1][1]), and initializes the rest to 0. This is a ``minimally bracketed'' initialization.

Note that a similar ``fully bracketed'' initialization yields a different result:

int ia [4][2] ={ {1}, {2}, {3}, {4},};

initializes the first column of ia (ia[0][0], ia[1][0], ia[2][0], and ia[3][0]), and initializes the rest to 0.
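The two bracketing styles can be compared element by element. In the sketch below the arrays are renamed flat and rows so that both can appear in one function; bracket_demo is likewise an invented name.

```c
int bracket_demo(void)
{
    int flat[4][2] = { 1, 2, 3, 4 };            /* fills rows 0 and 1 */
    int rows[4][2] = { {1}, {2}, {3}, {4} };    /* fills column 0 of every row */

    return flat[0][0] == 1 && flat[0][1] == 2
        && flat[1][0] == 3 && flat[1][1] == 4
        && flat[2][0] == 0 && flat[3][1] == 0
        && rows[0][0] == 1 && rows[1][0] == 2
        && rows[2][0] == 3 && rows[3][0] == 4
        && rows[0][1] == 0 && rows[3][1] == 0;
}
```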

Mixing the fully and minimally bracketed styles may lead to unexpected results. Use one style or the other consistently.

Designated initializers:

For struct, union and array objects, a designator can precede the initializer (in an initialization list), giving the subobject affected by that initializer. The designator is a list of .name and [constant] as appropriate to the object being initialized. An ``='' separates the designator and the initializer.
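As a sketch of the designator syntax, the function below initializes a structure, an array and a union by naming the subobjects. struct s repeats the declaration used earlier in this section; designated_demo is a made-up name.

```c
struct s { int i; char c; char *s; };

int designated_demo(void)
{
    struct s st = { .c = 'a', .i = 3 };             /* any order; st.s is a null pointer */
    int ia[5] = { [1] = 10, [3] = 30 };             /* the other elements are 0 */
    union { int i; float f; } u = { .f = 1.5f };    /* a member other than the first */

    return st.i == 3 && st.c == 'a' && st.s == 0
        && ia[0] == 0 && ia[1] == 10 && ia[2] == 0
        && ia[3] == 30 && ia[4] == 0
        && u.f == 1.5f;
}
```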


Statements

Expression statement

expression;

The expression is executed for its side effects, if any (such as assignment or function call).

Compound statement

{
    declaration-list[opt]
    statement-list[opt]
}

• Delimited by { and }.
• May have a list of declarations.
• May have a list of statements.
• May be used wherever statement appears below.

Selection statements

if

if (expression)
    statement

• If expression evaluates to nonzero (true), statement is executed.
• If expression evaluates to zero (false), control passes to the statement following statement.
• The expression must have scalar type.

else

if (expression1)
    statement1
else if (expression2)
    statement2
else
    statement3

• If expression1 is true, statement1 is executed, and control passes to the statement following statement3. Otherwise, expression2 is evaluated.

• If expression2 is true, statement2 is executed, and control passes to the statement following statement3. Otherwise, statement3 is executed, and control passes to the statement following statement3.

• An else is associated with the lexically nearest if that has no else and that is at the same block level.
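The rule for associating else can be seen by writing the same test with and without braces. The functions classify and classify_braced are hypothetical; the indentation in the first one shows how the compiler actually pairs the else.

```c
int classify(int a, int b)
{
    int result = 0;

    if (a > 0)
        if (b > 0)
            result = 1;
        else            /* pairs with "if (b > 0)", the nearest if without an else */
            result = 2;

    return result;
}

int classify_braced(int a, int b)
{
    int result = 0;

    if (a > 0) {
        if (b > 0)
            result = 1;
    } else {            /* the braces place this else at the outer block level */
        result = 2;
    }

    return result;
}
```

With a = 1 and b = -1 the first function yields 2 and the second yields 0.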


switch

    switch (expression)
        statement

• Control jumps to or past statement depending on the value of expression.
• expression must have integral type.
• Any optional case is labeled by an integral constant expression.
• If a default case is present, it is executed if no other case match is found.
• If no case matches and there is no default, control goes to the statement following statement.
• If the code associated with a case is executed, control falls through to the next case unless a break statement is included.
• Each case of a switch must have a unique constant value after conversion to the type of the controlling expression.

In practice, statement is usually a compound statement with multiple cases, and possibly a default; the description above shows the minimum usage. In the following example, flag gets set to 1 if i is 1 or 3, and to 0 otherwise:

    switch (i) {
    case 1:
    case 3:
        flag = 1;
        break;
    default:
        flag = 0;
    }

Iteration statements

while

    while (expression)
        statement

This sequence is followed repetitively:

• expression is evaluated.
• If expression is non-zero, statement is executed.
• If expression is zero, statement is not executed, and the repetition stops.

expression must have scalar type.

do-while

    do
        statement
    while (expression);

This sequence is followed repetitively:

• statement is executed.
• expression is evaluated.
• If expression is zero, repetition stops.

(do-while tests loop at the bottom; while tests loop at the top.)
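The difference between testing at the top and testing at the bottom shows up when the condition is false on entry. A small sketch, using the hypothetical functions `while_passes` and `do_while_passes`:

```c
/* Counts how many times the loop body runs when each pass
   decrements n and the loop continues while n > 0. */
int while_passes(int n)
{
    int passes = 0;

    while (n > 0) {      /* tested at the top: may run zero times */
        passes++;
        n--;
    }
    return passes;
}

int do_while_passes(int n)
{
    int passes = 0;

    do {                 /* tested at the bottom: runs at least once */
        passes++;
        n--;
    } while (n > 0);
    return passes;
}
```

For a starting value of 0 the while loop body never runs, but the do-while body runs once. For positive starting values the two agree.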

for

    for (clause1; expression2; expression3)
        statement

• clause1 initializes the loop. It can be a declaration (with scope through to the end of the loop) or an expression.
• expression2 is tested before each iteration.
• If expression2 is true:
    ♦ statement is executed.
    ♦ expression3 is evaluated.
    ♦ Loop until expression2 is false (zero).
• Any of clause1, expression2, or expression3 may be omitted, but not the semicolons.
• clause1 (when it is an expression) and expression3 may have any type; expression2 must have scalar type.

Jump statements

goto

goto identifier;

• Goes unconditionally to the statement labeled with identifier.
• A statement is labeled with an identifier followed by a colon, as in:

    A2: x = 5;

• Useful to break out of nested control flow statements.
• Can only jump within the current function.
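The typical use mentioned above, leaving several nested loops at once, can be sketched as follows. The function `first_zero` and the label `done` are hypothetical names chosen for this illustration.

```c
/* Returns the position (row * cols + col) of the first zero in a
   rows-by-cols grid stored row by row, or -1 if there is none. */
int first_zero(const int *grid, int rows, int cols)
{
    int r, c;
    int found = -1;

    for (r = 0; r < rows; r++) {
        for (c = 0; c < cols; c++) {
            if (grid[r * cols + c] == 0) {
                found = r * cols + c;
                goto done;   /* a break here would leave only the inner loop */
            }
        }
    }
done:
    return found;
}
```

The label sits in the same function as the goto, as required.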

break

Terminates nearest enclosing switch, while, do, or for statement. Passes control to the statement following the terminated statement. Example:

    for (i = 0; i < n; i++) {
        if ((a[i] = b[i]) == 0)
            break;    /* exit for */
    }


continue

Goes to top of smallest enclosing while, do, or for statement, causing it to reevaluate the controlling expression. A for loop's expression3 is evaluated before the controlling expression. Can be thought of as the opposite of the break statement: break leaves the loop, while continue skips only the rest of the current iteration. Example:

    for (i = 0; i < n; i++) {
        if (a[i] != 0)
            continue;
        a[i] = b[i];
        k++;
    }

return

    return;
    return expression;

• return by itself exits a function.
• return expression exits a function and returns the value of expression. For example,

    return a + b;

Portability considerations

Certain parts of C are inherently machine dependent. The following list of potential trouble spots is not meant to be all-inclusive but to point out the main ones.

Purely hardware issues like word size and the properties of floating point arithmetic and integer division have proven in practice to be not much of a problem. Other facets of the hardware are reflected in differing implementations. Some of these, particularly sign extension (converting a negative character into a negative integer) and the order in which bytes are placed in a word, are nuisances that must be carefully watched. Most of the others are only minor problems.

The number of variables declared with register that can actually be placed in registers varies from machine to machine as does the set of valid types. Nonetheless, the compilers all do things properly for their own machine; excess or invalid register declarations are ignored.

The order of evaluation of function arguments is not specified by the language. The order in which side effects take place is also unspecified. For example, in the expression

a[i] = b[i++]

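the value of i could be incremented after b[i] is fetched, but before a[i] is evaluated and assigned to, or it could be incremented after the assignment. Because i is modified and also read elsewhere in the same expression with no intervening sequence point, ANSI C leaves the behavior of this expression undefined, so code should not rely on either outcome.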
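One way to avoid depending on the order of side effects is to give each side effect its own statement. The sketch below uses the hypothetical functions `copy_elements`, `next_id` and `ordered_difference`. It shows one portable arrangement and is not the only possible one.

```c
/* Copies n elements from b to a. The increment is a separate
   statement, so both subscripts use the same value of i. */
void copy_elements(int *a, const int *b, int n)
{
    int i = 0;

    while (i < n) {
        a[i] = b[i];
        i++;
    }
}

/* Returns 1, 2, 3, ... on successive calls. */
static int next_id(void)
{
    static int id = 0;

    return ++id;
}

/* Calling next_id() twice inside one argument list would leave the
   order of the two calls unspecified. Separate statements fix it. */
int ordered_difference(void)
{
    int first = next_id();
    int second = next_id();

    return second - first;
}
```

Because the two calls are complete statements, second is always one greater than first.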

The value of a multi−character character constant may be different for different machines.


Fields are assigned to words, and characters to integers, right to left on some machines and left to right on other machines. These differences are invisible to isolated programs that do not indulge in type punning (for example, by converting an int pointer to a char pointer and inspecting the pointed-to storage) but must be accounted for when conforming to externally imposed storage layouts.
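The following sketch shows both sides of that remark. The first function inspects storage through a char pointer, which exposes the byte order. The other two follow an externally imposed layout using shifts only, so they do not depend on it. The names `is_little_endian`, `store_be32` and `load_be32` are hypothetical, and the most-significant-byte-first layout is an assumption made for the example.

```c
/* Returns 1 if the least significant byte of an unsigned int is
   stored at the lowest address, 0 otherwise. */
int is_little_endian(void)
{
    unsigned int probe = 1;

    return *(unsigned char *)&probe == 1;
}

/* Stores the low 32 bits of value in out[0..3], most significant
   byte first, whatever the byte order of the host. */
void store_be32(unsigned char out[4], unsigned long value)
{
    out[0] = (unsigned char)((value >> 24) & 0xffUL);
    out[1] = (unsigned char)((value >> 16) & 0xffUL);
    out[2] = (unsigned char)((value >> 8) & 0xffUL);
    out[3] = (unsigned char)(value & 0xffUL);
}

/* Reads back a value written by store_be32. */
unsigned long load_be32(const unsigned char in[4])
{
    return ((unsigned long)in[0] << 24) | ((unsigned long)in[1] << 16)
         | ((unsigned long)in[2] << 8) | (unsigned long)in[3];
}
```

Only is_little_endian gives a machine-dependent answer. The store and load functions produce the same bytes on every host.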

The lint tool is useful for finding program bugs and non-portable constructs. For information on how to use lint, see ``Analyzing your code with lint''.


Complying with standard C

This topic describes techniques for writing new and upgrading existing C code to comply with the ANSI C language specification.

NOTE: This discussion is taken from a series of articles, each covering a specific transition topic. These articles were originally written for an in-house newsletter by David Prosser.

NOTE: For a description of the C++ dialect accepted by the compilation system, see ``C++ language''.

Mixing old and new style functions

ANSI C's most sweeping change to the language is the function prototype borrowed from the C++ language. By specifying for each function the number and types of its parameters, not only does every regular compile get the benefits of lint-like argument/parameter checks for each function call, but arguments are automatically converted (just as with an assignment) to the type expected by the function. ANSI C includes rules that govern the mixing of old- and new-style function declarations since there are numerous lines of existing C code that could and should be converted to use prototypes.

Writing new code

When you write an entirely new program, you should use new-style function declarations (function prototypes) in headers and new-style function declarations and definitions in other C source files. However, if there is a possibility that someone will port the code to a machine with a pre-ANSI C compiler, use the macro __STDC__ (which is defined only for ANSI C compilation systems) in both header and source files.

Because an ANSI C conforming compiler must issue a diagnostic whenever two incompatible declarations for the same object or function are in the same scope, if all functions are declared and defined with prototypes (and the appropriate headers are included by the correct source files), all calls should agree with the definition of the functions -- thus eliminating one of the most common C programming mistakes.

Updating existing code

If you have an existing application and want the benefits of function prototypes, there are several possibilities for updating, depending on how much of the code you care to change:

1. Recompile without making any changes.

   Even with no coding changes, the compiler will warn you about mismatches in parameter type and number when invoked with the -v option.

2. Add function prototypes just to the headers.

   All calls to global functions are covered.

3. Add function prototypes to the headers and start each source file with function prototypes for its local (static) functions.

   All calls to functions are covered, but this requires typing the interface for each local function twice in the source file.

4. Change all function declarations and definitions to use function prototypes.

For most programmers, choices 2 and 3 are probably the best cost/benefit compromise. Unfortunately, these options are precisely the ones that require detailed knowledge of the rules for mixing of old and new styles.

Mixing considerations

In order for function prototype declarations to work with old-style function definitions, both must specify functionally identical interfaces (or have ``compatible types'' using ANSI C's terminology).

For functions with varying arguments, there can be no mixing of ANSI C's ellipsis notation and the old-style varargs function definition. For functions with a fixed number of parameters, however, you can simply specify the types of the parameters as they were passed in previous implementations.

In pre-ANSI C compilers, each argument was converted just before it was passed to the called function according to the default argument promotions. These specified that all integral types narrower than int were promoted to int size, and any float argument was promoted to double. This simplified both the compiler and libraries. Function prototypes are more expressive -- the specified parameter type is what is passed to the function. Thus, if a function prototype is written for an existing (old-style) function definition, there should be no parameters in the function prototype with any of the following types:

    char      signed char     unsigned char     float
    short     signed short    unsigned short
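The default argument promotions have not disappeared from ANSI C: they still apply to arguments that match the ellipsis of a prototype, which gives a way to observe them without writing an old-style definition. The following is a minimal sketch under that assumption. The function `sum_promoted` is hypothetical.

```c
#include <stdarg.h>

/* Adds one integer and one floating argument taken from the
   variable part of the call. The fixed parameter exists only
   because va_start needs a named parameter. */
double sum_promoted(int count, ...)
{
    va_list ap;
    double total;

    (void)count;
    va_start(ap, count);
    total = va_arg(ap, int);        /* a char or short argument arrives as int */
    total += va_arg(ap, double);    /* a float argument arrives as double */
    va_end(ap);
    return total;
}
```

A call such as sum_promoted(2, c, f), with c of type char and f of type float, must therefore be read back with int and double. An old-style definition receives every argument in this promoted form, which is why the narrow types listed above cannot appear in a prototype for such a definition.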

There still remain two complications with writing prototypes: typedef names and the promotion rules for narrow unsigned types.

If parameters in old-style functions were declared using typedef names such as off_t and ino_t, it is important to know whether or not the typedef name designates a type that is affected by the default argument promotions. For these two, off_t is a long, so it is appropriate to use in a function prototype, but ino_t used to be an unsigned short, so if it were used in a prototype, the compiler would issue a diagnostic (possibly fatal) because the old-style definition and the prototype specify different and incompatible interfaces.

Just what should be used instead of an unsigned short leads us into the final complication. The single biggest incompatibility between pre-ANSI C compilers and ANSI C is the promotion rule for the widening of unsigned char and unsigned short to an int-sized value. (See ``Promotions: unsigned vs. value preserving''.)

Unfortunately, the parameter type that matches such an old-style parameter depends on the compilation mode used when you compile: for -Xt, unsigned int should be used; -Xa and -Xc should use int. The best approach (even though it violates the spirit of choices 2 and 3 above) is to change the old-style definition to specify either int or unsigned int and use the matching type in the function prototype (you can always assign its value to a local variable with the narrower type, if necessary, after you enter the function).

Examples

Appropriate use of __STDC__ produces a header file that can be used for both old and new compilers:

header.h:

    struct s { /* ... */ };

    #ifdef __STDC__
    void errmsg(int, ...);
    struct s *f(const char *);
    int g(void);
    #else
    void errmsg();
    struct s *f();
    int g();
    #endif

The following function uses prototypes and can still be compiled on an older system:

    struct s *
    #ifdef __STDC__
    f(const char *p)
    #else
    f(p) char *p;
    #endif
    {
        /* ... */
    }

The following is an updated source file (as with choice 3 above). The local function still uses an old-style definition, but a prototype is included for newer compilers:

source.c:

    #include <header.h>

    typedef /* ... */ MyType;

    #ifdef __STDC__
    static void del(MyType *);
    /* ... */
    #endif

    static void
    del(p)
    MyType *p;
    {
        /* ... */
    }
    /* ... */

Functions with varying arguments

In previous implementations, you could not specify the parameter types that a function expected, but ANSI C allows you to use prototypes to do this. In order to support functions such as printf, the syntax for prototypes includes a special ellipsis (...) terminator. Because an implementation might be required to do unusual things to handle a varying number of arguments, ANSI C requires that all declarations and the definition of such a function include the ellipsis terminator.

Since there are no names for the ``...'' part of the parameters, a special set of macros contained in stdarg.h gives the function access to these arguments. Earlier versions of such functions had to use similar macros contained in varargs.h.

Example

This example writes an error handler function called errmsg that returns void, and whose only fixed parameter is an int that specifies details about the error message. This parameter may be followed by a file name, or a line number, or both, and these are followed by printf-like format and arguments that specify the text of the error message.

In order to allow this example to compile with earlier compilers, it makes extensive use of the macro __STDC__ which is defined only for ANSI C compilation systems. Thus the function's declaration (in the appropriate header file) would be:

    #ifdef __STDC__
    void errmsg(int code, ...);
    #else
    void errmsg();
    #endif

The file that contains the definition of errmsg is where the old and new styles can get complex. First, the header to include depends on the compilation system:


    #ifdef __STDC__
    #include <stdarg.h>
    #else
    #include <varargs.h>
    #endif
    #include <stdio.h>

(stdio.h is included because it is needed to call fprintf and vfprintf later.)

Next comes the function's definition. The identifiers va_alist and va_dcl are part of the old-style varargs.h interface.

    void
    #ifdef __STDC__
    errmsg(int code, ...)
    #else
    errmsg(va_alist) va_dcl /* note: no semicolon! */
    #endif
    {
        /* more detail below */
    }

Since the old-style variable argument mechanism did not allow specification of any fixed parameters (at least not officially), we need to arrange for them to be accessed before the varying portion. Also due to the lack of a name for the ``...'' part of the parameters, the new va_start macro has a second argument -- the name of the parameter that comes just before the ``...'' terminator.

NOTE: The ANSI C compiler, as an extension to the standard (which requires at least one fixed parameter), allows functions to be declared and defined with no fixed parameters, as in:

int f(...);

For such functions, va_start should be invoked with an empty second argument, as in:

va_start(ap,)

The following is the body of the function:

    {
        va_list ap;
        char *fmt;

    #ifdef __STDC__
        va_start(ap, code);
    #else
        int code;

        va_start(ap);
        /* extract the fixed argument */
        code = va_arg(ap, int);
    #endif
        if (code & FILENAME)
            (void)fprintf(stderr, "\"%s\": ", va_arg(ap, char *));
        if (code & LINENUMBER)
            (void)fprintf(stderr, "%d: ", va_arg(ap, int));
        if (code & WARNING)
            (void)fputs("warning: ", stderr);
        fmt = va_arg(ap, char *);
        (void)vfprintf(stderr, fmt, ap);
        va_end(ap);
    }

Both the va_arg and va_end macros work the same for the old-style and ANSI C versions. Because va_arg changes the value of ap, the call to vfprintf cannot be:

(void)vfprintf(stderr, va_arg(ap, char *), ap);

The definitions for the macros FILENAME, LINENUMBER, and WARNING are presumably contained in the same header as the declaration of errmsg.


A sample call to errmsg could be:

errmsg(FILENAME, "<command line>", "cannot open: %s\n", argv[optind]);
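For experimenting with the argument handling, the following ANSI-only variant formats the message into a caller-supplied buffer instead of writing to stderr. It is a sketch, not the errmsg from the text: the function `format_errmsg`, the helper `advance` and the three flag values are hypothetical, and snprintf and vsnprintf are assumed to be available (they were added in C99 and are not part of the original ANSI C library).

```c
#include <stdarg.h>
#include <stdio.h>

#define FILENAME   0x1   /* hypothetical flag values */
#define LINENUMBER 0x2
#define WARNING    0x4

/* Adds n (an snprintf-style return value) to used, clamping to the
   last usable position when the output was truncated. */
static size_t advance(size_t used, size_t size, int n)
{
    if (n > 0)
        used += (size_t)n;
    return used < size ? used : size - 1;
}

/* Builds the message in buf (at most size bytes including the
   terminator) and returns the number of characters stored. */
int format_errmsg(char *buf, size_t size, int code, ...)
{
    va_list ap;
    const char *fmt;
    size_t used = 0;

    if (size == 0)
        return 0;
    buf[0] = '\0';
    va_start(ap, code);
    if (code & FILENAME)
        used = advance(used, size,
            snprintf(buf + used, size - used, "\"%s\": ", va_arg(ap, char *)));
    if (code & LINENUMBER)
        used = advance(used, size,
            snprintf(buf + used, size - used, "%d: ", va_arg(ap, int)));
    if (code & WARNING)
        used = advance(used, size,
            snprintf(buf + used, size - used, "warning: "));
    fmt = va_arg(ap, char *);    /* fetched first, because va_arg changes ap */
    used = advance(used, size, vsnprintf(buf + used, size - used, fmt, ap));
    va_end(ap);
    return (int)used;
}
```

As in the original, the format is fetched into fmt before ap is handed on, because va_arg changes ap.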

Promotions: unsigned vs. value preserving

The following information appears in the Rationale that accompanies the draft C Standard:

    QUIET CHANGE

    A program that depends on unsigned preserving arithmetic conversions will behave differently, probably without complaint. This is considered to be the most serious change made by the Committee to a widespread current practice.

This section explores how this change affects our code.

Background

According to The C Programming Language, Kernighan and Ritchie, (First Edition), unsigned specified exactly one type; there were no unsigned chars, unsigned shorts, or unsigned longs, but most C compilers added these very soon thereafter. (Some compilers did not implement unsigned long but included the other two.) Naturally, implementations chose different rules for type promotions when these new types mixed with others in expressions.

The UnixWare® C compiler and most other C compilers used the simpler rule, ``unsigned preserving''. When an unsigned type needs to be widened, it is widened to an unsigned type; when an unsigned type mixes with a signed type, the result is an unsigned type.

The other rule, specified by ANSI C, came to be called ``value preserving'', in which the result type depends on the relative sizes of the operand types. When an unsigned char or unsigned short is ``widened,'' the result type is int if an int is large enough to represent all the values of the smaller type. Otherwise the result type would be unsigned int. The ``value preserving'' rule produces the ``least surprise'' arithmetic result for most expressions.

Compilation behavior

Only in the transition (-Xt) mode will the ANSI C compiler use the unsigned preserving promotions; in the other two modes, conforming (-Xc) and ANSI (-Xa), the value preserving promotion rules will be used. Using cc -v will cause the compiler to warn you about each expression whose behavior might depend on the promotion rules used.

This warning is not optional since this is a serious change in behavior. Fortunately, these situations do not often occur, and it is always possible to suppress the warning by making the intended behavior explicit, as is shown below.


First example: using a cast

In the following code, assume that an unsigned char is smaller than an int.

    int f(void)
    {
        int i = -2;
        unsigned char uc = 1;

        return (i + uc) < 17;
    }

NOTE: The code above causes the compiler to issue the following warning when the -v option to cc is specified:

    line 6: warning: semantics of "<" change in ANSI C; use explicit cast

The result of the addition has type int (value preserving) or unsigned int (unsigned preserving), but the bit pattern does not change between these two. A two's-complement machine has:

         i:   111...110   (-2)
    +   uc:   000...001   ( 1)
    ===========================
              111...111   (-1 or UINT_MAX)

This bit representation corresponds to -1 for int and UINT_MAX for unsigned int. Thus, if the result has type int, a signed comparison is used and the less-than test is true; if the result has type unsigned int, an unsigned comparison is used and the less-than test is false.

The addition of a cast serves to specify which of the two behaviors is desired:

value preserving:

(i + (int)uc) < 17

unsigned preserving:

(i + (unsigned int)uc) < 17

Because this expression can be viewed as ambiguous (since differing compilers chose different meanings for the same code), the addition of a cast is as much to help the reader as it is to eliminate the warning message.
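The two cast forms can be wrapped in functions and compared directly. The function names below are hypothetical. The expressions are the ones shown above.

```c
/* Value preserving: uc is converted to int, so -2 + 1 is -1. */
int less_than_17_value_preserving(int i, unsigned char uc)
{
    return (i + (int)uc) < 17;
}

/* Unsigned preserving: uc is converted to unsigned int, i is then
   converted as well, and -2 + 1 wraps around to UINT_MAX. */
int less_than_17_unsigned_preserving(int i, unsigned char uc)
{
    return (i + (unsigned int)uc) < 17;
}
```

With i equal to -2 and uc equal to 1, the first function reports true and the second reports false, whatever promotion rule the compiler applies when no cast is present.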

Bit-fields

The same situation applies to the promotion of bit-field values. In ANSI C, if the number of bits in an int or unsigned int bit-field is less than the number of bits in an int, the promoted type is int; otherwise the promoted type is unsigned int. In most older C compilers, the promoted type is unsigned int for explicitly unsigned bit-fields, and int otherwise.

On machines where plain bit-fields represent unsigned values, full-sized bit-fields (for example, 8 bit chars, 16 bit shorts, 32 bit ints, longs, and enums) will be changed for the purpose of code generation to the corresponding unsigned type. Any full-sized bit-field, on any machine, will be changed to a simple type.


Similar use of casts can eliminate situations that are ambiguous.
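A sketch of the bit-field case, assuming a compiler that follows the ANSI C rule. The structure `struct flags` and both function names are hypothetical.

```c
struct flags {
    unsigned int small : 3;   /* narrower than int */
};

/* No cast: under the ANSI C rule the 3-bit field is promoted to int,
   so the subtraction can produce a negative value. */
int field_minus_5_below_17(unsigned int value)
{
    struct flags f;

    f.small = value & 7u;     /* keep only the 3 bits that fit */
    return (f.small - 5) < 17;
}

/* Explicit cast: the subtraction is done in unsigned int and wraps
   around instead of going negative. */
int field_minus_5_below_17_unsigned(unsigned int value)
{
    struct flags f;

    f.small = value & 7u;
    return ((unsigned int)f.small - 5) < 17;
}
```

For a field value of 1 the first comparison is true (-4 is less than 17) and the second is false. For a field value of 7 both are true.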

Second example: result is the same

In the following code, assume that both unsigned short and unsigned char are narrower than int.

    int f(void)
    {
        unsigned short us;
        unsigned char uc;

        return uc < us;
    }

In this example, both automatics are either promoted to int or to unsigned int, so the comparison is sometimes unsigned and sometimes signed. However, the result is the same for the two choices.

Integral constants

As with expressions, the rules for the types of certain integral constants have changed. Previously, an unsuffixed decimal constant had type int only if its value fit in an int and an unsuffixed octal or hexadecimal constant had type int only if its value fit in an unsigned int. Otherwise, an integral constant had type long. (At times the value did not fit in the resulting type!) In ANSI C, the constant type is the first type encountered in the list below that corresponds to the value:

    unsuffixed decimal:                int, long, unsigned long
    unsuffixed octal or hexadecimal:   int, unsigned int, long, unsigned long
    U suffixed:                        unsigned int, unsigned long
    L suffixed:                        long, unsigned long
    UL suffixed:                       unsigned long

The old integral constant typing rules are used only in the transition mode; the ANSI and conforming modes use the new rules.
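Whether a constant ended up with a signed or an unsigned type can be probed with ordinary arithmetic. In the sketch below, the macro `HAS_UNSIGNED_TYPE` and the function names are hypothetical. The hexadecimal check is only meaningful where unsigned int is 32 bits wide, so it is guarded by a test on UINT_MAX.

```c
#include <limits.h>

/* Yields 1 when e has an unsigned type at least as wide as int:
   e - e - 1 is -1 in a signed type and the largest value in an
   unsigned one. */
#define HAS_UNSIGNED_TYPE(e) (((e) - (e) - 1) > 0)

int plain_decimal_is_unsigned(void) { return HAS_UNSIGNED_TYPE(1); }
int u_suffixed_is_unsigned(void)    { return HAS_UNSIGNED_TYPE(1U); }
int l_suffixed_is_unsigned(void)    { return HAS_UNSIGNED_TYPE(1L); }
int ul_suffixed_is_unsigned(void)   { return HAS_UNSIGNED_TYPE(1UL); }

/* An unsuffixed hexadecimal constant too large for int takes the
   next type in its list, unsigned int, when the value fits there. */
int large_hex_is_unsigned(void)
{
#if UINT_MAX == 0xffffffff
    return HAS_UNSIGNED_TYPE(0xffffffff);
#else
    return -1;   /* unsigned int is not 32 bits here: check skipped */
#endif
}
```

The suffixed cases do not depend on the width of int. Only the last function needs the guard.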

Third example: integral constants

In the following code, assume ints are 16 bits.

    int f(void)
    {
        int i = 0;

        return i > 0xffff;
    }

Because the hexadecimal constant's type is either int (with a value of -1 on a two's-complement machine) or an unsigned int (with a value of 65535), the comparison will be true on a non-ANSI C compiler (-Xt mode), and false on an ANSI C compiler (-Xa and -Xc modes).

An appropriate cast clarifies the code:

non-ANSI C behavior:

    i > (int)0xffff

ANSI C behavior:

    i > (unsigned int)0xffff
    /* or */
    i > 0xffffU

(The U suffix character is a new feature of ANSI C and will probably produce an error message with older compilers.)

Tokenization and preprocessing

Probably the least specified part of previous versions of C concerned the operations that transformed each source file from a bunch of characters into a sequence of tokens, ready to parse. These operations included recognition of white space (including comments), bundling consecutive characters into tokens, handling preprocessing directive lines, and macro replacement. However, their respective ordering was never guaranteed.

ANSI C translation phases

The order of these translation phases is specified by ANSI C:

1. Every ``trigraph'' sequence in the source file is replaced. ANSI C has exactly nine trigraph sequences that were invented solely as a concession to deficient character sets (as far as C is concerned) and are three-character sequences that name a character not in the ISO 646-1983 character set:

       ??=  #        ??'  ^        ??-  ~
       ??!  |        ??(  [        ??/  \
       ??)  ]        ??<  {        ??>  }

   These sequences must be understood by ANSI C compilers, but they should not be used except (possibly) to obscure code. The ANSI C compiler warns you whenever it replaces a trigraph while in transition (-Xt) mode, even in comments. For example, consider the following:

       /* comment *??/
       /* still comment? */

   The ??/ becomes a backslash. This character and the following newline are removed. The resulting characters are

       /* comment */* still comment? */

   The first / from the second line is the end of the comment. The next token is the *.

2. Every backslash/new-line character pair is deleted.

3. The source file is converted into preprocessing tokens and sequences of white space. Each comment is effectively replaced by a space character.

4. Every preprocessing directive is handled and all macro invocations are replaced. Each #included source file is run through the earlier phases before its contents replace the directive line.

5. Every escape sequence (in character constants and string literals) is interpreted.

6. Adjacent string literals are concatenated.

7. Every preprocessing token is converted into a (regular) token; the compiler proper parses these and generates code.

8. All external object and function references are resolved, resulting in the final program.
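Several of the phases above can be seen at work in a few lines. The sketch below touches phases 2, 3, 5 and 6. The names `GREETING`, `answer`, `banner` and `banner_length` are hypothetical. Trigraphs are left out on purpose, since many current compilers only replace them when asked to.

```c
#include <stddef.h>
#include <string.h>

/* Phase 2: the backslash/new-line pair joins the two physical lines
   of this directive into one logical line. */
#define GREETING "hello, " \
                 "world"

/* Phase 3: each comment is replaced by a single space, so this
   initializer is read as 40 + 2. */
static const int answer = 40/**/+/**/2;

/* Phases 5 and 6: the escape sequence is interpreted, then the three
   adjacent string literals are concatenated into one. */
static const char banner[] = GREETING "\n";

size_t banner_length(void)
{
    return strlen(banner);
}
```

After phase 6 the array holds the single string "hello, world" followed by a new-line character.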

Old C translation phases

The pre-ANSI C UNIX® System C compilers did not follow such a simple sequence of phases, nor were there any guarantees for when these steps were applied. A separate preprocessor recognized tokens and white space at essentially the same time as it replaced macros and handled directive lines. The output was then completely retokenized by the compiler proper, which then parsed the language and generated code.

Because the tokenization process within the preprocessor was a moment-by-moment thing and macro replacement was done as a character-based operation (and not token-based), the tokens and white space could have a great deal of variation during preprocessing.

There are several differences that arise from these two approaches. The rest of this section will discuss how code behavior may change due to line splicing, macro replacement, ``stringizing,'' and token ``pasting,'' which occur during macro replacement.

Logical source lines

In pre-ANSI C compilers, backslash/new-line pairs were allowed only as a means to continue a directive, a string literal, or a character constant to the next line. ANSI C extended the notion so that a backslash/new-line pair can continue anything to the next line. (The result is a ``logical source line.'') Therefore, any code that relied on the separate recognition of tokens on either side of a backslash/new-line pair will not behave as expected.
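A deliberately odd sketch makes the point: under the ANSI C rule the pair can even fall in the middle of an identifier. The names `total` and `read_total` are hypothetical, and the second physical line of the declaration must start in the first column for the example to work.

```c
/* The backslash/new-line pair is deleted before tokenization, so the
   two physical lines below declare one identifier, total. */
static int to\
tal = 3;

int read_total(void)
{
    return total;
}
```

A preprocessor that recognized tokens separately on either side of the pair would have seen to and tal instead.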

Macro replacement

The macro replacement process has never been described in any significant detail prior to ANSI C. This vagueness spawned a great many divergent implementations and any code that relied on anything fancier than manifest constant replacement and simple ?:-like macros was probably not truly portable. This tutorial cannot begin to uncover all the subtle and not so subtle differences between the old C macro replacement implementation and the ANSI C version. Fortunately, nearly all uses of macro replacement with the exception of token pasting and stringizing will produce exactly the same series of tokens as before. Furthermore, the ANSI C macro replacement algorithm can do things not possible in the old C version. For example,

    #define name (*name)

causes any use of name to be replaced with an indirect reference through name. (The old C preprocessor would produce a large amount of parentheses and stars and eventually complain about macro recursion.)


The major change in the macro replacement approach taken by ANSI C is to require macro arguments (other than those that are operands of the macro substitution operators # and ##) to be expanded recursively prior to their substitution in the replacement token list. However, this change seldom produces an actual difference in the resulting tokens.
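Both ANSI C behaviors described in this section can be tried in a few lines. In the sketch below, the names `LEVEL`, `STR`, `XSTR`, `cell`, `counter` and the three functions are hypothetical.

```c
#define LEVEL 3

#define STR(x)  #x        /* x is an operand of #: substituted unexpanded */
#define XSTR(x) STR(x)    /* x is an ordinary argument: expanded first */

const char *unexpanded(void) { return STR(LEVEL); }
const char *expanded(void)   { return XSTR(LEVEL); }

/* A macro whose replacement list mentions its own name is not
   expanded again, so each later use of counter becomes (*counter). */
static int cell = 41;
static int *counter = &cell;
#define counter (*counter)

int bump(void)
{
    return ++counter;     /* reads as ++(*counter) */
}
```

STR(LEVEL) yields the string "LEVEL", while XSTR(LEVEL) yields "3", because the argument of XSTR is fully expanded before it reaches the # operator inside STR.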

Stringizing

NOTE: In ANSI C, the examples below marked with a ++ produce a warning about use of old features. Only in the transition mode (-Xt) will the result be the same as in previous versions of C.

In pre-ANSI C compilers, the following produced the string literal ``"x y!"'':

    #define str(a) "a!"    ++
    str(x y)

Thus the preprocessor searched inside string literals (and character constants) for characters that looked like macro parameters. ANSI C recognized the importance of this feature, but could not condone operations on parts of tokens. (In ANSI C, all invocations of the above macro produce the string literal ``"a!"''.) To achieve the old effect in ANSI C, we make use of the # macro substitution operator and the concatenation of string literals.

    #define str(a) #a "!"
    str(x y)

The above produces the two string literals ``"x y"'' and ``"!"'' which, after concatenation, produce the identical ``"x y!"''.
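The ANSI C form of str can be checked with a small program fragment. The function names `shout` and `shout_spaced` are hypothetical.

```c
#define str(a) #a "!"

/* Expands to "x y" "!", which phase 6 concatenates. */
const char *shout(void)
{
    return str(x y);
}

/* White space between the tokens of the argument is reduced to a
   single space by the # operator. */
const char *shout_spaced(void)
{
    return str(x      y);
}
```

Both functions return the same string, "x y!".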

Unfortunately, there is no direct replacement for the analogous operation for character constants. The major use of this feature was similar to the following:

    #define CNTL(ch) (037 & 'ch')    ++
    CNTL(L)

which produced

    (037 & 'L')

which evaluates to the ASCII control-L character. The best solution is to change all uses of this macro (at least this can be done automatically) to:

    #define CNTL(ch) (037 & (ch))
    CNTL('L')

which is arguably more readable and more useful, as it can also be applied to expressions.
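A short sketch of the rewritten macro in use, assuming an ASCII execution character set. The function names `control_l` and `control_of` are hypothetical.

```c
#define CNTL(ch) (037 & (ch))

/* Control-L: 'L' is 0114 in ASCII, and 0114 & 037 is 014 (form feed). */
int control_l(void)
{
    return CNTL('L');
}

/* Because the argument is an ordinary expression, the macro also
   works on values computed at run time. */
int control_of(int letter)
{
    return CNTL(letter);
}
```

The second function is the case the old 'ch' form could not handle, since that form only worked when the letter appeared literally in the invocation.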

Token pasting

In pre−ANSI C compilers, there were at least two ways to combine two tokens. Both invocations in thefollowing produced a single identifier x1 out of the two tokens x and 1.

#define self(a) a #define glue(a,b) a/**/b ++

Complying with standard C

Stringizing 101

Page 109: nicolascormier.comnicolascormier.com/documentation/sys-programming/...Table of Contents Introduction to programming in standard C and C++...................................................................................1

self(x)1 glue(x,1)

ANSI C could not sanction either approach to what they believed to be an important capability. In ANSI C,both the above invocations would produce the two separate tokens x and 1. The second of the above twomethods can be rewritten for ANSI C by using the ## macro substitution operator:

#define glue(a,b) a ## b glue(x, 1)

Since ## is an actual operator, the invocation can be much freer with respect to white space in both thedefinition and invocation.

There is no direct approach to effect the first of the two old-style pasting schemes, but since it put the burden of the pasting at the invocation, it was used less frequently than the other form.
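
A small sketch of the ## operator in use follows. The variable x1 and the function read_x1 are made up for illustration; the point is only that the pasted tokens name the same object as the hand-written identifier.

```c
#define glue(a,b) a ## b

/* glue(x, 1) is pasted into the single identifier x1,
   regardless of the white space in the invocation. */
static int glue(x, 1) = 42;

/* Hypothetical accessor: refers to the same object by its plain name. */
int read_x1(void)
{
    return x1;
}
```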

NOTE: # and ## should be used as macro substitution operators only when __STDC__ is defined.

Using const and volatile

The keyword const was one of the C++ features that found its way into ANSI C. When an analogous keyword, volatile, was invented by the ANSI C committee, in effect, the ``type qualifier'' category was created. This still remains one of the more nebulous parts of ANSI C.

Types for lvalues

const and volatile are part of an identifier's type, not its storage class. However, they are peculiar in that they are often removed from the top-most part of the type. This occurs when an object's value is fetched in the evaluation of an expression -- exactly at the point when an lvalue becomes an rvalue. (These terms arise from the prototypical assignment ``"L=R"'', in which the left side must still refer directly to an object (an lvalue) and the right side need only be a value (an rvalue).) Therefore, only expressions that are lvalues can be qualified by const or volatile or both.

Type qualifiers in derived types

The type qualifiers are unique in that they may modify type names and derived types. Derived types are those parts of C's declarations that can be applied over and over to build more and more complex types: pointers, arrays, functions, structures, and unions. Except for functions, one or both type qualifiers can be used to change the behavior of a derived type.

For example,

const int five = 5;

declares and initializes an object with type const int whose value will not be changed by a correct program. (The order of the keywords is not significant to C. For example, the declarations:

int const five = 5;

and

const five = 5;

are identical to the above declaration in its effect.)

The declaration

const int *pci = &five;

declares an object with type pointer to const int, which initially points to the previously declared object. Note that the pointer itself does not have a qualified type -- it points to a qualified type, and as such, the pointer can be changed to point to essentially any int during the program's execution, but pci cannot be used to modify the object to which it points unless a cast is used, as in the following:

*(int *)pci = 17;

(If pci actually points to a const object, the behavior of this code is undefined.)

The declaration

extern int *const cpi;

says that somewhere in the program there exists a definition of a global object with type const pointer to int. In this case, cpi's value will not be changed by a correct program, but it can be used to modify the object to which it points. Notice that const comes after the * in the above declaration. The following pair of declarations produces the same effect:

typedef int *INT_PTR;
extern const INT_PTR cpi;

These can be combined as in the following declaration in which an object is declared to have type const pointer to const int:

const int *const cpci;
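
The three pointer forms can be compared side by side in a short sketch. The object counter and the function const_pointer_demo are assumptions made for this illustration, and the pointers are defined locally here rather than declared extern as in the text.

```c
static int counter;
static const int five = 5;

/* Hypothetical demonstration of what each declaration permits. */
int const_pointer_demo(void)
{
    const int *pci = &five;          /* pointer to const int */
    int *const cpi = &counter;       /* const pointer to int */
    const int *const cpci = &five;   /* const pointer to const int */

    pci = &counter;    /* allowed: the pointer itself is not const */
    *cpi = 17;         /* allowed: the object pointed to is not const */

    /* Each of the following would be rejected at compile time:
       *pci = 1;   cpi = &counter;   *cpci = 1;   cpci = &counter;  */

    return *pci + *cpci;   /* reads counter (now 17) and five */
}
```

Note that counter was changed through cpi even though pci, which also points to it, has type pointer to const int.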

Using const to read character values

In hindsight, readonly would have been a better choice for a keyword than const. If one reads const in this manner, declarations such as

char *strcpy(char *, const char *);

are easily understood to mean that the second parameter will only be used to read character values, while the first parameter will undoubtedly be used to overwrite the characters to which it points. Furthermore, despite the fact that in the above example the type of pci is pointer to const int, you can still change the value of the object to which it points through some other means (unless it actually points to an object declared with const int type).

Examples of const usage

The two main uses for const are to declare (large) compile-time initialized tables of information as unchanging, and to specify that pointer parameters will not modify the objects to which they point.

The first use potentially allows portions of the data for a program to be shared by other concurrent invocations of the same program (just as the code for the program can be), and may cause attempts to modify this presumably invariant data to be detected immediately by means of some sort of memory protection fault, as the data resides in a read-only portion of memory.

The second use will most likely help locate potential errors (before generating a memory fault during that critical demo). For example, functions that temporarily place a null character into the middle of a string will be detected at compile time, if passed a pointer to a string that cannot be so modified.
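
Both uses can be sketched briefly. The table contents and the function names count_char and day_name are illustrative assumptions; whether the table actually lands in read-only memory depends on the implementation.

```c
#include <stddef.h>

/* First use: a compile-time initialized table that never changes.
   An implementation may place it in a read-only, shareable region. */
static const char *const day_names[] = {
    "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"
};

/* Second use: the const qualifier promises that s is only read. */
size_t count_char(const char *s, int c)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if (*s == c)
            n++;
    /* A statement such as  *s = '\0';  would be rejected here
       at compile time because s points to const char. */
    return n;
}

/* Returns the table entry for day d (0 = Sunday), or NULL if out of range. */
const char *day_name(int d)
{
    return (d >= 0 && d < 7) ? day_names[d] : NULL;
}
```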

When to use volatile

The previous examples have all used const because it's conceptually simpler. However, when using exact semantics is necessary, volatile should be used. To the programmer, volatile has multiple meanings. To a compiler writer it means to take no code generation shortcuts when accessing such an object. Moreover, in ANSI C, it is a programmer's responsibility to declare every object that has the appropriate special properties with a volatile-qualified type.

Examples of volatile usage

The usual four examples of volatile objects are:

1. an object that is a memory-mapped I/O port
2. an object that is shared between multiple concurrent processes
3. an object that is modified by an asynchronous signal handler
4. an automatic storage duration object declared in a function that calls setjmp and whose value is changed between the call to setjmp and a corresponding call to longjmp

The first three examples are all instances of an object with a particular behavior whose value can be modified at any point during the execution of the program. Thus, the seemingly infinite loop

flag = 1;
while (flag) ;

is completely reasonable as long as flag has a volatile-qualified type. (Presumably, some asynchronous event will set flag to zero in the future.) Otherwise, the compilation system is free to change the above loop (because the value of flag is unchanged within the body of the loop) into a truly infinite loop that completely ignores the value of flag.
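
The third case, a flag modified by a signal handler, might be sketched as follows. The function names are hypothetical, and raise() is used here only as a stand-in for a genuinely asynchronous event so that the example terminates predictably.

```c
#include <signal.h>

/* Written by the handler, read by the loop: must be volatile. */
static volatile sig_atomic_t flag;

static void clear_flag(int sig)
{
    (void)sig;
    flag = 0;
}

/* Hypothetical helper: returns 0 once the flag has been cleared,
   or -1 if the handler could not be installed. */
int wait_until_cleared(void)
{
    void (*old)(int) = signal(SIGINT, clear_flag);

    if (old == SIG_ERR)
        return -1;
    flag = 1;
    raise(SIGINT);      /* stands in for the asynchronous event */
    while (flag)
        ;               /* flag is re-read on every iteration */
    signal(SIGINT, old);
    return 0;
}
```

Without the volatile qualifier, a compiler would be entitled to read flag once and never look at it again.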

The fourth example, involving variables local to functions that call setjmp, is more involved. If you read the fine print about the behavior of setjmp and longjmp, you will find there are no guarantees about the values for objects matching the fourth case. It turns out to be necessary for longjmp to examine every stack frame between the function calling setjmp and the function calling longjmp for saved register values in order to get the most desirable behavior. The possibility of asynchronously created stack frames makes this expensive job even harder. Therefore most implementations just documented the undesirable side effect and used an inexpensive implementation.

When an automatic object is declared with a volatile-qualified type, the compilation system knows that it has to produce code that exactly matches what the programmer wrote. Therefore, the most recent value for such an automatic object will always be in memory (not just in a register) and as such will be guaranteed to be up-to-date when longjmp is called.
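
A minimal sketch of the setjmp case follows. The names env, jump_back, and count_after_longjmp are invented for the example; the volatile qualifier on count is what makes the returned value well defined.

```c
#include <setjmp.h>

static jmp_buf env;

/* Transfers control back to the setjmp call; never returns. */
static void jump_back(void)
{
    longjmp(env, 1);
}

/* Hypothetical demonstration: count is changed between setjmp and
   longjmp, so it must be volatile to have a dependable value afterward. */
int count_after_longjmp(void)
{
    volatile int count = 0;

    if (setjmp(env) == 0) {
        count = 5;      /* modified after setjmp ... */
        jump_back();    /* ... and before the longjmp */
    }
    return count;       /* the most recent value is in memory */
}
```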

Multibyte characters and wide characters

At first, the ``internationalization'' of ANSI C affected only library functions. However, the final stage of internationalization -- multibyte characters and wide characters -- also affected the language proper.

``Asianization'' means multibyte characters

The basic difficulty in an Asian environment is the huge number of ideograms needed for I/O. To work within the constraints of usual computer architectures, these ideograms are encoded as sequences of bytes. The associated operating systems, application programs, and terminals understand these byte sequences as individual ideograms. Moreover, all of these encodings allow intermixing of regular single-byte characters with the ideogram byte sequences. Just how difficult it is to recognize distinct ideograms depends on the encoding scheme used.

The term ``multibyte character'' is defined by ANSI C to denote a byte sequence that encodes an ideogram, no matter what encoding scheme is employed. All multibyte characters are members of the so-called ``extended character set.'' (A regular single-byte character is just a special case of a multibyte character.) Essentially the only requirement placed on the encoding is that no multibyte character can use a null character as part of its encoding.

ANSI C specifies that program comments, string literals, character constants, and header names are all sequences of multibyte characters.

Encoding variations

The encoding schemes come in two variations:

1. Where each multibyte character is self-identifying; therefore, any multibyte character can simply be inserted between any pair of multibyte characters. (The encoding used by the ANSI C compiler is one of these types; each byte of a non-single-byte character has the high-order bit set.)

2. Where the presence of special ``shift bytes'' changes the interpretation of subsequent bytes. An example is the method used by most fancy character terminals to get in and out of line drawing mode. For programs written in multibyte characters with a shift-state-dependent encoding, ANSI C has the additional requirement that each comment, string literal, character constant, and header name must both begin and end in the unshifted state.

Wide characters

Some of the inconvenience of handling multibyte characters would be eliminated if all characters were of a uniform number of bytes or bits. Since there can be thousands or tens of thousands of ideograms in such a character set, a 16-bit or 32-bit sized integral value should be used to hold all members.

NOTE: The full Chinese alphabet includes more than 65000 ideograms.

ANSI C includes the typedef name wchar_t as the implementation-defined integral type large enough to hold all members of the extended character set.

For each wide character there is a corresponding multibyte character and vice versa; the wide character that corresponds to a regular single-byte character is required to have the same value as its single-byte value, including the null character. However, there is no guarantee that the value of the macro EOF can be stored in a wchar_t. (Just as EOF might not be representable as a char.)

Conversion functions

ANSI C provides five library functions that manage multibyte characters and wide characters:

mblen      length of next multibyte character
mbtowc     convert multibyte character to wide character
wctomb     convert wide character to multibyte character
mbstowcs   convert multibyte character string to wide character string
wcstombs   convert wide character string to multibyte character string

The behavior of all of these functions depends on the current locale.

NOTE: See the setlocale function in ``Internationalization''.

It is expected that vendors providing compilation systems targeted to this market will supply many more string-like functions to simplify the handling of wide character strings. However, for most application programs, there is no need to convert any multibyte characters to or from wide characters. Programs such as diff, for example, will read in and write out multibyte characters, needing only to check for an exact byte-for-byte match. More complicated programs (such as grep) that use regular expression pattern matching, may need to understand multibyte characters, but only the common set of functions that manages the regular expression needs this knowledge. The program grep itself requires no other special multibyte character handling.
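
For programs that do need to look inside multibyte strings, two of the conversion functions might be used along these lines. The function names and the 64-element buffer limit are assumptions for this sketch; the results depend on the current locale, and in the ``"C"'' locale each byte is one character.

```c
#include <stdlib.h>

/* Counts the characters in a multibyte string by stepping with mblen.
   Returns -1 if an invalid sequence is found. */
long count_mb_chars(const char *s)
{
    long n = 0;
    int len;

    (void)mblen(NULL, 0);       /* reset any shift state */
    while ((len = mblen(s, MB_CUR_MAX)) > 0) {
        s += len;               /* skip the bytes of one character */
        n++;
    }
    return (len < 0) ? -1L : n;
}

/* Converts to wide characters and returns how many were produced,
   or -1 on an invalid sequence. Input longer than the arbitrary
   64-element buffer is truncated. */
long count_wide_chars(const char *s)
{
    wchar_t buf[64];
    size_t n = mbstowcs(buf, s, 64);

    return (n == (size_t)-1) ? -1L : (long)n;
}
```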

C language features

To give even more flexibility to the programmer in an Asian environment, ANSI C provides wide character constants and wide string literals. These have the same form as their non-wide versions except that they are immediately prefixed by the letter L:

'x'          regular character constant
'¥'          regular character constant
L'x'         wide character constant
L'¥'         wide character constant
"abc¥xyz"    regular string literal
L"abc¥xyz"   wide string literal

Notice that multibyte characters are valid in both the regular and wide versions. The sequence of bytes necessary to produce the ideogram ¥ is encoding-specific, but if it consists of more than one byte, the value of the character constant '¥' is implementation defined, just as the value of 'ab' is implementation defined. A regular string literal contains exactly the bytes (except for escape sequences) specified between the quotes, including the bytes of each specified multibyte character.

When the compilation system encounters a wide character constant or wide string literal, each multibyte character is converted (as if by calling the mbtowc function) into a wide character. Thus the type of L'¥' is wchar_t and the type of L"abc¥xyz" is array of wchar_t with length eight. (Just as with regular string literals, each wide string literal has an extra zero-valued element appended, but in these cases it is a wchar_t with value zero.)

Just as regular string literals can be used as a short-hand method for character array initialization, wide string literals can be used to initialize wchar_t arrays:

wchar_t *wp = L"a¥z";
wchar_t x[] = L"a¥z";
wchar_t y[] = {L'a', L'¥', L'z', 0};
wchar_t z[] = {'a', L'¥', 'z', '\0'};

In the above example, the three arrays x, y, and z, and the array pointed to by wp, have the same length and all are initialized with identical values.
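
The equivalence of these initializations can be checked with a short sketch. To avoid depending on the source file's encoding, the ideogram is replaced here by the single-byte character b; the function name wide_arrays_match is hypothetical.

```c
#include <stddef.h>

static const wchar_t x[] = L"abz";
static const wchar_t y[] = {L'a', L'b', L'z', 0};
static const wchar_t z[] = {'a', 'b', 'z', '\0'};

/* Returns 1 if all four arrays have the same length and contents. */
int wide_arrays_match(void)
{
    const wchar_t *wp = L"abz";
    size_t i, n = sizeof x / sizeof x[0];   /* includes the final zero */

    if (sizeof y != sizeof x || sizeof z != sizeof x)
        return 0;
    for (i = 0; i < n; i++)
        if (x[i] != y[i] || x[i] != z[i] || x[i] != wp[i])
            return 0;
    return 1;
}
```

The comparison of 'a' with L'a' relies on the rule stated earlier that a single-byte character and its wide counterpart have the same value.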

Finally, adjacent wide string literals will be concatenated, just as with regular string literals. However, adjacent regular and wide string literals produce undefined behavior. A compiler is not even required to complain if it does not accept such concatenations.

Standard headers and reserved names

Very early in the standardization process, the ANSI Standards Committee chose to include library functions, macros, and header files as part of ANSI C. These macros, header files, and functions are necessary to write truly portable C programs; however, they also produce a large set of reserved names.

This section presents the various categories of reserved names and some rationale for their reservations. At the end is a set of rules to follow that can steer your programs clear of any reserved names.

Balancing process

In order to match existing implementations, the ANSI C committee had to choose names like printf and NULL; to have done otherwise would have disqualified virtually all existing C programs from conformance, and would obviously run counter to their charter to standardize existing practice. However, each such name reduced the set of names available for free use in C programs.

On the other hand, before standardization, implementors felt free to add both new keywords to their compilers and names to headers. This meant that no program could be guaranteed to compile from one release to another, let alone port from one vendor's implementation to another.

Thus the Committee made a hard decision: to restrict all conforming implementations from including any extra names, except those with certain forms. It is this decision, more than any other, that will cause most C compilation systems to be almost conforming. Nevertheless, the Standard contains 32 keywords and almost 250 names in its headers, none of which necessarily follow any particular naming pattern.

Standard headers

The standard headers are as follows:

assert.h   locale.h   stddef.h
ctype.h    math.h     stdio.h
errno.h    setjmp.h   stdlib.h
float.h    signal.h   string.h
limits.h   stdarg.h   time.h

Most implementations will provide more headers, but a strictly conforming ANSI C program can only use these.

Other standards disagree slightly regarding the contents of some of these headers. For example, POSIX (IEEE 1003.1) specifies that fdopen is declared in stdio.h. To allow these two standards to coexist, POSIX requires the macro _POSIX_SOURCE to be #defined prior to the inclusion of any header to guarantee that these additional names exist. (In actuality, the POSIX committee believes that almost the opposite will occur: that _POSIX_SOURCE will be used to limit headers to only those names POSIX describes, and that, by default, headers will contain even more names than POSIX specifies.) X/Open, in its Portability Guide, has also used this macro scheme for its extensions. X/Open's macro is _XOPEN_SOURCE. The following section describes why this scheme is sufficient.

ANSI C requires the standard headers to be both self-sufficient and idempotent. No standard header needs any other header to be #included before or after it, and each standard header can be #included more than once without causing problems. The Standard also requires that its headers be #included only in safe contexts so that the names used in the headers are guaranteed to remain unchanged.

Names reserved for implementation use

The Standard places further restrictions on implementations regarding their libraries. While in the past, most programmers learned not to use names like read and write for their own functions on UNIX® Systems (usually after encountering interesting program behavior), ANSI C requires that only names reserved by the Standard be introduced by references within the implementation.

Thus the Standard reserves a subset of all possible names for implementations to use as they so choose. This class of names consists of identifiers that begin with an underscore and continue with either another underscore or a capital letter. The class of names contains all names matching the following regular expression:

_[_A-Z][0-9_a-zA-Z]*

Strictly speaking, if your program uses such an identifier, its behavior is undefined. Thus, programs using _POSIX_SOURCE (or _XOPEN_SOURCE) have undefined behavior.

However, undefined behavior comes in different degrees. If, in a POSIX-conforming implementation you use _POSIX_SOURCE, you know that your program's ``undefined behavior'' consists of certain additional names in certain headers, and your program still conforms to an accepted standard. This deliberate loophole in the ANSI C standard allows implementations to conform to seemingly incompatible specifications. On the other hand, an implementation that does not conform to the POSIX standard is free to behave in any manner when encountering a name such as _POSIX_SOURCE.

The Standard also reserves all other names that begin with an underscore for use in header files as regular file scope identifiers and as tags for structures and unions, but not in local scopes. This means that the common existing practice of having functions named _filbuf and _doprnt to implement hidden parts of the library is sanctioned.
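
The two classes of underscore names described in this section could be told apart with a check like the one below. This is only a sketch: the function name underscore_class and its return codes are invented here, and it assumes the argument is already a valid identifier and that the ``"C"'' locale is in effect for isupper.

```c
#include <ctype.h>

/* Returns 2 if name begins with an underscore followed by another
   underscore or a capital letter (reserved for any use),
   1 if it merely begins with an underscore (reserved at file scope
   and as a tag), and 0 otherwise. */
int underscore_class(const char *name)
{
    if (name[0] != '_')
        return 0;
    if (name[1] == '_' || isupper((unsigned char)name[1]))
        return 2;
    return 1;
}
```

For example, _POSIX_SOURCE falls in the first class, while _filbuf falls in the second.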

Names reserved for expansion

In addition to all the names explicitly reserved, the Standard also reserves (for implementations and future standards) names matching certain patterns:

errno.h    E[0-9A-Z].*
ctype.h    (to|is)[a-z].*
locale.h   LC_[A-Z].*
math.h     current function names[fl]
signal.h   (SIG|SIG_)[A-Z].*
stdlib.h   str[a-z].*
string.h   (str|mem|wcs)[a-z].*

In the above lists, names that begin with a capital letter are macros and are thus reserved only when the associated header is included. The rest of the names designate functions and therefore cannot be used to name any global objects or functions.

Names safe to use

As you can tell by now, the rules regarding when certain names are reserved are complicated. There are, however, four fairly simple rules you can follow to keep from colliding with any ANSI C reserved names:

1. #include all system headers at the top of your source files (except possibly after a #define of _POSIX_SOURCE or _XOPEN_SOURCE, or both).

2. Do not define or declare any names that begin with an underscore.

3. Use an underscore or a capital letter somewhere within the first few characters of all file scope tags and regular names. (But beware of the ``va_'' prefix found in stdarg.h or varargs.h.)

4. Use a digit or a non-capital letter somewhere within the first few characters of all macro names. (But note that almost all names beginning with an E are reserved if errno.h is #included.)

As noted earlier, most implementations will continue to add names to the standard headers by default. Therefore these rules are just a guideline to follow.
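
One possible way to apply the four rules is sketched below. Every identifier in it (Tb_MAX, Tb_entry, tb_slots, tb_used, tb_add) is invented for illustration, and the ``tb_'' prefix is merely one naming scheme that happens to satisfy the rules.

```c
#include <stddef.h>     /* rule 1: system headers come first */

/* Rule 4: a non-capital letter within the first few characters
   of a macro name. */
#define Tb_MAX 4

/* Rule 3: a capital letter and an underscore early in the tag. */
struct Tb_entry {
    int key;
};

/* Rule 3: an underscore within the first few characters of each
   file scope name. Rule 2: no name begins with an underscore. */
static struct Tb_entry tb_slots[Tb_MAX];
static size_t tb_used;

/* Stores key in the next free slot; returns 1 on success, 0 if full. */
int tb_add(int key)
{
    if (tb_used >= Tb_MAX)
        return 0;
    tb_slots[tb_used++].key = key;
    return 1;
}
```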

Internationalization

A previous section introduced the ``internationalization'' of the standard libraries. (See ``Multibyte Characters and Wide Characters.'') This section discusses the affected library functions and gives some hints on how programs should be written to take advantage of these features.

Locales

At any time, a C program has a current ``locale'' -- a collection of information that describes the conventions appropriate to some nationality, culture, and language. Locales have names that are strings and the only two standardized locale names are ``"C"'' and ``""''. Each program begins in the ``"C"'' locale which unsurprisingly causes all library functions to behave just like they have historically. The ``""'' locale is the implementation's best guess at the correct set of conventions appropriate to the program's invocation. (Of course ``"C"'' and ``""'' can cause identical behavior.) Other locales may be provided by implementations.

For the purposes of practicality and expediency, locales are partitioned into a set of categories. A program can change the complete locale (all categories) or one or more categories, leaving the other categories unchanged. Generally each category affects a set of functions disjoint from the functions affected by other categories, so temporarily changing one category can make sense.

The setlocale function

The setlocale function is the interface to the program's locale. In general, any program that requires the invocation country's conventions should place a call such as

#include <locale.h>
/*...*/
setlocale(LC_ALL, "");

early in the program's execution path. This causes the program's current locale to change to the appropriate local version (if possible), since LC_ALL is the macro that specifies the entire locale instead of one category. The following are the standard categories:

LC_COLLATE    sorting information
LC_CTYPE      character classification information
LC_MONETARY   currency printing information
LC_NUMERIC    numeric printing information
LC_TIME       date and time printing information

Any of these macros can be passed as the first argument to setlocale to specify just that category.

The setlocale function returns the name of the current locale for a given category (or LC_ALL) and serves in an inquiry-only capacity when its second argument is a null pointer. Thus, code along the lines of the following can be used to change the locale or a portion thereof for a limited duration:

#include <locale.h>
/*...*/
char *oloc;
/*...*/
oloc = setlocale(LC_cat, NULL);
if (setlocale(LC_cat, "new") != 0)
{
    /* use temporarily changed locale */
    (void)setlocale(LC_cat, oloc);
}

Most programs will never need this capability.
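
For the programs that do need it, a somewhat more defensive variant of the save-and-restore pattern is sketched below. It copies the locale name before changing the locale, on the assumption that the string returned by setlocale may be overwritten by a later call. The function name run_in_locale and its callback parameter are hypothetical.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Switches one category to the named locale, calls work (if not NULL),
   and restores the previous setting. Returns 1 if the switch succeeded,
   0 otherwise. */
int run_in_locale(int category, const char *name, void (*work)(void))
{
    const char *cur = setlocale(category, NULL);   /* inquiry only */
    char *saved;
    int ok = 0;

    if (cur == NULL)
        return 0;
    saved = malloc(strlen(cur) + 1);   /* keep a private copy */
    if (saved == NULL)
        return 0;
    strcpy(saved, cur);

    if (setlocale(category, name) != NULL) {
        if (work != NULL)
            work();                    /* runs in the temporary locale */
        (void)setlocale(category, saved);
        ok = 1;
    }
    free(saved);
    return ok;
}
```

A call such as run_in_locale(LC_NUMERIC, "C", NULL) always succeeds, since the ``"C"'' locale is guaranteed to exist.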

Changed functions

Wherever possible and appropriate, existing library functions were extended to include locale-dependent behavior. These functions came in two groups: those declared by the ctype.h header (character classification and conversion), and those functions that convert to and from printable and internal forms of numeric values (for example, printf and strtod).

All ctype.h predicate functions except isdigit and isxdigit are allowed to return nonzero (true) for additional characters when the LC_CTYPE category of the current locale is other than ``"C"''. In a Spanish locale, isalpha('ñ') should be true. Similarly the character conversion functions (tolower and toupper) should appropriately handle any extra alphabetic characters identified by the isalpha function. (As an implementation note, the ctype.h functions are almost always macros that are implemented using table lookups indexed by the character argument. Their behavior is changed by resetting the table(s) to the new locale's values, and therefore there is no performance impact.)

Those functions that write or interpret printable floating values may change to use a decimal-point character other than period (.) when the LC_NUMERIC category of the current locale is other than ``"C"''. There is no provision for converting any numeric values to printable form with thousands separator-type characters, but when converting from a printable form to an internal form, implementations are allowed to accept such additional forms, again in other than the ``"C"'' locale. Those functions that make use of the decimal-point character are the printf and scanf families, atof, and strtod. Those functions that are allowed implementation-defined extensions are atof, atoi, atol, strtod, strtol, strtoul, and the scanf family. (ANSI C currently defines no such extensions.)

New functions

Certain locale-dependent capabilities were added as new standard functions. Besides setlocale which allows control over the locale itself, the Standard includes the following new functions:

localeconv   numeric/monetary conventions
strcoll      collation order of two strings
strxfrm      translate string for collation
strftime     formatted date/time conversion

and the multibyte functions previously discussed (mblen, mbtowc, mbstowcs, wctomb, and wcstombs).

The localeconv function returns a pointer to a structure containing information useful for formatting numeric and monetary information appropriate to the current locale's LC_NUMERIC and LC_MONETARY categories. (This is the only function whose behavior depends on more than one category.) For numeric values the structure describes the decimal-point character, the thousands separator, and where the separator(s) should be located. There are fifteen other structure members that describe how to format a monetary value!
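
A brief sketch of reading the numeric members follows. The two helper names are invented here; decimal_point and thousands_sep are the struct lconv members that hold the strings described above.

```c
#include <locale.h>

/* Returns the first character of the current locale's decimal point. */
char decimal_point_char(void)
{
    const struct lconv *lc = localeconv();

    return lc->decimal_point[0];
}

/* Returns 1 if the current locale defines a thousands separator. */
int has_thousands_sep(void)
{
    const struct lconv *lc = localeconv();

    return lc->thousands_sep[0] != '\0';
}
```

In the ``"C"'' locale, in which every program starts, the decimal point is a period and there is no thousands separator.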

The strcoll function is analogous to the strcmp function except that the two strings are compared according to the LC_COLLATE category of the current locale. As this comparison is not necessarily as inexpensive as strcmp, the strxfrm function can be used to transform a string into another, such that any two such after-translation strings can be passed to strcmp and get an ordering analogous to what strcoll would have returned if passed the two pre-translation strings.
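
The relationship between strcoll and strxfrm can be sketched as follows. The helper names xfrm_copy and xfrm_compare are assumptions for this example; in practice the transformed strings would be kept and reused, since that is where the savings over repeated strcoll calls come from.

```c
#include <stdlib.h>
#include <string.h>

/* Transforms s for collation into freshly allocated storage.
   Returns NULL if memory cannot be obtained; the caller frees it. */
static char *xfrm_copy(const char *s)
{
    size_t need = strxfrm(NULL, s, 0) + 1;   /* length query only */
    char *out = malloc(need);

    if (out != NULL)
        (void)strxfrm(out, s, need);
    return out;
}

/* Compares two strings by way of their transformed forms.
   Returns -1, 0, or 1 with the same sign strcoll would give,
   or -2 if memory could not be obtained. */
int xfrm_compare(const char *a, const char *b)
{
    char *ta = xfrm_copy(a);
    char *tb = xfrm_copy(b);
    int r = -2;

    if (ta != NULL && tb != NULL) {
        int c = strcmp(ta, tb);
        r = (c > 0) - (c < 0);
    }
    free(ta);
    free(tb);
    return r;
}
```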

The strftime function provides a sprintf-like formatting of the values in a struct tm, along with some date and time representations that depend on the LC_TIME category of the current locale. This function is based on the ascftime function released as part of UNIX® System V Release 3.2.
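
A minimal sketch of strftime in use follows. The function name iso_date and the buffer size are arbitrary choices for the example. The numeric conversions %Y, %m, and %d are used because their output does not vary with the locale; conversions such as %x or %A would follow the LC_TIME category instead.

```c
#include <time.h>

/* Formats a calendar date as YYYY-MM-DD into a static buffer.
   Returns NULL if strftime reports that the result did not fit. */
const char *iso_date(int year, int month, int day)
{
    static char buf[32];
    struct tm t = {0};

    t.tm_year = year - 1900;   /* years since 1900 */
    t.tm_mon = month - 1;      /* months since January */
    t.tm_mday = day;
    return strftime(buf, sizeof buf, "%Y-%m-%d", &t) ? buf : NULL;
}
```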

Grouping and evaluation in expressions

One of the choices made by Dennis Ritchie in the design of C was to give compilers license to rearrange expressions involving adjacent operators that are mathematically commutative and associative, even in the presence of parentheses. This was explicitly noted in the "Reference Manual Appendix" of The C Programming Language, Kernighan and Ritchie (First Edition). ANSI C does not grant compilers this same freedom.

This section discusses the differences between these two definitions of C and clarifies the distinctions between an expression's side effects, grouping, and evaluation by considering the expression statement from the following code fragment.

int i, *p, f(void), g(void);
/*...*/
i = *++p + f() + g();

Definitions

The side effects of an expression are its modifications to memory and its accesses to volatile-qualified objects. The side effects in the above expression are the updating of i and p and any side effects contained within the functions f and g.

An expression's ``grouping'' is the way values are combined with other values and operators. The above expression's grouping is, primarily, the order in which the additions are performed.

An expression's ``evaluation'' includes everything necessary to produce its resulting value. To evaluate an expression, all specified side effects must occur (anywhere between the previous and next sequence point) and the specified operations are performed with a particular grouping. For the above expression, the updating of i and p must occur after the previous statement and by the ``;'' of this expression statement; the calls to the functions can occur in either order, any time after the previous statement, but before their return values are used. In particular, note that the operators that cause memory to be updated have no requirement to assign the new value before the value of the operation is used.

The Kernighan and Ritchie C rearrangement license

The Kernighan and Ritchie C rearrangement license applies to the above expression because addition is mathematically commutative and associative. To distinguish between regular parentheses and the actual grouping of an expression, the left and right curly braces will designate grouping. The three possible groupings for the expression are

i = { {*++p + f()} + g() };
i = { *++p + {f() + g()} };
i = { {*++p + g()} + f() };

all of which are valid given Kernighan and Ritchie C rules. Moreover, all of these groupings are valid even ifthe expression were written instead, for example, in either of these ways:

Complying with standard C

Grouping and evaluation in expressions 112

Page 120: nicolascormier.comnicolascormier.com/documentation/sys-programming/...Table of Contents Introduction to programming in standard C and C++...................................................................................1

    i = *++p + (f() + g());
    i = (g() + *++p) + f();

If this expression is evaluated on an architecture for which either overflows cause an exception or addition and subtraction are not inverses across an overflow, these three groupings will behave differently if one of the additions overflows.

For such expressions on these architectures, the only recourse available in Kernighan and Ritchie C was to split the expression to force a particular grouping. The following are possible rewrites that respectively enforce the above three groupings.

    i = *++p; i += f(); i += g();
    i = f(); i += g(); i += *++p;
    i = *++p; i += g(); i += f();

The ANSI C rules

ANSI C does not allow operations to be rearranged that are mathematically commutative and associative, but that are not actually so on the target architecture. Thus the precedence and associativity of the ANSI C grammar completely describes the grouping for all expressions; all expressions must be grouped as they are parsed. The expression under consideration is grouped in this manner:

    i = { {*++p + f()} + g() };

(This still does not mean that f must be called before g, or that p must be incremented before g is called.) In ANSI C, expressions need not be split to guard against unintended overflows.

Parentheses grouping and evaluation

ANSI C is often erroneously described as honoring parentheses or evaluating according to parentheses due to an incomplete understanding or an inaccurate presentation.

Since ANSI C expressions simply have the grouping specified by their parsing, parentheses still only serve as a way of controlling how an expression is parsed; the natural precedence and associativity of expressions carry exactly the same weight as parentheses.

The above expression could have been written as

i = (((*(++p)) + f()) + g());

with no different effect on its grouping or evaluation.

The ``As If'' rule

There were good reasons for the Kernighan and Ritchie C rearrangement rules:

• The rearrangements provide many more opportunities for optimizations such as compile-time constant folding.

• The rearrangements do not change the result of integral-typed expressions on most machines.

• Some of the operations are both mathematically and computationally commutative and associative on all machines.


The ANSI C committee eventually became convinced that the rearrangement rules were intended to be an instance of the ``as if'' rule when applied to the described target architectures. ANSI C's ``as if'' rule is a general license that permits an implementation to deviate arbitrarily from the abstract machine description as long as the deviations do not change the behavior of a valid C program.

Thus all the binary bitwise operators (other than shifting) are allowed to be rearranged on any machine because there is simply no way to notice such regroupings. On typical two's complement machines in which overflow silently wraps around, integer expressions involving multiplication or addition can be rearranged for the same reason.

Therefore, this change in C does not have a significant impact on most C programmers.

Incomplete types

The ANSI C standard introduced the term ``incomplete type'' to formalize a fundamental, yet misunderstood, portion of C, implicit from its beginnings. This article describes incomplete types, where they are permitted, and why they are useful.

Types

ANSI separates C's types into three distinct sets: function, object, and incomplete. Function types are obvious; object types cover everything else, except when the size of the object is not known. The Standard uses the term ``object type'' to specify that the designated object must have a known size, but it is important to know that incomplete types other than void also refer to an object.

There are only three variations of incomplete types: void, arrays of unspecified length, and structures and unions with unspecified content. The type void differs from the other two in that it is an incomplete type that cannot be completed, and it serves as a special function return and parameter type.

Completing incomplete types

An array type is completed by specifying the array size in a following declaration in the same scope that denotes the same object. (Also, when an array without a size is declared and initialized in the same declaration, the array has an incomplete type only between the end of its declarator and the end of its initializer.)

An incomplete structure or union type is completed by specifying the content in a following declaration in the same scope for the same tag.

Declarations

Certain declarations can use incomplete types, but others require (complete) object types. Those declarations that require object types are array elements, members of structures or unions, and objects local to a function. All other declarations permit incomplete types. In particular, the following are permitted:

• pointers to incomplete types

• functions returning incomplete types

• incomplete function parameter types

• typedef names for incomplete types


The function return and parameter types are special. Except for void, an incomplete type used in such a manner must be completed by the time the function is defined or called. (A return type of void specifies a function that returns no value and a single parameter type of void specifies a function that accepts no arguments.)

Note that since array and function parameter types are rewritten to be pointer types, a seemingly incomplete array parameter type is not actually incomplete. The typical declaration of main's argv (namely, char *argv[]) as an unspecified length array of character pointers, is rewritten to be a pointer to character pointers.

Expressions

Most expression operators require (complete) object types. The only three exceptions are the unary & operator, the first operand of the comma operator, and the second and third operands of the ?: operator. Most operators that accept pointer operands also permit pointers to incomplete types, unless pointer arithmetic is required. The list includes the unary * operator, even though some may find this surprising. For example, given:

void *p

&*p is a valid subexpression that makes use of this.

Justification

C would have been simpler without incomplete types. However, they are necessary for void, and there is one feature provided by incomplete types that C has no other way to handle, and that has to do with forward references to structures and unions. If one has two structures that need pointers to each other, the only way to do so (without resorting to potentially invalid casts) is with incomplete types:

    struct a { struct b *bp; };
    struct b { struct a *ap; };

All strongly typed programming languages that have some form of pointer and heterogeneous data types provide some method of handling this case.

Examples

Defining typedef names for incomplete structure and union types is frequently quite useful. If one has a complicated bunch of data structures that contain many pointers to each other, having a list of typedefs to the structures up front (possibly in a central header) can simplify the declarations.

    typedef struct item_tag Item;
    typedef union note_tag Note;
    typedef struct list_tag List;
    . . .
    struct item_tag { . . . };
    . . .
    struct list_tag {
        List *next;
        . . .
    };

Moreover, for those structures and unions whose contents should not be available to the rest of the program, a header can declare the tag without the content. Other parts of the program can use pointers to the incomplete structure or union without any problems (unless they attempt to use any of its members).

A frequently used incomplete type is an external array of unspecified length. It generally is not necessary to know the extent of an array to make use of its contents.

    extern char *tzname[];
    . . .
    (void)fputs("Alternate time zone: ", stdout);
    (void)puts(tzname[1]);

Compatible and composite types

With Kernighan and Ritchie C, and even more so with ANSI C, it is possible for two declarations that refer to the same entity to be other than identical. The term ``compatible type'' is used in ANSI C to denote those types that are ``close enough.'' This section describes compatible types as well as ``composite types'' -- the result of combining two compatible types.

Multiple declarations

If a C program were only allowed to declare each object or function once, there would be no need for compatible types. But linkage (which allows two or more declarations to refer to the same entity), function prototypes, and separate compilation all need such a capability. Not too surprisingly, separate translation units (source files) have different rules for type compatibility than within a single translation unit.

Separate compilation compatibility

Since each compilation probably looks at different source files, most of the rules for compatible types across separate compiles are structural in nature:

• Matching scalar (integral, floating, and pointer) types must be compatible, as if they were in the same source file.

• Matching structures, unions, and enums must have the same number of members and each matching member must have a compatible type (in the separate compilation sense), including bit-field widths.

• Matching structures must have the members in the same order. (The order of union and enum members does not matter.)

• Matching enum members must have the same value.

An additional requirement is that the names of members (including the lack of names for unnamed members) match for structures, unions, and enums, but not necessarily their respective tags.

Single compilation compatibility

When two declarations in the same scope describe the same object or function, the two declarations must specify compatible types. These two types are then combined into a single composite type that is compatible with the first two. More about composite types later.

The compatible types are defined recursively. At the bottom are type specifier keywords. (These are the rules that say that unsigned short is the same as unsigned short int, and that a type without type specifiers is the same as one with int.) All other types are compatible only if the types from which they are derived are compatible. For example, two qualified types are compatible if the qualifiers (const and volatile) are identical and the unqualified base types are compatible.

Compatible pointer types

For two pointer types to be compatible, the types they point to must be compatible and the two pointers must be identically qualified. Recall that the qualifiers for a pointer are specified after the *, so that these two declarations

    int *const cpi;
    int *volatile vpi;

declare two differently qualified pointers to the same type, int.

Compatible array types

For two array types to be compatible, their element types must be compatible, and, if both array types have a specified size, they must match. This last part means that an incomplete array type (see ``Incomplete Types'') is compatible both with another incomplete array type and an array type with a specified size.

Compatible function types

For two function types to be compatible, their return types must be compatible. If either or both function types have prototypes, the rules get more complicated.

For two function types with prototypes to be compatible, they also must have the same number of parameters (including use of the ellipsis (``...'') notation) and the corresponding parameters must be parameter-compatible.

For an old style function definition to be compatible with a function type with a prototype, the prototype parameters must not end with an ellipsis (``...'') and each of the prototype parameters must be parameter-compatible with the corresponding old style parameter, after application of the default argument promotions.

For an old style function declaration (not a definition) to be compatible with a function type with a prototype, the prototype parameters must not end with an ellipsis (``...'') and all of the prototype parameters must have types that would be unaffected by the default argument promotions.

For two types to be parameter-compatible, the types must be compatible after the (top level) qualifiers, if any, have been removed, and after a function or array type has been converted to the appropriate pointer type.

Special cases

There are a few surprises in this area. For example, signed int behaves the same as int except possibly for bit-fields, in which a plain int may denote an unsigned-behaving quantity.

Another interesting note is that each enumeration type must be compatible with some integral type. For portable programs this means that enumeration types effectively are separate types, and, for the most part, the ANSI C standard views them in that manner.


Composite type

The construction of a composite type from two compatible types is also recursively defined. The ways compatible types can differ from each other are due either to incomplete arrays or to old style function types. As such, the simplest description of the composite type is that it is the type compatible with both of the original types, including every available array size and every available parameter list from the original types.


C++ language

The international standard for the C++ language and standard library is ISO/IEC 14882. This includes ANSI and all other member national body standards organizations.

This C++ compiler implements almost all of this standard.

This topic discusses the C++ ``dialect'' accepted by the compiler, and describes the compilation options which allow you to control the interpretation of the source code.

Compilation modes

The C++ compilation system has four compilation modes that correspond to degrees of compliance with the proposed standard and with cfront. You can specify which of these compilation modes the compiler should use to interpret your code by using the -X option.

-X str controls the interpretation of the C++ source code with respect to language dialect. The option argument str can be one of the following:

d
    Compile the default dialect of the language. This implements almost all of the ISO C++ standard, but with less strict checking than in the next two options. See ``C++ dialect accepted'' below for a more detailed description of this dialect. This option is the default.

w
    Enable strict ISO conformance mode. This mode issues warnings when features not in the ISO standard are used, and disables features that conflict with the standard.

e
    Same as -Xw except that errors are issued instead of warnings.

o
    Enable old cfront transition mode. This causes the compiler to accept language constructs and anachronisms that, while not part of the C++ language definition, are accepted by the cfront C++ Language System releases 2.1 or 3.0.x. Use of these constructs and anachronisms is discouraged unless they occur in existing code that is difficult to change.

C++ dialect accepted

Depending on the -X option specified, the C++ compiler accepts a dialect of the language conforming to the latest version of the ISO standard, plus extensions, or compatible with the older cfront dialect.

Normal C++ mode

By default (or with the -Xd option), the compiler implements almost all of the language and library features of the ISO standard. The only features in the standard that are not implemented are:


• Two-phase name binding in templates, as described in sections 14.6 and 14.6.2 of the standard, is not implemented.

• A partial specialization of a class member template cannot be added outside of the class definition.

• The export keyword for templates is not implemented.

• Placement delete is not implemented.

• Function-try-block (a try block that is the top-level statement of a function, constructor, or destructor) is not implemented.

• Support for multibyte characters in source is not implemented (but support for universal character set escapes such as \u05d0 is implemented).

• String literals do not have const type.

Extensions accepted in normal C++ mode

By default, the following extensions are accepted (except when strict ISO violations are diagnosed as errors via the -Xe option). Using them is discouraged unless they occur in existing code which is difficult to change.

• A friend declaration for a class may omit the class keyword:

    class B;
    class A {
        friend B; // Should be "friend class B"
    };

• Constants of scalar type may be defined within classes (this is an old form; the modern form uses an initialized static data member):

    class A {
        const int size = 10;
        int a[size];
    };

• In the declaration of a class member, a qualified name may be used:

    struct A {
        int A::f(); // Should be int f();
    };

• The preprocessing symbol c_plusplus is defined in addition to the standard __cplusplus.

• A pointer to a constant type can be deleted.

• An assignment operator declared in a derived class with a parameter type matching one of its base classes is treated as a ``default'' assignment operator -- that is, such a declaration blocks the implicit generation of a copy assignment operator. (This is cfront behavior that is known to be relied upon in at least one widely used library.) Here's an example:

    struct A { };
    struct B : public A {
        B& operator=(A&);
    };

  By default, as well as in cfront-compatibility mode, there will be no implicit declaration of B::operator=(const B&), whereas in strict-ISO mode B::operator=(A&) is not a copy assignment operator and B::operator=(const B&) is implicitly declared.

• Implicit type conversion between a pointer to an extern "C" function and a pointer to an extern "C++" function is permitted:

    extern "C" void f();
    void (*pf) () = &f; // allowed


• The long long and unsigned long long built-in types are supported, representing 64-bit signed and unsigned integers.

• The NCEG Floating Point Specification relational operators are supported. These extend the set of basic comparisons to support the notion of "unordered" from the IEEE 754 floating point standard.

In addition, the compiler accepts most of the preprocessing and assembly-level extensions of the UnixWare® C compiler. These include #assert, #ident, #pragma ident, #pragma weak, #pragma pack, old-style asms and enhanced asms.

Anachronisms accepted

The following anachronisms are accepted when anachronisms are enabled via the -Xo option. Using them is discouraged unless they occur in existing code that is difficult to change.

• overload is allowed in function declarations. It is accepted and ignored.

• Definitions are not required for static data members that can be initialized using default initialization. The anachronism does not apply to static data members of template classes; they must always be defined.

• The number of elements in an array may be specified in an array delete operation. The value is ignored.

• A single operator++() and operator--() function can be used to overload both prefix and postfix operations.

• The base class name may be omitted in a base class initializer if there is only one immediate base class.

• A bound function pointer (a pointer to a member function for a given object) can be cast to a pointer to a function.

• A nested class name may be used as a nonnested class name provided no other class of that name has been declared. The anachronism is not applied to template classes.

• A reference to a non-const type may be initialized from a value of a different type. A temporary is created, it is initialized from the (converted) initial value, and the reference is set to the temporary.

• A function with old-style parameter declarations is allowed and may participate in function overloading as though it were prototyped. Default argument promotion is not applied to parameter types of such functions when the check for compatibility is done, so that the following declares the overloading of two functions named f:

    int f(int);
    int f(x) char x; { return x; }

  Note that in C this code is legal but has a different meaning: a tentative declaration of f is followed by its definition.

• A reference to a non-const class can be bound to a class rvalue of the same type or a derived type thereof. Example:

    class A {
    public:
        A(int);
        A operator=(A&);
        A operator+(const A&);
    };
    void f() {
        A b(1);
        b = A(1) + A(2); // allowed
    }


Extensions accepted in cfront transition mode

The following extensions are accepted in cfront transition mode via the -Xo option. Using them is discouraged unless they occur in existing code that would be difficult to change.

NOTE: This mode does not necessarily provide complete compatibility with cfront Release 3.0. Some changes may be required in user code.

• Because cfront does not check the accessibility of types, access errors for types are issued as warnings instead of errors.

• A reference to a pointer type may be initialized from a pointer value without use of a temporary even when the reference pointer type has additional type qualifiers above those present in the pointer value. For example,

    int *p;
    const int *&r = p; // No temporary used

• No warning is issued when an operator()() function has default argument expressions.

• Virtual function table pointer update code is not generated in destructors for base classes of classes without virtual functions, even if the base class virtual functions might be overridden in a further-derived class. For example:

    struct A {
        virtual void f() {}
        A() {}
        ~A() {}
    };
    struct B : public A {
        B() {}
        ~B() {f();} // Should call A::f according to ARM 12.7
    };
    struct C : public B {
        void f() {}
    } c;

  In cfront transition mode, B::~B calls C::f.

• An alternate form of declaring pointer-to-member-function variables is supported, namely:

    struct A {
        void f(int);
        static void sf(int);
        typedef void A::T3(int); // nonstd typedef decl
        typedef void T2(int);    // std typedef
    };
    typedef void A::T(int);  // nonstd typedef decl
    T* pmf = &A::f;          // nonstd ptr-to-member decl
    A::T2* pf = A::sf;       // std ptr to static mem decl
    A::T3* pmf2 = &A::f;     // nonstd ptr-to-member decl

  where T is construed to name a routine type for a nonstatic member function of class A that takes an int argument and returns void; the use of such types is restricted to nonstandard pointer-to-member declarations. The declarations of T and pmf in combination are equivalent to a single standard pointer-to-member declaration:

    void (A::* pmf)(int) = &A::f;


  A nonstandard pointer-to-member declaration that appears outside of a class declaration, such as the declaration of T, is normally invalid and would cause an error to be issued. However, for declarations that appear within a class declaration, such as A::T3, this feature changes the meaning of a valid declaration. cfront version 2.1 accepts declarations, such as T, even when A is an incomplete type; so this case is also excepted.

• Protected member access checking is not done when the address of a protected member is taken. For example:

    class B { protected: int i; };
    class D : public B { void mf(); };

    void D::mf() {
        int B::* pmi1 = &B::i; // error, OK in cfront mode
        int D::* pmi2 = &D::i; // OK
    }

  Note that protected member access checking for other operations (i.e., everything except taking a pointer-to-member address) is done in the normal manner.

• The destructor of a derived class may implicitly call the private destructor of a base class. In default mode this is an error but in cfront mode it is reduced to a warning. For example:

    class A { ~A(); };
    class B : public A { ~B(); };
    B::~B(){} // Error except in cfront mode

• An extra comma is allowed after the last argument in an argument list, as for example in

    f(1, 2, );

• When disambiguation requires deciding whether something is a parameter declaration or an argument expression, the pattern type-name-or-keyword (identifier...) is treated as an argument. For example:

    class A { A(); };
    double d;
    A x(int(d));
    A (x2);

  By default int(d) is interpreted as a parameter declaration (with redundant parentheses), and so x is a function; but in cfront transition mode int(d) is an argument and x is a variable. The declaration A(x2); is also misinterpreted by cfront. It should be interpreted as the declaration of an object named x2, but in cfront mode is interpreted as a function style cast of x2 to the type A.

  Similarly, the declaration

    int xyz(int() );

  declares a function named xyz, that takes a parameter of type "function taking no arguments and returning an int." In cfront mode this is interpreted as a declaration of an object that is initialized with the value int() (which evaluates to zero).

• A named bit-field may have a size of zero. The declaration is treated as though no name had been declared.

• Type qualifiers on the this parameter may be dropped in contexts such as this example:

    struct A {
        void f() const;
    };
    void (A::*fp)() = &A::f;

  This is actually a safe operation. A pointer to a const function may be put into a pointer to non-const, because a call using the pointer is permitted to modify the object and the function pointed to will actually not modify the object. The opposite assignment would not be safe.

• Conversion operators specifying conversion to void are allowed.

• A nonstandard friend declaration may introduce a new type. A friend declaration that omits the elaborated type specifier is allowed in default mode, but in cfront mode the declaration is also allowed to introduce a new type name.

    struct A { friend B; };

• The third operand of the ? operator is a conditional expression instead of an assignment expression as it is in the current ISO standard.

• A reference may be initialized with a null.

• When matching arguments of an overloaded function, a const variable with value zero is not considered to be a null pointer constant.

• Plain bit fields (i.e., bit fields declared with a type of int) are always unsigned.

• The name given in an elaborated type specifier is permitted to be a typedef name that is the synonym for a class name, e.g.,

    typedef class A T;
    class T *pa; // No error in cfront mode

• No warning is issued on duplicate size and sign specifiers.

    short short int i; // No warning in cfront mode

• A constant pointer-to-member-function may be cast to a pointer-to-function. A warning is issued.

    struct A {int f();};
    main () {
        int (*p)();
        p = (int (*)())A::f; // Okay, with warning
    }

• Arguments of class types that allow bitwise copy construction but also have destructors are passed by value (i.e., like C structures), and the destructor is not called on the ``copy.'' In normal mode, the class object is copied into a temporary, the address of the temporary is passed as the argument, and the destructor is called on the temporary after the call returns. Note that because the argument is passed differently (by value instead of by address), code like this compiled in cfront mode is not calling-sequence compatible with the same code compiled in normal mode. In practice, this is not much of a problem, since classes that allow bitwise copying usually do not have destructors.

• A union member may be declared to have the type of a class for which the user has defined an assignment operator (as long as the class has no constructor or destructor). A warning is issued.

• When an unnamed class appears in a typedef declaration, the typedef name may appear as the class name in an elaborated type specifier.

    typedef struct { int i, j; } S;
    struct S x; // No error in cfront mode

• Two member functions may be declared with the same parameter types when one is static and the other is non-static with a function qualifier.

    class A {
        void f(int) const;
        static void f(int); // No error in cfront mode
    };

• The scope of a variable in the for-init-statement is the scope to which the for statement belongs.

    int f(int i) {
        for (int j = 0; j < i; ++j) { /* ... */ }
        return j; // No error in cfront mode
    }

• Function types differing only in that one is declared extern "C" and the other extern "C++" can be treated as identical:

    typedef void (*PF) ();
    extern "C" typedef void (*PCF) ();
    void f(PF);
    void f(PCF);

  PF and PCF are considered identical and void f(PCF) is treated as a compatible redeclaration of f. (By contrast, in standard C++ PF and PCF are different and incompatible types -- PF is a pointer to an extern "C++" function whereas PCF is a pointer to an extern "C" function -- and the two declarations of f create an overload set.)

  Note that implicit type conversion will always be done between a pointer to an extern "C" function and a pointer to an extern "C++" function.

• Functions declared inline have internal linkage.

• enum types are regarded as integral types.

• An uninitialized const object of non-POD class type is allowed even if its default constructor is implicitly declared:

    struct A {
        virtual void f();
        int i;
    };
    const A a;

• Old-style template specializations are allowed.

• Old-style guiding declarations for function templates are allowed.

• The old-form header <new.h> is available, with its old definition.

• operator new returns zero rather than throwing an exception when no memory is available.

• In addition, keywords and syntax for the following new language features are not recognized:

    runtime type information
    array new and delete
    explicit
    namespaces
    wchar_t
    bool
    typename
    alternative operators

An assignment operator declared in a derived class with a parameter type matching one of its base classes is treated as a ``default'' assignment operator -- that is, such a declaration blocks the implicit generation of a copy assignment operator. (This is cfront behavior that is known to be relied upon in at least one widely used library.) Here's an example:

struct A { };
struct B : public A { B& operator=(A&); };

By default, as well as in cfront-compatibility mode, there will be no implicit declaration of B::operator=(const B&), whereas in strict-ISO mode B::operator=(A&) is not a copy assignment operator and B::operator=(const B&) is implicitly declared.
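The strict-ISO half of this rule can be observed with a standard-conforming compiler. The sketch below is illustrative only: the static counter calls and the two helper functions are additions, not part of the original example. It counts how often B::operator=(A&) runs.

```cpp
struct A { };

struct B : public A {
    static int calls;                            // counts uses of operator=(A&)
    B& operator=(A&) { ++calls; return *this; }
};
int B::calls = 0;

// Assigning a B to a B: in strict-ISO mode the implicitly declared
// B::operator=(const B&) is the better match, so operator=(A&) is not used.
int calls_for_b_to_b_assignment() {
    B::calls = 0;
    B x, y;
    x = y;
    return B::calls;
}

// Assigning an A to a B: only B::operator=(A&) is viable.
int calls_for_a_to_b_assignment() {
    B::calls = 0;
    B x;
    A a;
    x = a;
    return B::calls;
}
```

Under the cfront-compatible behavior described above, the first function would instead go through B::operator=(A&), because no copy assignment operator is implicitly declared.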

Instantiating C++ templates

The C++ language has a feature known as a ``template''. A template is a skeleton for defining a set of types or functions. Each type or function is created by combining the template with a set of arguments that are themselves types or values. This combining process is called ``instantiation''.

NOTE: Since templates are descriptions of entities (typically, classes) that are parameterizable according to the types they operate upon, they are sometimes called parameterized types.

For example, consider a Vector template:

template <class T>
class Vector {
    int cursize;
    T* ptr;
    void grow();
public:
    Vector();
    ~Vector();
    T& operator[](int);
};

This template takes one type argument T, has two private data members cursize and ptr, and one private member function grow. There are also a public constructor and destructor and one function operator[](int).

Suppose that you have some class A and you want to use a vector of class objects in your application. To do this, you might say:

void f()
{
    Vector<A> x;
    A a, b;

    x[17] = a;
    b = x[23];
}

Vector<A> is a ``template class'', a combination of a template with specific argument types. This is an instantiation.
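The manual gives only the declaration of Vector. The following is one possible sketch of the member definitions, written so that indexing past the end enlarges the vector, as the use of x[17] on an empty vector suggests. It rests on assumptions: here grow takes the required size as a parameter (the declaration above has none), the size accessor is an addition, and copying is simply disabled.

```cpp
template <class T>
class Vector {
    int cursize;
    T*  ptr;
    // Assumption: grow() receives the size that is needed.
    void grow(int newsize) {
        T* bigger = new T[newsize]();          // value-initialize the new slots
        for (int i = 0; i < cursize; ++i)
            bigger[i] = ptr[i];                // keep the existing elements
        delete[] ptr;
        ptr = bigger;
        cursize = newsize;
    }
    Vector(const Vector&);                     // copying is not supported in this sketch
    Vector& operator=(const Vector&);
public:
    Vector() : cursize(0), ptr(0) {}
    ~Vector() { delete[] ptr; }
    int size() const { return cursize; }       // added for illustration
    T& operator[](int i) {
        if (i >= cursize) grow(i + 1);         // enlarge on demand
        return ptr[i];
    }
};
```

Each distinct argument type used with this template, such as Vector<int> or Vector<A>, produces a separate instantiation of these members.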

The instantiation problem

An application may use multiple template classes. These have to be instantiated at some point. If this is done in each source file (actually each object file) where the template class is used, then there will be conflicts in the linker with multiply defined symbols, and wasted space.

Another issue concerns instantiating unused members of a template; this is not allowed, but determining usage requires knowledge of the whole program.
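The rule about unused members matters because a member may be valid for some argument types only. In the sketch below (all names are illustrative), Holder<NoIncrement>::bump could not be compiled, yet the code is well-formed because bump is never used and so is never instantiated:

```cpp
template <class T>
struct Holder {
    T value;
    void bump() { ++value; }                       // requires operator++ on T
    int  bytes() const { return static_cast<int>(sizeof(T)); }
};

struct NoIncrement { char pad[4]; };               // has no operator++

// Only Holder<NoIncrement>::bytes() is used here. Instantiating every
// member (for example with "template struct Holder<NoIncrement>;")
// would be an error because of bump().
int holder_bytes() {
    Holder<NoIncrement> h = Holder<NoIncrement>();
    return h.bytes();
}
```

An instantiation scheme therefore has to know which members the whole program actually uses before it can decide what to generate.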

C++ also has the notion of ``specializations'', that is, a specific version of a template entity to be used in place of the general version. A specialization cannot be discovered until link time.

One way around these problems is to defer instantiation until link time. This approach also has problems, such as recovering information about where templates and types are defined.

Coding standards and several ``manual'' schemes for doing instantiation are discussed in ``Coding standards for template definitions'' and ``Manual instantiation''. Manual means that the user is responsible for directing instantiation, rather than having it done automatically. We will also discuss ``automatic'' instantiation in ``Automatic instantiation''.

The goal of automatic instantiation is to provide painless instantiation. The programmer should be able to compile source files to object code, then link them and run the resulting program without worrying about how the necessary instantiations are done. Automatic instantiation is the default for the C++ Compilation System, and it is recommended for most purposes. However, it does have some consequences; we will give some practical tips on compile-time performance and discuss some of the things that can go wrong.

Coding standards for template definitions

In order for instantiation to work, the C++ compiler must have available both the declaration and definition of each template and template argument type that is used. Normally a template declaration is placed in a .h file, and the definition in a .C file in the same directory. For function templates, the .h file would contain a forward declaration:

template <class T> void f(T);

An application that uses templates would include the .h file, and the .C file would automatically be included by the compiler. (The file with the definition can have any valid C++ source file suffix, not only .C.) This process is known as ``implicit inclusion'', and is governed by the CC -T options implicit and no_implicit. Implicit inclusion is on by default. It may be used with either automatic or manual instantiation. It will not work for preprocessed (.i) source files.

The files containing template definitions should contain only template definitions, since they are not compiled in the normal way.

If implicit inclusion is not used, for example, if the declaration and definition files have different basenames or are in different directories, it is necessary to include both files in the application. The rule is simply that to instantiate, the template declaration and definition must both be present or else there must be known rules for finding the template definition.
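As a rough illustration of this layout, the sketch below shows the pieces in a single listing, with comments marking the file each piece would normally live in. The file names max_of.h and max_of.C and the functions max_of and larger are hypothetical.

```cpp
// ---- would go in the declaration file, e.g. max_of.h ----
template <class T> T max_of(T a, T b);             // forward declaration

// ---- would go in the definition file, e.g. max_of.C ----
// With implicit inclusion the compiler locates this file itself; without
// it, the application has to #include this file as well as the header.
template <class T> T max_of(T a, T b) {
    return a < b ? b : a;
}

// ---- application code ----
int larger(int a, int b) { return max_of(a, b); }
```

Either way, both the declaration and the definition are visible by the time max_of<int> has to be instantiated.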

Manual instantiation

Manual instantiation is best done by using the C++ language's explicit instantiation directive, since this method is portable across compiler systems. A simple example is:

class Baseball { };

template<class T> class Roster { };

template class Roster<Baseball>; // explicit instantiation
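A slightly fuller sketch of the same directive follows. The members of Roster and the function template twice are illustrative additions. It shows the directive applied both to a whole class template and to a single function template:

```cpp
class Baseball { };

template <class T>
class Roster {
    int count_;
public:
    Roster() : count_(0) {}
    void add(const T&) { ++count_; }
    int  count() const { return count_; }
};

template <class T> T twice(T x) { return x + x; }

template class Roster<Baseball>;   // instantiates every member of Roster<Baseball>
template int twice<int>(int);      // instantiates one function template specialization
```

The object file produced from a source file like this carries the instantiations whether or not anything in that file uses them.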

Manual instantiation can also be controlled through the use of #pragma instantiate, or via the command line through the use of the -T option to CC.

Pragma interface

One approach to controlling instantiation is to use #pragma in a program:

#pragma instantiate A<int> // template class name

#pragma instantiate A<int>::f // member function name

#pragma instantiate A<int>::i // static data member name

#pragma instantiate void A<int>::f(int, char) // member function declaration

#pragma instantiate char* f(int, float) // template function declaration

do_not_instantiate can be substituted for instantiate to exclude a specific member when instantiating a whole template class. For example,

#pragma instantiate A<int>
#pragma do_not_instantiate A<int>::f

Template definitions must be present in the source file (typically via an included header) for instantiation to occur. If an instantiation is explicitly requested via instantiate and no template definition is available or a specific definition is provided (a specialization), an error will be given.

template <class T> void f1(T);    // No body provided
template <class T> void g1(T);    // No body provided
void f1(int) {}                   // Specific definition
int main() {
    int i;
    double d;
    f1(i);
    f1(d);
    g1(i);
    g1(d);
}
#pragma instantiate void f1(int)  // error - specific definition
#pragma instantiate void g1(int)  // error - no body provided

f1(double) and g1(double) will not be instantiated (because no bodies were supplied) but no errors will be produced during the compilation (if no bodies are supplied at link time, a linker error will be produced).

You can specify overloaded functions by giving the complete member function declaration including all argument types. Inline and pure virtual functions cannot be instantiated.
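The pragmas are specific to this compilation system. For comparison, the sketch below shows how single entities can be named with the standard explicit instantiation directive. The template bodies and the initial value 7 are made up for illustration, and the sketch has no equivalent of do_not_instantiate.

```cpp
template <class T>
struct A {
    static int i;
    void f(int, char);
};
template <class T> int  A<T>::i = 7;
template <class T> void A<T>::f(int, char) { ++i; }

template <class T> T* f(int, float) { return 0; }

// Standard explicit instantiation directives, one entity at a time:
template void  A<int>::f(int, char);   // a member function
template int   A<int>::i;              // a static data member
template char* f<char>(int, float);    // a function template specialization
```

As with the pragmas, the template definitions must be visible at the point of each directive.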

NOTE: Another pragma, can_instantiate, indicates that a specified entity can be instantiated in the current compilation, but need not be; it is used in conjunction with automatic instantiation, to indicate potential sites for instantiation if the template entity turns out to be required.

Instantiation via the command line

Another way of manually controlling instantiation is via the -T option to CC. The suboptions you can use to do this are:

• none - do not do any instantiation.
• used - instantiate those template entities used in the compilation. This includes all static data members for which there are definitions.
• all - instantiate all template entities declared or referenced in the compilation unit. For each instantiated template class, this includes all its member functions and static data members, whether or not they are used. Non-member function templates will be instantiated even if the only reference was a declaration.
• local - like used except that all the instantiated functions are given internal linkage. This will prevent name collisions between object files, at the expense of possibly generating many copies of instantiated functions across a set of source files, which may make this option unsuitable for production use.

These options to CC might typically be used with dummy source files that include the headers that describe the templates and their argument types and that make dummy references to the template classes used by the rest of the application. These source files would be compiled using the various options mentioned above and the resulting objects linked with the application.

Single files

If you specify a single source file (with no other objects or binaries) to CC for both compilation and linking (compile load and go), the equivalent of -Tused will be enabled, and all necessary instantiation done into the executable binary that is produced.

Automatic instantiation

Automatic instantiation is enabled via the -Tauto option to CC, which is on by default. (It is turned off by the -Tno_auto option.) As stated above, the goal of this approach is to provide painless, transparent instantiation. For the most part, it is not necessary to know how automatic instantiation works in order to use it.

If you have an application that uses templates, one way of taking care of instantiation is via the manual approaches described above. These work fine but can be tedious to manage. However, automatic instantiation can be tricky, because it's not clear just when instantiation should be done. If it's done for each compilation unit, then there will be duplication, resulting in link errors about multiply-defined symbols, or at the least much duplication of effort and bigger disk size for object files. Also, it's not always possible to know at the time a given compilation unit is encountered which members of a template will be used. It's desirable to instantiate only those members actually used, to keep the object file small.

Instead of instantiating at compile time, link-directed instantiation is used. The automatic instantiation method works as follows.

1. The first time the source files of a program are compiled, no template entities are instantiated. However, an associated generated file with the suffix abc.ti (if the source file is abc.C) contains information about things that could have been instantiated in each compilation.

2. When the object files are linked together, a program called the prelinker is run. It examines the object files, looking for references and definitions of template entities, and for the added information about entities that could be instantiated.

3. If the prelinker finds a reference to a template entity for which there is no definition anywhere in the set of object files, it looks for a file that indicates that it could instantiate that template entity. When it finds such a file, it assigns the instantiation to it. The set of instantiations assigned to a given file, say abc.C, is recorded in an associated .ii file, for example abc.ii.

4. The prelinker then executes the compiler again to recompile each file for which the .ii file was changed. The original compilation command-line options (saved in the abc.ti file) are used for the recompilation.

5. When the compiler compiles a file, it reads the .ii file for that file and obeys the instantiation requests therein. It produces a new object file containing the requested template entities (and all the other things that were already in the object file).

6. The prelinker repeats steps 3-5 until there are no more instantiations to be adjusted.

7. The object files are linked together.
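The loop in steps 2 through 6 can be pictured with a small simulation. This is a toy model written in C++11 under stated assumptions, not the actual prelinker: object files are reduced to sets of symbol names, the ObjectFile structure, the Needs map and the prelink function are all hypothetical, and "recompiling" a file simply means that its assigned entities become defined and may introduce further references.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for one object file and its .ti/.ii side files.
struct ObjectFile {
    std::string name;
    std::set<std::string> defined;          // symbols the object defines
    std::set<std::string> referenced;       // symbols the object refers to
    std::set<std::string> can_instantiate;  // candidates noted at compile time (.ti)
    std::set<std::string> assigned;         // instantiations assigned to it (.ii)
};

// entity -> entities its instantiation refers to (hypothetical input)
typedef std::map<std::string, std::set<std::string> > Needs;

// Returns the number of recompilation rounds that were needed.
int prelink(std::vector<ObjectFile>& objs, const Needs& needs) {
    int rounds = 0;
    for (;;) {
        // Step 2: gather every definition and reference in the object set.
        std::set<std::string> defined, referenced;
        for (const ObjectFile& o : objs) {
            defined.insert(o.defined.begin(), o.defined.end());
            referenced.insert(o.referenced.begin(), o.referenced.end());
        }
        // Step 3: give each undefined entity to the first file that can instantiate it.
        std::set<ObjectFile*> changed;
        for (const std::string& sym : referenced) {
            if (defined.count(sym)) continue;
            for (ObjectFile& o : objs) {
                if (o.can_instantiate.count(sym)) {
                    o.assigned.insert(sym);
                    changed.insert(&o);
                    break;
                }
            }
        }
        if (changed.empty()) return rounds;   // step 6: nothing left to adjust
        // Steps 4-5: "recompile" each changed file, obeying its assignments.
        for (ObjectFile* o : changed) {
            for (const std::string& sym : o->assigned) {
                o->defined.insert(sym);
                Needs::const_iterator n = needs.find(sym);
                if (n != needs.end())
                    o->referenced.insert(n->second.begin(), n->second.end());
            }
        }
        ++rounds;
    }
}
```

In this model, a second round is needed whenever a newly instantiated entity refers to another template entity that nothing defines yet, which is why the real procedure repeats until no assignments change.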

Once the program has been linked correctly, the .ii files contain a complete set of instantiation assignments. From then on, whenever source files are recompiled, the compiler will consult the .ii files and do the indicated instantiations as it does the normal compilations. That means that, except in cases where the set of required instantiations changes, the prelink step from then on will find that all the necessary instantiations are present in the object files and no instantiation assignment adjustments need be done. That's true even if the entire program is recompiled.

If the programmer provides a specialization of a template entity somewhere in the program, the specialization will be seen as a definition by the prelinker. Since that definition satisfies whatever references there might be to that entity, the prelinker will see no need to request an instantiation of the entity. If the programmer adds a specialization to a program that has previously been compiled, the prelinker will notice that too and remove the assignment of the instantiation from the proper .ii file.

The .ii files should not, in general, require any manual intervention. One exception: if a definition is changed in such a way that some instantiation no longer compiles (it gets errors), and at the same time a specialization is added in another file, and the first file is being recompiled before the specialization file and is getting errors, the .ii file for the file getting the errors must be deleted manually to allow the prelinker to regenerate it.

Using the -v option to CC will let you see what the prelinker is doing. For example, if the prelinker changes an instantiation assignment, it will issue a message like:

C++ prelinker: A<:int>::f(void) assigned to file test.o
C++ prelinker: executing: CC -c test.c

The automatic instantiation scheme can coexist with partial explicit control of instantiation by the programmer through the use of pragmas or command-line specification of the instantiation mode.

Dependency management

Dependency management refers to the automatic re-instantiation or not of out-of-date instantiations, such as might be triggered by a header file changing. For the manual schemes above this would be handled by makefile rules. That is, a specific object file is produced by telling the compiler to instantiate particular entities, and that object file depends on particular headers that declare and define the templates to be used.

For automatic instantiation, instantiation is done by replaying source files. Those source files should be made to depend not only on the normal headers, but also on the template declaration and definition files.

Performance

Automatic instantiation adds compilation/link time. There are two ways of avoiding this overhead, at the cost of more explicit user intervention:

Use one of the manual instantiation schemes described earlier. Typically you would identify which template classes were used by a program and then create a specific source file to contain the instantiations.

Identify instantiations that are needed for a project and do them up front, then put the resulting objects into an archive library for use by the project. Remember that the automatic instantiation process is driven by unresolved external symbols, and provision of the needed instantiations preempts any need for automatic instantiation.

What can go wrong

The automatic instantiation scheme has some limitations that should be noted:

• No directory locking takes place, therefore there will be problems if multiple compilation processes share the same files and directories.
• The existence of old .ii and .ti files can disrupt automatic instantiation in arbitrary ways. If you suspect problems of this nature, delete all existing .ii and .ti files and start over.
• Manual editing of a .ii or .ti file can likewise cause problems.
• Naming conventions must be followed.
• Source files must be available to replay.

Other considerations

Other considerations when using templates include inlines, specializations, libraries and special symbols.

Inlines

Template functions or class members that are inline have no special requirements. Such inline functions are defined in headers either in the class body or by specifying the inline keyword.
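Both forms are shown in the short sketch below; the Pair template and its members are illustrative names only.

```cpp
template <class T>
struct Pair {
    T first, second;
    Pair(T a, T b) : first(a), second(b) {}
    T sum() const { return first + second; }   // inline: defined in the class body
    T diff() const;                            // declared here, defined inline below
};

// Inline by explicit keyword; this definition also belongs in the header.
template <class T>
inline T Pair<T>::diff() const { return first - second; }
```

Because both definitions sit in the header, every source file that uses Pair sees them directly.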

Specializations

A specialization is a particular instance of a template function or data item that overrides the template version. This might be done, for example, to improve performance for specific common cases. Some examples of specializations:

template <class T> void f(T) {}                    // template
template<> void f(char* s) {}                      // specialization

template <class T> struct A { void f(); };         // template
template<> void A<int>::f() {}                     // specialization

template <class T> struct B { static int x; };     // template
template <class T> int B<T>::x = 37;
template<> int B<double>::x = -59;                 // specialization

Such specializations are automatically picked up by the link simulation mentioned earlier, and they override general templates.
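To make the overriding visible, here is a variant of these examples in which each entity returns or holds a distinct value. The function name kind and the return values are illustrative changes to the examples above.

```cpp
template <class T> int kind(T) { return 0; }             // template
template <> int kind(char*) { return 1; }                // specialization for T = char*

template <class T> struct A { int f(); };                // template
template <class T> int A<T>::f() { return 0; }
template <> int A<int>::f() { return 1; }                // member function specialization

template <class T> struct B { static int x; };           // template
template <class T> int B<T>::x = 37;
template <> int B<double>::x = -59;                      // static data member specialization
```

A call such as kind(42) uses the general template, while a call with a char* argument uses the specialization; the same holds for A<int>::f and B<double>::x.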

Libraries

A ``library'' is an archive of object files. There are several issues with libraries when using templates. First of all, as noted in ``Performance'', it is possible to pre-instantiate needed template functions into an archive and then specify that archive on the link line and do away with the need for any instantiation at all.

When instantiating templates that will be used internally within an archive, use the CC -Tprelink_objects option. Use of this option will cause the prelinking phase to run, but the linking phase to be suppressed. A typical usage might be:

CC -c a.C b.C c.C
CC -Tprelink_objects
ar rv libfoo.a a.o b.o c.o

or

CC -Tprelink_objects a.C b.C c.C
ar rv libfoo.a a.o b.o c.o

Another issue with libraries is the notion of ``closure''. Suppose that you provide a library to another development group, and that the library uses templates internally (whether or not the applications that use the library do is irrelevant). Instantiation in such a case will fail, because there are no library sources to replay. Therefore a library must be self-contained as regards template functions; if an object file in the library uses templates, some other object file in that library should contain definitions of the necessary template functions.

More on linking template code into archives

There are some unique problems that can be encountered when using automatic instantiation and preparing archive libraries (.a files) that contain template-based code. However, the bottom line is that the automatic instantiation system should handle most of them without user intervention; the material in this section is for rare situations where this does not happen, or for those looking to fine-tune their instantiations.

To begin with, consider this simple example:

$ cat A.h
template<class T>
class A {
public:
    int fff();
    int ggg();
    int hhh();
};

#include "A.C"

$ cat A.C
template<class T> int A<T>::fff() { return (T) 1; }

template<class T> int A<T>::ggg() { return (T) 2; }

template<class T> int A<T>::hhh() { return (T) 3; }

$ cat al1.C
#include "A.h"

int al1() { A<int> a; return a.fff() + a.ggg(); }

$ cat al2.C
#include "A.h"

int al2() { A<int> a; return a.fff() + a.hhh(); }

$ cat main.C
int al1();
int al2();

int main() { return al1() + al2(); }

The main function calls two other functions. The two other functions each call member functions of the same template instantiation.

However, as it happens the two other functions are in separate archive libraries, so we build the first library:

$ CC -c al1.C
$ CC -Tprelink_objects al1.o
prelink: INFO: C++ prelinker: executing: CC -c al1.C
$ ar rv libal1.a al1.o
a - al1.o
UX:ar: INFO: Creating libal1.a

and then the second library:

$ CC -c al2.C
$ CC -Tprelink_objects al2.o
prelink: INFO: C++ prelinker: executing: CC -c al2.C
$ ar rv libal2.a al2.o
a - al2.o
UX:ar: INFO: Creating libal2.a

Now, pretend we are using an earlier release of the UDK C++ compiler. We build the main program executable and link it against the two archive libraries.

$ CC -c main.C
$ CC -L. main.o -lal1 -lal2
UX:ld: ERROR: ./libal2.a(al2.o): fatal error: symbol `A<T1>::fff(void) [with T1=int, return type=int]` multiply-defined, also in file ./libal1.a(al1.o)

What happened? After the prelinking and archiving, libal1.a:al1.o has definitions of "A<int>:fff()" and "A<int>:ggg()", while libal2.a:al2.o has definitions of "A<int>:fff()" and "A<int>:hhh()". When the main program pulls in libal1.a:al1.o to resolve al1() and libal2.a:al2.o to resolve al2(), it also pulls in both definitions of "A<int>:fff()", and hence the error.

This is a very common situation to get into when using archive libraries for template-based code. There are four possible solutions:

1. Do no instantiations in the archives (which means in practice omitting the prelinking step), but instead do all of the instantiations when linking the executable. This gets around the multiple definition error, but presents other, equally difficult problems.

   First, the template definition source code may not be available at the time the executable is built (think of a set of archive libraries shipped as part of a vendor's product, where the template is internal to the libraries).

   Second, even if the template definition source is available for instantiation, there may be no instantiation request (meaning a use, or a template instantiation directive, or a #pragma) in the .o files that make up the executable. Without such a request, the compiler's automatic instantiation scheme will not create or assign an instantiation (it cannot assign one to the .o's in the archive since they are essentially read-only at that point). Thus, you will typically get a link that fails for unresolved symbols (rather than multiply-defined symbols) with this approach.

2. Another way would be to abandon automatic instantiation and manually assign instantiations to individual .o's, knowing ahead of time how the .o's will be combined into archives and used by executables. This will work for small cases, but quickly becomes hopelessly infeasible for real-sized projects.

3. The straightforward solution adopted in current releases of the UDK C++ compiler is to generate all template instantiations (of both template functions and template static data members) as "weak symbols". See ``Handling multiply defined symbols''. This means that the multiple definitions can coexist without producing a linker error; all references to a symbol are bound to one copy of the symbol, and the other copies are ignored. This approach has the great advantage of being transparent to the user, and in the great majority of cases is the way to go.

   However, there are two disadvantages. Extra copies of template instantiations are still kept in the .o files making up the archive, and depending upon the application this could lead to significant code bloat.

   NOTE: This code bloat only exists when archives are involved; the normal automatic instantiation scheme prevents this from happening in other kinds of links.

   In addition, certain arrangements of template source code and usages, especially those developed under the gcc compiler using the -frepo option, may not be linkable in this approach without significantly rearranging the contents of the archives, a rearrangement that may not be feasible.

4. The fourth way is presented in the next section.

The one instantiation per object scheme

Given the results of the previous section, what is needed in occasional cases is a way to include all the necessary instantiations with the archive libraries, but in such a way that the libraries can be linked against without getting multiple definition errors (if the symbols were not weak).

The scheme to do this is called "one instantiation per object". The idea is that, when the appropriate option is given, a source file with instantiations in it will generate more than one object file. In the example of the previous section, the al1.C source file will generate three object files: al1.o, containing the al1() function, and "A<int>:fff().o" and "A<int>:ggg().o" (the actual filenames are in a mangled form). These three object files will all be placed into libal1.a. Similarly, al2.C will generate al2.o, "A<int>:fff().o", and "A<int>:hhh().o", all of which go into libal2.a. In general, there will be a separate .o for each template function, member function of a template class, or static data member of a template class, that is instantiated.

Now, when the main program is linked against the archives, the linker will pull in libal1.a:al1.o, libal1.a:"A<int>:fff().o", libal1.a:"A<int>:ggg().o", libal2.a:al2.o, and libal2.a:"A<int>:hhh().o". It will not pull in libal2.a:"A<int>:fff().o" because it will have already resolved that symbol with the libal1.a:"A<int>:fff().o", and it will not pull in a second "A<int>:fff()" from any other .o because each instantiation is isolated within its own .o.

In this way it is possible to close each archive library and still be able to link against them without fear of multiple copies of instantiation code being pulled in.
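The difference between the two archive layouts can be reproduced with a toy model of archive linking, written in C++11. It is a simplification under stated assumptions, not the behavior of any particular linker: symbols are plain strings, none are weak, an archive member is extracted only when it defines a symbol that is undefined at that moment, and the Member, Archive, LinkResult and link_archives names are hypothetical.

```cpp
#include <set>
#include <string>
#include <vector>

struct Member {                       // one .o inside an archive
    std::string name;
    std::set<std::string> defs;       // symbols it defines
    std::set<std::string> refs;       // symbols it needs
};
typedef std::vector<Member> Archive;

struct LinkResult {
    std::vector<std::string> pulled;      // members extracted, in order
    std::set<std::string>    duplicates;  // symbols defined more than once
    std::set<std::string>    unresolved;  // symbols still undefined at the end
};

// A member is extracted only if it defines a currently undefined symbol;
// everything else it defines comes along with it.
LinkResult link_archives(std::set<std::string> undefined,
                         const std::vector<Archive>& archives) {
    LinkResult r;
    std::set<std::string> defined;
    std::set<const Member*> taken;
    bool progress = true;
    while (progress) {
        progress = false;
        for (const Archive& ar : archives) {
            for (const Member& m : ar) {
                if (taken.count(&m)) continue;
                bool wanted = false;
                for (const std::string& d : m.defs)
                    if (undefined.count(d)) wanted = true;
                if (!wanted) continue;
                taken.insert(&m);
                r.pulled.push_back(m.name);
                progress = true;
                for (const std::string& d : m.defs) {
                    if (!defined.insert(d).second) r.duplicates.insert(d);
                    undefined.erase(d);
                }
                for (const std::string& ref : m.refs)
                    if (!defined.count(ref)) undefined.insert(ref);
            }
        }
    }
    r.unresolved = undefined;
    return r;
}
```

When each archive holds a single object that defines both its function and the instantiations, pulling in the second object drags in a second definition of fff. When every instantiation sits in its own member, the second copy of fff is never wanted and is left in the archive.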

The CC -Toipo and CC -Toipo_list (oipo is the acronym for "one instantiation per object") options are used to implement this scheme. They are not on by default because this scheme is complicated and is only rarely needed. When building most archives or dynamic libraries (.so files) or straight executables, the one instantiation per object scheme is not necessary. It can be used in those circumstances and it will work, but it may cause longer build times (especially when the CC -g or -O options are being used), because some of the compilation processes are cycled through for each instantiation.

In plainer words, do not use this scheme unless you are really sure you need to!

Here is how these options are used to build the example from the previous section. First we compile and prelink the al1.C source:

$ CC -c -Toipo=Template.al1 al1.C
al1.C:
$ CC -Tprelink_objects -Toipo=Template.al1 al1.o
prelink: INFO: C++ prelinker: executing: CC -c -Toipo=Template.al1 al1.C
al1.C:
fff__10A__tm__2__b29006af.oipo:
ggg__10A__tm__2__9fcf3e2f.oipo:

The commands are the same as before except we have added a -Toipo option, which is supplied to both compilation and prelinking commands. When the additional instantiation objects are created in this scheme, they are placed in a separate directory, to keep them out of the way of the "normal" objects of the application. By default this separate directory is the subdirectory ./Template.dir; the compiler will create it if it does not already exist. This directory should be kept around between builds, but can be removed by a make clean or make clobber type of action.

In this simple example we are building two archive libraries from the same source directory, and so we want to keep the instantiation objects in two separate subdirectories. To do this we supply the optional =dirname argument to the -Toipo option. In a real-life application, each archive would probably be built from a separate part of the overall source tree, and the default Template.dir in each directory would work fine.

NOTE: The compiler prints out the names of the instantiation objects as it compiles them, much as it does when you specify multiple source files to a compilation.

Now it is time to make the archive. We want to put not just al1.o into the archive but also these additional instantiation objects. We could do a

$ ls Template.al1
fff__10A__tm__2__b29006af.o
ggg__10A__tm__2__9fcf3e2f.o

to see the objects that are there, but these are mangled filename forms that do not have much meaning to us. Furthermore, the number and names of them will frequently change as our application is developed.

Instead, we use the CC -Toipo_list command. Given a set of primary object files from our application, it will print to standard output the complete set of object files including instantiation objects:

$ CC -Toipo_list al1.o
al1.o
Template.al1/ggg__10A__tm__2__9fcf3e2f.o
Template.al1/fff__10A__tm__2__b29006af.o

Now, we use this command within backquotes to supply the arguments to ar:

$ ar rv libal1.a `CC -Toipo_list al1.o`
r - al1.o
r - Template.al1/ggg__10A__tm__2__9fcf3e2f.o
r - Template.al1/fff__10A__tm__2__b29006af.o
UX:ar: INFO: Creating libal1.a

Next we do the same process to build libal2.a.

$ CC -c -Toipo=Template.al2 al2.C
al2.C:
$ CC -Tprelink_objects -Toipo=Template.al2 al2.o
prelink: INFO: C++ prelinker: executing: CC -c -Toipo=Template.al2 al2.C
al2.C:
fff__10A__tm__2__b29006af.oipo:
hhh__10A__tm__2__ecda57ee.oipo:
$ ar rv libal2.a `CC -Toipo_list al2.o`
a - al2.o
a - Template.al2/hhh__10A__tm__2__ecda57ee.o
a - Template.al2/fff__10A__tm__2__b29006af.o
UX:ar: INFO: Creating libal2.a

Now it is time to link the executable. We are not using the one instantiation per object scheme in this step, since the main function is not going into an archive. So we use the same commands as before:

$ CC -c main.C
$ CC -L. main.o -lal1 -lal2
$ ./a.out
$ echo $?
7

This time, the link is successful and the program produces the expected output.

Previously, we stated that it is not necessary to use -Toipo unless templates are being built for archives, but that it can be done. If you are doing this, you should first prelink the objects, and then use -Toipo_list to supply input to the link step. Using the same example as above, a straight build without archives but using -Toipo would look like:

$ CC -c -Toipo al1.C al2.C main.C
al1.C:
al2.C:
main.C:
$ CC -Tprelink_objects -Toipo al1.o al2.o main.o
prelink: INFO: C++ prelinker: executing: CC -c -Toipo al1.C
al1.C:
fff__10A__tm__2__b29006af.oipo:
ggg__10A__tm__2__9fcf3e2f.oipo:
hhh__10A__tm__2__ecda57ee.oipo:
$ CC `CC -Toipo_list al1.o al2.o main.o`
$ ./a.out
$ echo $?
7

As can be seen from these examples, it should never be necessary for you to manually deal with the contents of the instantiation objects subdirectory. If you name the directory (if not the default) on the -Toipo option and then use the -Toipo_list option, the compilation system will keep track of the instantiation objects for you. One very minor exception to this is if you use a CC -Toipo -S command to generate assembly (.s) generated code files. This will produce .s files for the primary and all instantiation generated code files. However, if you subsequently do CC -Toipo whatever.s to assemble these files, only the primary file will be assembled. You will need to assemble the instantiation generated code files by hand.

Special symbols

The instantiation process may introduce new symbols into the intermediate .o files and final executable file. Those symbols have special names and are used only to pass information along between phases of the compilation process. The special names consist of a prefix, followed by the name of the function or object to be instantiated. The prefix may be one of:

__CBI__ // Can be instantiated

__DNI__ // Do not instantiate

__TIR__ // Template instantiation request

For example, for the following C++ source:

template<class T> T func(T t) { return t; }

#pragma instantiate int func(int)
#pragma do_not_instantiate double func(double)

int foo() { return func(1) + func(2.0); }

the object file would contain these symbols (from nm -C):

...
[6]  |  1|  1|OBJT |GLOB |0 |COMMON |__TIR__func(int)
[7]  |  1|  1|OBJT |GLOB |0 |COMMON |__CBI__func(int)
[8]  |  1|  1|OBJT |GLOB |0 |COMMON |__TIR__func(double)
[9]  |  1|  1|OBJT |GLOB |0 |COMMON |__DNI__func(double)
[10] | 84| 14|FUNC |GLOB |0 |1      |func(int)
[11] |  0|  0|NOTY |GLOB |0 |UNDEF  |func(double)
[12] |  0| 82|FUNC |GLOB |0 |1      |foo(void)

You do not need to be concerned about the meaning of these symbols.


Using C++ exception handling

The C++ language has a feature known as exception handling. This is a scheme for dealing with error conditions in programs that is designed to fit in well with object-oriented programming.

In general, exception handling will function the same way in any C++ compiler. However, there are some implementation dependencies in a few particular areas:

• Performance implications
• Mixed language programming
• Other implementation dependencies

The following sections will describe these.

Performance implications

A general performance design goal of the C++ language is the maxim:

If you don't use a language feature, you don't pay for it.

But with exception handling this is somewhat difficult to achieve. Even the most innocuous-looking code:

void f() { A a; g(); }

will have a performance burden for exception handling, because if A is a class with a destructor and g() is a function that might throw an exception, then the C++ runtime system has to know that object a must be destroyed when the stack frame for f() is unwound.
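The unwinding behavior described above can be observed with a short sketch. It is an illustration only: the trace string and the function run_example() are hypothetical names introduced here, and g() is assumed to throw an int. The sketch records the order of events to show that the destructor for a runs during unwinding, before the handler body executes.

```cpp
#include <string>

// Hypothetical trace buffer, used only to record the order of events.
static std::string trace;

struct A {
    A()  { trace += "ctor "; }
    ~A() { trace += "dtor "; }   // must run even when g() throws
};

void g() { throw 42; }           // stands in for any function that might throw

void f() {
    A a;                         // the runtime must know that 'a' is live here
    g();                         // unwinding starts here; ~A() runs as f() is left
}

// Calls f() inside a try block and returns the recorded order of events.
std::string run_example() {
    trace.clear();
    try {
        f();
    } catch (int) {
        trace += "caught";
    }
    return trace;
}
```

Calling run_example() yields the string "ctor dtor caught": the object is destroyed as the frame for f() is unwound, and only then does control reach the handler.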

This compiler attempts to minimize this distributed performance overhead by tracking try blocks, catch handlers, and object lifetimes with program-counter-based tables of instruction ranges built at compile- and link-time, rather than with execution-time data structures and operations. These tables are created in special new sections, of new section types, within the object format, and in dynamic linking contexts are not relocated until and unless needed.

This has the following benefits:

• no direct execution-time overhead for exception handling
• minimal cost for setting up a try block
• modest space overhead for exception handling

with this drawback:

• actually throwing and handling an exception may be slow

In other words, performance tradeoffs have been selected to minimize the effect of exception handling when no exceptions are thrown, at the sacrifice of performance for the case where exceptions are thrown. So a new maxim becomes:

Only throw exceptions in exceptional situations

Note the phrasing "no direct execution-time overhead ..." above. There is some indirect cost to optimization (CC -O) due to exception handling, because certain kinds of optimizations must be suppressed lest they interfere with the range tables that the C++ runtime uses to process exceptions.

There is one area in which application programmers can further minimize the effects of exception handling support. This is by using exception specifications. When functions have empty exception specifications, the compiler knows that no exception can be thrown back through a call to such a function. Taking the above example, if function g() is declared as

void g() throw();

then the compiler may be able to generate less exception handling support information for function f(). Furthermore, if the constructors and destructors for class A are also declared with empty exception specifications

class A {
public:
    A() throw();
    ~A() throw();
};

then the compiler can tell that an exception cannot possibly be raised within function f(), and no exception handling support information need be generated at all. Use of exception specifications may also help minimize optimization degradation due to exception handling.
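The throw() spelling above is the one this compiler documents. In C++11 and later the same promise is written noexcept, and throw() was removed from the language in C++20. The following is a minimal sketch under that assumption (a C++11 or later compiler, which is not the compiler described here); the helper specs_hold() is a hypothetical name. It uses the noexcept operator to confirm what the declarations promise.

```cpp
// Assumes a C++11 (or later) compiler; 'noexcept' plays the role of 'throw()'.
void g() noexcept;               // promises that no exception escapes g()

class A {
public:
    A() noexcept {}
    ~A() noexcept {}
};

void g() noexcept {}

void f() noexcept {              // nothing called here is allowed to throw
    A a;
    g();
}

// The noexcept operator reports, at compile time, what the declarations promise.
static_assert(noexcept(g()), "g() is declared non-throwing");
static_assert(noexcept(A()), "A() is declared non-throwing");
static_assert(noexcept(f()), "f() is declared non-throwing");

// Hypothetical helper so the same checks can also be made at run time.
bool specs_hold() { return noexcept(g()) && noexcept(A()) && noexcept(f()); }
```

Note that the operator only reflects the declarations; it does not analyze function bodies.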

Note, however, that exception specifications should be used only when appropriate. If the natural way a function would report an error condition is through an exception, then it should be written that way, unless there are extreme performance considerations involved. In particular, throwing an exception is almost always the best way for a constructor to indicate an error.
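As an illustration of the last point, the sketch below shows a constructor that reports invalid input by throwing. The class Percent and the function try_make() are hypothetical names chosen for this example; the point is only that a constructor has no return value, so an exception is the natural way to tell the caller that no object was created.

```cpp
#include <stdexcept>

// Hypothetical class: a percentage that must lie in the range 0..100.
class Percent {
public:
    explicit Percent(int v) : value_(v) {
        if (v < 0 || v > 100)
            throw std::out_of_range("Percent: value outside 0..100");
    }
    int value() const { return value_; }
private:
    int value_;
};

// Returns the stored value, or -1 if construction failed.
int try_make(int v) {
    try {
        Percent p(v);            // either fully constructed or never exists
        return p.value();
    } catch (const std::out_of_range&) {
        return -1;
    }
}
```

With this design a caller can never hold a half-initialized Percent; try_make(42) returns 42 and try_make(101) returns -1.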

Mixed language programming

When an application has a mixture of C++ and C code, the question arises as to how exception handling works.

The most typical case involves C++ code that sets up a try block and then calls a C function, which in turn calls a C++ function that may throw an exception. A simple example:

file1.C (C++):

extern "C" void bar(int);

void foo() {
    try {
        bar(7);
    }
    catch (int i) {
        // do something
    }
}

file2.c (C):

extern void gok(int);


void bar(int k) { gok(k*k); }

file3.C (C++):

extern "C" void gok(int i) { throw i; }

This will work correctly in the C++ and C compilers. There can be various intervening layers of C++ and C functions between the try block and the throw. However, there is one very important restriction:

All C functions that may have an exception thrown through them MUST be compiled with the cc -Kframe option.

This is because the C++ runtime system needs to be able to unwind any stacks from the point of the throw to the point of the handler, and if those stacks are from C functions, the full frame pointer model (which is what -Kframe specifies) is the only one that is self-descriptive enough for the runtime system to understand.

So in the above example, file2.c must be compiled with -Kframe. Optimization may still be used, i.e., cc -O -Kframe is allowed.

Note that this constitutes a significant restriction regarding Gemini or third-party libraries that employ "callback" functions. Examples include qsort(3C) and bsearch(3C) from the Standard C library, and many functions from the X Window System. In general these libraries will have been compiled with the -Kno_frame option (for performance reasons). As a consequence, if C++ functions are being used for those callbacks, such functions cannot exit via an exception. (They can, however, throw and catch exceptions internally, and then translate caught exceptions into some meaningful error code for the library interface.)

There is no support for forms of mixed-language exception handling in which C functions are able to "throw" or "catch" exceptions imported from C++ code.
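The callback restriction above can be handled as in the following sketch, which catches every exception inside a qsort() comparison function and translates it into an error flag. The names compare_failed, checked_compare(), compare_cb(), and sort_ints() are hypothetical, and the rule that negative values are invalid is invented purely to give the comparison something to throw about.

```cpp
#include <cstdlib>
#include <stdexcept>

// Hypothetical flag: set when the comparison failed inside the callback.
static bool compare_failed = false;

// A comparison that may throw; negative values are treated as invalid input.
static int checked_compare(int a, int b) {
    if (a < 0 || b < 0)
        throw std::invalid_argument("negative value");
    return (a > b) - (a < b);
}

// Callback handed to qsort(). No exception may escape into the C library,
// so every exception is caught here and turned into a flag.
extern "C" int compare_cb(const void* pa, const void* pb) {
    const int a = *static_cast<const int*>(pa);
    const int b = *static_cast<const int*>(pb);
    try {
        return checked_compare(a, b);
    } catch (...) {
        compare_failed = true;
        // Fall back to a plain ordering so the comparison stays consistent.
        return (a > b) - (a < b);
    }
}

// Sorts v[0..n-1]; returns false if any comparison reported an error.
bool sort_ints(int* v, std::size_t n) {
    compare_failed = false;
    std::qsort(v, n, sizeof v[0], compare_cb);
    return !compare_failed;
}
```

The caller checks the return value of sort_ints() after the library call has returned, which is the point at which it is safe to raise a C++ exception again if one is wanted.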

Other implementation dependencies

By default, when an exception is unhandled, the stack is not unwound. This allows symbolic debugging from the place of origin. However, if the UNWIND_STACK_BEFORE_TERMINATE environment variable is defined, the C++ support runtime will save an image of the process in file throw.core.process-id, then unwind the stack fully before calling terminate().

The setjmp(3C) and longjmp(3C) facility from the Standard C library will work in conjunction with C++ exception handling, as long as:

• longjmp does not bypass any destructible objects (this is a restriction in the ISO C++ standard), and
• longjmp does not transfer control into or out of a try block.
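A use of setjmp and longjmp that stays within both restrictions might look like the sketch below. The names env, deep(), and run_longjmp() are hypothetical. No object with a destructor is live in any frame that the jump discards, and no try block lies between the setjmp call and the longjmp call.

```cpp
#include <csetjmp>

// Hypothetical jump buffer shared by the two functions below.
static std::jmp_buf env;

// Recurses a few levels and then jumps back. Every frame that is
// discarded holds only an int, so no destructor is bypassed.
static void deep(int depth) {
    if (depth == 0)
        std::longjmp(env, 7);
    deep(depth - 1);
}

// Returns the value delivered by longjmp.
int run_longjmp() {
    switch (setjmp(env)) {       // setjmp used as the whole controlling expression
    case 0:
        deep(3);                 // does not return normally
        return -1;
    case 7:
        return 7;                // reached after the longjmp
    default:
        return -1;
    }
}
```

If deep() needed a local object with a destructor, throwing an exception would be the appropriate mechanism instead of longjmp.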


Compiler diagnostics

This topic consists of two subtopics. The first subtopic contains the text and an explanation of all the messages produced by the C compiler. The second subtopic contains a description of the messages produced by the C++ compiler.

C compiler diagnostics

This subtopic contains the text and explanation for all the warning and error messages produced by the C compiler. All messages are given as they appear in the C locale. The messages are listed in alphanumeric order (special characters are ignored). Numbers precede capital letters and capital letters precede lowercase letters. n, when it represents a number, comes at the beginning of the list.

The message entries are formatted as follows:

Entry                                       Comment

n extra byte(s) in string literal           Text of message.
initializer ignored

Type: Warning Options: all                  Type of message and command-line options
                                            which must be set for the message to
                                            appear (``all'' indicates that the
                                            message is independent of options).

A string literal that initializes a         Explanation of message.
character array contains n more
characters than the array can hold.

char ca[3] = "abcd";                        Example of code that might generate the
                                            message.

"file", line 1: warning: 1 extra byte(s)    Message output.
in string literal initializer ignored

When an error occurs, the error message is preceded by a file name and line number. All messages output are one line long (no newline characters). The line number is usually the line on which a problem has been diagnosed. Occasionally the compiler must read the next token before it can diagnose a problem, in which case the line number in the message may be a higher line number than that of the offending line.

NOTE: For internationalized systems, the format of the error message output differs slightly. For such systems, the error message will be prefixed by:

UX:acomp:type: error_message

where type is either WARNING or ERROR depending on the type of message. error_message is the message output as shown above.

Note that lint issues all the messages listed in this topic, and additional messages about potential bugs and portability problems.

NOTE: See ``Analyzing your code with lint'' for further information.


Message types and applicable options

Each message description includes a Type and an Options field as follows:

Type
indicates whether the message is a warning, an error, a fatal error, or a combination of error types (see below).

Options
indicates which cc command options must be set for the message to appear. ``all'' implies that the message is independent of cc options.

The following paragraphs explain the differences between warning messages, error messages, and fatal errors.

Warning messages, in which warning: appears after the file name and line number, provide useful information without interrupting compilation. They may diagnose a programming error, or a violation of C syntax or semantics, for which the compiler will nevertheless generate valid object code.

Error messages, which lack the warning: prefix, will cause the cc command to fail. Errors occur when the compiler has diagnosed a serious problem that makes it unable to understand the program or to continue to generate correct object code. It will attempt to examine the rest of the program for other errors, however. The cc command will not link the program if the compiler diagnoses errors.

Fatal errors cause the compiler to stop immediately and return an error indication to the cc command. A fatal error message is prefixed with fatal:. Such messages typically apply to start-up conditions, such as being unable to find a source file.

Operator names in messages

Some messages include the name of a compiler operator, as in:

operands must have arithmetic type: op "+"

Usually the operator in the message is a familiar C operator. At other times the compiler uses its internal name for the operator, like U-. This subtopic lists internal operator names that the compiler may use in error messages, with definitions of these names.

,OP
The C ``comma operator'' (as distinct from the , that is used to separate function arguments).

ARG
A function argument, or a value passed to a function.

AUTO
An automatic variable that has not been allocated to a register.

CALL
A function call with arguments.

CBRANCH
A conditional branch. (This may be part of an if or loop statement.)

CONV
A conversion. It may have been explicit, in the form of a cast, or implicit, in the semantics of a C statement.

FCON
A floating-point constant.

ICON
An integer or address constant.

NAME
An object or function with extern or static storage class.

PARAM
A function parameter, a value that is received by a function.

REG
An object that has been allocated to a register.

RETURN
The operation that corresponds to a return statement.

STAR
The indirection operator, as in *p.

STRING
A string literal.

U&
The ``take address of'' operator (as distinct from the bit-wise AND operation).

U-
The arithmetic negation operator (as distinct from subtraction).

UCALL
A function call with no arguments.

UGE
An unsigned >= comparison.

UGT
An unsigned > comparison.

ULE
An unsigned <= comparison.

ULT
An unsigned < comparison.

UPLUS
The ANSI C ``unary +'' operator.

Messages

//-style comments accepted as an extension
Type: Warning Options: -Xc
A standard-conforming C compiler must emit a diagnostic about "syntax errors" like two adjacent division operators. This warning is produced only at the first use of a // comment in a compilation.

int a; // declare a

"file", line 1: warning: //-style comments accepted as an extension

'$' in an identifier accepted as an extension
Type: Warning Options: -Kdollar -Xc
This one-time warning is issued on the first occurrence of a $ in an identifier when the -Kdollar and -Xc options are used together.

n extra byte(s) in string literal initializer ignored
Type: Warning Options: all
A string literal that initializes a character array contains n more characters than the array can hold.

char ca[3] = "abcd";

"file", line 1: warning: 1 extra byte(s) in string literal initializer ignored

0 is invalid in # <number> directive
Type: Error Options: all
The line number in a line number information directive (which the compiler uses for internal communication) must be a positive, non-zero value.

# 0 "foo.c"

"file", line 1: 0 is invalid in # <number> directive

0 is invalid in #line directive
Type: Error Options: all
This diagnostic is similar to the preceding one, except the invalid line number appeared in a #line directive.

#line 0

"file", line 1: 0 is invalid in #line directive


ANSI C behavior differs; not modifying typedef with modifier
Type: Warning Options: -Xa, -Xc
A typedefed type may not be modified with the short, long, signed, or unsigned type modifiers, although earlier releases permitted it. modifier is ignored. A related message is modifying typedef with modifier; only qualifiers allowed.

typedef int INT;
unsigned INT ui;

"file", line 2: warning: ANSI C behavior differs; not modifying typedef with "unsigned"

ANSI C predefined macro cannot be redefined
Type: Warning Options: all
The source code attempted to define or redefine a macro that is predefined by ANSI C. The predefined macro is unchanged.

#define __FILE__ "xyz.c"

"file", line 1: warning: ANSI C predefined macro cannot be redefined

ANSI C predefined macro cannot be undefined
Type: Warning Options: all
The source code contains an attempt to undefine a macro that is predefined by ANSI C.

#undef __FILE__

"file", line 1: warning: ANSI C predefined macro cannot be undefined

ANSI C requires formal parameter before "..."
Type: Warning Options: -Xc, -v
The compiler implementation allows you to define a function with a variable number of arguments and no fixed arguments. ANSI C requires at least one fixed argument.

f(...){}

"file", line 1: warning: ANSI C requires formal parameter before "..."

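One reason for the ANSI C requirement is that the va_start macro must be given the name of the last fixed parameter. The following is a minimal sketch of a conforming variable-argument function; the function name sum and its convention that the first argument gives the count of the remaining int arguments are assumptions made for this illustration.

```cpp
#include <cstdarg>

// 'count' is the fixed parameter that va_start uses as its anchor.
// It also tells the function how many int arguments follow.
int sum(int count, ...) {
    va_list ap;
    va_start(ap, count);
    int total = 0;
    for (int i = 0; i < count; ++i)
        total += va_arg(ap, int);    // each variable argument is read as an int
    va_end(ap);
    return total;
}
```

For example, sum(3, 1, 2, 3) returns 6 and sum(0) returns 0.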
ANSI C treats constant as unsigned: op "operator"
Type: Warning Options: -v
The type promotion rules for ANSI C are slightly different from those of previous releases. In the current release the default behavior is to use the ANSI C rules.

Obtain the older behavior by using the -Xt option for the cc command.

Previous type promotion rules were ``unsigned-preserving.'' If one of the operands of an expression was of unsigned type, the operands were promoted to a common unsigned type before the operation was performed.

ANSI C uses ``value-preserving'' type promotion rules. An unsigned type is promoted to a signed type if all its values may be represented in the signed type.

ANSI C also has a different rule from previous releases for the type of an integral constant that implicitly sets the sign bit.

The different type promotion rules may lead to different program behavior for the operators that are affected by the unsigned-ness of their operands:

◊ The division operators: /, /=, %, %=.
◊ The right shift operators: >>, >>=.
◊ The relational operators: <, <=, >, >=.

The warning message indicates that the program contains an expression in which the behavior of operator will change from earlier compilers. To guarantee the desired behavior, insert an explicit cast in the expression.

f(void){
    int i;
    /* constant was integer, unsigned in ANSI C */
    i /= 0xf0000000;
}

"file", line 4: warning: ANSI C treats constant as unsigned: op "/="

To get the same behavior as in previous releases add an explicit cast:

f(void){
    int i;
    /* constant was integer, unsigned in ANSI C */
    i /= (int) 0xf0000000;
}
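The practical effect of the cast can be seen by running both forms on the same input. The sketch below assumes a 32-bit int (checked with a static_assert) and the usual two's-complement result for the cast, which is implementation-defined before C++20; the function names divide_unsigned and divide_signed are hypothetical.

```cpp
#include <cstdint>

static_assert(sizeof(int) == 4, "this sketch assumes a 32-bit int");

// The constant has its sign bit set, so it has an unsigned type and
// the division is carried out in unsigned arithmetic.
std::int32_t divide_unsigned(std::int32_t i) {
    i /= 0xf0000000;
    return i;
}

// With the cast the constant is the negative value -268435456 and the
// division is carried out in signed arithmetic.
std::int32_t divide_signed(std::int32_t i) {
    i /= (std::int32_t) 0xf0000000;
    return i;
}
```

For the input -536870912 (bit pattern 0xe0000000), divide_unsigned() returns 0 while divide_signed() returns 2, which is why the compiler draws attention to the expression.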

-D option argument not an identifier
Type: Error Options: all
An identifier must follow the -D cc command line option.

cc -D3b2 -c x.c
command line: -D option argument not an identifier

-D option argument not followed by "="
Type: Warning Options: all
If any tokens follow an identifier in a -D command line option to the cc command, the first such token must be =.

cc -DTWO+2 -c x.c
command line: warning: -D option argument not followed by "="

EOF in argument list of macro: name
Type: Error Options: all
The compiler reached end-of-file while reading the arguments for an invocation of function-like macro name.

#define mac(a)
mac( arg1

"file", line 5: EOF in argument list of macro: mac


EOF in asm function definition
Type: Error Options: all
The compiler reached end-of-file while reading an enhanced asm function definition.

EOF in character constant
Type: Error Options: all
The compiler encountered end-of-file inside a character constant.

EOF in comment
Type: Warning Options: all
The compiler encountered end-of-file while reading a comment.

EOF in string literal
Type: Error Options: all
The compiler encountered end-of-file inside a string literal.

NUL in asm function definition
Type: Warning Options: all
The compiler encountered a NUL (zero) character while reading an enhanced asm function definition. The NUL is ignored.

-U option argument not an identifier
Type: Error Options: all
An identifier must follow the -U cc command line option.

cc -U3b2 -c x.c
command line: -U option argument not an identifier

a cast does not yield an lvalue
Type: Warning, Error Options: all
A cast may not be applied to the operand that constitutes the object to be changed in an assignment operation. The diagnostic is a warning if the size of the operand type and the size of the type being cast to are the same; otherwise it is an error.

f(void){
    int i;
    (long) i = 5;
    (short) i = 4;
}

"file", line 3: warning: a cast does not yield an lvalue
"file", line 4: a cast does not yield an lvalue

\a is ANSI C "alert" character
Type: Warning Options: -Xt
In earlier releases, '\a' was equivalent to 'a'. However, ANSI C defines '\a' to be an alert character. In the implementation, the corresponding character code is 07, the BEL character.

int c = '\a';

"file", line 1: warning: \a is ANSI C "alert" character

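The ANSI C meaning of '\a' can be checked directly, as in the sketch below. It assumes an ASCII-based execution character set, in which the alert character has code 7; the function name alert_code is hypothetical.

```cpp
// Assumes an ASCII-based execution character set, where BEL has code 7.
static_assert('\a' == 7, "the alert character is BEL (code 07)");
static_assert('\a' != 'a', "the escape no longer means the letter a");

// Returns the character code of the alert escape sequence.
int alert_code() { return '\a'; }
```

Code that relied on the older meaning should write 'a' explicitly.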

access through "void" pointer ignored
Type: Warning Options: all
A pointer to void may not be used to access an object. An expression was written that does an indirection through a (possibly qualified) pointer to void. The indirection is ignored, although the rest of the expression (if any) is honored.

f(){
    volatile void *vp1, *vp2;
    *(vp1 = vp2); /* assignment does get done */
}

"file", line 3: warning: access through "void" pointer ignored

argument cannot have unknown size: arg #n
Type: Error Options: all
An argument in a function call must have a completed type. A struct, union, or enum object was passed whose type is incomplete.

f(){
    struct s *st;
    g(*st);
}

"file", line 3: argument cannot have unknown size: arg #1

argument does not match remembered type: arg #n
Type: Warning Options: -v
At a function call, the compiler determined that the type of the n-th argument passed to a function disagrees with other information it has about the function. That other information comes from two sources:

1. An old-style (non-prototype) function definition, or
2. A function prototype declaration that has gone out of scope, but whose type information is still remembered.

The argument in question is promoted according to the default argument promotion rules.

This diagnostic may be incorrect if the old-style function definition case applies and the function takes a variable number of arguments.

void f(i)
int i;
{ }

void g()
{
    f("erroneous");
}

"file", line 7: warning: argument does not match remembered type: arg #1

argument is incompatible with prototype: arg #n
Type: Error Options: all
A function was called with an argument whose type cannot be converted to the type in the function prototype declaration for the function.

struct s {int x;} q;
f(void){
    int g(int,int);
    g(3,q);
}

"file", line 4: argument is incompatible with prototype: arg #2

argument mismatch
Type: Warning Options: all
The number of arguments passed to a macro was different from the number in the macro definition.

#define twoarg(a,b) a+b
int i = twoarg(4);

"file", line 2: warning: argument mismatch

argument mismatch: n1 arg[s] passed, n2 expected
Type: Warning Options: -v
At a function call, the compiler determined that the number of arguments passed to a function disagrees with other information it has about the function. That other information comes from two sources:

1. An old-style (non-prototype) function definition, or
2. A function prototype declaration that has gone out of scope, but whose type information is still remembered.

This diagnostic may be incorrect if the old-style function definition case applies and the function takes a variable number of arguments.

extern int out_of_scope();
int f()
{ /* function takes no args */
    extern int out_of_scope(int);
}

int g()
{
    f(1); /* f takes no args */
    out_of_scope(); /* out_of_scope expects one arg */
}

"file", line 9: warning: argument mismatch: 1 arg passed, 0 expected
"file", line 10: warning: argument mismatch: 0 args passed, 1 expected

array too big
Type: Error Options: all
An array declaration has a combination of dimensions such that the declared object is too big for the target machine.

int bigarray[1000][1000][1000];

"file", line 1: array too big

asm() argument must be normal string literal
Type: Error Options: all
The argument to an old-style asm() must be a normal string literal, not a wide one.

asm(L"wide string literal not allowed");

"file", line 1: asm() argument must be normal string literal


asm definition cannot have old-style parameters
Type: Error Options: all
The definition of an enhanced asm function may use the ANSI C function prototype notation to declare types for parameters. It may not declare parameters by using the old-style C function definition notation of an identifier list, followed by a declaration list that declares parameter types.

__asm is an extension of ANSI C
Type: Warning Options: -Xc
Code declaring an enhanced asm function was compiled with -Xc. This warning indicates that the enhanced __asm is a violation of ANSI C syntax, which the compiler is obliged to diagnose, and is not a compatible extension.

"asm" valid only for function definition
Type: Warning Options: all
The asm storage class may only be used for function definitions. It is ignored here.

asm int f(void);

"file", line 1: warning: "asm" valid only for function definition

"#assert identifier (..." expected
Type: Error Options: all
In a #assert directive, the token following the predicate was not the ( that was expected.

#assert system unix

"file", line 1: "#assert identifier (..." expected

"#assert identifier" expected
Type: Error Options: all
In a #assert directive, the token following the directive was not the name of the predicate.

#assert 5

"file", line 1: "#assert identifier" expected

"#assert" missing ")"
Type: Error Options: all
In a #assert directive, the parenthesized form of the assertion lacked a closing ).

#assert system(unix

"file", line 1: "#assert" missing ")"

assignment type mismatch
Type: Warning, Error Options: all
The operand types for an assignment operation are incompatible. The message is a warning when the types are pointer types that do not match. Otherwise the message is an error.

struct s { int x; } st;
f(void){
    int i;
    char *cp;
    const char *ccp;
    i = st;
    cp = ccp;
}

"file", line 6: assignment type mismatch
"file", line 7: warning: assignment type mismatch

auto/register/asm inappropriate here
Type: Error Options: all
A declaration outside any function has storage class auto or register, or a declaration within a function has storage class asm.

auto int i;
f(void){
    asm int j;
}

"file", line 1: auto/register/asm inappropriate here
"file", line 3: auto/register/asm inappropriate here

automatic redeclares external: name
Type: Warning Options: all
An automatic variable name was declared in the same block and with the same name as another symbol that is extern. ANSI C prohibits such declarations, but previous compilation systems allowed them. For compatibility with previous releases, references to name in this block will be to the automatic.

f(void){
    extern int i;
    int i;
}

"file", line 3: warning: automatic redeclares external: i

bad file specification
Type: Error Options: all
The file specifier in a #include directive was neither a string literal nor a well-formed header name.

#include stdio.h

"file", line 1: bad file specification

bad octal digit: 'digit'
Type: Warning Options: -Xt
An integer constant that began with 0 included the non-octal digit digit. An 8 is taken to have value 8, and a 9 is taken to have value 9, even though they are invalid. In ANSI C (with the -Xa or -Xc options), the compiler will reject such a constant with the diagnostic invalid token.

int i = 08;

"file", line 1: warning: bad octal digit: '8'

bad #pragma pack value: n
Type: Warning Options: all
The value n that was specified in a #pragma pack directive was not one of the acceptable values: 1, 2, or 4. The erroneous value is ignored and the directive has no effect.

bad token in #error directive: token
Type: Error Options: all
The tokens in a #error directive must be valid C tokens. The source program contained the invalid token token.

#error "this is an invalid token

"file", line 1: bad token in #error directive: "
"file", line 1: #error: "this is an invalid token

bad use of "#" or "##" in macro #define
Type: Warning Options: all
In a macro definition, a # or ## operator was followed by a # or ## operator.

#define bug(s) # # s
#define bug2(s) # ## s

"file", line 1: warning: bad use of "#" or "##" in macro #define
"file", line 2: warning: bad use of "#" or "##" in macro #define

base type is really "type tag": name
Type: Warning Options: -Xt
A type was declared with a struct, union, or enum type specifier and with tag tag, and then used with a different type specifier to declare name. type is the type specifier used for the original declaration.

With the -Xt option, the compiler treats the two types as the same for compatibility with previous releases. In ANSI C (with the -Xa or -Xc options), the types are different.

struct s { int x,y,z; };
f(void){
    union s foo;
}

"file", line 3: warning: base type is really "struct s": foo
"file", line 3: warning: declaration introduces new type in ANSI C: union s

bit-field size <= 0: name
Type: Error Options: all
The declaration for bit-field name specifies a zero or negative number of bits.

struct s { int x:-3; };

"file", line 1: bit-field size <= 0: x

bit-field too big: name
Type: Error Options: all
The declaration for bit-field name specifies more bits than will fit in an object of the declared type.

struct s { char c:20; };

"file", line 1: bit-field too big: c

"break" outside loop or switch
Type: Error Options: all
A function contains a break statement in an inappropriate place, namely outside any loop or switch statement.

f(void){
    break;
}

"file", line 2: "break" outside loop or switch

call to function lacking prototype: name
Type: Warning Options: -v
The program calls function name, which has been declared, but without any parameter information.

extern void g();
void f(void) {
    g();
}

"file", line 3: warning: call to function lacking prototype: g
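
One way to avoid this warning is to declare the function with a prototype. The sketch below assumes g takes no arguments; the int return type and the definition of g are hypothetical additions so that the example links and can be checked.

```c
#include <assert.h>

/* Prototype: the (void) parameter list says g takes no arguments. */
extern int g(void);

int f(void)
{
    int r = g();      /* the call can now be checked against the prototype */
    assert(r == 7);
    return r;
}

/* Hypothetical definition; the manual's example only declares g. */
int g(void)
{
    return 7;
}
```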

cannot access member of non-struct/union object
Type: Error Options: all
The structure or union member must be completely contained within the left operand of the . operator.

f(void){
    struct s { int x; };
    char c;
    c.x = 1;
}

"file", line 4: warning: left operand of "." must be struct/union object
"file", line 4: cannot access member of non-struct/union object

cannot begin macro replacement with "##"
Type: Warning Options: all
The ## operator is a binary infix operator and may not be the first token in the macro replacement list of a macro definition.

#define mac(s) ## s

"file", line 1: warning: cannot begin macro replacement with "##"

cannot concatenate wide and regular string literals
Type: Warning, Error Options: all
Regular string literals and string literals for wide characters may be concatenated only if they are both regular or both wide. The compiler issues a warning if a wide string literal is followed by a regular one (and both are treated as wide); it issues an error if a regular string literal is followed by a wide one.

#include <stddef.h>
wchar_t wa[] = L"abc" "def";
char a[] = "abc" L"def";

"file", line 2: warning: cannot concatenate wide and regular string literals
"file", line 3: cannot concatenate wide and regular string literals
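
A minimal sketch of the accepted forms, where adjacent literals are either both wide or both regular. The function name total_length is an illustrative assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Both pieces wide: they concatenate into the wide literal L"abcdef". */
wchar_t wa[] = L"abc" L"def";

/* Both pieces regular: they concatenate into "abcdef". */
char a[] = "abc" "def";

/* Confirms both concatenations and returns the combined length. */
size_t total_length(void)
{
    assert(wcscmp(wa, L"abcdef") == 0);
    assert(strcmp(a, "abcdef") == 0);
    return wcslen(wa) + strlen(a);
}
```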

cannot declare array of functions or void
Type: Error Options: all
An array of functions or an array of void was declared.

int f[5]();

"file", line 1: cannot declare array of functions or void

cannot define "defined"
Type: Warning Options: all
The predefined preprocessing operator defined may not be defined as a macro name.

#define defined xyz

"file", line 1: warning: cannot define "defined"

cannot dereference non-pointer type
Type: Error Options: all
The operand of the * (pointer dereference) operator must have pointer type. This diagnostic is also issued for an array reference to a non-array.

f(){
    int i;
    *i = 4;
    i[4] = 5;
}

"file", line 3: cannot dereference non-pointer type
"file", line 4: cannot dereference non-pointer type

cannot do pointer arithmetic on operand of unknown size
Type: Error Options: all
An expression involves pointer arithmetic for pointers to objects whose size is unknown.

f(void){
    struct s *ps;
    g(ps+1);
}

"file", line 3: cannot do pointer arithmetic on operand of unknown size
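
The usual remedy is to make the complete type visible before doing the arithmetic, so the compiler knows how far ps+1 advances. In this sketch the member list of struct s and the function names are hypothetical stand-ins.

```c
#include <assert.h>

/* Completing the type gives the compiler sizeof(struct s) for scaling ps + 1. */
struct s { int x; };

/* Returns the x member of the element that follows *ps. */
int next_x(struct s *ps)
{
    return (ps + 1)->x;
}

int demo_pointer_step(void)
{
    struct s arr[2] = { { 10 }, { 20 } };
    assert(next_x(arr) == 20);
    return next_x(arr);
}
```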

cannot end macro replacement with "#" or "##"
Type: Warning Options: all
A # or ## operator may not be the last token in the macro replacement list of a macro definition.

#define mac1(s) abc ## s ##
#define mac2(s) s #

"file", line 1: warning: cannot end macro replacement with "#" or "##"
"file", line 2: warning: cannot end macro replacement with "#" or "##"

cannot find include file: filename
Type: Error Options: all
The file filename specified in a #include directive could not be located in any of the directories along the search path.

#include "where_is_it.h"

"file", line 1: cannot find include file: "where_is_it.h"

cannot have "..." in asm function
Type: Warning Options: all
An enhanced asm definition may not be a function prototype definition with ellipsis notation.

cannot have void object: name
Type: Error Options: all
An object may not be declared as having type void.

void v;

"file", line 1: cannot have void object: v

cannot initialize "extern" declaration: name
Type: Error Options: all
Within a function, the declaration of an object with extern storage class may not have an initializer.

f(void){
    extern int i = 1;
}

"file", line 2: cannot initialize "extern" declaration: i

cannot initialize function: name
Type: Error Options: all
A name declared as a function may not have an initializer.

int f(void) = 3;

"file", line 1: cannot initialize function: f

cannot initialize parameter: name
Type: Error Options: all
Old-style function parameter name may not have an initializer.

int f(i)
int i = 4;
{}

"file", line 2: cannot initialize parameter: i

cannot initialize typedef: name
Type: Error Options: all
A typedef may not have an initializer.

typedef int INT = 1;

"file", line 1: cannot initialize typedef: INT

cannot open file: explanation
Type: Fatal Options: all
The compiler was unable to open an input or output file. Usually this means the file name argument passed to the cc command was incorrect. explanation describes why file could not be opened.

cc glorch.c -c x.c

command line: fatal: cannot open glorch.c: No such file or directory

cannot open include file (too many open files): filename
Type: Error Options: all
The compiler could not open a new include file, filename, because too many other include files are already open. This scenario could arise if you have file1 that includes file2 that includes file3, and so on. The compiler supports at least eight levels of nesting, up to a maximum defined by the operating system. The most likely reason for the diagnostic is that at some point an include file includes a file that had already been included. For example, this could happen if file1 includes file2, which includes file1 again.

In this example, imagine that the file i1.h contains #include "i1.h".

#include "i1.h"

"./i1.h", line 1: cannot open include file (too many open files): "i1.h"

cannot recover from previous errors
Type: Error Options: all
Earlier errors in the compilation have confused the compiler, and it cannot continue to process the program. Please correct those errors and try again.

cannot return incomplete type
Type: Error Options: all
When a function is called that returns a structure or union, the complete declaration for the structure or union must have been seen already. Otherwise this message results.

f(){
    struct s g();
    g();
}

"file", line 3: cannot return incomplete type

cannot take address of bit-field: name
Type: Error Options: all
It is not permitted to take the address of a bit-field member of a structure or union.

f(void){
    struct s { int x:3, y:4; } st;
    int *ip = &st.y;
}

"file", line 3: cannot take address of bit-field: y
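
When a pointer to the value is really needed, one possible workaround is to copy the bit-field into an ordinary object and point at the copy. This is a sketch only; the variable copy and the function name are illustrative assumptions.

```c
#include <assert.h>

struct s { int x:3, y:4; };

/* Reads st.y through a pointer by way of an addressable copy. */
int read_y_through_pointer(void)
{
    struct s st;
    int copy;
    int *ip;

    st.x = 1;
    st.y = 5;        /* 5 fits in a 4-bit field whether it is signed or unsigned */
    copy = st.y;     /* an ordinary int has an address; the bit-field does not */
    ip = &copy;
    assert(*ip == 5);
    return *ip;
}
```

Writes made through ip change only the copy; to update the bit-field, assign the new value back to st.y.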

cannot take address of register: name
Type: Warning, Error Options: all
When name is declared with register storage class, it is not permissible to take the address of name, even if the compiler actually allocates the object to a register. The attempt to take an object's address may have been implicit, such as when an array is dereferenced. The diagnostic is an error if a register was allocated for the object and a warning otherwise.

f(void){
    register int i;
    register int ia[5];
    int *ip = &i;
    ia[2] = 1;
}

"file", line 4: cannot take address of register: i
"file", line 5: warning: cannot take address of register: ia

cannot take sizeof bit-field: name
Type: Warning Options: all
The sizeof operator may not be applied to bit-fields.

struct s { int x:3; } st;
int i = sizeof(st.x);

"file", line 2: warning: cannot take sizeof bit-field: x

cannot take sizeof function: name
Type: Error Options: all
The sizeof operator may not be applied to functions.

int f(void);
int i = sizeof(f);

"file", line 2: cannot take sizeof function: f

cannot take sizeof void
Type: Error Options: all
The sizeof operator may not be applied to type void.

void v(void);
int i = sizeof(v());

"file", line 2: cannot take sizeof void

cannot undefine "defined"
Type: Error Options: all
The predefined preprocessing operator defined may not be undefined.

#undef defined

"file", line 1: warning: cannot undefine "defined"

can't take address of object of type void
Type: Warning Options: -Xc
Strict ANSI conformance requires a diagnostic when taking the address of an object of type void.

extern void _end;

foo () { foo (&_end) ; }

"file", line 4: warning: can't take address of object of type void

case label affected by conversion: value
Type: Warning Options: -v
The value for the case label cannot be represented by the type of the controlling expression of a switch statement. If the type of the case expression and the type of the controlling expression have the same size, the actual bit representation of the case expression is unchanged, but its interpretation is different. For example, the controlling expression may have type int and the case expression may have type unsigned int. In the diagnostic, value is represented as a hexadecimal value if the case expression is unsigned, decimal if it is signed.

f(){
    int i;

    switch( i ){
    case 0xffffffffu:
        ;
    }
}

"file", line 5: warning: case label affected by conversion: 0xffffffff

In this example 0xffffffffu is not representable as an int. When the case expression is converted to the type of the controlling expression (int), its effective value is -1, which means that the case will be reached if i has the value -1, rather than 0xffffffff.
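
One way to sidestep the conversion is to give the controlling expression the same type as the case label. The sketch below assumes a 32-bit unsigned int, as in the manual's example; the function names are illustrative.

```c
#include <assert.h>

/* The controlling expression is unsigned int, matching the type of the label. */
int is_all_ones(unsigned int u)
{
    switch (u) {
    case 0xffffffffu:    /* same type as u: the label keeps its value */
        return 1;
    default:
        return 0;
    }
}

int demo_case_label(void)
{
    assert(is_all_ones(0xffffffffu) == 1);
    assert(is_all_ones(0u) == 0);
    return is_all_ones(0xffffffffu);
}
```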

"case" outside switch
Type: Error Options: all
A case statement occurred outside the scope of any switch statement.

f(void){
    case 4: ;
}

"file", line 2: "case" outside switch

character constant too long
Type: Warning Options: all
The character constant contains too many characters to fit in an integer. Only the first four characters of a regular character constant, and only the first character of a wide character constant, are used.

(Character constants that are longer than one character are non-portable.)

int i = 'abcde';

"file", line 1: warning: character constant too long

character escape does not fit in character
Type: Warning Options: all
A hexadecimal or octal escape sequence in a character constant or string literal produces a value that is too big to fit in an unsigned char. The value is truncated to fit.

char *p = "\x1ff\400";

"file", line 1: warning: \x is ANSI C hex escape
"file", line 1: warning: character escape does not fit in character
"file", line 1: warning: character escape does not fit in character

character escape does not fit in wide character
Type: Warning Options: all
This message diagnoses a condition similar to the previous one, except the character constant or string literal is prefixed by L to designate a wide character constant or string literal. The character escape is too large to fit in an object of type wchar_t and is truncated to fit.

comment does not concatenate tokens
Type: Warning Options: -Xa, -Xc
In previous releases, it was possible to "paste" two tokens together by juxtaposing them in a macro with a comment between them. This behavior was never defined or guaranteed. ANSI C provides a well-defined operator, ##, that serves the same purpose and should be used. This diagnostic warns that the old behavior is not being provided.

#define PASTE(a,b) a/* GLUE */b
int PASTE(prefix,suffix) = 1; /* does not create */ /* prefixsuffix */

"file", line 1: warning: comment does not concatenate tokens
"file", line 2: syntax error, probably missing ",", ";" or "="
"file", line 2: syntax error before or at: suffix
"file", line 2: warning: declaration missing specifiers: assuming "int"

comment is replaced by "##"
Type: Warning Options: -Xt
This message is closely related to comment does not concatenate tokens. The diagnostic indicates that the compiler is treating an apparent concatenation as if it were the ## operator. The source code should be updated to use the new operator.

#define PASTE(a,b) a/* GLUE */b
int PASTE(prefix,suffix) = 1; /* creates prefixsuffix */

"file", line 1: warning: comment is replaced by "##"
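
The updated form of the macro, using the ## operator, might look like the sketch below. The macro and identifiers come from the manual's example; the function pasted_value is an illustrative addition.

```c
#include <assert.h>

/* ## joins its two operands into a single token during macro expansion. */
#define PASTE(a, b) a ## b

int PASTE(prefix, suffix) = 1;   /* expands to: int prefixsuffix = 1; */

/* Returns the variable that the pasted identifier names. */
int pasted_value(void)
{
    assert(prefixsuffix == 1);
    return prefixsuffix;
}
```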

const object should have initializer: name
Type: Warning Options: -v
A const object cannot be modified. If an initial value is not supplied, the object will have a value of zero, or for automatics its value will be indeterminate.

const int i;

"file", line 1: warning: const object should have initializer: i

constant value too big: constant
Type: Warning Options: all
An operand in a #if or #elif directive has a value greater than ULLONG_MAX (2^64 - 1, or 18446744073709551615).

#if 18446744073709551617 > 0
int a;
#endif

"file", line 1: warning: constant value too big: 18446744073709551617
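
As a rough illustration, the largest operand that stays within range can be compared against ULLONG_MAX from <limits.h>. The macro ULLONG_IS_64_BIT and the function name are hypothetical, and the sketch assumes a compiler that provides unsigned long long.

```c
#include <assert.h>
#include <limits.h>

/* 18446744073709551615 is 2^64 - 1; the u suffix keeps the operand unsigned. */
#if ULLONG_MAX == 18446744073709551615u
#define ULLONG_IS_64_BIT 1
#else
#define ULLONG_IS_64_BIT 0
#endif

/* Returns 1 when unsigned long long is exactly 64 bits wide, else 0. */
int ullong_is_64_bit(void)
{
    assert(ULLONG_IS_64_BIT == (ULLONG_MAX == 18446744073709551615ULL));
    return ULLONG_IS_64_BIT;
}
```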

"continue" outside loop
Type: Error Options: all
The program contains a continue statement outside the scope of any loop.

f(void){
    continue;
}

"file", line 2: "continue" outside loop

controlling expressions must have scalar type
Type: Error Options: all
The expression for an if, for, while, or do-while must have integral, floating-point, or pointer type.

f(void){
    struct s {int x;} st;
    while (st) {}
}

"file", line 3: controlling expressions must have scalar type

conversion of double to float is out of range
Type: Warning, Error Options: all
A double expression has too large a value to fit in a float. The diagnostic is a warning if the expression is in executable code and an error otherwise.

float f = 1e30 * 1e30;

"file", line 1: conversion of double to float is out of range

conversion of double to integral is out of range
Type: Warning, Error Options: all
A double constant has too large a value to fit in an integral type. The diagnostic is a warning if the expression is in executable code and an error otherwise.

int i = 1e100;

"file", line 1: conversion of double to integral is out of range

conversion of floating-point constant to type out of range
Type: Error Options: all
A floating-point constant has too large a value to fit in type type (float, double, long double).

float f = 1e300f;

"file", line 1: conversion of floating-point constant to float out of range

decimal constant promoted to long long: n
Type: Warning Options: all
Because the long long and unsigned long long types are now supported, decimal constants in the range [LONG_MAX+1, ULONG_MAX] have type long long instead of unsigned long. Integer constants with other bases and decimal constants with a u or U suffix are not affected.

extern void g();
void f(void) { g(2147483648); /* 2^31 */ }

"file", line 2: warning: decimal constant promoted to long long: 2147483648

declaration hides parameter: name
Type: Warning Options: all
An identifier name was declared with the same name as one of the parameters of the function. References to name in this block will be to the new declaration.

int f(int i,int INT){
    int i;
    typedef int INT;
}

"file", line 2: warning: declaration hides parameter: i
"file", line 3: warning: declaration hides parameter: INT

declaration introduces new type in ANSI C: type tag
Type: Warning Options: -Xt
A struct, union, or enum tag has been redeclared in an inner scope. In previous releases, this tag was taken to refer to the previous declaration of tag. In ANSI C, the declaration introduces a new type. When the -Xt option is selected, the earlier behavior is reproduced.

struct s1 { int x; };
f(void){
    struct s1;
    struct s2 { struct s1 *ps1; }; /* s1 refers to line 1 */
    struct s1 { struct s2 *ps2; };
}

"file", line 3: warning: declaration introduces new type in ANSI C: struct s1

declaration missing specifiers; add "int"
Type: Warning Options: all
Objects and functions that are declared at file scope must have a storage class or type specifier. If both are omitted, the following warning is displayed:

i;
f(void);

"file", line 1: warning: declaration missing specifiers; assuming "int"
"file", line 2: warning: declaration missing specifiers; assuming "int"

"default" outside switch
Type: Error Options: all
A default label appears outside the scope of a switch statement.

f(void){
default: ;
}

"file", line 2: "default" outside switch

#define requires macro name
Type: Error Options: all
A #define directive must be followed by the name of the macro to be defined.

#define +3

"file", line 1: #define requires macro name

digit sequence expected after "#line"
Type: Error Options: all
The compiler expected to find the digit sequence that comprises a line number after #line, but the token it found there is either an inappropriate token or a digit sequence whose value is zero.

#line 09a

"file", line 1: digit sequence expected after "#line"

directive is an upward-compatible ANSI C extension
Type: Warning Options: -Xc
This diagnostic is issued when the C compiler sees a directive that it supports, but that is not part of the ANSI C standard, and -Xc has been selected.

#assert system( unix )

"file", line 1: warning: directive is an upward-compatible ANSI C extension

directive not honored in macro argument list
Type: Warning, Error Options: all
A directive has appeared between the ( )'s that delimit the arguments of a function-like macro invocation. The following directives are disallowed in such a context: #ident, #include, #line, #undef. The diagnostic is a warning if it appears within a false group of an if-group, and an error otherwise.

#define flm(a) a+4
int i = flm(
#ifdef flm      /* allowed */
#undef flm      /* disallowed: error */
4
#else           /* allowed */
#undef flm      /* disallowed: warn */
6
#endif          /* allowed */
);

"file", line 4: directive not honored in macro argument list
"file", line 7: warning: directive not honored in macro argument list

division by 0
Type: Warning, Error Options: all
An expression contains a division by zero that was detected at compile time. If the division is part of a #if or #elif directive, the result is taken to be zero.

The diagnostic is a warning if the division is in executable code, an error otherwise.

f(void) {
    int i = 1/0;
}

"file", line 2: warning: division by 0
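
The compiler can only report divisors that are zero at compile time. For divisors computed at run time, an explicit guard is one option, sketched below; checked_div and its fallback parameter are hypothetical names, not part of the compilation system.

```c
#include <assert.h>

/* Returns num / den, or fallback when den is zero.
   This sketch does not guard the separate INT_MIN / -1 overflow case. */
int checked_div(int num, int den, int fallback)
{
    if (den == 0)
        return fallback;
    return num / den;
}

int demo_checked_div(void)
{
    assert(checked_div(9, 3, -1) == 3);
    assert(checked_div(1, 0, -1) == -1);
    return checked_div(1, 0, -1);
}
```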

dubious type declaration; use tag only: tag
Type: Warning Options: all
A new struct, union, or enum type with tag tag was declared within a function prototype declaration or the parameter declaration list of an old-style function definition, and the declaration includes a declarator list for type. Calls to the function would always produce a type mismatch, because the tag declaration goes out of scope at the end of the function prototype declaration or definition, according to ANSI C's scope rules. It is not possible to declare an object of that type outside the function.

This can be fixed by declaring the struct, union, or enum ahead of the function prototype or function definition and then referring to it just by its tag.

int f(struct s {int x;} st)
{}

"file", line 1: warning: dubious struct declaration; use tag only: s

Rewrite this as

struct s {int x;};
int f(struct s st)
{}

dubious escape: \c
Type: Warning Options: all
Only certain characters may follow \ in string literals and character constants; c was not one of them. The \ is ignored.

int i = '\q';

"file", line 1: warning: dubious escape: \q

dubious escape: \<hex value>
Type: Warning Options: all
This message diagnoses the same condition as the preceding one, but the character that follows \ in the program is a non-printing character. The hex value between the brackets in the diagnostic is the character's code, printed as a hexadecimal number.

dubious reference to type typedef: typedef
Type: Warning Options: all
This message is similar to dubious tag in function prototype: type tag. A function prototype declaration refers to a struct, union, or enum typedef with name typedef. Because the struct, union, or enum has been declared within a function, it could not be in scope when the function whose prototype is being declared was defined. The prototype declaration and function definition thus could never match.

f(){
    struct s { int x; };
    typedef struct s ST;
    extern int g(ST, struct s);
}

"file", line 4: warning: dubious reference to struct typedef: ST
"file", line 4: warning: dubious tag in function prototype: struct s

dubious static function at block level
Type: Warning Options: -Xc
A function was declared with storage class static at block scope. The ANSI C standard says that the behavior is undefined if a function is declared at block scope with an explicit storage class other than extern. Although functions may be declared this way, other implementations may not recognize them, or may attach a different meaning to such a declaration.

void
f(void){
    static void g(void);
}

"file", line 3: warning: dubious static function at block level

dubious tag declaration: type tag
Type: Warning Options: all
A new struct, union, or enum type with tag tag was declared within a function prototype declaration or the parameter declaration list of an old-style function definition. Calls to the function would always produce a type mismatch, because the tag declaration goes out of scope at the end of the function declaration or definition, according to ANSI C's scope rules. It is not possible to declare an object of that type outside the function.

int f(struct s *);

"file", line 1: warning: dubious tag declaration: struct s

dubious tag in function prototype: type tag
Type: Warning Options: all
This message is similar to the previous one. A function prototype declaration refers to a struct, union, or enum type with tag tag. The tag has been declared within a function. Therefore it could not be in scope when the function whose prototype is being declared was defined. The prototype declaration and function definition thus could never match.

f(){
    struct s {int x;};
    int g(struct s *);
}

"file", line 3: warning: dubious tag in function prototype: struct s
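
A sketch of the usual fix: declare the tag at file scope so that the prototype, the definition, and every caller refer to the same type. The body of g and the value 42 are hypothetical, added only so the example can be checked.

```c
#include <assert.h>

struct s { int x; };     /* file scope: in scope for every declaration below */

int g(struct s *p);      /* prototype and definition now name the same type */

int f(void)
{
    struct s st = { 42 };
    return g(&st);
}

int g(struct s *p)
{
    assert(p != 0);      /* this sketch expects a valid pointer */
    return p->x;
}
```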

duplicate case in switch: value
Type: Error Options: all
There are two case statements in the current switch statement that have the same constant value value.

f(void){
    int i = 5;
    switch(i) {
    case 4:
    case 4:
        break;
    }
}

"file", line 5: duplicate case in switch: 4

duplicate case value value; first on line line

Replaces "duplicate case in switch: value". The existing test case now produces

line 5: duplicate case value 4; first on line 4

duplicate "default" in switch
Type: Error Options: all
There are two default labels in the current switch statement.

f(void){
    int i = 5;
    switch(i) {
    default:
    default:
        break;
    }
}

"file", line 5: duplicate "default" in switch

duplicate formal parameter: name
Type: Warning Options: all
In a function-like macro definition, name was used more than once as a formal parameter.

#define add3(a,a,c) a + b + c

"file", line 1: warning: duplicate formal parameter: a
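
A corrected version of the macro gives each formal parameter a distinct name. The parentheses in this sketch are an added precaution, not something the diagnostic requires, and demo_add3 is an illustrative name.

```c
#include <assert.h>

/* Each formal parameter appears once; parentheses protect operator precedence. */
#define add3(a, b, c) ((a) + (b) + (c))

int demo_add3(void)
{
    assert(add3(1, 2, 3) == 6);
    return 2 * add3(1, 2, 3);   /* 2 * (1 + 2 + 3) */
}
```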

duplicate member name: member
Type: Error Options: all
A struct or union declaration uses the name member for more than one member.

union u {
    int i;
    float i;
};

"file", line 3: duplicate member name: i

duplicate name in % line specification: name
Type: Error Options: all
Formal parameter name was mentioned more than once in the % line of an enhanced asm function.

#elif follows #else
Type: Warning Options: all
A preprocessing if-section must be in the order #if, optional #elif's, followed by optional #else and #endif. The code contains a #elif after the #else directive.

#if defined(ONE)
    int i = 1;
#elif defined(TWO)
    int i = 2;
#else
    int i = 3;
#elif defined(FOUR)
    int i = 4;
#endif

"file", line 7: warning: #elif follows #else
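
Reordered so that every #elif precedes the #else, the same if-section might read as follows. The function selected_value is an illustrative addition, and the sketch assumes that none of ONE, TWO, or FOUR is defined when it is compiled.

```c
#include <assert.h>

#if defined(ONE)
int i = 1;
#elif defined(TWO)
int i = 2;
#elif defined(FOUR)
int i = 4;
#else                    /* #else comes last, just before #endif */
int i = 3;
#endif

/* Returns whichever initial value the preprocessor selected. */
int selected_value(void)
{
#if !defined(ONE) && !defined(TWO) && !defined(FOUR)
    assert(i == 3);
#endif
    return i;
}
```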

#elif has no preceding #if
Type: Error Options: all
An #elif directive must be part of a preprocessing if-section, which begins with a #if directive. The code in question lacked the #if.

#elif defined(TWO)
    int i = 2;
#endif

"file", line 1: #elif has no preceding #if
"file", line 3: #if-less #endif

#elif must be followed by a constant expression
Type: Error Options: all
There was no expression following the #elif directive.

#if defined(ONE)
    int i = 1;
#elif
    int i = 4;
#endif

"file", line 3: warning: #elif must be followed by a constant expression

#else has no preceding #if
Type: Error Options: all
An #else directive was encountered that was not part of a preprocessing if-section.

#else
    int i =7;
#endif

"file", line 1: #else has no preceding #if
"file", line 3: #if-less #endif

embedded NUL not permitted in asm()
Type: Error Options: all
The string literal that appears in an old-style asm() contains an embedded NUL character (character code 0).

asm("this is an old-style asm with embedded NUL: \0");

"file", line 1: embedded NUL not permitted in asm()

empty #assert directive
Type: Error Options: all
A #assert directive contained no predicate name to assert.

#assert

"file", line 1: empty #assert directive

empty character constant
Type: Error Options: all
The program has a character constant without any characters in it.

int i = '';

"file", line 1: empty character constant

empty constant expression after macro expansion
Type: Error Options: all
A #if or #elif directive contained an expression that, after macro expansion, consisted of no tokens.

#define EMPTY
#if EMPTY
    char *mesg = "EMPTY is non-empty";
#endif

"file", line 2: empty constant expression after macro expansion

empty #define directive line
Type: Error Options: all
A #define directive lacked both the name of the macro to define and any other tokens.

#define

"file", line 1: empty #define directive line

empty file name
Type: Error Options: all
The file name in a #include directive is null.

#include <>

"file", line 1: empty file name

empty header name
Type: Error Options: all
This diagnostic is similar to the preceding one, but the null file name arises after macro substitution.

#define NULLNAME <>
#include NULLNAME

"file", line 2: empty header name

empty predicate argument
Type: Error Options: all
The compiler expects to find tokens between the ( )'s that delimit a predicate's assertions in a #unassert directive. None were present.

#unassert machine()

"file", line 1: empty predicate argument

empty translation unit
Type: Warning Options: all
The source file has no tokens in it after preprocessing is complete. The ANSI C standard requires the compiler to diagnose a file that has no tokens in it.

#ifdef COMPILE
    int token;
#endif

"file", line 5: warning: empty translation unit

empty #unassert directive
Type: Error Options: all
A #unassert contained no predicate name to discard.

#unassert

"file", line 1: empty #unassert directive

empty #undef directive, identifier expected
Type: Error Options: all
A #undef directive lacked the name of a macro to "undefine."

#undef

"file," line 1: empty #undef directive, identifier expected

{}-enclosed initializer required
Type: Warning Options: all
When initializing an aggregate, except in the case of initializing a character array with a string literal or an automatic structure with an expression, the initializer must be enclosed in { }'s.

int ia[5] = 1;
f(void){
    struct s { int x,y; } st = 1;
}

"file," line 1: warning: {}-enclosed initializer required
"file," line 3: warning: {}-enclosed initializer required
"file," line 3: struct/union-valued initializer required

end-of-loop code not reached
Type: Warning Options: all
A loop was written in such a way that the code at the end of the loop that the compiler generates to branch back to the beginning of the loop is not reachable and will never be executed.

f(void){
    int i = 1;
    while (i) {
        return 4;
    }
}

"file," line 5: warning: end−of−loop code not reached

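One way to avoid this warning, sketched here under the assumption that the loop body was only ever meant to run once, is to replace the loop with an if statement. The function name first_pass and the fall-through return value 0 are hypothetical; they are not part of the original example.

```c
/* Hypothetical rewrite of the example above: the body always returns,
   so an if statement expresses "at most once" without a loop and
   leaves no unreachable branch-back code. */
int first_pass(int flag)
{
    if (flag) {
        return 4;   /* same result as the while version */
    }
    return 0;       /* assumed value when the body is skipped */
}
```
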
enum constants have different types: op "operator"
Type: Warning Options: -v
A relational operator operator was used to compare enumeration constants from two different enumeration types. This may indicate a programming error. Note also that the sense of the comparison is known at compile time, because the constants' values are known.

enum e1 { ec11, ec12 } ev1;
enum e2 { ec21, ec22 } ev2;
void v(void){
    if (ec11 > ec22)
        ;
}

"file," line 4: warning: enum constants have different types: op ">"

enum type mismatch: arg #n
Type: Warning Options: -v
The program is passing an enumeration constant or object to a function for which a prototype declaration is in scope. The passed argument is of a different enumerated type from the one in the function prototype, which may indicate a programming error.

enum e1 { ec11 } ev1;
enum e2 { ec21 } ev2;
void ef(enum e1);
void v(void){
    ef(ec21);
}

"file," line 6: warning: enum type mismatch: arg #1

enum type mismatch: op "operator"
Type: Warning Options: -v
This message is like the previous one. One of the operands of operator is an enumeration object or constant, and the other is an enumeration object or constant from a different enumerated type.

enum e1 { ec11, ec12 } ev1;
enum e2 { ec21, ec22 } ev2;
void v(void){
    if (ev1 > ec22)
        ;
}

"file," line 4: warning: enum type mismatch: op ">"

enumeration constant hides parameter: name
Type: Warning Options: all
A declaration of an enumerated type within a function includes an enumeration constant with the same name as parameter name. The enumeration constant hides the parameter.

int
f(int i){
    enum e { l, k, j, i };
}

"file," line 3: warning: enumeration constant hides parameter: i

enumerator used in its own initializer: name
Type: Warning Options: all
When setting the value of enumerator name, name was used in the expression. ANSI C's scope rules take name in the expression to be whatever symbol was in scope at the time.

int i;
f(void){
    enum e { i = i+1, j, k }; /* uses global i in i+1 */
}

"file," line 3: warning: enumerator used in its own initializer: i
"file," line 3: integral constant expression expected

enumerator value overflows INT_MAX (2147483647)
Type: Warning Options: all
The value for an enumeration constant overflowed the maximum integer value.

enum e { e1=2147483647, e2 }; /* overflow for e2 */

"file," line 1: warning: enumerator value overflows INT_MAX (2147483647)

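A minimal sketch of how the overflow might be avoided, assuming the enumerators only need to be distinct: give the first enumerator a value that leaves room for its implicit successors. The choice of INT_MAX - 1 is illustrative.

```c
#include <limits.h>

/* The implicit value of e2 is e1 + 1, so e1 must stay below INT_MAX
   for the enumeration to remain representable as an int. */
enum e { e1 = INT_MAX - 1, e2 };   /* e2 == INT_MAX, no overflow */
```
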
#error: tokens
Type: Error Options: all
A #error directive was encountered in the source file. The other tokens in the directive are printed as part of the message.

#define ONE 2
#if ONE != 1
#error ONE != 1
#endif

"file," line 3: #error: ONE != 1

%error encountered in asm function
Type: Error Options: all
A %error specification line was encountered while an enhanced asm was being expanded.

error in asm; expect ";" or "\n", saw 'c'
Type: Error Options: all
In a % line of an enhanced asm function, the compiler expected to read a semicolon or new-line and found character c instead.

error writing output file
Type: Error Options: all
An output error occurred while the compiler attempted to write its output file or a temporary file. The most likely problem is that a file system is out of space.

")" expected
Type: Error Options: all
In an #unassert directive, the assertion of a predicate to be dropped must be enclosed in ( ).

#unassert system(unix

"file," line 1: ")" expected

"(" expected after "# identifier"
Type: Error Options: all
When the # operator is used in a #if or #elif directive to select a predicate instead of a like-named macro, the predicate must be followed by a parenthesized list of tokens.

#assert system(unix)
#define system "unix"
#if #system
char *systype = system;
#endif

"file," line 3: "(" expected after "# identifier"

"(" expected after first identifier
Type: Error Options: all
In an #unassert directive, the assertion of a predicate to be dropped must be enclosed in ( ).

#unassert system unix

"file," line 1: "(" expected after first identifier

extern and prior uses redeclared as static: name
Type: Warning Options: -Xc, -v
name was declared at file scope as an extern, then later the same object or function was declared as static. ANSI C rules require that the first declaration of an object or function give its actual storage class. The compilation system accepts the declaration and treats the object or function as if the first declaration had been static.

extern int i;
static int i;

"file," line 2: warning: extern and prior uses redeclared as static: i

first operand must have scalar type: op "?:"
Type: Error Options: all
The conditional expression in a ? : expression must have scalar (integral, floating-point, or pointer) type.

struct s { int x; } st;
f(void){
    int i = st ? 3 : 4;
}

"file," line 3: first operand must have scalar type: op "?:"

floating-point constant calculation out of range: op "operator"
Type: Warning, Error Options: all
The compiler detected an overflow at compile time when it attempted the operator operation between two floating-point operands. The diagnostic is a warning if the expression is in executable code and an error otherwise.

double d1 = 1e300 * 1e300;

"file," line 1: floating-point constant calculation out of range: op "*"

floating-point constant folding causes exception
Type: Error Options: all
This message is like the previous one, except that the operation caused a floating-point exception that causes the compiler to exit.

formal parameter lacks name: param #n
Type: Error Options: all
In a function prototype definition, a name was not provided for the n-th parameter.

int f(int){}

"file," line 1: formal parameter lacks name: param #1

function cannot return function or array
Type: Error Options: all
A function was declared whose return type would be a function or array, rather than a pointer to one of those.

int f(void)[]; /* function returning array of ints */

"file," line 1: function cannot return function or array

function designator is not of function type
Type: Error Options: all
An expression was used in a function call as if it were the name of a function or a pointer to a function when it was not.

f(void){
    char *p;
    p();
}

"file," line 3: function designator is not of function type

function expects to return value: name
Type: Warning Options: -v
The current function was declared with a non-void type, but contained a return statement with no return value expression.

f(void){
    return;
}

"file," line 2: warning: function expects to return value: f

Function illegally defined in hosted mode: function_name
Type: Warning Options: -Khost
The function function_name, having the same name as an ANSI function, was defined with external linkage.

int abs();
main(){
    int i = abs(7);
}
abs(i){
    printf("fooled ya\n");
}

"file", line 7: warning: Function illegally defined in hosted mode: abs

function prototype parameters must have types
Type: Warning Options: all
A function prototype declaration cannot contain an identifier list; it must declare types. The identifier list is ignored.

int f(i);

"file," line 1: warning: function prototype parameters must have types

identifier expected after "#"
Type: Error Options: all
The compiler expected to find an identifier, a predicate name, after a # in a conditional compilation directive, and none was there.

#if #system(unix) || #
char *os = "sys";
#endif

"file," line 1: identifier expected after "#"

identifier expected after #undef
Type: Error Options: all
A #undef must be followed by the name of the macro to be undefined. The token following the directive was not an identifier.

#undef 4

"file," line 1: identifier expected after #undef

identifier or "-" expected after -A
Type: Error Options: all
The cc command line argument -A must be followed by the name of a predicate to assert, or by a -, to eliminate all predefined macros and predicates. The token following -A was neither of these.

cc -A3b2 -c x.c
command line: identifier or "-" expected after -A

identifier or digit sequence expected after "#"
Type: Error Options: all
An invalid token or non-decimal number follows the # that introduces a preprocessor directive line.

# 0x12

"file," line 1: identifier or digit sequence expected after "#"

identifier redeclared: name
Type: Warning, Error Options: all
The identifier name was declared in a way that is inconsistent with a previous appearance of name, or name was declared twice in the same scope.

Previous releases were forgiving of inconsistent redeclarations if the types were "nearly" the same. ANSI C considers the types to be different. The -Xt option will allow you to retain the previous behavior, although the compiler will issue a warning. When the types are manifestly different, the diagnostic is always an error. The -Xa and -Xc options always produce an error when the types are different.

int x;
long x;
int y;
double y;

"file," line 2: warning: identifier redeclared: x
"file," line 4: identifier redeclared: y

Declarations of functions with and without argument information can often lead to confusing diagnostics. The following example illustrates.

int f(char);
int f();

"file," line 2: warning: identifier redeclared: f

According to ANSI C's type compatibility rules, a function declaration that lacks type information (i.e., one that is not a function prototype declaration) is compatible with a function prototype only when each parameter type is unchanged by the default argument promotion rules. In the example, char would be affected by the promotion rules (it would be promoted to int). Therefore the two declarations have incompatible types.
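The default argument promotions also apply to the trailing arguments of a variadic function, which makes them easy to observe. The sketch below is illustrative only, and sum_chars is a hypothetical name: a char argument arrives as an int, so it has to be fetched with va_arg(ap, int).

```c
#include <stdarg.h>

/* Sums `count` character codes passed as variadic arguments.
   Each char argument is promoted to int by the default argument
   promotions, so it must be read back as an int. */
int sum_chars(int count, ...)
{
    va_list ap;
    int total = 0;
    int i;

    va_start(ap, count);
    for (i = 0; i < count; i++) {
        total += va_arg(ap, int);   /* char arrives as int */
    }
    va_end(ap);
    return total;
}
```

A call such as sum_chars(2, (char)'A', (char)'B') passes two char values, both of which are widened to int before the function sees them.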

identifier redeclared; ANSI C requires "static": name
Type: Warning Options: all
name was declared twice at file scope. The first one used storage class static, but the second one specified no storage class. ANSI C's rules for storage classes require that all redeclarations of name after the first must specify static.

static int i;
int i;

"file," line 2: warning: identifier redeclared; ANSI C requires "static": i

identifier redefined: name
Type: Error Options: all
name was defined more than once. An object with an initializer was declared more than once, or a function was defined more than once.

int i = 1;
int i = 1;

"file," line 2: identifier redefined: i

#if must be followed by a constant expression
Type: Warning Options: all
No expression appeared after a #if directive.

#if
int i = 4;
#endif

"file," line 1: warning: #if must be followed by a constant expression

#if on line n has no #endif
Type: Error Options: all
The compiler reached end of file without finding the #endif that would end the preprocessing if-section that began with the if directive that was on line n. The if directive is one of #if, #ifdef, or #ifndef.

#ifdef NOENDIF
int i = 1;

"file," line 5: #ifdef on line 1 has no matching #endif
"file," line 5: warning: empty translation unit

#if-less #endif
Type: Error Options: all
An #endif directive was encountered that was not part of a preprocessing if-section.

int i = 1;
#endif

"file," line 2: #if−less #endif

#ifdef must be followed by an identifier
Type: Warning Options: all
A #ifdef preprocessing directive must be followed by the name of the macro to check for being defined. The source code omitted the identifier. The #ifdef is treated as if it were false.

#ifdef
int i = 1;
#endif

"file," line 1: warning: #ifdef must be followed by an identifier

#ifndef must be followed by an identifier
Type: Warning Options: all
The #ifndef directive must be followed by the identifier that is to be tested for having been defined.

#ifndef
int i = 5;
#endif

"file," line 1: warning: #ifndef must be followed by an identifier

ignoring malformed #pragma int_to_unsigned symbol
Type: Warning Options: all
The compiler encountered a #pragma int_to_unsigned directive that did not have the form shown. The erroneous directive is ignored.

#pragma int_to_unsigned strlen();

"file," line 1: warning: ignoring malformed #pragma int_to_unsigned symbol

ignoring malformed #pragma pack(n)
Type: Warning Options: all
The compiler encountered a #pragma pack directive that did not have the form shown. The erroneous directive is ignored.

ignoring malformed #pragma weak symbol [=value]
Type: Warning Options: all
The compiler encountered a #pragma weak directive that did not have the form shown. The erroneous directive is ignored.

#pragma weak write,_write

"file," line 1: warning: ignoring malformed #pragma weak symbol [=value]

implicitly declaring function to return int: name()
Type: Warning Options: -v
The program calls function name, which has not been previously declared. The compiler warns that it is assuming that function name returns int.

void v(void){
    g();
}

"file," line 2: warning: implicitly declaring function to return int: g()

improper cast of void expression
Type: Error Options: all
A void expression cannot be cast to something other than void.

f(void){
    void v(void);
    int i = (int) v();
}

"file," line 3: improper cast of void expression

improper member use: name
Type: Warning, Error Options: all
The program contains an expression with a -> or . operator, and name is not a member of the structure or union that the left side of the operator refers to, but it is a member of some other structure or union.

This diagnostic is an error if the member is not "unique." A unique member is part of one or more structures or unions but has the same type and offset in all of them.

struct s1 { int x,y; };
struct s2 { int q,r; };
f(void){
    struct s1 *ps1;
    ps1->r = 3;
}

"file," line 5: warning: improper member use: r

improper pointer subtraction
Type: Warning, Error Options: all
The operands of a subtraction are both pointers, but they point at different types. Only pointers of the same type that point to the same array may be subtracted.

The diagnostic is a warning if the pointers point to objects of the same size, and an error otherwise.

f(void){
    int *ip;
    char *cp;
    int i = ip - cp;
}

"file," line 4: improper pointer subtraction

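For contrast, here is a minimal sketch of a well-defined pointer subtraction: both operands have the same pointer type and point into the same array, and the result is stored in a ptrdiff_t. The function name offset_of_char is hypothetical.

```c
#include <stddef.h>

/* Returns the offset of the first occurrence of c in the string s,
   or -1 if c does not occur. Both p and s are char pointers into the
   same array, so p - s is a valid subtraction. */
ptrdiff_t offset_of_char(const char *s, char c)
{
    const char *p;

    for (p = s; *p != '\0'; p++) {
        if (*p == c) {
            return p - s;
        }
    }
    return -1;
}
```
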
improper pointer/integer combination: arg #n
Type: Warning Options: all
At a function call for which there is a function prototype declaration in scope, the code is passing an integer where a pointer is expected, or vice versa.

int f(char *);
g(void){
    f(5);
}

"file," line 3: warning: improper pointer/integer combination: arg #1

improper pointer/integer combination: op operator
Type: Warning Options: all
One of the operands of operator is a pointer and the other is an integer, but this combination is invalid.

f(void){
    int i = "abc";
    int j = i ? 4 : "def";
}

"file," line 2: warning: improper pointer/integer combination: op "="
"file," line 3: warning: improper pointer/integer combination: op ":"
"file," line 3: warning: improper pointer/integer combination: op "="

inappropriate qualifiers with "void"
Type: Warning Options: all
When void stands by itself, it may not be qualified with const or volatile.

int f(const void);

"file," line 1: warning: inappropriate qualifiers with "void"

#include <... missing '>'
Type: Warning Options: all
In a #include directive for which the header name began with <, the closing > character was omitted.

#include <stdio.h

"file," line 1: warning: #include <... missing '>'

#include directive missing file name
Type: Error Options: all
A #include directive did not specify a file to include.

#include

"file," line 1: #include directive missing file name

#include of /usr/include/... may be non-portable
Type: Warning Options: all
The source file included a file with the explicit prefix /usr/include. Such an inclusion is implementation-dependent and non-portable. On some systems the list of default places to look for a header might not include the /usr/include directory. In such a case the wrong file might be included.

#include </usr/include/stdio.h>

"file," line 1: warning: #include of /usr/include/... may be non−portable

incomplete #define macro parameter list
Type: Error Options: all
In the definition of a function-like macro, the compiler did not find a ) character on the same (logical) line as the #define directive.

#define mac(a

"file," line 1: incomplete #define macro parameter list

incomplete struct/union/enum tag: name
Type: Error Options: all
An object name, with struct, union, or enum type and tag tag, was declared but the type is incomplete.

struct s st;

"file," line 1: incomplete struct/union/enum s: st

inconsistent redeclaration of extern: name
Type: Warning Options: all
A function or object name was redeclared with storage class extern, and there was a previous declaration that has since gone out of scope. The second declaration has a type that conflicts with the first.

f(void){
    int *p = (int *) malloc(5 * sizeof(int));
}
g(void){
    void *malloc();
}

"file," line 5: warning: inconsistent redeclaration of extern: malloc

inconsistent redeclaration of static: name
Type: Warning Options: all
An object or function that was originally declared with storage class static was redeclared. The second declaration has a type that conflicts with the first.

The two most frequent conditions under which this diagnostic may be issued are:

1. A function was originally declared at other than file scope and with storage class static. The subsequent declaration of the function has a type that conflicts with the first.

2. A function or object was originally declared at file scope and with storage class static. A subsequent declaration of the same object or function at other than file scope used storage class extern (or possibly no storage class, if a function), and there was an intervening, unrelated, declaration of the same name.

f(void){
    static int myfunc(void);
}
g(void){
    static char myfunc(void);
}

"file," line 5: warning: inconsistent redeclaration of static: myfunc

static int x;
f(void){
    int x; /* unrelated */
    {
        extern float x; /* related to first declaration */
    }
}

"file," line 5: warning: inconsistent redeclaration of static: x

inconsistent storage class for function: name
Type: Warning Options: all
ANSI C requires that the first declaration of a function or object at file scope establish its storage class. Function name was redeclared in an inconsistent way according to these rules.

g(void){
    int f(void);
    static int f(void);
}

"file," line 3: warning: inconsistent storage class for function: f

initialization type mismatch
Type: Warning Options: all
The type of an initializer value is incompatible with the type of the object being initialized. This specific message usually applies to pointers.

int a;
unsigned int *pa = &a;

"file," line 2: warning: initialization type mismatch

initializer does not fit: value
Type: Warning Options: all
The value value does not fit in the space provided for it. If it were fetched from that space, it would not reproduce the same value as was put in. In the message, value is represented as a hexadecimal value if the initializer is unsigned, decimal if it is signed.

struct s {signed int m1:3; unsigned int m2:3;} st = {4, 5};
unsigned char uc = 300u;

"file," line 1: warning: initializer does not fit: 4
"file," line 2: warning: initializer does not fit: 0x12c

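The ranges behind the example can be checked with a small sketch. It assumes a two's-complement representation and a field width between 1 and 31 bits; the helper names fits_signed and fits_unsigned are hypothetical. A 3-bit signed field holds -4 through 3, so 4 does not fit, while a 3-bit unsigned field holds 0 through 7, so 5 does.

```c
/* Nonzero if value is representable in a signed field of `bits` bits.
   Assumes two's complement and 1 <= bits <= 31. */
int fits_signed(long value, int bits)
{
    long max = (1L << (bits - 1)) - 1;   /* e.g. 3 for a 3-bit field */
    long min = -max - 1;                 /* e.g. -4 for a 3-bit field */
    return value >= min && value <= max;
}

/* Nonzero if value is representable in an unsigned field of `bits` bits.
   Assumes 1 <= bits <= 31. */
int fits_unsigned(unsigned long value, int bits)
{
    return value <= (1UL << bits) - 1;   /* e.g. 7 for a 3-bit field */
}
```
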
integer overflow detected: op "operator"
Type: Warning Options: all
The compiler attempted to compute the result of an operator expression at compile-time, and determined that the result would overflow. The low-order 32 bits of the result are retained, and the compiler issues this diagnostic.

int i = 1000000 * 1000000;

"file," line 1: warning: integer overflow detected: op "*"

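The same overflow can be guarded against at run time. The following is only a sketch for non-negative operands, and checked_mul is a hypothetical helper name: it tests whether the product would exceed INT_MAX before performing the multiplication.

```c
#include <limits.h>

/* Stores a * b in *out and returns 1 if the product fits in an int.
   Returns 0 without touching *out otherwise. This sketch handles
   non-negative operands only and rejects negative ones. */
int checked_mul(int a, int b, int *out)
{
    if (a < 0 || b < 0) {
        return 0;               /* outside the scope of this sketch */
    }
    if (b != 0 && a > INT_MAX / b) {
        return 0;               /* a * b would overflow */
    }
    *out = a * b;
    return 1;
}
```
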
integral constant expression expected
Type: Error Options: all
The compiler expected (required) an integral constant or an expression that can be evaluated at compile time to yield an integral value. The expression written contained either a non-integral value, a reference to an object, or an operator that cannot be evaluated at compile time.

int ia[5.0];

"file," line 1: integral constant expression expected

integral constant too large
Type: Warning Options: all
An integral constant is too large to fit in an unsigned long.

int i = 1234567890123;

"file," line 1: warning: integral constant too large

internal compiler error: message
Type: Error Options: all
This message does not diagnose a user programming error (usually), but rather a problem with the compiler itself. One of the compiler's internal consistency checks has failed.

interpreted as a #line directive
Type: Warning Options: -Xc
A source line was encountered that had a number where the directive name usually goes. Such a line is reserved for the compiler's internal use, but it must be diagnosed in the -Xc (strictly conforming) mode.

# 9

"file," line 1: warning: interpreted as a #line directive
"file," line 1: warning: directive is an upward-compatible ANSI C extension

invalid cast expression
Type: Error Options: all
A cast cannot be applied to the expression because the types are unsuitable for casting. Both the type of the expression being cast and the type of the cast must be scalar types. A pointer may only be cast to or from an integral type.

f(void){
    struct s {int x;} st;
    int i = (int) st;
}

"file," line 3: invalid cast expression

invalid class in asm % line: class
Type: Error Options: all
The storage class class that the compiler encountered in an enhanced asm % line is not one of the acceptable classes.

invalid compiler control line in ".i" file
Type: Error Options: all
A .i file, the result of a cc -P command, is assumed to be a reserved communication channel between the preprocessing phase and the compilation phase of the compiler. The .i file lets you examine that intermediate form to detect errors that may otherwise be hard to detect. However, the compiler expects to find only a few directives that are used for internal communication. The source file that was compiled (a .i file) contained a preprocessing directive other than one of the special directives.

invalid directive
Type: Error Options: all
The identifier that follows a # in a preprocessing directive line was one that the compiler did not recognize.

# unknown

"file," line 1: invalid directive

invalid initializer
Type: Error Options: all
The program contains an initializer for an extern or static that attempts to store a pointer in a smaller than pointer-sized object. Such initializations are not supported.

int j;
char c = (char) &j;

"file," line 2: invalid initializer

invalid multibyte character
Type: Error Options: all
A multibyte character in a string literal or character constant could not be converted to a single wide character in the host environment.

invalid source character: 'c'
Type: Error Options: all
The compiler encountered a character (c) in the source program that is not a valid ANSI C token.

int i = 1$;

"file," line 1: invalid source character: '$'

invalid source character: <hex value>
Type: Error Options: all
This message diagnoses the same condition as the previous one, but the invalid character is not printable. The hex value between the brackets in the diagnostic is the hexadecimal value of the character code.

invalid switch expression type
Type: Error Options: all
The controlling expression of a switch statement could not be converted to int. This message always follows switch expression must have integral type.

f(){
    struct s {int x;} sx;
    switch(sx){
    case 4: ;
    }
}

"file," line 3: switch expression must have integral type
"file," line 3: invalid switch expression type

invalid token: non-token
Type: Error Options: all
The compiler encountered a sequence of characters that does not comprise a valid token. An invalid token may result from the preprocessing ## operator. The offending non-token is shown in the diagnostic. If the non-token is longer than 20 characters, the first 20 are printed, followed by "...". The offending invalid token is ignored.

#define PASTE(l,r) l ## r
double d1 = 1e;
double d2 = PASTE(1,e);
int i = 1veryverylongnontoken;

"file," line 2: invalid token: 1e
"file," line 2: syntax error before or at: ;
"file," line 2: warning: syntax error: empty declaration
"file," line 3: invalid token: 1e
"file," line 3: syntax error before or at: ;
"file," line 3: warning: syntax error: empty declaration
"file," line 4: invalid token: 1veryverylongnontoke...
"file," line 4: syntax error before or at: ;
"file," line 4: warning: syntax error: empty declaration

invalid token in #define macro parameters: token
Type: Error Options: all
The compiler encountered an inappropriate token while processing the argument list of a function-like macro definition. token is the erroneous token.

#define mac(a,4) a b c

"file," line 1: invalid token in #define macro parameters: 4

invalid token in directive
Type: Error Options: all
The compiler found an invalid token at the end of what would otherwise be a correctly formed directive.

#line 7 "file.c

"file," line 1: warning: string literal expected after #line <number>
"file," line 1: invalid token in directive: "
"file," line 1: warning: tokens ignored at end of directive line

invalid type combination
Type: Error Options: all
An inappropriate combination of type specifiers was used in a declaration.

short float f;

"file," line 1: invalid type combination

invalid type for bit-field: name
Type: Error Options: all
The type chosen for bit-field name is not permitted for bit-fields. Bit-fields may only be declared with integral types.

struct s { float f:3; };

"file," line 1: invalid type for bit−field: f

invalid use of "defined" operator
Type: Error Options: all
A defined operator in a #if or #elif directive must be followed by an identifier or ( )'s that enclose an identifier. The source code did not use it that way.

#if defined
int i = 1;
#endif

"file," line 1: invalid use of "defined" operator

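Both accepted forms of the defined operator are sketched below. The macro names FEATURE_X and FEATURE_Y and the variable feature_mode are hypothetical and serve only to illustrate the syntax.

```c
#define FEATURE_X 1

/* defined may be written with parentheses around the identifier
   or with the bare identifier following it. */
#if defined(FEATURE_X) && !defined FEATURE_Y
int feature_mode = 1;   /* chosen: FEATURE_X is defined, FEATURE_Y is not */
#else
int feature_mode = 0;
#endif
```
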
invalid white space character in directive
Type: Warning Options: all
The only white space characters that are permitted in preprocessing directives are space and horizontal tab. The source code included some other white space character, such as form feed or vertical tab. The compiler treats this character like a space.

label redefined: name
Type: Error Options: all
The same label name has appeared more than once in the current function. (A label's scope is an entire function.)

f(void){
    int i;
    i = 1;
    if (i) {
L:
        while (i)
            g();
        goto L;
    }
L:  ;
}

"file," line 10: label redefined: L

left operand must be modifiable lvalue: op "operator"
Type: Error Options: all
The operand on the left side of operator must be a modifiable lvalue, but it wasn't.

f(void){
    int i = 1;
    +i -= 1;
}

"file," line 3: left operand must be modifiable lvalue: op "−="

left operand of "->" must be pointer to struct/union
Type: Warning, Error Options: all
The operand on the left side of a -> operator must be a pointer to a structure or union, but it wasn't. The diagnostic is a warning if the operand is a pointer, an error otherwise.

struct s { int x; };
f(void){
    long *lp;
    lp->x = 1;
}

"file," line 4: warning: left operand of "−>" must be pointer to struct/union

left operand of "." must be lvalue in this context
Type: Warning Options: all
The operand on the left side of a . operator is an expression that does not yield an lvalue. Usually this results from trying to change the return value of a function that returns a structure.

struct s { int ia[10]; };
struct s sf(void);
f(void){
    sf().ia[0] = 3;
}

"file", line 4: warning: left operand of "." must be lvalue in this context

left operand of "." must be struct/union object
Type: Warning, Error Options: all
The . operator is only supposed to be applied to structure or union objects. The diagnostic is an error if the operand to the left of . is an array, pointer, function call, enumeration constant or variable, or a register value that got allocated to a register; it is a warning otherwise.

f(void){
    struct s { short s; };
    int i;
    i.s = 4;
}

"file", line 4: warning: left operand of "." must be struct/union object

()-less function definition
Type: Error Options: all
The declarator portion of a function definition must include parentheses. A function cannot be defined by writing a typedef name for a function type, followed by an identifier and the braces ({ }) that define a function.

typedef int F();
F f{ }

"file", line 2: ()-less function definition

long long and unsigned long long are extensions to ANSI C
Type: Warning Options: -Xc
A standard-conforming C compiler must emit a diagnostic about "bad" use of type specifiers. This warning is produced only at the first use of long long in a compilation.

long long a;

"file", line 1: warning: long long and unsigned long long are extensions to ANSI C

loop not entered at top
Type: Warning Options: all
The controlling expression at the beginning of a for or while loop cannot be reached by sequential flow of control from the statement before it.

f(void){
    int i;
    goto lab;
    for (i = 1; i > 0; --i) {
lab:;
        i=5;
    }
}

"file", line 4: warning: loop not entered at top

macro nesting too deep
Type: Fatal Options: all
The source code defines a macro that causes the compiler to run out of memory during preprocessing.

macro recursion
Type: Warning Options: -Xt
The source code calls a macro that calls itself, either directly or indirectly. ANSI C's semantics prevent further attempts to rescan the macro. Older C compilers would try to rescan the macro.

Because the rescanning rules are different for ANSI C and its predecessor, the compiler provides the old behavior in -Xt mode, which includes producing this diagnostic when macro recursion is detected.

#define a(x) b(x)
#define b(x) a(x)
a(3)

"file", line 3: warning: macro recursion

macro redefined: name
Type: Warning Options: all
The source code redefined a macro. Previous releases allowed such redefinitions silently if both definitions were identical except for the order and spelling of formal parameters. ANSI C requires that, when a macro is redefined correctly, the definitions must be identical including the order and spelling of formal parameters. This diagnostic is produced under all options if the new macro definition disagrees with the old one. For strict conformance, it is also produced under the -Xc option when the macro definitions disagree only in the spelling of the formal parameters.

#define TIMES(a,b) a * b
#define TIMES(a,b) a - b

"file", line 2: warning: macro redefined: TIMES

macro replacement within a character constant
Type: Warning Options: -Xt
Previous releases allowed the value of a formal parameter to be substituted in a character constant that is part of a macro definition. ANSI C does not permit such a use. The compiler provides the old behavior in -Xt mode with the warning.

#define CTRL(x) ('x'&037) /* form control character */

int ctrl_c = CTRL(c);

"file", line 1: warning: macro replacement within a character constant

The proper way to express this construct in ANSI C is the following:

#define CTRL(x) (x&037) /* form control character */

int ctrl_c = CTRL('c');

macro replacement within a string literal
Type: Warning Options: -Xt
This message diagnoses a similar condition to the preceding one, except the substitution is being made into a string literal.

#define HELLO(name) "hello, name"

char *hello_dave = HELLO(Dave);

"file", line 1: warning: macro replacement within a string literal

ANSI C provides a way to accomplish the same thing. The # ``string-ize'' operator turns the tokens of a macro argument into a string literal, and adjacent string literals are concatenated. The correct form is:

#define HELLO(name) "hello, " #name

char *hello_dave = HELLO(Dave);

member cannot be function: name
Type: Error Options: all
A function may not be a member of a structure or union, although a pointer to a function may. Member name was declared as a function.

struct s { int f(void); };

"file", line 1: member cannot be function: f

mismatched "?" and ":"
Type: Error Options: all
An expression in a #if or #elif directive contained a malformed ? : expression.

#if defined(foo) ? 5
int i;
#endif

"file", line 1: mismatched "?" and ":"

mismatched parentheses
Type: Error Options: all
Parentheses were mismatched in a preprocessing conditional compilation directive.

#if ((1)
int i = 1;
#endif

"file", line 1: mismatched parentheses

missing ")"
Type: Error Options: all
In a test of a predicate that follows a # operator in a #if or #elif directive, the ) that follows the assertion was missing.

#if # system(unix
char *system = "unix";
#endif

"file", line 1: missing ")"

missing formal name in % line
Type: Error Options: all
In an enhanced asm function, a % line specified a storage class, but not the formal parameter that has that storage class.

missing operand
Type: Error Options: all
The constant expression of a preprocessing conditional compilation directive is malformed. An expected operand for some operator was missing.

#define EMPTY
#if EMPTY / 4
int i = 1;
#endif

"file", line 2: missing operand

missing operator
Type: Error Options: all
The constant expression of a preprocessing conditional compilation directive is malformed. An operator was expected but was not encountered.

#if 1 4
int i = 1;
#endif

"file", line 1: missing operator

missing tokens between parentheses
Type: Error Options: all
In a #assert directive, there are no assertions within the parentheses of the predicate.

#assert system()

"file", line 1: missing tokens between parentheses

modifying typedef with "modifier"; only qualifiers allowed
Type: Warning Options: -Xt
ANSI C prohibits applying a type modifier to a typedef name. ANSI C only permits modifying a typedef with a type qualifier (const, volatile). However, for compatibility, in -Xt mode the compiler accepts the declaration and treats it as did previous releases. In -Xa or -Xc mode, this declaration is rejected.

typedef int INT;
unsigned INT i;

"file", line 2: warning: modifying typedef with "unsigned"; only qualifiers allowed

modulus by zero
Type: Warning, Error Options: all
The second operand of a % operator is zero. If the modulus operation is part of a #if or #elif directive, the result is taken to be zero.

The diagnostic is a warning if the modulus is in executable code, an error otherwise.

#if 42 % 0
int i = 1;
#endif

"file", line 1: warning: modulus by zero

more than one character honored in character constant: constant
Type: Warning Options: all
A character constant has an integral value that derives from the character codes of the characters. If a character constant comprises more than one character, the encoding of the additional characters depends on the implementation. This warning alerts you that the encoding that the preprocessing phase uses for the character constant constant is different in this release of the C compiler from the one in previous releases, which only honored the first character.

(The encoding for character constants you use in executable code is unchanged.)

#if 'ab' != ('b' * 256 + 'a')
#error unknown encoding
#endif

"file", line 1: warning: more than one character honored in character constant: 'ab'

"#" must be followed by formal identifier in #define
Type: Error Options: all
The ``string-ize'' operator # must be followed by the name of a formal parameter in a function-like macro.

#define mac(a) # + a

"file", line 1: "#" must be followed by formal identifier in #define

must have type "function-returning-unsigned": name
Type: Warning Options: all
The name that is a part of a #pragma int_to_unsigned directive must be an identifier whose type is function-returning-unsigned.

extern int f(int);
#pragma int_to_unsigned f

"file", line 2: warning: must have type "function-returning-unsigned": f

name in asm % line is not a formal: name
Type: Error Options: all
The identifier name that followed a storage class specifier in the % line of an enhanced asm function was not one of the formal parameters of the function.

nested asm calls not now supported
Type: Error Options: all
The compiler does not now support calls to enhanced asm functions as part of the argument expression for another enhanced asm function.

newline in character constant
Type: Error Options: all
A character constant was written that had no closing ' on the same line as the beginning '.

int i = 'a;

"file", line 1: newline in character constant

newline in string literal
Type: Warning, Error Options: all
A string literal was written that had no closing " on the same line as the beginning ". The diagnostic is a warning if the string literal is part of a preprocessing directive (and the compiler provides the missing ") and an error otherwise.

char *p = "abc;

"file", line 1: newline in string literal

newline not last character in file
Type: Warning Options: all
Every non-empty source file and header must consist of complete lines. This diagnostic warns that the last line of a file did not end with a newline.

no %-specification line found in asm: name
Type: Warning Options: all
The program includes the definition of an enhanced asm function name with parameters but that does not contain at least one match specification line (a line that begins with a ``%'' character). Such asm definitions will have no effect.

asm int identity(int arg) {
    movl arg, %eax
}

"file", line 3: no %-specification line found in asm: identity

no actual for asm formal: name
Type: Error Options: all
An enhanced asm function was called with fewer arguments than there were parameters in the definition. Thus there was no actual argument for parameter name.

no closing ">" in "#include <..."
Type: Error Options: all
A #include directive that used the < > form of header omitted the closing >.

#include <stdio.h

"file", line 1: warning: #include <... missing '>'

no file name after expansion
Type: Error Options: all
The form of #include directive was used that permits macro expansion of its argument, but the resulting expansion left no tokens to be taken as a file name.

#define EMPTY
#include EMPTY

"file", line 2: no file name after expansion

no hex digits follow \x
Type: Warning Options: -Xa, -Xc
The \x escape in character constants and string literals introduces a hexadecimal character escape. The \x must be followed by at least one hexadecimal digit.

char *cp = "\xz";

"file", line 1: warning: no hex digits follow \x

no macro replacement within a character constant
Type: Warning Options: -Xa, -Xc
This message is the inverse of macro replacement within a character constant. It informs you that the macro replacement that was done for -Xt mode is not being done in -Xa or -Xc mode.

no macro replacement within a string literal
Type: Warning Options: -Xa, -Xc
This message is the inverse of macro replacement within a string literal. It informs you that the macro replacement that was done for -Xt mode is not being done in -Xa or -Xc mode.

no tokens after expansion
Type: Error Options: all
After macro expansion was applied to the expression in a #line directive, there were no tokens left to be interpreted as a line number.

#define EMPTY
#line EMPTY

"file", line 2: no tokens after expansion

no tokens follow "#pragma"
Type: Warning Options: -v
The compiler encountered a #pragma directive that contained no other tokens.

#pragma

"file", line 1: warning: no tokens follow "#pragma"

no tokens following "#assert name ("
Type: Error Options: all
A use of the #assert directive is malformed. The assertions and the ) that should follow are missing.

#assert system(

"file", line 1: no tokens following "#assert name ("

no tokens in #line directive
Type: Error Options: all
The rest of a #line directive was empty; the line number and optional file name were missing.

#line

"file", line 1: no tokens in #line directive

no type specifiers present: assuming "int"
Type: Warning Options: -v
Declarations without any type specifiers are supported primarily for compatibility and are expected to be an anachronism in the next C standard.

static x=2;
f(){return 7;}

"file", line 1: no type specifiers present: assuming "int"
"file", line 2: no type specifiers present: assuming "int"

non-constant initializer: op "operator"
Type: Error Options: all
The initializer for an extern, static, or array object must be a compile-time constant. The initializers for an automatic structure or union object, if enclosed in { }, must also be compile-time constants. operator is the operator whose operands could not be combined at compile time.

int j;
int k = j+1;

"file", line 2: non-constant initializer: op "+"

non-formal identifier follows "#" in #define
Type: Warning Options: all
The identifier that follows a # operator in a macro definition must be a formal parameter of a function-like macro.

#define mac(a) "abc" # b

"file", line 1: non-formal identifier follows "#" in #define

non-integral case expression
Type: Error Options: all
The operand of a case statement must be an integral constant.

f(void){
    int i = 1;
    switch (i) {
    case 5.0: ;
    }
}

"file", line 4: non-integral case expression

non-unique member requires struct/union: name
Type: Error Options: all
The operand on the left side of a . operator was not a structure, union, or a pointer to one, and member name was not unique among all structure and union members that you have declared. Use . only with structures or unions. The member should belong to the structure or union corresponding to the left operand.

struct s1 { int x,y; };
struct s2 { int y,z; };
f(void){
    long lp;
    lp.y = 1;
}

"file", line 5: non-unique member requires struct/union object: y
"file", line 5: left operand of "." must be struct/union object

non-unique member requires struct/union pointer: name
Type: Error Options: all
This message diagnoses the same condition as the preceding one, but for the -> operator.

null character in input
Type: Error Options: all
The compiler encountered a null character (a character with a character code of zero).

null dimension: name
Type: Warning, Error Options: all
A dimension of an array is null in a context where that is prohibited. The diagnostic is a warning if the offending dimension is outermost and an error otherwise.

int ia[4][];
struct s { int x, y[]; };
int i = sizeof(int []);

"file", line 1: null dimension: ia
"file", line 2: warning: null dimension: y
"file", line 3: warning: null dimension: sizeof()

number expected
Type: Error Options: all
The compiler did not find a number where it expected to find one in a #if or #elif directive.

#if 1 +
int i = 1;
#endif

"file", line 1: number expected

old-style declaration hides prototype declaration: name
Type: Warning Options: -v
The function name was declared in an inner scope. The outer declaration was a function prototype declaration, but the inner one lacks parameter information. By ANSI C's scoping rules, the parameter information is hidden and the automatic conversions of types that the prototype would have provided are suppressed.

extern double sin(double);
f(void){
    extern double sin();
    double d;
    d = sin(1);    /* Note: no conversion to double! */
}

"file", line 3: warning: old-style declaration hides prototype declaration: sin
"file", line 5: warning: argument does not match remembered type: arg #1

only one storage class allowed
Type: Error Options: all
More than one storage class was specified in a declaration.

f(void){
    register auto i;
}

"file", line 2: only one storage class allowed

only qualifiers allowed after *
Type: Error Options: all
Only the const or volatile type qualifiers may be specified after a * in a declaration.

int * const p;
int * unsigned q;

"file", line 2: only qualifiers allowed after *

only "register" valid as formal parameter storage class
Type: Error Options: all
A storage class specifier may be specified in a function prototype declaration, but only register is permitted.

int f(
    register int x,
    auto int y);

"file", line 3: only "register" valid as formal parameter storage class

operand cannot have void type: op "operator"
Type: Error Options: all
One of the operands of operator has void type.

f(void){
    void v(void);
    int i = v();
}

"file", line 3: operand cannot have void type: op "="
"file", line 3: assignment type mismatch

operand must be modifiable lvalue: op "operator"
Type: Error Options: all
The operand of operator must be a modifiable lvalue, but it wasn't.

f(void){
    int i = --3;
}

"file", line 2: operand must be modifiable lvalue: op "--"

operand treated as unsigned: constant
Type: Warning Options: -Xt
An operand used in a #if or #elif directive has a value greater than LONG_MAX (2147483647) but has no unsigned modifier suffix (u or U). Previous releases treated such constants as signed quantities which, because of their values, actually became negative. ANSI C treats such constants as unsigned long integers, which may affect their behavior in expressions. This diagnostic is a transition aid that informs you that the value is being treated differently from before.

#if 2147483648 > 0
char *mesg = "ANSI C-style";
#endif

"file", line 1: warning: operand treated as unsigned: 2147483648

operands have incompatible pointer types: op "operator"
Type: Warning Options: all
operator was applied to pointers to different types.

f(void){
    char *cp;
    int *ip;
    if (ip < cp) ;
}

"file", line 4: warning: operands have incompatible pointer types: op "<"

operands have incompatible types: op "operator"
Type: Error Options: all
The types of the operands for operator are unsuitable for that type of operator.

f(void){
    char *cp;
    int *ip;
    void *vp = ip + cp;
}

"file", line 4: operands have incompatible types: op "+"

operands must have category type: op "operator"
Type: Error Options: all
The operands for operator do not fall into the appropriate category for that operator. category may be arithmetic, integral, or scalar.

f(void){
    int ia[5];
    int *ip = ia/4;
}

"file", line 3: operands must have arithmetic type: op "/"

out of scope extern and prior uses redeclared as static: name
Type: Warning Options: -Xc, -v
name was declared as extern in a block that has gone out of scope. Then name was declared again, this time as static. The compiler treats the object or function as if it were static, and all references, including ones earlier in the source file, apply to the static version.

f(void){
    extern int i;
}
static int i;

"file", line 4: warning: out of scope extern and prior uses redeclared as static: i

overflow in hex escape
Type: Warning Options: all
In a hexadecimal escape (\x) in a character constant or string literal, the accumulated value for the escape grew too large. Only the low-order 32 bits of value are retained.

int i = '\xabcdefedc';

"file", line 1: warning: \x is ANSI C hex escape
"file", line 1: warning: overflow in hex escape
"file", line 1: warning: character escape does not fit in character

parameter mismatch: ndecl declared, ndef defined
Type: Warning Options: all
A function prototype declaration and an old-style definition of the function disagree in the number of parameters. The declaration had ndecl parameters, while the definition had ndef.

int f(int);
int f(i,j)
int i,j;
{}

"file", line 4: warning: parameter mismatch: 1 declared, 2 defined

parameter not in identifier list: name
Type: Error Options: all
Variable name appears in an old-style function definition's parameter declarations, but it does not appear in the parameter identifier list.

f(a,b)
int i;
{}

"file", line 2: parameter not in identifier list: i

parameter redeclared: name
Type: Error Options: all
name was used more than once as the name for a parameter in a function definition.

int f(int i, int i) { }
int g(i,j)
int i;
int i;
{ }

"file", line 1: parameter redeclared: i
"file", line 4: parameter redeclared: i

preprocessing a .i file
Type: Warning Options: all
The source file is a .i file, a file that has already been preprocessed, and the -E cc option was selected. The compiler will simply copy the input file to the standard output without further processing.

prototype mismatch: n1 arg[s] passed, n2 expected
Type: Error Options: all
A function was called for which there is a function prototype declaration in scope, and the number of arguments in the call, n1, did not match the number of parameters in the declaration, n2.

int f(int);
g(void){
    f(1,2);
}

"file", line 3: prototype mismatch: 2 args passed, 1 expected

return value type mismatch
Type: Error Options: all
A function attempted to return a value that cannot be converted to the return type of the function.

f(void){
    struct s { int x; } st;
    return( st );
}

"file", line 3: return value type mismatch

semantics of "operator" change in ANSI C; use explicit cast
Type: Warning Options: -v
The type promotion rules for ANSI C are slightly different from those of previous releases. In the current release the default behavior is to use the ANSI C rules. To obtain the old behavior, use the -Xt option for the cc command.

Previous type promotion rules were ``unsigned-preserving.'' If one of the operands of an expression was of unsigned type, the operands were promoted to a common unsigned type before the operation was performed.

ANSI C uses ``value-preserving'' type promotion rules. An unsigned type is promoted to a signed type if all its values may be represented in the signed type.

The different type promotion rules may lead to different program behavior for the operators that are affected by the unsigned-ness of their operands:

◊ The division operators: /, /=, %, %=.
◊ The right shift operators: >>, >>=.
◊ The relational operators: <, <=, >, >=.

The warning message tells you that the program contains an expression in which the behavior of operator changes in the -Xa or -Xc mode. To guarantee the desired behavior, insert an explicit cast in the expression.

f(void){
    unsigned char uc;
    int i;
    /* was unsigned divide, signed in ANSI C */
    i /= uc;
}

"file", line 5: warning: semantics of "/=" change in ANSI C; use explicit cast

To get the same behavior as in previous releases add an explicit cast:

f(void){
    unsigned char uc;
    int i;
    /* was unsigned divide, signed in ANSI C */
    i /= (unsigned int) uc;
}

shift count negative or too big: op n
Type: Warning Options: all
The compiler determined that the shift count (the right operand) for shift operator op is either negative or bigger than the size of the operand being shifted.

f(){
    short s;
    s <<= 25;
}

"file", line 3: warning: shift count negative or too big: <<= 25

statement not reached
Type: Warning Options: all
This statement in the program cannot be reached because of goto, break, continue, or return statements preceding it.

f(void){
    int i;
    return i;
    i = 4;
}

"file", line 4: warning: statement not reached

static function called but not defined: name()
Type: Warning Options: all
The program calls function name, which has been declared static, but no definition of name appears in the translation unit. (The line number that is displayed in the message is one more than the number of lines in the file, because this condition can be diagnosed only after the entire translation unit has been seen.)

static int statfunc(int);

void
f(){
    int i = statfunc(4);
}

"file", line 7: warning: static function called but not defined: statfunc()

static redeclares external: name
Type: Warning Options: all
name was reused as the name of a static object or function after having used it in the same block as the name of an extern object or function. The version of name that remains visible is the static version.

f(void){
    extern int i;
    static int i;
}

"file", line 3: warning: static redeclares external: i

storage class after type is obsolescent
Type: Warning Options: -v
According to the ANSI C standard, writing declarations in which the storage class specifier is not first is ``obsolescent.''

int static i;

"file", line 1: warning: storage class after type is obsolescent

storage class for function must be static or extern
Type: Warning Options: all
An inappropriate storage class specifier for a function declaration or definition was used. Only extern and static may be used, or the storage class may be omitted. The specifier is ignored.

f(void){
    auto g(void);
}

"file", line 2: warning: storage class for function must be static or extern

string literal expected after # <number>
Type: Warning Options: all
The # line information directive takes an optional second token, a file name. If present, it must be in the form of a string literal.

# 1 x.c

"file", line 1: warning: string literal expected after # <number>
"file", line 1: warning: tokens ignored at end of directive line

string literal expected after #file
Type: Error Options: all
The #file directive (which is reserved for the compilation system) is used for internal communication between preprocessing and compilation phases. A string literal is expected as the operand.

string literal expected after #ident
Type: Error Options: all
A #ident directive must be followed by a normal (not wide character) string literal.

#ident no-string

"file", line 1: string literal expected after #ident

string literal expected after #line <number>
Type: Warning Options: all
This diagnostic is similar to string literal expected after # <number>, except that it applies to the standard #line directive.

string literal must be sole array initializer
Type: Warning Options: all
It is not permissible to initialize a character array with both a string literal and other values in the same initialization.

char ca[] = { "abc", 'd' };

"file", line 1: warning: string literal must be sole array initializer

struct/union has no named members
Type: Warning Options: all
A structure or union was declared in which none of the members is named.

struct s { int :4; char :0; };

"file", line 1: warning: struct/union has no named members

struct/union-valued initializer required
Type: Error Options: all
ANSI C allows you to initialize an automatic structure or union, but the initializer must have the same type as the object being initialized.

f(void){
    int i;
    struct s { int x; } st = i;
}

"file", line 3: warning: {}-enclosed initializer required
"file", line 3: struct/union-valued initializer required

switch expression must have integral type
Type: Warning, Error Options: all
A switch statement was written in which the controlling expression did not have integral type. The message is a warning if the invalid type is a floating-point type and an error otherwise. A floating-point switch expression is converted to int.

f(void){
    float x;
    switch (x) {
    case 4: ;
    }
}

"file", line 3: warning: switch expression must have integral type

syntax error before or at: token
Type: Error Options: all
This is an all-purpose diagnostic that means you have juxtaposed two (or more) language tokens inappropriately. The compiler shows you the token at which the error was detected.

f(void){
    int i = 3+;
}

"file", line 2: syntax error before or at: ;

syntax error in macro parameters
Type: Error Options: all
The macro parameter list of a function−like macro definition is malformed. The list must be a comma−separated list of identifiers, and it was not.

#define mac(a,b,) a b

"file," line 1: syntax error in macro parameters

syntax error, probably missing ",", ";" or "="
Type: Error Options: all
A declaration that looked like a function definition was written, except that the type of the symbol declared was not ``function returning.'' A ",", ";" or "=" is needed.

int i
int j;

"file," line 2: syntax error, probably missing ",", ";" or "="
"file," line 2: parameter not in identifier list: j
"file," line 4: syntax error before or at: <EOF>

syntax error: empty declaration
Type: Warning Options: all
A null statement was written at file scope. This looks like an empty declaration statement. It was previously permitted, but ANSI C does not permit it.

int i;;

"file," line 1: warning: syntax error: empty declaration

syntax error: "&..." invalid
Type: Warning Options: −Xc
A &... was written in a program that was compiled with the −Xc option. &... is invalid ANSI C syntax. Do not use this notation explicitly.

syntax requires ";" after last struct/union member
Type: Warning Options: all
The ; that C syntax requires after the last structure or union member in a structure or union declaration was omitted.

struct s { int x };

"file," line 1: warning: syntax requires ";" after last struct/union member

(type) tag redeclared: name
Type: Error Options: all
The tag name that was originally a type (struct, union, or enum) tag was redeclared.

struct q { int m1, m2; };
enum q { e1, e2 };

"file," line 2: (struct) tag redeclared: q

token not allowed in directive: token
Type: Error Options: all
A token was used in a #if or #elif directive that is neither a valid operator for constant expressions, nor a valid integer constant.

#if 1 > "1"
    int i = 1;
#endif

"file," line 1: token not allowed in directive: "1"

token−less macro argument
Type: Warning Options: −Xc
The actual argument to a preprocessor macro consisted of no tokens. The ANSI C standard regards this condition as undefined. The compiler treats the empty list of tokens as an empty argument, and, under the −Xc mode, it also issues this warning.

#define m(x) x+3
int i = m();

"file," line 2: warning: token−less macro argument

tokens after −A− are ignored
Type: Warning Options: all
In the −A− option to the cc command, there were additional tokens adjacent to the option. They are ignored.

cc −A−extra −c x.c

command line: warning: tokens after −A− are ignored

tokens expected after "# identifier ("
Type: Error Options: all
When the # operator is used in a #if or #elif directive to select a predicate instead of a like−named macro, the predicate must be followed by a parenthesized list of tokens.

#if #system(
    char *system = "unix";
#endif

"file," line 1: tokens expected after "# identifier ("

tokens expected after "("
Type: Error Options: all
In a #unassert directive, the assertion(s) and closing ) after the predicate were missing.

#unassert system(

"file," line 1: tokens expected after "("

tokens expected between parentheses
Type: Error Options: all
The name of the assertion to test for a predicate was omitted in a #if or #elif directive.

#if #system()
    char *sysname = "??";
#endif

"file," line 1: tokens expected between parentheses

tokens ignored after "−U{identifier}"
Type: Warning Options: all
In the command line −U option, there were tokens following the name of the macro to be undefined.

cc −Uunix,u3b2 −c x.c

command line: warning: tokens ignored after "−U{identifier}"

tokens ignored at end of directive line
Type: Warning Options: all
A directive line contains extra tokens that are not expected as part of the directive.

#undef a b /* can only undefine one */

"file," line 1: warning: tokens ignored at end of directive line

too many array initializers
Type: Error Options: all
More initializers than the array can hold were provided for the array.

int ia[3] = { 1, 2, 3, 4 };

"file," line 1: too many array initializers

too many #else's
Type: Warning Options: all
The code contained more than one #else directive in a preprocessing if−section. All #else directives after the first are taken to be false.

#ifdef ONE
    int i = 1;
#else
    int i = 2;
#else
    int i = 3;
#endif

"file," line 5: warning: too many #else's

too many errors
Type: Fatal Options: all
The compiler encountered too many errors to make further processing sensible. Rather than produce further diagnostics, the compiler exits.

too many initializers for scalar
Type: Error Options: all
A { }−bracketed initialization for a scalar contains more than one value.

int i = { 1, 2 };

"file," line 1: too many initializers for scalar

too many struct/union initializers
Type: Error Options: all
Too many initializers for a structure or union were provided.

struct s { int x,y; } st = { 1,2,3 };

"file," line 1: too many struct/union initializers

trailing "," prohibited in enum declaration
Type: Warning Options: −Xc, −v
An extra comma was supplied at the end of an enumeration type declaration. The extra comma is prohibited by the syntax.

enum e { e1, e2, };

"file," line 1: warning: trailing "," prohibited in enum declaration

trigraph sequence replaced
Type: Warning Options: −Xt
ANSI C introduces the notion of trigraphs, three−character sequences that stand for a single character. All such sequences begin with ??. Because sequences that are interpreted as trigraphs may appear in existing code, the compiler produces a transitional diagnostic when such sequences are encountered in transition mode (−Xt).

char *surprise = "this is a trigraph??!";

"file," line 1: warning: trigraph sequence replaced

type does not match prototype: name
Type: Warning Options: all
A function prototype declaration for a function was provided, but it used an old−style definition. The type for parameter name in that definition is incompatible with the type you used in the prototype declaration.

int f(char *);
int f(p)
int *p;
{}

"file," line 4: warning: type does not match prototype: p

The following example shows an especially confusing instance of this diagnostic.

int f(char);
int f(c)
char c;
{}

"file," line 3: warning: identifier redeclared: f
"file," line 4: warning: type does not match prototype: c

f has an old−style definition. For compatibility reasons, f's arguments must therefore be promoted according to the default argument promotions, which is how they were promoted before the existence of function prototypes. Therefore the value that must actually be passed to f is an int, although the function will only use the char part of the value. The diagnostic, then, identifies the conflict between the int that the function expects and the char that the function prototype would (conceptually) cause to be passed.

There are two ways to fix the conflict:

1. Change the function prototype to read int f(int);

2. Define f with a function prototype definition:

int f(char);
int f(char c)
{}
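The default argument promotions described above can also be observed through a variable argument list, because arguments passed through "..." are promoted in the same way as arguments to an old−style function. The following is only a sketch; the name first_promoted_arg is hypothetical and does not come from the compiler documentation.

```c
#include <stdarg.h>

/*
 * Hypothetical helper: returns the first variable argument.
 * Arguments passed through "..." undergo the default argument
 * promotions, so a char argument arrives as an int and must be
 * fetched as an int.
 */
int first_promoted_arg(int count, ...)
{
    va_list ap;
    int value;

    va_start(ap, count);
    value = va_arg(ap, int);   /* fetch as int: the char was promoted */
    va_end(ap);
    return value;
}
```

Calling first_promoted_arg(1, c) with a char c yields the value of c widened to int, which is the same conversion an old−style definition of f expects its caller to perform.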

typedef already qualified with "qualifier"
Type: Warning Options: all
A type specifier includes a typedef and an explicit type qualifier, qualifier. The typedef already included qualifier when it was declared.

typedef volatile int VOL;
volatile VOL v;

"file," line 2: warning: typedef already qualified with "volatile"

typedef declares no type name
Type: Warning Options: all
In a declaration with storage class typedef, no type name was actually declared. This is probably a programming error.

typedef struct s { int x; };

"file," line 1: warning: typedef declares no type name

typedef redeclared: name
Type: Warning Options: all
typedef name was declared more than once. The later declaration has an identical type to the first.

typedef int i;
typedef int i;

"file," line 2: warning: typedef redeclared: i

typedef redeclares external: name
Type: Warning Options: all
typedef name was declared, but there is an extern of the same name in the same block. The typedef hides the external.

f(void){
    extern int INT;
    typedef int INT;
}

"file," line 3: warning: typedef redeclares external: INT

"typedef" valid only for function declaration
Type: Warning Options: all
A function definition may not have the typedef storage class. It is ignored here.

typedef int f(void){}

"file," line 1: warning: "typedef" valid only for function declaration

unacceptable operand for unary &
Type: Error Options: all
An attempt was made to take the address of something whose address cannot be taken.

f(void){
    int *ip = &g();
}

"file," line 2: unacceptable operand for unary &

#unassert requires an identifier token
Type: Error Options: all
The #unassert directive must name a predicate to ``un−assert.''

#unassert 5

"file," line 1: #unassert requires an identifier token

undefined label: label
Type: Error Options: all
A goto was written in the current function, but the target label was never defined anywhere within the function.

f(void){
    goto L;
}

"file," line 3: undefined label: L

undefined struct/union member: name
Type: Error Options: all
The program made reference to a structure or union member, name, that has not been declared as part of any structure.

struct s { int x; };
f(void){
    struct s q;
    q.y = 1;
}

"file," line 4: undefined struct/union member: y

undefined symbol: name
Type: Error Options: all
A symbol name was referred to for which there is no declaration in scope.

f(void){
    g(i);
}

"file," line 2: undefined symbol: i

undefining __STDC__
Type: Warning Options: −Xt
ANSI C prohibits undefining the predefined symbol __STDC__. However, C issue 5 permits you to do so in transition mode (only). Use this feature to test C code that has been written to work in both an ANSI C and non−ANSI C environment.

For example, suppose you have C code that checks __STDC__, using function prototype declarations if it is defined, and old−style function declarations (or definitions) if not. Because the compiler predefines __STDC__, you would ordinarily be unable to check the old−style code, and you would have to run the code through another (non−ANSI C) compiler. By undefining __STDC__ (usually on the command line), you can use the compiler to do the checking. This diagnostic tells you, as required, that you are violating ANSI C constraints.

#undef __STDC__ /* usually −U__STDC__ on cc line */

#ifdef __STDC__
int
myfunc(const char *arg1, int arg2)
#else /* non−ANSI C case */
int
myfunc(arg1,arg2)
char *arg1, /* oops */
int arg2;
#endif
{
}

"file," line 1: warning: undefining __STDC__
"file," line 10: syntax error before or at: int
"file," line 12: syntax error before or at: {

unexpected "("
Type: Error Options: all
A misplaced ( was encountered in a #if or #elif directive.

#if 1 (
    int i = 1;
#endif

"file," line 1: unexpected "("

unexpected ")"
Type: Error Options: all
A misplaced ) was encountered in a #if or #elif directive.

#if ) 1
    int i = 1;
#endif

"file," line 1: unexpected ")"

unexpected character in asm % line: 'c'
Type: Error Options: all
In the % specification line of an enhanced asm function, the compiler expected to see an alphabetic character that begins a storage class specifier. Instead it encountered the character c.

unknown operand size: op "operator"
Type: Error Options: all
operator ++, −−, or = was applied to an operand whose size is unknown. The operand is usually a pointer to a structure or union whose members have not been declared.

f(void){
    struct s *sp;
    sp++;
}

"file," line 3: unknown operand size: op "++"

unnamed type member
Type: Warning Options: all
In the declaration of a structure or union type, a member was not given a name.

union s { int; char c; };

"file," line 1: warning: unnamed union member

unreachable case label: value
Type: Warning Options: all
The expression specified in a case statement has a value outside the range of the type of the controlling expression of the enclosing switch statement. Therefore the case label can never be reached. In the message, value is represented as a hexadecimal value if the case expression is unsigned, decimal if it is signed.

f(){
    unsigned char uc;

    switch( uc ){
    case 256: ;
    }
}

"file," line 5: warning: unreachable case label: 256

unreached side effect or comma operator: op operator
Type: Warning Options: −Xc −v
The C Standard requires that constant expressions shall not contain assignment, increment, decrement, function calls, or comma operators, except when they are contained within the operand of a sizeof operator.

void f(int j){static int i = 1 ? 2 : (++j);}

"file", line 1: unreached side effect or comma operator: op "+="

unrecognized #pragma ignored: pragma
Type: Warning Options: −v
Because #pragma directives are implementation−specific, when the −v compilation flag is set, the compiler warns about any such directives that it is ignoring. The compiler does not recognize #pragma pragma.

#pragma list

"file," line 1: warning: unrecognized #pragma ignored: list

unused label: label
Type: Warning Options: −v
When acomp (a component name for the ANSI C compiler) is used with the −v compilation flag set, the compiler warns if a label is declared but never used in a function. When removing labels from the symbol table, it checks the SY_REF flag and warns if it is unset.

"file," line 4: warning: unused label: label

use "double" instead of "long float"
Type: Warning Options: all
An object or function was declared to be long float, which was a synonym for double. ANSI C does not permit long float, although the compiler accepts it as a transition aid.

long float f = 1.0;

"file," line 1: warning: use "double" instead of "long float"

useless declaration
Type: Warning Options: all
ANSI C requires that every declaration actually declare something, such as

    a declarator,
    a structure or union tag,
    structure or union members, or
    enumeration constants.

A declaration was written that provided no information to the compiler.

int; /* no identifier */
enum e { e1, e2 }; /* introduces enum e */
enum e; /* no new information */

"file," line 1: warning: useless declaration
"file," line 3: warning: useless declaration

using out of scope declaration: name
Type: Warning Options: all
name was previously declared in a scope that is no longer active. In some ANSI C implementations, referring to such an object would yield an error; calling such a function would be interpreted as calling a function returning int. The compiler remembers the previous declaration and uses it. This warning tells what the compiler has done.

f(void){
    extern int i;
    double sin(double);
}
g(void){
    double d = sin(1.5);
    i = 1;
}

"file," line 6: warning: using out of scope declaration: sin
"file," line 7: warning: using out of scope declaration: i

using out of scope dimension: operator
Type: Warning Options: all
The size of what is conceptually an incomplete external array type is needed by the operator; the compiler is able to use the dimension defined at some previous inner scope.

int i[];
foo() {
    {extern int i[10];}
    bar(sizeof(i));
}

"file", line 4: warning: using out of scope dimension: sizeof()

void expressions may not be arguments: arg #n
Type: Error Options: all
A function call contains an argument for which the expression type is void.

f(void){
    void v(void);
    g(v());
}

"file," line 3: void expressions may not be arguments: arg #1

void function cannot return value
Type: Warning Options: all
A return statement was written with an expression, but the declared type of the function is void.

void v(void){
    return 3;
}

"file," line 2: void function cannot return value

"void" must be sole parameter
Type: Error Options: all
Only the first parameter in a function prototype declaration may have void type, and it must be the only parameter.

int f(int,void);

"file," line 1: "void" must be sole parameter

void parameter cannot have name: name
Type: Error Options: all
In a function prototype declaration, the parameter name was declared with void type.

int f(void v);

"file," line 1: void parameter cannot have name: v

\x is ANSI C hex escape
Type: Warning Options: −Xt
In earlier releases, '\x' was equivalent to 'x'. However, in ANSI C, '\x' introduces a hexadecimal character escape. This diagnostic warns of the new meaning.

If valid hexadecimal characters follow '\x', they are interpreted as part of the new escape sequence. Otherwise '\x' is treated as it was in previous releases.
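As a small illustration of the ANSI C meaning, the sketch below uses a well−formed hexadecimal escape. The function names are hypothetical and serve only to make the two properties checkable.

```c
/* Returns the numeric value of the character constant '\x41'. */
int hex_escape_value(void)
{
    return '\x41';   /* hexadecimal escape: the value 0x41 */
}

/* Returns nonzero when the literal "\x41" holds exactly one character. */
int hex_escape_is_single_char(void)
{
    /* one character plus the terminating null character */
    return sizeof("\x41") == 2;
}
```

In ANSI C the escape denotes a single character whose value is 0x41; it is not the letter x followed by the digits 4 and 1.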

int i = '\x';

"file," line 1: warning: \x is ANSI C hex escape

zero or negative subscript
Type: Warning, Error Options: all
The size in an array declaration is zero or negative. The diagnostic is a warning if the size is zero and an error otherwise.

int ia[-5];
int ib[0];

"file," line 1: zero or negative subscript
"file," line 2: warning: zero or negative subscript

zero−sized struct/union
Type: Error Options: all
A structure or union was declared with the size of zero.

struct s { int ia[0]; };

"file," line 1: warning: zero or negative subscript
"file," line 1: zero−sized struct/union

Other error messages

The following messages may appear at compile time, but they are not generated by the compiler. Messages beginning with UX:as:ERROR are produced by as, the assembler. Messages beginning with UX:ld:ERROR are generated by ld, the link editor. Note that the format of the messages varies, and some of the messages are displayed over several lines.

UX:as:ERROR: file.c aline n (cline n) : trouble writing; probably out of temp−file space

The file system may be low on space, or the temporary file or output file exceeded the current ulimit.

UX:as:ERROR: file.c aline n (cline n) Cannot open Output File filename

The directory containing the source file is unwritable, or the file system containing the source file is mounted read−only.

UX:ld:ERROR:file:fatal error: symbol `sym' multiply−defined, also in file file1.o

A symbol name was defined more than once.

undefined            first referenced
 symbol                  in file
sym1                     file1.o

UX:ld:ERROR:a.out:fatal error: Symbol referencing errors. No output written to a.out

A referenced symbol was not found. Compilation terminates.

C++ compiler diagnostics

The diagnostic messages produced by the C++ Compilation System are designed to be self−explanatory. Diagnostic messages have an associated severity:

Catastrophic errors indicate problems of such severity that the compilation cannot continue. These include command−line errors, internal errors, and missing include files. If multiple source files are being compiled, any source files after the current one will not be compiled.

Errors indicate violations of the syntax or semantic rules of the C++ language. Compilation continues, but object code is not generated.

Warnings indicate something valid but questionable. Compilation continues and object code is generated (if no errors are detected).

Remarks indicate something that is valid and probably intended, but which a careful programmer may want to check. These diagnostics are not issued by default; you can specify that they should be issued by using the CC −v option. Compilation continues and object code is generated (if no errors are detected).

In some cases violations of the C++ standard are diagnosed with a warning rather than an error; this behavior is still standards−conforming since the standard only requires a diagnostic of some kind.

Diagnostics are written to stderr with a form like the following:

"test.c", line 5: a break statement may only be used within a loop or switch
    break;
    ^

Note that the message identifies the file and line involved, and that the source line itself (with position indicated by the caret ``^'') follows the message. If there are several diagnostics in one source line, each diagnostic will have the form above, with the result that the text of the source line will be displayed several times, with an appropriate position each time.

Long messages are wrapped to additional lines when necessary.

For some messages, a list of entities is useful; they are listed following the initial error message:

"test.c", line 4: error: more than one instance of overloaded function "f" matches the argument list:
            function "f(int)"
            function "f(float)"
    f(1.5);
    ^

In some cases, additional context information is provided; specifically, such context information is useful when the compiler issues a diagnostic while doing a template instantiation or while generating a constructor, destructor, or assignment operator function. For example:

"test.c", line 7: error: "A::A()" is inaccessible
    B x;
      ^
          detected during implicit generation of "B::B()" at line 7

Without the context information, it may be difficult to figure out what the error refers to.

Object files

This topic describes the object file format, called ELF (Executable and Linking Format). There are three main types of object files.

A relocatable file holds code and data suitable for linking with other object files to create an executable or a shared object file.

An executable file holds a program suitable for execution; the file specifies how exec(2) creates a program's process image.

A shared object file holds code and data suitable for linking in two contexts. First, the link editor (see ld(1)) processes the shared object file with other relocatable and shared object files to create another object file. Second, the dynamic linker combines it with an executable file and other shared objects to create a process image.

Created by the assembler and link editor, object files are binary representations of programs intended to be executed directly on a processor. Programs that require other abstract machines, such as shell scripts, are excluded.

File format

Object files participate in program linking (building a program) and program execution (running a program). For convenience and efficiency, the object file format provides parallel views of a file's contents, reflecting the differing needs of those activities. The object file's organization is shown below.

Linking View                       Execution View
ELF header                         ELF header
Program header table (optional)    Program header table
Section 1                          Segment 1
. . .
Section n                          Segment 2
. . .
. . .                              . . .
Section header table               Section header table (optional)

Object file format

An ELF header resides at the beginning and holds a ``road map'' describing the file's organization. Sections hold the bulk of object file information for the linking view: instructions, data, symbol table, relocation information, and so on. These are described in ``Special sections''.

A program header table tells the system how to create a process image. Files used to build a process image (execute a program) must have a program header table; relocatable files do not need one. A section header table contains information describing the file's sections. Every section has an entry in the table; each entry gives information such as the section name, the section size, and so on. Files used during linking must have a section header table; other object files may or may not have one.

NOTE: Although the figure shows the program header table immediately after the ELF header, and the section header table following the sections, actual files may differ. Moreover, sections and segments have no specified order. Only the ELF header has a fixed position in the file.

Data representation

As described here, the object file format supports various processors with 8−bit bytes and 32−bit or 64−bit architectures. Nevertheless, it is intended to be extensible to larger (or smaller) architectures. Object files therefore represent some control data with a machine−independent format, making it possible to identify object files and interpret their contents in a common way. Remaining data in an object file use the encoding of the target processor, regardless of the machine on which the file was created.

Name Size Alignment Purpose

Elf32_Addr 4 4 Unsigned program address

Elf32_Off 4 4 Unsigned file offset

Elf32_Half 2 2 Unsigned medium integer

Elf32_Word 4 4 Unsigned integer

Elf32_Sword 4 4 Signed integer

unsigned char 1 1 Unsigned small integer

32−Bit data types

Name Size Alignment Purpose

Elf64_Addr 8 8 Unsigned program address

Elf64_Off 8 8 Unsigned file offset

Elf64_Half 2 2 Unsigned medium integer

Elf64_Word 4 4 Unsigned integer

Elf64_Sword 4 4 Signed integer

Elf64_Xword 8 8 Unsigned long integer

Elf64_Sxword 8 8 Signed long integer

unsigned char 1 1 Unsigned small integer

64−Bit data types

All data structures that the object file format defines follow the ``natural'' size and alignment guidelines for the relevant class. If necessary, data structures contain explicit padding to ensure 8−byte alignment for 8−byte objects, 4−byte alignment for 4−byte objects, to force structure sizes to a multiple of 4 or 8, and so forth. Data also have suitable alignment from the beginning of the file. Thus, for example, a structure containing an Elf32_Addr member will be aligned on a 4−byte boundary within the file.

For portability reasons, ELF uses no bit−fields.
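One possible C rendering of the scalar types in the two tables is sketched below. It assumes an implementation that provides the fixed−width types of <stdint.h>; the mapping and the helper elf_offset_is_aligned are illustrative, not the definitions shipped in any particular system header.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit class scalar types (assumed mapping onto <stdint.h> types). */
typedef uint32_t Elf32_Addr;    /* unsigned program address */
typedef uint32_t Elf32_Off;     /* unsigned file offset */
typedef uint16_t Elf32_Half;    /* unsigned medium integer */
typedef uint32_t Elf32_Word;    /* unsigned integer */
typedef int32_t  Elf32_Sword;   /* signed integer */

/* 64-bit class scalar types. */
typedef uint64_t Elf64_Addr;    /* unsigned program address */
typedef uint64_t Elf64_Off;     /* unsigned file offset */
typedef uint16_t Elf64_Half;    /* unsigned medium integer */
typedef uint32_t Elf64_Word;    /* unsigned integer */
typedef int32_t  Elf64_Sword;   /* signed integer */
typedef uint64_t Elf64_Xword;   /* unsigned long integer */
typedef int64_t  Elf64_Sxword;  /* signed long integer */

/* Compile-time checks against the sizes listed in the tables. */
_Static_assert(sizeof(Elf32_Addr) == 4, "Elf32_Addr must be 4 bytes");
_Static_assert(sizeof(Elf32_Half) == 2, "Elf32_Half must be 2 bytes");
_Static_assert(sizeof(Elf64_Addr) == 8, "Elf64_Addr must be 8 bytes");
_Static_assert(sizeof(Elf64_Word) == 4, "Elf64_Word must be 4 bytes");
_Static_assert(sizeof(Elf64_Xword) == 8, "Elf64_Xword must be 8 bytes");

/*
 * Hypothetical helper: reports whether a file offset is a multiple of
 * the size of the object stored there, i.e. naturally aligned.
 */
int elf_offset_is_aligned(uint64_t offset, size_t object_size)
{
    return object_size != 0 && offset % object_size == 0;
}
```

Because the format uses no bit−fields, every member of a control structure is one of these whole scalar types, and individual flag bits are tested with masks.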

ELF header

Some object file control structures can grow, because the ELF header contains their actual sizes. If the object file format changes, a program may encounter control structures that are larger or smaller than expected. Programs might therefore ignore ``extra'' information. The treatment of ``missing'' information depends on context and will be specified when and if extensions are defined.

#define EI_NIDENT 16

typedef struct {
        unsigned char   e_ident[EI_NIDENT];
        Elf32_Half      e_type;
        Elf32_Half      e_machine;
        Elf32_Word      e_version;
        Elf32_Addr      e_entry;
        Elf32_Off       e_phoff;
        Elf32_Off       e_shoff;
        Elf32_Word      e_flags;
        Elf32_Half      e_ehsize;
        Elf32_Half      e_phentsize;
        Elf32_Half      e_phnum;
        Elf32_Half      e_shentsize;
        Elf32_Half      e_shnum;
        Elf32_Half      e_shstrndx;
} Elf32_Ehdr;

typedef struct {
        unsigned char   e_ident[EI_NIDENT];
        Elf64_Half      e_type;
        Elf64_Half      e_machine;
        Elf64_Word      e_version;
        Elf64_Addr      e_entry;
        Elf64_Off       e_phoff;
        Elf64_Off       e_shoff;
        Elf64_Word      e_flags;
        Elf64_Half      e_ehsize;
        Elf64_Half      e_phentsize;
        Elf64_Half      e_phnum;
        Elf64_Half      e_shentsize;
        Elf64_Half      e_shnum;
        Elf64_Half      e_shstrndx;
} Elf64_Ehdr;

ELF header
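The remark that control structures may be larger than expected suggests stepping through a table by the entry size recorded in the header (for example e_phentsize or e_shentsize) rather than by sizeof. The sketch below illustrates that idea only; struct known_entry and read_entry are hypothetical, and rejecting entries that are smaller than expected is an assumption, since the text leaves the treatment of ``missing'' information open.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical view of the leading part of an entry that this reader understands. */
struct known_entry {
    uint32_t tag;
    uint32_t value;
};

/*
 * Copies entry `index` out of a table whose entries are `entsize`
 * bytes each, where entsize is the size recorded in the file.
 * Bytes beyond sizeof(struct known_entry) are "extra" and ignored.
 * Returns 0 on success, or -1 if the recorded entries are smaller
 * than expected or the index is out of range.
 */
int read_entry(const unsigned char *table, size_t entsize, size_t count,
               size_t index, struct known_entry *out)
{
    if (entsize < sizeof *out || index >= count)
        return -1;
    memcpy(out, table + index * entsize, sizeof *out);
    return 0;
}
```

With entsize taken from the file, a reader built against 8−byte entries still finds the second entry at the correct offset in a file whose entries have grown to 12 bytes.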

e_ident
The initial bytes mark the file as an object file and provide machine−independent data with which to decode and interpret the file's contents. Complete descriptions appear in ``ELF identification''.

e_type
This member identifies the object file type.

Name Value Meaning

ET_NONE 0 No file type

ET_REL 1 Relocatable file

ET_EXEC 2 Executable file

ET_DYN 3 Shared object file

ET_CORE 4 Core file

ET_LOOS 0xfe00 Operating system−specific

ET_HIOS 0xfeff Operating system−specific

ET_LOPROC 0xff00 Processor−specific

ET_HIPROC 0xffff Processor−specific

Although the core file contents are unspecified, type ET_CORE is reserved to mark the file. Values from ET_LOOS through ET_HIOS (inclusive) are reserved for operating system−specific semantics. Values from ET_LOPROC through ET_HIPROC (inclusive) are reserved for processor−specific semantics. If meanings are specified, the processor supplement explains them. Other values are reserved and will be assigned to new object file types as necessary.
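A decoder for e_type might look like the following sketch. The function name elf_type_name is hypothetical; the operating system−specific range is taken as 0xfe00 through 0xfeff and the processor−specific range as 0xff00 through 0xffff, as in the System V ABI.

```c
/* Object file types, as listed for e_type. */
#define ET_NONE   0
#define ET_REL    1
#define ET_EXEC   2
#define ET_DYN    3
#define ET_CORE   4
#define ET_LOOS   0xfe00
#define ET_HIOS   0xfeff
#define ET_LOPROC 0xff00
#define ET_HIPROC 0xffff

/* Hypothetical helper: returns a description of an e_type value. */
const char *elf_type_name(unsigned int e_type)
{
    switch (e_type) {
    case ET_NONE: return "No file type";
    case ET_REL:  return "Relocatable file";
    case ET_EXEC: return "Executable file";
    case ET_DYN:  return "Shared object file";
    case ET_CORE: return "Core file";
    default:      break;
    }
    if (e_type >= ET_LOOS && e_type <= ET_HIOS)
        return "Operating system-specific";
    if (e_type >= ET_LOPROC && e_type <= ET_HIPROC)
        return "Processor-specific";
    return "Reserved";   /* not yet assigned to any object file type */
}
```

Checking the two reserved ranges after the named values keeps the switch small and mirrors the order in which the table presents them.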

e_machine
This member's value specifies the required architecture for an individual file.

Name Value Meaning

EM_NONE 0 No machine

EM_M32 1 AT&T WE 32100

EM_SPARC 2 SPARC

EM_386 3 Intel 80386

EM_68K 4 Motorola 68000

EM_88K 5 Motorola 88000

RESERVED 6 Reserved for future use

EM_860 7 Intel 80860

EM_MIPS 8 MIPS I Architecture

EM_S370 9 IBM System/370 Processor

EM_MIPS_RS3_LE 10 MIPS RS3000 Little−endian

RESERVED 11−14 Reserved for future use

EM_PARISC 15 Hewlett−Packard PA−RISC

RESERVED 16 Reserved for future use

EM_VPP500 17 Fujitsu VPP500

EM_SPARC32PLUS 18 Enhanced instruction set SPARC

EM_960 19 Intel 80960

EM_PPC 20 PowerPC

EM_PPC64 21 64−bit PowerPC

RESERVED 22−35 Reserved for future use

EM_V800 36 NEC V800

EM_FR20 37 Fujitsu FR20

EM_RH32 38 TRW RH−32

EM_RCE 39 Motorola RCE

EM_ARM 40 Advanced RISC Machines ARM

EM_ALPHA 41 Digital Alpha

EM_SH 42 Hitachi SH

EM_SPARCV9 43 SPARC Version 9

EM_TRICORE 44 Siemens Tricore embedded processor

EM_ARC 45 Argonaut RISC Core, Argonaut Technologies Inc.

EM_H8_300 46 Hitachi H8/300

EM_H8_300H 47 Hitachi H8/300H

EM_H8S 48 Hitachi H8S

EM_H8_500 49 Hitachi H8/500

EM_IA_64 50 Intel IA−64 processor architecture

EM_MIPS_X 51 Stanford MIPS−X


EM_COLDFIRE 52 Motorola ColdFire

EM_68HC12 53 Motorola M68HC12

EM_MMA 54 Fujitsu MMA Multimedia Accelerator

EM_PCP 55 Siemens PCP

EM_NCPU 56 Sony nCPU embedded RISC processor

EM_NDR1 57 Denso NDR1 microprocessor

EM_STARCORE 58 Motorola Star*Core processor

EM_ME16 59 Toyota ME16 processor

EM_ST100 60 STMicroelectronics ST100 processor

EM_TINYJ 61 Advanced Logic Corp. TinyJ embedded processor family

RESERVED 62−65 Reserved for future use

EM_FX66 66 Siemens FX66 microcontroller

EM_ST9PLUS 67 STMicroelectronics ST9+ 8/16 bit microcontroller

EM_ST7 68 STMicroelectronics ST7 8−bit microcontroller

EM_68HC16 69 Motorola MC68HC16 Microcontroller

EM_68HC11 70 Motorola MC68HC11 Microcontroller

EM_68HC08 71 Motorola MC68HC08 Microcontroller

EM_68HC05 72 Motorola MC68HC05 Microcontroller

EM_SVX 73 Silicon Graphics SVx

EM_ST19 74 STMicroelectronics ST19 8−bit microcontroller

EM_VAX 75 Digital VAX

EM_CRIS 76 Axis Communications 32−bit embedded processor

EM_JAVELIN 77 Infineon Technologies 32−bit embedded processor

EM_FIREPATH 78 Element 14 64−bit DSP Processor

EM_ZSP 79 LSI Logic 16−bit DSP Processor

Other values are reserved and will be assigned to new machines as necessary. Processor−specific ELF names use the machine name to distinguish them. For example, the flags mentioned in the next table use the prefix EF_; a flag named WIDGET for the EM_XYZ machine would be called EF_XYZ_WIDGET.

e_version
This member identifies the object file version.

Name Value Meaning

EV_NONE 0 Invalid version

EV_CURRENT 1 Current version

The value ``1'' signifies the original file format; extensions will create new versions with higher numbers. Although the value of ``EV_CURRENT'' is shown as 1 in the previous table, it will change as necessary to reflect the current version number.

e_entry
This member gives the virtual address to which the system first transfers control, thus starting the process. If the file has no associated entry point, this member holds zero.

e_phoff
This member holds the program header table's file offset in bytes. If the file has no program header table, this member holds zero.

e_shoff
This member holds the section header table's file offset in bytes. If the file has no section header table, this member holds zero.

e_flags
This member holds processor−specific flags associated with the file. Flag names take the form EF_machine_flag.

e_ehsize
This member holds the ELF header's size in bytes.

e_phentsize
This member holds the size in bytes of one entry in the file's program header table; all entries are the same size.

e_phnum
This member holds the number of entries in the program header table. Thus the product of e_phentsize and e_phnum gives the table's size in bytes. If a file has no program header table, e_phnum holds the value zero.

e_shentsize
This member holds a section header's size in bytes. A section header is one entry in the section header table; all entries are the same size.

e_shnum
This member holds the number of entries in the section header table. Thus the product of e_shentsize and e_shnum gives the section header table's size in bytes. If a file has no section header table, e_shnum holds the value zero.

e_shstrndx
This member holds the section header table index of the entry associated with the section name string table. If the file has no section name string table, this member holds the value SHN_UNDEF.

NOTE: See ``Sections'' and ``String table'' for more information.
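The size products described for e_phentsize/e_phnum and e_shentsize/e_shnum can be used to check that a header table lies inside the file. The following is a minimal sketch, assuming the header members have already been read into native integers; the function name table_fits and its parameter names are hypothetical and not part of any ELF interface.

```c
#include <stdint.h>

/*
 * Returns 1 if a table of `num` entries of `entsize` bytes each, starting
 * at file offset `off`, lies entirely inside a file of `file_size` bytes.
 * A table with zero entries (no table present) trivially fits.
 */
int table_fits(uint64_t off, uint16_t entsize, uint16_t num,
               uint64_t file_size)
{
    /* The product of two 16-bit values cannot overflow 64 bits. */
    uint64_t table_size = (uint64_t)entsize * num;

    if (num == 0)
        return 1;
    if (off > file_size)
        return 0;
    return table_size <= file_size - off;
}
```

A caller might pass e_shoff, e_shentsize and e_shnum together with the file length before reading any section header.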

ELF identification

ELF provides an object file framework to support multiple processors, multiple data encodings, and multiple classes of machines. To support this object file family, the initial bytes of the file specify how to interpret the file, independent of the processor on which the inquiry is made and independent of the file's remaining contents.

The initial bytes of an ELF header (and an object file) correspond to the e_ident member.


Name Value Purpose

EI_MAG0 0 File identification

EI_MAG1 1 File identification

EI_MAG2 2 File identification

EI_MAG3 3 File identification

EI_CLASS 4 File class

EI_DATA 5 Data encoding

EI_VERSION 6 File version

EI_OSABI 7 Operating system/ABI identification

EI_ABIVERSION 8 ABI version

EI_PAD 9 Start of padding bytes

EI_NIDENT 16 Size of e_ident[]

e_ident[ ] identification indexes

These indexes access bytes that hold the following values.

EI_MAG0 to EI_MAG3
A file's first 4 bytes hold a ``magic number,'' identifying the file as an ELF object file.

Name Value Position

ELFMAG0 0x7f e_ident[EI_MAG0]

ELFMAG1 'E' e_ident[EI_MAG1]

ELFMAG2 'L' e_ident[EI_MAG2]

ELFMAG3 'F' e_ident[EI_MAG3]

EI_CLASS
The next byte, e_ident[EI_CLASS], identifies the file's class, or capacity.

Name Value Meaning

ELFCLASSNONE 0 Invalid class

ELFCLASS32 1 32−bit objects

ELFCLASS64 2 64−bit objects

The file format is designed to be portable among machines of various sizes, without imposing the sizes of the largest machine on the smallest. The class of the file defines the basic types used by the data structures of the object file container itself. The data contained in object file sections may follow a different programming model. If so, the processor supplement describes the model used.

Class ELFCLASS32 supports machines with 32−bit architectures. It uses the basic types defined in the table labeled ``32−Bit Data Types.''

Class ELFCLASS64 supports machines with 64−bit architectures. It uses the basic types defined in the table labeled ``64−Bit Data Types.''

Other classes will be defined as necessary, with different basic types and sizes for object file data.

EI_DATA
Byte e_ident[EI_DATA] specifies the data encoding of the processor−specific data in the object file. The following encodings are currently defined.

Name Value Meaning

ELFDATANONE 0 Invalid data encoding

ELFDATA2LSB 1 See below

ELFDATA2MSB 2 See below

Other values are reserved and will be assigned to new encodings as necessary.

NOTE: Primarily for the convenience of code that looks at the ELF file at runtime, the ELF data structures are intended to have the same byte order as that of the running program.

EI_VERSION
Byte e_ident[EI_VERSION] specifies the ELF header version number. Currently, this value must be EV_CURRENT, as explained above for e_version.

EI_OSABI
Byte e_ident[EI_OSABI] identifies the operating system and ABI to which the object is targeted. Some fields in other ELF structures have flags and values that have operating system and/or ABI specific meanings; the interpretation of those fields is determined by the value of this byte. The value of this byte must be interpreted differently for each machine. That is, each value for the e_machine field determines a set of values for the EI_OSABI byte. Values are assigned by the ABI processor supplement for each machine. If the processor supplement does not specify a set of values, the value 0 shall be used and indicates unspecified.

EI_ABIVERSION
Byte e_ident[EI_ABIVERSION] identifies the version of the ABI to which the object is targeted. This field is used to distinguish among incompatible versions of an ABI. The interpretation of this version number is dependent on the ABI identified by the EI_OSABI field. If no values are specified for the EI_OSABI field by the processor supplement or no version values are specified for the ABI determined by a particular value of the EI_OSABI byte, the value 0 shall be used for the EI_ABIVERSION byte; it indicates unspecified.

EI_PAD
This value marks the beginning of the unused bytes in e_ident. These bytes are reserved and set to zero; programs that read object files should ignore them. The value of EI_PAD will change in the future if currently unused bytes are given meanings.
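As an illustration of how the identification bytes might be inspected, the sketch below checks the magic number and then reads the class and data encoding bytes. The index and value names follow the tables in this section; the function names has_elf_magic, elf_class_bits and elf_data_name are hypothetical helpers, and the constants are defined locally so that the fragment does not depend on any particular system header.

```c
#include <stddef.h>

/* Index and value names follow the tables in this section. */
enum { EI_MAG0 = 0, EI_MAG1 = 1, EI_MAG2 = 2, EI_MAG3 = 3,
       EI_CLASS = 4, EI_DATA = 5 };
enum { ELFCLASSNONE = 0, ELFCLASS32 = 1, ELFCLASS64 = 2 };
enum { ELFDATANONE = 0, ELFDATA2LSB = 1, ELFDATA2MSB = 2 };

/* Returns 1 if the first four bytes are 0x7f 'E' 'L' 'F', else 0. */
int has_elf_magic(const unsigned char *ident, size_t len)
{
    if (ident == NULL || len < 4)
        return 0;
    return ident[EI_MAG0] == 0x7f && ident[EI_MAG1] == 'E' &&
           ident[EI_MAG2] == 'L'  && ident[EI_MAG3] == 'F';
}

/* Returns 32 or 64 for a recognized class, 0 otherwise. */
int elf_class_bits(const unsigned char *ident, size_t len)
{
    if (!has_elf_magic(ident, len) || len < 6)
        return 0;
    switch (ident[EI_CLASS]) {
    case ELFCLASS32: return 32;
    case ELFCLASS64: return 64;
    default:         return 0;
    }
}

/* Returns the name of the data encoding, or "invalid". */
const char *elf_data_name(const unsigned char *ident, size_t len)
{
    if (!has_elf_magic(ident, len) || len < 6)
        return "invalid";
    switch (ident[EI_DATA]) {
    case ELFDATA2LSB: return "ELFDATA2LSB";
    case ELFDATA2MSB: return "ELFDATA2MSB";
    default:          return "invalid";
    }
}
```

A reader of object files would typically apply such checks to the first bytes of the file before interpreting any other header member.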

A file's data encoding specifies how to interpret the basic objects in a file. Class ELFCLASS32 files use objects that occupy 1, 2, and 4 bytes. Class ELFCLASS64 files use objects that occupy 1, 2, 4, and 8 bytes. Under the defined encodings, objects are represented as shown below.

Encoding ELFDATA2LSB specifies 2's complement values, with the least significant byte occupying the lowest address.

Data encoding ELFDATA2LSB

Encoding ELFDATA2MSB specifies 2's complement values, with the most significant byte occupying the lowest address.

Data encoding ELFDATA2MSB
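The two encodings can be illustrated by decoding a 4−byte object one byte at a time, which works regardless of the byte order of the machine doing the reading. This is a minimal sketch; decode_word is a hypothetical helper name, and the encoding constants are repeated locally from the EI_DATA table.

```c
#include <stdint.h>

enum { ELFDATA2LSB = 1, ELFDATA2MSB = 2 };

/*
 * Decodes the 4 bytes at p as an unsigned 32-bit value using the given
 * data encoding. Returns 0 on success, -1 for an unknown encoding.
 */
int decode_word(const unsigned char *p, int encoding, uint32_t *out)
{
    switch (encoding) {
    case ELFDATA2LSB:   /* least significant byte at the lowest address */
        *out = (uint32_t)p[0]       | (uint32_t)p[1] << 8 |
               (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
        return 0;
    case ELFDATA2MSB:   /* most significant byte at the lowest address */
        *out = (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
               (uint32_t)p[2] << 8  | (uint32_t)p[3];
        return 0;
    default:
        return -1;
    }
}
```

For the byte sequence 01 02 03 04, this yields 0x04030201 under ELFDATA2LSB and 0x01020304 under ELFDATA2MSB.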

Sections

An object file's section header table lets one locate all the file's sections. The section header table is an array of Elf32_Shdr or Elf64_Shdr structures as described below. A section header table index is a subscript into this array. The ELF header's e_shoff member gives the byte offset from the beginning of the file to the section header table. e_shnum specifies how many entries the section header table contains. e_shentsize specifies the size in bytes of each entry.
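The arithmetic implied by e_shoff, e_shnum and e_shentsize can be sketched as follows. The function name shdr_offset is hypothetical, and the sketch assumes the header members have already been decoded; it does not guard against an e_shoff so large that the sum wraps around.

```c
#include <stdint.h>

/*
 * Computes the file offset of the section header with the given index.
 * Returns 0 and stores the offset on success, or -1 if the index is not
 * less than e_shnum.
 */
int shdr_offset(uint64_t e_shoff, uint16_t e_shentsize, uint16_t e_shnum,
                uint16_t index, uint64_t *offset)
{
    if (index >= e_shnum)
        return -1;
    *offset = e_shoff + (uint64_t)index * e_shentsize;
    return 0;
}
```

With e_shoff of 1000, e_shentsize of 40 and e_shnum of 6, the entry for index 5 starts at offset 1200, and index 6 is rejected.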

Some section header table indexes are reserved; an object file will not have sections for these special indexes.

Name Value

SHN_UNDEF 0

SHN_LORESERVE 0xff00

SHN_LOPROC 0xff00

SHN_HIPROC 0xff1f

SHN_LOOS 0xff20

SHN_HIOS 0xff3f

SHN_ABS 0xfff1

SHN_COMMON 0xfff2

SHN_HIRESERVE 0xffff

Special section indexes


``SHN_UNDEF''
This value marks an undefined, missing, irrelevant, or otherwise meaningless section reference. For example, a symbol ``defined'' relative to section number ``SHN_UNDEF'' is an undefined symbol.

NOTE: Although index 0 is reserved as the undefined value, the section header table contains an entry for index 0. If the e_shnum member of the ELF header says a file has 6 entries in the section header table, they have the indexes 0 through 5. The contents of the initial entry are described later in this section.

``SHN_LORESERVE''
This value specifies the lower bound of the range of reserved indexes.

``SHN_LOPROC'' through ``SHN_HIPROC''
Values in this inclusive range are reserved for processor−specific semantics.

``SHN_LOOS'' through ``SHN_HIOS''
Values in this inclusive range are reserved for operating system−specific semantics.

``SHN_ABS''
This value specifies absolute values for the corresponding reference. For example, symbols defined relative to section number ``SHN_ABS'' have absolute values and are not affected by relocation.

``SHN_COMMON''
Symbols defined relative to this section are common symbols, such as FORTRAN COMMON or unallocated C external variables.

``SHN_HIRESERVE''
This value specifies the upper bound of the range of reserved indexes. The system reserves indexes between ``SHN_LORESERVE'' and ``SHN_HIRESERVE'', inclusive; the values do not reference the section header table. The section header table does not contain entries for the reserved indexes.
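The ranges above can be expressed as a small classification routine. This is only a sketch: the constant values are copied from the ``Special section indexes'' table, and the function name shn_kind and the strings it returns are hypothetical.

```c
#include <stdint.h>

/* Values copied from the ``Special section indexes'' table. */
#define SHN_UNDEF     0
#define SHN_LORESERVE 0xff00
#define SHN_LOPROC    0xff00
#define SHN_HIPROC    0xff1f
#define SHN_LOOS      0xff20
#define SHN_HIOS      0xff3f
#define SHN_ABS       0xfff1
#define SHN_COMMON    0xfff2
#define SHN_HIRESERVE 0xffff

/* Describes what kind of section index shndx is. */
const char *shn_kind(uint16_t shndx)
{
    if (shndx == SHN_UNDEF)
        return "undefined";
    if (shndx < SHN_LORESERVE)
        return "ordinary";              /* indexes the section header table */
    if (shndx >= SHN_LOPROC && shndx <= SHN_HIPROC)
        return "processor-specific";
    if (shndx >= SHN_LOOS && shndx <= SHN_HIOS)
        return "os-specific";
    if (shndx == SHN_ABS)
        return "absolute";
    if (shndx == SHN_COMMON)
        return "common";
    return "reserved";                  /* other values up to SHN_HIRESERVE */
}
```

Only an index classified as ordinary may be used as a subscript into the section header table.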

Sections contain all information in an object file except the ELF header, the program header table, and the section header table. Moreover, object files' sections satisfy several conditions.

• Every section in an object file has exactly one section header describing it. Section headers may exist that do not have a section.
• Each section occupies one contiguous (possibly empty) sequence of bytes within a file.
• Sections in a file may not overlap. No byte in a file resides in more than one section.
• An object file may have inactive space. The various headers and the sections might not ``cover'' every byte in an object file. The contents of the inactive data are unspecified.
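The non−overlap condition can be checked mechanically once each section's file extent is known. The sketch below is one possible way to do so; struct extent and the function names are hypothetical, and the caller is assumed to record a size of 0 for any section that occupies no file space.

```c
#include <stddef.h>
#include <stdint.h>

/* File extent of one section: starting offset and bytes occupied. */
struct extent {
    uint64_t offset;
    uint64_t size;
};

/* Returns 1 if the two extents share at least one byte, else 0. */
int extents_overlap(const struct extent *a, const struct extent *b)
{
    if (a->size == 0 || b->size == 0)
        return 0;                   /* an empty section overlaps nothing */
    return a->offset < b->offset + b->size &&
           b->offset < a->offset + a->size;
}

/* Returns 1 if any pair among the n extents overlaps, else 0. */
int any_overlap(const struct extent *s, size_t n)
{
    size_t i, j;

    for (i = 0; i < n; i++)
        for (j = i + 1; j < n; j++)
            if (extents_overlap(&s[i], &s[j]))
                return 1;
    return 0;
}
```

The pairwise comparison is quadratic in the number of sections, which is usually acceptable for a sanity check; sorting by offset first would be the obvious refinement.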

A section header has the following structure.

typedef struct {
        Elf32_Word      sh_name;
        Elf32_Word      sh_type;
        Elf32_Word      sh_flags;
        Elf32_Addr      sh_addr;
        Elf32_Off       sh_offset;
        Elf32_Word      sh_size;
        Elf32_Word      sh_link;
        Elf32_Word      sh_info;
        Elf32_Word      sh_addralign;
        Elf32_Word      sh_entsize;
} Elf32_Shdr;

typedef struct {
        Elf64_Word      sh_name;
        Elf64_Word      sh_type;
        Elf64_Word      sh_flags;
        Elf64_Addr      sh_addr;
        Elf64_Off       sh_offset;
        Elf64_Word      sh_size;
        Elf64_Word      sh_link;
        Elf64_Word      sh_info;
        Elf64_Word      sh_addralign;
        Elf64_Word      sh_entsize;
} Elf64_Shdr;

Section header

sh_name
This member specifies the name of the section. Its value is an index into the section header string table section [See ``String table'' for more information], giving the location of a null−terminated string.

sh_type
This member categorizes the section's contents and semantics. Section types and their descriptions appear in ``Section types, sh_type''.

sh_flags
Sections support 1−bit flags that describe miscellaneous attributes. Flag definitions appear in ``Section attribute flags, sh_flags''.

sh_addr
If the section will appear in the memory image of a process, this member gives the address at which the section's first byte should reside. Otherwise, the member contains 0.

sh_offset
This member's value gives the byte offset from the beginning of the file to the first byte in the section. One section type, SHT_NOBITS described in ``Section types, sh_type'', occupies no space in the file, and its sh_offset member locates the conceptual placement in the file.

sh_size
This member gives the section's size in bytes. Unless the section type is SHT_NOBITS, the section occupies sh_size bytes in the file. A section of type SHT_NOBITS may have a non−zero size, but it occupies no space in the file.

sh_link
This member holds a section header table index link, whose interpretation depends on the section type. ``Section types, sh_type'' describes the values.

sh_info
This member holds extra information, whose interpretation depends on the section type. ``Section types, sh_type'' describes the values.

sh_addralign
Some sections have address alignment constraints. For example, if a section holds a doubleword, the system must ensure doubleword alignment for the entire section. The value of sh_addr must be congruent to 0, modulo the value of sh_addralign. Currently, only 0 and positive integral powers of two are allowed. Values 0 and 1 mean the section has no alignment constraints.

sh_entsize
Some sections hold a table of fixed−size entries, such as a symbol table. For such a section, this member gives the size in bytes of each entry. The member contains 0 if the section does not hold a table of fixed−size entries.
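Three of the members above lend themselves to small helper routines: sh_name (a string table lookup), sh_addralign (the congruence rule) and sh_entsize (the number of table entries). The following is a sketch under the assumption that the section header string table has already been read into memory; all three function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Returns the null-terminated name at index sh_name in the string table,
 * or NULL if the index is out of range or the string is not terminated
 * within the table.
 */
const char *section_name(const char *strtab, size_t strtab_size,
                         uint32_t sh_name)
{
    size_t i;

    if (strtab == NULL || sh_name >= strtab_size)
        return NULL;
    for (i = sh_name; i < strtab_size; i++)
        if (strtab[i] == '\0')
            return strtab + sh_name;
    return NULL;
}

/*
 * Returns 1 if sh_addralign is 0, 1, or another power of two and sh_addr
 * is congruent to 0 modulo sh_addralign; returns 0 otherwise.
 */
int alignment_ok(uint64_t sh_addr, uint64_t sh_addralign)
{
    if (sh_addralign <= 1)
        return 1;                               /* no constraint */
    if ((sh_addralign & (sh_addralign - 1)) != 0)
        return 0;                               /* not a power of two */
    return (sh_addr & (sh_addralign - 1)) == 0;
}

/* Number of fixed-size entries, or 0 if the section holds no table. */
uint64_t entry_count(uint64_t sh_size, uint64_t sh_entsize)
{
    return sh_entsize == 0 ? 0 : sh_size / sh_entsize;
}
```

For example, a section with sh_size of 64 and sh_entsize of 16 holds four entries.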

A section header's sh_type member specifies the section's semantics.

Name Value

SHT_NULL 0

SHT_PROGBITS 1

SHT_SYMTAB 2

SHT_STRTAB 3

SHT_RELA 4

SHT_HASH 5

SHT_DYNAMIC 6

SHT_NOTE 7

SHT_NOBITS 8

SHT_REL 9

SHT_SHLIB 10

SHT_DYNSYM 11

SHT_INIT_ARRAY 14

SHT_FINI_ARRAY 15

SHT_PREINIT_ARRAY 16

SHT_LOOS 0x60000000

SHT_HIOS 0x6fffffff

SHT_LOPROC 0x70000000

SHT_HIPROC 0x7fffffff

SHT_LOUSER 0x80000000

SHT_HIUSER 0xffffffff

Section types, sh_type

SHT_NULL
This value marks the section header as inactive; it does not have an associated section. Other members of the section header have undefined values.

SHT_PROGBITS
The section holds information defined by the program, whose format and meaning are determined solely by the program.

SHT_SYMTAB and SHT_DYNSYM
These sections hold a symbol table. Currently, an object file may have only one section of each type, but this restriction may be relaxed in the future. Typically, SHT_SYMTAB provides symbols for link editing, though it may also be used for dynamic linking. As a complete symbol table, it may contain many symbols unnecessary for dynamic linking. Consequently, an object file may also contain a SHT_DYNSYM section, which holds a minimal set of dynamic linking symbols, to save space.

NOTE: See ``Symbol table'' for details.

SHT_STRTAB
The section holds a string table. An object file may have multiple string table sections.

NOTE: See ``String table'' for details.

SHT_RELA
The section holds relocation entries with explicit addends, such as type Elf32_Rela for the 32−bit or type Elf64_Rela for the 64−bit class of object files. An object file may have multiple relocation sections.

NOTE: See ``Relocation'' for details.

SHT_HASH
The section holds a symbol hash table. Currently, an object file may have only one hash table, but this restriction may be relaxed in the future. See ``Hash table'' for details.

SHT_DYNAMIC
The section holds information for dynamic linking. Currently, an object file may have only one dynamic section, but this restriction may be relaxed in the future. See ``Dynamic section'' for details.

SHT_NOTE
The section holds information that marks the file in some way. See ``Note section'' for details.

SHT_NOBITS
A section of this type occupies no space in the file but otherwise resembles SHT_PROGBITS. Although this section contains no bytes, the sh_offset member contains the conceptual file offset.

SHT_REL
The section holds relocation entries without explicit addends, such as type Elf32_Rel for the 32−bit class of object files. An object file may have multiple relocation sections.

NOTE: See ``Relocation'' for details.

SHT_SHLIB
This section type is reserved but has unspecified semantics.

SHT_INIT_ARRAY
This section contains an array of pointers to initialization functions, as described in ``Initialization and termination functions''. Each pointer in the array is taken as a parameterless procedure with a void return.

SHT_FINI_ARRAY
This section contains an array of pointers to termination functions, as described in ``Initialization and termination functions''. Each pointer in the array is taken as a parameterless procedure with a void return.

SHT_PREINIT_ARRAY
This section contains an array of pointers to functions that are invoked before all other initialization functions, as described in ``Initialization and termination functions''. Each pointer in the array is taken as a parameterless procedure with a void return.

SHT_LOOS through SHT_HIOS
Values in this inclusive range are reserved for operating system−specific semantics.

SHT_LOPROC through SHT_HIPROC
Values in this inclusive range are reserved for processor−specific semantics.

SHT_LOUSER
This value specifies the lower bound of the range of indexes reserved for application programs.

SHT_HIUSER
This value specifies the upper bound of the range of indexes reserved for application programs. Section types between SHT_LOUSER and SHT_HIUSER may be used by the application, without conflicting with current or future system−defined section types.
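The three reserved ranges of sh_type can be told apart with simple comparisons. In this sketch the bounds are copied from the ``Section types, sh_type'' table; the function name sht_range and its return strings are hypothetical.

```c
#include <stdint.h>

/* Range bounds copied from the ``Section types, sh_type'' table. */
#define SHT_LOOS   0x60000000u
#define SHT_HIOS   0x6fffffffu
#define SHT_LOPROC 0x70000000u
#define SHT_HIPROC 0x7fffffffu
#define SHT_LOUSER 0x80000000u
#define SHT_HIUSER 0xffffffffu

/* Names the range that a section type value falls into. */
const char *sht_range(uint32_t sh_type)
{
    if (sh_type >= SHT_LOUSER)          /* SHT_HIUSER is the 32-bit maximum */
        return "application";
    if (sh_type >= SHT_LOPROC)
        return "processor-specific";
    if (sh_type >= SHT_LOOS)
        return "os-specific";
    return "system-defined or reserved";
}
```

Values below SHT_LOOS are either one of the types named in the table or reserved; the sketch does not distinguish between the two.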

Other section type values are reserved. As mentioned before, the section header for index 0 (SHN_UNDEF) exists, even though the index marks undefined section references. This entry holds the following.

Name Value Note

sh_name 0 No name

sh_type SHT_NULL Inactive

sh_flags 0 No flags

sh_addr 0 No address

sh_offset 0 No file offset

sh_size 0 No size

sh_link SHN_UNDEF No link information

sh_info 0 No auxiliary information

sh_addralign 0 No alignment

sh_entsize 0 No entries

Section header table entry: Index 0

A section header's sh_flags member holds 1−bit flags that describe the section's attributes. Defined values appear in the table shown below; other values are reserved.

Name Value

SHF_WRITE 0x1

SHF_ALLOC 0x2

SHF_EXECINSTR 0x4

SHF_MERGE 0x10

SHF_STRINGS 0x20

SHF_INFO_LINK 0x40

SHF_LINK_ORDER 0x80

SHF_OS_NONCONFORMING 0x100

SHF_MASKOS 0x0ff00000

SHF_MASKPROC 0xf0000000

Section attribute flags, sh_flags

If a flag bit is set in sh_flags, the attribute is on for the section. Otherwise, the attribute is off or does not apply. Undefined attributes are set to zero.

SHF_WRITE
The section contains data that should be writable during process execution.

SHF_ALLOC
The section occupies memory during process execution. Some control sections do not reside in the memory image of an object file; this attribute is off for those sections.

SHF_EXECINSTR
The section contains executable machine instructions.

SHF_MERGE
The data in the section may be merged to eliminate duplication. Unless the SHF_STRINGS flag is also set, the data elements in the section are of a uniform size. The size of each element is specified in the section header's sh_entsize field. If the SHF_STRINGS flag is also set, the data elements consist of null−terminated character strings. The size of each character is specified in the section header's sh_entsize field.

Each element in the section is compared against other elements in sections with the same name, type and flags. Elements that would have identical values at program run−time may be merged. Relocations referencing elements of such sections must be resolved to the merged locations of the referenced values. Note that any relocatable values, including values that would result in run−time relocations, must be analyzed to determine whether the run−time values would actually be identical. An ABI−conforming object file may not depend on specific elements being merged, and an ABI−conforming link editor may choose not to merge specific elements.

SHF_STRINGS
The data elements in the section consist of null−terminated character strings. The size of each character is specified in the section header's sh_entsize field.

SHF_INFO_LINK
The sh_info field of this section header holds a section header table index.

SHF_LINK_ORDER
This flag adds special ordering requirements for link editors. The requirements apply if the sh_link field of this section's header references another section (the linked−to section). If this section is combined with other sections in the output file, it must appear in the same relative order with respect to those sections, as the linked−to section appears with respect to sections the linked−to section is combined with.

NOTE: A typical use of this flag is to build a table that references text or data sections in address order.

SHF_OS_NONCONFORMING
This section requires special OS−specific processing (beyond the standard linking rules) to avoid incorrect behavior. If this section has either an sh_type value or contains sh_flags bits in the OS−specific ranges for those fields, and a link editor processing this section does not recognize those values, then the link editor should reject the object file containing this section with an error.

SHF_MASKOS
All bits included in this mask are reserved for operating system−specific semantics.

SHF_MASKPROC
All bits included in this mask are reserved for processor−specific semantics.
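Because each attribute is a single bit, an sh_flags value can be turned into a readable list by testing the bits one at a time. The sketch below joins the names with ``+''; the function name shf_describe and the output format are hypothetical, the bit values are copied from the ``Section attribute flags, sh_flags'' table, and bits covered by SHF_MASKOS and SHF_MASKPROC are deliberately ignored.

```c
#include <stdint.h>
#include <stdio.h>

/* Bit values copied from the ``Section attribute flags, sh_flags'' table. */
static const struct {
    uint32_t bit;
    const char *name;
} shf_names[] = {
    { 0x2,   "SHF_ALLOC" },
    { 0x1,   "SHF_WRITE" },
    { 0x4,   "SHF_EXECINSTR" },
    { 0x10,  "SHF_MERGE" },
    { 0x20,  "SHF_STRINGS" },
    { 0x40,  "SHF_INFO_LINK" },
    { 0x80,  "SHF_LINK_ORDER" },
    { 0x100, "SHF_OS_NONCONFORMING" },
};

/*
 * Writes the names of the generic flags set in `flags` into buf, joined
 * by '+', or "none" if no generic flag is set. Returns buf. The output
 * is cut short if buf is too small.
 */
char *shf_describe(uint32_t flags, char *buf, size_t size)
{
    size_t i, used = 0;

    if (size == 0)
        return buf;
    buf[0] = '\0';
    for (i = 0; i < sizeof shf_names / sizeof shf_names[0]; i++) {
        int n;

        if ((flags & shf_names[i].bit) == 0)
            continue;
        n = snprintf(buf + used, size - used, "%s%s",
                     used ? "+" : "", shf_names[i].name);
        if (n < 0 || (size_t)n >= size - used)
            return buf;                 /* truncated: stop here */
        used += (size_t)n;
    }
    if (used == 0)
        snprintf(buf, size, "none");
    return buf;
}
```

A buffer of 128 bytes is enough to hold all eight names at once.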

Two members in the section header, sh_link and sh_info, hold special information, depending on section type.

sh_type        sh_link                                 sh_info

SHT_DYNAMIC    The section header index of the string  0
               table used by entries in the section.

SHT_HASH       The section header index of the symbol  0
               table to which the hash table applies.

SHT_REL        The section header index of the         The section header index of the section
SHT_RELA       associated symbol table.                to which the relocation applies.

SHT_SYMTAB     The section header index of the         One greater than the symbol table index of
SHT_DYNSYM     associated string table.                the last local symbol (binding STB_LOCAL).

sh_link and sh_info interpretation

Rules for linking unrecognized sections

If a link editor encounters sections whose headers contain OS−specific values it does not recognize in the sh_type or sh_flags fields, the link editor should combine those sections as described below. If the section's sh_flags bits include the attribute SHF_OS_NONCONFORMING, then the section requires special knowledge to be correctly processed, and the link editor should reject the object containing the section with an error.

Unrecognized sections that do not have the SHF_OS_NONCONFORMING attribute are combined in a two−phase process. As the link editor combines sections using this process, it must honor the alignment constraints of the input sections (asserted by the sh_addralign field), padding between sections with zero bytes, if necessary, and producing a combination with the maximum alignment constraint of its component input sections.

1. In the first phase, input sections that match in name, type and attribute flags should be concatenated into single sections. The concatenation order should satisfy the requirements of any known input section attributes (for example, SHF_MERGE and SHF_LINK_ORDER). When not otherwise constrained, sections should be emitted in input order.

2. In the second phase, sections should be assigned to segments or other units based on their attribute flags. Sections of each particular unrecognized type should be assigned to the same unit unless prevented by incompatible flags, and within a unit, sections of the same unrecognized type should be placed together if possible.

Non OS−specific processing (for example, relocation) should be applied to unrecognized section types. An output section header table, if present, should contain entries for unknown sections. Any unrecognized section attribute flags should be removed.

NOTE: It is recommended that link editors follow the same two−phase ordering approach described above when linking sections of known types. Padding between such sections may have values different from zero, where appropriate.
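The alignment rule for the first phase (honor each input section's sh_addralign, pad where needed, and give the result the maximum alignment of its inputs) can be sketched as a layout computation. This is not how any particular link editor is implemented; struct input, align_up and concat_layout are hypothetical names, and the sections are assumed to be already in the desired concatenation order.

```c
#include <stddef.h>
#include <stdint.h>

/* Size and alignment constraint of one input section. */
struct input {
    uint64_t size;
    uint64_t addralign;     /* 0, 1, or another power of two */
};

/* Rounds off up to the next multiple of align (0 and 1: no constraint). */
uint64_t align_up(uint64_t off, uint64_t align)
{
    if (align <= 1)
        return off;
    return (off + align - 1) & ~(align - 1);
}

/*
 * Lays out n input sections in input order. Stores each section's offset
 * within the combined section in offsets[] and the combined alignment in
 * *out_align. Returns the combined size; any gap between consecutive
 * sections is padding that the link editor would fill with zero bytes.
 */
uint64_t concat_layout(const struct input *in, size_t n,
                       uint64_t *offsets, uint64_t *out_align)
{
    uint64_t pos = 0, max_align = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        pos = align_up(pos, in[i].addralign);
        offsets[i] = pos;
        pos += in[i].size;
        if (in[i].addralign > max_align)
            max_align = in[i].addralign;
    }
    *out_align = max_align;
    return pos;
}
```

For inputs of size 3 (alignment 1), size 8 (alignment 8) and size 2 (alignment 4), the offsets are 0, 8 and 16, the combined size is 18, and the combined alignment is 8.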

Special sections

Various sections hold program and control information. ``Special sections'' shows sections that are used by the system and have the indicated types and attributes.

Name Type Attributes

.bss SHT_NOBITS SHF_ALLOC+SHF_WRITE

.comment SHT_PROGBITS none

.data SHT_PROGBITS SHF_ALLOC+SHF_WRITE

.data1 SHT_PROGBITS SHF_ALLOC+SHF_WRITE

.debug SHT_PROGBITS none

.dynamic SHT_DYNAMIC see below

.dynstr SHT_STRTAB SHF_ALLOC

.dynsym SHT_DYNSYM SHF_ALLOC

.fini SHT_PROGBITS SHF_ALLOC+SHF_EXECINSTR

.fini_array SHT_FINI_ARRAY SHF_ALLOC+SHF_WRITE

.got SHT_PROGBITS see below

.hash SHT_HASH SHF_ALLOC

.init SHT_PROGBITS SHF_ALLOC+SHF_EXECINSTR

.init_array SHT_INIT_ARRAY SHF_ALLOC+SHF_WRITE

.interp SHT_PROGBITS none

.line SHT_PROGBITS none


.note SHT_NOTE none

.plt SHT_PROGBITS see below

.preinit_array SHT_PREINIT_ARRAY SHF_ALLOC+SHF_WRITE

.relname SHT_REL see below

.relaname SHT_RELA see below

.rodata SHT_PROGBITS SHF_ALLOC

.rodata1 SHT_PROGBITS SHF_ALLOC

.shstrtab SHT_STRTAB none

.strtab SHT_STRTAB see below

.symtab SHT_SYMTAB see below

.text SHT_PROGBITS SHF_ALLOC+SHF_EXECINSTR

Special sections
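A tool that inspects object files might compare a section header against the table above. The sketch below covers only a subset of the rows, namely some of those whose attributes are fixed rather than ``see below''; the type and flag values are copied from the earlier tables, check_special is a hypothetical name, and only the write, alloc and execute bits are compared so that other flags do not cause a mismatch.

```c
#include <stdint.h>
#include <string.h>

/* Type and flag values copied from the tables in ``Sections''. */
enum { SHT_PROGBITS = 1, SHT_STRTAB = 3, SHT_NOBITS = 8, SHT_INIT_ARRAY = 14 };
enum { SHF_WRITE = 0x1, SHF_ALLOC = 0x2, SHF_EXECINSTR = 0x4 };

struct special {
    const char *name;
    uint32_t type;
    uint32_t flags;
};

/* A subset of the ``Special sections'' table. */
static const struct special specials[] = {
    { ".bss",        SHT_NOBITS,     SHF_ALLOC | SHF_WRITE },
    { ".data",       SHT_PROGBITS,   SHF_ALLOC | SHF_WRITE },
    { ".init_array", SHT_INIT_ARRAY, SHF_ALLOC | SHF_WRITE },
    { ".rodata",     SHT_PROGBITS,   SHF_ALLOC },
    { ".shstrtab",   SHT_STRTAB,     0 },
    { ".text",       SHT_PROGBITS,   SHF_ALLOC | SHF_EXECINSTR },
};

/*
 * Returns 1 if the type and the write/alloc/execute bits match the table
 * entry for `name`, 0 if they do not, and -1 if `name` is not in the
 * subset above.
 */
int check_special(const char *name, uint32_t type, uint32_t flags)
{
    const uint32_t mask = SHF_WRITE | SHF_ALLOC | SHF_EXECINSTR;
    size_t i;

    for (i = 0; i < sizeof specials / sizeof specials[0]; i++) {
        if (strcmp(name, specials[i].name) == 0)
            return type == specials[i].type &&
                   (flags & mask) == specials[i].flags;
    }
    return -1;
}
```

A return value of −1 simply means the name is not covered by this subset; it says nothing about whether the section is valid.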

.bss
This section holds uninitialized data that contribute to the program's memory image. By definition, the system initializes the data with zeros when the program begins to run. The section occupies no file space, as indicated by the section type, SHT_NOBITS.

.comment
This section holds version control information.

.data and .data1
These sections hold initialized data that contribute to the program's memory image.

.debug
This section holds information for symbolic debugging. The contents are unspecified.

.dynamic
This section holds dynamic linking information. The section's attributes will include the SHF_ALLOC bit. Whether the SHF_WRITE bit is set is processor specific.

.dynstr
This section holds strings needed for dynamic linking, most commonly the strings that represent the names associated with symbol table entries.

.dynsym
This section holds the dynamic linking symbol table, as described in ``Symbol table''.

.fini
This section holds executable instructions that contribute to the process termination code. When a program exits normally, the system arranges to execute the code in this section.

.fini_array
This section holds an array of function pointers that contributes to a single termination array for the executable or shared object containing the section.

.got
This section holds the global offset table. See ``Global offset table'' for more information.

.hash
This section holds a symbol hash table. See ``Hash table'' for more information.

.init
This section holds executable instructions that contribute to the process initialization code. When a program starts to run, the system arranges to execute the code in this section before calling the main program entry point (called main in C programs).

.init_array
This section holds an array of function pointers that contributes to a single initialization array for the executable or shared object containing the section.

.interp
This section holds the path name of a program interpreter. See ``Program interpreter'' for more information.

.line
This section holds line number information for symbolic debugging, which describes the correspondence between the source program and the machine code. The contents are unspecified.

.note
This section holds information in the format described in ``Note section''.

.plt
This section holds the procedure linkage table. See ``Procedure linkage table'' for more information.

.preinit_array
This section holds an array of function pointers that contributes to a single pre−initialization array for the executable or shared object containing the section.

.relname and .relaname
These sections hold relocation information, as described in ``Relocation''. If the file has a loadable segment that includes relocation, the sections' attributes will include the SHF_ALLOC bit; otherwise, that bit will be off. Conventionally, name is supplied by the section to which the relocations apply. Thus a relocation section for .text normally would have the name .rel.text or .rela.text.

.rodata and .rodata1
These sections hold read−only data that typically contribute to a non−writable segment in the process image. See ``Program header'' for more information.

.shstrtab
This section holds section names.

.strtab
This section holds strings, most commonly the strings that represent the names associated with symbol table entries. If the file has a loadable segment that includes the symbol string table, the section's attributes will include the SHF_ALLOC bit; otherwise, that bit will be off.

.symtab
This section holds a symbol table, as described in ``Symbol table''. If the file has a loadable segment that includes the symbol table, the section's attributes will include the SHF_ALLOC bit; otherwise, that bit will be off.

.text
This section holds the ``text,'' or executable instructions, of a program.

Section names with a dot (.) prefix are reserved for the system, although applications may use these sections if their existing meanings are satisfactory. Applications may use names without the prefix to avoid conflicts with system sections. The object file format lets one define sections not shown in the previous list. An object file may have more than one section with the same name.

Section names reserved for a processor architecture are formed by placing an abbreviation of the architecture name ahead of the section name. The name should be taken from the architecture names used for e_machine. For instance .FOO.psect is the psect section defined by the FOO architecture. Existing extensions are called by their historical names.

• .conflict

• .gptab

• .liblist

• .lit4

• .lit8

• .reginfo

• .sbss

• .sdata

• .tdesc

NOTE: For information on processor-specific sections, see the ABI supplement for the desired processor.

String table

String table sections hold null-terminated character sequences, commonly called strings. The object file uses these strings to represent symbol and section names. One references a string as an index into the string table section. The first byte, which is index zero, is defined to hold a null character. Likewise, a string table's last byte is defined to hold a null character, ensuring null termination for all strings. A string whose index is zero specifies either no name or a null name, depending on the context. An empty string table section is permitted; its section header's sh_size member would contain zero. Non-zero indexes are invalid for an empty string table.

A section header's sh_name member holds an index into the section header string table section, as designated by the e_shstrndx member of the ELF header. The following figures show a string table with 25 bytes and the strings associated with various indexes.

Index +0 +1 +2 +3 +4 +5 +6 +7 +8 +9

0 \0 n a m e . \0 V a r

10 i a b l e \0 a b l e

20 \0 \0 x x \0

String table

Index String

0 none

1 name.

7 Variable

11 able

16 able

24 null string

String table indexes

As the example shows, a string table index may refer to any byte in the section. A string may appear more than once; references to substrings may exist; and a single string may be referenced multiple times. Unreferenced strings also are allowed.
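The lookup rule can be sketched in C. This is only an illustration, not part of the ABI: the helper name strtab_get and the array example_strtab are hypothetical, and the array simply reproduces the 25-byte table from the figure.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The 25-byte example table from the figure; the array size counts the
   terminating null that the string literal supplies as byte 24. */
static const char example_strtab[25] = "\0name.\0Variable\0able\0\0xx";

/* Hypothetical helper: return the string starting at `index`, or NULL when
   the index lies outside the section or no null terminator follows it. */
static const char *strtab_get(const char *tab, size_t size, size_t index)
{
    assert(tab != NULL || size == 0);
    if (index >= size)
        return NULL;            /* covers every index of an empty table */
    if (memchr(tab + index, '\0', size - index) == NULL)
        return NULL;            /* malformed: string runs off the section */
    return tab + index;
}
```

Calling strtab_get(example_strtab, sizeof example_strtab, 11) yields the substring that starts inside ``Variable'', which is how one stored string can serve several names.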

Symbol table

An object file's symbol table holds information needed to locate and relocate a program's symbolic definitions and references. A symbol table index is a subscript into this array. Index 0 both designates the first entry in the table and serves as the undefined symbol index. The contents of the initial entry are specified later in this section.

Name Value

STN_UNDEF 0

A symbol table entry has the following format.

    typedef struct {
        Elf32_Word      st_name;
        Elf32_Addr      st_value;
        Elf32_Word      st_size;
        unsigned char   st_info;
        unsigned char   st_other;
        Elf32_Half      st_shndx;
    } Elf32_Sym;

    typedef struct {
        Elf64_Word      st_name;
        unsigned char   st_info;
        unsigned char   st_other;
        Elf64_Half      st_shndx;
        Elf64_Addr      st_value;
        Elf64_Xword     st_size;
    } Elf64_Sym;

Symbol table entry

st_name
This member holds an index into the object file's symbol string table, which holds the character representations of the symbol names. If the value is non-zero, it represents a string table index that gives the symbol name. Otherwise, the symbol table entry has no name.

NOTE: External C symbols have the same names in C and object files' symbol tables.

st_value
This member gives the value of the associated symbol. Depending on the context, this may be an absolute value, an address and so on; more details are given below.

st_size
Many symbols have associated sizes. For example, a data object's size is the number of bytes contained in the object. This member holds 0 if the symbol has no size or an unknown size.

st_info
This member specifies the symbol's type and binding attributes. A list of the values and meanings appears below. The following code shows how to manipulate the values for both 32 and 64-bit objects.

    #define ELF32_ST_BIND(i)    ((i)>>4)
    #define ELF32_ST_TYPE(i)    ((i)&0xf)
    #define ELF32_ST_INFO(b,t)  (((b)<<4)+((t)&0xf))

    #define ELF64_ST_BIND(i)    ((i)>>4)
    #define ELF64_ST_TYPE(i)    ((i)&0xf)
    #define ELF64_ST_INFO(b,t)  (((b)<<4)+((t)&0xf))

st_other
This member currently specifies a symbol's visibility. A list of the values and meanings appears below. The following code shows how to manipulate the values for both 32 and 64-bit objects. Other bits contain 0 and have no defined meaning.

    #define ELF32_ST_VISIBILITY(o)  ((o)&0x3)
    #define ELF32_ST_OTHER(v)       ((v)&0x3)

    #define ELF64_ST_VISIBILITY(o)  ((o)&0x3)
    #define ELF64_ST_OTHER(v)       ((v)&0x3)

st_shndx
Every symbol table entry is defined in relation to some section. This member holds the relevant section header table index. As the sh_link and sh_info interpretation table and the related text describe, some section indexes indicate special meanings.
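As a rough illustration of the st_info and st_other macros, the sketch below packs a binding and a type into one byte and unpacks them again. The macros and constants are copied from this section; the helpers make_st_info and bind_name are hypothetical names introduced only for this example.

```c
#include <assert.h>

/* Macros as given in the text (the ELF64 forms are identical). */
#define ELF32_ST_BIND(i)       ((i) >> 4)
#define ELF32_ST_TYPE(i)       ((i) & 0xf)
#define ELF32_ST_INFO(b, t)    (((b) << 4) + ((t) & 0xf))
#define ELF32_ST_VISIBILITY(o) ((o) & 0x3)

/* A subset of the values from the binding, type and visibility tables. */
enum { STB_LOCAL = 0, STB_GLOBAL = 1, STB_WEAK = 2 };
enum { STT_NOTYPE = 0, STT_OBJECT = 1, STT_FUNC = 2 };
enum { STV_DEFAULT = 0, STV_INTERNAL = 1, STV_HIDDEN = 2, STV_PROTECTED = 3 };

/* Hypothetical helper: pack a binding and a type into one st_info byte. */
static unsigned char make_st_info(unsigned bind, unsigned type)
{
    assert(bind <= 0xf && type <= 0xf);   /* each field is four bits wide */
    return (unsigned char)ELF32_ST_INFO(bind, type);
}

/* Hypothetical helper: name the binding stored in an st_info byte. */
static const char *bind_name(unsigned char st_info)
{
    switch (ELF32_ST_BIND(st_info)) {
    case STB_LOCAL:  return "LOCAL";
    case STB_GLOBAL: return "GLOBAL";
    case STB_WEAK:   return "WEAK";
    default:         return "RESERVED";   /* OS-specific, processor-specific or undefined */
    }
}
```

For instance, make_st_info(STB_WEAK, STT_FUNC) produces the byte 0x22: the binding occupies the high four bits and the type the low four.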

A symbol's binding determines the linkage visibility and behavior.

Name Value

STB_LOCAL 0

STB_GLOBAL 1

STB_WEAK 2

STB_LOOS 10

STB_HIOS 12

STB_LOPROC 13

STB_HIPROC 15

Symbol binding

STB_LOCAL
Local symbols are not visible outside the object file containing their definition. Local symbols of the same name may exist in multiple files without interfering with each other.

STB_GLOBAL
Global symbols are visible to all object files being combined. One file's definition of a global symbol will satisfy another file's undefined reference to the same global symbol.

STB_WEAK
Weak symbols resemble global symbols, but their definitions have lower precedence.

STB_LOOS through STB_HIOS
Values in this inclusive range are reserved for operating system-specific semantics.

STB_LOPROC through STB_HIPROC
Values in this inclusive range are reserved for processor-specific semantics.

Global and weak symbols differ in two major ways.

• When the link editor combines several relocatable object files, it does not allow multiple definitions of STB_GLOBAL symbols with the same name. On the other hand, if a defined global symbol exists, the appearance of a weak symbol with the same name will not cause an error. The link editor honors the global definition and ignores the weak ones. Similarly, if a common symbol exists (for example, a symbol whose st_shndx field holds SHN_COMMON), the appearance of a weak symbol with the same name will not cause an error. The link editor honors the common definition and ignores the weak one.

• When the link editor searches archive libraries, it extracts archive members that contain definitions of undefined global symbols. The member's definition may be either a global or a weak symbol. The link editor does not extract archive members to resolve undefined weak symbols. Unresolved weak symbols have a zero value.

NOTE: The behavior of weak symbols in areas not specified by this document is implementation defined. Weak symbols are intended primarily for use in system software. Their use in application programs is discouraged.

In each symbol table, all symbols with STB_LOCAL binding precede the weak and global symbols. As described in ``Sections'', a symbol table section's sh_info section header member holds the symbol table index for the first non-local symbol.

A symbol's type provides a general classification for the associated entity.

Name Value

STT_NOTYPE 0

STT_OBJECT 1

STT_FUNC 2

STT_SECTION 3

STT_FILE 4

STT_COMMON 5

STT_LOOS 10

STT_HIOS 12

STT_LOPROC 13

STT_HIPROC 15

Symbol types

STT_NOTYPE
The symbol's type is not specified.

STT_OBJECT
The symbol is associated with a data object, such as a variable, an array, and so on.

STT_FUNC
The symbol is associated with a function or other executable code.

STT_SECTION
The symbol is associated with a section. Symbol table entries of this type exist primarily for relocation and normally have STB_LOCAL binding.

STT_FILE
Conventionally, the symbol's name gives the name of the source file associated with the object file. A file symbol has STB_LOCAL binding, its section index is SHN_ABS, and it precedes the other STB_LOCAL symbols for the file, if it is present.

STT_COMMON
The symbol labels an uninitialized common block. This is described in greater detail below.

STT_LOOS through STT_HIOS
Values in this inclusive range are reserved for operating system-specific semantics.

STT_LOPROC through STT_HIPROC
Values in this inclusive range are reserved for processor-specific semantics. If meanings are specified, the processor supplement explains them.

Function symbols (those with type STT_FUNC) in shared object files have special significance. When another object file references a function from a shared object, the link editor automatically creates a procedure linkage table entry for the referenced symbol. Shared object symbols with types other than STT_FUNC will not be referenced automatically through the procedure linkage table.

Symbols with type STT_COMMON label uninitialized common blocks. In relocatable objects, these symbols are not allocated and must have the special section index SHN_COMMON. In shared objects and executables these symbols must be allocated to some section in the defining object.

In relocatable objects, symbols with type STT_COMMON are treated just as other symbols with index SHN_COMMON. If the link-editor allocates space for the SHN_COMMON symbol in an output section of the object it is producing, it must preserve the type of the output symbol as STT_COMMON.

When the dynamic linker encounters a reference to a symbol that resolves to a definition of type STT_COMMON, it may (but is not required to) change its symbol resolution rules as follows: instead of binding the reference to the first symbol found with the given name, the dynamic linker searches for the first symbol with that name with type other than STT_COMMON. If no such symbol is found, it looks for the STT_COMMON definition of that name that has the largest size.

A symbol's visibility, although it may be specified in a relocatable object, defines how that symbol may be accessed once it has become part of an executable or shared object.

Name Value

STV_DEFAULT 0

STV_INTERNAL 1

STV_HIDDEN 2

STV_PROTECTED 3

Symbol visibility

STV_DEFAULT
The visibility of symbols with the STV_DEFAULT attribute is as specified by the symbol's binding type. That is, global and weak symbols are visible outside of their defining component (executable file or shared object). Local symbols are hidden, as described below. Global and weak symbols are also preemptable, that is, they may be preempted by definitions of the same name in another component.

NOTE: An implementation may restrict the set of global and weak symbols that are externally visible.

STV_PROTECTED
A symbol defined in the current component is protected if it is visible in other components but not preemptable, meaning that any reference to such a symbol from within the defining component must be resolved to the definition in that component, even if there is a definition in another component that would preempt by the default rules. A symbol with STB_LOCAL binding may not have STV_PROTECTED visibility.

STV_HIDDEN
A symbol defined in the current component is hidden if its name is not visible to other components. Such a symbol is necessarily protected. This attribute may be used to control the external interface of a component. Note that an object named by such a symbol may still be referenced from another component if its address is passed outside.

A hidden symbol contained in a relocatable object must be either removed or converted to STB_LOCAL binding by the link-editor when the relocatable object is included in an executable file or shared object.

STV_INTERNAL
The meaning of this visibility attribute may be defined by processor supplements to further constrain hidden symbols. A processor supplement's definition should be such that generic tools can safely treat internal symbols as hidden.

An internal symbol contained in a relocatable object must be either removed or converted to STB_LOCAL binding by the link-editor when the relocatable object is included in an executable file or shared object.

None of the visibility attributes affects resolution of symbols within an executable or shared object during link-editing; such resolution is controlled by the binding type. Once the link-editor has chosen its resolution, these attributes impose two requirements, both based on the fact that references in the code being linked may have been optimized to take advantage of the attributes.

First, all of the non-default visibility attributes, when applied to a symbol reference, imply that a definition to satisfy that reference must be provided within the current executable or shared object. If such a symbol reference has no definition within the component being linked, then the reference must have STB_WEAK binding and is resolved to zero.

Second, if any reference to or definition of a name is a symbol with a non-default visibility attribute, the visibility attribute must be propagated to the resolving symbol in the linked object. If different visibility attributes are specified for distinct references to or definitions of a symbol, the most constraining visibility attribute must be propagated to the resolving symbol in the linked object. The attributes, ordered from least to most constraining, are: STV_PROTECTED, STV_HIDDEN and STV_INTERNAL.
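The propagation rule can be sketched as a small merge function. Note that the constraint order does not follow the numeric values of the constants, so the sketch ranks them explicitly. The function names are hypothetical, and treating STV_DEFAULT as less constraining than every other attribute is an assumption drawn from the phrase ``non-default'' above.

```c
#include <assert.h>

/* Values from the symbol visibility table. */
enum { STV_DEFAULT = 0, STV_INTERNAL = 1, STV_HIDDEN = 2, STV_PROTECTED = 3 };

/* Hypothetical helper: rank a visibility by how constraining it is.
   Assumed order: DEFAULT < PROTECTED < HIDDEN < INTERNAL. */
static int visibility_rank(unsigned v)
{
    assert(v <= STV_PROTECTED);  /* caller has already applied ELF32_ST_VISIBILITY */
    switch (v) {
    case STV_DEFAULT:   return 0;
    case STV_PROTECTED: return 1;
    case STV_HIDDEN:    return 2;
    default:            return 3;   /* STV_INTERNAL */
    }
}

/* Hypothetical helper: visibility to propagate to the resolving symbol
   when two references or definitions of one name disagree. */
static unsigned merge_visibility(unsigned a, unsigned b)
{
    return visibility_rank(a) >= visibility_rank(b) ? a : b;
}
```

A link editor following this rule would fold merge_visibility over every reference to and definition of a name before writing the output symbol.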

If a symbol's value refers to a specific location within a section, its section index member, st_shndx, holds an index into the section header table. As the section moves during relocation, the symbol's value changes as well, and references to the symbol continue to ``point'' to the same location in the program. Some special section index values give other semantics.

SHN_ABS
The symbol has an absolute value that will not change because of relocation.

SHN_COMMON
The symbol labels a common block that has not yet been allocated. The symbol's value gives alignment constraints, similar to a section's sh_addralign member. The link editor will allocate the storage for the symbol at an address that is a multiple of st_value. The symbol's size tells how many bytes are required. Symbols with section index SHN_COMMON can appear only in relocatable objects.

SHN_UNDEF
This section table index means the symbol is undefined. When the link editor combines this object file with another that defines the indicated symbol, this file's references to the symbol will be linked to the actual definition.

The symbol table entry for index 0 (STN_UNDEF) is reserved and holds the following values:

Name Value Note

st_name 0 No name

st_value 0 Zero value

st_size 0 No size

st_info 0 No type, local binding

st_other 0

st_shndx SHN_UNDEF No section

Symbol table entry: Index 0

Symbol values

Symbol table entries for different object file types have slightly different interpretations for the st_value member.

• In relocatable files, st_value holds alignment constraints for a symbol whose section index is SHN_COMMON.

• In relocatable files, st_value holds a section offset for a defined symbol. st_value is an offset from the beginning of the section that st_shndx identifies.

• In executable and shared object files, st_value holds a virtual address. To make these files' symbols more useful for the dynamic linker, the section offset (file interpretation) gives way to a virtual address (memory interpretation) for which the section number is irrelevant.

Although the symbol table values have similar meanings for different object files, the data allows efficient access by the appropriate programs.

Relocation

Relocation is the process of connecting symbolic references with symbolic definitions. For example, when a program calls a function, the associated call instruction must transfer control to the proper destination address at execution. Relocatable files must have ``relocation entries'' that describe how to modify their section contents, thus allowing executable and shared object files to hold the right information for a process's program image.

    typedef struct {
        Elf32_Addr      r_offset;
        Elf32_Word      r_info;
    } Elf32_Rel;

    typedef struct {
        Elf32_Addr      r_offset;
        Elf32_Word      r_info;
        Elf32_Sword     r_addend;
    } Elf32_Rela;

    typedef struct {
        Elf64_Addr      r_offset;
        Elf64_Xword     r_info;
    } Elf64_Rel;

    typedef struct {
        Elf64_Addr      r_offset;
        Elf64_Xword     r_info;
        Elf64_Sxword    r_addend;
    } Elf64_Rela;

r_offset
This member gives the location at which to apply the relocation action. For a relocatable file, the value is the byte offset from the beginning of the section to the storage unit affected by the relocation. For an executable file or a shared object, the value is the virtual address of the storage unit affected by the relocation.

r_info
This member gives both the symbol table index with respect to which the relocation must be made, and the type of relocation to apply. For example, a call instruction's relocation entry would hold the symbol table index of the function being called. If the index is STN_UNDEF, the undefined symbol index, the relocation uses 0 as the ``symbol value''. Relocation types are processor-specific; descriptions of their behavior appear in the processor supplement. When the text below refers to a relocation entry's relocation type or symbol table index, it means the result of applying ELF32_R_TYPE (or ELF64_R_TYPE) or ELF32_R_SYM (or ELF64_R_SYM), respectively, to the entry's r_info member.

    #define ELF32_R_SYM(i)      ((i)>>8)
    #define ELF32_R_TYPE(i)     ((unsigned char)(i))
    #define ELF32_R_INFO(s,t)   (((s)<<8)+(unsigned char)(t))

    #define ELF64_R_SYM(i)      ((i)>>32)
    #define ELF64_R_TYPE(i)     ((i)&0xffffffffL)
    #define ELF64_R_INFO(s,t)   (((s)<<32)+((t)&0xffffffffL))

Relocation entries

r_addend
This member specifies a constant addend used to compute the value to be stored into the relocatable field.

As specified previously, only Elf32_Rela and Elf64_Rela entries contain an explicit addend. Entries of type Elf32_Rel and Elf64_Rel store an implicit addend in the location to be modified. Depending on the processor architecture, one form or the other might be necessary or more convenient. Consequently, an implementation for a particular machine may use one form exclusively or either form depending on context.
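The r_info packing can be tried out with a short sketch. The macros are those given for r_info; the wrapper functions are hypothetical, and the relocation type numbers passed to them are arbitrary placeholders, since real type values are processor-specific.

```c
#include <assert.h>
#include <stdint.h>

/* Macros as given in the description of r_info. */
#define ELF32_R_SYM(i)     ((i) >> 8)
#define ELF32_R_TYPE(i)    ((unsigned char)(i))
#define ELF32_R_INFO(s, t) (((s) << 8) + (unsigned char)(t))

#define ELF64_R_SYM(i)     ((i) >> 32)
#define ELF64_R_TYPE(i)    ((i) & 0xffffffffL)
#define ELF64_R_INFO(s, t) (((s) << 32) + ((t) & 0xffffffffL))

/* Hypothetical wrapper: 24-bit symbol index plus 8-bit type. */
static uint32_t make_r_info32(uint32_t sym, unsigned char type)
{
    assert(sym < (1u << 24));   /* higher bits would be shifted out */
    return ELF32_R_INFO(sym, type);
}

/* Hypothetical wrapper: 32-bit symbol index plus 32-bit type. The
   operands are widened first so the shift by 32 is well defined. */
static uint64_t make_r_info64(uint32_t sym, uint32_t type)
{
    return ELF64_R_INFO((uint64_t)sym, (uint64_t)type);
}
```

The wrappers exist mainly to pin down operand widths: applying ELF64_R_INFO directly to a 32-bit symbol index would shift by the full width of the type.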

A relocation section references two other sections: a symbol table and a section to modify. The section header's sh_info and sh_link members, described in ``Sections'' above, specify these relationships. Relocation entries for different object files have slightly different interpretations for the r_offset member.

• In relocatable files, r_offset holds a section offset. The relocation section itself describes how to modify another section in the file; relocation offsets designate a storage unit within the second section.

• In executable and shared object files, r_offset holds a virtual address. To make these files' relocation entries more useful for the dynamic linker, the section offset (file interpretation) gives way to a virtual address (memory interpretation).

Although the interpretation of r_offset changes for different object files to allow efficient access by the relevant programs, the relocation types' meanings stay the same.

The typical application of an ELF relocation is to determine the referenced symbol value, extract the addend (either from the field to be relocated or from the addend field contained in the relocation record, as appropriate for the type of relocation record), apply the expression implied by the relocation type to the symbol and addend, extract the desired part of the expression result, and place it in the field to be relocated.
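The steps above can be sketched for one made-up relocation type. Everything specific here is an assumption for illustration: the type (``store the low 32 bits of symbol value plus addend''), the function name apply_word32, and the use of host byte order. Real relocation types and their expressions are defined by the processor supplement.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical relocation type: field = low 32 bits of (S + A), where S is
   the symbol value and A the addend. Host byte order is assumed. */
static void apply_word32(unsigned char *field, uint32_t sym_value,
                         int has_explicit_addend, int32_t r_addend)
{
    uint32_t addend;
    uint32_t result;

    assert(field != NULL);
    if (has_explicit_addend)
        addend = (uint32_t)r_addend;            /* Rela: addend comes from the record */
    else
        memcpy(&addend, field, sizeof addend);  /* Rel: addend is stored in the field */

    result = sym_value + addend;                /* expression implied by the type */
    memcpy(field, &result, sizeof result);      /* place the result in the field */
}
```

With a Rel-style record the field's previous contents act as the addend, so a field holding 8 and a symbol value of 0x1000 leave 0x1008 in the field.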

If multiple consecutive relocation records are applied to the same relocation location (r_offset), they are composed instead of being applied independently, as described above. By consecutive, we mean that the relocation records are contiguous within a single relocation section. By composed, we mean that the standard application described above is modified as follows:

• In all but the last relocation operation of a composed sequence, the result of the relocation expression is retained, rather than having part extracted and placed in the relocated field. The result is retained at full pointer precision of the applicable ABI processor supplement.

• In all but the first relocation operation of a composed sequence, the addend used is the retained result of the previous relocation operation, rather than that implied by the relocation type.

NOTE: A consequence of the above rules is that the location specified by a relocation type is relevant for the first element of a composed sequence (and then only for relocation records that do not contain an explicit addend field) and for the last element, where the location determines where the relocated value will be placed. For all other relocation operands in a composed sequence, the location specified is ignored.

An ABI processor supplement may specify individual relocation types that always stop a composition sequence, or always start a new one.

NOTE: This section requires processor-specific information. The ABI supplement for the desired processor describes the details.

Program header

An executable or shared object file's program header table is an array of structures, each describing a segment or other information the system needs to prepare the program for execution. An object file segment contains one or more sections, as ``Segment contents'' describes below. Program headers are meaningful only for executable and shared object files. A file specifies its own program header size with the ELF header's e_phentsize and e_phnum members. See for more information.

    typedef struct {
        Elf32_Word      p_type;
        Elf32_Off       p_offset;
        Elf32_Addr      p_vaddr;
        Elf32_Addr      p_paddr;
        Elf32_Word      p_filesz;
        Elf32_Word      p_memsz;
        Elf32_Word      p_flags;
        Elf32_Word      p_align;
    } Elf32_Phdr;

    typedef struct {
        Elf64_Word      p_type;
        Elf64_Word      p_flags;
        Elf64_Off       p_offset;
        Elf64_Addr      p_vaddr;
        Elf64_Addr      p_paddr;
        Elf64_Xword     p_filesz;
        Elf64_Xword     p_memsz;
        Elf64_Xword     p_align;
    } Elf64_Phdr;

Relocation types (processor−specific)

p_type
This member tells what kind of segment this array element describes or how to interpret the array element's information. Type values and their meanings appear below.

p_offset
This member gives the offset from the beginning of the file at which the first byte of the segment resides.

p_vaddr
This member gives the virtual address at which the first byte of the segment resides in memory.

p_paddr
On systems for which physical addressing is relevant, this member is reserved for the segment's physical address. Because System V ignores physical addressing for application programs, this member has unspecified contents for executable files and shared objects.

p_filesz
This member gives the number of bytes in the file image of the segment; it may be zero.

p_memsz
This member gives the number of bytes in the memory image of the segment; it may be zero.

p_flags
This member gives flags relevant to the segment. Defined flag values appear below.

p_align
Loadable process segments must have congruent values for p_vaddr and p_offset, modulo the page size. This member gives the value to which the segments are aligned in memory and in the file. Values 0 and 1 mean no alignment is required. Otherwise, p_align should be a positive, integral power of 2, and p_vaddr should equal p_offset, modulo p_align.
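The p_align rule amounts to a simple check, sketched below. The function names and the sample addresses are hypothetical; 64-bit integers are used so that the same check covers both Elf32_Phdr and Elf64_Phdr values.

```c
#include <assert.h>
#include <stdint.h>

static int is_power_of_two(uint64_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* Hypothetical check: returns 1 when p_vaddr, p_offset and p_align
   satisfy the constraints described for p_align, 0 otherwise. */
static int segment_alignment_ok(uint64_t p_vaddr, uint64_t p_offset,
                                uint64_t p_align)
{
    if (p_align == 0 || p_align == 1)
        return 1;                       /* no alignment required */
    if (!is_power_of_two(p_align))
        return 0;
    return (p_vaddr % p_align) == (p_offset % p_align);
}

/* Usage example with made-up addresses and a 4 KB alignment. */
static void alignment_examples(void)
{
    assert(segment_alignment_ok(0x8048000, 0x0, 0x1000));
    assert(!segment_alignment_ok(0x8049f00, 0xe00, 0x1000));
}
```

The congruence is what lets a loader map file pages directly: the offset of a byte within its page is the same in the file and in memory.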

Some entries describe process segments; others give supplementary information and do not contribute to the process image. Segment entries may appear in any order, except as explicitly noted below. Defined type values follow; other values are reserved for future use.

Name Value

PT_NULL 0

PT_LOAD 1

PT_DYNAMIC 2

PT_INTERP 3

PT_NOTE 4

PT_SHLIB 5

PT_PHDR 6

PT_LOOS 0x60000000

PT_HIOS 0x6fffffff

PT_LOPROC 0x70000000

PT_HIPROC 0x7fffffff

Segment types, p_type table

Object files

Program header 246

Page 254: nicolascormier.comnicolascormier.com/documentation/sys-programming/...Table of Contents Introduction to programming in standard C and C++...................................................................................1

PT_NULL
The array element is unused; other members' values are undefined. This type lets the program header table have ignored entries.

PT_LOAD
The array element specifies a loadable segment, described by p_filesz and p_memsz. The bytes from the file are mapped to the beginning of the memory segment. If the segment's memory size (p_memsz) is larger than the file size (p_filesz), the ``extra'' bytes are defined to hold the value 0 and to follow the segment's initialized area. The file size may not be larger than the memory size. Loadable segment entries in the program header table appear in ascending order, sorted on the p_vaddr member.

PT_DYNAMIC
The array element specifies dynamic linking information. See ``Dynamic section'' for more information.

PT_INTERP
The array element specifies the location and size of a null-terminated path name to invoke as an interpreter. This segment type is meaningful only for executable files (though it may occur for shared objects); it may not occur more than once in a file. If it is present, it must precede any loadable segment entry. See ``Program interpreter'' for more information.

PT_NOTE
The array element specifies the location and size of auxiliary information. See ``Note section'' for more information.

PT_SHLIB
This segment type is reserved but has unspecified semantics. Programs that contain an array element of this type do not conform to the ABI.

PT_PHDR
The array element, if present, specifies the location and size of the program header table itself, both in the file and in the memory image of the program. This segment type may not occur more than once in a file. Moreover, it may occur only if the program header table is part of the memory image of the program. If it is present, it must precede any loadable segment entry. See ``Program interpreter'' for more information.

PT_LOOS through PT_HIOS
Values in this inclusive range are reserved for operating system-specific semantics.

PT_LOPROC through PT_HIPROC
Values in this inclusive range are reserved for processor-specific semantics. If meanings are specified, the processor supplement explains them.

NOTE: Unless specifically required elsewhere, all program header segment types are optional. A file's program header table may contain only those elements relevant to its contents.
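The PT_LOAD rule about p_filesz and p_memsz can be illustrated with a sketch that fills an in-memory buffer. This is not how a real system loader works (it would map pages rather than copy bytes); the function name load_segment and its buffer-based interface are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: build the memory image of one PT_LOAD segment in
   `mem`, which must have room for p_memsz bytes. Returns 0 on success,
   -1 if the file size exceeds the memory size. */
static int load_segment(unsigned char *mem, const unsigned char *file_bytes,
                        size_t p_filesz, size_t p_memsz)
{
    assert(mem != NULL && (file_bytes != NULL || p_filesz == 0));
    if (p_filesz > p_memsz)
        return -1;                                  /* not permitted by the format */
    if (p_filesz > 0)
        memcpy(mem, file_bytes, p_filesz);          /* initialized area from the file */
    memset(mem + p_filesz, 0, p_memsz - p_filesz);  /* "extra" bytes hold 0 */
    return 0;
}
```

The zero-filled tail is where uninitialized data such as .bss typically ends up, which is why that section can occupy no file space.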

Base address

The virtual addresses in the program headers might not represent the actual virtual addresses of the program's memory image. Executable files typically contain absolute code. To let the process execute correctly, the segments must reside at the virtual addresses used to build the executable file. On the other hand, shared object segments typically contain position-independent code. This lets a segment's virtual address change from one process to another, without invalidating execution behavior. On some platforms, while the system chooses virtual addresses for individual processes, it maintains the relative position of one segment to another within any one shared object. Because position-independent code on those platforms uses relative addressing between segments, the difference between virtual addresses in memory must match the difference between virtual addresses in the file. The difference between the virtual address of any segment in memory and the corresponding virtual address in the file is thus a single constant value for any one executable or shared object in a given process. This difference is the base address. One use of the base address is to relocate the memory image of the file during dynamic linking.

An executable or shared object file's base address (on platforms that support the concept) is calculated during execution from three values: the virtual memory load address, the maximum page size, and the lowest virtual address of a program's loadable segment. To compute the base address, one determines the memory address associated with the lowest p_vaddr value for a PT_LOAD segment. This address is truncated to the nearest multiple of the maximum page size. The corresponding p_vaddr value itself is also truncated to the nearest multiple of the maximum page size. The base address is the difference between the truncated memory address and the truncated p_vaddr value.
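The base address computation can be sketched in a few lines of C. This is only an illustration: the function names elf_base_address and truncate_to_page are hypothetical, and the sketch assumes the maximum page size is a power of two so that truncation can be done with a bit mask.

```c
#include <stdint.h>

/* Round addr down to a multiple of page_size.
 * Assumption: page_size is a power of two. */
static uint64_t truncate_to_page(uint64_t addr, uint64_t page_size)
{
    return addr & ~(page_size - 1);
}

/*
 * load_addr:     memory address associated with the lowest PT_LOAD p_vaddr
 * lowest_vaddr:  lowest p_vaddr value among the PT_LOAD segments
 * max_page_size: maximum page size for the platform
 *
 * Returns the difference between the truncated memory address and the
 * truncated p_vaddr value, i.e. the base address.
 */
uint64_t elf_base_address(uint64_t load_addr, uint64_t lowest_vaddr,
                          uint64_t max_page_size)
{
    return truncate_to_page(load_addr, max_page_size)
         - truncate_to_page(lowest_vaddr, max_page_size);
}
```

For an executable loaded at the addresses it was built for, the two truncated values are equal and the base address is 0.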

Segment permissions

A program to be loaded by the system must have at least one loadable segment (although this is not required by the file format). When the system creates loadable segments' memory images, it gives access permissions as specified in the p_flags member.

Name          Value        Meaning
PF_X          0x1          Execute
PF_W          0x2          Write
PF_R          0x4          Read
PF_MASKOS     0x0ff00000   Unspecified
PF_MASKPROC   0xf0000000   Unspecified

Segment flag bits, p_flags

All bits included in the PF_MASKOS mask are reserved for operating system-specific semantics.

All bits included in the PF_MASKPROC mask are reserved for processor-specific semantics. If meanings are specified, the processor supplement explains them.

If a permission bit is 0, that type of access is denied. Actual memory permissions depend on the memory management unit, which may vary from one system to another. Although all flag combinations are valid, the system may grant more access than requested. In no case, however, will a segment have write permission unless it is specified explicitly. The following table shows both the exact flag interpretation and the allowable flag interpretation. ABI-conforming systems may provide either.


Flags            Value   Exact                  Allowable
none             0       All access denied      All access denied
PF_X             1       Execute only           Read, execute
PF_W             2       Write only             Read, write, execute
PF_W+PF_X        3       Write, execute         Read, write, execute
PF_R             4       Read only              Read, execute
PF_R+PF_X        5       Read, execute          Read, execute
PF_R+PF_W        6       Read, write            Read, write, execute
PF_R+PF_W+PF_X   7       Read, write, execute   Read, write, execute

Segment permissions

For example, typical text segments have read and execute (but not write) permissions. Data segments normally have read, write, and execute permissions.
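As an illustration, the ``Allowable'' column of the permissions table can be written as a small function of p_flags. The function name pf_allowable is hypothetical; the PF_ values are those given in the p_flags table, and the OS-specific and processor-specific bits are simply masked off here.

```c
#include <stdint.h>

#define PF_X 0x1u   /* Execute */
#define PF_W 0x2u   /* Write   */
#define PF_R 0x4u   /* Read    */

/*
 * Returns the widest access the table's "Allowable" column permits for
 * the requested flags. Write access appears in the result only when it
 * was requested explicitly.
 */
uint32_t pf_allowable(uint32_t p_flags)
{
    uint32_t requested = p_flags & (PF_R | PF_W | PF_X);

    if (requested == 0)
        return 0;                       /* all access denied */
    if (requested & PF_W)
        return PF_R | PF_W | PF_X;      /* read, write, execute */
    return PF_R | PF_X;                 /* read, execute */
}
```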

Segment contents

An object file segment comprises one or more sections, though this fact is transparent to the program header. Whether the file segment holds one or many sections also is immaterial to program loading. Nonetheless, various data must be present for program execution, dynamic linking, and so on. The diagrams below illustrate segment contents in general terms. The order and membership of sections within a segment may vary; moreover, processor-specific constraints may alter the examples below. See the processor supplement for details.

Text segments contain read-only instructions and data, typically including the following sections described earlier in ``Sections''. Other sections may also reside in loadable segments; these examples are not meant to give complete and exclusive segment contents.

.text

.rodata

.hash

.dynsym

.dynstr

.plt

.rel.got

Text segment

Data segments contain writable data and instructions, typically including the following sections.

.data

.dynamic

.got

.bss

Data segment


A PT_DYNAMIC program header element points at the .dynamic section, explained in ``Dynamic section'' below. The .got and .plt sections also hold information related to position-independent code and dynamic linking. Although the .plt appears in a text segment in the previous table, it may reside in a text or a data segment, depending on the processor. See ``Global offset table'' and ``Procedure linkage table'' for more information.

As described in ``Sections'', the .bss section has the type SHT_NOBITS. Although it occupies no space in the file, it contributes to the segment's memory image. Normally, these uninitialized data reside at the end of the segment, thereby making p_memsz larger than p_filesz in the associated program header element.

Note section

Sometimes a vendor or system builder needs to mark an object file with special information that other programs will check for conformance, compatibility, and so on. Sections of type SHT_NOTE and program header elements of type PT_NOTE can be used for this purpose. The note information in sections and program header elements holds a variable number of entries. In 64-bit objects (files with e_ident[EI_CLASS] equal to ELFCLASS64), each entry is an array of 8-byte words in the format of the target processor. In 32-bit objects (files with e_ident[EI_CLASS] equal to ELFCLASS32), each entry is an array of 4-byte words in the format of the target processor. Labels appear below to help explain note information organization, but they are not part of the specification.

namesz

descsz

type

name

[. . .]

desc

[. . .]

Note information

namesz and name
The first namesz bytes in name contain a null-terminated character representation of the entry's owner or originator. There is no formal mechanism for avoiding name conflicts. By convention, vendors use their own name, such as XYZ Computer Company, as the identifier. If no name is present, namesz contains 0. Padding is present, if necessary, to ensure 8 or 4-byte alignment for the descriptor (depending on whether the file is a 64-bit or 32-bit object). Such padding is not included in namesz.

descsz and desc
The first descsz bytes in desc hold the note descriptor. The ABI places no constraints on a descriptor's contents. If no descriptor is present, descsz contains 0. Padding is present, if necessary, to ensure 8 or 4-byte alignment for the next note entry (depending on whether the file is a 64-bit or 32-bit object). Such padding is not included in descsz.


type
This word gives the interpretation of the descriptor. Each originator controls its own types; multiple interpretations of a single value may exist. Thus, a program must recognize both the name and the type to recognize a descriptor. Types currently must be non-negative. The ABI does not define what descriptors mean.

To illustrate, the following note segment holds two entries.

           +0   +1   +2   +3
          +-------------------+
  namesz  |         7         |
          +-------------------+
  descsz  |         0         |  No descriptor
          +-------------------+
  type    |         1         |
          +-------------------+
  name    | X  | Y  | Z  |    |
          |----|----|----|----|
          | C  | o  | \0 | pad|
          +-------------------+
  namesz  |         7         |
          +-------------------+
  descsz  |         8         |
          +-------------------+
  type    |         3         |
          +-------------------+
  name    | X  | Y  | Z  |    |
          |----|----|----|----|
          | C  | o  | \0 | pad|
          +-------------------+
  desc    |       word 0      |
          +-------------------+
          |       word 1      |
          +-------------------+

Example note segment

NOTE: The system reserves note information with no name (namesz==0) and with a zero-length name (name[0]=='\0') but currently defines no types. All other names must have at least one non-null character.

NOTE: Note information is optional. The presence of note information does not affect a program's ABI conformance, provided the information does not affect the program's execution behavior. Otherwise, the program does not conform to the ABI and has undefined behavior.
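The padding rules for namesz and descsz determine how a reader steps from one note entry to the next. The following is a minimal sketch for 32-bit objects only (4-byte words); the names note_header32, align4, and count_notes32 are hypothetical, and the sketch assumes the segment has already been read into memory with its words in host byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The three leading words of a note entry in a 32-bit object. */
typedef struct {
    uint32_t namesz;
    uint32_t descsz;
    uint32_t type;
} note_header32;

/* Round n up to the next multiple of 4 (the word size for ELFCLASS32). */
static size_t align4(size_t n)
{
    return (n + 3u) & ~(size_t)3u;
}

/*
 * Walks a note segment of `size` bytes and counts its entries.
 * Returns -1 if an entry runs past the end of the segment.
 */
int count_notes32(const unsigned char *seg, size_t size)
{
    size_t off = 0;
    int count = 0;

    while (off < size) {
        note_header32 h;
        size_t name_len, desc_len;

        if (size - off < sizeof h)
            return -1;
        memcpy(&h, seg + off, sizeof h);
        off += sizeof h;

        /* Reject sizes larger than what is left before padding them. */
        if (h.namesz > size - off || h.descsz > size - off)
            return -1;
        name_len = align4(h.namesz);    /* name plus its padding */
        desc_len = align4(h.descsz);    /* desc plus its padding */
        if (name_len > size - off || desc_len > size - off - name_len)
            return -1;

        off += name_len + desc_len;
        count++;
    }
    return count;
}
```

Applied to the example note segment (12 words, 48 bytes), the walk visits two entries: the first occupies 20 bytes and the second 28 bytes.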

Program loading (Processor specific)

NOTE: This section requires processor-specific information. The ABI supplement for the desired processor describes the details.


Dynamic linking

Program interpreter

An executable file that participates in dynamic linking shall have one PT_INTERP program header element. During exec(BA_OS), the system retrieves a path name from the PT_INTERP segment and creates the initial process image from the interpreter file's segments. That is, instead of using the original executable file's segment images, the system composes a memory image for the interpreter. It then is the interpreter's responsibility to receive control from the system and provide an environment for the application program.

The interpreter receives control in one of two ways. First, it may receive a file descriptor to read the executable file, positioned at the beginning. It can use this file descriptor to read and/or map the executable file's segments into memory. Second, depending on the executable file format, the system may load the executable file into memory instead of giving the interpreter an open file descriptor. With the possible exception of the file descriptor, the interpreter's initial process state matches what the executable file would have received. The interpreter itself may not require a second interpreter. An interpreter may be either a shared object or an executable file.

• A shared object (the normal case) is loaded as position-independent, with addresses that may vary from one process to another; the system creates its segments in the dynamic segment area used by mmap(KE_OS) and related services. Consequently, a shared object interpreter typically will not conflict with the original executable file's original segment addresses.

• An executable file may be loaded at fixed addresses; if so, the system creates its segments using the virtual addresses from the program header table. Consequently, an executable file interpreter's virtual addresses may collide with the first executable file; the interpreter is responsible for resolving conflicts.

Dynamic linker

When building an executable file that uses dynamic linking, the link editor adds a program header element of type PT_INTERP to an executable file, telling the system to invoke the dynamic linker as the program interpreter.

NOTE: The locations of the system-provided dynamic linkers are processor specific.

Exec(BA_OS) and the dynamic linker cooperate to create the process image for the program, which entails the following actions:

• Adding the executable file's memory segments to the process image;
• Adding shared object memory segments to the process image;
• Performing relocations for the executable file and its shared objects;
• Closing the file descriptor that was used to read the executable file, if one was given to the dynamic linker;
• Transferring control to the program, making it look as if the program had received control directly from exec(BA_OS).

The link editor also constructs various data that assist the dynamic linker for executable and shared object files. As shown above in ``Program header'', these data reside in loadable segments, making them available during execution. (Once again, recall the exact segment contents are processor-specific. See the processor supplement for complete information.)

• A .dynamic section with type SHT_DYNAMIC holds various data. The structure residing at the beginning of the section holds the addresses of other dynamic linking information.
• The .hash section with type SHT_HASH holds a symbol hash table.
• The .got and .plt sections with type SHT_PROGBITS hold two separate tables: the global offset table and the procedure linkage table. Sections below explain how the dynamic linker uses and changes the tables to create memory images for object files.

Because every ABI-conforming program imports the basic system services from a shared object library, the dynamic linker participates in every ABI-conforming program execution.

Shared objects may occupy virtual memory addresses that are different from the addresses recorded in the file's program header table. The dynamic linker relocates the memory image, updating absolute addresses before the application gains control. Although the absolute address values would be correct if the library were loaded at the addresses specified in the program header table, this normally is not the case.

If the process environment (see exec(BA_OS)) contains a variable named LD_BIND_NOW with a non-null value, the dynamic linker processes all relocations before transferring control to the program. For example, all the following environment entries would specify this behavior.

• LD_BIND_NOW=1
• LD_BIND_NOW=on
• LD_BIND_NOW=off

Otherwise (that is, when LD_BIND_NOW either does not occur in the environment or has a null value), the dynamic linker is permitted to evaluate procedure linkage table entries lazily, thus avoiding symbol resolution and relocation overhead for functions that are not called. See ``Procedure linkage table'' for more information.
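The LD_BIND_NOW test depends only on whether the variable is present with a non-null value, not on what the value says, which is why LD_BIND_NOW=off still requests immediate binding. A minimal sketch follows; both function names are hypothetical and are not part of any dynamic linker interface.

```c
#include <stdlib.h>

/* Returns 1 if value is present and non-null (non-empty), else 0.
 * The text of the value is deliberately not examined. */
int bind_now_value_set(const char *value)
{
    return value != NULL && value[0] != '\0';
}

/* Returns 1 if the process environment requests that all relocations
 * be processed before control is transferred to the program. */
int bind_now_requested(void)
{
    return bind_now_value_set(getenv("LD_BIND_NOW"));
}
```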

Dynamic section

If an object file participates in dynamic linking, its program header table will have an element of type PT_DYNAMIC. This segment contains the .dynamic section. A special symbol, _DYNAMIC, labels the section, which contains an array of the following structures.

typedef struct {
        Elf32_Sword     d_tag;
        union {
                Elf32_Word      d_val;
                Elf32_Addr      d_ptr;
        } d_un;
} Elf32_Dyn;

extern Elf32_Dyn _DYNAMIC[];

typedef struct {
        Elf64_Sxword    d_tag;
        union {
                Elf64_Xword     d_val;
                Elf64_Addr      d_ptr;
        } d_un;
} Elf64_Dyn;

extern Elf64_Dyn _DYNAMIC[];

Dynamic structure


For each object with this type, d_tag controls the interpretation of d_un.

d_val
These objects represent integer values with various interpretations.

d_ptr
These objects represent program virtual addresses. As mentioned previously, a file's virtual addresses might not match the memory virtual addresses during execution. When interpreting addresses contained in the dynamic structure, the dynamic linker computes actual addresses, based on the original file value and the memory base address. For consistency, files do not contain relocation entries to ``correct'' addresses in the dynamic structure.

To make it simpler for tools to interpret the contents of dynamic section entries, the value of each tag, except for those in two special compatibility ranges, will determine the interpretation of the d_un union. A tag whose value is an even number indicates a dynamic section entry that uses d_ptr. A tag whose value is an odd number indicates a dynamic section entry that uses d_val or that uses neither d_ptr nor d_val. Tags whose values are less than the special value DT_ENCODING and tags whose values fall between DT_HIOS and DT_LOPROC do not follow these rules.
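A tool could apply the even/odd rule roughly as follows. This is a sketch under stated assumptions: the enum and the function name classify_tag are hypothetical, ``between DT_HIOS and DT_LOPROC'' is read as excluding both endpoints, and the constants are taken from the tag table in this section.

```c
#include <stdint.h>

#define DT_ENCODING 32
#define DT_HIOS     0x6ffff000
#define DT_LOPROC   0x70000000

typedef enum {
    D_UN_BY_TABLE,      /* compatibility range: consult the tag table */
    D_UN_PTR,           /* even tag: entry uses d_ptr */
    D_UN_VAL_OR_NONE    /* odd tag: entry uses d_val, or neither member */
} d_un_kind;

/* Decide how d_un should be interpreted for a given d_tag value. */
d_un_kind classify_tag(int64_t tag)
{
    if (tag < DT_ENCODING)
        return D_UN_BY_TABLE;
    if (tag > DT_HIOS && tag < DT_LOPROC)
        return D_UN_BY_TABLE;
    return (tag % 2 == 0) ? D_UN_PTR : D_UN_VAL_OR_NONE;
}
```

For example, DT_PREINIT_ARRAY (32) is even and uses d_ptr, while DT_PREINIT_ARRAYSZ (33) is odd and uses d_val. DT_PLTGOT (3) falls below DT_ENCODING, so the parity rule does not apply to it.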

The following table summarizes the tag requirements for executable and shared object files. If a tag is marked ``mandatory'', the dynamic linking array for an ABI-conforming file must have an entry of that type. Likewise, ``optional'' means an entry for the tag may appear but is not required.

Name Value d_un Executable Shared Object

DT_NULL 0 ignored mandatory mandatory

DT_NEEDED 1 d_val optional optional

DT_PLTRELSZ 2 d_val optional optional

DT_PLTGOT 3 d_ptr optional optional

DT_HASH 4 d_ptr mandatory mandatory

DT_STRTAB 5 d_ptr mandatory mandatory

DT_SYMTAB 6 d_ptr mandatory mandatory

DT_RELA 7 d_ptr mandatory optional

DT_RELASZ 8 d_val mandatory optional

DT_RELAENT 9 d_val mandatory optional

DT_STRSZ 10 d_val mandatory mandatory

DT_SYMENT 11 d_val mandatory mandatory

DT_INIT 12 d_ptr optional optional

DT_FINI 13 d_ptr optional optional

DT_SONAME 14 d_val ignored optional

DT_RPATH* 15 d_val optional ignored

DT_SYMBOLIC* 16 ignored ignored optional

DT_REL 17 d_ptr mandatory optional

DT_RELSZ 18 d_val mandatory optional


DT_RELENT 19 d_val mandatory optional

DT_PLTREL 20 d_val optional optional

DT_DEBUG 21 d_ptr optional ignored

DT_TEXTREL* 22 ignored optional optional

DT_JMPREL 23 d_ptr optional optional

DT_BIND_NOW* 24 ignored optional optional

DT_INIT_ARRAY 25 d_ptr optional optional

DT_FINI_ARRAY 26 d_ptr optional optional

DT_INIT_ARRAYSZ 27 d_val optional optional

DT_FINI_ARRAYSZ 28 d_val optional optional

DT_RUNPATH 29 d_val optional optional

DT_FLAGS 30 d_val optional optional

DT_ENCODING 32 unspecified unspecified unspecified

DT_PREINIT_ARRAY 32 d_ptr optional ignored

DT_PREINIT_ARRAYSZ 33 d_val optional ignored

DT_LOOS 0x6000000D unspecified unspecified unspecified

DT_HIOS 0x6ffff000 unspecified unspecified unspecified

DT_LOPROC 0x70000000 unspecified unspecified unspecified

DT_HIPROC 0x7fffffff unspecified unspecified unspecified

Dynamic array tags, d_tag

* Signifies an entry that is at level 2.

DT_NULL
An entry with a DT_NULL tag marks the end of the _DYNAMIC array.

DT_NEEDED
This element holds the string table offset of a null-terminated string, giving the name of a needed library. The offset is an index into the table recorded in the DT_STRTAB entry. See ``Shared object dependencies'' for more information about these names. The dynamic array may contain multiple entries with this type. These entries' relative order is significant, though their relation to entries of other types is not.

DT_PLTRELSZ
This element holds the total size, in bytes, of the relocation entries associated with the procedure linkage table. If an entry of type DT_JMPREL is present, a DT_PLTRELSZ must accompany it.

DT_PLTGOT
This element holds an address associated with the procedure linkage table and/or the global offset table. See this section in the processor supplement for details.

DT_HASH
This element holds the address of the symbol hash table, described in ``Hash table''. This hash table refers to the symbol table referenced by the DT_SYMTAB element.

DT_STRTAB
This element holds the address of the string table, described in ``String table''. Symbol names, library names, and other strings reside in this table.

DT_SYMTAB
This element holds the address of the symbol table, described in the first part of this chapter, with Elf32_Sym entries for the 32-bit class of files and Elf64_Sym entries for the 64-bit class of files.

DT_RELA
This element holds the address of a relocation table, described in ``Relocation''. Entries in the table have explicit addends, such as Elf32_Rela for the 32-bit file class or Elf64_Rela for the 64-bit file class. An object file may have multiple relocation sections. When building the relocation table for an executable or shared object file, the link editor catenates those sections to form a single table. Although the sections remain independent in the object file, the dynamic linker sees a single table. When the dynamic linker creates the process image for an executable file or adds a shared object to the process image, it reads the relocation table and performs the associated actions. If this element is present, the dynamic structure must also have DT_RELASZ and DT_RELAENT elements. When relocation is ``mandatory'' for a file, either DT_RELA or DT_REL may occur (both are permitted but not required).

DT_RELASZ
This element holds the total size, in bytes, of the DT_RELA relocation table.

DT_RELAENT
This element holds the size, in bytes, of the DT_RELA relocation entry.

DT_STRSZ
This element holds the size, in bytes, of the string table.

DT_SYMENT
This element holds the size, in bytes, of a symbol table entry.

DT_INIT
This element holds the address of the initialization function, discussed in ``Initialization and termination functions''.

DT_FINI
This element holds the address of the termination function, discussed in ``Initialization and termination functions''.

DT_SONAME
This element holds the string table offset of a null-terminated string, giving the name of the shared object. The offset is an index into the table recorded in the DT_STRTAB entry. See ``Shared object dependencies'' for more information about these names.

DT_RPATH
This element holds the string table offset of a null-terminated library search path string discussed in ``Shared object dependencies''. The offset is an index into the table recorded in the DT_STRTAB entry. This entry is at level 2. Its use has been superseded by DT_RUNPATH.

DT_SYMBOLIC
This element's presence in a shared object library alters the dynamic linker's symbol resolution algorithm for references within the library. Instead of starting a symbol search with the executable file, the dynamic linker starts from the shared object itself. If the shared object fails to supply the referenced symbol, the dynamic linker then searches the executable file and other shared objects as usual. This entry is at level 2. Its use has been superseded by the DF_SYMBOLIC flag.

DT_REL
This element is similar to DT_RELA, except its table has implicit addends, such as Elf32_Rel for the 32-bit file class or Elf64_Rel for the 64-bit file class. If this element is present, the dynamic structure must also have DT_RELSZ and DT_RELENT elements.

DT_RELSZ
This element holds the total size, in bytes, of the DT_REL relocation table.

DT_RELENT
This element holds the size, in bytes, of the DT_REL relocation entry.

DT_PLTREL
This member specifies the type of relocation entry to which the procedure linkage table refers. The d_val member holds DT_REL or DT_RELA, as appropriate. All relocations in a procedure linkage table must use the same relocation.

DT_DEBUG
This member is used for debugging. Its contents are not specified for the ABI; programs that access this entry are not ABI-conforming.

DT_TEXTREL
This member's absence signifies that no relocation entry should cause a modification to a non-writable segment, as specified by the segment permissions in the program header table. If this member is present, one or more relocation entries might request modifications to a non-writable segment, and the dynamic linker can prepare accordingly. This entry is at level 2. Its use has been superseded by the DF_TEXTREL flag.

DT_JMPREL
If present, this entry's d_ptr member holds the address of relocation entries associated solely with the procedure linkage table. Separating these relocation entries lets the dynamic linker ignore them during process initialization, if lazy binding is enabled. If this entry is present, the related entries of types DT_PLTRELSZ and DT_PLTREL must also be present.

DT_BIND_NOW
If present in a shared object or executable, this entry instructs the dynamic linker to process all relocations for the object containing this entry before transferring control to the program. The presence of this entry takes precedence over a directive to use lazy binding for this object when specified through the environment or via dlopen(BA_LIB). This entry is at level 2. Its use has been superseded by the DF_BIND_NOW flag.

DT_INIT_ARRAY
This element holds the address of the array of pointers to initialization functions, discussed in ``Initialization and termination functions''.

DT_FINI_ARRAY
This element holds the address of the array of pointers to termination functions, discussed in ``Initialization and termination functions''.

DT_INIT_ARRAYSZ
This element holds the size in bytes of the array of initialization functions pointed to by the DT_INIT_ARRAY entry. If an object has a DT_INIT_ARRAY entry, it must also have a DT_INIT_ARRAYSZ entry.

DT_FINI_ARRAYSZ
This element holds the size in bytes of the array of termination functions pointed to by the DT_FINI_ARRAY entry. If an object has a DT_FINI_ARRAY entry, it must also have a DT_FINI_ARRAYSZ entry.

DT_RUNPATH
This element holds the string table offset of a null-terminated library search path string discussed in ``Shared object dependencies''. The offset is an index into the table recorded in the DT_STRTAB entry.

DT_FLAGS
This element holds flag values specific to the object being loaded. Each flag value will have the name DF_flag_name. Defined values and their meanings are described below. All other values are reserved.

DT_PREINIT_ARRAY
This element holds the address of the array of pointers to pre-initialization functions, discussed in ``Initialization and termination functions''. The DT_PREINIT_ARRAY table is processed only in an executable file; it is ignored if contained in a shared object.

DT_PREINIT_ARRAYSZ
This element holds the size in bytes of the array of pre-initialization functions pointed to by the DT_PREINIT_ARRAY entry. If an object has a DT_PREINIT_ARRAY entry, it must also have a DT_PREINIT_ARRAYSZ entry. As with DT_PREINIT_ARRAY, this entry is ignored if it appears in a shared object.

DT_ENCODING
Values greater than or equal to DT_ENCODING and less than DT_LOOS follow the rules for the interpretation of the d_un union described above.

DT_LOOS through DT_HIOS
Values in this inclusive range are reserved for operating system-specific semantics. All such values follow the rules for the interpretation of the d_un union described above.

DT_LOPROC through DT_HIPROC
Values in this inclusive range are reserved for processor-specific semantics. If meanings are specified, the processor supplement explains them. All such values follow the rules for the interpretation of the d_un union described above.

Except for the DT_NULL element at the end of the array, and the relative order of DT_NEEDED elements, entries may appear in any order. Tag values not appearing in the table are reserved.
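Because entries may appear in any order and the array ends at DT_NULL, code that consults the dynamic array typically scans it linearly. The sketch below uses a hypothetical local type, dyn64, that mirrors the layout of Elf64_Dyn with fixed-width integer types so that it compiles without a system ELF header; dyn_find and dyn_count are likewise hypothetical helper names.

```c
#include <stddef.h>
#include <stdint.h>

#define DT_NULL   0
#define DT_NEEDED 1

/* Hypothetical stand-in for Elf64_Dyn. */
typedef struct {
    int64_t d_tag;
    union {
        uint64_t d_val;
        uint64_t d_ptr;
    } d_un;
} dyn64;

/* Returns the first entry with the given tag, or NULL if the DT_NULL
 * terminator is reached first. */
const dyn64 *dyn_find(const dyn64 *dyn, int64_t tag)
{
    for (; dyn->d_tag != DT_NULL; dyn++)
        if (dyn->d_tag == tag)
            return dyn;
    return NULL;
}

/* Counts entries with the given tag; useful for tags such as DT_NEEDED
 * that may occur more than once. */
size_t dyn_count(const dyn64 *dyn, int64_t tag)
{
    size_t n = 0;

    for (; dyn->d_tag != DT_NULL; dyn++)
        if (dyn->d_tag == tag)
            n++;
    return n;
}
```

Since the relative order of DT_NEEDED entries is significant, a caller collecting dependencies would keep them in the order the scan encounters them.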


Name          Value
DF_ORIGIN     0x1
DF_SYMBOLIC   0x2
DF_TEXTREL    0x4
DF_BIND_NOW   0x8

DT_FLAGS values

DF_ORIGIN
This flag signifies that the object being loaded may make reference to the $ORIGIN substitution string (see ``Substitution sequences''). The dynamic linker must determine the pathname of the object containing this entry when the object is loaded.

DF_SYMBOLIC
If this flag is set in a shared object library, the dynamic linker's symbol resolution algorithm for references within the library is changed. Instead of starting a symbol search with the executable file, the dynamic linker starts from the shared object itself. If the shared object fails to supply the referenced symbol, the dynamic linker then searches the executable file and other shared objects as usual.

DF_TEXTREL
If this flag is not set, no relocation entry should cause a modification to a non-writable segment, as specified by the segment permissions in the program header table. If this flag is set, one or more relocation entries might request modifications to a non-writable segment, and the dynamic linker can prepare accordingly.

DF_BIND_NOW
If set in a shared object or executable, this flag instructs the dynamic linker to process all relocations for the object containing this entry before transferring control to the program. The presence of this entry takes precedence over a directive to use lazy binding for this object when specified through the environment or via dlopen(BA_LIB).

Shared object dependencies

When the link editor processes an archive library, it extracts library members and copies them into the output object file. These statically linked services are available during execution without involving the dynamic linker. Shared objects also provide services, and the dynamic linker must attach the proper shared object files to the process image for execution.

When the dynamic linker creates the memory segments for an object file, the dependencies (recorded in DT_NEEDED entries of the dynamic structure) tell what shared objects are needed to supply the program's services. By repeatedly connecting referenced shared objects and their dependencies, the dynamic linker builds a complete process image. When resolving symbolic references, the dynamic linker examines the symbol tables with a breadth-first search. That is, it first looks at the symbol table of the executable program itself, then at the symbol tables of the DT_NEEDED entries (in order), and then at the second level DT_NEEDED entries, and so on. Shared object files must be readable by the process; other permissions are not required.


NOTE: Even when a shared object is referenced multiple times in the dependency list, the dynamic linker will connect the object only once to the process.
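The breadth-first symbol search order, with each object connected only once, can be illustrated on a small in-memory dependency graph. Everything in this sketch is hypothetical: object_node stands in for a loaded object, dependencies are given as array indices rather than DT_NEEDED strings, and the fixed limits MAX_OBJECTS and MAX_NEEDED exist only to keep the example short.

```c
#include <stddef.h>

#define MAX_OBJECTS 16
#define MAX_NEEDED  4

typedef struct {
    const char *name;
    size_t needed_count;
    size_t needed[MAX_NEEDED];  /* indices of dependencies, in DT_NEEDED order */
} object_node;

/*
 * Fills order[] (room for `count` entries) with object indices in
 * breadth-first order, starting from objects[0], the executable.
 * Each object appears once, however many times it is referenced.
 * Returns the number of entries written.
 */
size_t search_order(const object_node *objects, size_t count, size_t *order)
{
    int seen[MAX_OBJECTS] = {0};
    size_t head = 0, tail = 0;

    if (count == 0 || count > MAX_OBJECTS)
        return 0;

    order[tail++] = 0;          /* order[] doubles as the BFS queue */
    seen[0] = 1;
    while (head < tail) {
        const object_node *obj = &objects[order[head++]];
        size_t i;

        for (i = 0; i < obj->needed_count && i < MAX_NEEDED; i++) {
            size_t dep = obj->needed[i];

            if (dep < count && !seen[dep]) {
                seen[dep] = 1;
                order[tail++] = dep;
            }
        }
    }
    return tail;
}
```

If the executable needs libA and libB, and both need libC, the resulting order is the executable, libA, libB, and then libC: the first-level dependencies all come before any second-level one, and libC is listed once.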

Names in the dependency list are copies either of the DT_SONAME strings or the path names of the shared objects used to build the object file. For example, if the link editor builds an executable file using one shared object with a DT_SONAME entry of lib1 and another shared object library with the path name /usr/lib/lib2, the executable file will contain lib1 and /usr/lib/lib2 in its dependency list.

If a shared object name has one or more slash (/) characters anywhere in the name, such as /usr/lib/lib2 or directory/file, the dynamic linker uses that string directly as the path name. If the name has no slashes, such as lib1, three facilities specify shared object path searching.

1. The dynamic array tag DT_RUNPATH gives a string that holds a list of directories, separated by colons (:). For example, the string /home/dir/lib:/home/dir2/lib: tells the dynamic linker to search first the directory /home/dir/lib, then /home/dir2/lib, and then the current directory to find dependencies.

   The set of directories specified by a given DT_RUNPATH entry is used to find only the immediate dependencies of the executable or shared object containing the entry. That is, it is used only for those dependencies contained in the DT_NEEDED entries of the dynamic structure containing the DT_RUNPATH entry itself. One object's DT_RUNPATH entry does not affect the search for any other object's dependencies.

2. A variable called LD_LIBRARY_PATH in the process environment [see exec(BA_OS)] may hold a list of directories as above, optionally followed by a semicolon (;) and another directory list. The following values would be equivalent to the previous example:

   ♦ LD_LIBRARY_PATH=/home/dir/usr/lib:/home/dir2/usr/lib:
   ♦ LD_LIBRARY_PATH=/home/dir/usr/lib;/home/dir2/usr/lib:
   ♦ LD_LIBRARY_PATH=/home/dir/usr/lib:/home/dir2/usr/lib:;

   Although some programs (such as the link editor) treat the lists before and after the semicolon differently, the dynamic linker does not. Nevertheless, the dynamic linker accepts the semicolon notation, with the semantics described previously.

   All LD_LIBRARY_PATH directories are searched before those from DT_RUNPATH.

3. Finally, if the other two groups of directories fail to locate the desired library, the dynamic linker searches the default directories, /usr/lib or such other directories as may be specified by the ABI supplement for a given processor.

When the dynamic linker is searching for shared objects, it is not a fatal error if an ELF file with the wrong attributes is encountered in the search. Instead, the dynamic linker shall exhaust the search of all paths before determining that a matching object could not be found. For this determination, the relevant attributes are contained in the following ELF header fields: e_ident[EI_DATA], e_ident[EI_CLASS], e_ident[EI_OSABI], e_ident[EI_ABIVERSION], e_machine, e_type, e_flags and e_version.

NOTE: For security, the dynamic linker ignores LD_LIBRARY_PATH for set−user and set−group ID programs. It does, however, search DT_RUNPATH directories and the default directories. The same restriction may be applied to processes that have more than minimal privileges on systems with installed extended security mechanisms.
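The search order described above can be sketched in C. This is only an illustration under stated assumptions, not the dynamic linker's implementation: split_search_path and build_search_order are hypothetical helper names, the fixed array sizes are arbitrary, only the colon separator is handled, and /usr/lib stands in for whatever default directories the processor supplement names. An empty component is mapped to the current directory, as in the trailing-colon example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DIRS 16   /* arbitrary limit chosen for this sketch */
#define MAX_LEN  256  /* arbitrary limit chosen for this sketch */

/* Hypothetical helper: split a colon-separated directory list into dirs[].
 * An empty component stands for the current directory. Components that do
 * not fit are skipped. Returns the number of directories stored. */
static size_t split_search_path(const char *list, char dirs[][MAX_LEN],
                                size_t max_dirs)
{
    size_t count = 0;
    const char *start = list;

    assert(list != NULL);
    for (;;) {
        const char *end = strchr(start, ':');
        size_t len = end ? (size_t)(end - start) : strlen(start);

        if (count < max_dirs && len < MAX_LEN) {
            if (len == 0) {
                strcpy(dirs[count], ".");
            } else {
                memcpy(dirs[count], start, len);
                dirs[count][len] = '\0';
            }
            count++;
        }
        if (end == NULL)
            break;
        start = end + 1;
    }
    return count;
}

/* Hypothetical helper: LD_LIBRARY_PATH directories first, then the
 * DT_RUNPATH directories, then the default directory. Either list may be
 * NULL when it is absent (or ignored for security reasons). */
static size_t build_search_order(const char *ld_library_path,
                                 const char *runpath,
                                 char dirs[][MAX_LEN], size_t max_dirs)
{
    size_t n = 0;

    if (ld_library_path != NULL)
        n += split_search_path(ld_library_path, dirs + n, max_dirs - n);
    if (runpath != NULL)
        n += split_search_path(runpath, dirs + n, max_dirs - n);
    if (n < max_dirs)
        strcpy(dirs[n++], "/usr/lib");
    return n;
}
```

Passing NULL for ld_library_path models the set−user and set−group ID case, where only the DT_RUNPATH and default directories remain in the search order.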


NOTE: A fourth search facility, the dynamic array tag DT_RPATH, has been moved to level 2 in the ABI. It provides a colon−separated list of directories to search. Directories specified by DT_RPATH are searched before directories specified by LD_LIBRARY_PATH.

If both DT_RPATH and DT_RUNPATH entries appear in a single object's dynamic array, the dynamic linker processes only the DT_RUNPATH entry.

Substitution sequences

Within a string provided by dynamic array entries with the DT_NEEDED or DT_RUNPATH tags and in pathnames passed as parameters to the dlopen() routine, a dollar sign $ introduces a substitution sequence. This sequence consists of the dollar sign immediately followed by either the longest name sequence or a name contained within left and right braces { and }. A name is a sequence of bytes that starts with either a letter or an underscore followed by zero or more letters, digits or underscores. If a dollar sign is not immediately followed by a name or a brace−enclosed name, the behavior of the dynamic linker is unspecified.

If the name is ``ORIGIN'', then the substitution sequence is replaced by the dynamic linker with the absolute pathname of the directory in which the object containing the substitution sequence originated. Moreover, the pathname will contain no symbolic links or use of . or .. components. Otherwise (when the name is not ``ORIGIN'') the behavior of the dynamic linker is unspecified.

When the dynamic linker loads an object that uses $ORIGIN, it must calculate the pathname of the directory containing the object. Because this calculation can be computationally expensive, implementations may want to avoid the calculation for objects that do not use $ORIGIN. If an object calls dlopen() with a string containing $ORIGIN and does not use $ORIGIN in one of its dynamic array entries, the dynamic linker may not have calculated the pathname for the object until the dlopen() actually occurs. Since the application may have changed its current working directory before the dlopen() call, the calculation may not yield the correct result. To avoid this possibility, an object may signal its intention to reference $ORIGIN by setting the DF_ORIGIN flag. An implementation may reject an attempt to use $ORIGIN within a dlopen() call from an object that did not set the DF_ORIGIN flag and did not use $ORIGIN within its dynamic array.

NOTE: For security, the dynamic linker does not allow use of $ORIGIN substitution sequences for set−user and set−group ID programs. For such sequences that appear within strings specified by DT_RUNPATH dynamic array entries, the specific search path containing the $ORIGIN sequence is ignored (though other search paths in the same string are processed). $ORIGIN sequences within a DT_NEEDED entry or path passed as a parameter to dlopen() are treated as errors. The same restrictions may be applied to processes that have more than minimal privileges on systems with installed extended security mechanisms.
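The substitution rule can be illustrated with a small C sketch. The function name expand_origin and its error convention are assumptions made for this example only. Because the text leaves any other name unspecified, the sketch simply rejects it. The caller is assumed to supply an origin directory that is already absolute and free of symbolic links.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* A name continues over letters, digits and underscores. */
static int is_name_char(char c)
{
    return c == '_' || isalnum((unsigned char)c);
}

/* Hypothetical helper: copy src to dst, replacing $ORIGIN and ${ORIGIN}
 * with origin. Returns 0 on success, or -1 if dst is too small or a dollar
 * sign introduces anything other than the name ORIGIN. */
static int expand_origin(const char *src, const char *origin,
                         char *dst, size_t dst_size)
{
    size_t out = 0;
    size_t origin_len;

    assert(src != NULL && origin != NULL && dst != NULL);
    if (dst_size == 0)
        return -1;
    origin_len = strlen(origin);

    while (*src != '\0') {
        if (*src == '$') {
            size_t skip;

            if (strncmp(src, "${ORIGIN}", 9) == 0)
                skip = 9;
            else if (strncmp(src, "$ORIGIN", 7) == 0 && !is_name_char(src[7]))
                skip = 7;   /* longest name is exactly ORIGIN */
            else
                return -1;  /* unspecified by the text; rejected here */
            if (out + origin_len >= dst_size)
                return -1;
            memcpy(dst + out, origin, origin_len);
            out += origin_len;
            src += skip;
        } else {
            if (out + 1 >= dst_size)
                return -1;
            dst[out++] = *src++;
        }
    }
    dst[out] = '\0';
    return 0;
}
```

Note that $ORIGINAL is rejected rather than expanded, because the longest name following the dollar sign is ORIGINAL, not ORIGIN.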

Global offset table

NOTE: This section requires processor−specific information. The System V Application Binary Interface supplement for the desired processor describes the details.

Procedure linkage table

NOTE: This section requires processor−specific information. The System V Application Binary Interface supplement for the desired processor describes the details.

Hash table

A hash table of Elf32_Word objects supports symbol table access. The same table layout is used for both the 32−bit and 64−bit file class. Labels appear below to help explain the hash table organization, but they are not part of the specification.

nbucket

nchain

bucket[0]

. . .

bucket[nbucket−1]

chain[0]

. . .

chain[nchain−1]

Symbol hash table

The bucket array contains nbucket entries, and the chain array contains nchain entries; indexes start at 0. Both bucket and chain hold symbol table indexes. Chain table entries parallel the symbol table. The number of symbol table entries should equal nchain; so symbol table indexes also select chain table entries. A hashing function (shown below) accepts a symbol name and returns a value that may be used to compute a bucket index. Consequently, if the hashing function returns the value x for some name, bucket[x%nbucket] gives an index, y, into both the symbol table and the chain table. If the symbol table entry is not the one desired, chain[y] gives the next symbol table entry with the same hash value. One can follow the chain links until either the selected symbol table entry holds the desired name or the chain entry contains the value STN_UNDEF.

unsigned long
elf_hash(const unsigned char *name)
{
    unsigned long h = 0, g;

    while (*name) {
        h = (h << 4) + *name++;
        if (g = h & 0xf0000000)
            h ^= g >> 24;
        h &= ~g;
    }
    return h;
}

Hashing function
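The lookup procedure described above can be sketched as follows. The struct hash_view type and the hash_build and hash_lookup functions are hypothetical names introduced for illustration; a real hash section is read from the object file rather than built in memory, and symbol names come from the string table rather than from an array of C strings. The hashing function is the one shown above, with the assignment moved out of the if condition.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STN_UNDEF 0

static unsigned long elf_hash(const unsigned char *name)
{
    unsigned long h = 0, g;

    while (*name) {
        h = (h << 4) + *name++;
        g = h & 0xf0000000;
        if (g != 0)
            h ^= g >> 24;
        h &= ~g;
    }
    return h;
}

/* Hypothetical in-memory view of a hash table and its symbol names.
 * names[i] is the name of symbol table entry i; entry 0 is STN_UNDEF. */
struct hash_view {
    size_t nbucket;
    size_t nchain;
    unsigned long *bucket;
    unsigned long *chain;
    const char *const *names;
};

/* Fill bucket[] and chain[] so that each chain links the symbols whose
 * hash values select the same bucket. */
static void hash_build(struct hash_view *t)
{
    size_t i;

    assert(t != NULL && t->nbucket > 0 && t->nchain > 0);
    for (i = 0; i < t->nbucket; i++)
        t->bucket[i] = STN_UNDEF;
    t->chain[0] = STN_UNDEF;
    for (i = 1; i < t->nchain; i++) {
        size_t b = elf_hash((const unsigned char *)t->names[i]) % t->nbucket;

        t->chain[i] = t->bucket[b];   /* previous head becomes next link */
        t->bucket[b] = i;
    }
}

/* Follow bucket[x % nbucket], then chain[y], until the name matches or
 * the chain ends with STN_UNDEF. Returns the symbol table index. */
static unsigned long hash_lookup(const struct hash_view *t, const char *name)
{
    unsigned long x = elf_hash((const unsigned char *)name);
    unsigned long y;

    for (y = t->bucket[x % t->nbucket]; y != STN_UNDEF; y = t->chain[y]) {
        if (strcmp(t->names[y], name) == 0)
            return y;
    }
    return STN_UNDEF;
}
```

Because chain entries parallel the symbol table, the index returned by hash_lookup can be used directly to select the symbol table entry.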

Initialization and termination functions

After the dynamic linker has built the process image and performed the relocations, each shared object and the executable file get the opportunity to execute some initialization functions. All shared object initializations happen before the executable file gains control.


Before the initialization functions for any object A are called, the initialization functions for any other objects that object A depends on are called. For these purposes, an object A depends on another object B, if B appears in A's list of needed objects (recorded in the DT_NEEDED entries of the dynamic structure). The order of initialization for circular dependencies is undefined.

The initialization of objects occurs by recursing through the needed entries of each object. The initialization functions for an object are invoked after the needed entries for that object have been processed. The order of processing among the entries of a particular list of needed objects is unspecified.

NOTE: Each processor supplement may optionally further restrict the algorithm used to determine the order of initialization. Any such restriction, however, may not conflict with the rules described by this specification.

The following example illustrates two of the possible correct orderings which can be generated for the example NEEDED lists. In this example the a.out is dependent on b, d, and e. b is dependent on d and f, while d is dependent on e and g. From this information a dependency graph can be drawn. The above algorithm on initialization will then allow the following specified initialization orderings among others.

Initialization ordering example
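One ordering permitted by these rules can be derived with a depth-first walk of the needed lists, as in the following sketch. The struct object type and both function names are hypothetical, and the walk visits each needed list in the order written, which is only one of the choices the text leaves unspecified. The graph is the one from the example: a.out needs b, d and e; b needs d and f; d needs e and g.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NEEDED 4   /* arbitrary limit chosen for this sketch */

/* Hypothetical in-memory view of an object and its DT_NEEDED list. */
struct object {
    const char *name;
    int needed[MAX_NEEDED];  /* indexes into the object table, -1 ends it */
    int visited;
};

/* Recurse through the needed entries first, then record the object itself,
 * so every object appears after the objects it depends on. */
static void init_in_order(struct object *objs, int idx,
                          const char **order, size_t *count)
{
    int i;

    assert(objs != NULL && order != NULL && count != NULL);
    if (objs[idx].visited)
        return;
    objs[idx].visited = 1;   /* marked early so a cycle cannot recurse forever */
    for (i = 0; i < MAX_NEEDED && objs[idx].needed[i] >= 0; i++)
        init_in_order(objs, objs[idx].needed[i], order, count);
    order[(*count)++] = objs[idx].name;
}

/* Build the example graph and return the number of names stored in order[],
 * which must have room for six entries. */
static size_t example_init_order(const char **order)
{
    struct object objs[] = {
        { "a.out", { 1, 2, 3, -1 }, 0 },
        { "b",     { 2, 4, -1 },    0 },
        { "d",     { 3, 5, -1 },    0 },
        { "e",     { -1 },          0 },
        { "f",     { -1 },          0 },
        { "g",     { -1 },          0 },
    };
    size_t count = 0;

    init_in_order(objs, 0, order, &count);
    return count;
}
```

With this traversal the order is e, g, d, f, b, a.out. Visiting the needed lists in a different order yields other orderings that satisfy the same rules.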

Similarly, shared objects and executable files may have termination functions, which are executed with the atexit(BA_OS) mechanism after the base process begins its termination sequence. The termination functions for any object A must be called before the termination functions for any other objects that object A depends on. For these purposes, an object A depends on another object B, if B appears in A's list of needed objects (recorded in the DT_NEEDED entries of the dynamic structure). The order of termination for circular dependencies is undefined.

Finally, an executable file may have pre−initialization functions. These functions are executed after the dynamic linker has built the process image and performed relocations but before any shared object initialization functions. Pre−initialization functions are not permitted in shared objects.

NOTE: Complete initialization of system libraries may not have occurred when pre−initializations are executed, so some features of the system may not be available to pre−initialization code. In general, use of pre−initialization code can be considered portable only if it has no dependencies on system libraries.

The dynamic linker ensures that it will not execute any initialization, pre−initialization, or termination functions more than once.

Shared objects designate their initialization and termination code in one of two ways. First, they may specify the address of a function to execute via the DT_INIT and DT_FINI entries in the dynamic structure, described in ``Dynamic section''.

Shared objects may also (or instead) specify the address and size of an array of function pointers. Each element of this array is a pointer to a function to be executed by the dynamic linker. Each array element is the size of a pointer in the programming model followed by the object containing the array. The address of the array of initialization function pointers is specified by the DT_INIT_ARRAY entry in the dynamic structure. Similarly, the address of the array of pre−initialization functions is specified by DT_PREINIT_ARRAY and the address of the array of termination functions is specified by DT_FINI_ARRAY. The size of each array is specified by the DT_INIT_ARRAYSZ, DT_PREINIT_ARRAYSZ and DT_FINI_ARRAYSZ entries.

The functions whose addresses are contained in the arrays specified by DT_INIT_ARRAY and by DT_PREINIT_ARRAY are executed by the dynamic linker in the same order in which their addresses appear in the array; those specified by DT_FINI_ARRAY are executed in reverse order.

If an object contains both DT_INIT and DT_INIT_ARRAY entries, the function referenced by the DT_INIT entry is processed before those referenced by the DT_INIT_ARRAY entry for that object. If an object contains both DT_FINI and DT_FINI_ARRAY entries, the functions referenced by the DT_FINI_ARRAY entry are processed before the one referenced by the DT_FINI entry for that object.

NOTE: Although the atexit(BA_OS) termination processing normally will be done, it is not guaranteed to have executed upon process death. In particular, the process will not execute the termination processing if it calls _exit [see exit(BA_OS)] or if the process dies because it received a signal that it neither caught nor ignored.

The processor supplement for each processor specifies whether the dynamic linker is responsible for calling the executable file's initialization function or registering the executable file's termination function with atexit(BA_OS). Termination functions specified by users via the atexit(BA_OS) mechanism must be executed before any termination functions of shared objects.


Floating point operations

On machines which support the IEEE Standard for Binary Floating Point Arithmetic (ANSI/IEEE Standard 754−1985), the C compiler uses the IEEE standard single−precision, double−precision and extended−precision data types, operations, and conversions. Library functions are provided for further IEEE support.

You will probably not need any special functions to use floating point operations in your programs. If you do, however, you can find information about floating point support in this topic.

NOTE: For more information on how the C compilation system supports the IEEE standard see ``IEEE requirements''.

This topic discusses the following:

• the details of IEEE arithmetic
• floating point exception handling
• conversion between binary and decimal values
• single−precision floating point operations
• implicit precision of subexpressions
• IEEE requirements

IEEE arithmetic

This subtopic provides the details of floating point representation, the environment, and exception handling. Most users need not be concerned with the details of the floating point environment.

NOTE: Programs ported from pre−System V Release 4 environments will now proceed using computations with diagnostic values or floating point ``infinities.''

NOTE: The floating point subsystems of the Intel386 microprocessor are based on the Standard for Binary Floating−Point Arithmetic, ANSI/IEEE Standard 754−1985. For more information about this standard, write to IEEE Service Center, 445 Hoes Lane, Piscataway, NJ, 08854, or call (201) 981−0060.

Data types and formats

Single−precision

Single−precision floating point numbers have the following format:

 31     30        23  22                    0
| SIGN |  EXPONENT  |       FRACTION       |
                    ^
               binary point

Field      Position   Full Name
sign       31         sign bit (0==positive, 1==negative)
exponent   30−23      exponent (biased by 127)
fraction   22−0       fraction (bits to right of binary point)
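The single−precision layout can be examined from C by copying the bits of a float into an integer of the same size and masking out each field. This is a minimal sketch that assumes float is the 32−bit IEEE single−precision format described above; struct float_fields and decode_float are hypothetical names used only here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the three fields of a single-precision value. */
struct float_fields {
    unsigned sign;       /* bit 31 */
    unsigned exponent;   /* bits 30-23, still biased by 127 */
    uint32_t fraction;   /* bits 22-0 */
};

static struct float_fields decode_float(float value)
{
    uint32_t bits;
    struct float_fields f;

    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&bits, &value, sizeof bits);   /* copy the bits, no conversion */

    f.sign = (unsigned)(bits >> 31);
    f.exponent = (unsigned)((bits >> 23) & 0xffu);
    f.fraction = bits & 0x7fffffu;
    return f;
}
```

For example, −2.5 is −1.25 × 2^1, so the sign bit is 1, the biased exponent is 1 + 127 = 128, and the fraction field holds 0.25 × 2^23 = 0x200000.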

Double−precision

Double−precision floating point numbers have the following format:

 63     62        52  51                    0
| SIGN |  EXPONENT  |       FRACTION       |
                    ^
               binary point

Field      Position   Full Name
sign       63         sign bit (0==positive, 1==negative)
exponent   62−52      exponent (biased by 1023)
fraction   51−0       fraction (bits to right of binary point)

NOTE: For big−endian machines, the high−order word is at the low address; for little−endian machines, the high−order word is at the high address.

Extended−precision

Extended−precision floating point numbers have the following format on the Intel386 microprocessor:

 79     78        64  63                    0
| SIGN |  EXPONENT  |       FRACTION       |

For extended precision, bit 63 is always a 1, and the binary point comes after this bit. The other formats have an implicit bit 1 before the fraction, as explained in ``Normalized numbers''.

Field      Position   Full Name
sign       79         sign bit (0==positive, 1==negative)
exponent   78−64      exponent (biased by 16,383)
fraction   63−0       fraction (bit 63=1, followed by the binary point; bits 62−0 are the fraction)

Normalized numbers

A number is normalized if the exponent field contains other than all 1's or all 0's.

The exponent field contains a biased exponent, where the bias is 127 in single−precision, 1023 in double−precision, and 16,383 in extended−precision. Thus, the exponent of a normalized floating point number is in the range −126 to 127 inclusive for single−precision, and in the range −1022 to 1023 inclusive for double−precision. For extended−precision the range is −16,382 to 16,383.

There is an implicit bit associated with both single− and double−precision formats. The implicit bit is not explicitly stored anywhere (thus its name). Logically, for normalized operands the implicit bit has a value of 1 and resides immediately to the left of the binary point (in the 2^0 position). Thus the implicit bit and fraction field together can represent values in the range 1 to 2 − 2^(−23) inclusive for single−precision, and in the range 1 to 2 − 2^(−52) inclusive for double−precision. For extended−precision, there is no such bit; therefore the field can represent values in the range 1 to 2 − 2^(−63).

Thus normalized single−precision numbers can be in the range (plus or minus) 2^(−126) to (2 − 2^(−23)) × 2^127 inclusive.

Normalized double−precision numbers can be in the range (plus or minus) 2^(−1022) to (2 − 2^(−52)) × 2^1023 inclusive.

Normalized extended−precision numbers can be in the range (plus or minus) 2^(−16,382) to (2 − 2^(−63)) × 2^16,383 inclusive.

Denormalized numbers

A number is denormalized if the exponent field contains all 0's and the fraction field does not contain all 0's.

Thus denormalized single−precision numbers can be in the range (plus or minus) 2^(−126) × 2^(−23) = 2^(−149) to (1 − 2^(−23)) × 2^(−126) inclusive.

Denormalized double−precision numbers can be in the range (plus or minus) 2^(−1022) × 2^(−52) = 2^(−1074) to (1 − 2^(−52)) × 2^(−1022) inclusive.

Denormalized extended−precision numbers do not have a 1 bit in position 63. Therefore, they can be in the range (plus or minus) 2^(−16,382) × 2^(−63) = 2^(−16,445) to (1 − 2^(−63)) × 2^(−16,382) inclusive.
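The single− and double−precision limits can be checked against the constants in the standard header float.h. The sketch below assumes that float and double are the IEEE single− and double−precision formats; two_to_the and smallest_denormal_float are hypothetical helpers. Powers of two are built by repeated doubling or halving, which is exact in binary floating point within the ranges used here, and the smallest positive denormalized single−precision value is taken from the bit pattern 0x00000001, which equals 2^(−149).

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 2^e as a double by exact doubling/halving. */
static double two_to_the(int e)
{
    double r = 1.0;

    while (e > 0) {
        r *= 2.0;
        e--;
    }
    while (e < 0) {
        r /= 2.0;
        e++;
    }
    return r;
}

/* Hypothetical helper: the float whose exponent field is all 0's and whose
 * fraction field has only its lowest bit set. */
static float smallest_denormal_float(void)
{
    uint32_t bits = 1u;
    float f;

    assert(sizeof f == sizeof bits);
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Under these assumptions, FLT_MIN equals two_to_the(-126), FLT_MAX equals (2.0 - two_to_the(-23)) * two_to_the(127), and the corresponding double−precision identities hold for DBL_MIN and DBL_MAX.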

Both positive and negative zero values exist, but they are treated the same during floating point calculations.


Maximum and minimum representable floating point values

The maximum and minimum representable values in floating point format are defined in the header file values.h.

Special−case values

The following table gives the names of special cases and how each is represented.

Single and Double Precision:

Value Name            Sign   Exponent   Fraction MSB   Rest of Fraction
NaN (non−trapping)    X      Max        1              X
Trapping NaN          X      Max        0              Nonzero
Positive Infinity     0      Max        Min (whole fraction field)
Negative Infinity     1      Max        Min (whole fraction field)
Positive Zero         0      Min        Min (whole fraction field)
Negative Zero         1      Min        Min (whole fraction field)
Denormalized Number   X      Min        Nonzero (whole fraction field)
Normalized Number     X      NotMM      X (whole fraction field)

Key:

X        does not matter
Max      maximum value that can be stored in the field (all 1's)
Min      minimum value that can be stored in the field (all 0's)
NaN      not a number
NotMM    field is not equal to either Min or Max values
Nonzero  field contains at least one ``1'' bit
MSB      Most Significant Bit


Double−extended:

Value Name            Sign   Exponent   Fraction MSB   Rest of Fraction
NaN (non−trapping)    X      Max        1              Nonzero
Trapping NaN          X      Max        0              Nonzero
Positive Infinity     0      Max        1              Min
Negative Infinity     1      Max        1              Min
Positive Zero         0      Min        Min (whole fraction field)
Negative Zero         1      Min        Min (whole fraction field)
Denormalized Number   X      Min        0              Nonzero
Normalized Number     X      NotMM      1              X

The algorithm for classification of a value into special cases follows:

If (Exponent==Max)
    If (Fraction==Min)
        Then the number is Infinity (Positive or Negative as determined by the Sign bit).
    Else the number is NaN (Trapping if FractionMSB==0, non−Trapping if FractionMSB==1).
Else If (Exponent==Min)
    If (Fraction==Min)
        Then the number is Zero (Positive or Negative as determined by the Sign bit).
    Else the number is Denormalized.
Else the number is Normalized.
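The classification algorithm translates directly into C for the single−precision format. The enumeration and function names below are hypothetical and are not part of any system header. The main function takes the raw 32 bits rather than a float so that a trapping NaN pattern can be examined without being loaded as a floating point operand.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result type for this sketch. */
enum fp_kind {
    FP_KIND_INFINITY,
    FP_KIND_QUIET_NAN,
    FP_KIND_TRAPPING_NAN,
    FP_KIND_ZERO,
    FP_KIND_DENORMALIZED,
    FP_KIND_NORMALIZED
};

/* Classify a single-precision bit pattern: Max exponent is all 1's (0xff),
 * Min is all 0's, and the fraction MSB is bit 22. The sign bit is ignored. */
static enum fp_kind classify_bits(uint32_t bits)
{
    uint32_t exponent = (bits >> 23) & 0xffu;
    uint32_t fraction = bits & 0x7fffffu;

    if (exponent == 0xffu) {
        if (fraction == 0)
            return FP_KIND_INFINITY;
        return (fraction & 0x400000u) ? FP_KIND_QUIET_NAN
                                      : FP_KIND_TRAPPING_NAN;
    }
    if (exponent == 0)
        return fraction == 0 ? FP_KIND_ZERO : FP_KIND_DENORMALIZED;
    return FP_KIND_NORMALIZED;
}

/* Convenience wrapper, assuming float is IEEE single-precision. */
static enum fp_kind classify_float(float value)
{
    uint32_t bits;

    assert(sizeof value == sizeof bits);
    memcpy(&bits, &value, sizeof bits);
    return classify_bits(bits);
}
```

The double−precision case follows the same pattern with an 11−bit exponent and a 52−bit fraction.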

NaNs and infinities

The floating point system supports two special representations:

Infinity − Positive infinity in a format compares greater than all other representable numbers in the same format. Arithmetic operations on infinities are quite intuitive. For example, adding any representable number to infinity is a valid operation the result of which is positive infinity. Subtracting positive infinity from itself is invalid. If some arithmetic operation overflows, and the overflow trap is disabled, in some rounding modes the result is infinity.

Not−a−Number (NaN) − These floating point representations are not numbers. They can be used to carry diagnostic information. There are two kinds of NaNs: signaling NaNs and quiet NaNs. Signaling NaNs raise the invalid operation exception whenever they are used as operands in floating point operations. Quiet NaNs propagate through most operations without raising any exception. The result of these operations is the same quiet NaN. NaNs are sometimes produced by the arithmetic operations themselves. For example, 0.0 divided by 0.0, when the invalid operation trap is disabled, produces a quiet NaN.
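The behavior of infinities and quiet NaNs can be observed with a few lines of C. This sketch assumes an IEEE implementation with the divide by zero and invalid operation traps disabled, which is the usual default. The helper names are hypothetical, and the volatile qualifiers are there only to keep the compiler from evaluating the divisions at compile time.

```c
#include <assert.h>

/* Hypothetical helper: 1.0 / 0.0 yields positive infinity when the divide
 * by zero trap is disabled. */
static double make_infinity(void)
{
    volatile double zero = 0.0;

    assert(zero == 0.0);
    return 1.0 / zero;
}

/* Hypothetical helper: 0.0 / 0.0 yields a quiet NaN when the invalid
 * operation trap is disabled. */
static double make_nan(void)
{
    volatile double zero = 0.0;

    return zero / zero;
}

/* A NaN is the only value that compares unequal to itself. */
static int is_nan(double x)
{
    volatile double v = x;

    return v != v;
}
```

With these helpers, make_infinity() + 1.0 is still infinity, make_infinity() − make_infinity() is a NaN, and a NaN passed through an addition remains a NaN.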


The header file ieeefp.h defines the interface for the floating point exception and environment control. This header defines three interfaces:

• Rounding Control
• Exception Control
• Exception Handling

Rounding control

The floating point arithmetic provides four rounding modes that affect the result of most floating point operations. (These modes are defined in the header ieeefp.h):

FP_RN   round to nearest representable number, tie −> even
FP_RP   round toward plus infinity
FP_RM   round toward minus infinity
FP_RZ   round toward zero (truncate)

You can check the current rounding mode with the function

fp_rnd fpgetround(void); /* return current rounding mode */

You can change the rounding mode for floating point operations with the function:

fp_rnd fpsetround(fp_rnd); /* set rounding mode, */ /* return previous */

(fp_rnd is an enumeration type with the enumeration constants listed and described above. The values for these constants are in ieeefp.h.)

NOTE: These examples, such as the one directly above, illustrate function prototypes. For further information on function prototypes, see ``Function definitions'' in ``C language compilers''.

The default rounding mode is round−to−nearest. In C and FORTRAN (F77), floating point to integer conversions are always done by truncation, and the current rounding mode has no effect on these operations.

For more information on fpgetround and fpsetround, see fpgetround(3C).

Exceptions, sticky bits, and trap bits

Floating point operations can lead to any of the following types of floating point exceptions:

Divide by zero exception

This exception happens when a non−zero number is divided by floating point zero.

Invalid operation exception

All operations on signaling NaNs raise an invalid operation exception. Zero divided by zero, infinity subtracted from infinity, and infinity divided by infinity all raise this exception. When a quiet NaN is compared with the greater or lesser relational operators, an invalid operation exception is raised.

Overflow exception

This exception occurs when the result of any floating point operation is too large in magnitude to fit in the intended destination.

Underflow exception

When the underflow trap is enabled, an underflow exception is signaled when the result of some operation is a very tiny non−zero number that may cause some other exception later (such as overflow upon division). When the underflow trap is disabled, an underflow exception occurs only when both the result is very tiny (as explained above) and a loss of accuracy is detected.

Inexact or imprecise exception

This exception is signaled if the rounded result of an operation is not identical to the infinitely precise result. Inexact exceptions are quite common. 1.0 / 3.0 is an inexact operation. Inexact exceptions also occur when the operation overflows without an overflow trap.

NOTE: The above examples for the exception types do not constitute an exhaustive list of the conditions when an exception can occur.

NOTE: The floating point implementation on the Intel processors includes another exception type called ``Denormalization exception.'' This exception occurs when the result of an expression is a denormalized number.

Single−precision floating point operations

The ANSI standard for C has a provision that allows expressions to be evaluated in single−precision arithmetic if there is no double (or long double) operand in the expression. The C compiler supports this provision.

Floating point constants are double−precision, unless explicitly stated to be float. For example, in the statements

float a, b;
...
a = b + 1.0;

because the constant 1.0 has type double, b is promoted to double before the addition and the result is converted back to float. However, the constant can be made explicitly a float:

a = b + 1.0f;
/* or */
a = b + (float) 1.0;

In this case, the statement can potentially be compiled to a single instruction. Single−precision operations tend to be faster than double−precision operations.

Whether a computation can be done in single−precision is decided based on the operands of each operator. Consider the following:

float s;
double d;
d = d + s * s;

s * s is computed to produce a single−precision result, which is promoted to double−precision and added to ``d''. Note that using single−precision rather than double−precision arithmetic can result in loss of precision, as illustrated in the following example.

float f = 8191.f * 8191.f;   /* evaluate as a float */
double d = 8191. * 8191.;    /* evaluate as a double */
printf("As float: %f\nAs double: %f\n", f, d);

The result is:

As float: 67092480.000000
As double: 67092481.000000

Also, long int variables (same as int) have more precision than float variables. Consider the following example:

int i, j;
i = 0x7fffffff;
j = i * 1.0;
printf("j = %x\n", j);
j = i * 1.0f;
printf("j = %x\n", j);

The first printf statement outputs 7fffffff, while the second prints 0. The second printf prints 0 because the nearest float to 0x7fffffff has a value of 0x80000000. When the value is converted to an integer, the result is 0, and a floating point imprecise result exception occurs. A trap occurs if this exception was enabled.

A function that is declared to return a float may actually return either a float or a double. If the function declaration is a prototype declaration in which at least one of the parameters is float, the function returns a float. Otherwise, it returns a double with precision limited to that of a float. (All of this is transparent.) For example:

float retflt(float);   /* actually returns a float */
float retdbl1();       /* actually returns a double */
float retdbl2(int);    /* actually returns a double */

Arguments work as follows:

double takeflt(float x); /* takes a float */

double takedbl(x) float x; /* takes a double */

Double−extended−precision

On certain implementations, the C compiler produces code that uses IEEE double−extended−precision arithmetic. On implementations that do not produce IEEE double−extended−precision arithmetic, either for intermediate or final results, all results are computed with the precision implicit in their type.

ANSI C includes a new data type called long double, which maps to the IEEE extended−precision format. Extended−precision is a wider type than double. Doubles on the Intel386 microprocessor are 64 bits, long doubles are 80 bits. All arithmetic operations (+,−,*,/) work analogously. However, ANSI C does not require a long double to be wider than a double.


The C compilation system handles long double in different fashions, dependent on the implementation. For example, on Intel processors, complete support for long double is available. When you use the -Xc option to the cc command and the implementation does not produce double−extended−precision arithmetic code, the compiler treats a long double as computationally equivalent to a double. When you use the -Xt or -Xa options under these conditions, the compiler treats a long double as an error.

NOTE: Because of compatibility constraints, we recommend that you do not use long double on implementations that do not support double−extended−precision arithmetic.

IEEE requirements

All arithmetic computations generated by the C compiler strictly conform to IEEE requirements. The following is a discussion of some topics where the C compilation system falls short of completely meeting the ANSI/IEEE Standard 754−1985 requirements or the spirit of the requirements.

Conversion of floating point formats to integer

IEEE requires floating point to integer format conversions to be affected by the current rounding mode. However, the C language requires these conversions to be done by truncation (which is the same as round−to−zero). In the C compilation system floating point to integer conversions are done by truncation.
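Truncation means that the fractional part is discarded regardless of its size, so the result moves toward zero for both positive and negative inputs. The one−line function below is a trivial sketch of that rule; to_int_truncating is a hypothetical name, and the input is assumed to be within the range of int.

```c
#include <assert.h>
#include <limits.h>

/* A C cast from floating point to integer discards the fractional part. */
static int to_int_truncating(double x)
{
    assert(x > (double)INT_MIN && x < (double)INT_MAX);
    return (int)x;
}
```

Under round−to−nearest, 2.7 would become 3 and −2.7 would become −3. The cast instead yields 2 and −2.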

Conversion of floating point numbers to integers should signal integer overflow or invalid operation for an overflow condition. In the current implementation the integer overflow flag is set, but there is no way to enable the overflow trap. Enabling the integer overflow trap would result in a substantial performance penalty due to stalled pipeline effects.

The C compilation system provides the rint function for IEEE−style conversion from floating point to integers. For information on the rint function, see floor(3M).

Square root

IEEE requires the square root of a negative non−zero number to raise invalid operation, whereas UNIX® operating system compatibility requires square root to return 0.0 with errno set to EDOM. When you use the -Xt option to the cc command, the sqrt routine in the C compilation system returns 0.0 for negative non−zero inputs. Otherwise, the -Xt option operation conforms to IEEE requirements. When you use the -Xa or -Xc option, the square root of a negative non−zero number raises invalid operation and returns a NaN, in strict conformance with the IEEE standard.

Compares and unordered condition

In addition to the usual relationships between floating point values (less than, equal, greater than), there is a fourth relationship: unordered. The unordered case arises when at least one operand is a NaN. Every NaN compares unordered with any value, including itself.

The C compilation system provides the following predicates required by IEEE between floating point operands:

==    !=    <    >    <=    >=

While there is no predicate to test for unordered, you can use isnand or isnanf to test whether an argument is a NaN. For information on isnand and isnanf, see isnan(3C).

The relations >, >=, <, and <= raise invalid operation for unordered operands. The compiler-generated code does not guard against the unordered outcome of a comparison. If the trap is masked, the path taken for unordered conditions is the same as if the conditional were true, which may result in incorrect behavior.

For the predicates == and !=, an unordered condition does not lead to invalid operation. The path taken for an unordered condition is the same as when the operands are not equal, which is correct.

(a > b) is not the same as ( !(a <= b) ) in IEEE floating point arithmetic. The difference occurs when a or b is a NaN, so that the operands compare unordered. The C compiler generates the same code for both cases.
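The difference between the two forms can be sketched in a few lines of C. This is a minimal illustration that assumes an implementation following IEEE comparison semantics; because the compiler described here generates the same code for both forms, results on it may differ. The function names are hypothetical, and isnan is the C99 macro from <math.h>, used here in place of isnand.

```c
#include <math.h>

/* Plain relational test: yields 0 when either operand is a NaN. */
int greater(double a, double b)
{
    return a > b;
}

/* Negated test: yields 1 when either operand is a NaN under IEEE rules. */
int not_less_equal(double a, double b)
{
    return !(a <= b);
}

/* Guarded test: reports the unordered case instead of comparing. */
int greater_checked(double a, double b, int *unordered)
{
    *unordered = isnan(a) || isnan(b);
    return *unordered ? 0 : a > b;
}
```

Checking for a NaN before the comparison, as in greater_checked, keeps the result independent of how the compiler treats the unordered outcome.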


Analyzing your code with lint

lint checks for code constructs that may cause your C program not to compile, or to execute with unexpected results. lint issues every error and warning message produced by the C compiler. It also issues ``lint-specific'' warnings about potential bugs and portability problems.

Why lint is an important tool

lint compensates for separate and independent compilation of files in C by flagging inconsistencies in definition and use across files, including any libraries you have used. In a large project environment especially, where the same function may be used by different programmers in hundreds of separate modules of code, lint can help discover bugs that otherwise might be difficult to find. A function called with one less argument than expected, for example, looks at the stack for a value the call has never pushed, with results correct in one condition, incorrect in another, depending on whatever happens to be in memory at that stack location. By identifying dependencies like this one, and dependencies on machine architecture as well, lint can improve the reliability of code run on your machine or someone else's.

NOTE: lint is intended only for use with C programs; it will not work for C++ programs. In C++ the compiler checks for many of the kinds of constructs that lint flags.

Options and directives

lint is a static analyzer, which means that it cannot evaluate the run-time consequences of the dependencies it detects. Certain programs may contain hundreds of unreachable break statements, and lint will give a warning for each of them. The number of lint messages issued can be distracting. lint, however, provides command line options and directives to help suppress warnings you consider to be spurious.

NOTE: Directives are special comments embedded in the source text.

For the example we've cited here,

• you can invoke lint with the -b option to suppress all complaints about unreachable break statements;

• for finer-grained control, you can precede any unreachable statement with the comment /* NOTREACHED */ to suppress the diagnostic for that statement.

The ``Usage'' section details options and directives and introduces the lint filter technique, which lets you tailor lint's behavior even more finely to your project's needs. It also shows you how to use lint libraries to check your program for compatibility with the library functions you have called in it.

lint and the compiler

Nearly five hundred diagnostic messages are issued by lint. However, this list only contains those lint-specific warnings that are not issued by the compiler. Also listed are diagnostics issued both by lint and the compiler that are capable of being suppressed only by lint options. For the text and examples of all messages issued exclusively by lint or subject exclusively to its options, refer to the ``lint-specific messages''. For the messages also issued by the compiler, consult the ``Compiler diagnostics''.


Message formats

Most of lint's messages are simple, one-line statements printed for each occurrence of the problem they diagnose. Errors detected in included files are reported multiple times by the compiler but only once by lint, no matter how many times the file is included in other source files. Compound messages are issued for inconsistencies across files and, in a few cases, for problems within them as well. A single message describes every occurrence of the problem in the file or files being checked. When use of a lint filter requires that a message be printed for each occurrence, compound diagnostics can be converted to the simple type by invoking lint with the -s option.

NOTE: See ``Usage'' for more information.

What lint does

lint-specific diagnostics are issued for three broad categories of conditions: inconsistent use, nonportable code, and suspicious constructs. In this section, we'll review examples of lint's behavior in each of these areas, and suggest possible responses to the issues they raise.

Consistency checks

Inconsistent use of variables, arguments, and functions is checked within files as well as across them. Generally speaking, the same checks are performed for prototype uses, declarations, and parameters as for old-style functions. (If your program does not use function prototypes, lint will check the number and types of parameters in each call to a function more strictly than the compiler.) lint also identifies mismatches of conversion specifications and arguments in [fs]printf and [fs]scanf control strings. Examples:

Within files, lint flags nonvoid functions that ``fall off the bottom'' without returning a value to the invoking function. In the past, programmers often indicated that a function was not meant to return a value by omitting the return type: fun() {}. That convention means nothing to the compiler, which regards fun as having the return type int. Declare the function with the return type void to eliminate the problem.

Across files, lint detects cases where a nonvoid function does not return a value, yet is used for its value in an expression, and the opposite problem, a function returning a value that is sometimes or always ignored in subsequent calls. When the value is always ignored, it may indicate an inefficiency in the function definition. When it is sometimes ignored, it's probably bad style (typically, not testing for error conditions). If you do not need to check the return values of string functions like strcat, strcpy, and sprintf, or output functions like printf and putchar, cast the offending call(s) to void.

lint identifies variables or functions that are declared but not used or defined; used but not defined; or defined but not used. That means that when lint is applied to some, but not all files of a collection to be loaded together, it will complain about functions and variables declared in those files but defined or used elsewhere; used there but defined elsewhere; or defined there and used elsewhere. Invoke the -x option to suppress the former complaint, -u to suppress the latter two.
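The first two items can be sketched as follows. This is a minimal illustration, not output from lint: the function names and the buffer handling are hypothetical. It shows a function declared void because it returns nothing, calls to strcpy and strcat cast to void because their results are deliberately ignored, and a return value that is tested because it reports an error.

```c
#include <stdio.h>
#include <string.h>

/* Declared void: the function is not meant to return a value. */
void make_greeting(char *buf, size_t size, const char *name)
{
    if (size == 0)
        return;
    buf[0] = '\0';
    if (strlen("hello, ") + strlen(name) + 1 > size)
        return;                 /* not enough room: leave buf empty */
    /* The pointers returned by strcpy and strcat are deliberately discarded. */
    (void) strcpy(buf, "hello, ");
    (void) strcat(buf, name);
}

/* A result that reports an error condition is tested, not ignored. */
int write_greeting(FILE *fp, const char *name)
{
    char buf[64];

    make_greeting(buf, sizeof buf, name);
    if (fputs(buf, fp) == EOF)
        return -1;
    return 0;
}
```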

Portability checks

Some nonportable code is flagged by lint in its default behavior, and a few more cases are diagnosed when lint is invoked with -p and/or -Xc. The latter tells lint to check for constructs that do not conform to the ANSI C standard. For the messages issued under -p and -Xc, check the ``Usage'' section below. Examples:

In some C language implementations, character variables that are not explicitly declared signed or unsigned are treated as signed quantities with a range typically from -128 to 127. In other implementations, they are treated as nonnegative quantities with a range typically from 0 to 255. So the test

    char c;

    c = getchar();
    if (c == EOF) . . .

where EOF has the value -1, will always fail on machines where character variables take on nonnegative values. One of lint's -p checks will flag any comparison that implies a ``plain'' char may have a negative value. Note, however, that declaring c a signed char in the above example eliminates the diagnostic, not the problem. That's because getchar must return all possible characters and a distinct EOF value, so a char cannot store its value. This example, which is perhaps the most common one arising from implementation-defined sign-extension, shows how a thoughtful application of lint's portability option can help you discover bugs not related to portability. In any case, declare c as an int.

A similar issue arises with bit-fields. When constant values are assigned to bit-fields, the field may be too small to hold the value. On a machine that treats bit-fields of type int as unsigned quantities, the values allowed for int x:3 range from 0 to 7, whereas on machines that treat them as signed quantities they range from -4 to 3. However unintuitive it may seem, a three-bit field declared type int cannot hold the value 4 on the latter machines. lint invoked with -p flags all bit-field types other than unsigned int or signed int. Note that these are the only portable bit-field types. The compilation system supports int, char, short, and long bit-field types that may be unsigned, signed, or ``plain.'' It also supports the enum bit-field type.

Bugs can arise when a larger-sized type is assigned to a smaller-sized type. If significant bits are truncated, accuracy is lost:

    short s;
    long l;
    s = l;

lint flags all such assignments by default; the diagnostic can be suppressed by invoking the -a option. Bear in mind that you may be suppressing other diagnostics when you invoke lint with this or any other option. Check the list in the ``Usage'' section below for the options that suppress more than one diagnostic.

A cast of a pointer to one object type to a pointer to an object type with stricter alignment requirements may not be portable. lint flags

    int *fun(y)
    char *y;
    {
        return (int *) y;
    }

because, on most machines, an int cannot start on an arbitrary byte boundary, whereas a char can. If you suppress the diagnostic by invoking lint with -h, you may be disabling other messages. You can eliminate the problem by using the generic pointer void *.

ANSI C leaves the order of evaluation of complicated expressions undefined. What this means is that when function calls, nested assignment statements, or the increment and decrement operators cause side effects -- when a variable is changed as a byproduct of the evaluation of an expression -- the order in which the side effects take place is highly machine dependent. By default, lint flags any variable changed by a side effect and used elsewhere in the same expression:

    int a[10];
    main()
    {
        int i = 1;
        a[i++] = i;
    }

Note that in this example the value of a[1] may be 1 if one compiler is used, 2 if another. The bitwise logical operator & can also give rise to this diagnostic when it is mistakenly used in place of the logical operator &&:

if ((c = getchar()) != EOF & c != '0')
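The getchar example earlier in this list can be made concrete. The sketch below is an illustration only: the function names are hypothetical, EOF is assumed to be -1 as in the text, and the signed char case relies on the usual (implementation-defined) conversion of 0xFF to -1. It shows why neither kind of char can hold the result of getchar, and the idiom that stores it in an int.

```c
#include <stdio.h>

/* Correct: an int holds every character value and a distinct EOF. */
int is_eof_int(int result)
{
    int c = result;
    return c == EOF;
}

/* Broken with signed storage: the byte 0xFF typically becomes -1
   (an implementation-defined conversion) and is mistaken for EOF. */
int is_eof_signed_char(int result)
{
    signed char c = (signed char) result;
    return c == EOF;
}

/* Broken with unsigned storage: a real EOF is never recognized. */
int is_eof_unsigned_char(int result)
{
    unsigned char c = (unsigned char) result;
    return c == EOF;
}

/* The usual idiom: read into an int and compare with EOF. */
long count_bytes(FILE *fp)
{
    long n = 0;
    int c;

    while ((c = getc(fp)) != EOF)
        n++;
    return n;
}
```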

Suspicious constructs

lint flags a number of valid constructs that may not represent what the programmer intended. Examples:

An unsigned variable always has a nonnegative value. So the test

    unsigned x;
    if (x < 0) . . .

will always fail. Whereas the test

    unsigned x;
    if (x > 0) . . .

is equivalent to

    if (x != 0) . . .

which may not be the intended action. lint flags suspicious comparisons of unsigned variables with negative constants or 0. To compare an unsigned variable to the bit pattern of a negative number, cast the negative number to unsigned:

    if (u == (unsigned) -1) . . .

Or use the U suffix:

    if (u == -1U) . . .

lint flags expressions without side effects that are used in a context where side effects are expected, where the expression may not represent what the programmer intended. It issues an additional warning whenever the equality operator is found where the assignment operator was expected, in other words, where a side effect was expected:

    int fun()
    {
        int a, b, x, y;
        (a = x) && (b == y);
    }


lint cautions you to parenthesize expressions that mix both the logical and bitwise operators (specifically, &, |, ^, <<, >>), where misunderstanding of operator precedence may lead to incorrect results. Because the precedence of bitwise &, for example, falls below logical ==, the expression

    if (x & a == 0) . . .

will be evaluated as

    if (x & (a == 0)) . . .

which is most likely not what you intended. Invoking lint with -h disables the diagnostic.
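Two of the constructs in this list can be checked with a short sketch. The function names are hypothetical. The first function spells out how x & a == 0 is parsed, the second shows the parenthesized form that was probably meant, and the third compares an unsigned value with the bit pattern of -1 using the U suffix.

```c
#include <limits.h>

/* Written the way the compiler parses "x & a == 0":
   the comparison binds first, then the bitwise AND. */
int mask_test_as_parsed(unsigned x, unsigned a)
{
    return (int) (x & (unsigned) (a == 0));
}

/* Parenthesized to express what was most likely intended. */
int mask_test_intended(unsigned x, unsigned a)
{
    return (x & a) == 0;
}

/* Compare an unsigned value with the bit pattern of -1. */
int is_all_ones(unsigned u)
{
    return u == -1U;
}
```

With x equal to 4 and a equal to 3, the intended test is true (the two values share no bits) while the expression as parsed yields 0, which is the kind of silent disagreement the diagnostic is meant to expose.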

Usage

You invoke lint with a command of the form

$ lint file.c file.c

lint examines code in two passes. In the first, it checks for error conditions local to C source files, in the second for inconsistencies across them. This process is invisible to the user unless lint is invoked with -c:

$ lint -c file1.c file2.c

That command directs lint to execute the first pass only and collect information relevant to the second -- about inconsistencies in definition and use across file1.c and file2.c -- in intermediate files named file1.ln and file2.ln:

$ ls -1
file1.c
file1.ln
file2.c
file2.ln

In this way, the -c option to lint is analogous to the -c option to cc, which suppresses the link editing phase of compilation. Generally speaking, lint's command line syntax closely follows cc's.

When the .ln files are linted

$ lint file1.ln file2.ln

the second pass is executed. lint processes any number of .c or .ln files in their command line order. So

$ lint file1.ln file2.ln file3.c

directs lint to check file3.c for errors internal to it and all three files for consistency.

lint searches directories for included header files in the same order as cc.

NOTE: For further information, see ``Searching for a header file'' in ``C and C++ compilation system''.


Use the -I option to lint as you would the -I option to cc. If you want lint to check an included header file that is stored in a directory other than your current directory or the standard place, specify the path of the directory with -I as follows:

$ lint -Idir file1.c file2.c

You can specify -I more than once on the lint command line. Directories are searched in the order they appear on the command line. Of course, you can specify multiple options to lint on the same command line. Options may be concatenated unless one of the options takes an argument:

$ lint -cp -Idir -Idir file1.c file2.c

That command directs lint to

• execute the first pass only;
• perform additional portability checks;
• search the specified directories for included header files.

lint libraries

You can use lint libraries to check your program for compatibility with the library functions you have called in it: the declaration of the function return type, the number and types of arguments the function expects, and so on. The standard lint libraries correspond to libraries supplied by the C compilation system, and generally are stored in the standard place on your system, the directory /usr/ccs/lib. By convention, lint libraries have names of the form llib-lx.ln.

The lint standard C library, llib-lc.ln, is appended to the lint command line by default; checks for compatibility with it can be suppressed by invoking the -n option. Other lint libraries are accessed as arguments to -l.

$ lint -lx file1.c file2.c

directs lint to check the usage of functions and variables in file1.c and file2.c for compatibility with the lint library llib-lx.ln. The library file, which consists only of definitions, is processed exactly as are ordinary source files and ordinary .ln files, except that functions and variables used inconsistently in the library file, or defined in the library file but not used in the source files, elicit no complaints.

To create your own lint library, insert the directive /* LINTLIBRARY */ at the head of a C source file, then invoke lint for that file with the -o option and the library name that will be given to -l:

$ lint -ox files headed by /* LINTLIBRARY */

causes only definitions in the source files headed by /* LINTLIBRARY */ to be written to the file llib-lx.ln. (Note the analogy of lint -o to cc -o.) A library can be created from a file of function prototype declarations in the same way, except that both /* LINTLIBRARY */ and /* PROTOLIBn */ must be inserted at the head of the declarations file. If n is 1, prototype declarations will be written to a library .ln file just as are old-style definitions. If n is 0, the default, the process is canceled. Invoking lint with -y is another way of creating a lint library:

$ lint -y -ox file1.c file2.c


causes each source file named on the command line to be treated as if it began with /* LINTLIBRARY */ and only its definitions to be written to llib-lx.ln.

By default, lint searches for lint libraries in the standard place. To direct lint to search for a lint library in a directory other than the standard place, specify the path of the directory with the -L option:

$ lint -Ldir -lx file1.c file2.c

The specified directory is searched before the standard place.

lint filters

A lint filter is a project-specific post-processor that typically uses an awk script or similar program to read the output of lint and discard messages that your project has decided do not identify real problems -- string functions, for instance, returning values that are sometimes or always ignored. It enables you to generate customized diagnostic reports when lint options and directives do not provide sufficient control over output.

Two options to lint are particularly useful in developing a filter. Invoking lint with -s causes compound diagnostics to be converted into simple, one-line messages issued for each occurrence of the problem diagnosed. The easily parsed message format is suitable for analysis by an awk script.

Invoking lint with -k causes certain comments you have written in the source file to be printed in output, and can be useful both in documenting project decisions and specifying the post-processor's behavior. In the latter instance, if the comment identified an expected lint message, and the reported message was the same, the message might be filtered out. To use -k, insert on the line preceding the code you want to comment the /* LINTED [msg] */ directive, where msg refers to the comment to be printed when lint is invoked with -k. (Refer to the list of directives below for what lint does when -k is not invoked for a file containing /* LINTED [msg] */.)
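The text describes filters written in awk; the same idea can be sketched in C, the language used for the other examples here. This is a hedged illustration only: the list of ignored messages is a hypothetical project policy, the function names are invented, and the sketch assumes that each line produced by lint -s contains the message text as a substring.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical project policy: message texts the project chose to ignore. */
static const char *const ignored[] = {
    "declared global, could be static",
    "argument unused in function",
};

/* Return 1 if the line should stay in the report, 0 if it is filtered out. */
int keep_message(const char *line)
{
    size_t i;

    for (i = 0; i < sizeof ignored / sizeof ignored[0]; i++)
        if (strstr(line, ignored[i]) != NULL)
            return 0;
    return 1;
}

/* Copy lint output from in to out, dropping ignored messages.
   Returns the number of lines kept; a line longer than the buffer
   is handled in pieces. */
long filter_stream(FILE *in, FILE *out)
{
    char line[1024];
    long kept = 0;

    while (fgets(line, sizeof line, in) != NULL)
        if (keep_message(line)) {
            (void) fputs(line, out);
            kept++;
        }
    return kept;
}
```

A small main that calls filter_stream(stdin, stdout) would let the filter sit at the end of a pipeline fed by lint -s.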

Options and directives listed

These options suppress specific messages:

-a
    Suppress:
      ◊ assignment causes implicit narrowing conversion
      ◊ conversion to larger integral type may sign-extend incorrectly

-b
    For unreachable break and empty statements, suppress:
      ◊ statement not reached

-h
    Suppress:
      ◊ assignment operator "=" found where equality operator "==" was expected
      ◊ constant operand to op: "!"
      ◊ fallthrough on case statement
      ◊ pointer cast may result in improper alignment
      ◊ precedence confusion possible; parenthesize
      ◊ statement has no consequent: if
      ◊ statement has no consequent: else

-m
    Suppress:
      ◊ declared global, could be static

-u
    Suppress:
      ◊ name defined but never used
      ◊ name used but not defined

-v
    Suppress:
      ◊ argument unused in function

-x
    Suppress:
      ◊ name declared but never used or defined

These options enable specific messages:

-p
    Enable:
      ◊ conversion to larger integral type may sign-extend incorrectly
      ◊ may be indistinguishable due to truncation or case
      ◊ pointer casts may be troublesome
      ◊ nonportable bit-field type
      ◊ suspicious comparison of char with value: op "op"

-Xc
    Enable:
      ◊ bitwise operation on signed value nonportable
      ◊ function must return int: main()
      ◊ may be indistinguishable due to truncation or case
      ◊ only 0 or 2 parameters allowed: main()
      ◊ nonportable character constant

Other options:

-c
    Create a .ln file consisting of information relevant to lint's second pass for every .c file named on the command line. The second pass is not executed.

-F
    When referring to the .c files named on the command line, print their path names as supplied on the command line rather than only their base names.

-Idir
    Search the directory dir for included header files.

-k
    When used with the directive /* LINTED [msg] */, print info: msg.

-lx
    Access the lint library llib-lx.ln.

-Ldir
    When used with -l, search for a lint library in the directory dir.

-n
    Suppress checks for compatibility with the default lint standard C library.

-ox
    Create the file llib-lx.ln, consisting of information relevant to lint's second pass, from the .c files named on the command line. Generally used with -y or /* LINTLIBRARY */ to create lint libraries.

-s
    Convert compound messages into simple ones.

-y
    Treat every .c file named on the command line as if it began with the directive /* LINTLIBRARY */.

-V
    Write the product name and release to standard error.

Directives:

/* ARGSUSEDn */
    Suppress:
      ◊ argument unused in function
    for every argument but the first n in the function definition it precedes. Default is 0.

/* CONSTCOND */
    Suppress:
      ◊ constant in conditional context
      ◊ constant operand to op: "!"
      ◊ logical expression always false: op "&&"
      ◊ logical expression always true: op "||"
    for the constructs it precedes. Also /* CONSTANTCONDITION */.

/* EMPTY */
    Suppress:
      ◊ statement has no consequent: else
    when inserted between the else and semicolon;
      ◊ statement has no consequent: if
    when inserted between the controlling expression of the if and semicolon.

/* FALLTHRU */
    Suppress:
      ◊ fallthrough on case statement
    for the case statement it precedes. Also /* FALLTHROUGH */.

/* LINTED [msg] */
    When -k is not invoked, suppress every warning pertaining to an intrafile problem except:
      ◊ argument unused in function
      ◊ declaration unused in block
      ◊ set but not used in function
      ◊ static unused
      ◊ variable unused in function
    for the line of code it precedes. msg is ignored.

/* LINTLIBRARY */
    When -o is invoked, write to a library .ln file only definitions in the .c file it heads.

/* NOTREACHED */
    Suppress:
      ◊ statement not reached
    for the unreached statements it precedes;
      ◊ fallthrough on case statement
    for the case it precedes that cannot be reached from the preceding case;
      ◊ function falls off bottom without returning value
    for the closing curly brace it precedes at the end of the function.

/* PRINTFLIKEn */
    Treat the nth argument of the function definition it precedes as a [fs]printf format string and issue:
      ◊ malformed format string
    for invalid conversion specifications in that argument, and
      ◊ function argument type inconsistent with format
      ◊ too few arguments for format
      ◊ too many arguments for format
    for mismatches between the remaining arguments and the conversion specifications. lint issues these warnings by default for errors in calls to [fs]printf functions provided by the standard C library.

/* PROTOLIBn */
    When n is 1 and /* LINTLIBRARY */ is used, write to a library .ln file only function prototype declarations in the .c file it heads. Default is 0, canceling the process.

/* SCANFLIKEn */
    Same as /* PRINTFLIKEn */ except that the nth argument of the function definition is treated as a [fs]scanf format string. By default, lint issues warnings for errors in calls to [fs]scanf functions provided by the standard C library.

/* VARARGSn */
    For the function whose definition it precedes, suppress:
      ◊ function called with variable number of arguments
    for calls to the function with n or more arguments.
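Because directives are ordinary comments, they have no effect on compilation and can be placed in any C source file. The sketch below shows where four of the directives listed above would sit. The functions themselves are hypothetical, and vsnprintf is a C99 library routine assumed to be available; it is not part of the compilation system described here.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* The third argument is a printf-style format string. */
/* PRINTFLIKE3 */
int format_into(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    return n;
}

/* Only the first argument is used; the second is kept for future use. */
/* ARGSUSED1 */
int first_only(int x, int reserved)
{
    return x;
}

int weight(int c)
{
    int w = 0;

    switch (c) {
    case 'a':
        w += 1;
        /* FALLTHRU */
    case 'b':
        w += 1;
        break;
    default:
        break;
    }
    return w;
}

int must_be_positive(int n)
{
    if (n > 0)
        return n;
    abort();
    /* NOTREACHED */
}
```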

lint−specific messages

This section lists alphabetically the warning messages issued exclusively by lint or subject exclusively to its options. The code examples illustrate conditions in which the messages are elicited. Note that some of the examples would elicit messages in addition to the one stated. For the remaining lint messages, consult ``Compiler diagnostics''.

argument unused in function

Format: Compound

A function argument was not used. Preceding the function definition with /* ARGSUSEDn */ suppresses the message for all but the first n arguments; invoking lint with -v suppresses it for every argument.

1  int fun(int x, int y)
2  {
3      return x;
4  }
5  /* ARGSUSED1 */
6  int fun2(int x, int y)
7  {
8      return x;
9  }
============
argument unused in function
    (1) y in fun

array subscript cannot be > value: value

Format: Simple

The value of an array element's subscript exceeded the upper array bound.

1  int fun()
2  {
3      int a[10];
4      int *p = a;
5      while (p != &a[10]) /* using address is ok */
6          p++;
7      return a[5 + 6];
8  }
============
(7) warning: array subscript cannot be > 9: 11

array subscript cannot be negative: value

Format: Simple

The constant expression that represents the subscript of a true array (as opposed to a pointer) had a negative value.

1  int f()
2  {
3      int a[10];
4      return a[5 * 2 / 10 - 2];
5  }
============
(4) warning: array subscript cannot be negative: -1

assignment causes implicit narrowing conversion

Format: Compound

An object was assigned to one of a smaller type. Invoking lint with -a suppresses the message. So does an explicit cast to the smaller type.

1  void fun()
2  {
3      short s;
4      long l = 0;
5      s = l;
6  }
============
assignment causes implicit narrowing conversion
    (5) short=long

assignment of negative constant to unsigned type

Format: Simple

A negative constant was assigned to a variable of unsigned type. Use a cast or the U suffix.

1  void fun()
2  {
3      unsigned i;
4      i = -1;
5      i = -1U;
6      i = (unsigned) (-4 + 3);
7  }
============
(4) warning: assignment of negative constant to unsigned type

assignment operator "=" found where "==" was expected

Format: Simple

An assignment operator was found where a conditional expression was expected. The message is not issued when an assignment is made to a variable using the value of a function call or in the case of string copying (see the example below). The warning is suppressed when lint is invoked with -h.

1  void fun()
2  {
3      char *p, *q;
4      int a = 0, b = 0, c = 0, d = 0, i;
5      i = (a = b) && (c == d);
6      i = (c == d) && (a = b);
7      if (a = b)
8          i = 1;
9      while (*p++ = *q++);
10     while (a = b);
11     while ((a = getchar()) == b);
12     if (a = foo()) return;
13 }
============
(5) warning: assignment operator "=" found where "==" was expected
(7) warning: assignment operator "=" found where "==" was expected
(10) warning: assignment operator "=" found where "==" was expected

bitwise operation on signed value nonportable

Format: Compound

The operand of a bitwise operator was a variable of signed integral type, as defined by ANSI C. Because these operators return values that depend on the internal representations of integers, their behavior is implementation-defined for operands of that type. The message is issued only when lint is invoked with -Xc.

1  fun()
2  {
3      int i;
4      signed int j;
5      unsigned int k;
6      i = i & 055;
7      j = j | 022;
8      k = k >> 4;
9  }
============
warning: bitwise operation on signed value nonportable
    (6) (7)

constant in conditional context

Format: Simple

The controlling expression of an if, while, or for statement was a constant. Preceding the statement with /* CONSTCOND */ suppresses the message.

1  void fun()
2  {
3      if (! 1) return;
4      while (1) foo();
5      for (;1;);
6      for (;;);
7      /* CONSTCOND */
8      while (1);
9  }
============
(3) warning: constant in conditional context
(4) warning: constant in conditional context
(5) warning: constant in conditional context

constant operand to op: "!"

Format: Simple

The operand of the NOT operator was a constant. Preceding the statement with /* CONSTCOND */ suppresses the message for that statement; invoking lint with -h suppresses it for every statement.

1  void fun()
2  {
3      if (! 0) return;
4      /* CONSTCOND */
5      if (! 0) return;
6  }
============
(3) warning: constant operand to op: "!"

constant truncated by assignment

Format: Simple

An integral constant expression was assigned or returned to an object of an integral type that cannot hold the value without truncation.

1  unsigned char f()
2  {
3      unsigned char i;
4      i = 255;
5      i = 256;
6      return 256;
7  }
============
(5) warning: constant truncated by assignment
(6) warning: constant truncated by assignment
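
To make the truncation concrete, the sketch below shows what is actually stored and how a range check could be written. Both function names are hypothetical, and the stated results assume the common case of an 8-bit unsigned char (UCHAR_MAX equal to 255).

```c
#include <limits.h>

/* Hypothetical helper: store an int in an unsigned char and return
 * what was kept. Conversion to an unsigned type reduces the value
 * modulo UCHAR_MAX + 1, so 256 becomes 0 when UCHAR_MAX is 255. */
unsigned char store_in_uchar(int value)
{
    return (unsigned char)value;
}

/* Hypothetical helper: nonzero if value fits without truncation. */
int fits_in_uchar(int value)
{
    return value >= 0 && value <= UCHAR_MAX;
}
```

Checking with fits_in_uchar before the assignment is one way to catch at run time what lint reports for constants.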

conversion of pointer loses bits

Format: Simple

A pointer was assigned to an object of an integral type that is smaller than the pointer.

1  void fun()
2  {
3      char c;
4      int *i;
5      c = i;
6  }
============
(5) warning: conversion of pointer loses bits

conversion to larger integral type may sign-extend incorrectly

Format: Compound

A variable of type ``plain'' char was assigned to a variable of a larger integral type. Whether a ``plain'' char is treated as signed or unsigned is implementation-defined. The message is issued only when lint is invoked with -p, and is suppressed when it is invoked with -a.

1  void fun()
2  {
3      char c = 0;
4      short s = 0;
5      long l;
6      l = c;
7      l = s;
8  }
============
conversion to larger integral type may sign-extend incorrectly
    (6)

declaration unused in block

Format: Compound

An external variable or function was declared but not used in an inner block.

1  int fun()
2  {
3      int foo();
4      int bar();
5      return foo();
6  }
============
declaration unused in block
    (4) bar

declared global, could be static

Format: Compound

An external variable or function was declared global, instead of static, but was referenced only in the file in which it was defined. The message is suppressed when lint is invoked with -m.

file f1.c
1  int i;
2  int foo() {return i;}
3  int fun() {return i;}
4  static int stfun() {return fun();}
file f2.c
1  main()
2  {
3      int a;
4      a = foo();
5  }
============
declared global, could be static
    fun    f1.c(3)
    i      f1.c(1)

equality operator "==" found where "=" was expected

Format: Simple

An equality operator was found where a side effect was expected.

1  void fun(a, b)
2  int a, b;
3  {
4      a == b;
5      for (a == b; a < 10; a++);
6  }
============
(4) warning: equality operator "==" found where "=" was expected
(5) warning: equality operator "==" found where "=" was expected

evaluation order undefined: name

Format: Simple

A variable was changed by a side effect and used elsewhere in the same expression.

1  int a[10];
2  main()
3  {
4      int i = 1;
5      a[i++] = i;
6  }
============
(5) warning: evaluation order undefined: i
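
The usual remedy is to move the side effect into a statement of its own, so that every use of the variable is complete before it changes. The sketch below assumes the intent was to store the old value of i at index i; the function name store_then_advance is hypothetical.

```c
int a[10];

/* Hypothetical rewrite of a[i++] = i; with an explicit order:
 * first store i at index i, then advance i. Returns the new i. */
int store_then_advance(int i)
{
    a[i] = i;   /* every use of i happens here ...            */
    i++;        /* ... and the side effect follows separately */
    return i;
}
```

If the other order was intended, the increment would simply be written before the assignment.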

fallthrough on case statement

Format: Simple

Execution fell through one case to another without a break or return. Preceding a case statement with /* FALLTHRU */, or /* NOTREACHED */ when the case cannot be reached from the preceding case (see below), suppresses the message for that statement; invoking lint with -h suppresses it for every statement.

1  void fun(i)
2  {
3      switch (i) {
4      case 10:
5          i = 0;
6      case 12:
7          return;
8      case 14:
9          break;
10     case 15:
11     case 16:
12         break;
13     case 18:
14         i = 0;
15         /* FALLTHRU */
16     case 20:
17         error("bad number");
18         /* NOTREACHED */
19     case 22:
20         return;
21     }
22 }
============
(6) warning: fallthrough on case statement
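
As a small illustration of a deliberate fall-through, the sketch below marks the one place where control is meant to continue into the next case and ends every other case with break. The function classify and its return values are hypothetical.

```c
/* Hypothetical classifier: returns 2 for 18, 1 for 20, 0 otherwise. */
int classify(int i)
{
    int score = 0;

    switch (i) {
    case 18:
        score++;
        /* FALLTHRU */
    case 20:
        score++;
        break;
    default:
        break;
    }
    return score;
}
```

The directive documents the intent for human readers as well as for lint.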

function argument ( number ) declared inconsistently

Format: Compound

The parameter types in a function prototype declaration or definition differed from their types in another declaration or definition. The message described after this one is issued for uses (not declarations or definitions) of a prototype with the wrong parameter types.

file i3a.c
1  int fun1(int);
2  int fun2(int);
3  int fun3(int);
file i3b.c
1  int fun1(int *i);
2  int fun2(int *i) {}
3  void foo()
4  {
5      int *i;
6      fun3(i);
7  }
============
function argument ( number ) declared inconsistently
    fun2 (arg 1)    i3b.c(2) int * :: i3a.c(2) int
    fun1 (arg 1)    i3a.c(1) int :: i3b.c(1) int *
function argument ( number ) used inconsistently
    fun3 (arg 1)    i3a.c(3) int :: i3b.c(6) int *

function argument ( number ) used inconsistently

Format: Compound

The argument types in a function call did not match the types of the formal parameters in the function definition. (And see the discussion of the preceding message.)

file f1.c
1  int fun(int x, int y)
2  {
3      return x + y;
4  }
file f2.c
1  int main()
2  {
3      int *x;
4      extern int fun();
5      return fun(1, x);
6  }
============
function argument ( number ) used inconsistently
    fun( arg 2 )    f1.c(2) int :: f2.c(5) int *

function argument type inconsistent with format

Format: Compound

An argument was inconsistent with the corresponding conversion specification in the control string of a [fs]printf or [fs]scanf function call. (See also /* PRINTFLIKEn */ and /* SCANFLIKEn */ in the list of directives in the ``Usage'' section above.)

1  #include <stdio.h>
2  main()
3  {
4      int i;
5      printf("%s", i);
6  }
============
function argument type inconsistent with format
    printf(arg 2)    int :: (format) char *    test.c(5)

function called with variable number of arguments

Format: Compound

A function was called with the wrong number of arguments. Preceding a function definition with /* VARARGSn */ suppresses the message for calls with n or more arguments; defining and declaring a function with the ANSI C notation ``...'' suppresses it for every argument.

NOTE: See "function declared with variable number of arguments" for more information.

file f1.c
1  int fun(int x, int y, int z)
2  {
3      return x + y + z;
4  }
5  int fun2(int x, ...)
6  {
7      return x;
8  }
9
10 /* VARARGS1 */
11 int fun3(int x, int y, int z)
12 {
13     return x;
14 }
file f2.c
1  int main()
2  {
3      extern int fun(), fun3(), fun2(int x, ...);
4      return fun(1, 2);
5      return fun2(1, 2, 3, 4);
6      return fun3(1, 2, 3, 4, 5);
7  }
============
function called with variable number of arguments
    fun    f1.c(2) :: f2.c(4)

function declared with variable number of arguments

Format: Compound

The number of parameters in a function prototype declaration or definition differed from their number in another declaration or definition. Declaring and defining the prototype with the ANSI C notation ``...'' suppresses the warning if all declarations have the same number of arguments. The message immediately preceding this one is issued for uses (not declarations or definitions) of a prototype with the wrong number of arguments.

file i3a.c
1  int fun1(int);
2  int fun2(int);
3  int fun3(int);
file i3b.c
1  int fun1(int, int);
2  int fun2(int a, int b) {}
3  void foo()
4  {
5      int i, j, k;
6      i = fun3(j, k);
7  }
============
function declared with variable number of arguments
    fun2    i3a.c(2) :: i3b.c(2)
    fun1    i3a.c(1) :: i3b.c(1)
function called with variable number of arguments
    fun3    i3a.c(3) :: i3b.c(6)

function falls off bottom without returning value

Format: Compound

A nonvoid function did not return a value to the invoking function. If the closing curly brace is truly not reached, preceding it with /* NOTREACHED */ suppresses the message.

1  fun()
2  {}
3  void fun2()
4  {}
5  foo()
6  {
7      exit(1);
8      /* NOTREACHED */
9  }
============
function falls off bottom without returning value
    (2) fun

function must return int: main()

Format: Simple

The program's main function does not return int, in violation of ANSI C restrictions. The message is issued only when lint is invoked with -Xc.

1  void main()
2  {}
============
(2) warning: function must return int: main()

function returns pointer to [automatic/parameter]

Format: Simple

A function returned a pointer to an automatic variable or a parameter. Since an object with automatic storage duration is no longer guaranteed to be reserved after the end of the block, the value of the pointer to that object will be indeterminate after the end of the block.

1  int *fun(int x)
2  {
3      int a[10];
4      int b;
5      if (x == 1)
6          return a;
7      else if (x == 2)
8          return &b;
9      else return &x;
10 }
============
(6) warning: function returns pointer to automatic
(8) warning: function returns pointer to automatic
(9) warning: function returns pointer to parameter
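
Two common alternatives are sketched below: have the caller supply the storage, or return the address of an object with static storage duration. All of the names (fill_squares, next_counter, square_of_three) are hypothetical and serve only to illustrate the lifetimes involved.

```c
#include <stddef.h>

/* The caller owns buf, so the returned pointer stays valid after
 * the function returns. */
int *fill_squares(int *buf, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        buf[i] = (int)(i * i);
    return buf;
}

/* A static object lives for the whole program, so its address may be
 * returned; note that all callers share the same object. */
int *next_counter(void)
{
    static int counter;

    counter++;
    return &counter;
}

/* Usage: the array lives in this function and is used only here. */
int square_of_three(void)
{
    int buf[4];

    fill_squares(buf, 4);
    return buf[3];
}
```

The static variant is simple but not reentrant, which is why caller-supplied storage is often the safer choice.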

function returns value that is always ignored

Format: Compound

A function contained a return statement and every call to the function ignored its return value.

file f1.c
1  int fun()
2  {
3      return 1;
4  }
file f2.c
1  extern int fun();
2  int main()
3  {
4      fun();
5      return 1;
6  }
============
function returns value that is always ignored
    fun

function returns value that is sometimes ignored

Format: Compound

A function contained a return statement and some, but not all, calls to the function ignored its return value.

file f1.c
1  int fun()
2  {
3      return 1;
4  }
file f2.c
1  extern int fun();
2  int main()
3  {
4      if(1) {
5          return fun();
6      } else {
7          fun();
8          return 1;
9      }
10 }
============
function returns value that is sometimes ignored
    fun

function value is used, but none returned

Format: Compound

A nonvoid function did not contain a return statement, yet was used for its value in an expression.

file f1.c
1  extern int fun();
2  main()
3  {
4      return fun();
5  }
file f2.c
1  int fun()
2  {}
============
function value is used, but none returned
    fun

logical expression always false: op "&&"

Format: Simple

A logical AND expression checked for equality of the same variable to two different constants, or had the constant 0 as an operand. In the latter case, preceding the expression with /* CONSTCOND */ suppresses the message.

1  void fun(a)
2  int a;
3  {
4      a = (a == 1) && (a == 2);
5      a = (a == 1) && (a == 1);
6      a = (1 == a) && (a == 2);
7      a = (a == 1) && 0;
8      /* CONSTCOND */
9      a = (0 && (a == 1));
10 }
============
(4) warning: logical expression always false: op "&&"
(6) warning: logical expression always false: op "&&"
(7) warning: logical expression always false: op "&&"

logical expression always true: op "||"

Format: Simple

A logical OR expression checked for inequality of the same variable to two different constants, or had a nonzero integral constant as an operand. In the latter case, preceding the expression with /* CONSTCOND */ suppresses the message.

1  void fun(a)
2  int a;
3  {
4      a = (a != 1) || (a != 2);
5      a = (a != 1) || (a != 1);
6      a = (1 != a) || (a != 2);
7      a = (a == 10) || 1;
8      /* CONSTCOND */
9      a = (1 || (a == 10));
10 }
============
(4) warning: logical expression always true: op "||"
(6) warning: logical expression always true: op "||"
(7) warning: logical expression always true: op "||"

malformed format string

Format: Compound

A [fs]printf or [fs]scanf control string was formed incorrectly. (See also /* PRINTFLIKEn */ and /* SCANFLIKEn */ in the list of directives in the ``Usage'' section above.)

1  #include <stdio.h>
2  main()
3  {
4      printf("%y");
5  }
============
malformed format string
    printf    test.c(4)

may be indistinguishable due to truncation or case

Format: Compound

External names in a program may be indistinguishable when it is ported to another machine due to implementation-defined restrictions as to length or case. The message is issued only when lint is invoked with -Xc or -p. Under -Xc, external names are truncated to the first 6 characters with one case, in accordance with the ANSI C lower bound; under -p, to the first 8 characters with one case.

file f1.c
1  int foobar1;
2  int FooBar12;
file f2.c
1  int foobar2;
2  int FOOBAR12;
============
under -p
may be indistinguishable due to truncation or case
    FooBar12    f1.c(2) :: FOOBAR12    f2.c(2)

under -Xc
may be indistinguishable due to truncation or case
    foobar1     f1.c(1) :: FooBar12    f1.c(2)
    foobar1     f1.c(1) :: foobar2     f2.c(1)
    foobar1     f1.c(1) :: FOOBAR12    f2.c(2)

name declared but never used or defined

Format: Compound

A nonstatic external variable or function was declared but not used or defined in any file. The message is suppressed when lint is invoked with -x.

file f.c
1  extern int fun();
2  static int foo();
============
name declared but never used or defined
    fun    f.c(1)

name defined but never used

Format: Compound

A variable or function was defined but not used in any file. The message is suppressed when lint is invoked with -u.

file f.c
1  int i, j, k = 1;
2  main()
3  {
4      j = k;
5  }
============
name defined but never used
    i    f.c(1)

name multiply defined

Format: Compound

A variable was defined in more than one source file.

file f1.c
1  char i = 'a';
file f2.c
1  long i = 1;
============
name multiply defined
    i    f1.c(1) :: f2.c(1)

name used but not defined

Format: Compound

A nonstatic external variable or function was declared and used, but not defined in any file. The message is suppressed when lint is invoked with -u.

file f.c
1  extern int fun();
2  int main()
3  {
4      return fun();
5  }
============
name used but not defined
    fun    f.c(4)

nonportable bit-field type

Format: Simple

A bit-field type other than signed int or unsigned int was used. The message is issued only when lint is invoked with -p. Note that these are the only portable bit-field types. The compilation system supports int, char, short, and long bit-field types that may be unsigned, signed, or ``plain.'' It also supports the enum bit-field type.

1  struct u {
2      unsigned v:1;
3      int w:1;
4      char x:8;
5      long y:8;
6      short z:8;
7  };
============
(3) warning: nonportable bit-field type
(4) warning: nonportable bit-field type
(5) warning: nonportable bit-field type
(6) warning: nonportable bit-field type

nonportable character constant

Format: Simple

A multi-character character constant in the program may not be portable. The message is issued only when lint is invoked with -Xc.

1  int c = 'abc';
============
(1) warning: nonportable character constant

only 0 or 2 parameters allowed: main()

Format: Simple

The function main in your program was defined with only one parameter or more than two parameters, in violation of the ANSI C requirement. The message is issued only when lint is invoked with -Xc.

1  main(int argc, char **argv, char **envp)
2  {}
============
(2) warning: only 0 or 2 parameters allowed: main()

pointer cast may result in improper alignment

Format: Compound

A pointer to one object type was cast to a pointer to an object type with stricter alignment requirements. Doing so may result in a value that is invalid for the second pointer type. The warning is suppressed when lint is invoked with -h.

1  void fun()
2  {
3      short *s;
4      int *i;
5      i = (int *) s;
6  }
============
pointer cast may result in improper alignment
    (5)

pointer casts may be troublesome

Format: Compound

A pointer to one object type was cast to a pointer to a different object type. The message is issued only when lint is invoked with -p, and is not issued for the generic pointer void *.

1  void fun()
2  {
3      int *i;
4      char *c;
5      void *v;
6      i = (int *) c;
7      i = (int *) v;
8  }
============
warning: pointer casts may be troublesome
    (6)

precedence confusion possible; parenthesize

Format: Simple

An expression that mixes a logical and a bitwise operator was not parenthesized. The message is suppressed when lint is invoked with -h.

1  void fun()
2  {
3      int x = 0, m = 0, MASK = 0, i;
4      i = (x + m == 0);
5      i = (x & MASK == 0);    /* eval'd (x & (MASK == 0)) */
6      i = (MASK == 1 & x);    /* eval'd ((MASK == 1) & x) */
7  }
============
(5) warning: precedence confusion possible; parenthesize
(6) warning: precedence confusion possible; parenthesize
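
The difference matters in practice because == binds more tightly than &. The sketch below spells out both readings as separate functions so that they can be compared; the function names are hypothetical.

```c
/* How the compiler reads x & mask == 0: the comparison is done
 * first, and its 0-or-1 result is then ANDed with x. */
int test_as_parsed(int x, int mask)
{
    return x & (mask == 0);
}

/* What was most likely intended: are all masked bits of x clear? */
int test_as_intended(int x, int mask)
{
    return (x & mask) == 0;
}
```

With x equal to 2 and mask equal to 0, the first form yields 0 while the second yields 1.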

precision lost in bit-field assignment

Format: Simple

A constant was assigned to a bit-field too small to hold the value without truncation. Note that in the following example the bit-field z may have values that range from 0 to 7 or -4 to 3, depending on the machine.

1  void fun()
2  {
3      struct {
4          signed x:3;      /* max value allowed is 3 */
5          unsigned y:3;    /* max value allowed is 7 */
6          int z:3;         /* max value allowed is 7 */
7      } s;
8      s.x = 3;
9      s.x = 4;
10     s.y = 7;
11     s.y = 8;
12     s.z = 7;
13     s.z = 8;
14 }
============
(9) warning: precision lost in bit-field assignment: 4
(11) warning: precision lost in bit-field assignment: 0x8
(13) warning: precision lost in bit-field assignment: 8

set but not used in function

Format: Compound

An automatic variable or a function parameter was declared and set but not used in a function.

1  void fun(y)
2  int y;
3  {
4      int x;
5      x = 1;
6      y = 1;
7  }
============
set but not used in function
    (4) x in fun
    (1) y in fun

statement has no consequent: else

Format: Simple

An if statement had a null else part. Inserting /* EMPTY */ between the else and semicolon suppresses the message for that statement; invoking lint with -h suppresses it for every statement.

1  void f(a)
2  int a;
3  {
4      if (a)
5          return;
6      else;
7  }
============
(6) warning: statement has no consequent: else

statement has no consequent: if

Format: Simple

An if statement had a null if part. Inserting /* EMPTY */ between the controlling expression of the if and semicolon suppresses the message for that statement; invoking lint with -h suppresses it for every statement.

1  void f(a)
2  int a;
3  {
4      if (a);
5      if (a == 10)
6          /* EMPTY */;
7      else return;
8  }
============
(4) warning: statement has no consequent: if

statement has null effect

Format: Compound

An expression did not generate a side effect where a side effect was expected. Note that the message is issued for every subsequent sequence point that is reached at which a side effect is not generated.

1  void fun()
2  {
3      int a, b, c, x;
4      a;
5      a == 5;
6      ;
7      while (x++ != 10);
8      (a == b) && (c = a);
9      (a = b) && (c == a);
10     (a, b);
11 }
============
statement has null effect
    (4) (5) (9) (10)

statement not reached

Format: Compound

A function contained a statement that cannot be reached. Preceding an unreached statement with /* NOTREACHED */ suppresses the message for that statement; invoking lint with -b suppresses it for every unreached break and empty statement. Note that this message is also issued by the compiler but cannot be suppressed.

1  void fun(a)
2  {
3      switch (a) {
4      case 1:
5          return;
6          break;
7      case 2:
8          return;
9          /* NOTREACHED */
10         break;
11     }
12 }
============
statement not reached
    (6)

static unused

Format: Compound

A variable or function was defined or declared static in a file but not used in that file. Doing so is probably a programming error because the object cannot be used outside the file.

1  static int x;
2  static int main() {}
3  static int foo();
4  static int y = 1;
============
static unused
    (4) y
    (3) foo
    (2) main
    (1) x

suspicious comparison of char with value: op "op"

Format: Simple

A comparison was performed on a variable of type ``plain'' char that implied it may have a negative value (< 0, <= 0, >= 0, > 0). Whether a ``plain'' char is treated as signed or nonnegative is implementation-defined. The message is issued only when lint is invoked with -p.

1  void fun(c, d)
2  char c;
3  signed char d;
4  {
5      int i;
6      i = (c == -5);
7      i = (c < 0);
8      i = (d < 0);
9  }
============
(6) warning: suspicious comparison of char with negative constant: op "=="
(7) warning: suspicious comparison of char with 0: op "<"

suspicious comparison of unsigned with value: op "op"

Format: Simple

A comparison was performed on a variable of unsigned type that implied it may have a negative value (< 0, <= 0, >= 0, > 0).

1  void fun(x)
2  unsigned x;
3  {
4      int i;
5      i = (x > -2);
6      i = (x < 0);
7      i = (x <= 0);
8      i = (x >= 0);
9      i = (x > 0);
10     i = (-2 < x);
11     i = (x == -1);
12     i = (x == -1U);
13 }
============
(5) warning: suspicious comparison of unsigned with negative constant: op ">"
(6) warning: suspicious comparison of unsigned with 0: op "<"
(7) warning: suspicious comparison of unsigned with 0: op "<="
(8) warning: suspicious comparison of unsigned with 0: op ">="
(9) warning: suspicious comparison of unsigned with 0: op ">"
(10) warning: suspicious comparison of unsigned with negative constant: op "<"
(11) warning: suspicious comparison of unsigned with negative constant: op "=="
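
These comparisons are suspicious because the negative constant is converted to unsigned before the comparison takes place. The sketch below writes that conversion out explicitly so its effect is visible; the function names are hypothetical.

```c
#include <limits.h>

/* Equivalent to x > -2 for unsigned x: the constant becomes
 * UINT_MAX - 1, so the test is true only for x == UINT_MAX. */
int greater_than_minus_two(unsigned x)
{
    return x > (unsigned)-2;
}

/* For contrast, the same test on a signed operand behaves the way
 * the source text suggests. */
int greater_than_minus_two_signed(long x)
{
    return x > -2L;
}
```

For x equal to 5, the unsigned version yields 0 and the signed version yields 1.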

too few arguments for format

Format: Compound

A control string of a [fs]printf or [fs]scanf function call had more conversion specifications than there were arguments remaining in the call. (See also /* PRINTFLIKEn */ and /* SCANFLIKEn */ in the list of directives in the ``Usage'' section above.)

1  #include <stdio.h>
2  main()
3  {
4      int i;
5      printf("%d%d", i);
6  }
============
too few arguments for format
    printf    test.c(5)

too many arguments for format

Format: Compound

A control string of a [fs]printf or [fs]scanf function call had fewer conversion specifications than there were arguments remaining in the call. (See also /* PRINTFLIKEn */ and /* SCANFLIKEn */ in the list of directives in the ``Usage'' section above.)

1  #include <stdio.h>
2  main()
3  {
4      int i, j;
5      printf("%d", i, j);
6  }
============
too many arguments for format
    printf    test.c(5)

value type declared inconsistently

Format: Compound

The return type in a function declaration or definition did not match the return type in another declaration or definition of the function. The message is also issued for inconsistent declarations of variable types.

file f1.c
1  void fun() {}
2  void foo();
3  extern int a;
file f2.c
1  extern int fun();
2  extern int foo();
3  extern char a;
============
value type declared inconsistently
    fun    f1.c(1) void() :: f2.c(1) int()
    foo    f1.c(2) void() :: f2.c(2) int()
    a      f1.c(3) int :: f2.c(3) char

value type used inconsistently

Format: Compound

The return type in a function call did not match the return type in the function definition.

file f1.c
1  int *fun(p)
2  int *p;
3  {
4      return p;
5  }
file f2.c
1  main()
2  {
3      int i, *p;
4      i = fun(p);
5  }
============
value type used inconsistently
    fun    f1.c(3) int *() :: f2.c(4) int()

variable may be used before set: name

Format: Simple

The first reference to an automatic, non-array variable occurred at a line number earlier than the first assignment to the variable. Note that taking the address of a variable implies both a set and a use, and that the first assignment to any member of a struct or union implies an assignment to the entire struct or union.

1  void fun()
2  {
3      int i, j, k;
4      static int x;
5      k = j;
6      i = i + 1;
7      x = x + 1;
8  }
============
(5) warning: variable may be used before set: j
(6) warning: variable may be used before set: i

variable unused in function

Format: Compound

A variable was declared but never used in a function.

1  void fun()
2  {
3      int x, y;
4      static z;
5  }
============
variable unused in function
    (4) z in fun
    (3) y in fun
    (3) x in fun

m4 macro processor

m4 is a general purpose macro processor that can be used to preprocess C and assembly language programs, among other things. Besides the straightforward replacement of one string of text by another, m4 lets you perform

• integer arithmetic
• file inclusion
• conditional macro expansion
• string and substring manipulation

You can use built-in macros to perform these tasks or define your own macros. Built-in and user-defined macros work exactly the same way except that some of the built-in macros have side effects on the state of the process. A list of built-in macros appears on the m4(1) page.

The basic operation of m4 is to read every legal token (string of ASCII letters and digits and possibly supplementary characters) and determine if the token is the name of a macro. The name of the macro is replaced by its defining text, and the resulting string is pushed back onto the input to be rescanned. Macros may be called with arguments. The arguments are collected and substituted into the right places in the defining text before the defining text is rescanned.

Macro calls have the general form

name(arg1, arg2, ..., argn)

If a macro name is not immediately followed by a left parenthesis, it is assumed to have no arguments. Leading unquoted blanks, tabs, and new-lines are ignored while collecting arguments. Left and right single quotes are used to quote strings. The value of a quoted string is the string stripped of the quotes.

When a macro name is recognized, its arguments are collected by searching for a matching right parenthesis. If fewer arguments are supplied than are in the macro definition, the trailing arguments are taken to be null. Macro evaluation proceeds normally during the collection of the arguments, and any commas or right parentheses that appear in the value of a nested call are as effective as those in the original input text. After argument collection, the value of the macro is pushed back onto the input stream and rescanned.

You invoke m4 with a command of the form

$ m4 file file file

Each argument file is processed in order. If there are no arguments or if an argument is a hyphen, the standard input is read. If you are eventually going to compile the m4 output, you could use a command something like this:

$ m4 file1.m4 > file1.c

You can use the -D option to define a macro on the m4 command line. Suppose you have two similar versions of a program. You might have a single m4 input file capable of generating the two output files. For example, file1.m4 could contain lines such as

ifelse(VER, 1, do_something)
ifelse(VER, 2, do_something)

Your makefile for the program might look like this:

file1.1.c : file1.m4
	m4 -DVER=1 file1.m4 > file1.1.c
...

file1.2.c : file1.m4
	m4 -DVER=2 file1.m4 > file1.2.c
...

You can use the -U option to ``undefine'' VER. If file1.m4 contains

ifelse(VER, 1, do_something)
ifelse(VER, 2, do_something)
ifdef(`VER', , do_something)

then your makefile would contain

file1.0.c : file1.m4
	m4 -UVER file1.m4 > file1.0.c
...

file1.1.c : file1.m4
	m4 -DVER=1 file1.m4 > file1.1.c
...

file1.2.c : file1.m4
	m4 -DVER=2 file1.m4 > file1.2.c
...

Defining macros

The primary built-in m4 macro is define, which is used to define new macros. The following input

define(name, stuff)

causes the string name to be defined as stuff. All subsequent occurrences of name will be replaced by stuff. The defined string must contain only ASCII alphanumeric or printable supplementary characters and must begin with a letter or printable supplementary character (underscore counts as a letter). The defining string is any text that contains balanced parentheses; it may stretch over multiple lines. As a typical example

define(N, 100)
...
if (i > N)

defines N to be 100 and uses the ``symbolic constant'' N in a later if statement. As noted, the left parenthesis must immediately follow the word define to signal that define has arguments. If the macro name is not immediately followed by a left parenthesis, it is assumed to have no arguments. In the previous example, then, N is a macro with no arguments.

A macro name is only recognized as such if it appears surrounded by characters which cannot be used in a macro name. In the following example

define(N, 100)
...
if (NNN > 100)

the variable NNN is unrelated to the defined macro N even though the variable contains Ns.

m4 expands macro names into their defining text as soon as possible.

define(N, 100)
define(M, N)

defines M to be 100 because the string N is immediately replaced by 100 as the arguments of define(M, N) are collected. To put this another way, if N is redefined, M keeps the value 100.

There are two ways to avoid this behavior. The first, which is specific to the situation described here, is to interchange the order of the definitions:

define(M, N)
define(N, 100)

Now M is defined to be the string N, so when the value of M is requested later, the result will always be the value of N at that time (because the M will be replaced by N which will be replaced by 100).
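
The rescanning that makes this work can be sketched in C. The code below is not how m4 itself is implemented; it is a minimal illustration, under simplifying assumptions, of reading whole tokens, replacing macro names by their defining text, and pushing that text back onto the input. It handles only macros without arguments, ignores quoting, and uses a fixed table in place of define; the names table, lookup, is_name_char, and expand are all hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

struct macro { const char *name; const char *text; };

/* Hypothetical fixed table standing in for earlier define() calls:
 * as if define(M, N) had been read before define(N, 100). */
static const struct macro table[] = {
    { "N", "100" },
    { "M", "N" }
};

static const char *lookup(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(table[i].name, name) == 0)
            return table[i].text;
    return NULL;
}

static int is_name_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

#define BUF_MAX 1024

/* Expand argument-less macros in `in`. Returns a pointer to a static
 * buffer that the next call overwrites, or NULL on overflow. */
const char *expand(const char *in)
{
    static char out[BUF_MAX];
    char pending[BUF_MAX];      /* unread input; next character on top */
    char token[64];
    const char *text;
    size_t top = 0, o = 0, t, i, n = strlen(in);

    if (n > BUF_MAX)
        return NULL;
    for (i = n; i > 0; i--)     /* push the input in reverse order */
        pending[top++] = in[i - 1];

    while (top > 0) {
        t = 0;
        token[t++] = pending[--top];
        /* a name starts with a letter or underscore; collect all of it */
        if (is_name_char(token[0]) && !isdigit((unsigned char)token[0])) {
            while (top > 0 && is_name_char(pending[top - 1])) {
                if (t + 1 >= sizeof token)
                    return NULL;
                token[t++] = pending[--top];
            }
        }
        token[t] = '\0';

        text = lookup(token);
        if (text != NULL) {
            /* push the defining text back so that it is rescanned */
            n = strlen(text);
            if (top + n > BUF_MAX)
                return NULL;
            for (i = n; i > 0; i--)
                pending[top++] = text[i - 1];
        } else {
            if (o + t >= BUF_MAX)
                return NULL;
            memcpy(out + o, token, t);
            o += t;
        }
    }
    out[o] = '\0';
    return out;
}
```

With this table, expand("NNN > M") leaves NNN alone because it is a different token from N, and turns M into N and then, on rescanning, into 100. Returning a static buffer keeps the sketch short; a real tool would let the caller supply the output storage.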

Quoting

The more general solution is to delay the expansion of the arguments of define by quoting them. Any text surrounded by left and right single quotes is not expanded immediately, but has the quotes stripped off as the arguments are collected. The value of the quoted string is the string stripped of the quotes.

define(N, 100) define(M, `N')

defines M as the string N, not 100.
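The difference between the unquoted and quoted forms can be sketched in Python. This is a simplified model under stated assumptions, not m4's algorithm: the helpers strip_quotes, define, and lookup are hypothetical, and values are limited to a single token.

```python
def strip_quotes(arg):
    """Remove one level of `...' quoting if the whole argument is quoted."""
    if len(arg) >= 2 and arg[0] == "`" and arg[-1] == "'":
        return arg[1:-1], True
    return arg, False

def define(macros, name, value):
    """Model of define(name, value) for single-token values only.

    An unquoted value that names an existing macro is replaced by that
    macro's text while the argument is collected; a quoted value is
    stored with one level of quotes removed and is not expanded.
    """
    text, was_quoted = strip_quotes(value)
    if not was_quoted:
        text = macros.get(text, text)
    macros[name] = text

def lookup(macros, name):
    """Expand `name` repeatedly until the result is no longer a macro name."""
    seen = set()
    while name in macros and name not in seen:
        seen.add(name)
        name = macros[name]
    return name

unquoted = {}
define(unquoted, "N", "100")
define(unquoted, "M", "N")       # M is stored as 100
unquoted["N"] = "200"            # N is redefined later
print(lookup(unquoted, "M"))     # still the old value of N

quoted = {}
define(quoted, "N", "100")
define(quoted, "M", "`N'")       # M is stored as the string N
quoted["N"] = "200"
print(lookup(quoted, "M"))       # follows the current value of N
```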

The general rule is that m4 always strips off one level of single quotes whenever it evaluates something. This is true even outside of macros. If the word define is to appear in the output, the word must be quoted in the input:

`define' = 1;

It's usually best to quote the arguments of a macro to assure that what you are assigning to the macro name actually gets assigned. To redefine N, for example, you delay its evaluation by quoting:

define(N, 100) ... define(`N', 200)

Otherwise

define(N, 100) ... define(N, 200)

the N in the second definition is immediately replaced by 100. The effect is the same as saying

define(100, 200)

Note that this statement will be ignored by m4 because only things that look like names can be defined.

If left and right single quotes are not convenient for some reason, the quote characters can be changed with the built-in macro changequote:

changequote([, ])

In this example the macro makes the ``quote'' characters the left and right brackets instead of the left and right single quotes. The quote symbols can be up to five characters long. The original characters can be restored by using changequote without arguments:

changequote

undefine removes the definition of a macro or built-in:

undefine(`N')

Here the macro removes the definition of N. Be sure to quote the argument to undefine. Built-ins can be removed with undefine as well:

undefine(`define')

Note that once a built-in is removed or redefined, its original definition cannot be reused.

Macros can be renamed with defn. Suppose you want the built-in define to be called XYZ. You specify

define(XYZ, defn(`define')) undefine(`define')

After this, XYZ takes on the original meaning of define.

XYZ(A, 100)

defines A to be 100.

The built-in ifdef provides a way to determine if a macro is currently defined. Depending on the system, a definition appropriate for the particular machine can be made as follows:

ifdef(`pdp11', `define(wordsize,16)') ifdef(`u3b', `define(wordsize,32)')

The ifdef macro permits three arguments. If the first argument is defined, the value of ifdef is the second argument. If the first argument is not defined, the value of ifdef is the third argument:

ifdef(`unix', on UNIX, not on UNIX)

If there is no third argument, the value of ifdef is null.

Arguments

The previous sections focused on the simplest form of macro processing: replacing one string with another (fixed) string. Macros can also be defined so that different invocations have different results. In the replacement text for a macro (the second argument of its define), any occurrence of $n is replaced by the nth argument when the macro is actually used. The macro bump, defined as

define(bump, $1 = $1 + 1)

is equivalent to x = x + 1 for bump(x).

A macro can have as many arguments as you want, but only the first nine are accessible individually, $1 through $9. $0 refers to the macro name itself. Arguments that are not supplied are replaced by null strings, so a macro can be defined that simply concatenates its arguments:

define(cat, $1$2$3$4$5$6$7$8$9)

cat(x, y, z) is equivalent to xyz. Arguments $4 through $9 are null because no corresponding arguments were provided.
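The substitution of $0 through $9 can be modelled with a short Python sketch. The function substitute_args is a hypothetical helper introduced for illustration; it covers only positional parameters and ignores quoting and rescanning.

```python
import re

def substitute_args(body, name, args):
    """Replace $0 with the macro name and $1..$9 with the matching argument.

    Arguments that were not supplied become empty strings.
    """
    def repl(match):
        n = int(match.group(1))
        if n == 0:
            return name
        return args[n - 1] if n <= len(args) else ""
    return re.sub(r"\$([0-9])", repl, body)

# bump(x) and cat(x, y, z) from the text above
print(substitute_args("$1 = $1 + 1", "bump", ["x"]))
print(substitute_args("$1$2$3$4$5$6$7$8$9", "cat", ["x", "y", "z"]))
```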

Leading unquoted blanks, tabs, or new-lines that occur during argument collection are discarded. All other white space is retained, so

define(a, b c)

defines a to be b c.

Arguments are separated by commas. A comma ``protected'' by parentheses does not terminate an argument:

define(a, (b,c))

has two arguments, a and (b,c). You can specify a comma or parenthesis as an argument by quoting it.
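The collection rules above (commas separate arguments, parentheses protect commas, leading white space is dropped) can be sketched as follows. This is a minimal Python model with a hypothetical function name, split_args; it deliberately leaves quoting out.

```python
def split_args(text):
    """Split the text between a macro's outer parentheses into arguments.

    Commas inside nested parentheses do not separate arguments, and
    leading blanks, tabs, and new-lines of each argument are dropped.
    Quoting is not modelled in this sketch.
    """
    args, current, depth = [], "", 0
    for ch in text:
        if ch == "," and depth == 0:
            args.append(current.lstrip(" \t\n"))
            current = ""
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        current += ch
    args.append(current.lstrip(" \t\n"))
    return args

print(split_args("a, (b,c)"))   # two arguments
print(split_args("a, b c"))     # inner white space is kept
```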

$* is replaced by a list of the arguments given to the macro in a subsequent invocation. The listed arguments are separated by commas.

define(a, 1) define(b, 2) define(star, `$*') star(a, b)

gives the result 1,2.

star(`a', `b')

gives the same result because m4 strips the quotes from a and b as it collects the arguments of star, then expands a and b when it evaluates star.

$@ is identical to $* except that each argument in the subsequent invocation is quoted.

define(a, 1) define(b, 2) define(at, `$@') at(`a', `b')

gives the result a,b because the quotes are put back on the arguments when at is evaluated.

$# is replaced by the number of arguments in the subsequent invocation.

define(sharp, `$#') sharp(1, 2, 3)

gives the result 3,

sharp()

gives the result 1, and

sharp

gives the result 0.
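The three special parameters can be summarized in a small Python model. The function special_params and its calling convention are assumptions made for this sketch: args is the list of arguments after collection (quotes already stripped), or None when the macro is used without parentheses.

```python
def special_params(args):
    """Return the text substituted for $*, $@, and $#.

    `args` is None when the macro was invoked without parentheses.
    """
    if args is None:
        args = []
    return {
        "$*": ",".join(args),                         # rescanned, so names expand
        "$@": ",".join("`" + a + "'" for a in args),  # requoted, so names survive
        "$#": str(len(args)),
    }

print(special_params(["a", "b"]))   # star(`a', `b') and at(`a', `b')
print(special_params([""])["$#"])   # sharp() has one (empty) argument
print(special_params(None)["$#"])   # sharp has none
```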

The built-in shift returns all but its first argument. The other arguments are quoted and pushed back onto the input with commas in between. The simplest case

shift(1, 2, 3)

gives 2,3. As with $@, you can delay the expansion of the arguments by quoting them, so

define(a, 100) define(b, 200) shift(`a', `b')

gives the result b because the quotes are put back on the arguments when shift is evaluated.

Arithmetic built-ins

m4 provides three built-in macros for doing integer arithmetic. incr increments its numeric argument by 1. decr decrements by 1. To handle the common programming situation in which a variable is to be defined as ``one more than N'' you would use

define(N, 100) define(N1, `incr(N)')

N1 is defined as one more than the current value of N.

The more general mechanism for arithmetic is a built-in called eval, which is capable of arbitrary arithmetic on integers. Its operators in decreasing order of precedence are

+ - (unary)
**
* / %
+ -
== != < <= > >=
! ~
&
| ^
&&
||

Parentheses may be used to group operations where needed. All the operands of an expression given to eval must ultimately be numeric. The numeric value of a true relation (like 1 > 0) is 1, and false is 0. The precision in eval is 32 bits on the UNIX® operating system.

As a simple example, you can define M to be 2**N+1 with

define(M, `eval(2**N+1)')

Then the sequence

define(N, 3) M(2)

gives 9 as the result.
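The arithmetic in this example (two raised to the power N, plus one) can be checked with a short Python sketch. The helper names are hypothetical, and the wraparound on overflow is an assumption: the text states only that the precision is 32 bits.

```python
def to_int32(value):
    """Wrap an integer into the signed 32-bit range (assumed overflow rule)."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value

def m_of(n):
    """Value of two to the power n, plus one, in 32-bit arithmetic."""
    return to_int32(2 ** n + 1)

print(m_of(3))       # the example above, with N defined as 3
print(int(1 > 0))    # a true relation has the numeric value 1
print(int(1 < 0))    # a false relation has the numeric value 0
```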

File inclusion

A new file can be included in the input at any time with the built-in macro include:

include(filename)

inserts the contents of filename in place of the macro and its argument. The value of include (its replacement text) is the contents of the file. If needed, the contents can be captured in definitions and so on.

A fatal error occurs if the file named in include cannot be accessed. To get some control over this situation, the alternate form sinclude (``silent include'') can be used. This built-in says nothing and continues if the file named cannot be accessed.

Diversions

m4 output can be diverted to temporary files during processing, and the collected material can be output on command. m4 maintains nine of these diversions, numbered 1 through 9. If the built-in macro divert(n) is used, all subsequent output is put onto the end of a temporary file referred to as n. Diverting to this file is stopped by the divert or divert(0) macros, which resume the normal output process.

Diverted text is normally output at the end of processing in numerical order. Diversions can be brought back at any time by appending the new diversion to the current diversion. Output diverted to a stream other than 0 through 9 is discarded. The built-in undivert brings back all diversions in numerical order; undivert with arguments brings back the selected diversions in the order given. Undiverting discards the diverted text (as does diverting) into a diversion whose number is not between 0 and 9, inclusive.

The value of undivert is not the diverted text. Furthermore, the diverted material is not rescanned for macros. The built-in divnum returns the number of the currently active diversion. The current output stream is 0 during normal processing.
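The behavior of diversions can be pictured with a toy model. The Python class below is a sketch under stated assumptions: the class and method names are invented for illustration, and the model keeps text in memory rather than in temporary files.

```python
class Diversions:
    """Toy model of m4 diversions: stream 0 is normal output, 1-9 are held."""

    def __init__(self):
        self.output = []                            # text sent to stream 0
        self.held = {n: [] for n in range(1, 10)}   # diverted text per stream
        self.current = 0                            # what divnum would return

    def divert(self, n=0):
        self.current = n

    def write(self, text):
        if self.current == 0:
            self.output.append(text)
        elif self.current in self.held:
            self.held[self.current].append(text)
        # any other stream number: the text is discarded

    def undivert(self, *streams):
        # With no arguments, bring back all diversions in numerical order.
        for n in streams or range(1, 10):
            if n in self.held and n != self.current:
                self.write("".join(self.held[n]))
                self.held[n] = []

    def finish(self):
        """End of input: remaining diversions are output in numerical order."""
        self.divert(0)
        self.undivert()
        return "".join(self.output)

d = Diversions()
d.write("A")
d.divert(2)
d.write("two")
d.divert(1)
d.write("one")
d.divert(-1)
d.write("lost")      # discarded: -1 is not between 0 and 9
d.divert()
d.write("B")
print(d.finish())
```

In this model, undivert(2, 1) called while stream 0 is current would append the text of diversion 2 and then diversion 1 to the normal output, matching the ``order given'' rule.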

System command

Any program can be run by using the syscmd built-in:

syscmd(date)

invokes the UNIX® operating system date command. Normally, syscmd would be used to create a file for a subsequent include.

To make it easy to name files uniquely, the built-in maketemp replaces a string of XXXXX in the argument with the process ID of the current process.

Conditionals

Arbitrary conditional testing is performed with the built-in ifelse. In its simplest form

ifelse(a, b, c, d)

compares the two strings a and b. If a and b are identical, ifelse returns the string c. Otherwise, string d is returned. Thus, a macro called compare can be defined as one that compares two strings and returns yes or no, respectively, if they are the same or different:

define(compare, `ifelse($1, $2, yes, no)')

Note the quotes, which prevent evaluation of ifelse from occurring too early. If the final argument is omitted, the result is null, so

ifelse(a, b, c)

is c if a matches b, and null otherwise.

ifelse can actually have any number of arguments and provides a limited form of multiway decision capability. In the input

ifelse(a, b, c, d, e, f, g)

if the string a matches the string b, the result is c. Otherwise, if d is the same as e, the result is f. Otherwise, the result is g.
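The multiway form can be modelled as a loop over triples. The Python function below is a sketch, not m4's implementation; in particular, returning an empty string when two arguments are left over is an assumption, since the text does not cover that case.

```python
def ifelse(*args):
    """Model of ifelse: compare pairs of strings, return the matching result.

    Arguments are consumed three at a time (a, b, result). A single
    leftover argument is the default; otherwise the result is "".
    """
    args = list(args)
    while len(args) >= 3:
        a, b, result = args[:3]
        if a == b:
            return result
        args = args[3:]
    return args[0] if len(args) == 1 else ""

print(ifelse("x", "x", "yes", "no"))                 # strings match
print(ifelse("x", "y", "yes", "no"))                 # strings differ
print(ifelse("a", "b", "c", "d", "d", "f", "g"))     # second pair matches
```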

String manipulation

The len macro returns the length of the string (number of characters) in its argument.

len(abcdef)

is 6, and

len((a,b))

is 5.

The substr macro can be used to produce substrings of strings. If you type

substr(s, i, n)

it will return the substring of s that starts at the ith position (origin 0) and is n characters long. If n is omitted, the rest of the string is returned. If you type

substr(`now is the time',1)

it will return the following string:

ow is the time

If i or n are out of range, a blank line is returned. For example, if you type

substr(`now is the time',-1)

or

substr(`now is the time',1,39)

you will get a blank line.

The index(s1, s2) macro returns the index (position) in s1 where the string s2 occurs, or -1 if it does not occur. As with substr, the origin for strings is 0.

translit performs character transliteration and has the general form

translit(s, f, t)

which modifies s by replacing any character in f by the corresponding character in t. Using input

translit(s, aeiou, 12345)

replaces the vowels by the corresponding digits. If t is shorter than f, characters that do not have an entry in t are deleted. As a limiting case, if t is not present at all, characters from f are deleted from s.

translit(s, aeiou)

would delete vowels from s.
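The string built-ins described in this section can be approximated in Python. The functions below are hypothetical stand-ins written to follow the rules as stated in the text (origin 0, empty result for out-of-range values in substr, -1 from index when there is no match); an actual m4 implementation may treat out-of-range values differently.

```python
def m4_len(s):
    """Number of characters in s."""
    return len(s)

def m4_substr(s, i, n=None):
    """Substring of s from position i (origin 0), n characters long.

    Following the text, an out-of-range i or n yields an empty string.
    """
    if i < 0 or i > len(s):
        return ""
    if n is None:
        return s[i:]
    if n < 0 or i + n > len(s):
        return ""
    return s[i:i + n]

def m4_index(s1, s2):
    """Position of s2 in s1 (origin 0), or -1 if s2 does not occur."""
    return s1.find(s2)

def m4_translit(s, f, t=""):
    """Replace each character of s found in f by its counterpart in t.

    Characters of f with no counterpart in t are deleted.
    """
    out = []
    for ch in s:
        pos = f.find(ch)
        if pos < 0:
            out.append(ch)
        elif pos < len(t):
            out.append(t[pos])
    return "".join(out)

print(m4_len("(a,b)"))
print(m4_substr("now is the time", 1))
print(m4_index("now is the time", "is"))
print(m4_translit("education", "aeiou", "12345"))
print(m4_translit("education", "aeiou"))
```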

The macro dnl deletes all characters that follow it up to and including the next new-line. It is useful mainly for throwing away empty lines that otherwise would clutter up m4 output. Using input

define(N, 100) define(M, 200) define(L, 300)

results in a new-line at the end of each line that is not part of the definition. The new-line is copied into the output where it may not be wanted. When you add dnl to each of these lines, the new-lines will disappear. Another method of achieving the same result is to input

divert(-1) define(...) ... divert
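The effect of dnl on the input text can be shown with a small Python sketch. The function apply_dnl is a hypothetical helper that models only the deletion performed by dnl; it does not process the define calls themselves, which would expand to nothing in m4.

```python
import re

def apply_dnl(text):
    """Delete each dnl and everything after it up to and including the new-line."""
    return re.sub(r"\bdnl\b[^\n]*\n?", "", text)

source = "define(N, 100)dnl\ndefine(M, 200)dnl\ndefine(L, 300)dnl\n"
# No new-lines remain to be copied to the output.
print(repr(apply_dnl(source)))
```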

Printing

The built-in errprint writes its arguments out on the standard error file. An example would be

errprint(`fatal error')

dumpdef is a debugging aid that dumps the current names and definitions of items specified as arguments. If no arguments are given, then all current names and definitions are printed.

Linking with the mapfile option

The ELF linker (ld) automatically maps input sections from object files (.o files) to output segments in executable files (a.out files). The mapfile option to the ld command allows you to change the default mapping provided by the ELF linker.

In particular, the mapfile option allows you to:

• declare segments and specify values for segment attributes such as segment type, permissions, addresses, length, and alignment
• control mapping of input sections to segments by specifying the attribute values necessary in a section to map to a specific segment (the attributes are section name, section type, and permissions) and by specifying which object file(s) the input sections should be taken from, if necessary
• declare a global-absolute symbol that is assigned a value equal to the size of a specified segment (by the linker) and that can be referenced from object files

NOTE: The major purpose of the mapfile option is to allow users of ifiles (an option previously available to ld that used link editor command language directives) to convert to mapfiles. All other facilities previously available for ifiles, other than those mentioned above, are not available with the mapfile option.

CAUTION: When using the mapfile option, be aware that you can easily create a.out files that do not execute. Therefore, the use of the mapfile option is strongly discouraged. ld can produce a correct a.out without the use of the mapfile option. The mapfile option is intended for system programming use, not application programming use.

This appendix describes the structure and syntax of a mapfile and the use of the -M option to the ld command.

Using the mapfile option

To use the mapfile option, you must:

1. enter mapfile directives into a file (this is your ``mapfile'')
2. enter the following option on the ld command line:

   -M mapfile

   mapfile is the file name of the file you produced in step 1. If the mapfile is not in your current directory, you must include the full path name; no default search path exists. (See the ld manual page for information on operation of the ld command.)

CAUTION: The mapfile option can only be used in static mode. The -dn option must accompany the -M option on the ld command line or ld returns a fatal error.

Mapfile structure and syntax

You can enter three types of directives into a mapfile:

• segment declarations
• mapping directives
• size-symbol declarations

Each directive can span more than one line and can have any amount of white space (including new-lines) as long as it is followed by a semicolon. You can enter 0 (zero) or more directives in a mapfile. (Entering 0 directives causes ld to ignore the mapfile and use its own defaults.) Typically, segment declarations are followed by mapping directives. You would declare a segment and then define the criteria by which a section becomes part of that segment. If you enter a mapping directive or size-symbol declaration without first declaring the segment to which you are mapping (except for built-in segments, explained later), the segment is given default attributes as explained below. This segment is then an ``implicitly declared segment.''

Size-symbol declarations can appear anywhere in a mapfile.

The following subtopics describe each directive type. For all syntax discussions, the following apply:

• All entries in constant width, all colons, semicolons, equal signs, and at (@) signs are typed in literally.
• All entries in italics are substitutables.
• { ... } means ``zero or more.''
• { ... }+ means ``one or more.''
• [ ... ] means ``optional.''
• section_names and segment_names follow the same rules as C identifiers where a period (.) is treated as a letter (for example, .bss is a valid name).
• section_names, segment_names, file_names, and symbol_names are case sensitive; everything else is not case sensitive.
• Spaces (or new-lines) may appear anywhere except before a number or in the middle of a name or value.
• Comments beginning with ``#'' and ending at a new-line may appear anywhere that a space may appear.

Segment declarations

A segment declaration creates a new segment in the a.out or changes the attribute values of an existing segment. (An existing segment is one that you previously defined or one of the three built-in segments described below.)

A segment declaration has the following syntax:

segment_name = {segment_attribute_value};

For each segment_name, you can specify any number of segment_attribute_values in any order, each separated by a space. (Only one attribute value is allowed for each segment attribute.) The segment attributes and their valid values are as follows:

segment_type:       LOAD
                    NOTE
segment_flags:      ?[R][W][X]
virtual_address:    Vnumber
physical_address:   Pnumber
length:             Lnumber
alignment:          Anumber

There are three built-in segments with the following default attribute values:

• text (LOAD, ?RX, virtual_address, physical_address, length, and alignment values set to defaults per CPU type)
• data (LOAD, ?RWX, virtual_address, physical_address, length, and alignment values set to defaults per CPU type)
• note (NOTE)

ld behaves as if these segments had been declared before your mapfile is read in. See ``Mapfile option defaults'' for more information.

Note the following when entering segment declarations:

• A number can be hexadecimal, decimal, or octal, following the same rules as in the C language.
• No space is allowed between the V, P, L, or A and the number.
• The segment_type value can be either LOAD or NOTE.
• The segment_type value defaults to LOAD.
• The segment_flags values are R for readable, W for writable, and X for executable. No spaces are allowed between the question mark and the individual flags that make up the segment_flags value.
• The segment_flags value for a LOAD segment defaults to ?RWX.
• NOTE segments cannot be assigned any segment attribute value other than a segment_type.
• Implicitly declared segments default to segment_type value LOAD, segment_flags value ?RWX, virtual_address, physical_address, length, and alignment values set to defaults per CPU type.

  NOTE: ld calculates the addresses and length of the current segment based on the previous segment's attribute values. Also, even though implicitly declared segments default to ``no length limit,'' any machine memory limitations still apply.
• LOAD segments can have an explicitly specified virtual_address value and/or physical_address value, as well as a maximum segment length value.
• If a segment has a segment_flags value of ? with nothing following, the value defaults to not readable, not writable and not executable.
• The alignment value is used in calculating the virtual address of the beginning of the segment. This alignment only affects the segment for which it is specified; other segments still have the default alignment unless their alignments are also changed.
• If any of the virtual_address, physical_address, or length attribute values are not set, ld calculates these values as it builds the a.out.
• If an alignment value is not specified for a segment, it is set to the built-in default. (The default differs from one CPU to another and may even differ between kernel versions. You should check the appropriate documentation for these numbers.)
• If both a virtual_address and an alignment value are specified for a segment, the virtual_address value takes priority.
• If a virtual_address value is specified for a segment, the alignment field in the program header contains the default alignment value.

  CAUTION: If a virtual_address value is specified, the segment is placed at that virtual address. For the UNIX system kernel this creates a correct result. For files that start via exec(), this method creates an incorrect a.out file because the segments do not have correct offsets relative to their page boundaries.
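The attribute syntax described in the notes above can be made concrete with a small parser. The following Python sketch is illustrative only and does not reflect how ld reads a mapfile; the function names and the dictionary keys are assumptions. For brevity it treats keywords as case sensitive, although the text says only names are.

```python
def parse_number(text):
    """Parse a number using C rules: 0x prefix is hex, leading 0 is octal."""
    lower = text.lower()
    if lower.startswith("0x"):
        return int(lower[2:], 16)
    if len(text) > 1 and text.startswith("0"):
        return int(text[1:], 8)
    return int(text, 10)

def parse_segment_declaration(directive):
    """Parse 'name = attr attr ...;' into (name, attribute dictionary)."""
    name, _, rest = directive.rstrip().rstrip(";").partition("=")
    keys = {"V": "virtual_address", "P": "physical_address",
            "L": "length", "A": "alignment"}
    attrs = {}
    for word in rest.split():
        if word in ("LOAD", "NOTE"):
            attrs["segment_type"] = word
        elif word.startswith("?"):
            attrs["segment_flags"] = set(word[1:])   # "?" alone: no permissions
        elif word[0] in keys:
            attrs[keys[word[0]]] = parse_number(word[1:])
        else:
            raise ValueError("unrecognized attribute: " + word)
    return name.strip(), attrs

print(parse_segment_declaration("monkey = LOAD V0x80000000 L0x4000;"))
print(parse_segment_declaration("donkey = ?RX A0x1000;"))
```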

Mapping directives

A mapping directive tells ld how to map input sections to segments. Basically, you name the segment that you are mapping to and indicate what the attributes of a section must be in order to map into the named segment. The set of section_attribute_values that a section must have to map into a specific segment is called the entrance criteria for that segment. In order to be placed in a specified segment of the a.out, a section must meet the entrance criteria for a segment exactly.

A mapping directive has the following syntax:

segment_name : {section_attribute_value} [: {file_name}+];

For a segment_name, you specify any number of section_attribute_values in any order, each separated by a space. (At most one section attribute value is allowed for each section attribute.) You can also specify that the section must come from a certain .o file(s) via the file_name substitutable. The section attributes and their valid values are as follows:

section_name:       any valid section name
section_type:       $PROGBITS
                    $SYMTAB
                    $STRTAB
                    $REL
                    $RELA
                    $NOTE
                    $NOBITS
section_flags:      ?[[!]A][[!]W][[!]X]

Note the following when entering mapping directives:

• You must choose at most one section_type from the section_types listed above. The section_types listed above are built-in types. For more information on section_types, see ``Object files''.
• The section_flags values are A for allocatable, W for writable, or X for executable. If an individual flag is preceded by an exclamation mark (``!''), the linker checks to make sure that the flag is not set. No spaces are allowed between the question mark, exclamation point(s), and the individual flags that make up the section_flags value.
• file_name may be any valid file name and can be of the form archive_name(component_name), for example, /usr/lib/usr/libc.a(printf.o). A file name may be of the form *file_name (see next bullet item). Note that ld does not check the syntax of file names.
• If a file_name is of the form *file_name, ld simulates a basename (see basename(1)) on the file name from the command line and uses that to match against the mapfile file_name. In other words, the file_name from the mapfile only needs to match the last part of the file name from the command line. (See ``Mapping Example'' below.)
• If you use the -l option on the cc or ld command line, and the library after the -l option is in the current directory, you must precede the library with ./ (or the entire path name) in the mapfile in order to create a match.
• More than one directive line may appear for a particular output segment; for example, the following set of directives is valid:

  S1 : $PROGBITS;
  S1 : $NOBITS;

  Entering more than one mapping directive line for a segment is the only way to specify multiple values of a section attribute.
• A section can match more than one entrance criteria. In this case, the first segment encountered in the mapfile with that entrance criteria is used; for example, if a mapfile reads:

  S1 : $PROGBITS;
  S2 : $PROGBITS;

  the $PROGBITS sections are mapped to segment S1.
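The matching rules in the notes above can be sketched in Python. This is a simplified model, not the ld implementation: the dictionary representation of criteria and sections, and the names flags_match and map_section, are assumptions made for illustration. File name criteria are left out.

```python
def flags_match(criteria, section_flags):
    """Check a ?[[!]A][[!]W][[!]X] pattern against a set of section flags.

    A plain letter requires the flag, !letter requires its absence, and
    a flag that is not mentioned may be either set or clear.
    """
    negate = False
    for ch in criteria.lstrip("?"):
        if ch == "!":
            negate = True
            continue
        if (ch in section_flags) == negate:
            return False
        negate = False
    return True

def map_section(directives, section):
    """Return the segment of the first directive whose criteria all match.

    `directives` is a list of (segment, criteria) pairs in mapfile order;
    criteria may contain the keys "name", "type", and "flags".
    """
    for segment, crit in directives:
        if "name" in crit and crit["name"] != section["name"]:
            continue
        if "type" in crit and crit["type"] != section["type"]:
            continue
        if "flags" in crit and not flags_match(crit["flags"], section["flags"]):
            continue
        return segment
    return None

directives = [("S1", {"type": "$PROGBITS"}), ("S2", {"type": "$PROGBITS"})]
section = {"name": ".text", "type": "$PROGBITS", "flags": {"A", "X"}}
print(map_section(directives, section))   # the first matching directive wins
```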

Size-symbol declarations

Size-symbol declarations let you define a new global-absolute symbol that represents the size, in bytes, of the specified segment. This symbol can be referenced in your object files. A size-symbol declaration has the following syntax:

segment_name @ symbol_name;

symbol_name can be any valid C identifier, although the ld command does not check the syntax of the symbol_name.

Mapping example

``User-Defined Mapfile'' is an example of a user-defined mapfile. The numbers on the left are included in the example for tutorial purposes. Only the information to the right of the numbers would actually appear in the mapfile.

1. elephant : .bss : peanuts.o *popcorn.o;

2. monkey : $PROGBITS ?AX;

3. monkey : .bss;

4. monkey = LOAD V0x80000000 L0x4000;

5. monkey = LOAD;

6. donkey = ?RWX;

7. donkey : .bss;

8. donkey = ?RX A0x1000;

9. text = ?RWX V0x80008000;

User-Defined Mapfile

Four separate segments are manipulated in this example. The implicitly declared segment elephant (line 1) receives all of the .bss sections from the files peanuts.o and popcorn.o. Note that *popcorn.o matches any popcorn.o file that may have been entered on the ld command line; the file need not be in the current directory. On the other hand, if /var/tmp/peanuts.o were entered on the ld command line, it would not match peanuts.o because it is not preceded by a *.
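The file name rule used in line 1 can be expressed as a short Python sketch. The function file_matches is a hypothetical helper that models only the comparison described in the text: a mapfile name that begins with an asterisk is compared with the basename of the name given on the ld command line, and any other name must match exactly.

```python
import posixpath

def file_matches(pattern, command_line_name):
    """Match a mapfile file_name against a name from the ld command line."""
    if pattern.startswith("*"):
        return pattern[1:] == posixpath.basename(command_line_name)
    return pattern == command_line_name

print(file_matches("*popcorn.o", "/var/tmp/popcorn.o"))   # basename comparison
print(file_matches("peanuts.o", "/var/tmp/peanuts.o"))    # exact comparison fails
print(file_matches("peanuts.o", "peanuts.o"))
```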

The implicitly declared segment monkey (line 2) receives all sections that are both $PROGBITS and allocatable-executable (?AX), as well as all sections (not already in the segment elephant) with the name .bss (line 3). The .bss sections entering the monkey segment need not be $PROGBITS or allocatable-executable because the section_type and section_flags values were entered on a separate line from the section_name value. (An "and" relationship exists between attributes on the same line as illustrated by $PROGBITS "and" ?AX on line 2. An "or" relationship exists between attributes for the same segment that span more than one line as illustrated by $PROGBITS ?AX on line 2 "or" .bss on line 3.)

The monkey segment is implicitly declared in line 2 with segment_type value LOAD, segment_flags value ?RWX, and default virtual_address, physical_address, length, and alignment values specified per CPU type. In line 4 the segment_type value of monkey is set to LOAD, virtual_address value to 0x80000000 and maximum length value to 0x4000. In line 5 the segment_type value of monkey is again set to LOAD (since the segment_type attribute value does not change, no warning is issued).

Line 6 implicitly declares the donkey segment. The entrance criteria is designed to route all .bss sections to this segment. Actually, no sections fall into this segment because the entrance criteria for monkey in line 3 capture all of these sections. In line 8, the segment_flags value is set to ?RX and the alignment value is set to 0x1000 (since the segment_flags value changes, a warning is issued).

Line 9 changes the segment_flags value to ?RWX and the virtual_address value of the text segment to 0x80008000 (since the segment_flags value changes, a warning is issued).

The example user-defined mapfile in ``User-Defined Mapfile'' is designed to cause warnings for illustration purposes. If you wanted to change the order of the directives to avoid warnings, the example would appear as follows:

1. elephant : .bss : peanuts.o *popcorn.o;

4. monkey = LOAD V0x80000000 L0x4000;

2. monkey : $PROGBITS ?AX;

3. monkey : .bss;

6. donkey = ?RX A0x1000;

5. donkey : .bss;

7. text = V0x80008000;

This order eliminates all warnings.

Mapfile option defaults

The ld command has three built-in segments (text, data, and note) with default segment_attribute_values and corresponding default mapping directives as described under ``Segment Declarations.'' Even though the ld command does not use an actual ``mapfile'' to store the defaults, the model of a ``default mapfile'' helps to illustrate what happens when the ld command encounters your mapfile.

``Default Mapfile'' shows how a mapfile would appear for the ld command defaults. The ld command begins execution behaving as if the mapfile in ``Default Mapfile'' has already been read in. Then ld reads your mapfile and either augments or makes changes to the defaults.

CAUTION: The interp segment, which precedes all others, and the dynamic segment, which follows the data segment, are not shown in ``Default Mapfile'' and ``Simple Map Structure'' because you cannot manipulate them.

text = LOAD ?RX;
text : $PROGBITS ?A!W;
data = LOAD ?RWX;
data : $PROGBITS ?AW;
data : $NOBITS ?AW;
note = NOTE;
note : $NOTE;

Default Mapfile

As each segment declaration in your mapfile is read in, it is compared to the existing list of segment declarations as follows:

1. If the segment does not already exist in the mapfile, but another with the same segment_type value exists, the segment is added before all of the existing segments of the same segment_type.
2. If none of the segments in the existing mapfile has the same segment_type value as the segment just read in, then the segment is added by segment_type value to maintain the following order:
   1. INTERP
   2. LOAD
   3. DYNAMIC
   4. NOTE
3. If the segment is of segment_type LOAD and you have defined a virtual_address value for this LOADable segment, the segment is placed before any LOADable segments without a defined virtual_address value or with a higher virtual_address value, but after any segments with a virtual_address value that is lower.

As each mapping directive in your mapfile is read in, the directive is added after any other mapping directives that you already specified for the same segment but before the default mapping directives for that segment.

Internal map structure

One of the most important data structures in the ELF−based ld is the map structure. A default map structure,corresponding to the model default mapfile mentioned above, is used by ld when the command is executed.Then, if the mapfile option is used, ld parses the mapfile to augment and/or override certain values in thedefault map structure.

A typical (although somewhat simplified) map structure is illustrated in ``Simple Map Structure''.

The ``Entrance Criteria'' boxes correspond to the information in the default mapping directives and the ``Segment Attribute Descriptors'' boxes correspond to the information in the default segment declarations. The ``Output Section Descriptors'' boxes give the detailed attributes of the sections that fall under each segment. The sections themselves are in circles.

Simple Map Structure

ld performs the following steps when mapping sections to segments:


1. When a section is read in, ld checks the list of Entrance Criteria looking for a match. (All specified criteria must match):

   • In ``Simple Map Structure'', for a section to fall into the text segment it must have a section_type value of $PROGBITS and have a section_flags value of ?A!W. It need not have the name .text since no name is specified in the Entrance Criteria. The section may be either X or !X (in the section_flags value) since nothing was specified for the execute bit in the Entrance Criteria.

   • If no Entrance Criteria match is found, the section is placed at the end of the a.out file after all other segments. No program header entry is created for this information. See ``Object files'' for information on program headers.

2. When the section falls into a segment, ld checks the list of existing Output Section Descriptors in that segment as follows:

   • If the section attribute values match those of an existing Output Section Descriptor exactly, the section is placed at the end of the list of sections associated with that Output Section Descriptor.

     For instance, a section with a section_name value of .data1, a section_type value of $PROGBITS, and a section_flags value of ?AWX falls into the second Entrance Criteria box in ``Simple Map Structure'', placing it in the data segment. The section matches the second Output Section Descriptor box exactly (.data1, $PROGBITS, ?AWX) and is added to the end of the list associated with that box. The .data1 sections from fido.o, rover.o, and sam.o illustrate this point.

   • If no matching Output Section Descriptor is found, but other Output Section Descriptors of the same section_type exist, a new Output Section Descriptor is created with the same attribute values as the section and that section is associated with the new Output Section Descriptor. The Output Section Descriptor (and the section) are placed after the last Output Section Descriptor of the same section_type. The .data2 section in ``Simple Map Structure'' was placed in this manner.

   • If no other Output Section Descriptors of the indicated section_type exist, a new Output Section Descriptor is created and the section is placed so as to maintain the following section_type order:

        $DYNAMIC
        $PROGBITS
        $SYMTAB
        $STRTAB
        $RELA
        $REL
        $HASH
        $NOTE
        $NOBITS

     The .bss section in ``Simple Map Structure'' illustrates this point.

   NOTE: If the input section has a user-defined section_type value (that is, between SHT_LOUSER and SHT_HIUSER) it is treated as a $PROGBITS section. No method exists for naming this section_type value in the mapfile, but these sections can be redirected using the other attribute value specifications (section_flags, section_name) in the entrance criteria. See ``Sections''.

3. If a segment contains no sections after all of the command line object files and libraries have been read in, no program header entry is produced for that segment.

NOTE: Input sections of type $SYMTAB, $STRTAB, $REL, and $RELA are used internally by ld. Directives that refer to these section_types can only map output sections produced by ld to segments.
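The Entrance Criteria test in step 1 can be sketched in C as follows. This is a simplified model rather than ld's actual data structure: all type and function names are hypothetical, only the A, W, and X flags and three section types are represented, and the sketch assumes that the criteria are tried in list order and that the first match wins. The table reproduces the mapping directives of the default mapfile.

```c
#include <stddef.h>
#include <string.h>

enum { SF_ALLOC = 1, SF_WRITE = 2, SF_EXEC = 4 };         /* A, W, X */
enum sec_type { ST_ANY, ST_PROGBITS, ST_NOBITS, ST_NOTE };

struct entrance_criteria {
    const char   *segment;        /* segment the section falls into */
    const char   *section_name;   /* NULL: no name specified */
    enum sec_type section_type;   /* ST_ANY: no type specified */
    unsigned      must_be_set;    /* flags written as ?F */
    unsigned      must_be_clear;  /* flags written as !F */
};

struct section {
    const char   *name;
    enum sec_type type;
    unsigned      flags;
};

/* All specified criteria must match; unspecified ones are ignored. */
static int criteria_match(const struct entrance_criteria *c,
                          const struct section *s)
{
    if (c->section_name != NULL && strcmp(c->section_name, s->name) != 0)
        return 0;
    if (c->section_type != ST_ANY && c->section_type != s->type)
        return 0;
    if ((s->flags & c->must_be_set) != c->must_be_set)
        return 0;
    if ((s->flags & c->must_be_clear) != 0)
        return 0;
    return 1;
}

/* Return the segment name for s, or NULL if no criteria match. */
const char *map_section(const struct entrance_criteria *list, size_t n,
                        const struct section *s)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (criteria_match(&list[i], s))
            return list[i].segment;
    return NULL;
}

/* Mapping directives of the default mapfile. */
const struct entrance_criteria default_criteria[] = {
    { "text", NULL, ST_PROGBITS, SF_ALLOC,            SF_WRITE },  /* ?A!W */
    { "data", NULL, ST_PROGBITS, SF_ALLOC | SF_WRITE, 0 },         /* ?AW  */
    { "data", NULL, ST_NOBITS,   SF_ALLOC | SF_WRITE, 0 },         /* ?AW  */
    { "note", NULL, ST_NOTE,     0,                   0 }
};
```

In this model a $PROGBITS section with flags ?AWX maps to the data segment, as described for .data1 above, and a NULL result corresponds to a section that is placed at the end of the a.out file with no program header entry.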

Error messages

When using the mapfile option, ld can return the following types of error messages:

Warnings
   do not stop execution of the linker nor prevent the linker from producing a viable a.out

Fatal Errors
   stop execution of the linker at the point at which the fatal error occurred

Either warning: or fatal: appears at the beginning of each error message (in the C locale). Error messages are not numbered.

Warnings

The following conditions produce warnings:

• a physical_address or a virtual_address value or a length value appears for any segment other than a LOAD segment (the directive is ignored)

• a second declaration line exists for the same segment that changes an attribute value(s) (the second declaration overrides the original)

• an attribute value(s) (segment_type and/or segment_flags for text and data; segment_type for note) was changed for one of the built-in segments

• an attribute value(s) (segment_type, and/or segment_flags) was changed for a segment created by an implicit declaration

Fatal errors

The following conditions produce fatal errors:

• specifying more than one -M option on the command line

• specifying both the -r and the -M option on the same command line

• specifying the -M option without the -dn option on the command line (-dy is the default; you must specify -dn with the -M option)

• a mapfile cannot be opened or read

• a syntax error is found in the mapfile

  NOTE: ld does not return an error if a file_name, section_name, segment_name or symbol_name does not conform to the rules under the ``Mapfile Structure and Syntax'' section unless this condition produces a syntax error. For instance, if a name begins with a special character and this name is at the beginning of a directive line, ld returns an error. If the name is a section_name (appearing within the directive) ld does not return an error.

• more than one segment_type, segment_flags, virtual_address, physical_address, length, or alignment value appears on a single declaration line

• you attempt to manipulate either the interp segment or dynamic segment in a mapfile

  CAUTION: The interp and dynamic segments are special built-in segments that you cannot change in any way.

• a segment grows larger than the size specified by your length attribute value

• a user-defined virtual_address value causes a segment to overlap the previous segment

• more than one section_name, section_type, or section_flags value appears on a single directive line

• a flag and its complement (for example, A and !A) appear on a single directive line


Enhanced asm facility

Although the ability to write portable code is one reason for using the C language, sometimes it is necessary to introduce machine-specific assembly language instructions into C code. This need arises most often within operating system code that must deal with hardware registers that would otherwise be inaccessible from C. The asm facility makes it possible to introduce this assembly code.

In earlier versions of C the asm facility was primitive. You included a line that looked like a call on the function asm, which took one argument, a string:

asm("assembly instruction here");

Unfortunately this technique has shortcomings when the assembly instruction needs to reference C language operands. You have to guess the register or stack location into which the compiler would put the operand and encode that location into the instruction. If the compiler's allocation scheme changed, or, more likely, if the C code surrounding the asm changed, the correct location for the operand in the asm would also change. You would have to be aware that the C code would affect the asm and change it accordingly.

The new facility presented here is upwardly compatible with old code, since it retains the old capability. In addition, it allows you to define asm macros that describe how machine instructions should be generated when their operands take particular forms that the compiler recognizes, such as register or stack variables.

NOTE: Although this enhanced asm facility is easier to use than before, you are still strongly discouraged from using it for routine applications because those applications will not be portable to different machines. The primary intended use of the asm facility is to help implement operating systems in a clean way.

The optimizer (cc -O) may work incorrectly on C programs that use the asm facility, particularly when the asm macros contain instructions or labels that are unlike those that the C compiler generates. Furthermore, you may need to rewrite asm code in the future to maximize its benefits as new optimization technology is introduced into the compilation system.

NOTE: The C++ compiler also supports the enhanced asm facility. The description and considerations presented here for C also apply to C++.

Definition of terms

The following terms are used in this discussion:

``asm macro''
   An ``asm macro'' is the mechanism by which programs use the enhanced asm facility. asm macros have a definition and uses. The definition includes a set of pattern/body pairs. Each pattern describes the storage modes that the actual arguments must match for the ``asm macro body'' to be expanded. The uses resemble C function calls.

storage mode
   The storage mode, or mode, of an asm macro argument is the compiler's idea of where the argument can be found at run time. Examples are ``in a register'' or ``in memory.''

pattern
   A pattern specifies the modes for each of the arguments of an asm macro. When the modes in the pattern all match those of the use, the corresponding body is expanded.

``asm macro body''
   The ``asm macro body'', or ``body'', is the portion of code that will be expanded by the compiler when the corresponding pattern matches. The body may contain references to the formal parameters, in which case the compiler substitutes the corresponding assembly language code.

Example

The following example uses a machine with an spl instruction for setting machine interrupt priority levels. spl takes one operand, which must be in a register. Nevertheless, it would be convenient to have a function that produces in-line code to set priority levels, uses the spl instruction, and works with register variables or constants.

This example consists of two parts, the definition of the asm macro, and its use.

Definition

Define an asm macro, called SPL:

asm void SPL(newpri)
{
% reg newpri;
	spl newpri
% con newpri;
	movw newpri,%r0
	spl %r0
}

The lines that begin with % are patterns. If the arguments at the time the macro is called match the storage modes in a pattern, the code that follows the pattern line will be expanded.

Use

The table below shows the (assembly) code that the compiler would generate with two different uses of SPL. It uses the following introductory code (along with the above definition):

f() {
	register int i;

code...      matches...    generates...

SPL(i);      % reg         spl %r8

SPL(3);      % con         movw &3,%r0
                           spl %r0


The first use of SPL has a register variable as its argument (assuming that i actually gets allocated to a register). This argument has a storage mode that matches reg, the storage mode in the first pattern. Therefore the compiler expands the first code body. Note that newpri, the formal parameter in the definition, has been replaced in the expanded code by the compiler's idea of the assembly time name for the variable i, namely %r8. Similarly, the second use of SPL has a constant as its argument, which leads to the compiler's choosing the second pattern. Here again newpri has been replaced by the assembly time form for the constant, &3.

Using asm macros

The enhanced asm facility allows you to define constructs that behave syntactically like static C functions. Each asm macro has one definition and zero or more uses per source file. The definition must appear in the same file with the uses (or be #included), and the same asm macro may be defined multiply (and differently) in several files.

The asm macro definition declares a return type for the macro code, specifies patterns for the formal parameters, and provides bodies of code to expand when the patterns match. When it encounters an asm macro call, the compiler replaces uses of the formal parameters by its idea of the assembly language locations of the actual arguments as it expands the code body. This constitutes an important difference between C functions and asm macros. An asm macro can therefore have the effect of changing the value of its arguments, whereas a C function can only change a copy of its argument values.
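The difference between substitution and argument copying can be seen in portable C by comparing a function with a preprocessor macro. This is only an analogy: a preprocessor macro substitutes source text, whereas an asm macro substitutes assembly language operand locations, and all of the names below are hypothetical.

```c
/* An ordinary C function receives a copy of its argument. */
static void bump_by_function(int n)
{
    n += 1;                      /* changes only the local copy */
}

/* A macro is expanded in place, so it operates on the caller's object. */
#define BUMP_BY_MACRO(n) ((n) += 1)

int after_function_call(void)
{
    int x = 5;

    bump_by_function(x);         /* x is passed by value */
    return x;                    /* still 5 */
}

int after_macro_use(void)
{
    int x = 5;

    BUMP_BY_MACRO(x);            /* expands to ((x) += 1) */
    return x;                    /* now 6 */
}
```

An asm macro behaves like the second case: its body refers directly to the location of the actual argument, so an instruction that stores into that location changes the caller's variable.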

The uses of an asm macro look exactly like normal C function calls. They may be used in expressions and they may return values. The arguments to an asm macro may be arbitrary expressions, except that they may not contain uses of the same or other asm macros.

When the argument to an asm macro is a function name or structure, the compiler generates code to compute a pointer to the structure or function, and the resulting pointer is used as the actual argument of the macro.

Definition

The syntactic descriptions that follow are presented in the style used in ``C language compilers''. The syntactic classes type-specifier, identifier, and parameter-list have the same form as in that topic. A syntactic description enclosed in square brackets (``[ ]'') is optional, unless the right bracket is followed by ``+''. A ``+'' means ``one or more repetitions'' of a description. Similarly, ``*'' means ``zero or more repetitions.''

asm macro:
	asm [ type-specifier ] identifier ( [ parameter-list ] )
	{
	[ storage-mode-specification-line
	  asm-body ] *
	}

An asm macro consists of the keyword asm, followed by what looks like a C function declaration. Inside the macro body there are one or more pairs of storage-mode-specification-line(s) (patterns) and corresponding ``asm-body(ies)''. If the type-specifier is other than void, the asm macro should return a value of the declared type.

storage-mode-specification-line:
	% [ storage-mode [ identifier [ , identifier ]* ] ; ]+

A storage-mode-specification-line consists of a single line (no continuation with \ is permitted) that begins with % and contains the names (identifier(s)) and storage mode(s) of the formal parameters. Modes for all formal parameters must be given in each storage-mode-specification-line (except for error). The % must be the first character on a line. If an asm macro has no parameter-list, the storage-mode-specification-line may be omitted.

Storage modes

These are the storage modes that the compiler recognizes in asm macros.

treg
   A compiler-selected temporary register.

ureg
   A C register variable that the compiler has allocated in a machine register.

reg
   A treg or ureg.

con
   A compile time constant.

mem
   A mem operand matches any allowed machine addressing mode, including reg and con.

lab
   A compiler-generated unique label. The identifier(s) that are specified as being of mode lab do not appear as formal parameters in the asm macro definition, unlike the preceding modes. Such identifiers must be unique.

error
   Generate a compiler error. This mode exists to allow you to flag errors at compile time if no appropriate pattern exists for a set of actual arguments.

For the storage mode %con, asm allows you to impose conditions over the value of the constant using the following operators: <, <=, >, >=, !=, ==, %, and !%. For example:

% con x==0
   expand if x equals zero.

% con x<16,y==0
   expand if x is less than 16 and y equals zero.

% con x%4
   expand if x mod 4 is not zero.

% con x!%4
   expand if x mod 4 is zero.
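The meaning of the eight operators can be written out as a small C function. This is a sketch for illustration only: the names con_op and con_condition are hypothetical, and treating a zero divisor as "condition not met" for % and !% is an assumption made here to keep the function well defined, not documented compiler behavior.

```c
enum con_op {
    CON_LT, CON_LE, CON_GT, CON_GE,   /* <  <=  >  >= */
    CON_NE, CON_EQ,                   /* != == */
    CON_MOD, CON_NOTMOD               /* %  !% */
};

/* Return nonzero if the constant x satisfies "x op k". */
int con_condition(long x, enum con_op op, long k)
{
    switch (op) {
    case CON_LT:     return x <  k;
    case CON_LE:     return x <= k;
    case CON_GT:     return x >  k;
    case CON_GE:     return x >= k;
    case CON_NE:     return x != k;
    case CON_EQ:     return x == k;
    case CON_MOD:    return k != 0 && x % k != 0;   /* x mod k is not zero */
    case CON_NOTMOD: return k != 0 && x % k == 0;   /* x mod k is zero */
    }
    return 0;
}
```

For instance, the pattern % con x!%4 corresponds to con_condition(x, CON_NOTMOD, 4), which holds when x is a multiple of 4.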


asm body

The asm body represents (presumed) assembly code that the compiler will generate when the modes for all of the formal parameters match the associated pattern. Syntactically, the asm body consists of the text between two pattern lines (that begin with %) or between the last pattern line and the } that ends the asm macro. C language comment lines are not recognized as such in the asm body. Instead they are simply considered part of the text to be expanded.

Formal parameter names may appear in any context in the asm body, delimited by non-alphanumeric characters. For each instance of a formal parameter in the asm body the compiler substitutes the appropriate assembly language operand syntax that will access the actual argument at run time. As an example, if one of the actual arguments to an asm macro is x, an automatic variable, a string like 4(%fp) would be substituted for occurrences of the corresponding formal parameter. An important consequence of this macro substitution behavior is that asm macros can change the value of their arguments. Note that this is different from standard C semantics.

For lab parameters a unique label is chosen for each new expansion.

If an asm macro is declared to return a value, it must be coded to return a value of the proper type in the machine register that is appropriate for the implementation.

An implementation restriction requires that no line in the asm body may start with %.

Writing asm macros

Here are some guidelines for writing asm macros.

1. Know the implementation. You must be familiar with the C compiler and assembly language with which you are working. You can consult the Application Binary Interface for your machine for the details of function calling and register usage conventions.

2. Observe register conventions. You should be aware of which registers the C compiler normally uses for scratch registers or register variables. An asm macro may alter scratch registers at will, but the values in register variables must be preserved. You must know in which register(s) the compiler returns function results.

3. Handle return values. asm macros may ``return'' values. That means they behave as if they were actually functions that had been called via the usual function call mechanism. asm macros must therefore mimic C's behavior in that respect, passing return values in the same place as normal C functions. Note that float and double results sometimes get returned in different registers from integer-type results. On some machine architectures, C functions return pointers in different registers from those used for scalars. Finally, structs may be returned in a variety of implementation-dependent ways.

4. Cover all cases. The asm macro patterns should cover all combinations of storage modes of the parameters. The compiler attempts to match patterns in the order of their appearance in the asm macro definition.

   There are two escape mechanisms for the matching process. If the compiler encounters a storage mode of error while attempting to find a matching pattern, it generates a compile time error for that particular asm macro call. If the asm macro definition lacks an error storage mode and no pattern matches, the compiler generates a normal function call for a function having the same name as the asm macro. Note that such a function would have to be defined in a different source file, since its name would conflict with that of the asm macro.

5. Beware of argument handling. asm macro arguments are used for macro substitution. Thus, unlike normal C functions, asm macros can alter the underlying values that their arguments refer to. Altering argument values is discouraged, however, because doing so would make it impossible to substitute an equivalent C function call for the asm macro call.

6. asm macros are inherently nonportable and implementation-dependent. Although they make it easier to introduce assembly code reliably into C code, the process cannot be made foolproof. You will always need to verify correct behavior by inspection and testing.

7. Debuggers will generally have difficulty with asm macros. It may be impossible to set breakpoints within the in-line code that the compiler generates.

8. Because optimizers are highly tuned to the normal code generation sequences of the compiler, using asm macros may cause optimizers to produce incorrect code. Generally speaking, any asm macro that can be directly replaced by a comparable C function may be optimized safely. However, the sensitivity of an optimizer to asm macros varies among implementations and may change with new software releases.
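The matching process described under ``Cover all cases'' can be modeled in C as a first-match search with two escape outcomes. This sketch is not the compiler's algorithm: every name in it is hypothetical, an error line is represented by M_ERROR in the first mode slot, and lab identifiers and con conditions are left out.

```c
#include <stddef.h>

#define MAX_PARAMS 4

enum mode { M_TREG, M_UREG, M_REG, M_CON, M_MEM, M_ERROR };
enum outcome { EXPAND_BODY, COMPILE_ERROR, FUNCTION_CALL };

/* One pattern line: a storage mode for each formal parameter. */
struct pattern {
    enum mode modes[MAX_PARAMS];
};

/* Does the mode named in a pattern accept the mode of an actual argument? */
static int mode_matches(enum mode pattern, enum mode actual)
{
    if (pattern == M_MEM)
        return 1;                               /* mem includes reg and con */
    if (pattern == M_REG)
        return actual == M_TREG || actual == M_UREG;
    return pattern == actual;
}

/*
 * Try the patterns in order of appearance.  On EXPAND_BODY, *chosen is
 * the index of the matching pattern.  nparams must not exceed MAX_PARAMS.
 */
enum outcome select_pattern(const struct pattern *pats, size_t npats,
                            const enum mode *actuals, size_t nparams,
                            size_t *chosen)
{
    size_t i, j;

    for (i = 0; i < npats; i++) {
        if (pats[i].modes[0] == M_ERROR)
            return COMPILE_ERROR;               /* reached an error line */
        for (j = 0; j < nparams; j++)
            if (!mode_matches(pats[i].modes[j], actuals[j]))
                break;
        if (j == nparams) {
            *chosen = i;
            return EXPAND_BODY;
        }
    }
    return FUNCTION_CALL;                       /* no pattern and no error line */
}
```

With the two SPL patterns (reg, then con), an argument held in memory matches neither pattern, so this model reports FUNCTION_CALL. Adding a final error line turns that case into COMPILE_ERROR instead.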
