CSCI 330: Programming Language Concepts Instructor: Pranava K. Jha Data Types-II: Composite Data Types

Upload: james-winnett

Post on 14-Dec-2015




Agenda

1. Records and Variant Records
2. Arrays
3. Strings
4. Sets
5. Pointers and Recursive Types
6. Lists


Records

• Allow related data of heterogeneous types to be stored and manipulated together

• Usually laid out contiguously
• Possible holes for alignment reasons
• Smart compilers may rearrange fields to minimize holes (C compilers promise not to)
• Different terms in different languages:
  – Algol 68, C, C++, and Common Lisp: struct
  – Java, C++, C#: class
  – Pascal: record
  – ML, Python, Ruby: lists (no keyword for the declaration)


Examples

In Pascal:

type two_chars = packed array [1..2] of char;
type element = record
    name : two_chars;
    atomic_number : integer;
    atomic_weight : real;
    metallic : Boolean
end;

In C:

#include <stdbool.h>   /* provides bool in C99 and later */

struct element {
    char name[2];
    int atomic_number;
    double atomic_weight;
    bool metallic;
};


Memory Layout of Records

Likely layout in memory for objects on a 32-bit machine

Alignment restrictions lead to the shaded “holes.”
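The holes can be measured directly in C. Here is a minimal sketch of the C `struct element` from the Examples slide. It assumes a C11 compiler, because the test uses `_Alignof`. The helper names `element_padding`, `offset_of_atomic_number`, and `offset_of_atomic_weight` are hypothetical. The exact offsets are implementation-defined, so on a typical 64-bit target they differ from the 32-bit layout described above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Same fields, in the same order, as the C struct element above. */
struct element {
    char name[2];
    int atomic_number;
    double atomic_weight;
    bool metallic;
};

/* Hypothetical helper: bytes of padding ("holes") the compiler inserted,
   i.e., the record size minus the sum of the field sizes. */
size_t element_padding(void)
{
    size_t fields = sizeof(char[2]) + sizeof(int) + sizeof(double) + sizeof(bool);
    return sizeof(struct element) - fields;
}

/* Byte offset of each aligned field from the start of the record. */
size_t offset_of_atomic_number(void) { return offsetof(struct element, atomic_number); }
size_t offset_of_atomic_weight(void) { return offsetof(struct element, atomic_weight); }
```

C keeps fields in declaration order, so `atomic_number` must start at an offset suitable for `int`. On common targets this leaves a hole right after the two-byte `name`.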


Packed Records

Pascal allows the programmer to specify that a record type (or an array, set, or file type) should be packed:

type element = packed record
    name : two_chars;
    atomic_number : integer;
    atomic_weight : real;
    metallic : Boolean
end;
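Standard C has no `packed` keyword. A rough equivalent is sketched below under one assumption: the compiler is GCC, Clang, or MSVC, all of which accept the non-standard `#pragma pack` directive. The names `packed_element` and `packed_size` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#pragma pack(push, 1)   /* compiler extension: no padding between fields */
struct packed_element {
    char name[2];
    int atomic_number;
    double atomic_weight;
    bool metallic;
};
#pragma pack(pop)       /* restore the default alignment rules */

/* With packing, the record size is exactly the sum of the field sizes. */
size_t packed_size(void) { return sizeof(struct packed_element); }
```

Accessing `e.atomic_number` through the struct still works, because the compiler emits whatever unaligned-access code the target needs. However, taking the address of a packed field and using it as an ordinary `int *` is not portable.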


Memory Layout of Packed Records

Likely memory layout for packed records.

The atomic_number and atomic_weight fields are nonaligned, and can only be read or written via multi-instruction sequences.
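The sketch below shows the kind of multi-instruction sequence a compiler might emit on a machine without unaligned loads. It reads a nonaligned 32-bit field with four single-byte loads, combined with shifts and ORs. It assumes the field is stored little-endian, and the name `load_u32_le` is hypothetical.

```c
#include <stdint.h>

/* Read a 32-bit little-endian value starting at any byte address p.
   Each access is a single-byte load, so p need not be 4-byte aligned. */
uint32_t load_u32_le(const unsigned char *p)
{
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}
```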


Memory Layout of Rearranged Records

Rearranging record fields to minimize holes.

By sorting fields according to the size of their alignment constraint, a compiler can minimize the space devoted to holes, while keeping the fields aligned.
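C compilers do not reorder fields, so in C the programmer must do this sorting by hand. The sketch below declares the same four fields twice: once in the original order and once in decreasing order of alignment. Both type names are hypothetical, and the resulting sizes depend on the target.

```c
#include <stdbool.h>
#include <stddef.h>

/* Declaration order from the Examples slide. */
struct element_as_declared {
    char name[2];
    int atomic_number;
    double atomic_weight;
    bool metallic;
};

/* Same fields sorted by alignment requirement, largest first, so each
   field already starts at a suitably aligned offset without a hole. */
struct element_sorted {
    double atomic_weight;
    int atomic_number;
    char name[2];
    bool metallic;
};
```

On a typical x86-64 Linux target, the first layout occupies 24 bytes and the sorted one 16. The only remaining padding in the sorted version is at the end of the record.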


Variant Records

A variant record provides two or more alternative fields or collections of fields, only one of which is valid at any given time.

type element = record
    name : two_chars;
    atomic_number : integer;
    atomic_weight : real;
    metallic : Boolean;
    case naturally_occurring : Boolean of
        true : (
            source : string_ptr;
            prevalence : real;
        );
        false : (
            lifetime : real;
        )
end;
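C has no variant records as such. The usual idiom is a struct that holds an explicit tag next to a `union`. Below is a sketch of the Pascal `element` in that style. It uses hypothetical names (`natural`, `synthetic`, `element_lifetime`) and models `string_ptr` as `const char *`. Nothing in C stops code from reading the wrong union member, so the accessor checks the tag itself.

```c
#include <stdbool.h>
#include <stddef.h>

struct element {
    char name[2];
    int atomic_number;
    double atomic_weight;
    bool metallic;
    bool naturally_occurring;        /* tag: selects the valid variant */
    union {                          /* both variants share this storage */
        struct {                     /* valid when the tag is true */
            const char *source;
            double prevalence;
        } natural;
        struct {                     /* valid when the tag is false */
            double lifetime;
        } synthetic;
    } u;
};

/* Lifetime of a synthetic element, or -1.0 if the tag says the
   lifetime variant is not the valid interpretation. */
double element_lifetime(const struct element *e)
{
    return e->naturally_occurring ? -1.0 : e->u.synthetic.lifetime;
}
```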


Memory Layout of Variants

Likely memory layouts for element variants.

The value of the naturally_occurring field (shown here with a double border) determines which of the interpretations of the remaining space is valid. Type string_ptr is assumed to be represented by a (four-byte) pointer to dynamically allocated storage.


Arrays

• Arrays are the most common and important composite data types

• Unlike records, which group related fields of disparate types, arrays are usually homogeneous

• Semantically, they can be thought of as a mapping from an index type to a component or element type

• A slice or section is a rectangular portion of an array.


Arrays

Array slices (sections) in Fortran 90. Much like the values in the header of an enumeration-controlled loop (Section 6.5.1), a:b:c in a subscript indicates positions a, a+c, a+2c, ... through b. If a or b is omitted, the corresponding bound of the array is assumed. If c is omitted, 1 is assumed. It is even possible to use negative values of c in order to select positions in reverse order. The slashes in the second subscript of the lower right example delimit an explicit list of positions.
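C has no slice syntax, but a loop with a stride reproduces the a:b:c rule. Below is a sketch with a hypothetical `slice_copy`. It uses C's 0-based indices rather than Fortran's default 1-based ones, and it assumes the caller supplies both bounds and a nonzero stride.

```c
#include <stddef.h>

/* Copy src[a], src[a+c], src[a+2c], ... through index b into dst.
   A negative stride c walks backwards, so then a should be >= b.
   Returns the number of elements copied; c must not be 0. */
size_t slice_copy(const int *src, long a, long b, long c, int *dst)
{
    size_t n = 0;
    if (c > 0) {
        for (long i = a; i <= b; i += c)
            dst[n++] = src[i];
    } else if (c < 0) {
        for (long i = a; i >= b; i += c)
            dst[n++] = src[i];
    }
    return n;
}
```

For example, with `v = {10, 20, 30, 40, 50, 60}`, the call `slice_copy(v, 0, 5, 2, out)` selects 10, 30, 50, and `slice_copy(v, 5, 0, -2, out)` selects 60, 40, 20.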


Arrays

Dimensions, Bounds, and Allocation

• Global lifetime, static shape: allocate space for the array in static global memory
• Local lifetime, static shape: space can be allocated in the subroutine’s stack frame at run time
• Local lifetime, shape bound at elaboration time: an extra level of indirection is required to place the space for the array in the stack frame of its subroutine (Ada, C)
• Arbitrary lifetime, shape bound at elaboration time: at elaboration time either space is allocated or a preexistent reference from another array is assigned (Java, C#)
• Arbitrary lifetime, dynamic shape: must generally be allocated from the heap. A pointer to the array still resides in the fixed-size portion of the stack frame (if local lifetime).
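Two of these cases are sketched in C below. A C99 variable-length array has its shape bound at elaboration time and lives in the current stack frame. Note that VLAs became optional in C11, and MSVC does not support them. An array that must outlive the call that creates it comes from the heap instead. The function names are hypothetical.

```c
#include <stdlib.h>

/* Local lifetime, shape bound at elaboration time: n is known only
   when the call begins, and the array lives in the stack frame. */
long sum_of_squares(int n)
{
    if (n <= 0)
        return 0;                 /* a VLA of length 0 is undefined */
    int a[n];                     /* C99 variable-length array */
    long total = 0;
    for (int i = 0; i < n; i++) {
        a[i] = i * i;
        total += a[i];
    }
    return total;
}

/* Arbitrary lifetime: the array outlives this call, so it is
   allocated from the heap. The caller must free() the result. */
int *make_filled(size_t n, int value)
{
    int *a = malloc(n * sizeof *a);
    if (a != NULL)
        for (size_t i = 0; i < n; i++)
            a[i] = value;
    return a;
}
```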


Memory Layout of Arrays

• Arrays in most language implementations are stored in contiguous locations in memory

• Like records, arrays may contain “holes” due to alignment requirements

• Some languages (e.g., Pascal) allow the programmer to specify that an array be packed

• For multidimensional arrays, there are two layouts: row-major order and column-major order
  – In row-major order, consecutive locations in memory hold elements that differ by one in the final subscript (except at the ends of rows).
  – In column-major order, consecutive locations hold elements that differ by one in the initial subscript (except at the ends of columns).
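The two layouts differ only in how a pair of subscripts is turned into an offset. Below is a sketch for an array with `nrows` rows and `ncols` columns, using 0-based subscripts; the function names are hypothetical. C itself stores arrays in row-major order, whereas Fortran uses column-major order.

```c
#include <stddef.h>

/* Position of element [i][j], counted in elements from the start of
   the array, for an array with nrows rows and ncols columns. */
size_t row_major_index(size_t i, size_t j, size_t ncols)
{
    return i * ncols + j;         /* whole rows are stored one after another */
}

size_t column_major_index(size_t i, size_t j, size_t nrows)
{
    return j * nrows + i;         /* whole columns are stored one after another */
}
```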


Row- and Column-major Layout


Strings

• Strings are really just arrays of characters
• They are often special-cased, to give them flexibility (like polymorphism or dynamic sizing) that is not available for arrays in general.
  – It's easier to provide these things for strings than for arrays in general because strings are one-dimensional and (more important) non-circular.


Strings

• In some languages, strings have special status, with operations that are not available for arrays of other sorts.
  – It is easier to provide special features for strings than for arrays in general, because strings are one-dimensional.
  – Manipulation of variable-length strings is fundamental to a huge number of computer applications.
• Particularly powerful string facilities are found in various scripting languages such as Perl, Python, and Ruby.
• C, Pascal, and Ada require that the length of a string-valued variable be bound no later than elaboration time, allowing the variable to be implemented as a contiguous array of characters in the current stack frame.

• Lisp, Icon, ML, Java, C# allow the length of a string-valued variable to change over its lifetime, requiring that the variable be implemented as a block or chain of blocks in the heap.
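When a string variable's length can change, its characters must live in the heap. Below is a minimal C sketch of such a representation, using the hypothetical names `dynstr` and `dynstr_append`. The string is a pointer to a heap block plus its current length and capacity, and the block doubles in size via `realloc` whenever an append does not fit.

```c
#include <stdlib.h>
#include <string.h>

struct dynstr {
    char *data;       /* heap block, always NUL-terminated once non-empty */
    size_t len;       /* characters currently in use */
    size_t cap;       /* bytes allocated, including room for the NUL */
};

/* Append the C string s; returns 0 on success, -1 if out of memory. */
int dynstr_append(struct dynstr *d, const char *s)
{
    size_t add = strlen(s);
    if (d->len + add + 1 > d->cap) {
        size_t cap = d->cap ? d->cap : 8;
        while (d->len + add + 1 > cap)
            cap *= 2;
        char *p = realloc(d->data, cap);   /* realloc(NULL, n) acts as malloc */
        if (p == NULL)
            return -1;
        d->data = p;
        d->cap = cap;
    }
    memcpy(d->data + d->len, s, add + 1);  /* copy including the NUL */
    d->len += add;
    return 0;
}
```

Doubling the capacity keeps the average cost of each append constant, at the price of some unused space in the block.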


Sets

• A set is an unordered collection of an arbitrary number of distinct values of a common type.

• Introduced by Pascal and found in many more recent languages as well.

• Many ways to implement sets, including arrays, hash tables, and various forms of trees.

• The most common implementation employs a bit vector whose length (in bits) is the number of distinct values of the base type.
  – Operations on bit-vector sets can make use of fast logical instructions on most machines.
  – Union is bit-wise or; intersection is bit-wise and; difference is bit-wise not, followed by bit-wise and.
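Below is a sketch of a bit-vector set in C. It assumes the base type is the integers 0..31, so a single 32-bit word holds one bit per possible element. The type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* A set over the base type 0..31: bit x is 1 iff x is in the set. */
typedef uint32_t set32;

set32 set_of(unsigned x)                 { return (set32)1 << x; }   /* {x}, x < 32 */
bool  set_member(set32 s, unsigned x)    { return (s >> x) & 1u; }
set32 set_union(set32 a, set32 b)        { return a | b; }
set32 set_intersection(set32 a, set32 b) { return a & b; }
set32 set_difference(set32 a, set32 b)   { return a & ~b; }  /* not, then and */
```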


Pointers And Recursive Types

• A recursive type is one whose objects may contain one or more references to other objects of the type.

• Pointers serve two purposes:
  – Efficient (and sometimes intuitive) access to elaborated objects (as in C).
  – Dynamic creation of linked data structures, in conjunction with a heap storage manager.

• In languages like C, Pascal, or Ada, which use a value model of variables, recursive types require the notion of a pointer. (Pointers aren't needed with a reference model.)

• In some languages (e.g., Pascal, Ada 83, and Modula-3), pointers are restricted to point only to objects in the heap.
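Below is a minimal C sketch of a recursive type. Because C uses a value model of variables, the reference from one node to the next must be an explicit pointer, and the nodes are created in the heap. The names `node`, `cons_node`, `list_length`, and `free_list` are hypothetical.

```c
#include <stdlib.h>

/* Each node refers to another object of the same type. */
struct node {
    int value;
    struct node *next;                /* NULL marks the end of the list */
};

/* Allocate a node in the heap and link it in front of rest. */
struct node *cons_node(int value, struct node *rest)
{
    struct node *n = malloc(sizeof *n);
    if (n != NULL) {
        n->value = value;
        n->next = rest;
    }
    return n;
}

size_t list_length(const struct node *p)
{
    size_t len = 0;
    for (; p != NULL; p = p->next)
        len++;
    return len;
}

/* Return every node to the heap. */
void free_list(struct node *p)
{
    while (p != NULL) {
        struct node *next = p->next;
        free(p);
        p = next;
    }
}
```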


Pointers (contd.)

• A dangling reference is a live pointer that no longer points to a valid object.

• Two sources of dangling pointers:
  – A pointer in a wider scope still refers to a local object of a subroutine that has returned.
  – The programmer reclaims an object to which pointers still refer.

• Two implementation mechanisms to catch dangling pointers:
  – Tombstones
  – Locks and Keys


Tombstones

• Tombstones are a mechanism for detecting the dangling pointers that can arise in languages such as C, C++, and assembly language, and for containing their dangerous effects.
  – The idea is simple: rather than have a pointer refer to an object directly, introduce an extra level of indirection.
  – When an object is allocated, the language run-time system allocates a tombstone.
  – The pointer contains the address of the tombstone; the tombstone contains the address of the object.
  – When the object is reclaimed, the tombstone is modified to contain a value that cannot be a valid address.
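Below is a minimal C sketch of the tombstone idea. The hypothetical functions `ts_alloc`, `ts_deref`, and `ts_free` stand in for the run-time system, and NULL plays the role of "a value that cannot be a valid address". The sketch never reclaims the tombstones themselves; deciding when that is safe is a separate problem.

```c
#include <stdlib.h>

/* The tombstone is the only cell that points at the object. */
struct tombstone {
    void *object;     /* NULL once the object has been reclaimed */
};

/* Allocate an object and its tombstone; programs hold only the tombstone. */
struct tombstone *ts_alloc(size_t size)
{
    struct tombstone *t = malloc(sizeof *t);
    if (t == NULL)
        return NULL;
    t->object = malloc(size);
    if (t->object == NULL) {
        free(t);
        return NULL;
    }
    return t;
}

/* Follow the extra level of indirection; NULL signals a dangling pointer. */
void *ts_deref(const struct tombstone *t)
{
    return t->object;
}

/* Reclaim the object but keep the tombstone, marked dead, so every
   remaining pointer to it can still detect the error. */
void ts_free(struct tombstone *t)
{
    free(t->object);
    t->object = NULL;
}
```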

Locks and Keys

• Every pointer is a tuple consisting of an address and a key.
  – Every object in the heap begins with a lock.
  – A pointer to an object in the heap is valid only if the key in the pointer matches the lock in the object.
  – When the run-time system allocates a new heap object, it generates a new key value, which is stored both in the object's lock and in the new pointer.
  – When an object is reclaimed, its lock is changed to some arbitrary value (e.g., zero) so that the keys in any remaining pointers will not match.
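Below is a minimal C sketch of locks and keys, using the hypothetical names `lk_alloc`, `lk_deref`, and `lk_free`. Key 0 is reserved to mean "reclaimed". The sketch makes one assumption to stay free of undefined behavior: reclaimed blocks are never returned to `malloc`, so their lock words stay readable. A real run-time system would instead recycle the blocks within its own heap.

```c
#include <stddef.h>
#include <stdlib.h>

/* Header at the start of every heap object: the lock word, padded so
   the data that follows it is suitably aligned for any type. */
struct lk_object {
    union {
        unsigned lock;
        max_align_t align;
    } h;
};

/* Every pointer is an (address, key) pair. */
struct lk_ptr {
    struct lk_object *addr;
    unsigned key;
};

static unsigned next_key = 1;         /* 0 is reserved for "reclaimed" */

struct lk_ptr lk_alloc(size_t size)
{
    struct lk_ptr p = { malloc(sizeof(struct lk_object) + size), 0 };
    if (p.addr != NULL) {
        p.key = next_key++;
        p.addr->h.lock = p.key;       /* same key in the lock and the pointer */
    }
    return p;
}

/* The object's data, or NULL if the key no longer matches the lock. */
void *lk_deref(struct lk_ptr p)
{
    if (p.addr == NULL || p.addr->h.lock != p.key)
        return NULL;
    return p.addr + 1;                /* data starts just past the header */
}

void lk_free(struct lk_ptr p)
{
    p.addr->h.lock = 0;               /* no remaining key can match now */
    /* Block deliberately not passed to free(): see the note above. */
}
```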


Garbage Collection

• The language implementation notices when objects are no longer useful and reclaims them automatically

• More or less essential for functional languages
  – delete is a very imperative sort of operation
  – The ability to construct and return arbitrary objects from functions requires unlimited extent and hence heap allocation to accommodate it

• Popular for imperative languages as well; e.g., in Clu, Cedar, Modula-3, Java, C#, and all the major scripting languages.

• A typical tradeoff between convenience and safety on the one hand and performance on the other.
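Garbage collectors come in many forms. Below is a minimal sketch of one classic technique, mark-and-sweep. It runs over a hypothetical fixed pool of objects, each holding up to two references. Everything reachable from the roots is marked, and every unmarked object is reclaimed. A real collector finds the roots and references automatically; here the caller passes the roots explicitly.

```c
#include <stdbool.h>
#include <stddef.h>

#define POOL_SIZE 16

struct obj {
    bool in_use;                 /* allocated and not yet reclaimed */
    bool marked;                 /* reached during the mark phase */
    struct obj *ref[2];          /* references to other objects, or NULL */
};

static struct obj pool[POOL_SIZE];

struct obj *gc_alloc(void)
{
    for (size_t i = 0; i < POOL_SIZE; i++)
        if (!pool[i].in_use) {
            pool[i] = (struct obj){ .in_use = true };
            return &pool[i];
        }
    return NULL;                 /* pool exhausted: a real system would collect here */
}

static void mark(struct obj *o)
{
    if (o == NULL || o->marked)
        return;                  /* already-marked check also stops on cycles */
    o->marked = true;
    mark(o->ref[0]);
    mark(o->ref[1]);
}

/* Reclaim every object not reachable from roots; return how many. */
size_t gc_collect(struct obj **roots, size_t nroots)
{
    for (size_t i = 0; i < nroots; i++)
        mark(roots[i]);
    size_t freed = 0;
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (pool[i].in_use && !pool[i].marked) {
            pool[i].in_use = false;
            freed++;
        }
        pool[i].marked = false;  /* reset for the next collection */
    }
    return freed;
}
```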


Lists

• Defined recursively as either the empty list or a pair consisting of an object (which may be either a list or an atom) and another (shorter) list

• Ideally suited to programming in functional and logic languages.

• Several scripting languages, notably Perl and Python, provide extensive list support
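The recursive definition of a list maps directly onto Lisp-style pairs. Below is a sketch in C, with hypothetical names throughout. The empty list is NULL, an atom is an int, and a pair holds an object (atom or list) and the rest of the list. `count_atoms` follows the definition case by case.

```c
#include <stdlib.h>

/* An object is an atom (here, an int) or a pair; the empty list is NULL. */
struct value {
    enum { ATOM, PAIR } kind;
    int atom;                    /* used when kind == ATOM */
    struct value *car;           /* PAIR: the first object (atom or list) */
    struct value *cdr;           /* PAIR: the rest of the list */
};

static struct value *new_value(void)
{
    struct value *v = calloc(1, sizeof *v);
    if (v == NULL)
        abort();                 /* keep the sketch short: no recovery */
    return v;
}

struct value *mk_atom(int x)
{
    struct value *v = new_value();
    v->kind = ATOM;
    v->atom = x;
    return v;
}

struct value *cons(struct value *first, struct value *rest)
{
    struct value *v = new_value();
    v->kind = PAIR;
    v->car = first;
    v->cdr = rest;
    return v;
}

/* Number of top-level elements. */
size_t length(const struct value *list)
{
    size_t n = 0;
    for (; list != NULL; list = list->cdr)
        n++;
    return n;
}

/* Atoms at any depth: empty list -> 0, atom -> 1, pair -> car + cdr. */
size_t count_atoms(const struct value *v)
{
    if (v == NULL)
        return 0;
    if (v->kind == ATOM)
        return 1;
    return count_atoms(v->car) + count_atoms(v->cdr);
}

void free_value(struct value *v)
{
    if (v == NULL)
        return;
    if (v->kind == PAIR) {
        free_value(v->car);
        free_value(v->cdr);
    }
    free(v);
}
```

For the list (1 (2 3) 4), `length` counts three top-level elements and `count_atoms` counts four atoms.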