Data Representation, Data Structures, and Multi-file Compilation
Data Representation:
Binary representation
Octal, Hexadecimal
Data types
Memory concepts
Every piece of information stored on a computer is encoded as a combination of ones and zeros.
These ones and zeros are called bits.
One byte is a sequence of eight consecutive bits.
A word is some number (typically 4) of consecutive bytes.
Binary representation
A single (unsigned) byte of memory, drawn with bit 0 (the least significant bit) on the left:
bit 0  bit 1  bit 2  bit 3  bit 4  bit 5  bit 6  bit 7
  1      0      0      1      0      1      1      1
In decimal representation, this number is:
1*2^0 + 0*2^1 + 0*2^2 + 1*2^3 + 0*2^4 + 1*2^5 + 1*2^6 + 1*2^7 = 233
Binary representation
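A single (signed) byte of memory, drawn with bit 0 on the left and bit 7 holding the sign:
bit 0  bit 1  bit 2  bit 3  bit 4  bit 5  bit 6  bit 7
  1      0      0      1      0      1      1     +/-
In decimal representation, this number is:
1*2^0 + 0*2^1 + 0*2^2 + 1*2^3 + 0*2^4 + 1*2^5 + 1*2^6 = +/- 105
One bit must be used to store the sign of the number. (Storing the sign in its own bit like this is called sign-and-magnitude; most machines today use two's complement for signed integers, but they still give up one bit's worth of range to the sign.)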
Binary representation, cont.
What is the range of numbers that can be stored in a single signed/unsigned byte?
How would you write a program to convert an arbitrary base 10 number to binary?
How would you write a program to convert an arbitrary binary number to base 10?
What is the effect of right/left shifting bits (assuming the vacated bit is set to zero)?
Octal representation
Octal representation: base 8
Just a simple extension of binary and decimal, but using only the digits 0-7.
Best seen with an example:
What is the value of the octal number 711?
1*8^0 + 1*8^1 + 7*8^2 = 457
What is the octal representation of the number 64?
100 (since 0*8^0 + 0*8^1 + 1*8^2 = 64)
Try this in C using the "%o" format expression with printf: printf("%o\n", 457);
Hexadecimal representation
Hexadecimal representation: base 16
Just a simple extension of binary, octal, and decimal, but using 16 "digits": 0-9,a,b,c,d,e,f
Example: What is the value of the hexadecimal number 10ef?
15*16^0 + 14*16^1 + 0*16^2 + 1*16^3 = 4335
Try this in C using the "%x" format expression with printf: printf("%x\n", 4335);
Understanding datatypes at a more fundamental level
int and char
char revisited
Before doing some example bitwise operations, we first revisit our simple C datatypes to understand them at a deeper level.
Recall that we have just a few basic types:
char, int, float, double
Recall also that char represents a single byte of storage, while int is typically 4 bytes
Important: Do not be misled by the name "char"; the char datatype is really no different from int (other than its storage capacity).
What do I mean by "no different from int"? We explore this with some examples on the next slide
Char vs. int
Consider the following declarations: int j = 4; char k = 4;
In memory, these appear as (most significant bit on the left):
j: 00000000 00000000 00000000 00000100
k: 00000100
They are both perfectly valid ways to represent the number 4.
In one case (int), there is much more "wasted" memory.
In the other case (char), there is a much stricter limit on how large the number can be if you choose to change it.
Char, cont.
Why would you not always use char to represent a small number, such as 4?
Consider what happens in this case:
char j = 4;
j = j + 300; /* bad! Can't store 304 in a char! */
So, it is safer to use a larger type, such as int, unless you are 100% sure that the char limit will never be exceeded in the program!
Char as "character" storage
So, if char is just an abbreviated int, what does it have to do with characters?
The answer is twofold:
First, char can do nothing special with characters that int can't do.
Both store the equivalent ASCII integer code when single quotes are placed around a single character in an assignment.
Example:
char c = 'e'; /* store the integer (ASCII) code for the character e in the byte c */
int c = 'e'; /* same as above, but store the integer in a 4-byte (i.e. int) sequence */
Char example
The best way to understand this is with a simple example.
/* char_int1.c */
#include <stdio.h>

int main(void)
{
    char c;
    int j;

    j = 100;
    c = 100;                   /* arbitrary choice that fits in a char */
    printf("%d %d\n", j, c);   /* print j and c as decimal ints */
    printf("%c %c\n", j, c);   /* print j and c as characters */

    j = 'h';
    c = 'h';                   /* change assignment */
    printf("%c %c\n", j, c);   /* what is printed here? */
    printf("%d %d\n", j, c);   /* print ASCII code for 'h' */
    return 0;
}
#include <stdio.h>
#include <stdlib.h>   /* atoi, exit */

int main(int argc, char* argv[])
{
    int input;

    if (argc != 2) {
        printf("%s\n", "Must enter a single argument");
        exit(1);
    }
    input = atoi(argv[1]);   /* grab input as integer */
    if (input > 255 || input < 0) {
        printf("%s\n", "Must enter a number from 0 to 255");
        exit(1);
    }
    printf("%s: %c\n", "The corresponding character is", input);
    return 0;
}
#include <stdio.h>
#include <stdlib.h>   /* exit */

int main(int argc, char* argv[])
{
    char input;

    if (argc != 2) {
        printf("%s\n", "Must enter a single argument");
        exit(1);
    }
    input = *argv[1];   /* grab the first character of the command-line argument */
    printf("%s %c: %d\n", "The ascii code for", input, input);
    return 0;
}
Note: We will not understand why the * needs to be here until we study pointers. However, you should be able to write equivalent code using scanf.
Very low-level stuff
Bitwise operations in C
Bitwise operations
C contains six operators for performing bitwise operations on integers:
& Bitwise AND: if both bits are 1, the result is 1
| Bitwise OR: if either bit is 1, the result is 1
^ Bitwise XOR (exclusive OR): if one and only one bit equals 1, the result is 1
~ Bitwise invert (NOT): if the bit is 1, the result is 0; if the bit is 0, the result is 1
<< n Left shift n places
>> n Right shift n places
Bitwise operations
Bitwise operations are considered "low-level" programming by today's standards. For many programs, manipulating individual bits is never necessary.
Sometimes, this level of control is needed for memory or performance optimization
In any case, it is very important for a conceptual understanding of programming
Bitwise examples: AND
Bitwise AND:
char j = 11; char k = 14;
j:     0 0 0 0 1 0 1 1
k:     0 0 0 0 1 1 1 0
---------------------
j & k: 0 0 0 0 1 0 1 0 = 10
OR
Bitwise OR:
char j = 11; char k = 14;
j:     0 0 0 0 1 0 1 1
k:     0 0 0 0 1 1 1 0
---------------------
j | k: 0 0 0 0 1 1 1 1 = 15
XOR
Bitwise XOR:
char j = 11; char k = 14;
j:     0 0 0 0 1 0 1 1
k:     0 0 0 0 1 1 1 0
---------------------
j ^ k: 0 0 0 0 0 1 0 1 = 5
Shifting
Bitwise invert:
char j = 11;
j:  0 0 0 0 1 0 1 1
~j: 1 1 1 1 0 1 0 0 = 244 (read as an unsigned byte)
Shifting:
char j = 11;
j << 1: 0 0 0 1 0 1 1 0 = 22
j >> 1: 0 0 0 0 0 1 0 1 = 5
Data Structures and Algorithms
Sorting
Comes up all the time
Demonstrates important techniques
Can be done many ways, using different algorithms.
Bubble Sort
Very simple
Terrible
Go through list, swapping out-of-order neighbors
Continue until no more swaps
Bubble Sort
N = number of items
If the number that belongs first starts at the bottom of the list, have to go through list N times
Each time, looking/maybe swapping N times
Total of N^2 operations
S..L..O..W.. for long lists
But if list is very nearly sorted, can be quick.
No one would really use this algorithm.
Insertion sort
About as simple, but better
Way most people sort cards
Keep inserting in order
Still ~N^2, but faster on average
Data Structures
Both these methods are very array-based
Have to look through half/most/all of list each iteration
Definitely need ~N iterations
Doomed to be fairly slow
For faster techniques, need different ways of looking at data.
Binary Trees
A binary tree is either empty, or consists of a node with a left and a right child.
Left and right children are binary trees
Complete Binary Trees
In a complete binary tree, every node has either 2 or 0 children, and all nodes with 0 children (`leaf nodes') are on the bottom level.
A complete binary tree with L levels has 2^L - 1 nodes;
One with N nodes has log2(N+1) levels
Heaps
A binary tree with values (`keys') stored at each node.
Almost complete binary tree
Partial ordering: root's key is less than (or equal to) the key of either child, and both children are roots of heaps
Storing a heap in an array
Can easily store a heap in an array
Parent node i has left child (2*i+1) and right child (2*i+2).
Why bother?
Putting things in this partial order easier than sorting
Very easy to find lowest value in data once data is in heap
This is useful:
Priority queue
Sorting!
Heap Sort teaser
Get data into heap
Top value is lowest value.
Delete top value; re-heap
Repeat until no more data
Results are sorted list!
Heap Operations: insert
Put # into existing heap:
Put number in first available leaf node.
If it is smaller than its parent, the parent's tree is no longer a heap, so swap the two.
Then repeat this process until no swap is needed or you hit the root.
Heap Operations: delete root
Take bottom-most value from the tree, put it where root used to be
Remove that node.
Go down heap, swapping node with its smaller child whenever the node is larger than a child.
Heap Ops: build heap from data
It's much easier to insert into an existing heap than to build one all at once.
Single nodes are always heaps!
Start from bottom, working up, inserting parents into heaps.
Repeat until no more data
Notice:
Heap insert/delete operations take ~lg(N) operations (one per level of the tree).
To build heap, each piece of data needs to be put in; ~ N lg N operations
To pull out sorted list, need to do N operations of a delete which takes ~lg N steps; another N lg N operations.
N lg N is much less than N^2 for large N!!
Heapsort Algorithm:
Build heap from scratch
For each piece of data:
Get root value
Delete from heap
Multiple-File compilation
Why more than one file?
As a program gets bigger, having the whole program in one file quickly gets awkward.
File hard to read
Takes forever to edit a 1M line file!
Hard to re-use code
Have to re-compile entire program even if just small change in one routine
Compilation vs. Linking
Compilation: compile source code into machine language.
Generates object file (.o)
Linking: bring in code from other libraries that we might need
Link in code for printf() from std. C library; link in code for sin() from math library, etc.
Generates an executable
Compilation vs. Linking
If all of program is in one file, the distinction isn't important, and gcc will do the compile/link in one step.
Otherwise, do it separately
Running Average Example
Sort Example