CS 208 s21 — C: Data Representation

1 C Arrays (and Strings)

Like many programming languages, C provides fixed-length arrays. These arrays occupy a contiguous chunk of memory with the elements in index order.

int a[6]; // declaration, stored at 0x10

// indexing:
a[0] = 0x015f; 
a[5] = a[0];

// No bounds checking
a[6] = 0xbad; // writes to 0x28
a[-1] = 0xbad; // writes to 0xc

// in most expressions, an array name acts as a pointer to its first element
int* p; // stored at 0x40

// these two lines are equivalent
p = a;
p = &a[0];

// write to a[0]
*p = 0xa;

// see subsection on strings
char str[] = "hi!!!!";

This spreadsheet memory diagram shows one way the above code could affect memory. Note that this is now a 64-bit example, so memory addresses (and thus the pointer p) are 8 bytes wide. The row addresses and offsets work the same way as in the previous examples; the rows are just 8 bytes wide instead of 4. This means the block of 24 bytes holding the 6 ints in the array a takes up 3 adjacent rows. Trace through the code and study the memory diagram, and ask any questions you have in Slack!

I've put this into C Tutor here, if you'd like to see it in that context (it doesn't show the writes outside the array bounds).

1.1 C Strings

As previously mentioned, a C string is simply an array of char (array of 1-byte values) ending with a null terminator (0x00 or '\0'). The terminator value is crucial because otherwise C has no way of knowing where the string ends. If a string is missing the null terminator, C will keep reading bytes from memory until it happens to encounter 0x00.

In memory, characters are represented by their ASCII (American Standard Code for Information Interchange) values: each letter or symbol corresponds to a particular numeric value, usually written in hex (see the table below or run man ascii in your terminal).

[Figure: ASCII table (ascii.png)]

Thus, the string "hi!!!!" is represented by the array of bytes

0x68	0x69	0x21	0x21	0x21	0x21	0x00

1.1.1 The C string library

  • #include <string.h> gives you access to some useful functions for working with C strings
  • http://cplusplus.com/reference is an excellent source of documentation
  • strlen(s) returns the length of the string (char *) s, not counting the null terminator
  • strcpy(dest, src) copies the string at src to dest (both arguments are of type char *, pointers to the start of a char array)
    • strcpy provides no protection if src is longer than dest; it will simply overwrite whatever is in memory after the end of the dest array
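    • fortunately, strncpy lets you provide a maximum number of characters to copy (but if src is at least that long, strncpy does not write a null terminator, so you must add one yourself)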

2 C Structures

While there are no objects in C, the programmer can still create new data types. The two ways to create data types in C are structures (struct) and unions (union) (we won't worry about unions in this course). A struct is a way of combining different types of data:

// a way of combining different types of data together
struct song {
    char *title;
    int length_in_seconds;
    int year_released;
};
struct song song1;
song1.title = "What is Urinetown?";  // probably my favorite musical
song1.length_in_seconds = 213;
song1.year_released = 2001;
  • variable declarations like any other type: struct name name1, *pn, name_ar[3];
  • access fields with . or -> in the case of the pointer (p->field is shorthand for (*p).field)
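  • like arrays, a struct's fields are stored in a contiguous region of memory, and a pointer to the struct holds the address of its first byte
    • sizeof(struct song)? 16 bytes on a 64-bit system (an 8-byte pointer plus two 4-byte ints)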
    • compiler maintains the byte offset information needed to access additional fields
      • e.g., the compiler knows that length_in_seconds is stored 8 bytes after the start of a song struct and generates assembly code appropriately