CS 208 s21 — C: Data Representation
1 C Arrays (and Strings)
Like many programming languages, C provides fixed-length arrays. These arrays occupy a contiguous chunk of memory with the elements in index order.
int a[6];        // declaration, stored at 0x10

// indexing:
a[0] = 0x015f;
a[5] = a[0];

// No bounds checking (these out-of-bounds writes are undefined behavior)
a[6] = 0xbad;    // writes to 0x28
a[-1] = 0xbad;   // writes to 0xc

// an array name used in an expression acts as a pointer to its first element
int* p;          // stored at 0x40

// these two lines are equivalent
p = a;
p = &a[0];

// write to a[0]
*p = 0xa;

// see subsection on strings
char str[] = "hi!!!!";
This spreadsheet memory diagram shows one way the above code could affect memory.
Note that this is now a 64-bit example, so memory addresses (and thus the pointer p) are 8 bytes wide.
The row addresses and offsets work the same way as in the previous examples; the rows are just 8 bytes wide instead of 4.
This means the block of 24 bytes holding the 6 ints in the array a takes up 3 adjacent rows.
Trace through the code and study the memory diagram. Ask any questions you have in Slack!
I've put this into C Tutor here, if you'd like to see it in that context (it doesn't show the writes outside the array bounds).
1.1 C Strings
As previously mentioned, a C string is simply an array of char (an array of 1-byte values) ending with a null terminator (0x00 or '\0').
The terminator value is crucial because otherwise C has no way of knowing where the string ends.
If a string is missing the null terminator, code that reads it (such as strlen or printf) will keep reading bytes from memory until it happens to encounter 0x00.
In memory, characters are represented by their ASCII (American Standard Code for Information Interchange) values, which map each letter or symbol to a particular hex value (see the table below or run man ascii in your terminal).
Thus, the string "hi!!!!" is represented by the array of 7 bytes:
0x68 0x69 0x21 0x21 0x21 0x21 0x00
1.1.1 The C string library
#include <string.h> gives you access to some useful functions for working with C strings:
- http://cplusplus.com/reference is an excellent source of documentation
- strlen(s) returns the length of the string s (a char *), not counting the null terminator
- strcpy(dest, src) copies the string at src, including its null terminator, to dest (both arguments are char pointers to the start of a char array)
  - strcpy provides no protection if src is longer than dest; it will simply overwrite whatever is in memory after the end of the dest array
  - fortunately, strncpy lets you provide a maximum number of characters to copy (but if src is at least that long, strncpy does not write a null terminator, so you may need to add one yourself)
2 C Structures
While there are no objects in C, the programmer can still create new data types.
The two ways to create compound data types in C are structures (struct) and unions (union); we won't worry about unions in this course.
A struct is a way of combining different types of data:
// a way of combining different types of data together
struct song {
    char *title;
    int length_in_seconds;
    int year_released;
};

struct song song1;
song1.title = "What is Urinetown?";  // probably my favorite musical
song1.length_in_seconds = 213;
song1.year_released = 2001;
- variable declarations work like any other type:
  struct name name1, *pn, name_ar[3];
- access fields with . or, in the case of a pointer, -> (p->field is shorthand for (*p).field)
- like arrays, struct fields are stored in a contiguous region, and a pointer to a struct points to its first byte
  - sizeof(struct song)? 16 bytes (on a 64-bit system: 8 bytes for the title pointer plus 4 bytes for each int)
  - the compiler maintains the byte offset information needed to access additional fields
    - e.g., the compiler knows that length_in_seconds is stored 8 bytes after the start of a song struct and generates assembly code appropriately