Intro to Memory and Arrays in C
I wrote a C language tutorial for Cocoa Dev Central a ways back, but I didn't get into arrays or memory to keep the tutorial approachable. I'll write a formal follow-up soon, but I decided to post some raw materials in the meantime, if only to get corrections in early. So here's a whirlwind tour of arrays and basic C memory. The next entry will discuss more advanced memory techniques. If you're not already comfortable with basic C syntax, I strongly recommend reading the C Language Tutorial for Cocoa at Cocoa Dev Central first.
Basic Memory
The common unit of measurement for memory is the byte. Each type of variable in C consumes a certain number of bytes. For example, a standard int variable generally needs 4 bytes of memory. This can, however, vary by OS, processor, and so on. Anyway, let's say I declare a simple int variable:
int monstersUnderBed;
On a G4 running Tiger, this variable uses 4 bytes of memory. This memory can come from one of two places: the data segment or the stack. These terms aren't exactly intuitive, but the difference is incredibly simple.
If a variable is declared outside of a function, it is typically considered "global," meaning that any function can access it. Global variables are stored in a special area in the program's memory called the data segment. This memory is in use for as long as the program is running.
The stack is used to store variables that are only used inside of a function. Stack memory is temporary. Once a function finishes running, all of the memory for its variables is freed. This cycle happens each time the function is called. There is an exception to this, but we're not there quite yet.
Here's a simple program that uses both a global variable and a stack variable (I need something to count and I just watched Monsters, Inc. the other night, thus the monsters):
#include <stdio.h>
// this global variable resides in the data segment
int globalMonsters = 2;
void addFourMonsters (void)
{
    // this variable uses memory in the stack
    int stackMonsters = 4;
    // we add the value of the stack variable
    // to the global variable
    globalMonsters += stackMonsters;
}
int main (void)
{
    printf ("Global monsters: %i\n", globalMonsters);
    addFourMonsters();
    printf ("Global monsters: %i\n", globalMonsters);
    addFourMonsters();
    printf ("Global monsters: %i\n", globalMonsters);
    return 0;
}
This gives us the following output:
Global monsters: 2
Global monsters: 6
Global monsters: 10
All that's going on here is we're creating a stackMonsters variable and adding the value of it to the globalMonsters variable. Not rocket science.
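To see the difference in lifetime directly, here's a minimal sketch (the names globalVisits and visitCloset are made up for illustration) in which a stack variable starts over on every call while a global keeps its value between calls:

```c
#include <stdio.h>

// global: lives in the data segment for the whole run
int globalVisits = 0;

// hypothetical helper (not part of the program above): bumps a
// stack counter and a global counter, then returns the stack one
int visitCloset (void)
{
    // a fresh stack variable is created on every call,
    // so it starts over at 0 each time
    int stackVisits = 0;

    stackVisits++;
    globalVisits++;

    printf ("Stack visits: %i, global visits: %i\n", stackVisits, globalVisits);
    return stackVisits;
}
```

Calling visitCloset() twice prints "Stack visits: 1, global visits: 1" and then "Stack visits: 1, global visits: 2". The stack variable never gets past 1 because its memory is reclaimed when the function returns.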
Simple values like ints, floats and single char variables don't need any special management. They'll automatically be cleaned up at the end of the function or when the program exits.
Arrays
If you're used to Cocoa's NSArray class or arrays in a scripting language like JavaScript, you'll be amazed at how primitive C's arrays are. They're literally just a series of individual values. In a basic case, you create an array of a fixed size, like this:
int myIntArray[5];
This declares an array which holds five integer values, so it takes as much memory as five individual ints:
int (4 bytes) x 5 = 20 bytes
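Rather than assuming an int is 4 bytes, you can ask the compiler with the sizeof operator. This sketch uses two hypothetical helper functions. Note that sizeof reports the whole array's size only when it's applied to the array itself, in the scope where the array was declared.

```c
#include <stddef.h>

// hypothetical helper: total bytes used by a five-int array
size_t fiveIntArrayBytes (void)
{
    int myIntArray[5];

    // sizeof on the array itself gives 5 * sizeof(int),
    // which is 20 bytes when an int is 4 bytes
    return sizeof myIntArray;
}

// hypothetical helper: element count, recovered by dividing
// the total size by the size of a single element
size_t fiveIntArrayCount (void)
{
    int myIntArray[5];
    return sizeof myIntArray / sizeof myIntArray[0];
}
```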
The C language doesn't provide any way to resize basic arrays. You can do it manually, but it requires a slightly more advanced understanding of C memory management that we haven't touched on yet.
There are libraries that will do array management for you (and C++ has its own solution), but the basic C array is the one that's used most widely. Setting values in an array is dead simple:
int myArray[5];
myArray[0] = 99;
myArray[1] = 120;
myArray[2] = 22;
myArray[3] = 8287;
myArray[4] = 0;
Although you can't change the size of the array, you can change the value of any element at any time, in any order. It doesn't have to be sequential as shown above.
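C also lets you set every element at once with an initializer list when you declare the array. The following sketch (sumAfterUpdates is a hypothetical name) builds the same five values as above, changes two of them out of order, and adds them up:

```c
// hypothetical helper for illustration
int sumAfterUpdates (void)
{
    // an initializer list fills all five slots in one statement
    int myArray[5] = { 99, 120, 22, 8287, 0 };
    int total = 0;
    int i;

    // elements can be changed in any order
    myArray[4] = 7;   // the last slot first...
    myArray[1] = 42;  // ...then an earlier one

    // array is now { 99, 42, 22, 8287, 7 }
    for ( i = 0; i < 5; i ++ ) {
        total += myArray[i];
    }
    return total;
}
```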
Loops for Processing Arrays
Once you have an array, you can use a loop to easily process it. The following code creates a five-element array in stack memory, and sets a random value for each slot.
#include <stdio.h>
#include <stdlib.h>   // rand() and srand()
#include <time.h>     // time()
#define COUNT 5
int main (void)
{
    // seed the random number generator with the
    // current time to get the ball rolling
    srand( (unsigned) time(NULL) );
    // this array uses stack memory because it's
    // declared inside of a function. The array
    // size is set by the COUNT constant
    int stackArray[COUNT];
    int i;
    // loop through and insert a random value
    // returned from the rand() function
    for ( i = 0; i < COUNT; i ++ ) {
        stackArray[i] = rand();
    }
    // loop through and print out the values at
    // each slot in the array
    for ( i = 0; i < COUNT; i ++ ) {
        printf ("Value %i: %i\n", i, stackArray[i]);
    }
    return 0;
}
This gives us output similar to the following (remember, the values are random):
Value 0: 204319905
Value 1: 178291782
Value 2: 810292509
Value 3: 1392393136
Value 4: 822135393
The stackArray variable uses stack memory, so the cleanup is automatic. The memory will be freed when the function ends.
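Loops are also the natural way to compute something from every element. This sketch (largestMonsterCount is a made-up name, and the values come from the earlier assignment example) walks a five-element stack array and keeps track of the biggest value seen so far:

```c
#define COUNT 5

// hypothetical helper: returns the largest value in a fixed array
int largestMonsterCount (void)
{
    int stackArray[COUNT] = { 99, 120, 22, 8287, 0 };
    // start with the first element, then compare against the rest
    int largest = stackArray[0];
    int i;

    for ( i = 1; i < COUNT; i ++ ) {
        if ( stackArray[i] > largest )
            largest = stackArray[i];
    }
    return largest;
}
```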
An array doesn't have to be of a predetermined size. A relatively recent advancement in common C programming is the "variable length array," introduced in the C99 standard. I haven't run any statistics, but my guess is that a number of C books on the shelves today don't mention this technique.
The basic idea behind variable length arrays is that the size of the array can be determined on the fly. For example, here's a simple program that creates an array of a random size:
#include <stdio.h>
#include <stdlib.h>   // rand() and srand()
#include <time.h>     // time()
int main (void)
{
    // seed the random number generator with the
    // current time, then get a random number from
    // 1 to 100 (a zero-length array is not allowed)
    srand( (unsigned) time(NULL) );
    int randomNumber = rand() % 100 + 1;
    // create an array of a random size
    int myArray[randomNumber];
    printf("Array size: %i slots\n", randomNumber);
    return 0;
}
The output is something like this:
Array size: 63 slots
This may seem like no big deal if you're used to scripting languages, but given that C is a lower-level language, this is pretty cool.
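Variable length arrays are especially handy when a function's caller decides how much space is needed. Here's a minimal sketch, assuming a C99 compiler (sumOneToN is a hypothetical name), that sizes a stack array from a parameter. Because the array lives on the stack, its maximum size is limited by the stack size, which is typically only a few megabytes, and exceeding it usually crashes the program at run time rather than producing a compile-time error.

```c
// hypothetical helper: sums 1..n using a VLA sized at run time
long sumOneToN (int n)
{
    // a VLA of zero or negative length is undefined behavior,
    // so bail out before declaring the array
    if ( n <= 0 ) return 0;

    int numbers[n];   // stack memory, freed when the function returns
    long total = 0;
    int i;

    for ( i = 0; i < n; i ++ ) numbers[i] = i + 1;
    for ( i = 0; i < n; i ++ ) total += numbers[i];
    return total;
}
```

Keep n modest here; for very large or untrusted sizes, dynamic memory (the subject of the follow-up post) is the safer choice.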
C Strings Are Arrays
In C, a string is an array of char values. As a result, it has to follow all the rules of an array. Here's a simple example:
char siteName[10] = "Theocacao";
What might be surprising here is that I made a ten element array, even though the word "Theocacao" is only nine characters. In C, a string has to be "capped" with a special null character: '\0'. This is known as a "null-terminated string". If you build up the string one character at a time, it looks like this:
// declare the array
char siteName[10];
// add the characters
siteName[0] = 'T';
siteName[1] = 'h';
siteName[2] = 'e';
siteName[3] = 'o';
siteName[4] = 'c';
siteName[5] = 'a';
siteName[6] = 'c';
siteName[7] = 'a';
siteName[8] = 'o';
// add the null terminator to complete the string
siteName[9] = '\0';
// display the string using %s in printf()
printf ("Site name: %s\n", siteName);
So an array for a C string always needs to be at least as long as the character count, plus one additional slot for the null terminator. That's why "Theocacao" needs ten slots, not nine.
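Because the terminator marks the end of the string, code can find a string's length by walking the array until it reaches '\0'. Here's a minimal sketch of that idea (countCharacters is a hypothetical name; the standard strlen() function in string.h does the same job):

```c
// hypothetical helper: counts characters up to the terminator
int countCharacters (void)
{
    char siteName[10] = "Theocacao";
    int length = 0;

    // step through the array until we hit the null terminator
    while ( siteName[length] != '\0' )
        length ++;

    return length;
}
```

The result is 9, not 10: the terminator occupies a slot but isn't counted as a character.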
If you hardcode the string in the program, you can leave both the element count and the null terminator out, so this is fine as well:
char siteName[] = "Theocacao";
The compiler will fill in the correct size at build time.
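You can confirm the size the compiler picked with the sizeof operator. In this sketch (siteNameSlots is a hypothetical name) the result is ten: nine letters plus the terminator the compiler adds automatically.

```c
#include <stddef.h>

// hypothetical helper: how many slots did the compiler allocate?
size_t siteNameSlots (void)
{
    // no size given, so the compiler counts the nine letters
    // plus one slot for the '\0' it appends
    char siteName[] = "Theocacao";
    return sizeof siteName;
}
```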
There are quite a few built-in functions that C provides for dealing with strings, but we'll leave that for another post. You can check out /usr/include/string.h in the meantime if you feel adventurous.
Note: "Sven-S. Porst" points out in the comments that saying a string is "just an array of chars" is an oversimplification. What he says is true, but the goal here is to reduce the basic concepts down to their simplest levels, then build on them later.
Wrap Up
This was a very quick introduction to some intermediate concepts in C programming. So now you should know a thing or two about arrays, as well as the difference between global and stack variables. The follow-up to this post will discuss pointers and dynamic memory management.
[Update: A terminology issue was fixed thanks to a gdb tutorial by Peter Jay Salzman.]
Posted Feb 21, 2006 — 46 comments below
ssp — Feb 21, 06 803
I guess I've seen my address being broken by too many pieces of software.
Scott Stevenson — Feb 21, 06 804
One step at a time there, dude. :) You can't teach everything at once.
Carl — Feb 21, 06 805
new, right?
Scott Stevenson — Feb 21, 06 806
You're at least part right, and you've exposed a mistake in the tutorial. The heap is where malloc gets its memory, but global variables are actually stored in the data segment. I believe the 'new' bit you refer to is specific to C++ objects.
Tom Bradford — Feb 21, 06 807
Frank McCabe — Feb 21, 06 808
However, I think that there should be a giant health warning attached to this:
the total size of a stack allocated array is limited by the maximum size of the stack. Typically, there is *no* warning given if you exceed this size.
e.g.:
foo(int len)
{
int array[len];
}
if you call foo with (say) 1200000, then, at least under gcc, the array will be silently given a garbage value and you use it at your own (and your customer's) risk. It is not clear what the maximum safe size of a dynamically sized array is, but gcc seems to limit it to 64K bytes.
The same applies to alloca'd memory - if it's too big you get silent garbage.
Scott Stevenson — Feb 21, 06 810
For better or worse, there's a large quantity of code that uses this approach -- perhaps most notably, many of the examples on ADC. There's nothing to be gained by pretending that's not the case. But maybe I'll add a few more notes on the subject.
I'd probably just eliminate the 'string' talk altogether, because unfortunately, there are just too many different ways of manipulating the string concept in the C/C++ world
I can respect what you're saying, but I just don't agree with the conclusion. It's a matter of walking before you can run.
Scott Stevenson — Feb 21, 06 811
I haven't really used these things a lot so I wasn't aware of that. I'll update the text. Thanks.
Stripes — Feb 22, 06 812
Scott Stevenson — Feb 22, 06 813
I believe the bss is sometimes also called the uninitialized data segment.
Carl — Feb 22, 06 815
I realized that after I wrote. I've never actually used pure C.
Jon — Feb 24, 06 833
Peter Ulvskov — May 28, 06 1338
Has a variation of NSMutableArray been developed that allows this? If not, any advice on how to accomplish this?
Thanks ,
Peter
Scott Stevenson — May 29, 06 1342
You can just use -indexOfObject:
Narayan — Aug 09, 06 1532
Rama Rao B. — Sep 13, 06 1784
deepika — Sep 13, 06 1785
Tim — Oct 21, 06 2119
Right.
Typically, there is *no* warning given if you exceed this size.
Not at compile time, but, depending on the C implementation, you may receive a runtime error when you allocate or initialize the array (remember that you must initialize all stack variables). On a Unix-like system that provides unmapped address space (guard pages) between stack segments, such as Mac OS X, you will typically receive a Segmentation Fault. In an embedded system that does not provide memory protection, you may receive no error at all.
Consider vla.c:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char* argv[])
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s byte-count\n", argv[0]);
        return EXIT_FAILURE;
    }
    unsigned long sz = strtoul(argv[1], NULL, 0);
    char vla[sz];
    printf("Array size: %#lx\n", sizeof vla);
    memset(vla, 0, sizeof vla);
    return EXIT_SUCCESS;
}
Here is the execution log on a G5 Quad running Mac OS X 10.4.8. Note that in some cases you receive the Segmentation Fault upon allocation (before the printf() call) and in some cases you receive the fault during initialization (after the printf() call). This behavior may vary on an Intel system.
% gcc -std=c99 -Wall -Wextra -pedantic -o vla vla.c
% ./vla 0
Array size: 0
% ./vla 1
Array size: 0x10000
% ./vla 0x100000
Array size: 0x100000
% ./vla 0x1000000
Segmentation fault
% ./vla 0x10000000
Array size: 0x10000000
Segmentation fault
% ./vla 0x100000000
Array size: 0xffffffff
Segmentation fault
if you call foo with (say) 1200000, then, at least under gcc, the array will be silently given a garbage value and you use it at your own (and your customer's) risk. It is not clear what the maximum safe size of a dynamically sized array is, but gcc seems to limit it to 64K bytes.
Not sure what is meant by "silently given a garbage value." All stack variables have garbage values until you initialize them. The value 1200000 works fine with the vla.c given above, but this is system and program dependent.
% ./vla 1200000
Array size: 0x124f80
The maximum safe size of a dynamically sized array is at least as large as the maximum safe size of an equivalent statically sized array, and possibly larger. It is both system and program dependent: it depends on the system's maximum stack size (for Unix see getrlimit(2)) and your thread's worst-case stack usage (which is typically not easily determined, and is further complicated by dynamic allocation of stack space).
I have seen no 64K limitation with GCC. The default stack limit on my system is 8MB. As expected, vla.c fails with arrays that are close to 8MB in size (there is some initial stack usage, between 4K and 32K). Increasing the stack limit to 64MB works as expected.
% ./vla 0x700000
Array size: 0x700000
% ./vla 0x800000
Segmentation fault
% ./vla 0x7ff000
Segmentation fault
% ./vla 0x7f8000
Array size: 0x7f8000
% limit stacksize
stacksize 8192 kbytes
% ./vla 0x800000
Segmentation fault
% limit stacksize unlimited
% limit stacksize
stacksize 65536 kbytes
% ./vla 0x800000
Array size: 0x800000
% ./vla 0x3f00000
Array size: 0x3f00000
% ./vla 0x4000000
Segmentation fault
%
Tim — Oct 21, 06 2120
Scott Stevenson — Oct 21, 06 2121
You're the first person that's tried to actually use the formatting to that extent. :) I think the main issue was that code wasn't preformatted for white space, but it is now.
Steve Sadler — Nov 27, 07 5138
I see a lot of people nitpicking your page. It helped me understand, so I thank you.
A different steve
Rico Secada — Dec 29, 07 5301
What you are saying in this tutorial is only partly right. There is no such thing as the data segment or the stack in C! And this is a serious mistake.
The whole point of a high-level language like C is to avoid thinking like a person who is programming in assembler.
The C standard doesn't say anything about this. Some platforms follow the model you suggest, others don't.
If you actually need to know about the data segment or the stack, then you're outside the realm of C programming.
C has such things as automatic storage, static storage, dynamic storage, which in turn have the features ascribed to them in the C standard.
Apart from that - great tutorial!
Best regards, Rico.
Stripes — Dec 31, 07 5302
If you know how C's stack works on your CPU finding stack smashing bugs will be far simpler. If you know how your libc's malloc works finding "use after free", or whatever.
Using this sort of knowledge for "good" will help you debug. A whole lot.
Using it for evil (I know I can use memory after free as long as I don't malloc again, I know mallocs are rounded up to 16 bytes, I know the stack grows down, I know...) will create awesome bugs when somebody tries to use your code on a new CPU, new OS, new C compiler, or with a new libc. If there is any justice in the world the "somebody" will be you, but sadly it isn't always.
I'm on the fence about how much you can use it to optimize (speed or storage size) for without being good or evil :-)
Rico Secada — Jan 06, 08 5317
You are missing the point and you are creating confusion in people who are not skilled in C and who need to understand these issues right :-)
You should correct the tutorial to conform with standard C and address the issues as automatic storage, static storage and dynamic storage, not as "the data segment" or "the stack". We are not dealing with assembler and this tutorial should not be using those terms regarding C and memory management.
If you know how C's stack works on your CPU finding stack smashing bugs will be far simpler. If you know how your libc's malloc works finding "use after free", or whatever.
Using this sort of knowledge for "good" will help you debug. A whole lot.
This is not within the scope of C. If you need to know exactly how "the stack" works, you need to stop working with C and start working with assembler.
Best regards.
Rico.
Holger — Feb 09, 08 5468
Holger
Ishan — Feb 25, 08 5568
mahe — Apr 16, 08 5739
Zack Jordan — Jun 17, 08 6073
Andrew — Nov 04, 08 6522
Pete — Mar 25, 09 6672
Get your own blog. I'm sorry. I see your point, but you're being hypervigilant. Let me provide my perspective.
I started programming in LOGO, then with OneClick, tried to learn C, failed, then learned ActionScript 1, 2 and 3, along with some PHP and MySQL. I am an advanced user of ActionScript now after 9 years in Flash / Flex.
I recently had a client that insisted on gobs of video embedded in a single file (72 to be exact) and my SWF compile times were MASSIVE. At one point, I tipped the scale by adding a single vector rectangle to my symbol library. From that point on, Flash crashed on export, and Flex would mutilate the visual assets.
I later found, after much research, that the problem was that, because Flash uses the Java runtime on the host machine, it's subject to Java's memory situation as well. So what was happening was all my behemoth assets were causing a stack overflow in the JRE. After an obscure trip through the land of "modify your JRE settings with a custom hidden XML file", Flash resumed compiling the library without a hitch.
SO... it's erroneous and a disservice to say that understanding the stack and global data storage are irrelevant to C (i'm using my own vernacular just to peeve a particular individual, because by the time you're programming in C, you'd have to be a clod to not get it when someone explains these things as clearly as Scott does). I guess you're going to say that understanding how data is stored with memory addresses is irrelevant too... because I spent DAYS many years ago wondering why in the hell the CodeWarrior was taking two ints, 2 and 2, and adding them together to get 80 million and something. Why? Because I was adding the addresses, not the values.
If you're programming C, you don't need to know assembly, but you damn well better know what it is, and you damn well better know some crazy $#!+ about memory management, because this stuff is NOT simple, and it IS obscure, and you DO need every iota and perspective possible to benefit and learn. I am finally really learning C and C++ as well as Objective C / C++ to make iPhone games, and I LOVE IT. Though my first attempt at C failed, I can definitely say I am a vastly better ActionScripter because I retained some CS knowledge from learning C.
Robust debugging is something every developer needs to be able to do, and you seldom have an assembly expert on hand to help you out with C. And if you did, and you asked them for help, they'd probably hit you with a big fat RTFM re: stacks and automatic storage and dynamic storage and C standards and OOP and GC and a whack-em stick the size of Rico's ego.
BTW: Box2D on the iPhone / iPod-Touch is... nerd crack. So much fun.
Cheers.
-Pete
James — Apr 01, 09 6677
Rico's wrong - you cannot debug a complex C app without some understanding of stack, stack frames and principles of memory allocation.
So, if you write your code right first time, you have nothing to worry about.
And we know THAT never happens.
ut — Apr 23, 09 6711
void main()
{
int array[500];
array[0] = 3;
}
int array[500];
void main()
{
array[0] = 3;
}
what's the difference between both arrays from a memory allocation point of view?
Gordon Potter — Sep 13, 09 6862
First off thanks for these tutorials. They are nice, clean, and concise. Helpful for taking baby steps into the world of C programming.
Quick elementary question about your first example here:
Loops for Processing Arrays
Where do the functions rand() and srand() come from? I read your C tutorial and I would assume that there is a reference in the stdio.h or time.h but when I search both usr/include/stdio.h and usr/include/time.h I can't find them?
Perhaps I am missing something very elemental here? Are these hardcoded into the compiler?
Thanks,
Gordon
Gordon Potter — Sep 13, 09 6863
Great comment!
I am in a very similar situation as you. Long time Perl, PHP and Actionscript person. Now getting into C and Objective C. I am finding that learning the fundamentals is helping me better understand the higher level patterns and behaviors of stuff like Actionscript. Plus I want to learn how to make things for my beloved OS X.
Scott's basic approach to tutorials is well appreciated by us newbies. Even if they elide some of the finer points of C.
Keep up the great work Scott!
Chuck — Sep 14, 09 6864
NAME
rand, srand, sranddev, rand_r -- bad random number generator
LIBRARY
Standard C Library (libc, -lc)
SYNOPSIS
#include
...followed by a whole bunch more information about what the functions do and how they're used.
By the way, random() and srandom() are better for getting random numbers (which is why the manpage for rand() says it's a "bad random number generator"). rand() and srand() are mainly there for compatibility.
Steffen Frost — Sep 28, 09 6890
#define COUNT 5
This #define topic hasn't been covered yet in the previous required tutorial.
Steffen Frost — Sep 28, 09 6903
Carl — Sep 28, 09 6911
What I came looking for is an introduction to Arrays and how to manage the differences between a regular C style array and an NSArray. You don't go into that except to barely mention NSArray. Since this is geared towards Cocoa, I think your introduction should at least cover creating, copying and destroying NSArrays, and their contents. A separate more detailed tutorial on actually using features/methods of NSArrays would be left for another tutorial.
I needed to learn this stuff for a short app I'm working on using a grid, 15 by 9, and tying myself up with memory management, persistence of data, and sparse data in an array, but not possible (well, nothing is really impossible). I finally went with a 135 element NSString (each element is 2 chars long) which I index arithmetically (j*width+i)*2 using substring to extract my data.
I found NO good tutorials on how to do multi-dimensional arrays with NSArray. I did find a few, but none were really that helpful. I've got it figured out now, but your first tutorial on this was so good, I was disappointed it didn't cover more. So, my comments on adding something on NSArray for the beginners.
Me — Oct 03, 09 6926
Martin — Oct 17, 09 6943
In your example of a variable length array, even though you set the size with a random number, you don't actually exceed the original length you specify. So :-
int stackArray[COUNT];
and
int stackArray[rand()];
are the same at runtime. To demonstrate a variable length array, shouldn't you assign a value past the end of the array after it is set up? Like in the following code :-
In a version of C that does not support variable length arrays, as soon as you try to assign a value in this case to stackArray slot 6 it would crash with a "Segmentation fault"
Your thoughts,
Martin
Martin — Oct 17, 09 6944
#include <stdio.h>
#include <time.h>
#define COUNT 5
main()
{
srand(time(NULL));
int stackArray[COUNT];
int i;
for(i=0; i< COUNT;i++){
stackArray[i] = rand();
}
for(i=0;i<COUNT;i++){
printf("Value %i: %i\n", i, stackArray[i]);
}
stackArray[COUNT] = rand();
printf("Value %i: %i\n", COUNT, stackArray[COUNT]);
}
Bill — Oct 30, 09 6975
int (4 bytes) x 5 = 20 bytes
It's formatted as if it were a code segment but I believe it is just a bit of math? Maybe just show it as normal text.
Bill.
srikanth — Feb 08, 10 7390
I have written the program in this way. The array size is 16 according to the rule. But why I am able to access outside the size and how I am getting those values? I am bit confused in this. Can you please explain me Sir?
#include<stdio.h>
main()
{
int array3[] = {1,2,3,4};
printf("\nSize of array3:%d.", sizeof(array3));
printf("\nValue in array3[0]: %d, array3[1]:%d, array3[2]:%d, array3[3]:%d, array3[4]:%d, array3[5]:%d, array3[6]:%d, array3[7]:%d\n", array3[0], array3[1], array3[2], array3[3], array3[4], array3[5], array3[6], array3[7]);
}
OUTPUT: I got.
Size of array3:16.
Value in array3[0]: 1, array3[1]:2, array3[2]:3, array3[3]:4, array3[4]:1, array3[5]:2, array3[6]:3, array3[7]:4
How this can be possible?