Introduction
Hi, once again, it's me, the ex-Microsoft Excel developer. This new article
is an examination of pointers within C# and .NET. It follows the pattern of my
two earlier articles, Strings UNDOCUMENTED and Array UNDOCUMENTED. It's been a
while since those last two, but I will be rolling out more articles in the
near future.
Why talk about pointers? Aren't pointers supposed to be unsafe in C#, hence
requiring the unsafe keyword? Well, yes and no.
The keyword unsafe in C# really means unverifiable: the runtime cannot verify
that an unsafe program is type-safe. In general, it is impossible to determine
whether an arbitrary program is type-safe. But, in .NET, the runtime places
restrictions and conventions on the type of code that can be generated in IL;
if the code adheres to these restrictions and conventions, then it is verified
as type-safe. In actuality, even verified code may not be type-safe, because
some conventions that cannot theoretically be verified are automatically
accepted as type-safe when encountered. Type-safety ensures, for one thing,
that pointers do not refer to inappropriate memory or memory of the wrong
type. Thus, one of these restrictions is that any pointer dereference is
instantly considered unverifiable.
Type-safety is very useful when two programs or libraries share the same
address space; it ensures that one program cannot directly refer to memory
belonging to the other. This is a level of protection that historically could
only be provided by running each program in a separate process with a separate
address space. It is useful for libraries and for code and controls hosted
within other applications or servers such as IIS. Code that is not type-safe
will only be executed if it is "trusted" by the hosting application, or if it
is part of Microsoft's .NET framework.
For stand-alone applications, there are few, if any, disadvantages to using
unsafe code. Managed C++, for instance, can only generate unverifiable code.
If it's good enough for C++, why not C#? However, the use of pointers is
error-prone in any language, and care must be taken. Of course, object
references are pointers, too, but misuse of an object reference quickly
results in a NullReferenceException or InvalidCastException. Regular pointers
do not throw exceptions after an invalid cast. Regular pointers can be
incremented, decremented, or converted to and from an integer type, which
increases the likelihood of referring to invalid memory. Regular pointers can
also end up referring to invalid memory quietly: if the memory allocation is
freed elsewhere, if the garbage collector compacts the heap (thereby shifting
objects downward), or if a stack-based allocation is reclaimed after a
function call.
Disadvantages aside, pointers can do wonders for the performance and
compactness of an application. In addition, because of their flexibility and
few restrictions, they can accomplish tasks that would otherwise be
impossible, particularly when dealing with unmanaged resources and the Win32
API. Pointers, for instance, allow the binary representation of a double or
other valuetype to be serialized and deserialized as a compact stream of bytes
to disk.
I personally avoid the use of pointers, if possible. If I use them, I
encapsulate the pointer-provided functionality in a special class to isolate
the use of pointers for ease of maintenance. The chance of pointer-based code
breaking is relatively high with each new revision of the .NET framework or
each new platform that .NET is extended to, such as Win64, Rotor and the
Compact .NET framework. Having done some contract work for Microsoft just
recently, I have seen the specs on the next version of C#. All I can say is
that many radical improvements are in store for .NET and that there may well
be fundamental changes to the underlying engine that affect the use of
pointers. Of course, you have the ability to bind the application to a
particular version of the framework, but keeping pointer-based code isolated
in a single class will ease any future migration to a later, and better,
framework.
The Pointer Data Type
In this article, I will summarize the various things that can be done with
pointers, both documented and undocumented. I also show how one can emulate
many of the neat features provided by the Managed C++ compiler, but missing in
C#.
We were led to believe in C# that everything is an object. However, this is
not true. Pointers are indeed a recognized type in the CLR, but they are
neither a descendant of Object nor of ValueType; they are a root of their own,
with no member functions, not ToString(), not even GetType().
Being a root, a pointer cannot be converted or boxed to an Object, so all the
polymorphism provided by functions like Console.WriteLine doesn't work on
them. You can obtain the various pointer types using
Type.GetType("System.Void*") or typeof(int*), but you can't dynamically
instantiate one with Activator.CreateInstance(), since that returns an object.
Pointers must be converted to an IntPtr indirectly. An IntPtr is guaranteed
to be the same size as a pointer, which becomes more important as .NET moves
to PDAs, Unix, and Win64. IntPtr.Size contains the current size of a pointer.
To use pointers with the Reflection APIs, the pointers must be wrapped inside
the System.Reflection.Pointer class.
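As a small illustration of the points above, here is a sketch of my own (the class name PointerTypeDemo and its methods are made up for this example). It inspects a pointer type through reflection and round-trips a pointer through System.Reflection.Pointer. It assumes the code is compiled with the unsafe switch.

```csharp
using System;
using System.Reflection;

public static class PointerTypeDemo
{
    // Wraps a pointer in System.Reflection.Pointer (an object), unwraps it
    // again, and reads the value it points to.
    public static unsafe int RoundTrip(int value)
    {
        int local = value;   // a stack local, so no pinning is needed
        object boxed = Pointer.Box(&local, typeof(int*));
        int* unboxed = (int*)Pointer.Unbox(boxed);
        return *unboxed;
    }

    public static unsafe void Main()
    {
        Type t = typeof(int*);
        Console.WriteLine(t.IsPointer);                   // True
        Console.WriteLine(t.GetElementType());            // System.Int32
        Console.WriteLine(IntPtr.Size == sizeof(void*));  // True
        Console.WriteLine(RoundTrip(5));                  // 5
    }
}
```

Note that the wrapper only carries the address; it does nothing to keep the memory behind the pointer alive or in place.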
Pointers require that the developer turn on the unsafe switch in the C#
compiler and declare as unsafe some enclosing scope, such as a class, method,
or block, where a pointer is used. Pointers are not CLS-compliant, so any code
that cares about compliance should prefix the function or class with the
[CLSCompliant(false)] attribute.
Pointers may be neglected by the .NET Framework, but they are a fundamental
data type in IL. There are actually three pointer types in IL (not counting
object, array, or string).
& | | ref, out | managed pointer |
* | System.IntPtr | void *, ... | unmanaged pointer |
typedref | System.TypedReference | typedref | |
I'll go over each of these types.
Unmanaged Pointers
Unmanaged pointers are the standard pointers that you are familiar with. In
IL, they are essentially like integers, except with the ability to dereference
the value contained within them.
In C#, unmanaged pointers can refer only to ValueTypes, and only those
ValueTypes that do not contain embedded object references, also known as
blittable types. A large amount of effort was expended by the C# designers
trying to keep the object data type inaccessible and opaque.
There are a variety of standard operations that can be performed with them.
pointer ++ | Pointer Increment |
pointer -- | Pointer Decrement |
pointer + integer | Pointer Arithmetic |
pointer - pointer | Pointer Difference |
(void *) integer | Integer Casting |
(void *) pointer | Pointer Casting |
In addition, several keywords are also provided.
Stack-based Allocation: stackalloc
char *pointer = stackalloc char[10];
stackalloc allocates memory dynamically from the stack. It's very fast, and
the memory is reclaimed after the enclosing function exits. But it has some
limitations. It only works with blittable types. It's good for arrays that
will never be anything more than a few hundred kilobytes. By default, the
maximum stack size is one megabyte, I believe. In practice, function call
depth is very, very small, unless you have something like a recursive function
scanning a deep tree. Offsetting the risk that a stackalloc can generate a
StackOverflowException is the infrequency of multiple dynamic stack
allocations in use at the same time.
Even for moderately large arrays, stack-based allocation can be preferable to
heap allocation, especially if the allocations are very frequent or very
temporary. The worst type of allocation for the garbage collector is a large
array, especially a temporary one. Large objects (over 85K) are placed
separately in the large object heap and are only released during a full
garbage collection. (That is, the more frequent Gen 0 or Gen 1 partial
collections do not release large memory allocations.) Large objects are never
compacted, resulting in potential fragmentation and wasted memory.
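To make the stackalloc pattern concrete, here is a minimal sketch (the class and method names are my own, and count is assumed to be small). It fills a temporary stack buffer and walks it with pointer arithmetic, without touching the garbage-collected heap.

```csharp
using System;

public static class StackAllocDemo
{
    // Fills a small stack buffer with squares and returns their sum.
    // 'count' is assumed to be small, since the buffer comes out of the
    // calling thread's stack.
    public static unsafe int SumOfSquares(int count)
    {
        int* buffer = stackalloc int[count];
        for (int i = 0; i < count; i++)
            buffer[i] = i * i;

        int sum = 0;
        for (int* p = buffer; p < buffer + count; p++)   // pointer arithmetic
            sum += *p;
        return sum;   // the buffer is reclaimed when the method returns
    }

    public static void Main()
    {
        Console.WriteLine(SumOfSquares(4));   // 0 + 1 + 4 + 9 = 14
    }
}
```

Because the buffer disappears with the stack frame, the pointer must never be stored anywhere that outlives the call.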
Memory Pinning: fixed
fixed (int *p = &obj.member)
{
}
The fixed statement causes an object in the heap to be pinned. For values
stored on the stack, the fixed statement is not necessary and is actually
disallowed.
There are really three different types of fixed assignments, with three
different results in IL.
string referencing | Calls System.Runtime.CompilerServices.RuntimeHelpers.OffsetToStringData |
array referencing | This results in an ldelema instruction in IL. |
field member | This results in an ldflda instruction in IL. |
The fixed statement is exactly equivalent to the following in Managed C++.
int __pin *p = &obj->member;
There is no overhead for entering the fixed block other than the actual
address-of (&) operation. Pinning occurs upon garbage collection, when the
runtime scans information about the local variables contained in the function
metadata and discovers that the variable "p" is marked as a pinned pointer.
After control leaves the fixed block, the only overhead is the assignment of
null to p, so that the original object is no longer pinned.
Pinning has a HUGE cost to the garbage collector. I assume that you are
familiar with the generational algorithm of the garbage collector. Let us say
we have allocated enough memory to fill the Gen 0 heap (the youngest), so that
one more allocation will trigger a collection. If that very last allocation at
the end of the heap is pinned, the pinned object moves to generation 1. (Call
GC.GetGeneration(obj) and see.) Gen 1 is guaranteed to grow to include the
pinned memory at the very end of the Gen 0 heap. Even if all other memory in
Gen 0 were freed, that would still leave a huge unreclaimed gap, and Gen 0
will begin allocating from its previous limit. That is how bad pinning is.
Moral of the story: when you use fixed, do whatever you have to do quickly and
avoid any memory allocation in the process, which can potentially trigger a
garbage collection. If a garbage collection did occur inside a fixed block,
most likely the pinned memory was close to the end of the Gen 0 heap.
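The three kinds of fixed assignment described above (string, array, and field member) can be sketched as follows. The Counter class and the method names are hypothetical, chosen only for this illustration. Each fixed block is kept short and allocation-free, in line with the advice above.

```csharp
using System;

// Hypothetical heap object with a blittable field.
public sealed class Counter { public int Value; }

public static class FixedDemo
{
    // Array referencing: pins the array and walks its elements.
    public static unsafe int Sum(int[] values)
    {
        int sum = 0;
        fixed (int* start = values)
        {
            int* end = start + values.Length;
            for (int* p = start; p < end; p++)
                sum += *p;
        }
        return sum;
    }

    // String referencing: pins the string and reads its characters.
    public static unsafe int CountChar(string text, char ch)
    {
        int count = 0;
        fixed (char* p = text)
        {
            for (int i = 0; i < text.Length; i++)
                if (p[i] == ch) count++;
        }
        return count;
    }

    // Field member: pins the containing object while the field is updated.
    public static unsafe void Increment(Counter counter)
    {
        fixed (int* p = &counter.Value)
        {
            (*p)++;
        }
    }
}
```

The pointer declared in a fixed statement is read-only, which is why the array loop copies it into a second pointer before incrementing.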
Managed Pointers
Managed pointers are available in Managed C++ and IL, but not directly in C#.
The ref and out parameters are each implemented as managed pointers. In
Managed C++, managed pointers are "interior" __gc pointers that point to
ValueTypes, while "whole" pointers are just regular object references.
There is a way to mimic managed pointers through System.TypedReference, but
at a performance cost and the loss of pointer arithmetic. Even though managed
pointers are not directly supported by C#, I will describe a little about
them.
What are the differences between managed pointers and unmanaged pointers?
The main difference is that managed pointers are tracked by the garbage
collector, and unmanaged pointers are not. After a garbage collection, an
object may have moved downward or disappeared as memory is compacted. So, the
data within a heap object pointed to by an unmanaged pointer will most likely
be invalid.
However, there is one exception, at least for the current version of the CLR.
Large arrays (over 85KB) are never moved, so an unmanaged pointer to data
within such an array will remain valid, provided another object reference to
the array exists so that the memory is not freed. But using fixed is still
safer and won't hurt performance.
Managed pointers refer to field members or elements within an object.
Managed pointers can point to members of any type, not just blittable types.
During a garbage collection, objects in the regular heap referred to by
managed pointers are both visited and later adjusted. References to values
that are outside the regular heap are not adjusted, because only objects in
the heap are moved.
The other big difference is that managed pointers can only be used as
function parameters or local variables. They cannot be stored inside
structures or used as global/static variables.
If C# had provided managed pointers, it would have helped with the potential
pinning issue mentioned earlier, which is minor if the operation is performed
quickly; on the other hand, managed pointers can't be stored inside objects
the way unmanaged pointers can. The potential exists for compiler
optimization.
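Since ref parameters are the one place where C# exposes managed pointers, a short sketch may help (the Pair struct and the class name are invented for this example). The same Swap method receives interior pointers into a heap array and into a stack-based struct, and no fixed statement is needed because the garbage collector tracks the references itself.

```csharp
using System;

// Hypothetical value type with two fields to point into.
public struct Pair { public int First; public int Second; }

public static class ManagedPointerDemo
{
    // 'a' and 'b' are managed pointers at the IL level.
    public static void Swap(ref int a, ref int b)
    {
        int temp = a;
        a = b;
        b = temp;
    }

    public static void Main()
    {
        int[] data = { 1, 2 };
        Swap(ref data[0], ref data[1]);          // pointers into a heap array
        Console.WriteLine(data[0]);              // 2

        Pair pair;
        pair.First = 10;
        pair.Second = 20;
        Swap(ref pair.First, ref pair.Second);   // pointers into a stack struct
        Console.WriteLine(pair.First);           // 20
    }
}
```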
System.TypedReference
TypedReferences are basically managed pointers that also include type
information. They map to the typedref datatype in IL. Like managed pointers,
they cannot be used as a member of a structure or in global variables; they
can only be used in declarations for parameters and local variables.
Unlike pointers, TypedReferences are derived from ValueType, which is derived
in turn from Object. However, as with pointers, the runtime will disallow any
conversion of the type to Object or ValueType. Otherwise, this would have
circumvented the restriction that managed pointers cannot be members within
structures and classes, an ability that would greatly hurt the performance of
garbage collection.
Someone has mentioned that TypedReference was provided to support ParamArray
for VB developers. I refuse to believe that typed references aren't very
useful if some developer thought that they merited their own special
undocumented keywords, especially since there are no other undocumented
keywords listed in the source for the C# compiler.
The compiler supports four undocumented keywords, all of which work with
TypedReference: __makeref, __refvalue, __reftype, and __arglist.
TypedReferences are constructed by using __makeref.
int value = 0;
TypedReference typedRef = __makeref(value);
The type and value of a TypedReference are extracted by using __reftype and
__refvalue.
Type t = __reftype(typedRef);
int result = __refvalue(typedRef, int);
To change the value of the memory location referred to by a TypedReference,
one can use __refvalue on the left-hand side of an assignment.
__refvalue(typedRef, int) = 2;
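Putting the three keywords together, here is a minimal sketch (TypedRefDemo and TrySetInt are names I made up). It checks the type carried by a typed reference before writing through it. Keep in mind that these keywords are undocumented, so their availability on any given compiler or runtime is an assumption.

```csharp
using System;

public static class TypedRefDemo
{
    // Writes 'replacement' through the reference only if it refers to an int.
    public static bool TrySetInt(TypedReference target, int replacement)
    {
        if (__reftype(target) != typeof(int))
            return false;                         // wrong type: leave it alone
        __refvalue(target, int) = replacement;    // write through the reference
        return true;
    }

    public static void Main()
    {
        int value = 0;
        double other = 1.5;

        Console.WriteLine(TrySetInt(__makeref(value), 2));   // True
        Console.WriteLine(value);                            // 2
        Console.WriteLine(TrySetInt(__makeref(other), 2));   // False
    }
}
```

Without the __reftype check, applying __refvalue with the wrong type would fail at run time with an InvalidCastException.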
Typed references can be handed to other functions without any pinning, unlike
regular C# pointers that refer to heap objects, which are valid only while the
fixed block of a single function is active. (The C# compiler does not accept
TypedReference as a return type, so typed references travel as parameters.)
Typed references offer a polymorphic alternative to using objects that
eliminates the need for boxing. The tradeoff is that the programmer may need
to explicitly check the type of the reference before reading or assigning to
the reference. Even with the checking, there are still enormous performance
benefits from avoiding excessive copying and the invocation of the garbage
collector.
A classic use is something like obj.SetProperty(propertyname, typedRef) and
obj.GetProperty(propertyname, typedRef), where an object has a number of
"properties" that are indexed by name, each of a different type and not
necessarily of a reference type. One function with one signature suffices to
handle all the possible types.
System.Reflection.FieldInfo has a GetValueDirect(typedRef) and a
SetValueDirect(typedRef, value) that allow valuetypes in unboxed form to be
read and set easily through reflection. Otherwise, it would have been almost
impossible to set the value of a valuetype through reflection, since the
regular function SetValue only works on boxed types.
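The difference between SetValue and SetValueDirect on a struct can be sketched as follows. The Point struct and the helper names are hypothetical, introduced only for this example.

```csharp
using System;
using System.Reflection;

// Hypothetical value type used as the reflection target.
public struct Point { public int X; public int Y; }

public static class DirectFieldDemo
{
    // SetValue boxes 'p', so the write lands in a temporary copy.
    public static Point SetXBoxed(Point p, int x)
    {
        FieldInfo field = typeof(Point).GetField("X");
        field.SetValue(p, x);
        return p;                 // X is unchanged
    }

    // SetValueDirect writes through a typed reference to 'p' itself.
    public static Point SetXDirect(Point p, int x)
    {
        FieldInfo field = typeof(Point).GetField("X");
        field.SetValueDirect(__makeref(p), x);
        return p;                 // X now holds x
    }

    public static void Main()
    {
        Point origin = new Point();
        Console.WriteLine(SetXBoxed(origin, 5).X);    // 0
        Console.WriteLine(SetXDirect(origin, 5).X);   // 5
    }
}
```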
I say "almost" because another way of creating a typed reference is to use
TypedReference.MakeTypedReference(Object target, FieldInfo[] flds). In most
cases, you will just use a one-element array of FieldInfo, but if the desired
field is embedded in a series of structs inside the object, multiple elements
are needed. Once created, reading and writing through a typed reference is
significantly faster than making multiple calls to reflection with
FieldInfo.GetValue and FieldInfo.SetValue. On the other hand, the traditional
reflection approach allows non-public fields to be accessed, whereas
MakeTypedReference includes a ReflectionPermission security attribute to
guard against unauthorized member access.
The other places in the framework where TypedReferences are used include
serialization and the obsolete and hidden System.Variant struct, which was
initially intended to be offered in VB.NET.
In addition, there is an __arglist keyword that is similar to params. It may
have been provided to support both VB and C++ style variable-length
arguments. Instead of passing an array, parameters are stored inline with
their type information. Reading the values requires the use of the
System.ArgIterator struct. (There is a System.ParamArray struct available
too, which takes an arglist and has a GetArg(int i) and a GetCount(). It's
marked as obsolete and considered dead technology, perhaps because it appears
to support a maximum of nine arguments.) String.Concat has an arglist variant
that allows compilers to automatically perform concatenation of multiple
strings with one function call and no need for additional arrays.
Console.WriteLine also has an arglist variant.
Function (__arglist(a,b,c));
If you are looking for a performance advantage from using __arglist over
params, you will be disappointed to find that this argument passing
convention can be an order of magnitude slower than the recommended method of
using params.
public void Function(__arglist)
{
    // Walk the variable argument list; every argument is assumed to be an int.
    ArgIterator iterator = new ArgIterator(__arglist);
    for (int count = iterator.GetRemainingCount(); count > 0; count--)
    {
        TypedReference typedRef = iterator.GetNextArg();
        Console.WriteLine(__refvalue(typedRef, int));
    }
}
TypedReferences can also be constructed and manipulated by static methods
provided in the class.
TypedReferences have several disadvantages, like the lack of pointer
arithmetic, but they do produce type-safe code.
TypedReference parameters are used throughout the framework as a polymorphic
substitute for objects, to prevent unnecessary boxing.
System.Runtime.InteropServices.GCHandle
Managed C++ uses GCHandle and its C++ wrapper gcroot<> to store managed
pointers within unmanaged classes.
GCHandle, located in System.Runtime.InteropServices, is a structure that
contains a handle, which is an index into some global table of object
references, together with information about the reference type: whether it is
a normal, pinned, weak or long weak reference.
A GCHandle is created by using GCHandle.Alloc(object, gcHandleType). The
handle type can be one of GCHandleType.Normal, GCHandleType.Weak,
GCHandleType.WeakTrackResurrection, and GCHandleType.Pinned.
It does have a couple of uses for C# programmers.
1. Unboxing
In IL and Managed C++, unboxing provides a managed pointer to the underlying
value within a boxed object.
In C#, however, during an unboxing operation, the value in the object is
copied over to another variable. Because of this, it's almost impossible to
change the value of a boxed int. Once a ValueType is boxed and stored in the
heap, the contents are virtually immutable unless an interface that allows
its values to be changed has been implemented on the ValueType, since an
object can be cast to an interface.
In contrast, in C++ one can dereference the managed pointer obtained through
the unboxing operation and directly change the content of the boxed form of
the value.
However, the same effect can be achieved in C# with a pinned object from a
GCHandle.
GCHandle h = GCHandle.Alloc(obj, GCHandleType.Pinned);
int *p = (int *) h.AddrOfPinnedObject();
*p = n;
h.Free();
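Wrapped up as a complete program, the snippet above might look like the following sketch (BoxedMutation and SetBoxedInt are hypothetical names). It assumes the boxed object really is an Int32, since nothing checks the type before the pointer write. The try/finally ensures the handle is freed even if something goes wrong.

```csharp
using System;
using System.Runtime.InteropServices;

public static class BoxedMutation
{
    // Overwrites the value inside a boxed Int32 in place.
    // Assumption: 'boxed' holds an Int32; no type check is performed.
    public static unsafe void SetBoxedInt(object boxed, int n)
    {
        GCHandle h = GCHandle.Alloc(boxed, GCHandleType.Pinned);
        try
        {
            int* p = (int*)h.AddrOfPinnedObject().ToPointer();
            *p = n;
        }
        finally
        {
            h.Free();   // unpin as soon as the write is done
        }
    }

    public static void Main()
    {
        object obj = 1;
        SetBoxedInt(obj, 42);
        Console.WriteLine(obj);   // 42
    }
}
```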
2. Directly using GCHandle in place of the WeakReference class
The WeakReference class encapsulates a GCHandle, a struct, into a class. In
addition, it defines a finalizer that calls Free on the handle. The use of
the wrapper class introduces an overhead of 16 additional bytes, none of
which may be freed until a full garbage collection is performed.
If one were to write a WeakArray, where all the elements are weakly referred
to, or a WeakHashMap (a hash table found in Java, where elements drop out
automatically when no longer used), it would be more efficient to store a
GCHandle directly for each element and attach just one finalizer to the
collection instead of one for each item. (A WeakHashMap is similar to a
string intern table, where one can always obtain the canonical form of
multiple equivalent objects, with the added feature of automatic freeing of
unreferenced objects.)
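A WeakArray along those lines could be sketched as below. This is my own minimal illustration, not a tuned implementation; the class and member names are assumptions. It holds one weak GCHandle per slot and uses a single finalizer for the whole collection.

```csharp
using System;
using System.Runtime.InteropServices;

// Sketch of a fixed-size array whose elements are held weakly.
public sealed class WeakArray : IDisposable
{
    private readonly GCHandle[] handles;

    public WeakArray(int length)
    {
        handles = new GCHandle[length];
    }

    // Returns null if the slot is empty or its object has been collected.
    public object this[int index]
    {
        get { return handles[index].IsAllocated ? handles[index].Target : null; }
        set
        {
            if (handles[index].IsAllocated)
                handles[index].Target = value;   // reuse the existing handle
            else
                handles[index] = GCHandle.Alloc(value, GCHandleType.Weak);
        }
    }

    public void Dispose()
    {
        ReleaseHandles();
        GC.SuppressFinalize(this);
    }

    // One finalizer for the collection, rather than one per element.
    ~WeakArray()
    {
        ReleaseHandles();
    }

    private void ReleaseHandles()
    {
        for (int i = 0; i < handles.Length; i++)
            if (handles[i].IsAllocated) handles[i].Free();
    }
}
```

Callers should copy an element into a local (strong) variable and test it for null before use, since the target can be collected at any time.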
Structure Layout
When using pointers, it is helpful to know how the runtime presently
rearranges fields in an object. The layout of an object can be overridden by
using the StructLayoutAttribute with different values of LayoutKind (Auto,
Sequential, and Explicit).
For C# classes, the default behavior is AutoLayout. For C# structs, the
default behavior is sequential, since it is more likely to be used through
P/Invoke.
Auto Layout is designed to let the runtime proceed through garbage collection
as quickly as possible and to reduce the total size of a struct or class.
Different arrangements of fields can result in different structure sizes
because of padding and alignment issues. Int32 values need to be aligned on
4-byte boundaries for optimal processor performance. Int64 values and doubles
require an 8-byte boundary. For primitive value types, the boundary
requirements correspond to the size of the data type.
All object references are sorted first to make it easier for the garbage
collector to visit each reference. Then the field members are sorted from the
most restrictive to the least restrictive alignment requirements, beginning
with double, then int, then short, then byte and bool. Note: A field member
whose type is an 8-byte struct may still be aligned on a 1-byte boundary, if
it consists entirely of byte field members; the alignment of the struct is
based on the alignment requirements of its most restrictive field.
Sequential Layout is pretty straightforward with the following caveats.
Object references are still sorted first. Each member is still properly
aligned with its natural byte boundary, unless a smaller packing size is
requested through the Pack field of StructLayoutAttribute.
Explicit Layout is useful for implementing C++-like unions in C#. But the
FieldOffset attribute needs to be specified for each member.
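As an example of a union built with Explicit Layout, the sketch below (FloatBits and UnionDemo are made-up names) overlays a float and a uint at offset 0, so the bit pattern of a float can be read without any pointer code.

```csharp
using System;
using System.Runtime.InteropServices;

// Both fields start at offset 0, so they share the same four bytes.
[StructLayout(LayoutKind.Explicit)]
public struct FloatBits
{
    [FieldOffset(0)] public float Value;
    [FieldOffset(0)] public uint Bits;
}

public static class UnionDemo
{
    // Returns the raw IEEE 754 bit pattern of a float.
    public static uint ToBits(float f)
    {
        FloatBits u = new FloatBits();   // zero-initialize both views
        u.Value = f;
        return u.Bits;
    }

    public static void Main()
    {
        Console.WriteLine(ToBits(1.0f).ToString("X8"));   // 3F800000
    }
}
```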
Unmanaged Memory Allocations
If you are going to be using pointers, it may make sense to allocate from an
unmanaged heap. For the vast majority of needs, this is no better than using
the garbage collector; but if you are allocating a large amount of memory,
especially for short periods of time, or need better control over when memory
is released, Win32 provides a set of memory management functions.
The class below is based on the one in the ECMA C# Standard specification. I
had created a similar Memory class, which I can't seem to find at the moment,
that includes the Win32 CopyMemory, FillMemory, MoveMemory and ZeroMemory
functions, which are missing below. I will be replacing this sometime soon.
using System;
using System.Runtime.InteropServices;

public unsafe class Memory
{
    // Handle of the process heap, used by every Heap API call below.
    // Handles are pointer-sized, so IntPtr keeps this correct on Win64.
    static readonly IntPtr ph = GetProcessHeap();

    // Static-only class; prevent instantiation.
    private Memory() {}

    // Allocates a zero-filled block of the given size in bytes.
    public static void* Alloc(int size)
    {
        void* result = HeapAlloc(ph, HEAP_ZERO_MEMORY, new UIntPtr((uint)size));
        if (result == null) throw new OutOfMemoryException();
        return result;
    }

    // Copies count bytes from src to dst; the blocks may overlap.
    public static void Copy(void* src, void* dst, int count)
    {
        byte* ps = (byte*)src;
        byte* pd = (byte*)dst;
        if (ps > pd) {
            // Destination is lower in memory: copy forward.
            for (; count != 0; count--) *pd++ = *ps++;
        } else if (ps < pd) {
            // Destination is higher in memory: copy backward from the end.
            for (ps += count, pd += count; count != 0; count--)
                *--pd = *--ps;
        }
    }

    // Frees a block obtained from Alloc or ReAlloc.
    public static void Free(void* block) {
        if (!HeapFree(ph, 0, block)) throw new InvalidOperationException();
    }

    // Resizes a block; any added bytes are zero-filled.
    public static void* ReAlloc(void* block, int size) {
        void* result = HeapReAlloc(ph, HEAP_ZERO_MEMORY, block,
            new UIntPtr((uint)size));
        if (result == null) throw new OutOfMemoryException();
        return result;
    }

    // Returns the size of a block in bytes.
    public static int SizeOf(void* block) {
        // HeapSize returns a pointer-sized SIZE_T; (SIZE_T)-1 signals failure.
        IntPtr result = HeapSize(ph, 0, block);
        if (result == new IntPtr(-1)) throw new InvalidOperationException();
        return (int)result;
    }

    const int HEAP_ZERO_MEMORY = 0x00000008;

    [DllImport("kernel32")]
    static extern IntPtr GetProcessHeap();
    [DllImport("kernel32")]
    static extern void* HeapAlloc(IntPtr hHeap, int flags, UIntPtr size);
    [DllImport("kernel32")]
    static extern bool HeapFree(IntPtr hHeap, int flags, void* block);
    [DllImport("kernel32")]
    static extern void* HeapReAlloc(IntPtr hHeap, int flags, void* block,
        UIntPtr size);
    [DllImport("kernel32")]
    static extern IntPtr HeapSize(IntPtr hHeap, int flags, void* block);
}
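For readers who would rather not declare the Win32 imports themselves, the same allocate, use, and free pattern can be sketched with Marshal.AllocHGlobal and Marshal.FreeHGlobal from System.Runtime.InteropServices. This is a swapped-in alternative to the Memory class above, not part of it, and the class and method names are my own.

```csharp
using System;
using System.Runtime.InteropServices;

public static class UnmanagedBufferDemo
{
    // Allocates a temporary unmanaged int buffer, fills it with squares,
    // sums it, and frees it deterministically.
    public static unsafe long SumOfSquares(int count)
    {
        IntPtr block = Marshal.AllocHGlobal(count * sizeof(int));
        try
        {
            int* p = (int*)block.ToPointer();
            long sum = 0;
            for (int i = 0; i < count; i++)
            {
                p[i] = i * i;   // the memory is not zeroed, so write before reading
                sum += p[i];
            }
            return sum;
        }
        finally
        {
            Marshal.FreeHGlobal(block);   // released immediately, no GC involved
        }
    }

    public static void Main()
    {
        Console.WriteLine(SumOfSquares(4));   // 0 + 1 + 4 + 9 = 14
    }
}
```

Unlike the garbage-collected heap, nothing reclaims this memory automatically, so the try/finally is what prevents a leak.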
Conclusion
This concludes my discourse on pointers for now. I will continue to update
this article with new source code and actual benchmarks in the future. Be
sure to watch for updated versions of this page.
There is probably a whole lot more to talk about with Marshaling and
P/Invoke.
All of this behind-the-scenes information takes some amount of work to
research and obtain, so, if you enjoyed this article, don't forget to vote.
Version History
Version | |
April 30 | Original article on pointers |