How to Implement Classes in C
Classes are a core component of object-oriented (OO) languages. An instance of a class contains data plus functions that operate on the data. To create an instance of a class you call a new() function. How do OO languages implement the new() function? How can the new() function create a new instance of a class that it has never seen before? That is, how can a new class be "dropped in" without any changes to the new() function? This web page shows how it is done. It is wicked cool. Every programmer should know how to do this, even if you are not using C.
To illustrate how to create classes, we implement a Person class. An instance of the Person class contains data about a person – name, age, sex. The functions operate on the data. For simplicity, we implement just three functions: toString, clone, and delete. The toString() function returns a string representing the instance's data. The clone() function returns a copy of the instance. The delete() function destroys the instance (frees memory used by the instance). An instance of the Person class is created by calling the new() function like this:
new(Person, "John Doe", 30, 'M')
That instructs the new() function to create an instance of the Person class and the instance is to contain "John Doe" as the name of the person, 30 as the age, and 'M' as the sex. Let's look at a sample program that uses the Person class. First, the program creates an instance of the Person class containing the data "John Doe", 30, 'M' and stores the instance in a variable named "a". Then "a" is cloned and the clone is stored in a variable named "aa". Then another instance is created containing the data "Sally Smith", 29, 'F' and the instance is stored in a variable named "b". The John Doe instance and the clone are compared for equality. The toString() function is then called on each instance. Finally, the memory for the instances is freed by calling the delete() function. Here is the sample program:
#include <stdio.h>
#include "Person.h"
#include "new.h"

int main ()
{
    void * a = new(Person, "John Doe", 30, 'M'), * aa = clone(a);
    void * b = new(Person, "Sally Smith", 29, 'F');

    if (a == aa)
        puts("clones");

    printf("%s\n", toString(a));
    printf("%s\n", toString(aa));
    printf("%s\n", toString(b));

    delete(a), delete(aa), delete(b);
}
Here is the output from running the program:
name: John Doe, age: 30, sex: M
name: John Doe, age: 30, sex: M
name: Sally Smith, age: 29, sex: F
The John Doe instance and the clone are not equal. Interesting! That tells us something about the implementation: clone() creates a new instance of the class and copies the data in the first instance into the second instance; thus, the instances don't have the same pointer values. Another implementation of classes might result in the equality test returning true.
The toString() function formats the instance data by adding labels to each data item.
The class instances are pointed to by the variables "a", "aa", and "b". Their types are void *, which means "pointer to something". The structure of class instances is opaque. That is good. Users must not directly operate on instances. Only class functions should operate on instances. To achieve this opaqueness, the header file (new.h) for the new() function reveals nothing about the structure of class instances:
void * new (void * class, ...);
That "function prototype" says this: when the new() function is invoked, the first argument must be the name of a class, which is a "pointer to something". Following the name of the class, any number of values can be provided (the ellipsis, ..., means "any number of values"). The new() function returns a "pointer to something". That certainly reveals nothing to users about the structure of class instances!
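If you have not written a function with an ellipsis before, here is a minimal sketch of how one reads its extra arguments. The function sum_ints() is a made-up helper for illustration only; it is not part of new.h. The fixed parameter plays the same role that the class argument plays for new(): it tells the function what to expect in the variable part.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical helper (not part of new.h): adds up `count` int arguments. */
int sum_ints (int count, ...)
{
    va_list ap;
    int total = 0;
    int i;

    assert(count >= 0);

    va_start(ap, count);              /* begin reading after the last fixed parameter */
    for (i = 0; i < count; i++)
        total += va_arg(ap, int);     /* fetch the next argument, read as an int */
    va_end(ap);

    return total;
}
```

A call such as sum_ints(3, 1, 2, 3) reads three ints. The compiler does not check the variable arguments, so the caller and the function must agree on their number and types.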
The functions that operate on instances are equally non-revealing about the structure of instances:
char * toString (void * self);
void * clone (void * self);
void delete (void * self);
The name "self" is used to mean "an instance of the class".
The toString() function is passed a pointer to an instance of a class and returns a string (char * means "string"). The clone() function is passed a pointer to an instance of a class and returns a copy (clone) of the instance. The delete() function is passed a pointer to an instance of a class and returns nothing (it frees memory used by the instance). Those function prototypes reveal nothing to users about their working or about the structure of instances. That is wicked cool.
Here is the content of new.h
#ifndef NEW_H
#define NEW_H

#include <stddef.h>

void * new (void * class, ...);
void delete (void * self);
void * clone (void * self);
char * toString (void * self);

#endif
A file named new.c contains the implementation of those functions. Users should never see new.c; they should only see the content of new.h. When a user calls new(Person, "John Doe", 30, 'M'), that results in executing the new() function in new.c, which calls the relevant function in Person.c. Here is a diagram which shows the files that are visible to users and the files that are not visible:
The new.r ("r" for restricted to class implementors) header file tells class implementors: Here are the functions you must implement. There are different ways to express that, but we will express it using a C struct. Actually, two structs will be used to express classes: one to hold the class's data ("data struct") and another to hold the functions that operate on the data ("function struct"). new.r contains the latter struct (the function struct). The data struct is highly specific to the class; new.r cannot possibly know the members of that struct. The function struct contains one function pointer for each function. Each class implementor must provide a function definition for each of those pointers. The function struct also contains a member named "size"; the class implementor must assign this member the size of their data struct. Here's new.r:
#ifndef CLASS_R
#define CLASS_R

#include <stdarg.h>
#include <stdio.h>

struct Class {
    size_t size;
    void * (* ctor) (void * self, va_list * app);
    void * (* dtor) (void * self);
    void * (* clone) (void * self);
    char * (* toString) (void * self);
};

#endif
ctor = constructor function; fills in an instance of a class. The ctor() function, when invoked, is provided a partially filled-in memory block (of size equal to the size member) and a pointer to a variable list of arguments.
dtor = destructor function; frees the memory that the instance itself allocated (the memory block for the instance is freed by the delete() function).
The Person.h header file "declares" a Person variable:
extern void * Person;
The "extern" keyword indicates that we are declaring, not defining, a variable named Person. Recall that a declaration only tells the compiler that the variable exists somewhere; no memory is allocated for it. When a user calls the new() function with argument Person, the value that is passed comes from the "definition" of Person. Where is the Person variable defined? Answer: In Person.c
In Person.c the Person variable is defined and initialized to the address of _Person:
void * Person = & _Person;
The Person variable is a pointer to the function struct for the Person class!
_Person is a variable of type struct Class:
struct Class _Person = {
    sizeof(struct Person),
    Person_ctor, Person_dtor, Person_clone, Person_toString
};
_Person initializes struct Class with the names of Person functions: Person_ctor implements the ctor() function listed in new.r, Person_dtor implements the dtor() function listed in new.r, Person_clone implements the clone() function listed in new.r, and Person_toString implements the toString() function listed in new.r
Notice sizeof(struct Person) in the initialization of _Person. What is struct Person? Answer: It is another structure in Person.c:
struct Person {
    void * class;
    char * name;
    int age;
    char sex;
};
It is the "data struct" for person. That is, it is a struct describing the data that is relevant to persons (name, age, and sex). The first member is a pointer to its "function struct" (the struct containing the functions that operate on its data).
Let's recap. Person.c contains struct Person which describes the data relevant for persons; i.e., it is the data struct for the person class. Person.c contains struct Class which provides the names of functions that implement each function listed in new.r; i.e., it is the function struct for the Person class. And Person.c contains the implementation of the functions listed in new.r. Here is Person.c:
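#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "new.h"
#include "new.r"
#include "Person.h"

struct Person {    // this is the "data struct" for the Person class
    void * class;
    char * name;
    int age;
    char sex;
};

// declared up front so that the initializer of _Person can refer to them
void * Person_ctor (void * _self, va_list * app);
void * Person_dtor (void * _self);
void * Person_clone (void * _self);
char * Person_toString (void * _self);

struct Class _Person = {    // this is the "function struct" for the Person class
    sizeof(struct Person),
    Person_ctor, Person_dtor, Person_clone, Person_toString
};

void * Person = & _Person;

void * Person_ctor (void * _self, va_list * app)    // this is called from the new() function in new.c
{
    // will show this function's code later
}

void * Person_dtor (void * _self)    // this is called from the delete() function in new.c
{
    // will show this function's code later
}

void * Person_clone (void * _self)    // this is called from the clone() function in new.c
{
    // will show this function's code later
}

char * Person_toString (void * _self)    // this is called from the toString() function in new.c
{
    // will show this function's code later
}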
A user program includes new.h and Person.h:
#include "new.h"
#include "Person.h"
The user program invokes the new() function like this:
void * a = new(Person, "John Doe", 30, 'M');
The first argument is Person, which is the Person variable in Person.c, which is a pointer to the struct Class that is defined in Person.c. It results in calling the new() function in new.c.
void * new (void * _class, ...)
{
    // let's look at this code
}
The new() function reveals nothing about classes – the first argument is just a pointer to void. So the first thing that the new() function does is specify that the first argument is actually a pointer to a struct Class. It does this by creating a variable, class, of type pointer to struct Class and assigning it the parameter, _class.
void * new (void * _class, ...)
{
    struct Class * class = _class;
}
Recall that the first member of struct Class is "size". It is the size of the memory needed for the "data struct" (the Person data struct). So the next thing that the new() function does is allocate a block of memory for the data struct. It uses calloc(), which takes an element count and an element size and returns zero-filled memory. "p" is a pointer to the block of memory.
void * new (void * _class, ...)
{
    struct Class * class = _class;
    void * p = calloc(1, class->size);
}
The next thing that the new() function does is wicked cool. It takes that block of memory that was just created and sets the first portion of the memory block to point to the function struct (i.e., point to class): "p" is cast to a pointer to a pointer to struct Class, and then the location that p points to (the start of the memory block) is assigned the pointer to the function struct (class). Here is a graphic of this:
Here is the C code:
void * new (void * _class, ...)
{
    struct Class * class = _class;
    void * p = calloc(1, class->size);

    * (struct Class **) p = class;
}
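To see why that cast works, here is a small sketch under simplified assumptions. The names Thing, ThingClass, make_thing() and class_of() are invented for this illustration, and struct Class is cut down to a single member. The point is that C guarantees a pointer to a struct, suitably converted, points to its first member, so any code that knows only "the block starts with a class pointer" can recover that pointer.

```c
#include <assert.h>
#include <stdlib.h>

struct Class { size_t size; };               /* cut-down function struct */

struct Thing {                               /* hypothetical data struct */
    const struct Class * class;              /* must be the first member */
    int value;
};

static const struct Class ThingClass = { sizeof(struct Thing) };

/* Allocates a zeroed Thing and stores the class pointer at the start of the block. */
void * make_thing (void)
{
    void * p = calloc(1, ThingClass.size);

    assert(p);
    * (const struct Class **) p = & ThingClass;
    return p;
}

/* Recovers the class pointer from any block laid out this way. */
const struct Class * class_of (const void * self)
{
    const struct Class * const * cp = self;

    return * cp;
}
```

Note that class_of() never mentions struct Thing. That is exactly the position the functions in new.c are in.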
"class" points to the function struct; i.e., it points to the struct holding the names of the Person functions (Person_ctor, Person_dtor, etc.). The next thing the new() function does is call Person's constructor function (Person_ctor) and passes it the data that the user provided to the new() function ("John Doe", 30, 'M'). The constructor function is passed the data struct, which is pointed to by p.
void * new (void * _class, ...)
{
    struct Class * class = _class;
    void * p = calloc(1, class->size);

    * (struct Class **) p = class;

    va_list ap;
    va_start(ap, _class);
    class->ctor(p, & ap);
}
The constructor function fills in the data struct and returns a pointer to the filled-in data struct. The returned pointer is assigned to p (p's old value is overwritten).
void * new (void * _class, ...)
{
    struct Class * class = _class;
    void * p = calloc(1, class->size);

    * (struct Class **) p = class;

    va_list ap;
    va_start(ap, _class);
    p = class->ctor(p, & ap);
    va_end(ap);
}
The new() function returns a pointer to the data struct that was created and populated.
return p;
Here is the new() function in its entirety:
void * new (void * _class, ...)
{
    struct Class * class = _class;
    void * p = calloc(1, class->size);

    * (struct Class **) p = class;

    if (class->ctor) {
        va_list ap;
        va_start(ap, _class);
        p = class->ctor(p, & ap);
        va_end(ap);
    }
    return p;
}
Terminology: When the new() function was invoked, it had no idea that it would be calling the Person_ctor() constructor function. The new() function certainly does not hardcode what constructor function it will call. The constructor function that is called is determined as late as possible, at execution time. This is called late binding or dynamic linkage.
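Late binding is what lets a new class be "dropped in". As a sketch, here is a second class, Point, that does not appear anywhere else on this page; it is purely an assumption for illustration. The new() function below has the same shape as the one above (struct Class is trimmed to two members to keep the example short) and it is not changed in any way to support Point. The function struct is named PointClass rather than _Point because identifiers that begin with an underscore followed by a capital letter are reserved in C.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdlib.h>

struct Class {                               /* trimmed to what this sketch needs */
    size_t size;
    void * (* ctor) (void * self, va_list * app);
};

/* Same shape as new() above; it never names a concrete class. */
void * new (void * _class, ...)
{
    struct Class * class = _class;
    void * p = calloc(1, class->size);

    assert(p);
    * (struct Class **) p = class;

    if (class->ctor) {
        va_list ap;
        va_start(ap, _class);
        p = class->ctor(p, & ap);
        va_end(ap);
    }
    return p;
}

/* Hypothetical second class: a point with two int coordinates. */
struct Point {
    struct Class * class;                    /* first member: the function struct */
    int x, y;
};

static void * Point_ctor (void * _self, va_list * app)
{
    struct Point * self = _self;

    self->x = va_arg(* app, int);
    self->y = va_arg(* app, int);
    return self;
}

static struct Class PointClass = { sizeof(struct Point), Point_ctor };

void * Point = & PointClass;
```

With this in place, new(Point, 3, 4) allocates sizeof(struct Point) bytes and runs Point_ctor(), even though new() was written without any knowledge of points.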
Now let's switch over to Person.c and look at the function that just got called, i.e., let's look at the Person_ctor() function.
The new() function sends to Person_ctor() a block of memory. The block is mostly empty, except for the first portion which has been filled in with a pointer to the Person function struct. The job of Person_ctor() is to fill in the memory block with the data relevant to persons – name, age, and sex.
void * Person_ctor (void * _self, va_list * app)
{
    // let's look at the code in here
}
The new() function knows nothing about the class that it calls, so the parameter list of Person_ctor merely says that the first argument is a pointer to something. So, the first thing that Person_ctor does is specify that the first argument is actually a pointer to a struct Person. It does this by creating a variable, self, of type pointer to struct Person and assigning it the first parameter, _self.
void * Person_ctor (void * _self, va_list * app)
{
    struct Person * self = _self;
}
The new() function calls Person_ctor() with 2 arguments: a pointer to the block of memory it created, and a pointer to the list (va_list) of class-specific data; in this case, person name, age, and sex. So Person_ctor simply gets the three items from the list and assigns the items to the appropriate members of struct Person. Here is a graphic of this:
Here is the C code:
void * Person_ctor (void * _self, va_list * app)
{
    struct Person * self = _self;

    char * name = va_arg(* app, char *);
    self->name = malloc(strlen(name) + 1);
    strcpy(self->name, name);

    int age = va_arg(* app, int);
    self->age = age;

    char sex = va_arg(* app, int);
    self->sex = sex;
}
Here is Person_ctor in its entirety:
void * Person_ctor (void * _self, va_list * app)
{
    struct Person * self = _self;

    char * name = va_arg(* app, char *);
    self->name = malloc(strlen(name) + 1);
    strcpy(self->name, name);

    int age = va_arg(* app, int);
    self->age = age;

    char sex = va_arg(* app, int);
    self->sex = sex;

    return self;
}
Here are the two functions together, first the one in new.c and then the one in Person.c:

new.c:

void * new (void * _class, ...)
{
    struct Class * class = _class;
    void * p = calloc(1, class->size);

    * (struct Class **) p = class;

    if (class->ctor) {
        va_list ap;
        va_start(ap, _class);
        p = class->ctor(p, & ap);
        va_end(ap);
    }
    return p;
}

Person.c:

void * Person_ctor (void * _self, va_list * app)
{
    struct Person * self = _self;

    char * name = va_arg(* app, char *);
    self->name = malloc(strlen(name) + 1);
    strcpy(self->name, name);

    int age = va_arg(* app, int);
    self->age = age;

    char sex = va_arg(* app, int);
    self->sex = sex;

    return self;
}
Let's switch back to new.c and look at the delete() function.
void delete (void * self)
{
    // let's look at the code in here
}
Whereas the new() function was invoked with the function struct, the other functions (delete, clone, toString) are invoked with the data struct.
Recall that the data struct points to the function struct which contains the destructor function (dtor) that is to be called. Here is a graphic that shows variable "a" pointing to the data struct which points to the function struct:
The delete() function knows that it is being passed a pointer to the data struct. And the data struct has a pointer to the function struct. So, the delete() function is being passed a pointer to a pointer to a struct Class.
void delete (void * self)
{
    struct Class ** cp = self;
}
The "cp" variable (cp = class pointer) points to the data struct which points to the function struct. (*cp) is the pointer to the function struct. Now we can reference the destructor function (dtor):
void delete (void * self)
{
    struct Class ** cp = self;

    (* cp) -> dtor(self);
}
It is the responsibility of the function that is called to free its class-specific memory (e.g., the Person class is responsible for freeing the memory it allocated for name; age and sex are stored inside the data struct itself).
The destructor does not free the memory block for the data struct; it returns a pointer to that block, and the delete() function frees it.
void delete (void * self)
{
    struct Class ** cp = self;

    self = (* cp) -> dtor(self);
    free(self);
}
Here is the delete() function:
void delete (void * self)
{
    struct Class ** cp = self;

    if (self && * cp && (* cp) -> dtor)
        self = (* cp) -> dtor(self);
    free(self);
}
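The three checks in the if statement make delete() safe to call with a null pointer, with a block that has no class pointer, and with a class that has no destructor. Here is a sketch that exercises this with an invented class, Label, which owns one heap-allocated string. The names Label, LabelClass, make_label() and the dtor_calls counter are assumptions added only so the behavior can be observed; struct Class is trimmed to two members.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct Class {                               /* trimmed to what this sketch needs */
    size_t size;
    void * (* dtor) (void * self);
};

void delete (void * self)
{
    struct Class ** cp = self;

    if (self && * cp && (* cp) -> dtor)
        self = (* cp) -> dtor(self);         /* class frees what it allocated */
    free(self);                              /* delete() frees the block itself */
}

struct Label {                               /* hypothetical data struct */
    struct Class * class;
    char * text;
};

static int dtor_calls = 0;                   /* counts destructor invocations */

static void * Label_dtor (void * _self)
{
    struct Label * self = _self;

    free(self->text), self->text = 0;
    dtor_calls++;
    return self;
}

static struct Class LabelClass = { sizeof(struct Label), Label_dtor };

/* Builds a Label directly, standing in for new(Label, text). */
void * make_label (const char * text)
{
    struct Label * self = calloc(1, LabelClass.size);

    assert(self);
    self->class = & LabelClass;
    self->text = malloc(strlen(text) + 1);
    assert(self->text);
    strcpy(self->text, text);
    return self;
}
```

Calling delete(NULL) does nothing, because free() accepts a null pointer. Calling delete() on a Label runs Label_dtor() once and then frees the block.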
Now let's see how Person.c implements the dtor() function. The Person_dtor() function is passed a pointer to something (pointer to void).
void * Person_dtor (void * _self)
{
    // let's look at the code in here
}
But Person_dtor() knows that it is a pointer to struct Person, so it declares a local variable of type pointer to struct Person.
void * Person_dtor (void * _self)
{
    struct Person * self = _self;
}
The name member was created using malloc, so we free that memory:
void * Person_dtor (void * _self)
{
    struct Person * self = _self;

    free(self->name);
}
And set that member to a null pointer:
void * Person_dtor (void * _self)
{
    struct Person * self = _self;

    free(self->name);
    self->name = 0;
}
Here is the Person_dtor() function:
void * Person_dtor (void * _self)
{
    struct Person * self = _self;

    free(self->name), self->name = 0;
    return self;
}
Here are the two functions together, first the one in new.c and then the one in Person.c:

new.c:

void delete (void * self)
{
    struct Class ** cp = self;

    if (self && * cp && (* cp) -> dtor)
        self = (* cp) -> dtor(self);
    free(self);
}

Person.c:

void * Person_dtor (void * _self)
{
    struct Person * self = _self;

    free(self->name), self->name = 0;
    return self;
}
Let's now look at the clone() function in new.c and the function it calls, Person_clone(), in Person.c. The clone() function uses the same pattern that we used with the delete() function: convert the parameter to a pointer-to-a-pointer to struct Class, and then return the result of calling the class's clone() function. The Person_clone() function simply calls the new() function, passing it the data (name, age, sex) which is obtained from the parameter. Because new() runs Person_ctor(), the clone gets its own copy of the name string.
new.c:

void * clone (void * self)
{
    struct Class ** cp = self;

    return (* cp) -> clone(self);
}

Person.c:

void * Person_clone (void * _self)
{
    struct Person * self = _self;

    return new(Person, self->name, self->age, self->sex);
}
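This is why the sample program's test a == aa is false: the clone is a separate block with its own copy of the name. The sketch below shows the same idea without the class machinery, using an invented struct Tag and invented helper functions. It contrasts comparing pointers (identity) with comparing contents (equality); Tag_equal() is an assumption added for illustration and has no counterpart in new.h.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct Tag { char * text; };                 /* hypothetical stand-in for struct Person */

/* Allocates a Tag holding its own copy of text. */
struct Tag * Tag_new (const char * text)
{
    struct Tag * self = malloc(sizeof * self);

    assert(self);
    self->text = malloc(strlen(text) + 1);
    assert(self->text);
    strcpy(self->text, text);
    return self;
}

/* Clones by constructing a fresh instance from the original's data. */
struct Tag * Tag_clone (const struct Tag * self)
{
    return Tag_new(self->text);
}

/* Compares contents rather than addresses. */
int Tag_equal (const struct Tag * a, const struct Tag * b)
{
    return strcmp(a->text, b->text) == 0;
}

void Tag_delete (struct Tag * self)
{
    if (self)
        free(self->text);
    free(self);
}
```

A clone made this way stays valid after the original is deleted, because the two instances share no memory.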
The final function is the toString() function. It uses the same pattern: convert the parameter to a pointer-to-a-pointer to struct Class, and then return the result of calling the class's toString() function.
In the Person_toString() function I decided to format a person's data like this: if the data is this:
"John Doe", 30, 'M'
Then the Person_toString() function will return this:
name: John Doe, age: 30, sex: M
We need to allocate enough memory for the formatted string. Here a fixed buffer of 100 bytes is assumed to be enough:
char *str = malloc(100 * sizeof(char));
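Then we format the values (name, age, sex) into the buffer using the snprintf() function, which never writes more than the given number of bytes, so a very long name is truncated rather than overflowing the buffer: (neat function!)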
snprintf(str, 100, "name: %s, age: %d, sex: %c", self->name, self->age, self->sex);
Here are the two functions together, first the one in new.c and then the one in Person.c:

new.c:

char * toString (void * self)
{
    struct Class ** cp = self;

    return (* cp) -> toString(self);
}

Person.c:

char * Person_toString (void * _self)
{
    struct Person * self = _self;

    char *str = malloc(100 * sizeof(char));
    snprintf(str, 100, "name: %s, age: %d, sex: %c",
             self->name, self->age, self->sex);
    return str;
}
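The fixed 100-byte buffer keeps the example short, but it cuts off long names. One possible alternative, sketched here as an assumption rather than as the approach used in the zip file, is to ask snprintf() how long the result will be and then allocate exactly that much. In C99 and later, calling snprintf() with a null buffer and size 0 returns the number of characters that would have been written. The helper name format_person() is made up for this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'd string sized to fit, or NULL on failure. */
char * format_person (const char * name, int age, char sex)
{
    const char * fmt = "name: %s, age: %d, sex: %c";
    char * str;
    int len;

    assert(name != NULL);

    len = snprintf(NULL, 0, fmt, name, age, sex);   /* length, not counting '\0' */
    if (len < 0)
        return NULL;

    str = malloc((size_t) len + 1);
    if (str == NULL)
        return NULL;

    snprintf(str, (size_t) len + 1, fmt, name, age, sex);
    assert(strlen(str) == (size_t) len);
    return str;
}
```

Either way, the returned string is allocated with malloc(), so the caller is responsible for passing it to free() when done with it.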
Here is a zip file containing the source code, header files, and makefile
Acknowledgement: Everything shown above I learned from the book Object-Oriented Programming with ANSI-C by Axel-Tobias Schreiner. See chapter 2.
Last Updated: May 18, 2021