Mjollnir

C version of the MPI _ IoC idea

Each C source file can be thought of as a "C-Class": the best code structure I could think of for near-OOP in C. Each "object" consists of a structure plus the functions ("methods") that operate on the items in that structure, collected in one source file with one header. The header exposes the object only as a void pointer to the structure, so the implementation may be changed without affecting other parts of the application. See the very simple TestMpiIocCApp.h below for an example; it provides a typedef for the user of the class and a few simple methods for constructing the object, starting the application, checking the rank and size, and destroying the object when the application is finished.
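As a rough illustration of the "C-Class" layout, the sketch below collapses a header and its implementation into one listing. Every name in it (DemoApp, DA_new, and so on) is hypothetical and not taken from the Mjollnir sources, and the rank and size are simply passed to the constructor, whereas a real MPI program would presumably query them from MPI.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- would live in DemoApp.h (hypothetical file) ---- */
typedef void *DemoApp;                 /* users see only a void pointer */

DemoApp DA_new(int rank, int size);    /* "constructor" */
int     DA_getRank(DemoApp app);
int     DA_getSize(DemoApp app);
void    DA_delete(DemoApp app);        /* "destructor" */

/* ---- would live in DemoApp.c ---- */
struct demoApp {                       /* layout is private to this file */
    int rank;
    int size;
};

DemoApp DA_new(int rank, int size)
{
    struct demoApp *self = malloc(sizeof *self);
    if (self == NULL)
        return NULL;
    /* Assumption: values are passed in; an MPI program would query them. */
    self->rank = rank;
    self->size = size;
    return self;
}

int DA_getRank(DemoApp app)
{
    assert(app != NULL);
    return ((struct demoApp *)app)->rank;
}

int DA_getSize(DemoApp app)
{
    assert(app != NULL);
    return ((struct demoApp *)app)->size;
}

void DA_delete(DemoApp app)
{
    free(app);                         /* free(NULL) is a harmless no-op */
}
```

A caller would create the object with DA_new, read the rank and size through the getters, and finish with DA_delete. Because struct demoApp is never visible outside the implementation file, its members can change without recompiling the callers.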

When any of the "C-Class" functions are invoked, the first thing the function does (except for the "constructor") is examine the supplied object to see if it's the correct type. This is done by having the structure contain an integer that is unique to objects of this type and checking it whenever one of these external-facing functions is invoked. All functions not listed in the header are static, and no unnecessary conversions are performed.
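The type check described above might look something like the following sketch. The names (Shape, SH_check, SHAPE_TYPE_ID) and the identifier value are invented for illustration; the actual sources may organize the check differently.

```c
#include <assert.h>
#include <stdlib.h>

#define SHAPE_TYPE_ID 0x5AFE01   /* arbitrary value chosen for this sketch */

typedef void *Shape;             /* what the header would expose */

struct shape {
    int typeId;                  /* set by the constructor, checked on every call */
    int sides;
};

/* Not listed in the header, so it is static. Returns NULL on a mismatch. */
static struct shape *SH_check(Shape obj)
{
    struct shape *self = obj;
    if (self == NULL || self->typeId != SHAPE_TYPE_ID)
        return NULL;
    return self;
}

Shape SH_new(int sides)
{
    struct shape *self = malloc(sizeof *self);
    if (self == NULL)
        return NULL;
    self->typeId = SHAPE_TYPE_ID;
    self->sides = sides;
    return self;
}

/* Returns the number of sides, or -1 if obj is not a Shape. */
int SH_getSides(Shape obj)
{
    struct shape *self = SH_check(obj);
    return self != NULL ? self->sides : -1;
}

/* Returns 0 on success, or -1 if obj is not a Shape. */
int SH_delete(Shape obj)
{
    struct shape *self = SH_check(obj);
    if (self == NULL)
        return -1;
    self->typeId = 0;            /* a stale pointer no longer carries the id */
    free(self);
    return 0;
}
```

The check is a heuristic rather than real type safety: it can only catch pointers to readable memory whose first integer happens not to match the identifier.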

The sources:

Demonstration test application sources:
TestMpiIocC.c: This is a simple test routine. Its main function makes use of the TestMpiIocCApp code: it constructs the test application object, starts it running, and destroys it when the program finishes. After constructing the application, the program could check licensing, including making sure the number of cores on which it is running doesn't exceed a license limit. Other than this, probably no changes are required here except perhaps a name change.
TestMpiIocCApp.h
TestMpiIocCApp.c: This is the routine a user would modify to create an application. Some parts of this "class" won't need modification, such as the function that confirms the correct object was supplied to the functions, the constructor, and the destructor (unless the user application allocates objects that need special cleanup).
TestMpiIocCMsg.h
TestMpiIocCMsg.c: This is a simple message just to show the application sending and receiving messages. Use this as an example for creating whatever messages are needed for the application.
Abstraction Layer sources:
MpiIocCAppLayer.h
MpiIocCAppLayer.c: (Not intended to be modified by the user) These routines allow the user to register message types with the abstraction layer, to get the rank and size, to send and broadcast message types to other nodes, and to start and stop the MPI abstraction layer.
MpiIocCMpiAssist.h
MpiIocCMpiAssist.c: (Not intended to be modified by the user) These routines simplify the task of creating message types that will be registered with the abstraction layer. Three function-like macros (#define) are used in this process. The first, MICMA_DATATYPE_PREFIX, takes the structure name and number of components; the second, MICMA_DATATYPE_ITEM, is used for each of the items in the structure that must be transferred between nodes; and the third, MICMA_DATATYPE_SUFFIX, finishes the type definition. The second macro calls the MICMA_setComponentInfo routine, which is declared in the header file but otherwise not intended to be used directly. These macros will be used by all the message types created for the application. They are used by the Terminate and Capabilities messages (see below) as well as the Test message (see above).
MpiIocCBTree.h
MpiIocCBTree.c: (Not intended to be modified by the user) These routines are used by the abstraction layer to store and retrieve the necessary methods for sending and receiving the messages. The binary tree stores a pointer to the constructor function, the tag number, the callback function and object, and the MPI datatype.
MpiIocCMsgTerminate.h
MpiIocCMsgTerminate.c: (Not intended to be modified by the user) This is the internal message for shutting down the abstraction layer. It can be used as a good example of how to create a simple user-defined message type.
MpiIocCMsgCapabilities.h
MpiIocCMsgCapabilities.c: (Not intended to be modified by the user) These routines are for gathering and transferring the capabilities of each node back to the control node (rank=0) so the user application can use them for determining the amount of memory available, how many instances share that memory, how many cores are on the blade / processor, and similar.
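
The prefix / item / suffix macro pattern described for MpiIocCMpiAssist.c can be sketched without MPI, as below. This is not the real MICMA code: the SK_ names, the macro arguments beyond "structure name and number of components", and the layout table are all assumptions made for illustration. An MPI-based version would presumably hand a table like this to MPI_Type_create_struct, but the source does not show the macro bodies.

```c
#include <assert.h>
#include <stddef.h>

enum skKind { SK_INT, SK_DOUBLE, SK_CHAR };   /* hypothetical element kinds */

struct skComponent {
    size_t      offset;   /* byte offset of the member within the struct */
    int         count;    /* number of elements (1 for a scalar) */
    enum skKind kind;
};

#define SK_MAX_COMPONENTS 8

struct skLayout {
    int                used;       /* components recorded so far */
    int                expected;   /* count announced by the prefix macro */
    struct skComponent items[SK_MAX_COMPONENTS];
};

/* Called by the ITEM macro; returns 0 on success, -1 if the table is full. */
static int sk_setComponentInfo(struct skLayout *layout, size_t offset,
                               int count, enum skKind kind)
{
    struct skComponent *item;
    if (layout->used >= layout->expected || layout->used >= SK_MAX_COMPONENTS)
        return -1;
    item = &layout->items[layout->used++];
    item->offset = offset;
    item->count = count;
    item->kind = kind;
    return 0;
}

/* Opens a function named <structName>_describe that fills in a layout. */
#define SK_DATATYPE_PREFIX(structName, nItems)                    \
    static int structName##_describe(struct skLayout *layout)    \
    {                                                             \
        typedef struct structName sk_struct_t;                    \
        int sk_status = 0;                                        \
        layout->used = 0;                                         \
        layout->expected = (nItems);

/* Records one member that must travel between nodes. */
#define SK_DATATYPE_ITEM(member, count, kind)                     \
        if (sk_status == 0)                                       \
            sk_status = sk_setComponentInfo(layout,               \
                offsetof(sk_struct_t, member), (count), (kind));

/* Closes the function; fails if fewer items were listed than announced. */
#define SK_DATATYPE_SUFFIX                                        \
        if (sk_status == 0 && layout->used != layout->expected)   \
            sk_status = -1;                                       \
        return sk_status;                                         \
    }

struct testMsg {
    int    typeId;        /* local bookkeeping, not transferred */
    int    sequence;
    double payload[4];
    char   label[16];
};

SK_DATATYPE_PREFIX(testMsg, 3)
    SK_DATATYPE_ITEM(sequence, 1, SK_INT)
    SK_DATATYPE_ITEM(payload, 4, SK_DOUBLE)
    SK_DATATYPE_ITEM(label, 16, SK_CHAR)
SK_DATATYPE_SUFFIX
```

The three macros expand to a single function, testMsg_describe, so each message source file only has to list its transferable members once.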

The sequence:

The TestMpiIocC main begins by creating an object of the test application using the "constructor" function. It then starts the application, and when the application is finished and control is returned to the main, it calls the "destructor" function and exits. Any time after the constructor is called, the user application may access the rank and size.
When the constructor for this test application is called, it allocates memory for the "object", constructs and stores the MPI abstraction layer object, and registers the internal message types for termination and capabilities.
The main application may do any necessary license checking at this point - as the size of the network is now available.
When the start method is called, the application gathers the capabilities of the current node (done on all nodes) and passes them back to rank=0 (what I call the "control node"). When all nodes (including the control node itself) have reported their capabilities back to the control node and these have been stored for later use, the abstraction layer calls the user function TA_startupCallback, which allows the user application to examine the capabilities of all the nodes, determine how to initially distribute the work, etc., and send the first messages that begin the communications. In the test application, a test message is broadcast to all the nodes and control reverts to the abstraction layer.
Each node that receives the test message sends it back to the control node. When the control node receives the test message, it sends the Terminate message to all nodes and exits the message loop; control returns to the main program.
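
The callback-driven message loop in the sequence above can be imitated in a single process, as in this sketch. It keeps a small binary tree keyed by tag, in the spirit of MpiIocCBTree.c, but stores only the callback and its object (the real tree is described as also holding a constructor pointer and the MPI datatype). All names are hypothetical, the incoming messages are just an array of tags standing in for what MPI would deliver, TAG_TEST reuses the tag 23 seen in the sample output, and TAG_TERMINATE is an invented value.

```c
#include <assert.h>
#include <stdlib.h>

enum { TAG_TERMINATE = 7, TAG_TEST = 23 };

/* A callback returns nonzero to make the message loop exit. */
typedef int (*MsgCallback)(void *object, int tag);

struct regNode {
    int             tag;
    MsgCallback     callback;
    void           *object;     /* passed back to the callback */
    struct regNode *left, *right;
};

/* Inserts or replaces an entry; returns the root of the updated subtree. */
static struct regNode *reg_insert(struct regNode *root, int tag,
                                  MsgCallback callback, void *object)
{
    if (root == NULL) {
        struct regNode *node = malloc(sizeof *node);
        if (node == NULL)
            return NULL;
        node->tag = tag;
        node->callback = callback;
        node->object = object;
        node->left = node->right = NULL;
        return node;
    }
    if (tag < root->tag)
        root->left = reg_insert(root->left, tag, callback, object);
    else if (tag > root->tag)
        root->right = reg_insert(root->right, tag, callback, object);
    else {
        root->callback = callback;
        root->object = object;
    }
    return root;
}

static const struct regNode *reg_find(const struct regNode *root, int tag)
{
    while (root != NULL && root->tag != tag)
        root = tag < root->tag ? root->left : root->right;
    return root;
}

static void reg_free(struct regNode *root)
{
    if (root == NULL)
        return;
    reg_free(root->left);
    reg_free(root->right);
    free(root);
}

/* Example callbacks: count test messages, stop on terminate. */
static int onTest(void *object, int tag)
{
    (void)tag;
    ++*(int *)object;
    return 0;
}

static int onTerminate(void *object, int tag)
{
    (void)object;
    (void)tag;
    return 1;
}

/* Looks up each incoming tag and calls its callback; returns how many
   messages were handled before the loop ended. */
static int dispatch(const struct regNode *registry, const int *incoming, int n)
{
    int handled = 0;
    for (int i = 0; i < n; i++) {
        const struct regNode *entry = reg_find(registry, incoming[i]);
        if (entry == NULL)
            continue;            /* unregistered tag: ignored in this sketch */
        handled++;
        if (entry->callback(entry->object, incoming[i]) != 0)
            break;
    }
    return handled;
}
```

The application never polls for messages itself: it registers callbacks up front and the loop decides when each one runs, which is the inversion-of-control idea in miniature.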

Keep in mind that because the nodes all run asynchronously, the printed messages explaining what's happening may not appear in a recognizable order. Run the application with this command:

	mpiexec -n 3 bin/TestMpiIocCApp

and it may produce output like this:

	(0:3)	 MICMC_dump (capabilities) message id:737179 tag:59 INFO:: cores:8 name:Eshnunna upTime:889736 load:(60992,46752,36704) ram:33599250432
	(1:3)	 MICMC_dump (capabilities) message id:737179 tag:59 INFO:: cores:8 name:Eshnunna upTime:889736 load:(60992,46752,36704) ram:33599250432
	(2:3)	 MICMC_dump (capabilities) message id:737179 tag:59 INFO:: cores:8 name:Eshnunna upTime:889736 load:(60992,46752,36704) ram:33599250432

	(main)	 TA_startupCallback Application starting.

	(1:3)	 TMICM_dump (test message) message id:238971 tag:23
	(1:3)	 MICMT_dump (terminate) message id:898379 
	(2:3)	 TMICM_dump (test message) message id:238971 tag:23
	(2:3)	 MICMT_dump (terminate) message id:898379 
	(0:3)	 TMICM_dump (test message) message id:238971 tag:23

	(main)	 TA_testMsgCallback Application ending.

The capabilities messages may arrive in any order, even though they arrived in numerical order for the test shown here. Once all the capabilities are received, the "Application starting." message will appear, followed by the remainder of the messages in whatever order they happen to arrive. The test application could use the capabilities messages to see how many instances are running on each CPU / blade by looking at the name of the node ("Eshnunna", in this case). The application may then divide the available RAM based on how many instances are running on that node and sharing its memory and cores. The application may also designate one node to be the read (and buffer) or write node for the other instances running on the same CPU / blade, to reduce message overhead, for example.
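
One way the startup callback could use the gathered capabilities is sketched below. The struct is a hypothetical subset of what a capabilities message carries (the field names are guesses based on the printed output), the even RAM split is only one possible policy, and choosing the lowest rank on a host as its read/write node is an assumption of this sketch.

```c
#include <assert.h>
#include <string.h>

struct capability {          /* hypothetical subset of a capabilities message */
    int       rank;
    char      name[64];      /* host name, e.g. "Eshnunna" */
    int       cores;
    long long ram;           /* bytes of RAM on the host */
};

/* Counts the reported instances that share a host with caps[index]. */
static int instancesOnSameHost(const struct capability *caps, int n, int index)
{
    int count = 0;
    assert(index >= 0 && index < n);
    for (int i = 0; i < n; i++)
        if (strcmp(caps[i].name, caps[index].name) == 0)
            count++;
    return count;            /* at least 1: caps[index] matches itself */
}

/* RAM available to one instance, assuming an even split across the host. */
static long long ramShare(const struct capability *caps, int n, int index)
{
    return caps[index].ram / instancesOnSameHost(caps, n, index);
}

/* Lowest rank on the same host; a candidate read/write node for that host. */
static int hostLeader(const struct capability *caps, int n, int index)
{
    int leader;
    assert(index >= 0 && index < n);
    leader = caps[index].rank;
    for (int i = 0; i < n; i++)
        if (strcmp(caps[i].name, caps[index].name) == 0 && caps[i].rank < leader)
            leader = caps[i].rank;
    return leader;
}
```

With the three instances from the sample run, all on "Eshnunna" with 33599250432 bytes of RAM, each instance would get 11199750144 bytes and rank 0 would be the host's leader.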

In this example, rank=1 (see the first column information for the "(rank:size)" identifier) received the test message, sent it back to the control node, and then received the terminate message. Rank=2 also received the test message, sent that back to the control node and then received the terminate message. The control node received the test message from somewhere (not shown - could've been from either rank=1 or rank=2, but probably from rank=1) and then sent the terminate message to the other nodes.

This system was built with cmake, and the CMakeLists.txt files are here and here. You will have to adjust them for your own environment.