Tiled matrix
Tiled matrix implements a multi-level tiled matrix.
A matrix is divided up into M levels of tiles in order to avoid cache misses. A few callback-based routines have been implemented for this matrix type, along with allocation and freeing.
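To illustrate why tiling helps, here is a minimal sketch of naive versus single-level tiled multiplication. It is not the tiled_matrix API: the names Matrix, multiply_naive and multiply_tiled are hypothetical, and the sketch only tiles the loops over an ordinary row-major array, whereas the library itself may lay tiles out differently in memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout for this sketch: an n x n matrix stored row-major
// in one contiguous vector.
using Matrix = std::vector<double>;

// Naive triple loop. The innermost loop walks down a column of b, so for
// large n almost every access to b touches a different cache line.
Matrix multiply_naive(const Matrix& a, const Matrix& b, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    Matrix c(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k) {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
    return c;
}

// Single-level tiling. The three outer loops pick one tile x tile block of
// each matrix; the three inner loops finish all work on those blocks
// before moving on, so the blocks can stay in cache while they are reused.
Matrix multiply_tiled(const Matrix& a, const Matrix& b, std::size_t n,
                      std::size_t tile) {
    assert(tile > 0);
    assert(a.size() == n * n && b.size() == n * n);
    Matrix c(n * n, 0.0);
    for (std::size_t ii = 0; ii < n; ii += tile) {
        const std::size_t i_end = std::min(ii + tile, n);
        for (std::size_t kk = 0; kk < n; kk += tile) {
            const std::size_t k_end = std::min(kk + tile, n);
            for (std::size_t jj = 0; jj < n; jj += tile) {
                const std::size_t j_end = std::min(jj + tile, n);
                for (std::size_t i = ii; i < i_end; ++i) {
                    for (std::size_t k = kk; k < k_end; ++k) {
                        const double aik = a[i * n + k];
                        for (std::size_t j = jj; j < j_end; ++j) {
                            c[i * n + j] += aik * b[k * n + j];
                        }
                    }
                }
            }
        }
    }
    return c;
}
```

Both functions compute the same product; only the order in which elements are visited differs. A second level of tiling would wrap three more loops around these, choosing the outer tile for the larger cache and the inner tile for the smaller one.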
For example, here is the output of a test program (included in the source) run on a Xeon E5405. The output comes from an old version. The four options are:
- Option 0: brain-dead, cache-thrashing matrix multiplication (the _core function).
- Option 1: a single level of tiling (_L1), with tiles roughly the size of the CPU's L2 cache.
- Option 2: a single level of tiling (_L1), with tiles the size of the CPU's L1 cache.
- Option 3: two levels of tiling (_L2), with tiles the size of the L2 cache containing tiles the size of the L1 cache.
./xtiling.bin
Processor is Xeon E5405; L1=32768B/32.00kB/0.03MB (4096 doubles), L2=3145728B/3072.00kB/3.00MB (393216 doubles)
Size of: float:4 double:8 long double:16 char:1 short int:2 int:4 long int:8 long long int:8
Option 0 (core):Allocating, multiplying, checking matrices of 1, with size of 30, tile 9, subtile 32 (total 8640 per side; total size 74649600)
Option 0: 10854 seconds, 636430microseconds
Option 0: Got fives!
Option 1 (L1):Allocating, multiplying, checking matrices of 1, with size of 30, tile 9, subtile 32 (total 8640 per side; total size 74649600)
Option 1: 2322 seconds, 681719microseconds
Option 1: Got fives!
Option 2 (L1):Allocating, multiplying, checking matrices of 1, with size of 30, tile 9, subtile 32 (total 8640 per side; total size 74649600)
Option 2: 1500 seconds, 258941microseconds
Option 2: Got fives!
Option 3 (L2):Allocating, multiplying, checking matrices of 1, with size of 30, tile 9, subtile 32 (total 8640 per side; total size 74649600)
Option 3: 1464 seconds, 907788microseconds
Option 3: Got fives!
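The sizes in this output can be checked with a little arithmetic. The sketch below assumes that "size of 30, tile 9, subtile 32" means 30 tiles per side, 9 subtiles per tile side, and 32 doubles per subtile side; that reading is an assumption, though it does reproduce the reported 8640 per side. TilingShape and the helper functions are hypothetical names, not part of tiled_matrix, and the "three blocks fit in cache" check is only one plausible sizing rule, not necessarily the one the test program uses.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of the shape reported in the test output.
struct TilingShape {
    std::size_t tiles_per_side;     // "size of 30" (assumed meaning)
    std::size_t subtiles_per_tile;  // "tile 9" (assumed meaning)
    std::size_t subtile_side;       // "subtile 32" (assumed meaning)
};

// Number of matrix elements along one side of the full matrix.
inline std::size_t side_length(const TilingShape& s) {
    assert(s.tiles_per_side > 0 && s.subtiles_per_tile > 0 &&
           s.subtile_side > 0);
    return s.tiles_per_side * s.subtiles_per_tile * s.subtile_side;
}

// Bytes occupied by one innermost (subtile) block of doubles.
inline std::size_t subtile_bytes(const TilingShape& s) {
    return s.subtile_side * s.subtile_side * sizeof(double);
}

// Bytes occupied by one outer tile, i.e. a square of subtiles.
inline std::size_t tile_bytes(const TilingShape& s) {
    const std::size_t side = s.subtiles_per_tile * s.subtile_side;
    return side * side * sizeof(double);
}

// A multiplication touches one block each of A, B and C at a time, so
// this checks whether three blocks fit in a cache of the given size.
inline bool three_blocks_fit(std::size_t block_bytes,
                             std::size_t cache_bytes) {
    return 3 * block_bytes <= cache_bytes;
}
```

With 8-byte doubles and the shape {30, 9, 32}, side_length gives 8640 and 8640 squared is the reported total size of 74649600. A subtile is 8192 bytes, so three of them fit in the 32768-byte L1 cache; a tile is 663552 bytes, so three of them fit in the 3145728-byte L2 cache.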
Recent changes
The code is being put up on the web for the first time.
Getting tiled_matrix
To get the source code, you will need bzr, Canonical's free-software distributed version control system (DVCS). Once you have bzr, you can get the latest version of tiled_matrix with the following command:
bzr branch http://digitasaru.net/bzr/tiled_matrix/
To download any revisions made after you created your branch, cd into the tiled_matrix directory and then use the command
bzr pull
Building tiled_matrix
Building tiled_matrix requires the GNU autotools (specifically, autoconf, automake, and libtool) and the GNU Compiler Collection's C++ compiler (g++). If you wish to use OpenMP, you will need to have GCC version 4.1 or later. Versions earlier than GCC 4.3 may work, but are not tested.
Once you've downloaded and installed the prerequisites and gotten the source code to tiled_matrix, the following commands will build tiled_matrix:
libtoolize
autoreconf --install
./configure [options]
make
#the next line is optional
make check
As this is proof-of-concept code, it is entirely unsupported. In addition, no installation system has been tested.
Configuration options
The following configuration options are available:
- --enable-debug: Turns on debugging and debugging symbols. It also turns off all optimizations.
- --enable-profiling: Turns on GCC's profiling support. This is not tested, but will probably work. It was inherited from a previous project and is generally a good thing to have around; it just has not been enabled and tested here.
- --enable-openmp: Turns on OpenMP multi-processor/multi-core support.
- --enable-log_indices: Makes all matrix indices long unsigned ints instead of unsigned ints.
Future work
- The (barebones) check needs to be fixed, and additional checks need to be introduced. This may or may not bring in a CUnit dependency.
- The code should become (Open)MPI-aware.
- Processor data is always welcomed.
License
tiled_matrix is distributed under the GNU Affero General Public License version 3 or later.
Requirements
Building the library and tests requires the following:
- GNU g++ 4.3 or later
- GNU make
- GNU automake
- GNU autoconf
- GNU libtool
Page last modified Monday, 17-Aug-2009 14:08:02 EDT