CS322 4/20/03

A Programming Language Modification For C++

I propose a modification to the C++ standard that adds a new data type called a BIT (written in caps here to make it easily identifiable), which is exactly what its name suggests: a single bit, with a few reasonable qualifications. The BIT data type would significantly reduce the memory used by programs that need to store many flags or perform bit-level manipulations. BITs would be implemented internally as ints and would typically be allocated in arrays. The number of ints required for an array of ARRAYSIZE BITs would be ARRAYSIZE / (sizeof(int) * CHAR_BIT), rounded up, where sizeof(int) * CHAR_BIT is the number of bits in an int, typically 32. This implementation should improve the memory efficiency of programs that make extensive use of BIT arrays, and in some cases their performance as well. The D programming language has a similar data type of the same name, though it is implemented somewhat differently. D is a relatively new language, but it seems to be a viable one, and I imagine it works quite well with BITs.
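The storage calculation can be sketched in today's C++. This is only an illustration under stated assumptions: the names BITS_PER_INT, ints_needed, and ints_needed_example are hypothetical and are not part of the proposal or of any standard library.

```cpp
#include <cassert>
#include <climits>   // CHAR_BIT: number of bits in a byte
#include <cstddef>   // std::size_t

// Bits held by one int on this platform (32 on most current machines).
const std::size_t BITS_PER_INT = sizeof(int) * CHAR_BIT;

// Hypothetical helper: number of ints needed to hold array_size BITs.
// Adding BITS_PER_INT - 1 before dividing rounds the result up, so a
// partly used int still counts as a whole one.
std::size_t ints_needed(std::size_t array_size)
{
    return (array_size + BITS_PER_INT - 1) / BITS_PER_INT;
}

// A few boundary cases that hold regardless of the size of an int.
void ints_needed_example()
{
    assert(ints_needed(0) == 0);
    assert(ints_needed(1) == 1);
    assert(ints_needed(BITS_PER_INT) == 1);
    assert(ints_needed(BITS_PER_INT + 1) == 2);
}
```

With 32-bit ints, ints_needed(120) evaluates to 4: three full ints and one that is only partly used.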

The addition of the BIT data type to the C++ standard would preserve backwards compatibility with all legacy code except code that happens to use “bit” as a variable name, function name, typedef, etc. It should therefore have very little negative impact, and may even fulfill one of the hopes of many a C++ programmer. Indeed, it seems that every general-purpose language ought to implement a BIT data type, especially those that deal directly with hardware. BITs would effectively replace bit fields, though bit fields should be retained for compatibility.

Some of the many uses for BITs include direct hardware access, flags, custom data pseudo-types, and gene sequences for genetic algorithms. The performance of such programs might actually increase compared to using arrays of bools, but realistically it probably would not make much difference. It would, however, improve their memory efficiency.

Learning to use the BIT data type would be relatively effortless, as it is used just like a bool except that ones and zeroes are preferred to true and false. Anyone likely to want this feature would certainly have no difficulty using it. The only tricky part is that pointers to individual BITs are not allowed, only pointers to arrays of BITs. The reason is that BITs are implemented by internal manipulations on ints, which cannot (or should not) be addressed except as a unit: the smallest addressable unit of memory is a byte, so a single bit inside an int has no address of its own. Each BIT can still be accessed in the typical array fashion, however. BITs are intended for use in large arrays.
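To make the restriction on pointers concrete, here is a minimal sketch of how a BIT array could be emulated in today's C++. It is an assumption about one possible implementation, not the proposed language feature itself: the class names BitArray and Ref and the member storage_bytes are hypothetical. Indexing returns a small proxy object that remembers which int and which bit it refers to, because there is no real address that a pointer to a single BIT could hold.

```cpp
#include <cassert>
#include <climits>   // CHAR_BIT
#include <cstddef>   // std::size_t
#include <vector>

// Hypothetical stand-in for the proposed BIT array; not part of any standard.
class BitArray {
public:
    // Proxy returned by operator[]; it plays the role of a "reference to a BIT".
    class Ref {
    public:
        Ref(unsigned int& word, unsigned int mask) : word_(word), mask_(mask) {}

        // Assigning 0 clears the bit; any other value sets it.
        Ref& operator=(int value)
        {
            if (value) word_ |= mask_; else word_ &= ~mask_;
            return *this;
        }

        // Copying from another proxy copies the bit value, not the location.
        Ref& operator=(const Ref& other) { return *this = int(other); }

        // Reading a BIT yields 0 or 1.
        operator int() const { return (word_ & mask_) ? 1 : 0; }

    private:
        unsigned int& word_;   // the int that contains the bit
        unsigned int mask_;    // a single 1 at the bit's position
    };

    // Allocate enough whole ints for `size` bits, all initially 0.
    explicit BitArray(std::size_t size)
        : size_(size), words_((size + BITS - 1) / BITS, 0u) {}

    // Array-style access: pick the int, then the bit within it.
    Ref operator[](std::size_t i)
    {
        assert(i < size_);
        return Ref(words_[i / BITS], 1u << (i % BITS));
    }

    std::size_t size() const { return size_; }

    // Bytes actually used to store the bits.
    std::size_t storage_bytes() const { return words_.size() * sizeof(unsigned int); }

private:
    static const std::size_t BITS = sizeof(unsigned int) * CHAR_BIT;
    std::size_t size_;
    std::vector<unsigned int> words_;
};
```

With this sketch, BitArray genes(120); followed by genes[i] = i % 2; behaves like the array syntax described above, while taking the address of genes[i] would only give the address of a temporary proxy, not of a BIT.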

Here is a sample of some code before the implementation of BITs:

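#include <iostream>

int main()
{
    bool genes[120];                                        // one bool (typically one byte) per gene
    for (int i = 0; i < 120; i++) genes[i] = 0;             // clear every gene
    for (int i = 0; i < 120; i++) genes[i] = i % 2;         // set the odd-numbered genes
    for (int i = 0; i < 120; i++) std::cout << genes[i];    // print the genes as 0s and 1s
    std::cout << std::endl;
}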

Here is the same code with BITs:

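#include <iostream>

int main()
{
    bit genes[120];                                         // proposed syntax: one bit per gene
    for (int i = 0; i < 120; i++) genes[i] = 0;             // clear every gene
    for (int i = 0; i < 120; i++) genes[i] = i % 2;         // set the odd-numbered genes
    for (int i = 0; i < 120; i++) std::cout << genes[i];    // print the genes as 0s and 1s
    std::cout << std::endl;
}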

It should be clear that switching from bools to BITs would not be difficult at all. Beyond this ease of use, BITs should make it easier to write code that previously required inline assembly, such as direct hardware access. Genetic algorithms in particular would benefit greatly, since BITs reduce the memory required for each individual by up to a factor of eight. In the example given, 120 BITs fit in four 32-bit ints, so the memory occupied by the array genes is reduced from 120 bytes (assuming one-byte bools) to 16 bytes.

The addition of the BIT data type would have little effect on C++ programming in general, but for a few applications that are not altogether uncommon, it would allow significant improvements. In some cases, the memory saved could keep the operating system from having to swap pages to disk, resulting in massive performance increases. Most of the time, though, this would not be the case, and little or no change would be apparent. In other cases, assembly code could be avoided, greatly improving code readability and ease of programming. While it is true that bit fields can serve a similar purpose, large arrays of bit fields would result in hideous and inefficient code. BITs are much simpler to use and understand than the arcane bit fields. Long live the BIT!