next up previous
Next: Genetic Operators Up: No Title Previous: The Computational Medium

The Genetic Language

The simplest possible instantiation of a digital organism is a machine language program that codes for self-replication. In this case, the bit pattern that makes up the program is the body of the organism, and at the same time its complete genetic material. Therefore, the machine language defined by the CPU constitutes the genetic language of the digital organism.

It is worth noting at this point that the organic organism most comparable to this kind of digital organism is the hypothetical, and now extinct, RNA organism [6]. These were presumably nothing more than RNA molecules capable of catalyzing their own replication. What the supposed RNA organisms have in common with the simple digital organism is that a single molecule constitutes the body and the genetic information, and effects the replication. In the digital organism a single bit pattern performs all the same functions.

The use of machine code as a genetic system raises the problem of brittleness. It has generally been assumed by computer scientists that machine language programs can not be evolved because random alterations such as bit flips and recombinations will always produce inviable programs. It has been suggested [23] that overcoming this brittleness and ``Discovering how to make such self-replicating patterns more robust so that they evolve to increasingly more complex states is probably the central problem in the study of artificial life.''

The assumption that machine languages are too brittle to evolve is probably true, as a consequence of the fact that machine languages have not previously been designed to survive random alterations. However, recent experiments have shown that brittleness can be overcome by addressing the principal causes, and without fundamentally changing the structure of machine languages [71,78].

The first requirement for evolvability is graceful error handling. When code is being randomly altered, every possible meaningless or erroneous condition is likely to occur. The CPU should be designed to handle these conditions without crashing the system. The simplest solution is for the CPU to perform no operation when it meets these conditions, perhaps setting an error flag, and to proceed to the next instruction.

Due to random alterations of the bit patterns, all possible bit patterns are likely to occur. Therefore a good design is for all possible bit patterns to be interpretable as meaningful instructions by the CPU. For example in the Tierra system [71,72,73,74,77,78], a five bit instruction set was chosen, in which all thirty-two five bit patterns represent good machine instructions.

This approach (all bit patterns meaningful) also could imply a lack of syntax, in which each instruction stands alone, and need not occur in the company of other instructions. To the extent that the language includes syntax, where instructions must precede or follow one another in certain orders, random alterations are likely to destroy meaningful syntax thereby making the language more brittle. A certain amount of this kind of brittleness can be tolerated as long as syntax errors are also handled gracefully.

During the design of the first evolvable machine language [71], a standard machine language (Intel 80X86) was compared to the genetic language of organic life, to attempt to understand the difference between the two languages that might contribute to the brittleness of the former and the robustness of the latter. One of the outstanding differences noted was in the number of basic informational objects contained in the two.

The organic genetic language is written with an alphabet consisting of four different nucleotides. Groups of three nucleotides form sixty-four ``words'' (codons), which are translated into twenty amino-acids by the molecular machinery of the cell. The machine language is written with sequences of two voltages (bits) which we conceptually represent as ones and zeros. The number of bits that form a ``word'' (machine instruction) varies between machine architectures, and in some architectures is not constant. However, the number required generally ranges from sixteen to thirty-two. This means that there are from tens of thousands to billions of machine instruction bit patterns, which are translated into operations performed by the CPU.

The thousands or billions of bit patterns that code for machine instructions contrasts with the sixty four nucleotide patterns that code for amino acids. The sixty-four nucleotide patterns are degenerate, in that they code for only twenty amino-acids. Similarly, the machine codes are degenerate, in that there are at most hundreds rather than thousands or billions of machine operations.

The machine codes exhibit a massive degeneracy (with respect to actual operations) as a result of the inclusion of data into the bit patterns coding for the operations. For example, the add operation will take two operands, and produce as a result the sum of the two operands. While there may be only a single add operation, the instruction may come in several forms depending on where the values of the two operands come from, and where the resultant sum will be placed. Some forms of the add instruction allow the value(s) of the operand(s) to be specified in the bit pattern of the machine code.

The inclusion of numeric operands in the machine code is the primary cause of the huge degeneracy. If numeric operands are not allowed, the number of bit patterns required to specify the complete set of operations collapses to at most a few hundred.

While there is no empirical data to support it, it is suspected that the huge degeneracy of most machine languages may be a source of brittleness. The logic of this argument is that mutation causes random swapping among the fundamental informational objects, codons in the organic language, and machine instructions in the digital language. It seems more likely that meaningful results will be produced when swapping among sixty-four objects than when swapping among billions of objects.

The size of the machine instruction set can be made comparable to the number of codons simply by eliminating numeric operands embedded in the machine code. However, this change creates some new problems. Computer programs generally function by executing instructions located sequentially in memory. However, in order to loop or branch, they use instructions such as ``jump'' to cause execution to jump to some other part of the program. Since the locations of these jumps are usually fixed, the jump instruction will generally have the target address included as an operand embedded in the machine code.

By eliminating operands from the machine code, we generate the need for a new mechanism of addressing for jumps. To resolve this problem, an idea can be borrowed from molecular biology. We can ask the question: how do biological molecules address one another? Molecules do not specify the coordinates of the other molecules they interact with. Rather, they present shapes on their surfaces that are complementary to the shapes on the surfaces of the target molecules. The concept of complementarity in addressing can be introduced to machine languages by allowing the jump instruction to be followed by some bit pattern, and having execution jump to the nearest occurrence of the complementary bit pattern.

In the development of the Tierran language, two changes were introduced to the machine language to reduce brittleness: elimination of numeric operands from the code, and the use of complementary patterns to control addressing. The resulting language proved to be evolvable [71]. As a result, nothing was learned about evolvability, because only one language was tested, and it evolved. It is not known what features of the language enhance its evolvability, which detract, and which do not affect evolvability. Subsequently, three additional languages were tested and the four languages were found to vary in their patterns and degree of evolvability [78]. However, it is still not known how the features of the language affect its evolvability.



next up previous
Next: Genetic Operators Up: No Title Previous: The Computational Medium



Thomas S.Ray
Thu Aug 3 13:59:36 JST 1995