These new instructions provide support for network access, and for
cell division and differentiation, among other things.
struct NetAddr {
    N32u node;   /* 32 bit IP address of the real machine */
    N16u portnb; /* 16 bit port number for the Tierra socket */
};

typedef struct /* structure for IPMap array */ {
    NetAddr address;
    TPingData data;
    I32s PendReq; /* number of pending TPing requests */
} MapNode;

The TPing part of the structure will be explained in the section Sensory Mechanism: TPing below.
When a network Tierra process begins, it reads a ``MapFile'' containing the IP addresses and port numbers of all the known participating nodes on the Tierra network, and creates the initial IPMap array from this list. After executing each million instructions, each Tierra node sends a TPing message to every machine on its IPMap list. The target machines respond by returning a TPing reply. As the replies come in, they are time-stamped in the ``Fresh'' field of the TPing data structure. If a machine has not responded for more than PendReqTime (default: 24 hours), it is assumed to have left the net and is removed from the IPMap array.
If any kind of message is received from a machine not presently on the
IPMap list, it is added to the list. The MapFile on disk is also
kept up to date with respect to changes in the IPMap list.
typedef struct
{
    I32s t;    /* tag for type of message, 0 reserved for free struct */
    ComAddr s; /* source address */
    ComAddr e; /* destination address */
    ComData d; /* data */
} IOS;

Messages directed specifically at that organism, such as TPing responses, are buffered there. Messages are processed in first-in, first-out order, and when the buffer is full, it begins writing over the oldest messages.
struct Event {
    I32s m; /* millions of instructions */
    I32s i; /* individual instructions, modulus one million */
};

struct TPingData {
    I32u Fresh;
    I32s Speed;
    I32s NumCells;
    I32s SoupSize;
    I32u TransitTime;
    I32u Time;
    Event InstExec;
    Event InstExecConnect;
    I32s OS;
};

The components of this structure are listed below:
I32u Fresh; When a TPing data structure arrives, it receives a time stamp on the local machine. This aids in determining whether the data is sufficiently fresh to be used reliably.
I32s Speed; This is the number of Tierran virtual machine instructions executed per second, calculated over the last million instructions executed. Since the Tierra process runs as a low-priority background process, the digital organisms only get left-over CPU cycles. This variable is a key measure of their energy supply.
I32s NumCells; This tells how many organisms there are on the node. This is relevant since the CPU cycles (Speed) and the SoupSize will have to be divided among this many creatures.
I32s SoupSize; This is the size, in bytes, of the memory space of the ``soup'' on the node. This tells us how much room there is on the node to hold organisms.
I32u TransitTime; This measures the time, in milliseconds, that it takes for the TPing request to cross the net and return. This gives a measure of the ``distance'' between nodes, in the sense that I have discussed it in my ruminations on the topology of cyberspace.
I32u Time; This is currently implemented as the local 32 bit clock time. However, we intend to change it to the local 24 hour clock time. Creatures that choose to migrate around the planet staying on the dark side (or something like that) could use this kind of data.
Event InstExec; This measures how many virtual instructions have executed on the node since the Tierra process began, which is a measure of the age of the habitat island.
Event InstExecConnect; This measures how many virtual instructions have executed since the Tierra process began, while the node was connected to the net. The ratio: InstExecConnect/InstExec is a measure of the connectivity of the node. Some nodes, such as those that connect by modem, will be intermittently connected. Creatures may develop special adaptations for intermittently connected nodes, and these two parameters allow them to sense some relevant connectivity qualities.
I8s OS; This is a flag to identify the major Operating System types, in the event that creatures can differentiate them and develop niche specializations to the different features of different underlying OS types (questionable).
Any time that ANY data is transferred between nodes (such as the
migration of a creature), the TPingData goes with it. TransitTime
cannot be calculated on incoming data, as TransitTime measures
the round trip time. However, every incoming TPingData structure
results in the return of a TPingData structure to the source machine,
so that the TransitTime field can be filled on the source machine.
Some of these instructions send messages from one Tierra node to
another, based on the suggestion of an IP address in a register
of the virtual CPU. Because the suggested IP addresses might not
be valid, they are forced to match valid IP addresses from the
IPMap array. The match is based on the closest Hamming distance
(the minimum number of different bits).
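The Hamming-distance match can be sketched in C as follows. This is a minimal illustration with hypothetical names (hamming32, match_ip), not the actual Tierra support code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count differing bits between two 32-bit IP addresses. */
static int hamming32(uint32_t a, uint32_t b)
{
    uint32_t x = a ^ b;
    int d = 0;
    while (x) {
        d += x & 1u;
        x >>= 1;
    }
    return d;
}

/* Return the index of the IPMap entry whose address is closest,
 * in Hamming distance, to the address suggested by the register. */
size_t match_ip(uint32_t suggested, const uint32_t *ipmap, size_t n)
{
    size_t best = 0;
    int best_d = 33; /* larger than any possible 32-bit distance */
    for (size_t i = 0; i < n; i++) {
        int d = hamming32(suggested, ipmap[i]);
        if (d < best_d) {
            best_d = d;
            best = i;
        }
    }
    return best;
}
```

Because every suggested address maps to some valid entry, even a random or mutated register value still yields a reachable node.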
Each CPU includes a pointer into the IPMap array, which is
incremented each time the getipp instruction is executed.
This pointer is initialized to a random location in the IPMap
array when the CPU is created. When the pointer increments past
the end of the array, it wraps around.
The ifsig instruction specifies a signal type by a value in a CPU register (specified in the opcode.map file). If the signal type specified in the register matches the t (tag) field of any IOS structure in the IOS buffer, the ifsig instruction is considered true, and the next instruction will be executed. If there is no match of the register to a tag in the buffer, the ifsig instruction is considered false, and the next instruction will be skipped.
This means that, in the case of the TPing response,
ifsig will be true as long as there are unread TPing
responses in the buffer, but it will become false once the
TPing responses have been cleared from the buffer by the
execution of tpingr instructions. The
network ancestor uses ifsig to determine if it will
read a TPing response using the tpingr instruction.
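The buffer scan that ifsig performs can be sketched in C. This is a minimal sketch with hypothetical names (IOSMsg, ifsig_true); the real IOS buffer holds full message structures with source, destination, and data fields:

```c
#include <assert.h>
#include <stddef.h>

/* Reduced IOS message: only the tag matters for ifsig.
 * Tag 0 is reserved for free structs, so it never matches. */
typedef struct {
    int t; /* message tag, 0 = free slot */
} IOSMsg;

/* Return 1 if any buffered message carries the requested tag
 * (ifsig true: execute the next instruction), else 0 (skip it). */
int ifsig_true(const IOSMsg *buf, size_t n, int sigtype)
{
    if (sigtype == 0)
        return 0; /* 0 is reserved for free structs */
    for (size_t i = 0; i < n; i++)
        if (buf[i].t == sigtype)
            return 1;
    return 0;
}
```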
When an organism receives a time slice, the size of the slice (number of
instructions to be executed) is determined. This is the SliceSize.
The actual number of cycles that an organism will execute during its time
slice will be SliceSize times the number of CPUs that it has.
Each CPU in the organism will execute SliceSize instructions during the
time slice. However, each CPU executes one instruction, in turn,
rather than having each CPU execute its entire slice, before allowing
the next CPU to execute.
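The interleaved scheduling described above can be sketched as follows. This is an illustrative sketch (run_slice is a hypothetical name); instruction execution is reduced to recording which CPU ran at each step:

```c
#include <assert.h>
#include <stddef.h>

/* Interleaved execution sketch: each of the organism's ncpus CPUs
 * executes slice_size instructions, one instruction per CPU per
 * round, rather than each CPU running its whole slice in a block.
 * The CPU id of each step is recorded into 'order'; the return
 * value is the total instruction count, slice_size * ncpus. */
size_t run_slice(size_t ncpus, size_t slice_size, size_t *order)
{
    size_t executed = 0;
    for (size_t round = 0; round < slice_size; round++)
        for (size_t cpu = 0; cpu < ncpus; cpu++)
            order[executed++] = cpu;
    return executed;
}
```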
Before 29-1-97, csync was an all-or-nothing synchronization. There was no mechanism to permit a sub-set of the CPUs to synchronize independently from the other CPUs of an organism. However, on 29-1-97 csync was enhanced to take the following form:
The csync instruction permits synchronization of groups of CPUs of an organism. If a CPU is not yet a member of a synchronization group, then the first time that it executes csync, it will create a new sync group. The sync group identity is inherited when new CPUs are created by the split instruction. The only way to add CPUs to a sync group is by splitting of the CPUs, and membership in a sync group is permanent for the life of the CPU. Once defined, the sync group identity of a CPU can not be changed.
Once it is already a member of a sync group, when a CPU
executes csync, it will stop executing and wait (losing its time
slice(s)), until all of the CPUs of its sync group have also
executed csync. When all of the CPUs of the group have
executed csync, then they will all simultaneously begin
executing again. Individual sync groups disappear when all CPUs of
the group halt (or join).
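The barrier semantics of csync can be sketched with a simple arrival counter. This is a single-stepped sketch with hypothetical names (SyncGroup, csync_arrive); the real implementation must also account for the time slices lost while waiting:

```c
#include <assert.h>
#include <stddef.h>

/* One synchronization group of an organism's CPUs. */
typedef struct {
    size_t members; /* CPUs belonging to the sync group */
    size_t waiting; /* CPUs currently blocked in csync */
} SyncGroup;

/* A CPU calls this when it executes csync.  Returns 1 if this
 * arrival completes the group (all CPUs resume simultaneously),
 * 0 if the CPU must keep waiting for the rest of its group. */
int csync_arrive(SyncGroup *g)
{
    g->waiting++;
    if (g->waiting == g->members) {
        g->waiting = 0; /* release everyone; reset for the next round */
        return 1;
    }
    return 0;
}
```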
Writing the network ancestor has long been the last major obstacle
to the start of the network Tierra experiment. It is not a trivial
problem, because the process of writing the network ancestor also
results in the final definition of the network Tierra assembler
language. New instructions will need to be created, and the details
of the register usage of existing instructions will be finalized.
In addition, once the network ancestor begins to run on the network,
many instructions and features of the Tierran machine language
that have been developed over a period of several years will receive
their first tests and will need to be debugged. In April 1996,
the pseudo-code for the network ancestor was written. In May the
pseudo-code was translated into Tierran assembler code.
The network ancestor first examines itself to determine where it begins, where it ends, and how large it is. The end address is found by searching for an "end template" at the end of the organism's executable code. After the beginning address is subtracted from this end address, the resulting size (320) is shifted left (multiplied by 2) to create a private data space following the code segment. This space will be used by the sensory cell(s) to store and compare TPing data from various nodes on the net. The cell then splits into two threads, which differentiate into different ``cell types'', by jumping to different parts of the code. One cell type specializes for reproduction, the other for sensing conditions on the net using the TPing mechanism.
The reproductive cell splits into four threads in order to parallelize the time-consuming process of copying the data from the mother to the daughter space. However, just before cell division, three of the four threads halt. The remaining reproductive cell copies the IP address of the node that is the current best recommendation from the sensory cell(s), from the soup data area to the relevant register of the CPU, and sends its daughter to that node. The daughter will only be placed on the local node if it compares favorably to other nodes and receives the best recommendation from the sensory cells.
The sensory cell remains as a single thread most of the time. It uses the getipp instruction to copy successive TPing data structures into its private data space from the IPMap array on the local node. The IPMap array contains the most recently arrived TPing data structures from all the nodes on the net that this local node is aware of.
The sensory cell keeps two TPing data structures in its private data space. The first structure at the beginning of the data space is placed there immediately after the differentiation of the sensory cell. All subsequent calls to getipp or tpingr place the TPing data into the space just following the first structure. Each time the sensory cell reads a new TPing data structure into this space, it subtracts the time-stamp value from the current time, with the difference being the age of the data. This age value is then shifted right several times (that is, successively divided by 2). If the result is zero, the data is considered to be fresh enough. If the data is not fresh enough, it sends a TPing request to that node to get more recent data, and it loops back to call getipp again to look at the data from a different node (while waiting for the previous TPing reply to return). However, each time, just before the getipp instruction, the sensory cell tests for a TPing reply signal. If there is such a signal, it indicates that a TPing reply is waiting in the buffer. In this case, rather than using getipp, it uses tpingr to receive the incoming TPing data.
After each new TPing data structure is written into the (second) data space, it is compared to the data in the first space. The network ancestor looks at the ratio: Speed/NumCells. This is a measure of the amount of energy resource available to each cell on that node. If the newly arrived TPing data has a higher Speed/NumCells ratio, it is copied into the first data space, replacing the TPing data that was written there previously. In this way, the first data space always contains the most recent "best" recommendation. It is from this space that the reproductive cell fetches the IP address just before executing the divide instruction, and the daughter is always sent to that IP address (which could be the local node).
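The Speed/NumCells comparison above can be sketched in C. This is a minimal sketch (better_node is a hypothetical name, not Tierra source); it compares s2/n2 > s1/n1 by cross-multiplication, which sidesteps integer-division truncation, and the treatment of an empty node is a design choice of this sketch, not something the ancestor specifies:

```c
#include <assert.h>
#include <stdint.h>

/* Return 1 if node 2 (speed s2, n2 organisms) offers more CPU speed
 * per organism than node 1, i.e. s2/n2 > s1/n1, evaluated as
 * s2*n1 > s1*n2 in 64-bit arithmetic to avoid truncation. */
int better_node(int32_t s1, int32_t n1, int32_t s2, int32_t n2)
{
    /* Sketch assumption: an empty node (n == 0) ranks highest. */
    if (n2 == 0)
        return 1;
    if (n1 == 0)
        return 0;
    return (int64_t)s2 * n1 > (int64_t)s1 * n2;
}
```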
A problem noted in early runs was that mutational degradation of the code that determines how fresh the TPing data must be causes the algorithms to require increasingly fresh TPing data. This causes early evolution of excessive use of the TPing sensory function. In order to resolve this problem, the Tierra virtual operating system was modified to TPing each machine in its MapFile during the execution of each million instructions. This change effectively relieves the organisms themselves of the burden of keeping the TPing data up-to-date. So for the December test, the code of the ancestor was modified by commenting out the code used to test the freshness of the TPing data. The resulting new network ancestor is 0492aaa. The instruction set used in this experiment is summarized in the opcode map.
Evolutionary network tests conducted through January 1997 indicated that the ancestral algorithms were not able to survive in competition with simpler algorithms using the random number generator to determine the IP address of the machine to migrate or send the daughter cell to. One possible explanation for the failure of this algorithm is that during the life-span of an individual organism, it is only able to examine and compare the TPing data from a very small number of machines on the net. The ancestral organism is forced to choose from this small set of machines. The January '97 ancestor was designed to permit the algorithm to compare more TPing data structures, by parallelizing the sensory thread. Where the previous ancestors had a sensory cell, the January '97 ancestor has a sensory organ, or sensory tissue.
The January '97 ancestor includes some other innovations, in addition to the sensory organ. An overview of the algorithm follows.
Previously, the development of a cell into a tissue had been implemented through a series of in-line split instructions. In the hope of enhancing the evolvability of the developmental process, all tissue developments are now implemented by the following function call, where the number of cells to split into is specified by a value in the cx register:
; Split Function, split into cx CPUs (call and return)
; Required Start Values when calling:
;   cx = # of CPUs to split into
;   dx will be set to CPU #
nop1  ; split function template 10110
nop0  ; split function template 10110
nop1  ; split function template 10110
nop1  ; split function template 10110
nop0  ; split function template 10110
zeroD ; dx = 0, prepare to enter split loop
nop1  ; split loop template 1
shr   ; cx = cx / 2, half number of CPUs
ifz   ; if cx == 0 perform next instruction, otherwise skip it
ret   ; return to calling code
split ; 1 -> 2 CPUs
jmpb  ; jump to top of split loop
nop0  ; split loop complement 0
The January '97 ancestor algorithm begins by examining itself by looking at the beginning and ending templates, and subtracting the two addresses, resulting in a size of 320 bytes.
After the self-exam, the algorithm splits into two threads which differentiate into different tissues. Note that splits involving differentiation are not handled by the split function:
; Split into copy and ping threads:
zeroD ; dx = 0
split ; 1 -> 2 CPUs
pushD ; push dx onto stack
popC  ; pop cx from stack, processor #
ifz   ; if cx == 0 perform next instruction, otherwise skip it
jmpo  ; jump to template below (reproduction code)
nop0  ; reproduction code complement 01010
nop1  ; reproduction code complement 01010
nop0  ; reproduction code complement 01010
nop1  ; reproduction code complement 01010
nop0  ; reproduction code complement 01010
jmpo  ; jump to template below (ping code)
nop0  ; ping code complement 00111
nop0  ; ping code complement 00111
nop1  ; ping code complement 00111
nop1  ; ping code complement 00111
nop1  ; ping code complement 00111
ifz   ; dummy to separate templates
The reproduction thread begins by allocating a space for the daughter cell. The algorithm has 320 bytes of executable code, but requires 512 bytes for data. Therefore, the size calculation is multiplied by three, and 960 bytes are allocated for the daughter. The copy procedure is then called, with a specification to use two CPUs to copy the genome:
; call copy procedure (variable CPU version):
;   this procedure will copy cx bytes
;   starting from [bx] (source)
;   starting to [ax] (destination)
;   using dx CPUs
The copy procedure causes the copy thread to split into a tissue of cx (two) CPUs to copy the genome. The procedure divides the total number of bytes by the number of CPUs, to calculate the number of bytes that each CPU will copy. The development into multiple threads is effected by a call to the split function. After splitting into a tissue of two CPUs, each CPU calculates the offset of the source and destination data that it will copy:
; ax = start of destination
; bx = start of source
; dx = used by split to specify the CPU number
; divide the work up among the CPUs
popC    ; cx = # bytes/CPU to copy
offAACD ; ax = ax + (cx * dx)
offBBCD ; bx = bx + (cx * dx)
Each CPU copies its share of bytes, then, all but the zero CPU halt. The remaining CPU (0) then returns from the copy procedure. At this point the reproduction algorithm has collapsed back into a single thread (cell).
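The work division performed by offAACD and offBBCD (ax += cx*dx, bx += cx*dx) can be sketched in C. This is an illustrative sketch with a hypothetical name (copy_share); each CPU copies its own contiguous share, and together the shares cover the whole genome:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the share of 'len' bytes belonging to CPU number 'cpu'
 * out of 'ncpus' total: share = len / ncpus bytes, starting at
 * offset share * cpu in both source and destination. */
void copy_share(char *dst, const char *src, size_t len,
                size_t ncpus, size_t cpu)
{
    size_t share = len / ncpus; /* cx = # bytes per CPU */
    size_t off = share * cpu;   /* cx * dx */
    for (size_t i = 0; i < share; i++)
        dst[off + i] = src[off + i];
}
```

Note that this sketch, like the ancestor, assumes len divides evenly among the CPUs (320 bytes over two CPUs).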
After the genome has been copied from mother to daughter, the IP address of the latest recommendation from the sensory algorithm is moved from the data segment into the cx register, and the divide instruction is executed. This causes the daughter cell to be sent to the machine with the specified IP address; the daughter's code in the local soup is then randomized, and the daughter's memory allocation is freed. The reproduction algorithm then loops back to the point of calculating three times the size of the code segment, allocating the memory for the daughter, and so on, in an infinite loop.
The sensory tissue uses an algorithm which should be able to scale to any number of cells (limited to powers of 2). The January '97 ancestor uses eight cells in the sensory tissue. They develop in the following fashion (where the numbers correspond to the CPU # which is set by the split instruction in the dx register):
; Development of Ping CPUs:
; 0 --> 0 --> 0 --> 0
;  |     |     \-> 1
;  |     \-> 1 --> 2
;  |            \-> 3
;  \-> 1 --> 2 --> 4
;        |     \-> 5
;        \-> 3 --> 6
;               \-> 7
Before forming the tissue, the original cell of the sensory algorithm uses the getipp instruction to get the seed TPing data structure. After formation, all cells of the tissue calculate the offset of their private data area. Then, all cells of the tissue (except CPU 0), use getipp to get a TPing data structure into their private data area. Note that having CPU 0 not get a TPing data structure at this point, allows the algorithm to remember the best data from the previous loop(s) through this code.
With eight sets of TPing data collected, the CPUs begin the process of pairwise comparison of the structures. The data is compared on the basis of the ratio Speed/NumCells, where Speed is the speed of the Tierran virtual CPU, in instructions per second, on some machine on the net, and NumCells is the number of digital organisms on that machine. If the higher numbered CPU has a higher ratio, its data is copied into the data area of the lower numbered CPU:
; Comparison of Ping data, halting of Ping CPUs:
;   *
;   0   0 --> 0 --> 0 --> 0
;  64   1 -/  |     |
; 128   2 --> 2 -/  |
; 192   3 -/        |
; 256   4 --> 4 --> 4 -/
; 320   5 -/  |
; 384   6 --> 6 -/
; 448   7 -/
; * Offsets into data segment of Ping CPUs private TPing data,
;   requires 512 bytes for TPing data
After the data is compared and conditionally copied, the higher numbered CPU halts. The comparison of the data and halting of the CPUs repeats in a loop until a single CPU is left, containing the best data from the eight. Because the copying of the data is conditional, the various CPUs must synchronize each time through the loop, using the csync instruction.
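The compare-and-halt loop amounts to a log2(n) pairwise reduction. In this minimal C sketch (reduce_best is a hypothetical name, and each 32-byte TPing record is reduced to a single score for brevity), slot 0 ends up holding the best of the eight, following the same pairing as the diagram above:

```c
#include <assert.h>
#include <stdint.h>

/* Pairwise reduction over n score slots (n a power of 2).
 * Round 1 compares (0,1) (2,3) (4,5) (6,7); round 2 compares
 * (0,2) (4,6); round 3 compares (0,4).  The better value is
 * copied down into the lower slot, and in Tierra the higher
 * numbered CPU would halt after each comparison. */
void reduce_best(int32_t *score, int n)
{
    for (int stride = 1; stride < n; stride *= 2)
        for (int lo = 0; lo + stride < n; lo += 2 * stride)
            if (score[lo + stride] > score[lo])
                score[lo] = score[lo + stride]; /* copy better data down */
}
```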
The conditional copying of the TPing data is effected by calling the same copy procedure used for replicating the genome. Only the first 32 bytes of the 64 byte TPing data structure are copied (as only that part of the data is needed). Eight CPUs are used by the copy procedure to copy the 32 bytes of TPing data.
After having selected the best of the eight TPing data structures,
the single remaining sensory CPU loops to the top of the sensory
code where it splits into eight CPUs, and gets eight more TPing
data structures again. This is an infinite loop.
Starting on May 15, the network ancestor ran in a four-node Tierra network for five days, with mutation turned off. It was noted that the TPing perceptual system was not working properly. The getipp instruction was always returning data from the same node rather than from successive nodes.
In the week of May 20, the new network-ready source code, Tierra V5.0, along with the new network ancestor, was distributed to the Tierra developers. On May 27, another test was started on four nodes, but this time with mutation turned on. It quickly crashed and burned with segmentation violations. With mutation on, the full instruction set received a full test, and several of the new untested instructions revealed faults. These bugs were patched over the period of the next two weeks.
It was also discovered that the problem with the getipp
instruction was not in the C support code, but in the assembler
code of the network ancestor itself. With all bugs patched, the
system appeared to be fully functional and additional testing
with mutation turned on began in the week of June 10. These tests
revealed that even on a local area net, more than 90% of the migrating
creatures were lost in the net, and never arrived at the target
node. This unacceptably high loss rate caused a very strong selection
pressure against migration, and it could be seen that creatures
rapidly evolved to reproduce only on the local node, and not migrate
over the net. This problem and its resolution are discussed in the
section on
Transport.
Because the network ancestor chooses where to send its daughter based on a comparison of various nodes on the net, new conditions are created that had not been seen in the non-network version of Tierra. Most significantly, it is possible for a node to have no creatures born locally, even though it may be inhabited by living cells. This created some problems with our observational system, causing divide-by-zero errors, which were easily fixed. However, another problem with nodes where no creatures are born locally is that the creatures living on those nodes will gradually age due to the accumulation of mutations, and they will cease to function. The existing configuration of Tierra would allow these non-functional cells to remain indefinitely, since the lack of local reproduction would prevent those cells from being replaced by the (non-existent) newborn generation. This was considered undesirable because the Speed/NumCells ratio for this node would not reflect the reality that the number of functional cells would eventually go to zero, making the node appear relatively desirable.
For this reason, the parameters LazyTol, RepInst, and ce->d.repinst were introduced. ce->d.repinst counts how many instructions this cell has executed since its last reproduction. RepInst is a rough average of the number of instructions executed per reproduction, for all cells in the soup. LazyTol is a start-up parameter that sets the tolerance for lazy (non-reproductive) behavior. It triggers the following code:
while (InstExe.m && ce->d.repinst > RepInst * LazyTol)
    ReapCell(ce);
In its current configuration, network Tierra polls for incoming messages between the virtual CPU time slices granted to creatures. But we have found that the amount of network traffic is high, and evidently the buffers are overflowing. This is very likely the main cause of packet loss in our tests on local area nets.
By monitoring the numbers of different messages being sent and received, we discovered that the TPing request was being overused. The volume of TPing traffic starts out at a reasonable level, but then grows by about two or three orders of magnitude in the course of early evolution. It then begins to decline as the creatures evolve to stop migration.
The network ancestor reviews the data in the array of TPing structures stored locally on each node and fetches successive structures with the getipp instruction. It then subtracts the timestamp in the TPing data from the current time. The value of this computed time difference is then shifted right several times. If at the end of these shift operations the value has gone to zero, the data is accepted as sufficiently fresh and is used. However, if the shifted value is non-zero, the data is judged to be too old, and a TPing request is sent to retrieve a newer copy.
The acceptance of "old" data depends on the integrity of these successive shift operations in the genetic code. If a mutation disrupts this series of operations, the creature can demand more and more recent data, and will consequently send TPing requests more and more frequently.
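The freshness test reduces to a shift-and-compare, sketched below in C (fresh_enough is a hypothetical name; the ancestor implements this in Tierran assembler). Each shift lost to mutation halves the tolerated age, which is exactly the failure mode described above:

```c
#include <assert.h>
#include <stdint.h>

/* Accept TPing data as fresh if its age, shifted right k times,
 * goes to zero -- i.e. the data is younger than 2^k time units.
 * A mutation that deletes one shift halves the tolerated age,
 * making the creature demand ever-fresher data. */
int fresh_enough(uint32_t now, uint32_t stamp, int k)
{
    uint32_t age = now - stamp;
    return (age >> k) == 0;
}
```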
In order to inhibit the evolution of stupid mutants which make excessive use of the TPing instruction, two modifications have been made. When each creature receives a time slice, the CPU time slice is counted down as each virtual machine instruction is executed. Tierra has always provided a mechanism to independently set the cost (in CPU cycles) of each instruction. However, this feature has never been used; until now, all Tierran instructions have been implemented as costing a single CPU cycle. To inhibit excessive use of the TPing instruction, its cost has been raised to ten CPU cycles. In addition, it is now considered to be a fatal error if a TPing request is sent while there is an unread TPing reply in the buffer.
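The interaction between per-instruction costs and the slice countdown can be sketched as follows. This is an illustrative sketch with hypothetical names and a made-up two-opcode encoding; the real cost table lives in the Tierra C support code:

```c
#include <assert.h>
#include <stddef.h>

/* Count down a CPU time slice by the per-opcode cost of each
 * instruction executed; a 10-cycle TPing thus "spends" ten times
 * the energy of an ordinary 1-cycle instruction.  Returns the
 * number of instructions that fit into the slice. */
size_t run_costed_slice(long slice, const int *cost,
                        const int *program, size_t proglen)
{
    size_t executed = 0;
    for (size_t i = 0; i < proglen && slice > 0; i++) {
        slice -= cost[program[i]]; /* cost table indexed by opcode */
        executed++;
    }
    return executed;
}
```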
In an additional effort to reduce network traffic, the TPing-send, surf, and divide-eject instructions have all been modified so that if the target node is actually the local node, then the data does not go out on the net, but is handled locally.
At the Second Tierra Workshop,
Manor Askenazi suggested
that a method of enhancing selection for network savvy behavior
and against small fast creatures that ignore the net
would be to have an occasional "catastrophe" kill all
the creatures on a node. Because the catastrophes are only local,
the devastated node can be repopulated by immigrants coming over
the net. This process has the effect of absolutely selecting against
creatures who ignore the net.
This suggestion has been taken, and on June 14 an Apocalypse function
was developed that periodically kills all creatures on the local
node.
When the Apocalypse function was first implemented, Tierra did not recover properly after the Apocalypse. Creatures immigrated from the net to the devastated node. However, the system was not properly initialized for the first creature to enter the now-empty soup. Some problems appeared with the Tierran memory allocation system and the reaper and slicer queues. It took a week to resolve this problem because there were many different pieces of code that were failing. The problems finally appeared to be resolved on June 21.
The test to determine if the Apocalypse function is finally stable ``evolved'' into a little network experiment. I let it run for a week (from 21-6-96 until 27-6-96), using a four-node local network. To thoroughly test the Apocalypse function, it was set to cause the Apocalypse every million instructions on one node, and every twenty million on the other three nodes.
This little net was seeded with the network ancestor, which is 640 bytes long (although only 320 bytes are executable code; the remaining 320 bytes are data space). Throughout the week, all four nodes were completely dominated by 640 byte creatures. Other sizes are appearing regularly, but as far as I can tell, none of them have established stable populations. The four nodes have executed the following numbers of instructions and generations:
Instructions (in millions)   Generations
                       599          2579
                      1248          3717
                      1412          3918
                      1803          5159
Previous runs with other versions of Tierra have always shown both diversification and change in the dominant size classes in much less evolutionary time. However, work reported in Ray, T. S. 1994. Evolution, complexity, entropy, and artificial reality. Physica D 75: 239-263, has shown that the pattern of evolution is very sensitive to the underlying genetic language, the instruction set. Of the four languages compared in that study, two showed a lot of evolution, while two evidently had a low evolvability and showed little evolution. It appears that the language that we are currently using has a very low evolvability.
We have not yet done sequence comparisons on the programs in the current runs of the network version of Tierra. It is possible, though unlikely, that there is a lot of evolution within the dominant size class. In one of the four languages described in the Physica D paper, the dominant size class remained stable for long periods and only changed in sudden jumps, never gradually. When I analyzed the sequences in those periods of size stability, however, I found that there was a very large diversification of sequences and a tremendous divergence from the starting sequence, even when the total size was not changing. It is possible (though again unlikely) that the current experiment is evolving by a similar pattern, with evolution only of the sequence, but not the size. It still seems to me more likely that we are using an instruction set of very low evolvability. This result is not unexpected, and is a large part of the reason that we expect to spend a year in testing and development before the system is ready for public release.
As soon as we have a thorough documentation for the Tierra instruction
sets, we will extend an open invitation for anyone to design network
ancestors for testing in the network environment. We will compare
the evolvability of these, and select the most evolvable as the
basis for the larger scale network Tierra project.
In another simple test for evolvability, the network ancestor
was run in a single-node network with no Apocalypse, and selection
strongly favoring size reduction (uniform random variate slice
size). At about 20 million instructions into this run, the predominant
size shifted from 640 to 384 bytes. It is interesting that the
original algorithm has 320 bytes of code and 320 bytes of data
space. The data space is used to hold two 64 byte TPing structures,
leaving 192 bytes of unused space. We suspect that the 384 byte
creature may have 320 bytes of code and space for one TPing structure.
This test showed that the system is capable of some evolution.
However, after the transition at 20 million instructions from
640 to 384 byte organisms, we have seen no further change in the
average size, even though the simulation has run for 72 million
instructions. Again, it appears that our current instruction set
has a very low evolvability.
On July 2 the above test (with a uniform random variate slice size, which strongly favors optimization to smaller sizes) was repeated in a four-node network installation. These nodes are experiencing Apocalypse at a frequency of one per 20 million instructions. At about 20 million instructions in time, the predominant size shifted from 640 to 512 bytes. The size then remained stable at 512 bytes until about 500 million instructions, when the size began to wander widely, and for prolonged periods. However, the size always returned eventually to 512 bytes.
Some brief observations made with the Beagle tool at the 640 million instruction mark showed the diversity of size classes was substantially higher than was observed in the previous long run. Some significant invasions of other size classes were noted. For example, there was an invasion of 136 byte creatures, which reached a population of over 50 individuals, and then quickly went extinct. A little later there was an invasion of 404 byte creatures, which achieved a population of 40 individuals before they too quickly died.
At a little past 800 million instructions, there was an invasion of 256 byte algorithms which effectively came to dominate, displacing the 512 byte creatures. The 256 byte algorithms dominated until the experiment was terminated a little short of two billion instructions. A brief examination of a successful 256 byte creature showed the ordering of the instructions to be very different from the ancestor.
These observations suggest that this instruction set is more evolvable than first thought, but that there is unusual behavior with respect to changes in size.
I observed the recovery from the apocalypse at 660 million on one of the nodes. This recovery took a very long time. Creatures of various sizes frequently appeared, presumably coming from the other nodes, but they were not able to establish populations and fill the soup; they kept going extinct. Here are some excerpts from the log showing the difficulty:
ie = instructions executed in millions
gn = generations elapsed
ti = clock time
nc = number of adult cells
sp = speed of processor, in virtual instructions executed per second

ie660 gn3947 ti836358677 Wed Jul 3 10:51:17 1996 nc237 sp9900
ie661 gn4137 ti836359746 Wed Jul 3 11:09:06 1996 nc2   sp935
ie662 gn4256 ti836361252 Wed Jul 3 11:34:12 1996 nc10  sp664
ie663 gn4259 ti836361373 Wed Jul 3 11:36:13 1996 nc71  sp8264
ie664 gn4261 ti836361577 Wed Jul 3 11:39:37 1996 nc89  sp4901
ie665 gn4264 ti836361678 Wed Jul 3 11:41:18 1996 nc80  sp9900
ie666 gn4267 ti836361805 Wed Jul 3 11:43:25 1996 nc110 sp7874
ie667 gn4269 ti836361936 Wed Jul 3 11:45:36 1996 nc57  sp7633
ie668 gn4273 ti836362045 Wed Jul 3 11:47:25 1996 nc55  sp9174
ie669 gn4276 ti836362186 Wed Jul 3 11:49:46 1996 nc147 sp7092
ie670 gn4278 ti836362293 Wed Jul 3 11:51:33 1996 nc224 sp9345
At the time of the apocalypse (10:51), there were 237 cells, and
the processor was executing 9900 virtual instructions per second.
The soup had not recovered from the apocalypse by
the next million-instruction mark, which took eighteen minutes of real time. By
11:34, the soup was effectively colonized, and began partially filling
with creatures. However, even this did not immediately lead to the
filling of the soup. Altogether, it took about one hour for the
soup to fill completely again. I think that on a four-node network,
there is a real danger of Apocalypses occurring close together
in time on all four nodes and effectively causing a global extinction.
However, this very slow recovery does not seem to be typical. On other
occasions, I have seen the recovery be essentially immediate.
dec4   ; decrement cx (size) by 4, cx -= 4
movii4 ; move contents of [bx] to [ax] (copy four instructions)

became:

dec    ; decrement cx (size) by 1, cx -= 1
movii  ; move contents of [bx] to [ax] (copy one instruction)

In this new experiment, there is a much higher size diversity at any moment. For example, a snapshot of one node shows the following size distribution:
Size    Frequency
346     424
90      170
88      53
344     27
370     6
352     4
356     2
358     1
566     1
others  8

Average size appears to be changing both more gradually and more wildly. In the previous experiment, the copy procedure of the ancestor used four CPUs, and a four byte movii instruction. This suggests that size would change most easily in increments of sixteen bytes. However, the size changed only in much larger increments. In the last experiment, the size evolved immediately to predominantly powers of two (512 and 256). In this experiment, still using four CPUs, but moving only one byte at a time, size does not seem to be restricted in any way. It appears to be able to easily assume any arbitrary value. The predominant sizes are not even multiples of four.
In the first 800 million instructions of the run, the sizes changed gradually and wildly. Since then, size has remained very stable, first at 360 bytes (from about 800 to 1,700 million), then at 272 bytes (from about 1,700 to 2,600 million).
The first network tests revealed that even on a local area net, more than 90% of the migrating creatures were lost in the net, and never arrived at the target node. This unacceptably high loss rate caused a very strong selection pressure against migration, and creatures rapidly evolved to reproduce only on the local node, and not to migrate over the net. The work described in this Transport section was conducted by Joseph Hart.
There are several reasons for the high rate of packet loss observed in the tests. We are using UDP sockets, which are the least reliable means of communication: they provide no guarantee of delivery. We selected this protocol because it is the most lightweight and fastest of the IP protocols. It also does not require waiting for a confirmation, as the more reliable protocols (such as TCP) do. However, we have also found that the maximum size of packet that we can send by UDP depends on the path, and can be as small as approximately 1500 bytes. When the network ancestor is fully packaged, it is 2616 bytes long. Therefore we have found that the network ancestor is able to travel on local area nets, but not between Japan and the USA.
At the First Tierra Workshop, it was proposed that the socket communications be separated into an independent process, so that the polling for packets would not slow down Tierra, and conversely the attention to other activities would not interfere with the polling. However, this proposal was never implemented.
As a first attempt to resolve the transport problem, the communications functions were split off into a separate process, as had been recommended by the First Tierra Workshop, and the unreliable UDP protocols were replaced with TCP. However, these changes had disastrous consequences for the speed of the system. We reasoned that TCP's requirement for acknowledgement was causing traffic jams.
Rather than attempt to unravel the TCP traffic jam, we decided to reimplement UDP, and merge the two processes back into one. However, we arranged that large messages (presumably genomes) would be broken into several packets, shipped separately, and reassembled at the other end. This breaking up and reassembling of packets is usually performed by TCP. However, we prefer to do it ourselves to avoid the overhead of acknowledgement.
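The fragmentation scheme just described can be sketched in C. This is a minimal illustration, not the actual Tierra code: the constant MAX_FRAG and the function names are assumptions, chosen to stay under the roughly 1500 byte path limit mentioned above.

```c
#include <string.h>

/* Hypothetical sketch of splitting a large message (e.g. a genome)
   into UDP-sized fragments for separate shipment and reassembly.
   MAX_FRAG and all names here are assumed, not from the Tierra source. */
#define MAX_FRAG 1400  /* stay under the ~1500 byte path limit observed */

/* Return the number of fragments needed for a message of `len` bytes. */
int frag_count(int len)
{
    return (len + MAX_FRAG - 1) / MAX_FRAG;
}

/* Copy fragment `i` of `msg` (total length `len`) into `out`;
   return that fragment's size in bytes. */
int frag_copy(const char *msg, int len, int i, char *out)
{
    int off = i * MAX_FRAG;
    int n = len - off;
    if (n > MAX_FRAG)
        n = MAX_FRAG;
    memcpy(out, msg + off, n);
    return n;
}
```

Under these assumptions, the 2616 byte network ancestor would travel as two fragments, each small enough to survive the paths between Japan and the USA.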
We also made two other changes to improve performance. Previously, XDR had packaged Tierran instructions as 32 bit integers, even though they only occupy a single byte. This caused genomes on the net to be four times as large as they needed to be. Joseph found a way to package them economically.
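The economical packaging can be illustrated with a sketch like the following. This is an assumed reconstruction, not Joseph's actual code: it simply stores each one-byte opcode densely instead of as a 32 bit integer, cutting genome traffic to a quarter of its former size.

```c
/* Assumed sketch: pack one-byte Tierran instructions densely for the
   net, instead of as 32 bit XDR integers.  Names are hypothetical. */
void pack_genome(const int *instr, int n, unsigned char *out)
{
    int i;
    for (i = 0; i < n; i++)
        out[i] = (unsigned char) (instr[i] & 0xff); /* opcode fits in a byte */
}

void unpack_genome(const unsigned char *in, int n, int *instr)
{
    int i;
    for (i = 0; i < n; i++)
        instr[i] = in[i];  /* restore the 32 bit in-soup representation */
}
```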
In addition, the single process Tierra only checks for incoming packets between ``time slices''. A time slice is when a single ``cell'' is being allocated CPU cycles by the virtual operating system. It is a common practice in Tierra to make the time slice proportional (or equal) to the size of the genome. In the original Tierra, the genome, and thus the time slices, were eighty bytes/cycles. However, the network ancestor has a size of roughly 640 bytes. In addition, it is multi-threaded. If it has five CPUs, then the time slice would be 3200 cycles, forty times as long as in the original Tierra.
Before the changes described here, Tierra serviced only a single incoming packet between time slices. This, combined with the longer time slices, generated a large packet backlog. During the time slice, while Tierra is not servicing communications, the pending packets were buffered by the UDP protocols, which provide a fairly small buffer. The combination of these factors was no doubt a major cause of packet loss.
Now, the arrival of a packet causes an interrupt, causing the Tierra system to buffer the packet. The packets are then all cleared from the Tierra buffer between time slices. These changes were completed by Joseph Hart in October 1996, and have effectively solved the transport problem.
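The two-level buffering just described can be sketched as a simple FIFO ring: the signal handler moves packets from the small OS socket queue into a larger Tierra-managed backlog, and the backlog is drained between time slices. This is a minimal sketch under assumed names and sizes, not the actual implementation.

```c
#include <stddef.h>

/* Assumed sketch of the Tierra packet backlog: a FIFO ring buffer
   filled from the signal handler, drained between time slices.
   BACKLOG_SIZE and all names are hypothetical. */
#define BACKLOG_SIZE 1024

char *backlog[BACKLOG_SIZE];
int head = 0, tail = 0, count = 0;

/* Called from the packet-arrival signal handler for each packet. */
int backlog_put(char *pkt)
{
    if (count == BACKLOG_SIZE)
        return 0;               /* backlog full: packet dropped */
    backlog[tail] = pkt;
    tail = (tail + 1) % BACKLOG_SIZE;
    count++;
    return 1;
}

/* Called between time slices; returns NULL when the backlog is empty. */
char *backlog_get(void)
{
    char *pkt;
    if (count == 0)
        return NULL;
    pkt = backlog[head];
    head = (head + 1) % BACKLOG_SIZE;
    count--;
    return pkt;
}
```

Processing is first-in, first-out, matching the message-handling order described earlier; the backlog is far larger than the OS socket queue, which is the point of the change.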
During the experimentation with various transport mechanisms,
Joseph created an API for the communications protocols, so that
it is now easy to substitute new communications mechanisms in Tierra.
The central problem of the Tierra experiment is to find the conditions under which evolution can generate complexity. One primary consideration is to have a highly evolvable genetic language. In the section Evolvability and the Instruction Set above, I discussed the relationship between the structure of the genetic language and its evolvability.
Another factor affecting the evolvability of any given genetic language is the set of genetic operators that it interacts with. The evolvability of a genetic language is not determined by its structure alone, but also by the nature of the genetic operators, and the interaction between the two.
I have greatly admired the work of Karl Sims and John Koza. Their systems have shown a very high level of evolvability. I believe that the success of these systems owes in large part to the power of their genetic operators, or the interaction of the operators with the structure of the language.
The Genetic Programming (GP) of John Koza and the Genetic Images of Karl Sims both use genetic languages based on Lisp trees. The genetic operators manipulate the Lisp trees by replacing nodes in the trees (mutation), or by swapping nodes along with all their descendant branches between trees (cross-over).
The cross-over operation on Lisp trees causes entire (perhaps coherent) sections of code to be moved around between genomes. Contrast this with the genetic operators of Tierra (until now), which do nothing more than flip bits in the linear genome. Although some haphazard forms of sex have emerged in Tierra that result in movements of chunks of code, these are not a part of the genetic operators applied by the Tierra system, and do not occur reliably.
On the other hand, there are some features of the structure of the Lisp language which make it less powerful than the (relatively) conventional machine languages on which Tierra is based. It is not natural for Lisp languages to support looping and function (sub-routine) calls. One of the most powerful innovations that Koza has discovered in his system involves introducing a provision for automatically defined functions.
However, looping and function calls are a natural part of the machine languages on which Tierra is based. Therefore I have given some thought as to how to have the best of both worlds: the power of genetic operators like those used in genetic programming, and the natural directed graph structure of machine languages supporting looping and function calls. Given that Tierra already has a natural directed graph structure, it remains only to enhance the power of the genetic operators. Towards this end, new genetic operators have been created for Tierra. Details can be seen in the source code implementing these new genetic operators.
Finally, it is worth remembering that there is no theory of evolvability. Everyone who thinks about evolvability issues has intuitions about what makes a system evolvable. However, these intuitions should not be taken for anything more than they are. There is neither a theoretical nor an empirical basis for them. This is a very sad state of affairs. My efforts to enhance evolvability through genetic operators are based on intuition, not science.
The new genetic operators include insertion, deletion and cross-over operations, and so represent a change in the philosophy towards this kind of genetic operation. These operations are being imposed by the Tierra operating system, and are not under the control of the creatures themselves.
The reason for the shift is that the primary goal and focus of the work is now the attainment of complexity increase. This is a very big, and probably very difficult objective. For this reason I am willing to try anything that might aid in achieving the goal. Among the rewards of achieving the goal, should be creatures that do many more things for themselves, which should make the imposed genetic operators seem to be a fairly trivial sin.
In order for the genetic operators to operate on genetic information, it is helpful to be able to know where the genetic code is located within the cell space, which now also contains data. Towards this end, some new information has been added to the Dem structure associated with each cell:
typedef struct /* structure for allocated memory block of cell */
{   I32s p; /* location of start of cell memory */
    I32s s; /* size of cell memory */
} Mem;

typedef struct /* structure for demographic data */
{   Genotype gen;   /* size and genotype name of cell */
    ...
    Mem  mg;        /* main cell genetic memory */
    I32s MovOffMin; /* minimum offset into daughter movii-ed to */
    I32s MovOffMax; /* maximum offset into daughter movii-ed to */
    ...
} Dem;
As the mother cell writes (presumably genetic) code into the space of the daughter cell, it keeps track of the minimum and maximum offsets that it has written to in the daughter cell. At birth, this data is incorporated into the "Mem mg;" structure of the cell, and becomes a part of the genetic identity of the cell. For this reason, two creatures that have exactly the same instruction sequence, of the same length, but offset by a different amount into the space of the daughter cell, would be classed as different genotypes.
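The bookkeeping behind MovOffMin and MovOffMax can be sketched as a running min/max update on each write into the daughter. This is an assumed illustration of the mechanism described above; the structure and function names here are hypothetical, not taken from the Tierra source.

```c
/* Assumed sketch of tracking the range of daughter offsets written by
   the mother's movii instructions, delimiting the genetic region. */
typedef struct {
    int MovOffMin;  /* minimum offset written into daughter */
    int MovOffMax;  /* maximum offset written into daughter */
} WriteRange;

void range_init(WriteRange *r)
{
    r->MovOffMin = -1;  /* -1 means "nothing written yet" */
    r->MovOffMax = -1;
}

/* Record that a movii wrote at offset `off` in the daughter's space. */
void range_note_write(WriteRange *r, int off)
{
    if (r->MovOffMin == -1 || off < r->MovOffMin)
        r->MovOffMin = off;
    if (off > r->MovOffMax)
        r->MovOffMax = off;
}
```

At birth, a range tracked this way would be folded into the "Mem mg;" field, so two identical instruction sequences written at different offsets yield different genotypes, as described above.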
This genetic memory definition is not an absolute method of determining what memory is genetic and what memory is data. It would be possible for a creature to have genetic code and data interspersed in several blocks within the cell. In that case, our definition would fail. However, our definition is now in place and has become a part of the world in which the creatures live and must adapt.
The information on the location of genetic memory is used by the genebanker in determining what code to sequence when comparing the genomes of two creatures. In this context, the information only affects our observations, but not the course of evolution in the world of Tierra.
However, the information on the location of the genetic memory is also used by the new genetic operators, in order to determine what memory will be subjected to genetic operations. In this context, our definition of what is genetic code and what is data does affect the course of evolution.
The copy mutation affects only the data being moved by the movii instruction, presumably from mother to daughter. This ensures that there is some genetic variation among the daughters. Like the background mutation, the copy mutation affects only the bits used by the Tierran machine instructions (e.g., the low-order five bits).
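Restricting the mutation to the instruction bits can be sketched as a single XOR, in the spirit of the mut_site() code shown later. This is an assumed illustration; INST_BIT_NUM and the function name are hypothetical stand-ins.

```c
/* Assumed sketch of a copy mutation restricted to the low-order five
   bits (InstBitNum = 5 in the original instruction set). */
#define INST_BIT_NUM 5

/* Flip one chosen instruction bit of `inst`; `rand_bit` stands in for
   Tierra's tirand()-based random bit choice. */
unsigned char copy_mutate(unsigned char inst, int rand_bit)
{
    return inst ^ (unsigned char) (1 << (rand_bit % INST_BIT_NUM));
}
```

Because only bits 0-4 can flip, a mutated opcode always remains a valid member of a thirty-two instruction set.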
Within the first year, a third source of noise was added to Tierra,
the flaw. The flaw is not genetic in nature, but it often has
genetic consequences. It is controlled by the following variable
in the soup_in file:
GenPerFlaw = 32 flaw control by generations
Each time an instruction is executed, it is
possible for the result to contain an error of plus or minus one.
For example, when two numbers are added, the result placed in the
destination register is always correct, unless it is affected by
a flaw. In the case of a flaw, the result may differ from the
correct result by +1 or -1. As another example, when the bits in
a register are shifted left by one place, if they are affected by
a flaw, they would be shifted two places or not at all. In another
example, when data is moved to a register, if it is affected by a
flaw, it will go to a neighboring register rather than the destination
register.
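The arithmetic effect of a flaw can be made concrete with a sketch. This is an assumed illustration of the plus-or-minus-one error described above; in real Tierra the flaw would fire at a rate controlled by GenPerFlaw, while here it is passed in as a parameter for clarity.

```c
/* Assumed sketch of a flawed addition: ordinarily exact, but when a
   flaw fires the result is off by +1 or -1.  In Tierra the `flaw`
   value would come from the GenPerFlaw-controlled noise source. */
int flawed_add(int a, int b, int flaw)
{
    return a + b + flaw;  /* flaw == 0 means no error this time */
}
```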
One way that the flaws have genetic consequences is by altering the incrementing of the source and destination registers used by movii in copying data from mother to daughter. This can cause single instructions to be inserted or deleted.
Operators that don't change the size:
    Mutations: MutationOps()
        bit flip: mut_site()
        random replacement: mut_site()
    Crossover: CrossoverInstSamSiz()

Operators that change the size:
    Performed at segment boundaries:
        Crossover: CrossoverSeg()
        Insertion: InsertionSeg()
        Deletion: DeletionSeg()
    Performed at instruction boundaries:
        Crossover: CrossoverInst()
        Insertion: InsertionInst()
        Deletion: DeletionInst()

Here I will briefly describe each of the new operators:
s[0] ^= (1 << (tirand() % (I16s) InstBitNum));

where s[0] is the instruction to be mutated. The mut_site() induced bit flip affects only those bits used by the instruction set (e.g., the low-order five bits in the case of the original instruction set, where InstBitNum = 5).
One feature of implementing mutations by means of bit flips is that it creates a mutation-connectivity between the instructions of the set, depending on the mapping of instructions to the numeric opcodes. The single-bit flip mutation can only convert an instruction into the other instructions that differ by a single bit (hamming distance one). In the original instructions set, each instruction can be mutated only into five other instructions. This means that the mapping of instructions to opcodes (determined by the opcode.map file) is important. However, it is not clear how this mapping should be done, or what kinds of mappings have the highest evolvability.
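The hamming-distance-one connectivity can be made concrete with a sketch: under a single-bit flip, each opcode reaches exactly InstBitNum neighbors. This is an assumed illustration of the point above; the function name is hypothetical.

```c
/* Assumed sketch: enumerate the opcodes reachable from `op` by a
   single bit flip (hamming distance one).  With InstBitNum = 5,
   each instruction can mutate into exactly five others. */
#define INST_BIT_NUM 5

/* Fill `out` with the hamming-1 neighbors of `op`; return their count. */
int hamming1_neighbors(int op, int out[INST_BIT_NUM])
{
    int b;
    for (b = 0; b < INST_BIT_NUM; b++)
        out[b] = op ^ (1 << b);
    return INST_BIT_NUM;
}
```

Which five instructions those neighbors are depends entirely on the opcode.map assignment, which is why that mapping matters for evolvability.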
s[0] = tirand() % (I16s) InstNum;

This means that the mutated instruction will be replaced by any other instruction in the set, with equal probability (e.g., any one of thirty-two instructions in the case of the original instruction set, where InstNum = 32).
The random replacement form of mutation was introduced to eliminate the dependency of the properties of mutation on the mapping of instructions to opcodes (see above). It is considered to be a more powerful kind of mutation.
MateChunkSize = 1 + tlrand() % MateGenSize;

Then the location of this chunk of code from within the mate genome is also determined as a uniform random variate:
MateChunkStart = 1 + tlrand() % (MateGenSize - MateChunkSize + 1);

This chunk of code from the mate is then inserted into the daughter genome at the randomly selected insertion point.
DelSiz = 1 + (tlrand() % (ODaughtGenSize / 2));

The offset of this chunk into the daughter genome is also a uniform random variate:
DelOff = tlrand() % (ODaughtGenSize - DelSiz + 1);
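The deletion itself amounts to closing the gap left by the removed chunk. The following sketch shows one way it could be done; this is an assumed illustration using memmove, not the actual DeletionInst() code.

```c
#include <string.h>

/* Assumed sketch of an instruction-level deletion: remove `del_siz`
   bytes starting at `del_off` from a genome of `size` bytes, and
   return the new size.  Names are hypothetical. */
int delete_chunk(unsigned char *genome, int size, int del_off, int del_siz)
{
    memmove(genome + del_off,
            genome + del_off + del_siz,
            (size_t) (size - del_off - del_siz));
    return size - del_siz;
}
```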
DCrossSeg = 2 + (tlrand() % (ODaughtNumSegs - 1));
/* DCrossSeg marks the point at the beginning of segment N */
/* acceptable cross points are segments 2 thru N */
MCrossSeg = 2 + (tlrand() % (MateNumSegs - 1));

The locations of the selected segment boundaries are then determined, and the cross-over occurs at those locations. The decision to use the left or right portion of the daughter genome is based on preserving the larger number of daughter segments, not the larger number of daughter instructions:
if (2 * DCrossSeg > ODaughtNumSegs)
    /* use left chunk of daughter */
else
    /* use right chunk of daughter */

If we use the left chunk of the daughter, then we use the right chunk of the mate, and vice versa.
DCrossSeg = 1 + (tlrand() % (ODaughtNumSegs + 1));

The chunk of code to be donated by the mate is determined first as a uniform random variate of the number of segments in the mate:
MChunkNumSegs = 1 + (tlrand() % MateNumSegs);

and then the offset of this chunk is determined as a uniform random variate:
MChunkStartSeg = 1 + tlrand() % (MateNumSegs - MChunkNumSegs + 1);

Then the actual location of the start of the chunk is determined, as well as its size in bytes. This chunk of code is then inserted into the daughter genome at the randomly selected segment boundary.
DelNumSeg = 1 + tlrand() % (ODaughtNumSegs / 2);

The offset of this chunk into the daughter genome is also randomly determined:
DelOffSeg = 1 + tlrand() % (ODaughtNumSegs - DelNumSeg + 1);

The actual location and size of this chunk are then determined, and it is deleted from the daughter genome.
The ancestor in the November 1996 experiment differentiates into two cell types. One cell type executes only the code for replication (allocating memory space for the daughter, copying the genome from mother to daughter, and spawning the daughter as an independent process). The other cell type executes the sensory and decision making code. This code uses the TPing data to compare various nodes on the net, and recommends the best looking node to the reproductive threads, so that the daughter can be sent to that node.
This was the first test in which the network Tierra system was fully functional, and in which it was worthwhile to observe evolution. The ancestor used in this test was based on the algorithm described in The Creation of the First Network Ancestor. However this version of the ancestor used two CPUs for the copy procedure rather than four, and moved data a single byte at a time, rather than four bytes at a time. In this run, each organism was limited to a maximum of sixteen CPUs.
This experiment ran for about four days. Organisms which were common in the net were captured at the middle and end of the experiment.
An organism sampled from the middle of the experiment, 5528aab, had simplified the perceptual algorithm. Rather than comparing the Speed / Cell values of the different nodes, it simply stepped through the IPMap list, placing the IP address of successive nodes into the register used by the divide instruction. This essentially has the effect of directing the daughters to be sent to random locations within the net. Although this algorithm became simplified, the organism as a whole still retained the differentiation into two cell types. The level of parallelism in the reproductive threads increased from two to eight CPUs.
There is a strong selective pressure for increased parallelism in the reproduction code. In the experiments conducted together with Kurt Thearling ( Thearling, Kurt, and Ray, T. S. 1994), the replicators increased to 16 or 32 CPUs when the genomes were only around 60 - 80 bytes in length. It would be expected that the larger genomes used in this experiment would select for greater parallelism. However, we had placed a hard limit of sixteen CPUs per organism in this run. Since the number of CPUs used in the reproductive code is normally a power of 2, the need to have even one CPU in the perceptual system limited the number of reproductive CPUs to just eight. Giving up the single perceptual CPU could permit the number of reproductive CPUs to double from eight to sixteen. This could be a fairly strong selective pressure against maintaining the perceptual system, with this low CPU limit.
An organism sampled from the end of the experiment, 2444aaa, had degenerated into a single cell type. The perceptual thread was lost completely. The reproductive thread had increased its parallelism to sixteen CPUs. These organisms produced a daughter who was placed on the local node, then they migrated to a node whose IP address most closely matched (by hamming distance) the value zero. It seems logical that the Apocalypse should eliminate such a strategy from the net, therefore there remains some doubt as to whether this was actually the predominant strategy at the end of the run.
De-differentiation from two cell types to a single cell type can be considered to be a negative result for the network experiment.
Some other observations are worth noting. For this experiment, the only mechanism for keeping Tierra out of the way of the users was to give the process a low priority, via Nice -19. We have a lot of experience with running the network Tierra process on our machines in this way, and we find that there is no noticeable degradation of performance.
However, two users wrote to us and asked us to remove the Tierra process from their machines. We suspect that this may happen when users who run load meters on their machines notice that the CPU remains fully loaded. These users may become alarmed and seek to terminate the process causing the high load, even if there is no effect on the performance of the machine.
In addition, at three of the sites, there was periodically a major problem with the Tierra process being disconnected at the remote end. We believe that this may have been due to having a telnet session for each Tierra process. It is likely that telnet sessions are not stable for long periods over such long distances.
In preparation for the December test, we studied the TPing data closely and found that there were bugs in the system that caused this data to be unreliable. This means that the ability of the organisms to perceive relevant conditions on the net was compromised. With unreliable perceptual information, the perceptual - decision making system no doubt could not be maintained by selection. Before the December test, several bugs in the perceptual system were fixed.
Two additional measures were taken to make the Tierra network more stealthy, in order to avoid annoying users. Rather than using telnet sessions for each machine, we started the Tierra process as a background process, and logged off. In addition, we implemented a system that watches keyboard and mouse hits, and causes the Tierra process to sleep (twenty minutes in this run) after each hit. It is expected that this sleep behavior will increase the heterogeneity of CPU cycle availability, and thus the selective forces for cell differentiation.
An additional problem noted in earlier runs was that mutational degradation of the code that determines how fresh the TPing data must be caused the algorithms to require increasingly fresh TPing data. This caused early evolution of excessive use of the TPing sensory function. In order to resolve this problem, the Tierra virtual operating system was modified to TPing each machine in its MapFile after executing each million instructions. This change effectively relieves the organisms themselves of the burden of keeping the TPing data up-to-date. So for the December test, the code of the ancestor was modified by commenting out the code used to test the freshness of the TPing data. The resulting new network ancestor is 0492aaa. The instruction set used in this experiment is summarized in the opcode map.
In addition to automatically maintaining the freshness of the TPing data, the new version keeps a record of the number of pending but unanswered TPing requests to each node. When this number reaches PendReqMax (set to 3 in this run), the node is assumed to be sleeping, and the Speed value for that node is set to zero. Also, if the elapsed time since the last TPing data was received exceeds PendReqTime (set to 24 hours in this run), then the node is assumed to have terminated the Tierra process, so the node is removed from the list of participating nodes.
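The PendReqMax and PendReqTime rules can be sketched as a small bookkeeping routine. This is an assumed illustration of the logic described above; the field names echo the MapNode/TPingData structures shown earlier, but the routine itself and its exact behavior are hypothetical.

```c
/* Assumed sketch of per-node TPing bookkeeping: PendReqMax unanswered
   requests mark a node as sleeping; PendReqTime seconds of silence
   cause it to be dropped from the IPMap list. */
#define PEND_REQ_MAX  3
#define PEND_REQ_TIME (24L * 60 * 60)  /* 24 hours, in seconds */

typedef struct {
    long Fresh;    /* time stamp of last TPing reply received */
    long Speed;    /* virtual instructions per second; 0 = sleeping */
    int  PendReq;  /* number of pending unanswered TPing requests */
} NodeState;

/* Apply the rules when another TPing is sent at time `now`;
   return 1 if the node should be removed from the list, else 0. */
int node_update(NodeState *n, long now)
{
    n->PendReq++;
    if (n->PendReq >= PEND_REQ_MAX)
        n->Speed = 0;                  /* assume the node is sleeping */
    if (now - n->Fresh > PEND_REQ_TIME)
        return 1;                      /* assume the node left the net */
    return 0;
}
```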
The test involved about 140 participating machines, located at ATR in Japan, Aizu University in Japan, The University of Delaware, The Santa Fe Institute, The Free University of Brussels, Sussex University in England, and the Swiss Federal Institute in Lausanne.
We are expecting the heterogeneity in available CPU cycles to provide the selective forces for maintaining cell differentiation. Availability of CPU cycles is reflected in the Speed (virtual instructions per second) field of the TPing data structure. In the December test, we examined the distribution of this variable. Over a 48 hour period, these values were collected after every million instructions executed on our point of view node (at ATR), from all the other nodes on its MapFile. This generated 504,589 values of Speed data, of which 188,594 (37%) were zero values. Zero Speed values indicate that the Tierra process is sleeping on the associated node. If genomes migrate to a sleeping Tierra process, their packets will be lost in the net, and they will die. This should provide a strong selective force for avoiding migration to sleeping nodes.
In addition to the high percentage of sleeping nodes, there is also a very wide distribution of (non-zero) CPU speeds. Organisms on fast nodes can reproduce faster than organisms on slow nodes, thus there should be selection to discriminate and migrate to fast nodes. The frequency distribution of the 504,589 values of the Speed data is illustrated in two FIGURES, first sorted into 100 bins, then sorted into only 10 bins.
Some examples of the distribution of Speed values over time are shown in the six figures below. Each figure shows the CPU Speed, in instructions per second, over time for one node. The horizontal axis is time, in millions of instructions executed on the point-of-view node (not the node being observed).
The Tierra process runs as a low priority background process, by using a ``Nice'' value of -19. This causes the Speed of Tierra to mirror the load of non-Tierra processes on the machine (the Tierra Speed is high when the load from other processes is low). Thus the Speed of Tierra will vary with the load on the machine. This can be seen in the FIGURE SpeedTime.a, on a relatively slow machine.
When the user of a machine touches the keyboard, or the mouse, Tierra immediately goes to sleep for twenty minutes (from the last hit). This is reflected in the FIGURE SpeedTime.b, by the Speed value dropping to zero. The machine in the figure above was evidently not being touched by a user, whereas the machine shown here is only infrequently touched.
The machine in FIGURE SpeedTime.c shows a gradual decline in the Speed of Tierra (probably due to an increasing process load), followed by a period of sustained user keyboard/mouse activity, followed by frequent but intermittent user keyboard/mouse activity.
The machine in FIGURE SpeedTime.d shows frequent but intermittent activity over all but the first part of the observation period.
The machine in FIGURE SpeedTime.e shows prolonged periods of sustained user keyboard/mouse activity, and periods of frequent but intermittent user keyboard/mouse activity.
The machine in FIGURE SpeedTime.f shows sustained user keyboard/mouse activity over most of the observation period.
The results of the December run are similar to the results of the November run. Early in the run, the TPing sensory algorithm degenerated into just incrementing through the TPing list, causing migrations roughly at random to the nodes on the list. This is illustrated by the organism 1060aaa.
A little later in the run, the sensory thread was eliminated completely. In this case, the random number generator was used to provide the IP address for migrations, as illustrated by 2652aan.
We may consider the results to be negative, with respect to selection for differentiation. It turns out that an undifferentiated algorithm with a random dispersal strategy is quite effective, and survives preferentially over the more complex algorithms that attempt to use a differentiated CPU thread to identify the nodes with a higher ratio of Speed / NumCells.
It seems likely that our seed algorithm is not adequate for the task of maintaining itself and its offspring on good nodes. One possible explanation for the inadequacy of this algorithm could be based on the distribution in time of CPU Speeds on the net. For example, a distribution like the one shown in FIGURE SpeedTime.d may be too erratic to predict by the method of the seed algorithm. Faced with a network of machines showing this kind of behavior, random dispersal is probably the best strategy. On the other hand, if the predominant pattern is like the one shown in FIGURE SpeedTime.f, then it should be possible to have an algorithm to search out good nodes. In addition, the seed algorithm does not attempt to exploit any long term patterns, such as daily cycles.
At this point it appears that it would be worthwhile to gather more data about CPU Speed patterns in time and space on the net, the properties of the TPing sensor system (particularly the latency of data acquisition in relation to the temporal patterns of the data), and the life span, fecundity and migratory patterns of individual organisms. Based on this better understanding of the world in which the digital organisms live and the natural history of the organisms themselves, we could then design and test new seed algorithms, which might be better adapted to this world.
The Tierra network underwent fairly continuous testing from January through mid-February 1997. The tests involved about 140 participating machines, located at ATR in Japan, Aizu University in Japan, The University of Delaware, The Santa Fe Institute, The Free University of Brussels, Sussex University in England, and the Swiss Federal Institute in Lausanne. However, on Valentine's day, we received a complaint that the process was causing packet storms, creating network congestion. From that point we returned to local testing within the ATR networks to identify and control the causes of the bursts of high-volume communications. The sections below describe some of our observations during this test period.
This has been a period of debugging. There have been a variety of bugs which cause the system to halt in one way or another. Joseph Hart has named these bugs ``The Tar Pits'', because they cause the affected Tierra node to become a death trap for any immigrant creatures. Below I will describe in some detail a case history of one of these traps.
When a packet is received by the Tierra process, the operating system generates a signal. This signal invokes a signal handler routine, within which we move any received packets to a temporary backlog queue maintained by the Tierra process. Later, between time-slices, the packets in the queue are processed.
Incoming packets arrive first at the operating system's socket queue, which is distinct from the Tierra-maintained backlog queue. If packets arrive more quickly than they can be removed, and this condition persists long enough, the socket queue will eventually fill. When this happens, it appears that the operating system ceases to generate a signal for incoming packets until some are removed from the socket queue; the signal handler seems to be disabled once there are more than about 76 unprocessed packets in the socket queue.
Disabling of the signal handler caused our code to stop processing incoming packets, and the transition was one-way. From this point on, there was a complete cut-off of incoming communications (Figure: horizontal - time in hours; vertical - bytes per second received). This bug was fixed by causing the signal handler to also be called at the time the incoming message backlog is processed, even though a signal may not be pending.
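The fix can be sketched as follows. This is illustrative code only, not the actual Tierra source; `drain_socket`, `on_packet_signal`, and `process_backlog` are hypothetical names, and the queues are modeled as simple arrays. The key point is that the same drain routine is called both from the signal handler and again when the backlog is processed, so a signal lost while the socket queue was full can no longer cut off incoming traffic permanently.

```c
#define BACKLOG_MAX 256

static int backlog[BACKLOG_MAX];      /* Tierra-maintained backlog queue */
static int backlog_n = 0;

static int sock_queue[1024];          /* stand-in for the OS socket queue */
static int sock_n = 0;

/* move pending packets from the socket queue to the backlog */
static void drain_socket(void)
{
    int i;
    for (i = 0; i < sock_n && backlog_n < BACKLOG_MAX; i++)
        backlog[backlog_n++] = sock_queue[i];
    sock_n = 0;
}

/* invoked on the packet-arrival signal; the OS may fail to deliver
 * this signal once the socket queue has filled */
static void on_packet_signal(int sig)
{
    (void)sig;
    drain_socket();
}

/* between time-slices: drain once more even if no signal was delivered
 * (the fix), then consume the backlog; returns packets processed */
static int process_backlog(void)
{
    int n;
    drain_socket();
    n = backlog_n;
    backlog_n = 0;
    return n;
}
```

The design point is that `drain_socket` is idempotent and cheap when the socket queue is empty, so calling it unconditionally at each backlog pass costs little.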
However, before fixing the signal handler bug, any node which received a burst of packets would convert permanently into a tar pit. Because all genomes immigrating to this node would perish in the system buffer, the first apocalypse following the conversion would result in the permanent extinction of all creatures from that node (Figure: horizontal - time in hours; vertical - number of cells).
The apocalypse randomizes all code in the soup memory, but leaves one (randomized) cell in the soup, so that there can be at least one active CPU to generate current TPing data. An unanticipated result of having a CPU execute a randomized soup is that, on average, one in sixty-four executed instructions will be a tpings (TPing send) instruction. Each time the tpings instruction is executed, an outgoing packet is generated. Note that the interrupt handler bug causing the tar pit cuts off only incoming packets; outgoing packets are not affected. Thus the execution of a randomized soup can cause a huge outpouring of TPing messages (Figure: horizontal - time in hours; vertical - bytes per second sent, outgoing).
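The one-in-sixty-four rate follows from uniformly random code over a sixty-four-instruction set, and can be checked with a toy simulation. This is illustrative only, not Tierra code; `NUM_OPCODES`, `TPINGS_OPCODE`, and `tpings_fraction` are hypothetical names, and the opcode assigned to tpings is arbitrary.

```c
#include <stdlib.h>

#define NUM_OPCODES   64
#define TPINGS_OPCODE 0   /* arbitrary stand-in for the tpings opcode */

/* fraction of tpings instructions among n uniformly random fetches */
double tpings_fraction(long n, unsigned seed)
{
    long hits = 0, i;
    srand(seed);
    for (i = 0; i < n; i++)
        if (rand() % NUM_OPCODES == TPINGS_OPCODE)
            hits++;
    return (double)hits / (double)n;
}
```

With a million fetches the measured fraction lands very close to 1/64 (about 0.0156), which at Tierra's execution speeds translates directly into the heavy TPing output described above.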
In the example illustrated here, the incoming communication was cut off at approximately 52.26 hours into the run (Figure: horizontal - time in hours; vertical - bytes per second received).
At approximately 52.47 hours, the subsequent apocalypse caused the extinction of all functional organisms, and the randomization of the soup (Figure: horizontal - time in hours; vertical - number of cells).
From this point in time there are episodes of large outpourings of TPing messages, reflected in a high volume of outgoing traffic (Figure: horizontal - time in hours; vertical - bytes per second sent, outgoing). The level of this TPing traffic is reset after each subsequent apocalypse, which re-randomizes the soup. This large traffic volume continues for 24 hours, after which there is a complete cut-off of outgoing traffic. The reason for the 24 hour delayed cut-off is that when a node receives no communications from another node for a period of 24 hours, the silent node is dropped from the local list of participating nodes. Therefore, 24 hours after all incoming communications are shut off, all external nodes are dropped from the net, resulting in a one-node net, which generates no outgoing traffic. In this run on this node, the cut-off of outgoing traffic occurred at approximately 75.91 hours.
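The 24 hour drop rule is the PendReqTime mechanism described earlier: a node whose last TPing reply (the Fresh time stamp) is older than PendReqTime is removed from the IPMap array. A minimal sketch, with a hypothetical `MapEntry` type and `prune_ipmap` helper standing in for the actual IPMap code:

```c
#define PEND_REQ_TIME (24L * 60L * 60L)   /* default: 24 hours, in seconds */

typedef struct {
    unsigned long Fresh;   /* time stamp of the last TPing reply */
    int           in_use;  /* 1 if this entry is an active node */
} MapEntry;

/* drop entries silent for longer than PendReqTime; returns count dropped */
int prune_ipmap(MapEntry *map, int n, unsigned long now)
{
    int i, dropped = 0;
    for (i = 0; i < n; i++) {
        if (map[i].in_use && now - map[i].Fresh > PEND_REQ_TIME) {
            map[i].in_use = 0;   /* node assumed to have left the net */
            dropped++;
        }
    }
    return dropped;
}
```

Once every entry has been dropped this way, the IPMap describes a one-node net, which is exactly the outgoing-traffic cut-off observed at 75.91 hours.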
The very noisy behavior of the node illustrated above can be contrasted with the much quieter behavior of another node from the same network run, which did not fall into the tar pit: (Figure: horizontal - time in hours; vertical - bytes per second received). (Figure: horizontal - time in hours; vertical - number of cells). (Figure: horizontal - time in hours; vertical - bytes per second sent, outgoing).
However, even in the absence of the high noise levels generated by the execution of a randomized soup, we can see spikes of noise generated by mutant organisms which make heavy use of the tpings instruction. This behavior has been discussed above in the section Excessive use of TPing. It had been limited by making use of the tpings instruction costly, thus introducing selection against its excessive use. However, it is likely that the transient spikes of heavy TPing traffic are largely responsible for the overflowing of buffers that pushes nodes into this particular tar pit.
In order to resolve the many problems associated with excessive use of the tpings instruction, this instruction has been eliminated from the instruction set. The instruction had already become superfluous, and was not used by the most recent ancestors.
For comparative purposes, a subsequent test was run with the tpings instruction removed from the instruction set, but without the fix to the signal handler bug. In this run, very few nodes fell into the tar pit, but for comparison, I show the data for a node that did: (Figure: horizontal - time in hours; vertical - bytes per second received). (Figure: horizontal - time in hours; vertical - number of cells). (Figure: horizontal - time in hours; vertical - bytes per second sent, outgoing). It can be seen in these figures that the spikes of high volume traffic are greatly reduced, and the execution of a randomized soup generates very little traffic.
For comparison, I also show data from a node in this run (with the tpings instruction removed from the instruction set, but without the fix to the signal handler bug) that did not fall into the tar pit: (Figure: horizontal - time in hours; vertical - bytes per second received). (Figure: horizontal - time in hours; vertical - number of cells). (Figure: horizontal - time in hours; vertical - bytes per second sent, outgoing).
In addition to the bursts of TPing messages caused by the tar pit described above, we have also seen bursts of communications involving heavy immigration. The cause of these bursts is not yet known and is still under investigation, but here I will illustrate the phenomenon with some data.
The typical behavior can be seen in the period between twelve and fifteen hours in this figure: (Figure: horizontal - time in hours; vertical - bytes per second sent, outgoing). The rate of transmission increases by roughly an order of magnitude, from about 2K per second to almost 20K per second.
Closer examination shows that the burst is due to an increase in the number of genomes emigrating: (Figure: horizontal - time in hours; vertical - genomes emigrating per million instructions executed).
The transmission rate in bytes per second could also be increased by emigration if larger genomes were emigrating. However, this does not appear to be the case: (Figure: horizontal - time in hours; vertical - average size in bytes of genomes in the soup). This figure shows the average size of genomes in the soup over time. This is not necessarily the average size of emigrating genomes, but for now we will assume that it is. We can see in this figure that the average size roughly doubled during the burst, which may have contributed to the increase in bytes per second. However, similar genome sizes were seen in the first several hours without a transmission burst, and very much larger genomes were seen after eighteen hours, again without a transmission burst.
Thus I conclude that the burst in transmission rate was caused by a three hour transient of much more mobile genomes. We will be watching this phenomenon to try to understand it further, and to learn how to suppress it.
There is an interesting footnote to the studies described in this section. The signal handler bug caused machines to fall into a tar pit, after which they executed randomized soups for extended periods of time. The data on the number of cells in the soup revealed that pseudo-replication sometimes took place in these randomized soups. (Figure: horizontal - time in hours; vertical - number of cells).
In this run, after the apocalypse, the number of cells could only increase through cell division. This requires the following sequence of events:
1) Execution of the mal instruction to allocate a memory space for the daughter. This memory space must be twelve bytes or more in size.
2) Writing of data by the mother into the daughter's memory space, using the movii instruction. The number of bytes written to this space must be 0.3 or more of the size of the allocated space.
3) Execution of the divide instruction after 1) and 2).
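The conditions for a successful division described above can be collapsed into a single eligibility check. This is a hypothetical helper for illustration, not the actual Tierra kernel code; the 12-byte minimum and 0.3 write fraction are the values stated in the text.

```c
#define MIN_CELL_SIZE      12    /* minimum daughter size, in bytes */
#define MIN_WRITE_FRACTION 0.3   /* fraction of daughter space written */

/* returns 1 if a divide would be allowed, 0 otherwise */
int divide_allowed(int allocated_bytes, int bytes_written)
{
    if (allocated_bytes < MIN_CELL_SIZE)
        return 0;    /* mal allocated too small a space */
    if ((double)bytes_written < MIN_WRITE_FRACTION * allocated_bytes)
        return 0;    /* mother wrote too little with movii */
    return 1;
}
```

These thresholds are loose enough that random code occasionally satisfies them by chance, which is what makes the pseudo-replication described below possible.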
I call this pseudo-replication because, although there is no doubt an increase in the number of cells, the daughter cells do not contain the genetic material of their mothers. If they did, their numbers would have increased exponentially until they filled the soup.
In this section I will report a variety of observations that occurred in the month of May.
In a run ending on May 5, I noticed that eight of twenty-nine machines showed a peculiar pattern of spikes in CPU speed: (Figure: horizontal - time in hours; vertical - CPU speed in instructions per second). These spikes occur at a very regular interval of about sixty-five minutes. When the spiky phenomenon drops out, the CPU speed doubles.
This pattern turned out to be an interaction with the network process of Erik McDermott, another ATR researcher. On each machine, his process performs a fixed number of calculations, reports its completion to a master, then sleeps until the last machine has reported in. After the last machine reports, the master sends out new conditions to all machines and the calculation repeats. Both McDermott's process and the Tierra process run at Nice 19, and share the CPU roughly equally. When McDermott's process rests for the network-wide synchronization, Tierra uses the entire CPU resource, and roughly doubles its speed.
We received a recommendation to use TCP communications protocols rather than UDP, because TCP has built-in congestion control mechanisms. We have been concerned that TCP carries a higher overhead than UDP, due to its requirement for an acknowledgement. So we have begun speed comparisons.
In our first test, based on a network of thirty machines inside ATR, we were surprised to find that the TCP network ran about 50% faster than the UDP network. The first two columns below list the number of millions of instructions executed by each machine in a fixed period of time (about 115 hours), for the TCP run, which ended 97-05-12, and the UDP run, which ended 97-05-05. The third column shows the ratio TCP/UDP. Finally, at the bottom, we show the average of the thirty values, and another average of twenty-eight values with the two outlying ratios excluded.
To the right of the values comparing speeds tested in a local network, are the speeds of the same machines when run as a part of a larger international network. The UDP data is taken from a network of 156 machines whose run ended 97-05-19. The TCP data is from a network of 98 machines whose run ended 97-06-09.
   local network only          international network
 ------------------------    -------------------------
  TCP     UDP   TCP/UDP       TCP     UDP   UDP/TCP

  5789   4010    1.44         1767   4342    2.46    abel
  9002   6231    1.44         1350   5448    4.04    banach
  3924   3608    1.09         1454   4232    2.91    batu
 10874   1245    8.73         1642  11850    7.22    cam8
  7720   5940    1.30         1632   6369    3.90    euclid
  4205   3246    1.30         1537   3766    2.45    goro
  5745   3161    1.82         1847   ----    ----    haar
 11412   5003    2.28         ----  10081    ----    hakone
  5000   1660    3.01         ----   ----    ----    hal
  4935   3316    1.49         1399   4489    3.21    hamburger
  6330   2967    2.13         1825   6825    3.74    hecate
  4780   3291    1.45         1334   4579    3.43    higgs
  4615   3035    1.52         1335   ----    ----    hsun16
  4301   3021    1.42         1258   4232    3.36    hsun20
  4722   3113    1.52         ----   4402    ----    hsun22
  4483   2938    1.53         ----   4090    ----    hydrogen
 10423   7721    1.35         2061   9143    4.44    ikoma
  8495   6034    1.41         1904  12434    6.53    kilimanjaro
 11091   7511    1.48         1699  10438    6.14    lucas
  8228   5659    1.45         1818   7958    4.38    marr
  7266   5282    1.38         1501   7073    4.71    moon
 11327   7719    1.47         1355   9551    7.05    muse
  9369   7567    1.24         1458   ----    ----    newton
  4940   4251    1.16         1455   4631    3.18    planet
  4923   1902    2.59         1904   5851    3.07    rokkoh
  8226   7280    1.13         1512   6388    4.22    suisei
  4265   4241    1.01         1816   4307    2.37    tsurugi
  9542   6978    1.37         2038   7825    3.84    turing
  7081   5574    1.27         1729   7651    4.43    vega
 10104   6470    1.56         1376   8900    6.47    vita
 ------------------------    -------------------------
       average   1.78              average   4.24
 no outlier (<3.0) average 1.49
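The two local-network summary figures reported above can be reproduced from the thirty TCP/UDP ratios. This is illustrative code; `mean_below` is a hypothetical helper, with an effectively infinite cutoff giving the plain mean and a cutoff of 3.0 excluding the two outlying ratios.

```c
/* the thirty local-network TCP/UDP ratios, in table order */
static const double ratio[30] = {
    1.44, 1.44, 1.09, 8.73, 1.30, 1.30, 1.82, 2.28, 3.01, 1.49,
    2.13, 1.45, 1.52, 1.42, 1.52, 1.53, 1.35, 1.41, 1.48, 1.45,
    1.38, 1.47, 1.24, 1.16, 2.59, 1.13, 1.01, 1.37, 1.27, 1.56
};

/* mean of the ratios strictly below cutoff */
double mean_below(const double *r, int n, double cutoff)
{
    double sum = 0.0;
    int i, kept = 0;
    for (i = 0; i < n; i++) {
        if (r[i] < cutoff) {
            sum += r[i];
            kept++;
        }
    }
    return sum / kept;
}
```

The plain mean comes to about 1.78, and excluding the two ratios at or above 3.0 (cam8's 8.73 and hal's 3.01) gives about 1.49, matching the table.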
An experiment was conducted to test a theory to explain the Emigration Bursts described above. The theory is that occasionally mutants appear which do nothing but migrate. They would essentially live in the network itself, in the space between machines, rather than on the machines themselves. Upon arriving at a Tierra node, they would immediately execute the surf instruction, causing them to bounce to some other node. This behavior would cause them to avoid the reaper, because in the extreme case they would exit their host machine in their first time slice, and so would never remain on a machine long enough to rise to the top of the reaper queue.
The theory was tested with a creature containing the following sixteen-byte genome:

nop1 ; dummy
rand ; generate random IP address
surf ; go there
nop1 ; dummy
nop1 ; dummy
nop1 ; dummy
nop1 ; dummy
nop1 ; dummy
nop1 ; dummy
nop1 ; dummy
nop1 ; dummy
nop1 ; dummy
nop1 ; dummy
nop1 ; dummy
nop1 ; dummy
nop1 ; dummy

The dummy instructions are included to keep the size above the minimum allowed (usually set at 12 bytes). These bouncers were introduced into a network of just four nodes. One hundred copies of the bouncer genome were injected simultaneously into each of the four nodes (a total of four hundred bouncers). The resulting immigration is illustrated in the (Figure: horizontal - time in hours; vertical - genomes emigrating per million instructions executed). The injection occurred just past three hours into the run. It caused a small spike in the immigration traffic, which cleared from the network in less than ten minutes.
It is believed that bouncers are unlikely to be the cause of the Emigration Bursts described above, because they have too short a half-life in the network.
A full scale network test conducted at the end of August showed that traffic loads between sites were too high. In low bandwidth sites, the Tierra traffic overloaded the lines into the site, causing heavy packet loss. In high bandwidth sites, this problem was not noted, but the Tierra application alone accounted for as much as half of the traffic into and out of the site. These traffic loads are unacceptable.
From these tests we have concluded that the traffic load problem is not due to bursts of unusual activity, but that the normal level of activity is too high. However, our tests on packet loss rates have shown that the traffic loads in the local net are not a problem. The problem is only with traffic into and out of a site. Thus we have determined to redesign the implementation of the network to reflect the differences in bandwidth within and between sites.
Our plan is to have separate Tierran machine instructions for local and between-site movements. All between-site movements will pass through a cluster server, where the total traffic into and out of the site will be metered and limited.
In the period from May to November we continued testing and debugging the Tierra network. In early November, Joseph Hart identified and fixed a few bugs which influenced the sensory system.
For example, each creature has a pointer into the array of TPing data structures. These pointers are initialized at random, and incremented sequentially through the list with wrap-around. However, a bug caused these pointers to be reset to zero after their random initialization. The effect of this was that all creatures would always see only the first nodes on the list. Since the normal (ancestor) creature sees only fifteen nodes from the list, this bug caused the sensory system of the entire population to be limited to approximately the first fifteen nodes on the list, even when there may be 150 nodes in the network.
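The intended pointer behavior is a simple wrap-around increment from a random starting offset. A sketch with hypothetical names (`IPMAP_SIZE` is chosen arbitrarily as 150, the network size mentioned above; this is not the actual Tierra source):

```c
#include <stdlib.h>

#define IPMAP_SIZE 150   /* e.g. 150 nodes in the network */

/* advance a creature's sensory pointer one entry, wrapping at the end */
int next_tping_index(int current)
{
    return (current + 1) % IPMAP_SIZE;
}

/* initialize the pointer at a random position in the list; the bug
 * effectively replaced this random start with zero for every creature */
int init_tping_index(void)
{
    return rand() % IPMAP_SIZE;
}
```

With the random start restored, a population of creatures each scanning fifteen entries collectively covers the whole list, rather than everyone sampling the same first fifteen nodes.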
After fixing the recently identified bugs, a network of sixty machines within ATR was started on November 3. This was the first test in which the sensory system survived through prolonged periods of evolution. The project has two milestones of success:
1) Survival of the differentiated state under conditions of free evolution by natural selection.
2) Increase in the number of cell types through evolution.
From the network test started on Friday November 3, we achieved the first milestone. It is the less significant of the two, but it is a major advance for the work.
Until now, over the course of hundreds of tests, we have watched the differentiated state be rapidly eliminated by natural selection. It was obvious that the sensory system did not make a positive contribution to the fitness of the organisms, and the genetic code that supported it was rapidly degraded and eliminated. Digital seed organisms originally containing ten cells of two cell types (two reproductive cells and eight sensory cells) collapsed into organisms with a single cell type (only reproductive cells, whose number tended to increase to our imposed limit of 256 cells).
But now the differentiated state is surviving through prolonged periods of evolution. For the first time, we are able to study the evolution of digital organisms with more than one cell type. We are observing changes in the decision making algorithms used by the sensory system, and we are observing changes in the relative and absolute sizes of the two tissues (reproductive and sensory).
While the undifferentiated, reproductive-cell-only organisms tended to increase the size of their reproductive tissue to our imposed limit of 256 cells, the differentiated organisms are maintaining a delicate balance between the sizes of the two tissues, both of which remain small (in the 4-32 cell range).
In the second run of the successful Tierra network, started November 11, we made the following observations:
A key developmental gene (labeled "dev" in the figures from the "Selecting Naturally for Differentiation" manuscript) has been duplicated. Both copies of the gene are being expressed. One copy is expressed by the reproductive tissue, and one copy is expressed by the sensory tissue. At this time, after one day of evolution, the two copies of the gene are identical.
This is significant, because biologists believe that gene duplication, followed by divergence of the two copies, is a key feature of the evolution of complex genomes, and particularly of differentiated multi-cellularity.
Meta-observation: This is an interesting story today. But tomorrow evolution may have proved it to be nonsense. One copy of the gene may have been lost, or, even after long periods of evolution, the two copies of the gene may never diverge significantly. "Real" biologists don't have their interesting stories so readily tested by evolution.