Variations upon the theme of Evolutionary Language Game
by Daniel Devatman Hromada

Introduction
Evolutionary Language Game (ELG) first proposed in (Nowak, Plotkin, & Krakauer, 1999) is a
stunningly simple yet mathematically feasible stochastic model addressing the question « How
could a coordinated system of meanings&sounds evolve in a group of mutually interacting
agents ? ». In most simple terms, the model can be described as follows:
Let’s have a population of N agents. Each agent is described by an n x m associative matrix
A. A’s entry aij specifies how often an individual, in a role of a student, observed one or more other
individuals (teachers) referring to object i by producing signal j. Thus, from this matrix A, one can
derive the active « speaker »
matrix P by normalizing rows :

while the « hearer » passive matrix Q by normalization of A’s columns:

The entries pij of the matrix P denote the probability that for an agent-speaker, object i is associated
with sound j. The entries qji of the matrix Q denote the probability that for an agent-hearer, a sound
j is associated with the object i.
Subsequently, we can imagine two individuals A and A’, the first one having the language L
(P, Q), the other having the language L’ (P’, Q’). The payoff related to communication of such two
individuals is, within Nowak’s model, calculated as follows:
n

n

F ( A, A′ ) = ∑∑ pij q′ji = Tr ( PQ′ )
i =1 j =1

And the fitness of the individual A in regards to all other members of the population can be obtained
as follows :
1
f ( A) =
∑ F ( A, A′)
| P | −1 A′∈P
( A′≠ A)

After the fitness values are obtained for all population members, one can easily apply traditional
evolutionary computing methods (Sekaj, 2005) in order to direct the population toward more
optimal states. In the experiments described in this paper, we have applied a binary search variant of
the roulette wheel algorithm within which the probability of the selection of individual I as a future
teacher, is proportional to I’s fitness.
It has to be stated, however, that Nowak’s results indicate that even without such « evolutionary
engine » behind, ELG model shall converge to weak local optima. Such mathematical property of
ELG model makes it thus plausible candidate for explanation of coordinated communication
systems even among species where coordination of sound-meaning pairs does not neccessarily
augment individual’s fitness. While such cases where teacher is chosen by « random » could
possibly explain the first stages of emergence of language practically ex nihilo in case of higher
vertebrae, they will be left aside in the rest of this paper.
Thus, all numeric experiments presented below depart from the assumption that the hypothesis:
« successful alignment of one’s sound-meaning associative mindmatrix A with mindmatrices of
other members of one’s population augments one’s fitness and thus augments one’s probability to
replicate content of one’s mindmatrix into mindmatrices of younger individuals»...is true.

First simulation
The aim of the first simulation, which we label as standard Evolutionary Game (sELG), was to
confirm the validity of the ELG model and test its sensitivity to different values {1, 4, 7, 10} of the
parameter k_learn which specifies how many times should be the matrix sampling procedure
repeated during the learning process. The size of the population was N=100, the size of the
associative matrix was 5 x 5. For every value of k_learn parameter, the simulation had been run 198
times. Every run was halted after 10000 generations. In every generation, the random wheel
algorithm have chosen one fit individual to be the « teacher » whose associative matrix was
sampled into the associative matrix of one « student » individual chosen by random.
Results of 1st simulation
As is indicated by Figure 1., all runs converged rather swiftly to local absorbing states. The result is
thus consistent with results presented in (Nowak et al., 1999). The global optima were, however,
attained quite rarely, 18 times in case of k_learn=1, 13 times in case of k_learn=10, 7 times in case
of k_learn=4 and 9 times in case of k_learn=7. For other information concerning, for example, the
generation WHEN in average different absorbing states were attained, c.f. (Hromada, 2012)1.

Figure 1: Figure 1: Evolution of fitness in time in a standard
Evolutionary Language Game

Second simulation
In the second simulation, which we label as standard Evolutionary Language Game with noise
(sELGn), the stochasticity of the model was increased by introduction of noise-generating
probabilist p_mutation={0.05, 0.01, 0.001, 0.0001} phenomena into the learning process. For every
probability value there were 20 runs. The value of parameter k_learn=4, other parameters were
identic to those used in the first simulation (i.e. N=100, m=5, n=5).
1 Note that local optima which Nowak labels as « absorbing states » are denoted by term « orbitals» in (Hromada,
2012)

Figure 2: Evolution of fitness in time in a standard Evolutionary Language
Game with noise
Results of 2nd simulation
Fitness for different values of the parameter p_mutation are plotted on Figure 2. It is evident that
if ever the value of p_mutation depasses certain threshold, the system shall behave too
stochastically and shall oscilate in the proximity of the weakest local optimum. On the contrary,
when the p_mutation is very low, one can notice that the system converges to high attractor states
which have the fitness value 4 and 5. Thus the presence of very low amount of noise seems to play
an important role in cases whereby the system gets stucked in local optima – verily the bug in the
learning process can give the system an important stochastic kick which shall allow, in the long
run, the system to attain the global optimum state. In evolutionary computing, such an approach is
already widely used not only in genetic algorithms but as well in case of algorithms derived from
Simulated Annealing (Kirkpatrick & Vecchi, 1983) approach.
Results obtained in experiment 2 are consistent with Nowak’s data, as well as with data
obtained by Kvasnicka & Pospichal who state : « if we introduce random errors into the learning
process, obtained results differ dramatically and are dependent from the probability of occurrence
of these errors. If ever this probability exceeds the critical value located somewhere between 0.001
and 0.01, the evolution shall start to behave in a stochastic manner and shall cease to converge to
the final value of fitness » (Kvasnička & Pospíchal, s. d.).

Third simulation
The objective of the third simulation was to exploit the ELG question in order to find the answer to
the question : « Which strategy is more fit ? To be taught once or multiple times? ». Seemingly
identic to first simulation within which we modelled the fact of « being taught more than once » by
different values of the parameter k_learn, this simulation differed primarily in two aspects : 1)
stochastic parameter p_mutation was assigned the value 0.001 in order to ensure that the system
will converge, sooner or later, to the global optimum 2) none of 99 runs for every simulation was
stopped until the average fitness of the population attained unescapable proximity of the global
optimum2. Every run could be thus characterized by a « temporal length », i.e. the number of
generations neccessary to attain the global optimum, and two distributions were subsequently
compared by statistic tests.
2 Since the maximum average fitness of the model hereby presented is n=m=5, we state that all models for which
average fitness attained value 4.98 (or higher) have attained « unescapable proximity of the global optimum ».

Results of 3rd simulation
All 198 runs have converged to global optimum, the longest run for k_learn=1 took 376950
generations to converge to theoretically optimal alignment of mindmatrices among the members of
the populatios. The longest run in case of k_learn=4 needed 215290 generations to converge to
quasioptimal alignment of all mindmatrices of all members of the population.
Distribution of 99 temporal lengths, one for each run, is not a normal distribution according to
Shapiro-Wilk test of normality (W = 0.7407, p-value = 6.244e-12 ) for runs where k_learn=1.
Similiarly, distribution is not not normal in case of distribution of 99 temporal lenghts for all 99
runs executed with k_learn=4 (W = 0.8709, p-value = 8.533e-08 ). A non-parametric test had to be
therefore applied in order to compare the two distributions : the Mann-Whitney-Wilcoxon ranksum test had shown that the difference between the two distributions is not significative (W = 4207,
p-value = 0.08562 ). Strictly statistically speaking there is therefore no difference between situation
whereby teacher’s mindmatrix is sampled into student’s four times, or only once.

Fourth simulation
The aim of this simulation was to shed some light upon the answer to the question : « Which
strategy is more fit ? To be taught once by multiple teachers or to be taught multiple times by
one unique teacher? ». We have used the data obtained in the 3rd simulation for the case «multiple
times by one unique teacher ». For cases with multiple teachers, we have modified the sELGn so
that before every matrix sampling for a randomly chosen student, a new teacher is chosen by a
roulette wheel algorithm according to his fitness. Given that k_learn=4 in both cases, all other
parameters were identic to preceeding simulations.
Results of the 4th simulation
The Mann-Whitney-Wilcoxon rank-sum test suggest that the difference between the two
distributions is not significative (W = 5068.5, p-value = 0.6778 ). Therefore it seems that within the
standard ELG model, the fact whether the student learns from one or more teachers does not speed
up the convergence of the population towards the global optimum. The longest run in case of
« multiple teachers » simulations took 298910 generations to converge.

Fifth simulation
The aim of the fifth simulation was to see whether phenomenon like Baldwin effect could arise in
sELG world. While the traditional «cultural evolution» was modelled by sELGn model as presented
in simulations 2-4, the « genetic evolution » was implemented as a sort of meta-evolution within
which the parameter k_learn itself could evolve. In concrete terms, every individual of the
population was defined not only by his mindmatrix A (and the « speaker » matrix P and the
« hearer » matrix Q derived from A), but also by a chromosome containing a single integer
specifying the k_learn parameter. All the members of population were initialized with the
k_learn=0, but in every generation, a mutation of this value could have occured with probability
p=0.1 which could increment or decrement the value. The value was bounded by interval <0,10>, it
could not increase nor decrease above, respectively below these bounds. Values of other parameters
were identic to those presented in simulations 1-4.
Results of the 5th simulation
Identically to simulations 3 & 4, all runs have attained the proximity of globally optimal attractor
state (the longest run took 361590 generations to converge). While Wilcoxon rank sum test has not
indicated any significant difference between the distribution obtained in this simulation when
compared to the « multiple teacher » one obtained in the fourth simulation (W = 4587, p-value =
0.3722 ), nor any difference in regards to distribution obtained for « parental learning with
k_learn=4> » from the third simulation (W = 4385, p-value = 0.1646 ), there was nonetheless a

significative difference observed between the distribution obtained in this simulation and the
distribution for « parental learning with k_learn=1 ». Somewhat contrary to the expectations we had
when we were launching the model, the most simple variant of the parental learning seems to
converge faster, i.e. in less generations, to the global optimum (one-sided : W = 3765, p-value =
0.001773 ) than the «bounded Baldwin » variant we presented in this simulation.
For completeness, we consider it worth noting that the further analysis of the values of evolvable
k_learn parameter indicates that within the executed 100 runs, the mean of the parameter value was
3.57 and median 3.351 in the moment when a given simulation had attained the proximity of the
optimal state. Given the fact that that possible values of k_learn were bounded into the interval
<0,10> this could seem surprising since one would expect the values to be closer to the middle of
the interval. All is explained, however, when one realizes that the random drift responsible for
evolution of the k_learn from its initial value 0 does not have time to do so in cases of «lucky fast
simulations » which converge to global optimum in few thousand generations,. This can be clearly
seen when one compares the distribution of temporal lengths of such « fast simulations » which
converged to optimum in less than 10000 generations, with the rest. While the mean value of the
k_learn parameter for the « fast simulations » is 1.92, the mean value for others is 3.73 and the
difference between the two subgroups is, of course, significative (W = 819, p-value = 8.384e-07 ).

Discussion
The elegance of ELG is so high, that one is immediately tempted to state that « ELG offers THE
mathematical formalism explaining the emergence of shared communication system in the
population of agents whose sound-meaning couples are randomly initialized». On the other hand, it
could be easily reproached that the initial ELG model is too much reductionist and, what is worse,
that it is based on assumptions which are contradictory to the real state of things which had to be the
case when the human language evolved. For example, the assumption that the teacher->student
information transfer can be modelled by the sampling of the WHOLE teacher’s associative matrix,
or that the fitness of any individual I in generation G is defined as an average payoff of all possible
communication acts with all other individuals of the population, both these assumptions seem to us
to be highly unrealistic in relation to functioning of groups of primates in the period where
coordinated sign-meaning communicative systems emerged into existence.
But ELG is interesting even if all relations of ELG to human sciences and linguistics would be
considered as non-relevant. Verily, we believe that ELG is worth of scientific interest even if it is
considered as a solely mathematic, informatic and/or game-theory problem. More concretely : as a
stochastic framework able to converge into well-defined global optimum state (i.e. the state where
in all rows of of all members of the population, there is only one entry having value 1 with zeroes in
all other entries of the same row), ELG can furnish a useful toolbox for evaluation and
comparison of diverse evolutionary computing approaches.
Within this paper we have introduced, in experiments 3-5, an evaluation method based on nonparametric Mann-Whitney-Wilcoxon rank-sum test. We have taken as granted an analytical result
published in (REF), indicating that if ever there is reasonable amount of noise present during the
learning process, the population shall sooner or later converge to optimally coordinated
communication system. This being granted, we were asking : « How fast shall the global optimum
attained ? » and considered an evolutionary algorithm and/or a set of given parameter (k_learn,
p_mutation) values as better if, for a given configuration, the algorithm converged significantly
faster to global optimum (i.e. in less generations3) than in case of other configurations.
3 In this paper we had used the term « generation » which is common to Evolutionary Computing domain. But since
in the sELG model, a generation consists of 1) choice of teacher(s) 2) choice of ONE student 3) information transfer
from teacher to student, i.e. replacing student’s mindmatrix with a new one, determined by the teacher; it seems to
be more appropriate to label such coarse-grained time steps as « lessons » or « days ».

Finally, it has to be stated that in order to transform ELG into a full-fledged evolutionary algorithm
evolutionary toolbox, the notion of time has to be somewhat refined. From coarse-grained notion of
generation, which, in case of Nowak’s or Kvasnicka’s work, is equivalent to k_learn>1 acts of the
sampling of the whole matrix, we propose to found further work on a more finer notion of a
ostensive definition (Wittgenstein, 2009). One can easily understand that within the ELG model,
every internal step of an envelopping matrix sampling loop 4 can be interpreted as such « definition
by pointing » whereby teacher associates the sound with the meaning within the mindmatrix of the
student.
Thus, under the conditions that 1) diverse models are evaluated by statistical non-parametric tests
comparing number « of time steps needed to attain global optimum » 2) a time-step is defined like
an ostensive definition ; one could propose such fine-grained ELG as a possible evaluation toolkit
not only for diverse evolutionary computing techniques, but possibly even as a more general
method to assess the performance of game-theory approaches to attain Nash-like (Trapa & Nowak,
2000)equilibria . And if assumption 3) the parameters like P_mutation, K_learn, number of
teachers, parents etc. can also evolve by means a genetic meta-evolution governing the subordinated
linguistic evolution, it could be expected that the phenomenon interpretable as Baldwin effect shall
be discovered in the world defined by ELG-framework.

Bibliography
Hromada, D. D., (2012). Hromada, D. D. Evolutionary insight into spontaneous emergence of
shared sound-meaning mappings in multi-agent communities. Accessible at:
http://localhost.sk/~hromi/academic/2012/evolutionary_insights.pdf
Kirkpatrick, S., & Vecchi, M. P. (1983). Optimization by simmulated annealing. science, 220(4598),
671–680.
Kvasnička, V., & Pospíchal, J. (2007). Evolúcia jazyka a univerzální darwinizmus. Myseľ,
inteligencia a život (Slovenská Technická Univerzita.). Bratislava.
Nowak, M. A., Plotkin, J. B., & Krakauer, D. C. (1999). The evolutionary language game. Journal
of Theoretical Biology, 200(2), 147–162.
Sekaj, I. (2005). Evolučné vỳpočty a ich vyuzitie v praxi.
Trapa, P. E., & Nowak, M. A. (2000). Nash equilibria for an evolutionary language game. Journal
of Mathematical Biology, 41(2), 172–188.
Wittgenstein, L. (2009). Philosophical investigations. Wiley-Blackwell.

4

for my $row (0..$Matrix_height) { #matrix sampling
for my $column (0..$Matrix_width) {
$Student_A_Matrix[$row][$column]+=1 if rand()<Teacher_P_Matrix[$row][$column]; #ostensive definition
}
}