Variations upon the theme of Evolutionary Language Game by Daniel Devatman Hromada Introduction Evolutionary Language Game (ELG) first proposed in (Nowak, Plotkin, & Krakauer, 1999) is a stunningly simple yet mathematically feasible stochastic model addressing the question « How could a coordinated system of meanings&sounds evolve in a group of mutually interacting agents ? ». In most simple terms, the model can be described as follows: Let’s have a population of N agents. Each agent is described by an n x m associative matrix A. A’s entry aij specifies how often an individual, in a role of a student, observed one or more other individuals (teachers) referring to object i by producing signal j. Thus, from this matrix A, one can derive the active « speaker » matrix P by normalizing rows : while the « hearer » passive matrix Q by normalization of A’s columns: The entries pij of the matrix P denote the probability that for an agent-speaker, object i is associated with sound j. The entries qji of the matrix Q denote the probability that for an agent-hearer, a sound j is associated with the object i. Subsequently, we can imagine two individuals A and A’, the first one having the language L (P, Q), the other having the language L’ (P’, Q’). The payoff related to communication of such two individuals is, within Nowak’s model, calculated as follows: n n F ( A, A′ ) = ∑∑ pij q′ji = Tr ( PQ′ ) i =1 j =1 And the fitness of the individual A in regards to all other members of the population can be obtained as follows : 1 f ( A) = ∑ F ( A, A′) | P | −1 A′∈P ( A′≠ A) After the fitness values are obtained for all population members, one can easily apply traditional evolutionary computing methods (Sekaj, 2005) in order to direct the population toward more optimal states. In the experiments described in this paper, we have applied a binary search variant of the roulette wheel algorithm within which the probability of the selection of individual I as a future teacher, is proportional to I’s fitness. It has to be stated, however, that Nowak’s results indicate that even without such « evolutionary engine » behind, ELG model shall converge to weak local optima. Such mathematical property of ELG model makes it thus plausible candidate for explanation of coordinated communication systems even among species where coordination of sound-meaning pairs does not neccessarily augment individual’s fitness. While such cases where teacher is chosen by « random » could possibly explain the first stages of emergence of language practically ex nihilo in case of higher vertebrae, they will be left aside in the rest of this paper. Thus, all numeric experiments presented below depart from the assumption that the hypothesis: « successful alignment of one’s sound-meaning associative mindmatrix A with mindmatrices of other members of one’s population augments one’s fitness and thus augments one’s probability to replicate content of one’s mindmatrix into mindmatrices of younger individuals»...is true. First simulation The aim of the first simulation, which we label as standard Evolutionary Game (sELG), was to confirm the validity of the ELG model and test its sensitivity to different values {1, 4, 7, 10} of the parameter k_learn which specifies how many times should be the matrix sampling procedure repeated during the learning process. The size of the population was N=100, the size of the associative matrix was 5 x 5. For every value of k_learn parameter, the simulation had been run 198 times. Every run was halted after 10000 generations. In every generation, the random wheel algorithm have chosen one fit individual to be the « teacher » whose associative matrix was sampled into the associative matrix of one « student » individual chosen by random. Results of 1st simulation As is indicated by Figure 1., all runs converged rather swiftly to local absorbing states. The result is thus consistent with results presented in (Nowak et al., 1999). The global optima were, however, attained quite rarely, 18 times in case of k_learn=1, 13 times in case of k_learn=10, 7 times in case of k_learn=4 and 9 times in case of k_learn=7. For other information concerning, for example, the generation WHEN in average different absorbing states were attained, c.f. (Hromada, 2012)1. Figure 1: Figure 1: Evolution of fitness in time in a standard Evolutionary Language Game Second simulation In the second simulation, which we label as standard Evolutionary Language Game with noise (sELGn), the stochasticity of the model was increased by introduction of noise-generating probabilist p_mutation={0.05, 0.01, 0.001, 0.0001} phenomena into the learning process. For every probability value there were 20 runs. The value of parameter k_learn=4, other parameters were identic to those used in the first simulation (i.e. N=100, m=5, n=5). 1 Note that local optima which Nowak labels as « absorbing states » are denoted by term « orbitals» in (Hromada, 2012) Figure 2: Evolution of fitness in time in a standard Evolutionary Language Game with noise Results of 2nd simulation Fitness for different values of the parameter p_mutation are plotted on Figure 2. It is evident that if ever the value of p_mutation depasses certain threshold, the system shall behave too stochastically and shall oscilate in the proximity of the weakest local optimum. On the contrary, when the p_mutation is very low, one can notice that the system converges to high attractor states which have the fitness value 4 and 5. Thus the presence of very low amount of noise seems to play an important role in cases whereby the system gets stucked in local optima – verily the bug in the learning process can give the system an important stochastic kick which shall allow, in the long run, the system to attain the global optimum state. In evolutionary computing, such an approach is already widely used not only in genetic algorithms but as well in case of algorithms derived from Simulated Annealing (Kirkpatrick & Vecchi, 1983) approach. Results obtained in experiment 2 are consistent with Nowak’s data, as well as with data obtained by Kvasnicka & Pospichal who state : « if we introduce random errors into the learning process, obtained results differ dramatically and are dependent from the probability of occurrence of these errors. If ever this probability exceeds the critical value located somewhere between 0.001 and 0.01, the evolution shall start to behave in a stochastic manner and shall cease to converge to the final value of fitness » (Kvasnička & Pospíchal, s. d.). Third simulation The objective of the third simulation was to exploit the ELG question in order to find the answer to the question : « Which strategy is more fit ? To be taught once or multiple times? ». Seemingly identic to first simulation within which we modelled the fact of « being taught more than once » by different values of the parameter k_learn, this simulation differed primarily in two aspects : 1) stochastic parameter p_mutation was assigned the value 0.001 in order to ensure that the system will converge, sooner or later, to the global optimum 2) none of 99 runs for every simulation was stopped until the average fitness of the population attained unescapable proximity of the global optimum2. Every run could be thus characterized by a « temporal length », i.e. the number of generations neccessary to attain the global optimum, and two distributions were subsequently compared by statistic tests. 2 Since the maximum average fitness of the model hereby presented is n=m=5, we state that all models for which average fitness attained value 4.98 (or higher) have attained « unescapable proximity of the global optimum ». Results of 3rd simulation All 198 runs have converged to global optimum, the longest run for k_learn=1 took 376950 generations to converge to theoretically optimal alignment of mindmatrices among the members of the populatios. The longest run in case of k_learn=4 needed 215290 generations to converge to quasioptimal alignment of all mindmatrices of all members of the population. Distribution of 99 temporal lengths, one for each run, is not a normal distribution according to Shapiro-Wilk test of normality (W = 0.7407, p-value = 6.244e-12 ) for runs where k_learn=1. Similiarly, distribution is not not normal in case of distribution of 99 temporal lenghts for all 99 runs executed with k_learn=4 (W = 0.8709, p-value = 8.533e-08 ). A non-parametric test had to be therefore applied in order to compare the two distributions : the Mann-Whitney-Wilcoxon ranksum test had shown that the difference between the two distributions is not significative (W = 4207, p-value = 0.08562 ). Strictly statistically speaking there is therefore no difference between situation whereby teacher’s mindmatrix is sampled into student’s four times, or only once. Fourth simulation The aim of this simulation was to shed some light upon the answer to the question : « Which strategy is more fit ? To be taught once by multiple teachers or to be taught multiple times by one unique teacher? ». We have used the data obtained in the 3rd simulation for the case «multiple times by one unique teacher ». For cases with multiple teachers, we have modified the sELGn so that before every matrix sampling for a randomly chosen student, a new teacher is chosen by a roulette wheel algorithm according to his fitness. Given that k_learn=4 in both cases, all other parameters were identic to preceeding simulations. Results of the 4th simulation The Mann-Whitney-Wilcoxon rank-sum test suggest that the difference between the two distributions is not significative (W = 5068.5, p-value = 0.6778 ). Therefore it seems that within the standard ELG model, the fact whether the student learns from one or more teachers does not speed up the convergence of the population towards the global optimum. The longest run in case of « multiple teachers » simulations took 298910 generations to converge. Fifth simulation The aim of the fifth simulation was to see whether phenomenon like Baldwin effect could arise in sELG world. While the traditional «cultural evolution» was modelled by sELGn model as presented in simulations 2-4, the « genetic evolution » was implemented as a sort of meta-evolution within which the parameter k_learn itself could evolve. In concrete terms, every individual of the population was defined not only by his mindmatrix A (and the « speaker » matrix P and the « hearer » matrix Q derived from A), but also by a chromosome containing a single integer specifying the k_learn parameter. All the members of population were initialized with the k_learn=0, but in every generation, a mutation of this value could have occured with probability p=0.1 which could increment or decrement the value. The value was bounded by interval <0,10>, it could not increase nor decrease above, respectively below these bounds. Values of other parameters were identic to those presented in simulations 1-4. Results of the 5th simulation Identically to simulations 3 & 4, all runs have attained the proximity of globally optimal attractor state (the longest run took 361590 generations to converge). While Wilcoxon rank sum test has not indicated any significant difference between the distribution obtained in this simulation when compared to the « multiple teacher » one obtained in the fourth simulation (W = 4587, p-value = 0.3722 ), nor any difference in regards to distribution obtained for « parental learning with k_learn=4> » from the third simulation (W = 4385, p-value = 0.1646 ), there was nonetheless a significative difference observed between the distribution obtained in this simulation and the distribution for « parental learning with k_learn=1 ». Somewhat contrary to the expectations we had when we were launching the model, the most simple variant of the parental learning seems to converge faster, i.e. in less generations, to the global optimum (one-sided : W = 3765, p-value = 0.001773 ) than the «bounded Baldwin » variant we presented in this simulation. For completeness, we consider it worth noting that the further analysis of the values of evolvable k_learn parameter indicates that within the executed 100 runs, the mean of the parameter value was 3.57 and median 3.351 in the moment when a given simulation had attained the proximity of the optimal state. Given the fact that that possible values of k_learn were bounded into the interval <0,10> this could seem surprising since one would expect the values to be closer to the middle of the interval. All is explained, however, when one realizes that the random drift responsible for evolution of the k_learn from its initial value 0 does not have time to do so in cases of «lucky fast simulations » which converge to global optimum in few thousand generations,. This can be clearly seen when one compares the distribution of temporal lengths of such « fast simulations » which converged to optimum in less than 10000 generations, with the rest. While the mean value of the k_learn parameter for the « fast simulations » is 1.92, the mean value for others is 3.73 and the difference between the two subgroups is, of course, significative (W = 819, p-value = 8.384e-07 ). Discussion The elegance of ELG is so high, that one is immediately tempted to state that « ELG offers THE mathematical formalism explaining the emergence of shared communication system in the population of agents whose sound-meaning couples are randomly initialized». On the other hand, it could be easily reproached that the initial ELG model is too much reductionist and, what is worse, that it is based on assumptions which are contradictory to the real state of things which had to be the case when the human language evolved. For example, the assumption that the teacher->student information transfer can be modelled by the sampling of the WHOLE teacher’s associative matrix, or that the fitness of any individual I in generation G is defined as an average payoff of all possible communication acts with all other individuals of the population, both these assumptions seem to us to be highly unrealistic in relation to functioning of groups of primates in the period where coordinated sign-meaning communicative systems emerged into existence. But ELG is interesting even if all relations of ELG to human sciences and linguistics would be considered as non-relevant. Verily, we believe that ELG is worth of scientific interest even if it is considered as a solely mathematic, informatic and/or game-theory problem. More concretely : as a stochastic framework able to converge into well-defined global optimum state (i.e. the state where in all rows of of all members of the population, there is only one entry having value 1 with zeroes in all other entries of the same row), ELG can furnish a useful toolbox for evaluation and comparison of diverse evolutionary computing approaches. Within this paper we have introduced, in experiments 3-5, an evaluation method based on nonparametric Mann-Whitney-Wilcoxon rank-sum test. We have taken as granted an analytical result published in (REF), indicating that if ever there is reasonable amount of noise present during the learning process, the population shall sooner or later converge to optimally coordinated communication system. This being granted, we were asking : « How fast shall the global optimum attained ? » and considered an evolutionary algorithm and/or a set of given parameter (k_learn, p_mutation) values as better if, for a given configuration, the algorithm converged significantly faster to global optimum (i.e. in less generations3) than in case of other configurations. 3 In this paper we had used the term « generation » which is common to Evolutionary Computing domain. But since in the sELG model, a generation consists of 1) choice of teacher(s) 2) choice of ONE student 3) information transfer from teacher to student, i.e. replacing student’s mindmatrix with a new one, determined by the teacher; it seems to be more appropriate to label such coarse-grained time steps as « lessons » or « days ». Finally, it has to be stated that in order to transform ELG into a full-fledged evolutionary algorithm evolutionary toolbox, the notion of time has to be somewhat refined. From coarse-grained notion of generation, which, in case of Nowak’s or Kvasnicka’s work, is equivalent to k_learn>1 acts of the sampling of the whole matrix, we propose to found further work on a more finer notion of a ostensive definition (Wittgenstein, 2009). One can easily understand that within the ELG model, every internal step of an envelopping matrix sampling loop 4 can be interpreted as such « definition by pointing » whereby teacher associates the sound with the meaning within the mindmatrix of the student. Thus, under the conditions that 1) diverse models are evaluated by statistical non-parametric tests comparing number « of time steps needed to attain global optimum » 2) a time-step is defined like an ostensive definition ; one could propose such fine-grained ELG as a possible evaluation toolkit not only for diverse evolutionary computing techniques, but possibly even as a more general method to assess the performance of game-theory approaches to attain Nash-like (Trapa & Nowak, 2000)equilibria . And if assumption 3) the parameters like P_mutation, K_learn, number of teachers, parents etc. can also evolve by means a genetic meta-evolution governing the subordinated linguistic evolution, it could be expected that the phenomenon interpretable as Baldwin effect shall be discovered in the world defined by ELG-framework. Bibliography Hromada, D. D., (2012). Hromada, D. D. Evolutionary insight into spontaneous emergence of shared sound-meaning mappings in multi-agent communities. Accessible at: http://localhost.sk/~hromi/academic/2012/evolutionary_insights.pdf Kirkpatrick, S., & Vecchi, M. P. (1983). Optimization by simmulated annealing. science, 220(4598), 671–680. Kvasnička, V., & Pospíchal, J. (2007). Evolúcia jazyka a univerzální darwinizmus. Myseľ, inteligencia a život (Slovenská Technická Univerzita.). Bratislava. Nowak, M. A., Plotkin, J. B., & Krakauer, D. C. (1999). The evolutionary language game. Journal of Theoretical Biology, 200(2), 147–162. Sekaj, I. (2005). Evolučné vỳpočty a ich vyuzitie v praxi. Trapa, P. E., & Nowak, M. A. (2000). Nash equilibria for an evolutionary language game. Journal of Mathematical Biology, 41(2), 172–188. Wittgenstein, L. (2009). Philosophical investigations. Wiley-Blackwell. 4 for my $row (0..$Matrix_height) { #matrix sampling for my $column (0..$Matrix_width) { $Student_A_Matrix[$row][$column]+=1 if rand()