Title: The new arboretum of Indo-European “trees”. Can new algorithms reveal the phylogeny and even prehistory of Indo-European?
Abstract: Specialization in linguistics vs. biological informatics leads to widespread misunderstandings and false results caused by poor knowledge of the essential conditions of the respective methods and data applied. These are analyzed and the insights used to assess the recent glut of attempts to employ methods from biological informatics in establishing new phylogenies of Indo-European languages.
Acknowledgements: I owe thanks for helpful comments and corrections from many sides, most of all Sheila Embleton, Joe Felsenstein, and Johann Wägele. Of course, all remaining mistakes are my own responsibility.
Notes:
1. E.g. presence vs. absence of the Indo-Iranian or Balto-Slavic group.
2. Cf. also Nakhleh et al. (2005).
3. "Rates" express a relationship of a variable to a constant time unit, e.g. velocity for changes of distance in time, e.g. km/h.
4. This is in turn part of quantitative linguistics (cf. HSK, vol. 27, see Holm, 2005).
5. Today the variations of 14C dating are "calibrated" by different measures.
6. "A species is a population of interbreeding individuals that is reproductively isolated from other species" (Croft, 2000, p. 196).
7. See e.g. Forster et al. (2002, pp. 13950–13954).
8. Cf. e.g. Labov (1994, p. 9f).
9. I am obliged to Prof. St. Zimmer, Bonn, for this French source.
10. The unpredictable decay of single radioactive atoms must not be compared with single socio-historical events of language change. That would be to confuse the macro-level with the micro-level. Moreover, it must be clear that 14C decay varies only within certain limits (see above). This contrasts sharply with languages, which can change to any degree and in any time, as I am not the only one to have amply demonstrated.
11. Cf. e.g. the counting of Bird (1982).
12. Cf. Bird (1982); Haarmann (1990), passim; Hamp (2002, pp. 682–683).
13. E.g. French has a Gaulish substratum and then different (Gothic, Burgundian, and Frankish) superstrata; cf. e.g. Anttila (1989, p. 171); Polomé (1990, pp. 331–338); Kontzi (1982), in general; Chap. V of Ernst et al. (2003).
14. E.g. the London dialect at "Early Modern English".
15. For details, see e.g. Lehmann (1992, p. 266ff).
16. For another line of argument, see A. & R. McMahon (2000).
17. The terms in biological informatics used here go back to Hennig (1984, passim).
18. Cf. Georg (2004) on Nostratic attempts. In order to limit space, I shall not cite the special sources on Hamito-Semitic etymology.
19. Cf. e.g. Porzig (1954, p. 55); Hamp (1992; 1998, p. 307ff), with additional criteria; Croft (2000, p. 15); Ringe et al. (2002, p. 66).
20. Descendence of manuscripts.
21. Where possible, I write the unambiguous "k" for the respective phoneme, here along with the authority of Hamp (1998) and others.
22. Cf. Hamp (1998, p. 342).
23. Term for variables under study, e.g. (groups of) languages or dialects, species, genome sequences "S", the "leaves" or tips in a "tree".
24. Cf. e.g. Wiesemüller et al. (2002, p. 59ff).
25. Comparable with linguistic "meaning lists".
26. Problem of how concepts/meanings are named. Cf. e.g. Anttila (1989, s. 7.2).
27. Onomasiological lists or dictionaries, as e.g. Buck (1949), continued for European languages by the late Schröpfer (1979, passim).
28. In fact, there were different ones by Swadesh alone, cf. Embleton (1995, p. 267), with references. Additionally, there exist at least a dozen attempts at improvements.
29. Usually marked by a preceding star or asterisk.
30. "The basis must always remain the … material identity. It keeps workability even when the functions show greater divergences. The reverse cannot hold and only leads to unfounded assumptions and confusions."
31. E.g., up to 25% in the case of the four nucleotides, one of which will be replaced by any change.
32. In technical terms, "a phenetic similarity contradicting a phylogeny".
33. Within the same species.
34. Of course, the intersection of agreeing cognates "a" between L2 ∩ L3 cannot be 57, because these retentions are situated at different places ("sites" in genome sequences) of L2 and L3.
35. Alternatively "long branch attraction", cf. Swofford (1996, p. 427); Felsenstein (2004, p. 120f).
36. Again, even if strongly supported by high bootstrap values, which only test the consistency by amounts of supporting features.
37. In biological systematics, this effect is known as the difference between observed phenetic "D-distances". Quantitative phylogeny tries to transform these into evolutionary "d-distances". This is not possible in language research.
38. Or define, e.g. "Indo-European".
39. Cf. e.g. Aikhenvald (2001, p. 4ff).
40. Cf. e.g. Felsenstein (2004b, p. 6).
41. Taxa/languages that clearly do not belong to the "ingroup" under study.
42. E.g. Szemerényi (1990, p. 7ff); Meier-Brügger (2000, E509); or Seebold (1981, s. 40, p. 322); in particular Eska & Ringe (2004).
43. Origin and development of individuals, which are assumed to reiterate their phylogenesis. For rare counterexamples, cf. Wägele (2001, p. 180).
44. Compatibility and "established aspects of IE history".
45. The term seems somewhat misleading, for here a search algorithm by star-decomposition is meant, obviously unaware of synonyms in the older neighbour-joining procedures of hierarchical agglomerative clustering methods.
46. Note however that the results may resemble reality if by chance the environmental circumstances are not too far from the conditions of the methods and/or the signals are strong enough.
47. In particular Hittite, Albanian, and English, which naturally then behave "recalcitrant".
48. E.g. Swofford et al. (1996, p. 528).
49. So-called 1-, 2-, 3-, or 6-clock assumptions (cf. Swofford et al., 1996, p. 434; Wägele, 2001, pp. 232, 267).
50. This is sometimes classified as a distance method. Nevertheless, it works character by character, and distances are the output.
51. Of 25 November 1949, published (Kendall, 1950, p. 49) but never cited since. This is the reason why this author was unaware of this approach when detecting these relations through working on Indo-European material of Bird (1982).
52. For detailed proof and explanation cf. Holm (2003).
53. Changes in biology would most times not fulfil the conditions for this hypergeometric distribution, as k_i + k_j should exceed 0.2 N, where acceptable spread can only be expected with above 0.9 N.
54. The publication in PNAS is astonishing, since this is neither read by linguists nor evaluated by linguistic search engines.
55. Forster naturally holds that this is detectable by reticulations; but what, then, is correct?
56. E.g. izoki(n) "Salmon", cf. Pijnenburg (1983, p. 240).
57. Eska & Ringe professionally criticized the data of F&T, and, less convincingly, the glottochronology, but criticized the network method with only poor understanding. The following clash (Language 81(1) 2005, pp. 2–3) made this even clearer.
58. "Garbage in – garbage out" (old programmer's wisdom).
59. After being converted from the original agreement percentages, of course.
60. Dyen, Kruskal & Black (1992) did address this question.
61. The relatively complicated coding of different types of "cognation" obviously deters specialists from reviewing the decisions.
62. In his words, "cognate only in this class", here Albanian-internal.
63. Via www.indo-european.nl
64. "Phenetic" means "judging after (superficial) phenomena", i.e. things that appear but the cause of which is in question.
65. Obviously in the default setting of PAUP v. 4.0b4a (Swofford, 2000).
66. Now published in Atkinson & Gray (2006, p. 93).
67. G & A deny doing glottochronology, because their notion of it is narrowed to the Swadesh method.
68. All integral components of the MrBayes package (Huelsenbeck & Ronquist, 2001, passim).
69. Note that there are different terms for the subgroups, where sometimes OHG belongs to another "W-Germanic" group.
70. Gimbutas (1992, p. 6) gives 4400–4300 BC for her first wave, corresponding to 6400–6300 calendar years, or seventh millennium ago.
71. http://www.psych.auckland.ac.nz/psych/research/Evolution/Response%20to%20Trask%20Take2.doc
72. The difference represents the uncertainty of the 14C determination plus the "wiggle" areas of the calibration curve.
73. Unaware of the technical definition of "BP" in archaeological science, they seem to mean "sun-years ago", thereby referring to 7800 to 5800 BC, when in fact agriculture expanded from Asia Minor.
74. This journal again typically has no reviewer in the field of historical linguistics. Only April McMahon, in a later article (Nature, Science update, 18 Nov. 2003), regrettably remarked, "This kind of study is exactly what linguistics needs."
75. Of course, PHYLIP offers ML-programs, e.g. DNAML.
76. We are not told how the cognations were found.
77. The formula presented by Lohr on p. 214, as taken from the manual to the "FITCH" Program in PHYLIP 3.5 (improved in V 3.6 of July 2004), is not the one of the least squares option.
78. However, contrary to scientific rules and, for example, the Dyen list, these data have never been completely published.
79. www.cs.rice.edu/~nakhleh/CPHL/#software
80. "Our latest results suggest that it falls somewhere within the Satem core!" The URL changed in 2004 and was cancelled in 2005.
81. The University of Leiden project of an update is still far from being completed.
Publication Year: 2007
Publication Date: 2007-08-01
Language: en
Type: article
Indexed In: ['crossref']
Access and Citation
Cited By Count: 38