Language, Linguistics, Logic, and Life, by Teresa Elms


Lexical Distance Among the Languages of Europe

Posted by Teresa Elms on 4 March 2008

[Figure: Lexical Distance Network Among the Major Languages of Europe]


This chart shows the lexical distance — that is, the degree of overall vocabulary divergence — among the major languages of Europe.

The size of each circle represents the number of speakers for that language. Circles of the same color belong to the same language group. All the groups except for Finno-Ugric (in yellow) are in turn members of the Indo-European language family.

English is a member of the Germanic group (blue) within the Indo-European family. But thanks to 1066, William of Normandy, and all that, about 75% of the modern English vocabulary comes from French and Latin (i.e., the Romance languages, in orange) rather than Germanic sources. As a result, English (a Germanic language) and French (a Romance language) are actually closer to each other in lexical terms than Romanian (a Romance language) and French.

So why is English still considered a Germanic language? Two reasons. First, the most frequently used 80% of English words come from Germanic sources, not Latinate sources. Those famous Anglo-Saxon monosyllables live on! Second, the syntax of English, although much simplified from its Old English origins, remains recognizably Germanic. The Norman conquest added French vocabulary to the language, and through pidginization it arguably stripped out some Germanic grammar, but it did not ADD French grammar.

The original research data for the chart comes from K. Tyshchenko (1999), Metatheory of Linguistics. (Published in Ukrainian.)


Posted in Linguistics | 1,225 Comments »

Nonlinearity in Language: Chomsky Was Right

Posted by Teresa Elms on 2 February 2008

When we talk about “nonlinear systems” today, we generally mean complex dynamic systems that have self-organizing properties under conditions near the boundary of chaos. “Chaos theory” is the popular shorthand term for the study of such systems. But “nonlinearity” can be used in a different technical sense to refer to dimensionality. A linear system is, in this sense, one-dimensional. A nonlinear system may be two-dimensional, three-dimensional, or n-dimensional; that is, a nonlinear system has dimensionality greater than one. (This definition happens to work for fractional dimensionality as well as integer dimensionality, which will be convenient for us later.)

As it happens, Noam Chomsky twigged to the multidimensionality of language rather early. In a paper published back in 1956 by the Institute of Radio Engineers (now the IEEE), Chomsky demonstrated that human language, with its embedded constituent hierarchy, is inherently nonlinear in the dimensional sense. He then used this finding to discriminate among three possible models of language generation: (1) Markov processes; (2) phrase structure grammars; and (3) transformational grammars.

Transformational grammars won, of course.

But to me, the nonlinear structure of language is a fact of greater import than any theoretical conclusions it might have supported in 1956. To see why, it may be useful to recreate the fundamental insight.

Imagine that you are one of several beads on a string. Each bead represents the current output of a production system that emits language one unit at a time. All the beads are roughly the same in size or granularity or scale; that is, they all consist of phonemes, or morphemes, or words, or similar units of coded language output. The scale is not important, so long as the same scale is maintained throughout the production process.

Now, being a bead on a linear string, you can exchange information with the beads immediately before or after you, but there is no way for you to peek around these adjacent beads to learn about the beads further up and down the line. There is no “up” or “down” or “left” or “right” in which you can extend a head or hand and take a peek. You can detect which morpheme is carried by the bead in front of you. You can detect the fact that the bead behind you is empty. You can use these two facts and some internal transitional probabilities to generate a new morpheme to fill the empty bead that follows. But there are certain things you can’t do. For example, you can’t:

  • Repeat the contents of the previous n beads. (After all, you can’t see back n beads.)
  • Emit the contents of the previous n beads in reverse order. (Again, you can’t see back n beads.)
  • Repeat the content of the immediately previous bead n times. (After you emit the first repetition, you can’t see back beyond it to count how many times you’ve repeated it, so you never know when to stop.)
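The limitation can be sketched in a few lines of Python. This is only an illustration of the thought experiment, not Chomsky's formalism: the function names and the toy transition table are made up for this example, and random choice stands in for the "internal transitional probabilities."

```python
import random

# Toy transition table (an assumption for illustration): for each bead,
# the beads that are allowed to follow it. The generator sees nothing else.
TRANSITIONS = {
    "the": ["dog", "cat"],
    "dog": ["barks", "sleeps"],
    "cat": ["sleeps"],
    "barks": ["the"],
    "sleeps": ["the"],
}

def next_bead(previous, rng):
    """Choose the next bead using only the immediately preceding bead."""
    return rng.choice(TRANSITIONS[previous])

def continue_string(history, steps, seed=0):
    """Extend a string of beads; only history[-1] is ever consulted."""
    rng = random.Random(seed)
    current = history[-1]
    output = []
    for _ in range(steps):
        current = next_bead(current, rng)
        output.append(current)
    return output

# Two different histories that end in the same bead look identical
# to the generator, so their continuations are the same.
a = continue_string(["the", "dog", "sleeps"], steps=5, seed=1)
b = continue_string(["the", "cat", "sleeps"], steps=5, seed=1)
```

Because `a` and `b` come out identical, the generator clearly has no way to tell whether a dog or a cat came earlier, which is why it cannot repeat or reverse an earlier stretch of beads.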

These restrictions derive from three factors inherent in the nature of information and linearity:

  1. Linear, point-to-point connectivity restricts the information flow that can occur in the world.
  2. A bead is finite in size, and so has a finite (possibly very small) memory for the contents of preceding beads or following beads; that is, there is a restriction on the storage of information at any one point in the world.
  3. A “bead reader/recorder” that sees all beads in sequence might have a memory for any sequence that passes through it, even an infinite memory — but any such memory must exist outside the linear bead world, as does the bead reader/recorder system itself.
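The third factor can be sketched as well. Assuming a hypothetical `Recorder` class that sits outside the string and remembers every bead it has read (the class and its method names are invented for this illustration), the three operations a bead cannot perform become trivial:

```python
class Recorder:
    """Hypothetical reader/recorder that lives outside the bead string."""

    def __init__(self):
        # Store of every bead seen so far; it belongs to the recorder,
        # not to any bead on the string.
        self.memory = []

    def read(self, bead):
        """Record one bead as it passes through."""
        self.memory.append(bead)

    def repeat_last(self, n):
        """Return the contents of the previous n beads, in order."""
        return self.memory[-n:] if n > 0 else []

    def reverse_last(self, n):
        """Return the contents of the previous n beads, in reverse order."""
        return self.memory[-n:][::-1] if n > 0 else []

    def repeat_previous(self, n):
        """Return the immediately previous bead n times.

        The count n is tracked by the recorder itself, which is exactly
        what a single bead on the string has no room to do.
        """
        return [self.memory[-1]] * n


recorder = Recorder()
for bead in ["a", "b", "c"]:
    recorder.read(bead)
```

Every one of these methods reaches back past the adjacent bead, so each relies on storage and connectivity that the one-dimensional bead world does not provide.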

Consequently, if you happen to believe the human language system is capable of repeating a prior sequence of a certain length, or of repeating the immediately preceding form n times, or of inverting the order of a prior sequence, you are tacitly acknowledging the nonlinearity of language. I think that’s wonderful.

Chomsky doesn’t state these notions in terms of dimensional connectivity and information flow, as I do, but the facts are implicit in what he does say. Chomsky’s full paper, “Three Models for the Description of Language,” is available online as a PDF. Citation information is available on the IEEE Web site, where IEEE members can also get free full-text access.

Posted in Linguistics | 3 Comments »