Etymologikon™

Language, Linguistics, Logic, and Life . . . . . . . . . . . . . by Teresa Elms

    Copyright © 2008, Capitalist Dawg Enterprises™, running dog lackeys of capitalist imperialism since 1954. All rights reserved. See the Legal page for terms and conditions of use.

Lexical Distance Among the Languages of Europe

Posted by Teresa Elms on 4 March 2008

Lexical Distance Network Among the Major Languages of Europe

 

This chart shows the lexical distance — that is, the degree of overall vocabulary divergence — among the major languages of Europe.

The size of each circle represents the number of speakers for that language. Circles of the same color belong to the same language group. All the groups except for Finno-Ugric (in yellow) are in turn members of the Indo-European language family.

English is a member of the Germanic group (blue) within the Indo-European family. But thanks to 1066, William of Normandy, and all that, about 75% of the modern English vocabulary comes from French and Latin (i.e., the Romance languages, in orange) rather than Germanic sources. As a result, English (a Germanic language) and French (a Romance language) are actually closer to each other in lexical terms than Romanian (a Romance language) and French.

So why is English still considered a Germanic language? Two reasons. First, the most frequently used 80% of English words come from Germanic sources, not Latinate sources. Those famous Anglo-Saxon monosyllables live on! Second, the syntax of English, although much simplified from its Old English origins, remains recognizably Germanic. The Norman conquest added French vocabulary to the language, and through pidginization it arguably stripped out some Germanic grammar, but it did not ADD French grammar.

The original research data for the chart comes from K. Tyshchenko (1999), Metatheory of Linguistics. (Published in Russian.)


1,224 Responses to “Lexical Distance Among the Languages of Europe”

  1. [...] This is a damn cool map of distances between European languages as measured by the commonality of their vocabulary.  It also confirms my sense that German is a [...]

    • I understand this posting is from 2008 and I don’t expect any more recent comments…. but

      I want to develop a way to measure distance among and between languages and dialects, showing numerically that, for example, the dialect in Oslo is closer to the dialect in Tromsø than the dialect in Bergen. I would want to take into consideration not only lexical items, but also morphology, phonetics, and sub- and super-phonemic phenomena such as tones.

      It will be complex, I know, but any help or ideas are appreciated!

      Louis Janus
      janus005@umn.edu

      • Kjetil Rå Hauge said

        That is a tall order, but you might begin with this article: Peter Houtzagers, John Nerbonne & Jelena Prokić (2010) “Quantitative and Traditional Classifications of Bulgarian Dialects Compared”, Scando-Slavica, 56:2, 163-188 (the U. of Minnesota library ought to have it [but if need be, email me]), where “Levenshtein distances” between modern Bulgarian dialects within the Republic of Bulgaria are measured, mainly on criteria of diachronic phonetic/phonemic development, rather than on lexical correspondences (which may be due to borrowing and may be more or less masked by phonetic and semantic developments in the receiving language). Let me also quote a few more sources from the references of that article:

        Heeringa, W. 2004. Measuring Dialect Pronunciation Differences using Levenshtein Distance. Groningen: PhD thesis, University of Groningen. Available at http://irs.ub.rug.nl/ppn/258438452.

        Nerbonne, J. and W. J. Heeringa. 2010. “Measuring Dialect Differences”. In P. Auer and J.E. Schmidt (eds.), Language and Space. An International Handbook of Linguistic Variation. Vol. 1: Theories and Methods, Berlin/New York: de Gruyter/ Mouton, 550–567.

        Nerbonne, J. and W. Kretzschmar (eds.). 2006. Progress in Dialectometry. Special issue of Literary and Linguistic Computing 21(4).
        Prokić, J. and J. Nerbonne. 2009. “Recognizing Groups among Dialects”. International Journal of Humanities and Arts Computing, Special Issue on Language Variation, edited by John Nerbonne, Charlotte Gooskens, Sebastian Kurschner, and Renée van Bezooijen.

        Prokić, J., J. Nerbonne, V. Zhobov, P. Osenova, K. Simov, T. Zastrow, and E. Hinrichs. 2009. “The Computational Analysis of Bulgarian Dialect Pronunciation”. Serdica Journal of Computing 3 (3): 269–298.

        Good luck!
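
        For readers unfamiliar with the measure named in this comment: Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. What follows is only a minimal Python sketch under stated assumptions, not the method of the cited papers (which, per the description above, compare pronunciations rather than spellings). The function names `levenshtein`, `normalized_distance`, and `mean_distance` are hypothetical, and ordinary spellings stand in for phonetic transcriptions.

```python
from typing import Iterable, Tuple


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits that turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, ch_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (ch_a != ch_b)
            current.append(min(previous[j] + 1,      # delete ch_a
                               current[j - 1] + 1,   # insert ch_b
                               substitution))        # substitute or match
        previous = current
    return previous[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance divided by the longer length, so the result is in [0, 1]."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def mean_distance(pairs: Iterable[Tuple[str, str]]) -> float:
    """Average normalized distance over aligned word pairs (assumed scoring)."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one word pair")
    return sum(normalized_distance(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Czech/Polish spellings quoted elsewhere in this thread, as illustration only.
    sample = [("večer", "wieczór"), ("sníh", "śnieg")]
    for czech, polish in sample:
        print(czech, polish, levenshtein(czech, polish))
    print("mean normalized distance:", mean_distance(sample))
```

        Averaging a normalized per-word distance over a shared word list is one simple way to get a single number per dialect pair; a real study would need transcriptions, a principled word list, and probably weighted substitution costs.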

      • Hi, and thank you, Kjetil Rå Hauge. I will get some of the suggested resources and read them. I never would have hoped that anyone would answer my question… but I appreciate it.

    • DMQ said

      Looking at the closeness between English & French perhaps gives insight to your leanings towards being slightly Romantic? (D)

    • Colette said

      Despite the fact that English has a Germanic origin, 29% of modern English words come from Latin and the same percentage come from French. http://en.wikipedia.org/wiki/File:Origins_of_English_PieChart.svg

      • Well, coming from Latin and coming from French sounds like 60% Romance language?
        I favour the view of English as a hybrid language

      • 88costa said

        I agree with Colette. The chart is a blunt instrument. The language groups (e.g. Romance) should be in a circle, and the links between groups would make sense. As it is, French is shown as related to Greek, but Italian is not. There should be an average of the relationship of all sub-languages with another language (or language group) and use that. Otherwise, it should be language-by-language. Also, it shows Albanian as related to Slovak, when it means to show that Albanian is related to Slavic. In that case though, it should be shown as related to Bulgarian and Serbian, because it has weak statistical links to Slavic as a group.

      • Gary Rickard said

        @88costa: I believe the chart is showing Albanian as connected to Slovene, not Slovak.

      • @Manuel Herranz
        Lexically. But everywhere else in the language, it’s clearly not a hybrid. It just happens to be a very distant language.

        Besides, look at French’s pronunciation, which is extremely different from the other Romance languages. If anything, French is very distant, and doesn’t really represent being a Romance language.

    • Andreas Melichar said

      I would see a connection between German and western slavic languages, since they lived close to each other: Austrian Hungarian,…

      • Marius said

        Geographic neighborhood is irrelevant. E.g. Hungarians and Romanians have lived close together for more than 1000 years and have had some regional linguistic exchanges, but they are still totally unrelated.

  2. Very interesting. There could be more variation to the size of the circles: now a language with 100 million speakers has a circle that’s just slightly bigger than language with 5 million speakers. But, overall, very fascinating with interesting analysis. There certainly is a very strong connection between English and French.

  3. Matt said

    Very interesting! BTW, is there a key for the language abbreviations used in the graph? Most are self-explanatory, but there are a few that aren’t obvious (Rm, Pro, Sr).

    • Paul Held said

      (RM) Romanian, (PRO) Provencal, guessing from the proximity to Polish that SR is Sorbian.

      • astheart said

        Well, I would say you are wrong, sorry. I have no idea what the author meant by Sr, but there is no language like Sorbian; you can find Serbian (SRB), but it is far from Polish. Moreover, I am a language teacher who grew up and lives on the Czech-Polish border, so I can speak both languages very well. I can assure you that there is no language between them. Czech and Polish are very close, but Czech and Slovak are even closer; they are so close that Czechs and Slovaks don’t need any interpreters, and the can speak their mother languages during conversations without any problems. Still, they are different languages. I think something like Sorbian doesn’t exist, anyway, and Longman Dictionary doesn’t know such an expression. :)

      • astheart said

        Sorry, missed a letter: …. they can speak their mother ….

      • carson said

        The Sorbs (sp?) live on the border between Germany and Poland, or Germany and the Czech Republic. They do have their own language. http://en.wikipedia.org/wiki/Sorbian_languages

      • pjt said

        Astheart, what do you mean ” there is no language like Sorbian”?

        See http://en.wikipedia.org/wiki/Sorbian_languages and think why ISO 639-2 has code wen.

        Or do you actually say this in the meaning of the English phrase “there is no language like Sorbian”, i.e. Sorbian is the best of languages?

      • Karel said

        Astheart, Sorbian is what we call Lužická srbština in Czech, and it is correctly placed in the diagram. It is a language (actually two slightly different languages) that has nearly died out but is still spoken by a few thousand people in Saxony around Bautzen/Budyšin. It is a nice language that Czechs and Poles also understand well, and it is not much further from Czech than Slovak. We are used to understanding Slovak and often understand Polish, yet most Czechs have probably never heard Sorbian. I have only read a few texts in it, and I understand most of the words.

      • astheart said

        If you had read all comments of mine, you would know I have found out what Sorbian is. Still, I don’t agree it is placed correctly as I heard it and was lost, :). I usually don’t have any troubles with languages that are close to Czech. I keep my opinion that Polish is much closer since I can speak Polish and I learnt it just by listening to it very often during my childhood ( I grew up in Silesia.), :) .

    • ddh74 said

      “Rm” probably is Romansh, which is spoken in Switzerland; “PRO,” Provençal; and “Sr” is probably Sorbian.

    • Barna said

      Rhaeto-Romance (Rm) in Switzerland, Provençal (Pro) in southern France, Sorbian in south-eastern Germany.

    • Gerard M said

      GerardM
      Matt, I guess Rm means Latin (Roman), PRO would mean Provençal, I don’t know about SRD but it should be something close to Italian
      CAT should be Catalan
      GLC should be Galician.

    • astheart said

      @Pjt : Sorry I confused you. I had never heard about Sorbian before I wrote that comment, and couldn’t find any information. I mean I didn’t know that expression. :) Then I found some, and I wrote it in my later comments. Now I know what is meant by Sorbian; in my language it is lužická srbština. Still, I keep my opinion that language (if it really is a language, not only a dialect – I am not sure) cannot be put between Czech and Polish as it is totally wrong. I am Czech and I live on Czech-Polish border. I have no troubles with Polish, but when I hear Sorbian/lužická srbština, I have terrible troubles with understanding. Also, Silesian is not a language, it is just a dialect (I can speak it as well.); it changes from a place to place, has not any written code, and no other signs of language. Somebody wrote here that Silesian is Germanic… Well, it cannot be at all (and frankly, it makes me smile). Silesian is a dialect with expressions taken from Czech and Polish, and its “grammar” (It cannot be called grammar in fact) is the mixture of those languages, not similar a bit to German. Though, there are some expressions of German origin, but it’s because of the existence of former Austro-Hungarian Empire. Silesian hasn’t got its written form as well. :)

      • mirime said

        As I said before, Sorbian belongs to the West Slavic language group, and historically the land of Lužice was part of the Czech kingdom till 1814. In some parts of Germany Sorbian is an official language, and it is taught in schools.

      • david said

        @astheart:

        Your argument is wrong, I believe.

        The reason you understand Czech and Polish well is because you are familiar with those languages as they are both spoken in Český Těšín, or Třinec, or wherever you live. Your argument would be valid if you spoke Czech, had no prior exposure to Polish whatsoever and then encountered both Polish and Sorbian and could understand Polish better (also, this talks about vocabulary only and not about grammar).

        I had no idea what Sorbian is either…

      • Katinka said

        “Somebody wrote here that Silesian is Germanic… Well, it cannot be at all (and frankly, it makes me smile)”

        The term “Silesian” refers to two different language varieties:

        1) a dialect of the German language that used to be spoken in Silesia. Due to the expulsion of almost the entire German population of Silesia after the Second World War and their subsequent resettlement in various parts of today’s Germany, this dialect is almost extinct. It’s only spoken by very few and very old people.

        2.) a variety of the Polish language with some German influence, which is still spoken in the Silesian region of Poland today.

        “because of the existence of former Austro-Hungarian Empire”

        - Silesia belonged to the Austrian-Hungarian Empire only until the mid-18th century. Then it became a part of Prussia and, after the foundation of a united Germany in 1871, was a part of Germany until 1945 (except for some parts of Upper Silesia, which were granted to the newly established Polish State after WW1). Lower Silesia used to be almost exclusively German-speaking; Upper Silesia had a mixed population of German and Polish speakers. After the Second World War, the entire territory was given to Poland and, after the expulsion of the Germans, resettled with Poles, many of whom had themselves been expelled from Eastern Poland by the Soviets.

      • astheart said

        David, sorry, but I don’t think I am wrong. My view is not based just on my place of living. I am an educated linguist, and I am sure I know what I am talking about. Your opinion is different; okay then, but it doesn’t mean you’re right. I can speak several languages on a decent level, and because of my linguistic education I can see relations among them deeper than somebody who just grew up in Silesia. (It is neither Třinec, nor Český Těšín, :) )

      • astheart said

        Mirime, from the Middle Ages the ownership of Lužice changed many times, and there were some periods when it was ruled by the Czech King. (Anyway, there were times when the Czech Kingdom was much bigger than Czechia is now, and also some Czech Kings were Holy Roman Emperors.) But, in 1632 it was given to the Czech Kingdom as a Saxon pledge, and in fact, it was only a formal act. Czech Kings respected it because of the Catholic religion of its inhabitants, and that is also the reason, why Sorbians weren’t assimilated and their language survived. This is the only thing they have common with Czechs; no common history, no common culture.

      • mirime said

        As one educated linguist to another, I only want to say that if you compare basic vocabulary you can see that Sorbian is in the same language group as, and is related to, Czech, Polish and Slovak. That was my basic point. And if you look at historical or linguistic maps you can see that the land inhabited by Sorbians wasn’t insignificant in the past.
        You can’t say what you said on the grounds of your own incomprehension of the language. It isn’t a scientific approach.

      • mirime said

        Only some interesting details: http://vlast.cz/luzice/ :-)

      • mirime said

        And one last thing: do you really think that almost 260 years of common history didn’t influence both nations? Maybe you can explain to me why Sorbians tried to join our country every time a real possibility appeared (after WWI or WWII) if we didn’t have something in common.

        To sum up, from the point of view of history and linguistics, Czechs and Sorbians are much closer to each other than you assumed.

    • Soft Accent said

      Rm is most likely Romantsch (or Rheto-Roman), while Romanian is abbreviated as ROM.

  4. Vato said

    Would be nice to see how distanced Georgian, Armenian and Turkish languages are from all the rest of these European languages.

    • Observer said

      Armenian is at least an Indo-European language, but is missing on the chart.

      • Aleks said

        Sorry but is no Turkish language, Pseudo-Turkish language is a mix of Mongolian root, mixed with : Arabic, Persian, Assyrian,Greek and Albanian.

      • Martin said

        It has no European roots at all; it comes from the Caucasus region, so it is influenced by the Arabic languages to the south …

      • Karl said

        Actually, Armenian is not much influenced by the arabic languages. It’s an indo-european language whose roots are in the Persian Highlands, like the other indo-european languages, which spread from Persia to Europe since about 2000-1500 BC, nearly totally wiping out the ancient european languages with the exception of the Basque language. Persian itself is also an indo-european language. Armenian is thus much closer to English than to Arabic.

  5. Nice! It would be great, though, to have a legend that explains what the names of the different languages are. I can’t, for the life of me, guess a language that’s somewhere between Spanish and French that could be called “Pro”. Or the Srd…

  6. Oooooh, yeah, just figured it out! Provençal and Sardinian, isn’t it? Still, it would have been better if I could have found it in the caption… Anyway, very cool graph!

  7. Andris said

    Reblogged this on Tā dzīvojam. Ikšķilē. Latvijā..

  8. […] Sursa: elms.wordpress.com […]

  9. DanielR said

    Romanian has more loanwords from Hungarian and Turkish than from Albanian, but the graph doesn’t show that

    • Barna said

      because of the Thracian – Illyrian relationship! :P not Dacian – Roman!!! ;)

    • Gabriela said

      It actually has very few words from Hungarian. Of course, it has a great deal from Turkish, Slavonic and French. But we share a lion’s share of our ACTIVE vocabulary with Albanians through Thracians –> Dacians, Thracians – Illyrians. There is still a lot to be discovered. Future genetics research will reveal it.

  10. kalot said

    And the distance between Romanian and Hungarian?

    • Barna said

      It is correct! I don’t know of any Romanian words in Hungarian, except maybe oláh ← vlach (an earlier name for the Romanian nation).

    • Daniel said

      Just about accurate in the image above. There are some loanwords but they aren’t even close otherwise.

  11. Tineaux said

    Basque?

  12. Omar said

    Reblogged this on Blogging to discover and commented:
    Lexical Distance Among the Languages of Europe

  13. Kasia said

    What country is this between Polish and Czech?

    • alexander said

      No Country but language : Sorbian

    • It probably refers to the Sorbian languages, spoken in the Lusatia region of eastern Germany.

    • Marek said

      I can´t find out either. Śląsk – the only possibility. But is there any specific language? Dialect, I suppose. However, it can not be closer than Czech and Slovak.

      • astheart said

        Agreed. I don’t think some language could be between Czech and Polish. I have just found out what Sorbian is, which I didn’t know when I was writing my previous comment. But, I am persuaded Sorbian is spoken by a very small group of people, and it is nearly dead. Anyway, then Sorbian is close to Serbian, but not between Czech and Polish, no way. Moreover, being Czech I can say I don’t understand Sorbian at all, but I understand Polish without any problems. :)

      • Karel said

        Astheart, Upper Lusatian Sorbian is closer to Czech, while Lower Lusatian Sorbian is closer to Polish. Some words in ULS are closer to Czech than some Slovak words, some not, but you would understand perfectly if you heard the language as often as Slovak.

    • Marek said

      Lusatia (the Sorbian language; in German, Sorbisch)

    • Maybe Sorbian, the language of a minority in Germany close to the Polish and Czech borders.

      • astheart said

        Not close to Czech borders. It could be the language of Sorbs, a little group of nearly assimilated minority in Germany close to Polish border.

      • mirime said

        But the land of Lužice, where people spoke Sorbian, was part of the Czech kingdom till 1814. And Sorbian belongs to the West Slavic language group, like Czech, Slovak and Polish. And if you compare words you can see similarities like večer (Czech) - wječor/wjacor (Sorbian) - wieczór (Polish) - večer (Slovak), or sníh - sněg/sněh - śnieg - sneh…

    • ddh74 said

      Not a country–a language, Sorbian.

    • Barna said

      No country, just a little nation, perhaps Sorbs (Wends)… ?????

    • Fernando said

      It’s not countries but languages. Silesian is the answer.

      • astheart said

        Silesian is a dialect spoken in the region I live in. It cannot be called a language, and it is not Sorbian for sure.

  14. AKMA said

    I wonder where Basque fits into the picture….

    • Karl said

      It doesn’t. Basque doesn’t fit any picture. It’s a language on its own, not related to any other living language.

    • Gabriela said

      Basque should be separate since it is the language of Iberians, at the time in Europe there was Latin as well, but we do not see it either. Basque should be there, Latin in exchange vanished.

  15. Ada said

    I like the idea of building such a map a lot. Two questions: 1) What exactly is “vocabulary divergence”? 2) If I understood “vocabulary divergence” correctly, then there should be more links between Greek and other languages. What about Romanian and the Slavic languages? As far as I know, at least 30% of Romanian words are Slavic. Thanks!

  16. Basque said

    When you say “Among the Languages of Europe” you mean “Among the Languages of Europe”, isn’t it? I can’t see the basque language in the chart…

    • Basque said

      Sorry, I correct:

      When you say “Among the Languages of Europe” you mean “Among SOME OF the Languages of Europe”

      • Pilar said

        Pay attention to the text: she said “among the MAJOR languages of Europe” [in number of speakers, I suppose]. Furthermore, Basque does not belong to the Indo-European language family (like the Finno-Ugric group, which is represented in the diagram by three different languages).

      • infallible and modest said

        Well, it says “major” languages, and Basque certainly is majorer than most of the Celtic languages that are listed, for example.

      • Bittor said

        Basque has around a million speakers!

  17. Rick said

    Fascinating chart, very nicely done. Question: why did you name the Latin language group ‘Romance’? I mean, if you’re referring to the Roman roots of these languages, the “ce”-affix still seems superfluous and unintentionally indicative of a style-period in art history.

    • Daniel said

      Because that’s what it’s called: http://en.wikipedia.org/wiki/Romance_languages

    • Jakob said

      This is the conventional way of referring to the descendants of Latin. See http://en.wikipedia.org/wiki/Romance_languages.

    • Séa said

      Because that is the correct technical term.
      I was once looking at an email from a French colleague and it said “received at 11.15 Romance time”. I had never encountered the term with that usage before. A quick scan of Wikipedia shows multiple entries….
      Speaking of wikipedia:
      “The term “Romance” comes from the Vulgar Latin adverb romanice, derived from Romanicus: for instance, in the expression romanice loqui, “to speak in Roman” (that is, the Latin vernacular), contrasted with latine loqui, “to speak in Latin” (Medieval Latin, the conservative version of the language used in writing and formal contexts or as a lingua franca), and with barbarice loqui, “to speak in Barbarian” (the non-Latin languages of the peoples living outside the Roman Empire). From this adverb the noun romance originated, which applied initially to anything written romanice, or “in the Roman vernacular”.

      The word romance with the modern sense of romance novel or love affair has the same origin. In the medieval literature of Western Europe, serious writing was usually in Latin, while popular tales, often focusing on love, were composed in the vernacular and came to be called “romances”.”

      tada

    • Tom said

      That’s the standard broadly-accepted, traditional name for the Latin-origin language group. This sense of the term has the same origin, and is older than, the other sense to which you refer. While Wikipedia is never an authoritative source, their writeup on the topic isn’t bad: http://en.wikipedia.org/wiki/Romance_languages#History and of course following their footnotes is a good idea.

    • Fernando said

      It is the scientific name for that language group.

  18. Reblogged this on systems perestroika – éminence grise.

  19. Where are Corsican and Basque?

  20. Where is the Basque language?

  21. Mykola said

    I met Kostyantyn Tyshchenko in 2004, and then again in 2007. It was very interesting to attend his lecture. I also love his humor. A real scientist.

  22. ….and where is Malta’s native language : Maltese :) ?

  23. I wish there was a key showing what the abbreviations stand for. Most of them I can guess, but, for instance, I do not know what Germanic language is “Fri” or “Bok”…am I ignorant?

  24. Jesper said

    Yes very cool map! Two things I don’t understand:

    1. Shouldn’t there be lines between all languages, as each pair all must over 71 in common. / what does no line mean?

    2. Shouldn’t there be lines between all in each group? Eg german and swedish? And what about German and French? Why no line?

  25. What about the Basques?

  26. Andrew Constable said

    I can’t find Basque…

  27. selikal said

    what does FRI mean?

  28. Reblogged this on My Twirly Blog and commented:
    Language twirls and clusters. Constellations of words like stars.

  29. […] and quantum physics; ‘Spooky action’ builds a wormhole between entangled particles; Lexical differences among the languages of Europe. And, if you’re interested in learning new things in 2014, Buzzfeed have put together a handy […]

  30. […] via Hacker News http://elms.wordpress.com/2008/03/04/lexical-distance-among-languages-of-europe/ […]

  31. tr said

    where is hungary , węgry?

  32. Nicola Tronci said

    Which language is “SRD” ? The one close to Italian in this chart

  33. Steph said

    I can’t see the Basque language in the diagram…

    • Ash said

      Basque is not related to these Indo-European languages, and that’s why.

      http://en.wikipedia.org/wiki/Basque_language

      • That’s an unsatisfactory response because other non-Indo-European languages are listed. The reason might be that there are almost no loan words exchanged between Basque and its neighboring Indo-European languages. The only loan word I know of that the Spanish imported into Castellano from Basque is the word “izquierda” meaning “left.” The source is the Basque “izkerdu.” The Spanish were terrified of the word “siniestra”, which derived from Latin “sinister,” and the Basque word did not include the semantic feature of “evil, unholy, satanic” that “siniestra” held.

  34. Jerome said

    Slavs are not human beings.

  35. Marcin said

    I can’t find Maltese here (maybe I am dumb), can you help?

  36. […] you to Teresa Elms at Etymologikon for putting this fascinating information together, which is based on original research data from K. […]

  37. Antonio said

    What does it mean exactly “vocabulary divergence”? I mean: two languages sharing the same words but with different meanings all of them, would they still be close to each other?

    Cheers

  38. Geir Atle Ekaas said

    Very cool map!
    But not everything is right… One error is that Norwegian is shown with too few speakers. The Norwegian language (No, not NN) should be marked as a language spoken by more than 3 million people.

    • Geir Atle Ekaas said

      …I got it now: New Norwegian is marked as NN and Bokmål as Bok :-)) Then I guess the dots are the right size for both of them.

    • Jostein Greibrokk said

      Some more details about Norway: In Norway there are two official languages: the majority one, bokmål (“Book language”, BOK), derives from Danish, and the minority one, nynorsk (“New Norwegian”, NN), was created in the mid-19th century and is based upon the dialects and the Old Norse language.

      • Trude said

        Sorry, but that is wrong. Norway has two official languages, Norwegian and Sami. Norwegian is split into two forms (målformer): Bokmål and Nynorsk.

        I miss the sami languages on the map, they are highly relevant in Scandinavia.

  39. Nesib said

    Bosnian language is missing between Croatian and Serbian

    • Karl said

      Until a few years ago, there wasn’t even a Croatian and a Serbian language; it was a common language called Serbo-Croatian. Language politics in Serbia and in Croatia tried to make a difference where there was no difference before.

  40. Of course! said

    So, apparently Turkish is not an European language?!

  41. Where is Latin? The positioning of certain languages is misleading… Real life is 3-dimensional…

  42. xme said

    and basque?

  43. Missing Basque

  44. […] distance between languages, and language families. I’m definitely going to pull out this infographic on the Lexical Distance Among the Languages of Europe the next time I have to explain […]

  45. Alex said

    Yes, the original research data for the chart comes from K. Tyshchenko (1999), Metatheory of Linguistics, but this book was published in _UKRAINIAN_.

  46. I’m missing Basque

  47. joe said

    Ignoring Turkish and the Turkic languages was a fatal mistake, as their speakers have been dwellers of Europe for over 800 years and currently over 6 million Turks live in Europe.

  48. And Basque (Euskera) must be on another planet, right?

    • corcharelli said

      Big mistake. Euskera must be in the middle, without connections to other languages, because nobody knows its origin.

  49. what “srd” stands for?

  50. Phaedra Royle said

    What about Basque (a non-European language of Europe)?

  51. Piotrek said

    Sr – circle refers to Silesian?

  52. Jonathon said

    Where’s Basque?

    • Quarion said

      I think because they can’t really place it.

      http://en.wikipedia.org/wiki/Basque_language

    • Ay S. said

      It’s likely left off because it’s not Indo-European; it belongs to none of the language families shown. In fact, it belongs to no language family at all: it’s an “isolate” language. That said, there’s a great deal of lexical sharing between Basque and Spanish, and Basque and French.

    • Marc said

      That’s what I asked myself, too!

    • Monkey said

      Somewhere out of the map, playing the language isolate game.

    • Carla said

      very far away from everything we know :)

    • beca said

      Did you imagine Basque is the only regional language missing?

      For the Romance languages alone, where are Piedmontese (1 million speakers), Lombard (2 million speakers), Ligurian (500,000 speakers), Corsican, Aragonese, Leonese, Neapolitan, Sicilian, Emiliano-Romagnolo, Gascon, Ladin, Galician, Fala, Mirandese, and many others? Not to mention all the dialects of these languages (which in most parts of Italy vary considerably from town to town).

      Unfortunately, if a language is not “official” it may as well be given a death sentence. God knows how many of these languages will be around in 100 years.

    • Laura said

      Bai! (means “yes” in Basque) How would Basque connect with these other languages?

    • Juan said

      exactly.

    • fccoelho said

      Basque is probably disconnected from this graph. see: http://en.wikipedia.org/wiki/Basque_language

    • Cédric said

      Basque is not Indo-European, man, it’s an isolate.

    • Cédric said

      And… Maybe it’s not considered to be a “major language of europe” ;)

    • I was wondering that myself…

    • Michael said

      Basque isn’t part of the Indo-European family.

    • bartiddu said

      It should be in a circle that’s not connected to any of the others at all!

    • Sean said

      On its own little linguistic island, driving everyone around it mad!

    • IMO the chart displays mostly Indo-European languages, except for some Finno-Ugric ones (Fi, Hu, Ee) which have influenced some Indo-European languages. Basque is not IE.

    • Brad said

      Basque is a language isolate, so it wouldn’t be included in an Indo-European language chart.

    • Mak said

      Basque is not an Indo-European language; Basque is an isolated language which has no real connections to other languages.

    • Hannah Gold said

      Basque is not Indo European as far as anyone can tell. It doesn’t fit in with any known family, eg, Semitic (Arabic, Hebrew, various N African languages) or Turkic (Turkish, Central Asian languages).

    • Good question. Basque is not an Indo-European language, but neither are Hungarian and Estonian, and yet they are shown on the map.

    • bob said

      This is either a very good question or a subtle ironic joke. Nice work!

    • Jules said

      Most theories about its origin seem unable to link it with other existing languages spoken in Europe. I suspect that’s the reason they didn’t include it.

      • Jeff S said

        Best theory I’ve ever encountered regarding the origins of Basque: it was the only language spoken in much of western Europe until Indo-European speakers, emigrating (or fleeing) from eastern lands in the area of Boğazköy (in present-day Turkey), overran Basque-speaking territories and pursued the peaceful Basque speakers, pushing them out of all their communities. Fleeing toward the Pyrenees, the Basques found refuge in the shelter of that rocky and inhospitable environment. Below them, the speakers of Indo-European dialects, which eventually developed into distinct language families, as well as speakers of proto-Altaic and proto-Finno-Ugric, eventually carved out areas of settlement. Thus, the Basques were linguistically, geographically and socially isolated from the rest of the European world, surviving as a closed and remotely located culture into modern times.

    • giovanni cacioppo said

      in your ass

    • Peter Rutenberg said

      Basque is unrelated to anything except maybe distantly to Etruscan.

    • Karlos said

      Basque is not in that chart just because it is unrelated to any of the languages in it. Basque is older than any other language in Europe, and no study has yet been able to confirm its origin beyond any doubt.

    • Basque has no relation to any other language. Which is significant.

    • Admin said

      should be a medium sized blob, probably hidden by the key below Spanish :-)

    • Pedro Paulo Krahenbuhl. said

      Basque and Lappish are not Indo-European languages.

    • John said

      Agreed. Basque is clearly a European language. Hungarian, Estonian, and Finnish originated in Asia, so they’re less European than Basque.

    • Erwan said

      Basque is not an Indo-European language. The fact that this language is still alive is amazing.

      http://en.wikipedia.org/wiki/Basque_language

    • Katie said

      Basque is a language isolate. Its “mother” language has died out, leaving it alone. It is considered “Celto-Iberic”.

      • Xaime M said

        Sorry, but that is not true. Basque is not related in any way to the Celtic languages or to the ancient Celtiberian language.

    • Isissibus said

      Basque doesn’t belong to any linguistic family, and it doesn’t have any relationship with European languages. It is a “mysterious” language.

      • Aleks said

        So is Albanian. Nobody knows anything about the Albanian language; many think that it is the very mother of all languages in Europe. I’m not Albanian, by the way.

    • David said

      Basque is an isolate, with no known cognates.

    • Basque isn’t related to the others at all. It’s a language isolate.

    • Basque is unrelated to any of these, which are all Indo-European languages, and thus I suppose could not be “placed”: it would be equally distant from all of them, I imagine.

    • Altay said

      You could count hundreds of languages that this so-called scholar has forgotten.

    • Edurne Alegria Aierdi said

      Because the origin of the Basque language, or Euskara, is still unknown. It does not belong to any known language family.

    • Kim said

      That’s a good question, and I’m pretty sure the researchers themselves don’t know either. As a matter of fact, Basque is a language isolate. So whereas it should have been put on this map, because it IS a European language, where to put it is another question.

    • If you can’t find Basque on the chart, that just means what is obvious – Basque has virtually nothing in common with other European languages (it’s a language isolate).

    • Gerardo Romero said

      Basque is a language isolate, apparently with no genetic relationship to any language in Europe.

    • el jota said

      yes, where is it?

    • I just wondered about the Basque Country. Euskera is supposedly not linked to any other language.

    • beco said

      I don’t see it, but it should be a completely isolated island somewhere in the graphic, since the Basque people arrived in Europe (the Iberian Peninsula) long before most other cultures (NatGeo: http://newswatch.nationalgeographic.com/2012/03/06/basque-origins-predate-arrival-of-farmers-in-iberian-peninsula-dna-analysis-finds/) and some grammatical similarities have been found between the Basque language and a few tiny Asian dialects.

    • 10 cm left from your screen ;-)

    • Benjamin Conway said

      Basque is a language isolate, so (I believe) it shares no connection with any other living language.

    • Floating around somewhere in space, totally unrelated to anything ever.

    • Basque is not an Indo-European language. Hindi shares more with these languages than Basque. Interesting isn’t it.

    • sonja said

      Northern Spain.

    • Doniskicheck said

      Basque is not a language… it’s just a spanish dialect

      • Wendy said

        No it’s not.

      • Jaume Saumell said

        With all due respect, you don’t know anything about Basque and Spanish, do you? In simple words, a language is a dialect of another language when a native speaker of one and a native speaker of the other are able to understand each other. Not everything is the same, but there are enough similarities to allow fluid communication. Basque is completely dissimilar to ANY OTHER LANGUAGE IN THE WORLD, so it’s absurd to say it’s a dialect of any other language.

        On the other hand, Spain is a country with FOUR different official languages (that is Spanish, Catalan, Galician and Basque). And I said DIFFERENT LANGUAGES, not one language (Spanish) with three dialects. And I should know what I’m talking about since I speak three of them and know a lot about the fourth.

        So please, next time at least read the Wikipedia before saying such nonsense.

    • Frank said

      Basque isn’t connected to any of the other languages, so it drifted away off the chart.

    • Silvia said

      Basque doesn’t belong to the Indo-European language family.

    • Rhina said

      Basque is one of the Finno-Ugric languages, unrelated to Spanish, despite the location of its speakers, except for current borrowings from it.

    • Alexandros said

      Basque is a pre-indoeuropean language and has nothing to do with the rest. But so is Finno-Ugric…

    • AMAIUR said

      As far as I know (and I speak as a Basque speaker), Basque is a non-Indo-European language, as Finnish and Hungarian are. But if the article is about the “Lexical Distance Among the Languages of Europe”, I feel Basque should appear too (more than 50% of the Basque lexicon comes directly from Latin, according to some research). If the article were about the relations among the Indo-European languages, we could dispense with Basque, but not when we want to speak about the lexical relations between the languages of Europe (nothing is said about Indo-European there). For the rest, I like the idea of the graphic very much.

      • Jeff S said

        The presence of loan words in a language is no demonstration of genetic relationship at all. English has borrowed lexical items from a wide variety of languages, many of which aren’t even Indo-European. Basque, as I have stated in another entry, is a language isolate which a number of researchers now think was spoken by a population that had originally lived well spread out within Europe and was millennia ago pushed further westward and, ultimately, into Spain by Indo-European speakers, and up into the Pyrenees where no one else wanted to settle. This theory was published in the Science section of the New York Times a few years ago. It is a conjecture, but no one has come up with a better one.

      • formiga said

        Jeff S said
        15 January 2014 at 10:45 pm

        The presence of loan words in a language is no demonstration of genetic relationship at all. English has borrowed lexical items from a wide variety of languages, many of which aren’t even Indo-European. Basque, as I have stated in another entry, is a language isolate which a number of researchers now think was spoken by a population that had originally lived well spread out within Europe and was millennia ago pushed further westward and, ultimately, into Spain by Indo-European speakers, and up into the Pyrenees where no one else wanted to settle. This theory was published in the Science section of the New York Times a few years ago. It is a conjecture, but no one has come up with a better one.

        Except for the fact that Basque is spoken around the Basque Mountains and not around the Pyrenees, where Catalan is spoken…

      • Jeff S said

        Formiga is not quite right. There is no such entity as the Basque mountains. The Pyrenees separate Spain and France. Catalonia (also spelled Cataluña) occupies the extreme northeast of Spain, so it does touch upon the Eastern Pyrenees. But San Sebastian, the capital of the Basque region, touches the base of the western Pyrenees, and a large percentage of the Basque people live within that mountain range, some on the Spanish side, some on the French. Read the following Wikipedia excerpt:

        Basque (endonym: Euskara, IPA: [eus̺ˈkaɾa]) is the ancestral language of the Basque people, who inhabit the Basque Country, a region spanning an area in northeastern Spain and southwestern France. It is spoken by 27% of Basques in all territories (714,136 out of 2,648,998).[1] Of these, 663,035 live in the Spanish part of the Basque Country and the remaining 51,100 live in the French part.[1] Basque is considered to be a language isolate.[2]

        In academic discussions of the distribution of Basque in Spain and France, it is customary to refer to three ancient provinces in France and four Spanish provinces. Native speakers are concentrated in a contiguous area including parts of the Spanish autonomous communities of the Basque Country and Navarre and in the western half of the French département of Pyrénées-Atlantiques. The Basque Autonomous Community is an administrative entity within the binational ethnographic Basque Country incorporating the traditional Spanish provinces of Biscay, Gipuzkoa, and Álava, which retain their existence as politico-administrative divisions.

      • formiga said

        Jeff S: of course there are Basque Mountains; I’ve seen them myself: http://en.wikipedia.org/wiki/Basque_Mountains

      • Jeff S said

        Thank you for your comment. Note that the very article you sent me said: “some consider that the Cantabrian Mountains and the Pyrenees are a single greater range and the Basque Mountains are just part of both [1]”. So I suppose both of us have some justice in our conclusions. The major point of what I’ve written regarding the Basque language (Euskara) is the hypothesis that its speakers once inhabited a much larger area of Europe and that millennia ago they were pushed further and further west by incoming peoples of Indo-European and Finno-Ugric linguistic stocks, and that the Basque speakers could not resist them and retreated further and further west, into the highlands of Southern France and Northern Spain, where the terrain was so unfriendly that their pursuers had no further interest in chasing them. Thus, they became the indigenous population of those highland areas. It’s an interesting conjecture and would explain why Basque is a language isolate.

  53. That’s right, everyone. The most utilitarian Romance language to learn is Italian.

    Take *that*, Spanish!

    (Now, how do I get my lawn guy to stop mowing down my hedges?)

    • Russel said

      I don’t completely agree with this chart. I think that it is missing several links, such as Portuguese and Italian, Portuguese and Romance, and others.

      Also, the size of the balloons showing the number of speakers is misleading. They needed to create more categories (such as >100 million, >300 million, >500 million). Looking at the chart, you would think that Polish, Ukrainian, German, Italian and Portuguese are on the same level as English, French and Spanish.

      Although I am a Portuguese speaker, I still strongly believe that if you learn Spanish, you will be able to communicate with waaaay more people around the globe than if you learn Italian.

    • carlosarepa said

      Actually not. I think it would be Romanian, since it preserves more original Latin features than any other Romance language. So with Romanian, it would be easier to understand other Romance languages.

      • Anna said

        just learn Latin if you want to understand Romance languages. plus, because it’s a dead language, it should be easier to learn

    • Ricardo said

      Looking at that chart, where all bubbles with over 30MM speakers are the same size, you would think so. However, there are about 60MM native speakers of Italian, 200MM of Portuguese and 390MM speakers of Spanish. So the Por-Spa duo has 10 times more speakers than Italian…

      Cheers.

    • Someone said

      According to this graph, Spanish is closer to more Romance languages than Italian is (even though Italian is closer to classical Latin). Also, the number of native Spanish speakers is nearly 390 million, while only 60 million people speak Italian as a native language. I don’t see why it would be as you said.

    • Aidan Vey said

      It really depends what you mean by the most utilitarian. For learning other Romance languages, yes, for speaking to many people around the world or learning Germanic languages, no.

    • David said

      Only if you don’t count the millions of South Americans, who also speak Spanish.

    • Rafa said

      It depends on how you define “utilitarian”: either you want to learn more languages or you want to communicate with the greater amount of people in their first language.

      From the image one can infer that if your mother tongue is not in the group of Romance languages, your best bet is to learn French, as it acts as an “entry point” to the group of romance languages. If you already speak a romance language and want to learn another language, then yes, learning Italian seems to be the best option (on average).

      That situation only holds if your goal is to LEARN another language (or many of them). If, on the other hand, your goal is to communicate with a wider audience, you’ll probably want to learn a language that is widely used, in which case Spanish is the best option of its group, followed by French, then Italian.

      On the other hand, Spanish has a lot of loanwords from Arabic, which actually introduces you to a different set of languages. For instance, Spanish “algodón” corresponds to Italian “cotone”, or “cotton” in English. Dissecting “algodón” you get “al-godón”, which is somehow a variation of “il cotone”.

      As for your lawn guy, I prefer not to give any opinion)

    • Roko Ono said

      Spanish is the second most spoken language in the world. Take that, English! Italian might be useful to make a fool of yourself when going to your local pizza joint. Maybe you should learn to mow down your hedges yourself.

    • Sans Cosm said

      Ha ha ha…. Not. I would be amazed if you speak other language than english. Dumbass!!

    • Javier said

      Not sure proximity always means ease of learning another language.
      Among Latin languages, Portuguese speakers are said to be the ones who can learn other Latin languages the most easily (Spanish and Italian at least).

      • Sorin Pop said

        I think Romanian competes very well with this monopoly of Portuguese. Actually, I think a Romanian can learn Italian much more easily than a Portuguese speaker can. By the way, Romanian and Portuguese seem to sound the most similar to each other of all the Romance languages, an interesting coincidence.

    • Actually, a Portuguese speaker understands Italian, Spanish, Catalan, Galician and French much more easily than an Italian does.
      So learn PT and you’ll speak all the others in a few months.
      :p

    • Girl said

      OBVIOUSLY. it’s the direct derivation of latin.

    • bordari said

      Or Catalan.

    • yo mismo said

      F**k off and die

      PS: it is a joke jejejeje

      PS2: no!!!! xD

  54. Is there a legend for this? I’m having trouble with BOK and SRD.

  55. Very nice, thanks!

  56. SRB = Serbian? BOK = something Northern Germany?

  57. Tropylium said

    What’s the metric of distance used here? I’m wondering where the Baltic-Hungarian lines come from. Those two groups aren’t even in contact (nor have they been in the past), it’s about as unexpected as seeing, I dunno, Danish-Albanian would be.

    Come to think of it, Dutch-Greek looks kinda out of place as well.

    • Oscar said

      Not that strange, we do have (adapted) Greek words in the Dutch language.

      Another remarkable fact is that although Dutch is pretty close to German, there are not that many Germans that understand Dutch but a lot of Dutch people understand (or speak) German.

  58. Kurt said

    BOK is Bokmål (Norwegian) and SRD is Sardinian. Basque is a language spoken in Europe, but it is not part of any of the European language families; it is an isolate.

  60. Julia said

    No Occitan? I figured it would be joined with Catalan.

  61. matt said

    BOK = bokmaal = Norwegian
    NN = nynorsk = Norwegian
    both are recognized official languages in Norway.

  62. matt said

    SRD = Sardinian

  63. Ay S. said

    Are you sure that the grammatical simplification from old Germanic is really “pidginization”?

  64. dm said

    I knew it.

    Hungarians really ARE aliens!

  65. Julius said

    @Laura
    I guess it’s Bokmål (Norwegian) and the Sardinian language.

  66. Jean-Marc said

    The Basque language is missing. It’s spoken by more than 1 million people, including in Spanish provinces, and isn’t Indo-European. It was probably spoken in the south of France and northern Spain before all of these.

  67. Julius said

    By the way: a version with ISO 639-3 codes would be nice.

  68. Zsolt said

    Sr – could be Sorbian?
    Bok – that’s Bokmål, the traditional version of Norwegian (as compared to Nynorsk – NN).
    Srd – could that be Sardinian?

  69. Renton said

    SRD is Sardinian

  71. Carl said

    Seem to be missing Turkish on there. It is a fairly major language in Europe.

  72. BOK stands for Bokmål. Norway has two related official written languages: Bokmål (translated “book tongue”) and Nynorsk (translated “new Norwegian”).

    SRD stands for Sardinian, the language spoken in Sardinia.

  73. mr sandman said

    Luxembourg is missing :( stupid graph

  74. Gabor said

    Why is Hungarian closer to Ukrainian than to Serbian or Slovak? I am pretty sure there is some slight Turkish connection. Btw, where is Turkish? Similarly, why is Albanian connected with the furthest South Slavic country, Slovenia, instead of the closest, Serbian or Macedonian? It is almost certain that in these cases it is the common subset of those groups that had the influence, rather than a specific member.

    • Aggelos said

      There is no Macedonian language. This country that says they are Macedonians is mostly Serbians, Bulgarians, Albanians and Greeks. Their language is Slavic and they didn’t even live in this area when the real Macedonian race (part of the Greek nation) conquered the world.

  75. Anna Holmén said

    I would guess “BOK” stands for “bokmål”, which is the name of one of the 2 types of Norwegian. “SRD” – Sardinian?

  76. Very nice map ! But I’d like a legend too : I’m not sure what language I should see behind some of the abbreviations… Thanks !

  77. @Laura:

    Bokmål and Sardinian :)

  78. Juha Uski said

    Laura, BOK must mean “bokmål” whereas “NN” stands for “nynorsk”. SRD, I suppose stands for Sardinian.

  80. nfrankel said

    Basque is unique in that it bears no similarity to any known language, it is an orphan language (see http://en.wikipedia.org/wiki/Basque_language)

  81. jabinskyi said

    BOK is Norwegian Bokmal, and SRD, I assume, is Sardinian.

  82. Musungu said

    BOK is Bokmål, one of the two official standards of the Norwegian language (the other one, marked here as NN, is Nynorsk). The status of Silesian as a separate language is debated, so Sr is probably Sorbian, a small Western Slavic language spoken in eastern Germany, while I’d guess SRD is Sardinian.

  83. pjt said

    Laura, I assume BOK is bokmål and NN is nynorsk (the official written standards of Norwegian). SRD must be Sardinian, and I suppose SR is Sorbian (Wendish, spoken in eastern Germany), but why is it shortened like that? The author would have done well to stick to ISO 639.

  84. Spyros said

    SRD stands for Sardinian and BOK stands for Bokmål Norwegian.

  85. Sim said

    Polish, unlike many Slavic languages, has an incredible number of Latin words, which is not reflected in the graph at all.

  86. SRD is Sardinian (and Sardinia is my land). I think it’s nearer to Spanish and Catalan than to Italian. (Sorry for my terrible English.)

  87. Fotoa said

    Basque??????

  88. BOK is Bokmal Norwegian; DSH is Danish.

  89. Jeffrey Shallit said

    BOK is probably Bokmal, one of the two versions of Norwegian. SRD is probably Sardinian.

  90. I think SRD refers to the Sardinian language. I know it’s considered an autonomous language, and I think it’s Sardinian since it’s linked to ITA and Catalan. But a legend would be very useful (I’m also missing Rm and PRO).

  91. Nice but.. said

    Once again SVK and SLO got mixed up :/ Slovenia (SLO) has a little over 2 million people while Slovakia (SVK) has 5.5 million. Thus the circles are not sized appropriately.

  92. Based on context, I infer that
    NN = NyNorsk
    BOK = BokMål
    SRD = Sardinian

    But I am highly sceptical about the Finno-Ugric results – they aren’t even Indo-European!

  93. Flanders said

    BOK is Bokmål, or standard Norwegian, I think.

  94. clau said

    http://upload.wikimedia.org/wikipedia/commons/4/4f/IndoEuropeanTree.svg and http://en.wikipedia.org/wiki/List_of_ISO_639-1_codes
    Rm is Romansh, the fourth national language of Switzerland (Friulian and Ladin in Italy would, however, be missing). Fri is probably Frisian, spoken in Northern Germany. Pro = Provençal, Srd = Sardinian, Fa seems to be Faroese.
    A table would really be helpful, as there seems not to be a clear abbreviation system…

  95. Georgi said

    The only mistake is that there is no independent Macedonian language; it was made by the Serbian nationalists after the Second World War, to make it much more different from Bulgarian.

    • Aggelos said

      The biggest mistake is that there is no such language, because the real Macedonians are a Greek race and Alexander the Great spoke Greek.

  96. John F said

    I’m assuming BOK is Bokmal from Norway, which as the diagram shows is very close to Danish. Is SRD Sardinian? Spain is much more linguistically diverse than I realised if GLC is Galician and CAT is Castilian. I didn’t realise there were so many speakers of Frisian.

    I’d also love to see Arabic, Hebrew and the Indian Subcontinent’s languages plotted alongside these. Don’t stop there! Plot them all!!!

    • Cluny said

      GLC is Galician. It is touching Portuguese because they were born as the same language, but later differentiated mainly due to political reasons, when Portugal became independent and its “center of gravity” moved south as the “Reconquista” advanced.

      CAT is not Castilian, it’s Catalan. “Castilian” is in fact the real name of what you call “Spanish” in English, because it is the language that originated in the Castile region of Spain, which (again for political reasons) over the centuries became established as the “lingua franca” in the whole territory of Spain. Only a portion of Spaniards (about 75%) have it as their mother tongue, but all of us are required to understand and speak it fluently.

  97. David said

    Legend with ISO 639-1 (and occasionally 639-2) language codes and ISO 3166-1 country codes. I’ve had to guess a couple of times

    ALBANIAN
    ALB = Albanian (sq-AL)

    BALTIC
    LAT = Latvian (lv-LV)
    LIT = Lithuanian (lt-LT)

    CELTIC
    BRE = Breton (br-FR)
    GA = (Scottish?) Gaelic (gd-GB)
    IR = Irish (Gaelic) (ga-IE)
    WE = Welsh (cy-GB)

    FINNO-UGRIC
    EST = Estonian (et-EE)
    FIN = Finnish (fi-FI)
    HUN = Hungarian (hu-HU)

    GERMANIC
    BOK = Norwegian Bokmål (nb-NO)
    DSH = Danish (da-DK)
    DUT = Dutch (nl-NL)
    ENG = English (en-GB)
    FA = Faroese (fo-FO)
    FRI = (West) Frisian (fy-NL)
    GER = German (de-DE)
    ICE = Icelandic (is-IS)
    NN = Norwegian Nynorsk (nn-NO)
    SWE = Swedish (sv-SE)

    ROMANCE
    CAT = Catalan (ca-ES)
    FRE = French (fr-FR)
    GLC = Galician (gl-ES)
    ITA = Italian (it-IT)
    POR = Portuguese (pt-PT)
    PRO = (Provence) Occitan (oc-FR)
    RM = Romansh (rm-CH)
    ROM = Romanian (ro-RO)
    SPA = Spanish (es-ES)
    SRD = Sardinian (sc-IT)

    SLAVIC
    BLR = Belarusian (be-BY)
    BUL = Bulgarian (bg-BG)
    CRO = Croatian (hr-HR)
    CZE = Czech (cs-CZ)
    MA = Macedonian (mk-MK)
    POL = Polish (pl-PL)
    RUS = Russian (ru-RU)
    SLO = Slovenian (sl-SI)
    SR = Sorbian (wen-DE) – technically regarded as a language group consisting of Upper and Lower Sorbian
    SRB = Serbian (sr-RS)
    SVK = Slovak (sk-SK)
    UKR = Ukrainian (uk-UA)
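
    For anyone who wants to reuse the legend, here is a minimal sketch of it as a lookup table. It covers only the labels people asked about most in this thread. The pairing of chart labels with languages follows the guesses made here, not an official key from the chart's author, and the names `CHART_LEGEND` and `expand` are made up for illustration.

```python
# Chart labels as read by commenters, mapped to (language name, ISO 639-1 code).
# Assumption: the label-to-language pairing is the thread's best guess, not an
# official legend published with the chart.
CHART_LEGEND = {
    "BOK": ("Norwegian Bokmål", "nb"),
    "NN": ("Norwegian Nynorsk", "nn"),
    "DSH": ("Danish", "da"),
    "FA": ("Faroese", "fo"),
    "FRI": ("Western Frisian", "fy"),
    "CAT": ("Catalan", "ca"),
    "GLC": ("Galician", "gl"),
    "PRO": ("Occitan", "oc"),
    "RM": ("Romansh", "rm"),
    "SRD": ("Sardinian", "sc"),
    "SLO": ("Slovenian", "sl"),
    "SVK": ("Slovak", "sk"),
    "SRB": ("Serbian", "sr"),
    # Sorbian has no two-letter code; ISO 639-2 uses the collective code "wen".
    "SR": ("Sorbian", None),
}


def expand(label):
    """Return (name, iso_639_1_code) for a chart label, ignoring letter case.

    Raises KeyError for labels that are not in the table.
    """
    return CHART_LEGEND[label.strip().upper()]


if __name__ == "__main__":
    for label in ("Bok", "srd", "Sr", "SLO", "SVK"):
        name, code = expand(label)
        print(f"{label:>4} -> {name} ({code or 'no two-letter code'})")
```

    Upper-casing the label before the lookup means the mixed spellings seen in this thread (“Sr”, “Srd”, “Bok”) all resolve to the same entry.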

  98. Le_Chat_Noir said

    A beautiful illustration, indeed! Thank you, Teresa and Dr. Tyshchenko! To reply now, if I may: “Laura Blumenthal”: BOK stands for Bokmål (literally meaning “book tongue”), the preferred written standard of Norwegian and similar to Danish, while SRD stands for the Sardinian language. To “St. Izzy O’Cayce”: I respectfully disagree: among the Romance languages, the best to start with is Latin for the basis (vocabulary and roots), then continue with Romanian (that’s a tough one!) and French or Portuguese. Why is that? Well, a fluent Romanian and/or Aromanian speaker would rather easily understand Corsican, Sardinian, Romansh (Rhaeto-Romanian), Italian and Spanish (particularly the Catalan dialect). To “Piotrek”: SR stands for the Sorbian languages in the Lusatia region of eastern Germany, and are closely related to Polish, Czech and Slovak. To “Jonathon”: the Basque language (Euskara) is a language isolate, a remaining descendant of the pre-Indo-European languages of Western Europe, and probably dates to the Stone Age or Neolithic period; otherwise, in Basque there are only a few words borrowed from Spanish and French.

    • Dídac Busquets said

      @Le_Chat_Noir: just a correction. Catalan IS NOT a dialect of Spanish (or any other language). It’s a language on its own, having itself many dialects.

  99. @Laura: Bokmal (Norwegian) and Sardinian, I suppose.

  100. Mick Woods said

    BOK= bokmål (norwegian system of writing, spelling, semi-dialect) as opposed to nynorsk. SRD could well be Sardinian.

  101. Maxus said

    Also, Sami is missing.

    FRI = Frisian,
    BOK= Bokmal (one of two Norwegian languages, closer to Danish) and
    NN = New Norwegian (the other one).

  102. axel said

    What about basque?????? http://en.wikipedia.org/wiki/Basque_language

  103. egyeske said

    SRD is the language spoken in Sardinia, I think.

  104. mike said

    For two blog posts, borrowed research, and a needlessly hostile “about” section, this blog is getting some decent traffic. Good work.

  105. Cunnilinguist said

    I personally think that this chart has a lot of mistakes. But I’ll focus “just” on the ones I really know, because I come from Slovenia. Our nation (SLO) has a population of 2 million and Slovakia (Svk) has a population of over 5 million, but the circles’ diameters don’t represent that. Is it possible that Teresa Elms switched us, just like many people do? I think so :P The other thing … the Slovenian language is connected on this chart to Croatian, but not to Serbian!? And the distance between Croatian and Czech should be smaller than the distance between Slovenian and Czech. Also, why is Russian so nicely connected to almost every Slavic language but not SLO and CRO? And why isn’t there a connection between CZE and POL??? Oh, gosh … :S :D

  106. Vlad Iorga said

    Sardinian and Bokmal (Norway).

  107. Wendy said

    Reblogged this on Kiss the Translator and commented:
    Euskera is missing…

  108. @Laura: I believe BOK is Norwegian Bokmaal http://en.wikipedia.org/wiki/Bokm%C3%A5l I’m guessing NN is Neo Norwegian http://en.wikipedia.org/wiki/Nynorsk and SRD is Sardinian http://en.wikipedia.org/wiki/Sardinian_language

  109. Is not Neo Norwegian (NN) more closely related to Danish (DSH) than Norwegian Bokmaal (BOK)?

  110. James said

    Sr is probably Sorbian
    BOK is Bokmål – standard written Norwegian
    SRD must be Sardinian

  111. Vaughan said

    Yes, can someone help with the abbreviations? BOK has a similar number of speakers to Danish, Swedish or Dutch, and looks too big to be an obscure minority language (it’s not Afrikaans, is it? Come to think of it, where IS Afrikaans? Very similar to Dutch…).

  112. Jakob said

    @Laura

    BOK: Norwegian Bokmål as opposed to NN: Norwegian Nynorsk. Two standardized forms of written Norwegian, BOK is heavily influenced by Danish, whereas NN is based on North Western Norwegian dialects.

    SRD must be Sardinian.

  113. Closet Linguist said

    I would presume BOK to mean Bokmal Norwegian, SRD to Sardinian and Sr to be Sorbian. Is there a connection between languages labelled in all-caps and ones which only have a capital initial? (I suspected them to be “official language of a country” vs others, but surely Estonian [Est] must be the official language of Estonia — so there goes that theory!)

    Also to St. Izzy O’ Cayce: This is only the lexical distance, meaning how much history of vocabulary the languages share amongst each other. So yes, Italian words look most “average” amongst the Romance languages. A valuation of “utilitarian” has to take into account other factors as well: complexity of grammar, syntax and pronunciation, geographical proximity, number of speakers worldwide and in a restricted geographical neighbourhood, historical relations etc. All these parameters are dependent on where the learner is situated, what their native language is, which other languages they already know and how proficient they are in them (in order to make meaningful connections between their foreign languages). When again taking the average over all of these parameters, then Spanish, or more precisely Castilian, fares much better, as more people are much more likely to have a Spanish-speaking country nearby rather than being close to Italy (being close to both is tie-broken by the fact that there are 406 million native speakers plus another 80 million non-native speakers versus Italian’s total of 60 + 25 million or so). Just my two cents.

    Disclaimer: I am not a professional linguist, but here’s my interpretation of the abbreviations by language group.

    ALBANIAN: Albanian (ALB).
    BALTIC: Latvian (Lat), Lithuanian (LIT).
    CELTIC: Breton (Br), Irish Gaelic (Ir), Gaelic/Scottish Gaelic (Ga), Welsh (We).
    FINNO-UGRIC: Estonian (Est), Finnish (FIN), Hungarian (HUN).
    GERMANIC: Bokmål (BOK), Danish (DSH), Dutch (DUT), English (ENG), Faroese (Fa), Frisian (Fri) [Rem: unclear which variant, maybe all of them], German (GER), Icelandic (Ice), Nynorsk (NN), Swedish (SWE).
    GREEK: Greek (GRK)
    ROMANCE: Catalan (CAT), French (FRE), Galician (GLC), Italian (ITA), Portuguese (POR), Franco-Provençal (PRO), Romansh (Rm), Romanian (ROM), Sardinian (SRD), Castilian Spanish (SPA).
    SLAVIC: Belarusian (BLR), Bulgarian (BUL), Croatian (CRO), Czech (CZK), Macedonian (Ma), Polish (POL), Russian (RUS), Slovenian (SLO), Slovakian (SVK), Sorbian (Sr), Serbian (SRB), Ukrainian (UKR).

  114. That’s interesting, but how should we interpret Lithuanian standing between German and Polish? Of course, taking into consideration the geographical proximity, these languages must have some vocabulary in common, but I’m not sure whether putting LIT between those two languages is entirely correct.

  115. Søren Harder said

    BOK must be Bokmål, i.e. the Norwegian dialect that comes from the Danish-Norwegian spoken in the cities before NN, New Norwegian, was recreated from the more original rural dialects (at the time when Denmark lost Norway to Sweden). SRD must be Sardinian. Some of the links are spurious: Irish–Portuguese, Dutch–Greek. Why is (Scots) Gaelic not linked to English, and why are the other Germanic languages not related? (Viking vocabulary in Irish, e.g.).

  116. sheaseer said

    Ah, I see that Finno-Ugric Hungarian out on the wing !

  117. Fenno said

    Bok = Norsk bokmål
    Srd = Sardinian
    But yeah, I had to google some of the languages too.

  118. Very good and interesting! Thanks a lot!
    A few points:
    I believe there should be a strong link between Catalan and Provençal, since in the Middle Ages they were a single language.
    Another point of confusion is the number of speakers of Slovak compared to Slovenian – it seems that the circles should be swapped.

  119. dipdowel said

    Reblogged this on ::Keep in Dutch::.

  120. Sebo said

    greek is foreveralone

  121. Lori Cole said

    Euskera?

  122. Daniel said

    An attempt at a small legend:
    Srd = Sardinian
    Bok = Bokmål (regular Norwegian)
    NN = Nynorsk (new Norwegian)
    Fri = Frisian
    Sr = Sorbian (unsure about this one, but it seems reasonable)
    Rm = Romansh (again, not sure, but considering it’s between French and Italian it seems likely)
    Pro = Occitan (often known as Provencal in English)

  123. KBN said

    SRD may be Sardinian. BOK is ‘bokmål’, that is, standard Norwegian

  124. Peter said

    There are approx. 5.5 million inhabitants living in Slovakia and almost 5 million Slovaks living abroad, so it should have a bigger circle, and Slovenia a smaller one. Also, Cze and Pol should be connected with a (<25) line.

  125. Kaa said

    It’s not published in Russian, but in Ukrainian.

  126. Martin Vidner said

    Sr: Sorbian, Bok: Bokmal, Srd: Sardinian(?)

  127. A. said

    Albanian connected to Slavic (Slovene)? Couldn’t be more off.

  128. Goran said

    @ Laura Blumenthal BOK is Bokmål, one of the standard variants of Norwegian, and SRD is Sardinian.

  129. BOK=bokmaal and SRD stands for Sardinian, i believe

  130. Maszyna said

    My guesses:
    Sr – Sorbian (spoken in Lusatia)
    BOK – Norwegian Bokmål (as opposed to NN – Norwegian Nynorsk)
    SRD – Sardinian

  131. niefpaarschoenen said

    Did you make this chart yourself? May I ask how? The number of Slovak speaking people is more than the number of Slovenian speaking people, so I’m wondering why the circle is smaller?

    Also, according to my Russian colleague, the book was written in Ukrainian, not Russian…

    Pretty cool though that this chart is becoming viral 6 years after you posted it :-).

  132. Sabrina S said

    @Laura Blumenthal:
    SRD = sardinian
    BOK = bokmål (one of the two forms of Norwegian, the other is Nynorsk = NN in this chart).

    http://en.wikipedia.org/wiki/Bokm%C3%A5l

    http://en.wikipedia.org/wiki/Nynorsk

  133. Csudi said

    Something is not right here. Hungarian is very, very distant from a grammatical point of view, but in fact only 10% of the vocabulary is original Hungarian. If the chart shows origins of words, it should be much closer to all three major groups.

  134. Laura, I think BOK refers to Bokmål and SRD to Sardinian.

  135. Basque is unrelated to any of them. One theory is that the Basque were the first Europeans, then retreated into their mountains as others arrived later. Basque has a few words in common with both Aztec and Finno-Ugric, so there’s a puzzle for you.

  136. Mia said

    How did you determine the number of speakers? Slovenian has MORE than 3.1 million speakers (in a country with a population of 2 million), whereas Slovak has LESS than 3.1 million speakers (in a country with a population of 5.4 million people). Also, the lexical distance between Slovenian and Albanian should be higher, I think.

    • Cunnilinguist said

      They obviously mistook Slovakia for Slovenia or vice-versa … nothing new :p
      But – why do you think the lexical distance between Slovenian and Albanian should be higher??? I’m surprised there even is a connection …

  137. Jędrzej Tomasz Flic-Matuszewski herbu Trąbka z Gwizdkiem said

    “Sr” is Sorbian.

  138. Jongseong Park said

    Here’s the key as far as I can work out:

    We: Welsh, Bre: Breton, Ga: (Scottish) Gaelic, Ir: Irish

    Eng: English, Ice: Icelandic, Fa: Faroese, NN: Nynorsk (New Norwegian), Bok: (Norwegian) Bokmål, Swe: Swedish, Dsh: Danish, Fri: Frisian, Dut: Dutch, Ger: German

    Fin: Finnish, Est: Estonian, Hun: Hungarian

    Lat: Latvian, Lit: Lithuanian

    Pol: Polish, Sr: Sorbian, Cze: Czech, Svk: Slovak, Slo: Slovenian, Cro: Croatian, Srb: Serbian, Ma: Macedonian, Bul: Bulgarian, Blr: Belarusian, Ukr: Ukrainian, Rus: Russian

    Alb: Albanian

    Grk: Greek

    Rom: Romanian, Srd: Sardinian, Rm: Romansh, Ita: Italian, Cat: Catalan, Spa: Spanish, Glc: Galician, Por: Portuguese, Pro: Provençal, Fre: French

  139. Jędrzej Tomasz Flic-Matuszewski herbu Trąbka z Gwizdkiem said

    “Srd” is Sardinian and “Bok” is Norwegian Bokmal.

  140. Anna said

    Great chart but some links and languages are missing I think…
    Where’s the link between Catalan and Galician, Portuguese and French… sometimes Catalan is lexically closer to Galician or French than to Spanish…
    And where’s Basque? I heard that Basque has some links with Albanian…

  141. Reblogged this on The Monster's Ink and commented:
    Oh, look, some porn for language nerds.
    I think it’s hilarious how Albanian and Greek are sitting there all alone, like, “Who are all THESE assholes?” Though I also think Albanian would be insulted to hear that it’s lexically closer to the Slavic family than to the Romance family.

    • George m said

      Greek (Modern Greek: ελληνικά [eliniˈka]) is an independent branch of the Indo-European family of languages. Native to the southern Balkans, western Asia Minor, Greece, and the Aegean Islands, it has the longest documented history of any Indo-European language, spanning 34 centuries of written records. Its writing system has been the Greek alphabet for the majority of its history; other systems, such as Linear B and the Cypriot syllabary, were previously used.

      The alphabet arose from the Phoenician script and was in turn THE BASIS of the Latin, Cyrillic, Coptic, and many other writing systems.

      The Greek language holds an important place in the histories of Europe, the more loosely defined Western world, and Christianity; the canon of ancient Greek literature includes works of monumental importance and influence for the future Western canon such as the epic poems Iliad and Odyssey.

      Greek was also the language in which many of the foundational texts of Western philosophy, such as the Platonic dialogues and the works of Aristotle, were composed; the New Testament of the Christian Bible was written in Koiné Greek. Together with the Latin texts and traditions of the Roman world, the study of the Greek texts and society of antiquity constitutes the discipline of classics.

      Greek was a widely spoken lingua franca in the Mediterranean world and beyond during classical antiquity and would eventually become the official parlance of the Byzantine Empire.

      Greek roots are often used to coin new words for other languages;
      Greek and Latin are the predominant sources of international scientific vocabulary.
      (WIKIPEDIA)

  142. heidilnd said

    Reblogged this on london is my modern babylon. and commented:
    just to prove my point to everyone again – Estonian is NOTHING like Russian!

  143. My guess is: Sr = Sorbic
    “BOK” certainly is Norwegian “Bokmål”
    SRD presumably Sardinian

  144. Dimitris said

    It’s Bokmål (Norwegian) and Sardinian

  145. maryxmas said

    it was published IN UKRAINIAN.

  146. Know How said

    Where the hell is Turkish?

  147. Dobroslav said

    there is no Macedonian as there is no Bosnian and Montenegrin!!!

  148. Chris Bal said

    SRD – Sardinian
    BOK – Standard Norwegian

  149. Kiko said

    I think SRD may be Sardinian (http://en.wikipedia.org/wiki/Sardinian_language) and BOK may be Bokmål Norwegian, in order to distinguish it from Nynorsk Norwegian. I also think Sr is Silesian due to its relation with Polish and Czech (http://en.wikipedia.org/wiki/Silesian_language). My doubt is PRO in the Romance languages: may it be Occitan?

  150. Rick said

    SRD is likely Sardinian. I can’t figure out BOK. Anyone got an insight?

  151. Vuperi said

    What is Tishchenko’s original work? He had no book in 1999 titled Metatheory of Linguistics (published in Russian). Here are his works: http://www.langs.com.ua/contacts/1/Bibliography.htm

    1999:

    80. Морфологічна структура сучасної перської лексики // ІІІ сходознавчі читання А.Кримського. Тези міжнар. наук. конф. – К. (0,2 д.а.). Співавтор О. Бєдов.
    81. Службові дієслова у підсистемі дієслів перської мови // ІІІ сходознавчі читання А.Кримського. Тези міжнар. наук. конф. – К. (0,2 д.а.). Співавтор А. Півторак.
    82. Лінгвістичний навчальний музей // Київс. нац. ун-т ім. Т. Шевченка. Довідник. – К.: КНУ. (0,2 д.а.).
    83. Лекції з генетичного мовознавства (Передісторія мовлення. Палеосигніфіка. Історична синтактика.) – К.: КНУ. (3,0 д.а.).

  152. Marcus said

    BOK = probably Bokmal and SRD Sardinian

  153. cyberj said

    There is a mistake in the size of the Slovak and Slovenian language speakers

    • astheart said

      I think Slovak is okay, but I cannot believe there’s so many Slovenian speakers as there are only 2 mil inhabitants in Slovenia, and only about 83% of them are Slovenians.

    • Cunnilinguist said

      They always make a mess with Slovenia and Slovakia … too similar name, too similar flag, nothing new ;)

  154. Joakim Ringblom said

    Laura Blumenthal: BOK almost certainly stands for “Bokmål”, which is perhaps better known as standard Norwegian/“Dano-Norwegian”. It is slightly different from New Norwegian (NN). Both are spoken in Norway. As a Swede with no linguistic training, I would have guessed that they are just different dialects, but apparently real linguists see it otherwise. I have no idea what SRD is though.

  155. Rollo said

    Where is the Basque language?

  156. czg said

    I have some doubts about Albanian being so close to Slovenian (!?) and closer to Romanian than to Italian (not to mention Turkish)

    • Joni said

      The Albanian language is an Indo-European language in a branch by itself, sharing its branch with no other extant language.

    • Gabriela said

      Romanian shares a HUGE active vocabulary with Albanian through the Thracians. Romanian was highly influenced by Latin, not by Italian, EVEN IF they are very similar.

  157. Sceptical Badger said

    I know it says ‘major’ languages of Europe, but Manx and Cornish could also be in there with the other Celtic languages

  158. Someone said

    SRD = Sardinian
    BOK = Bokmal (classic Norwegian)

  159. Magnus Markling said

    I would assume that
    NN = Nynorsk
    BOK = Bokmål

  160. edesorban said

    Fascinating! Clearly illustrates my experience with Hungarian!

  161. BOK is probably a variation of Norwegian. Often it is one of the font variations you can get on a computer. From Wikipedia:
    There are two official forms of written Norwegian – Bokmål (literally “book tongue”) and Nynorsk (literally “new Norwegian”). The Norwegian Language Council is responsible for regulating the two forms, and recommends the terms “Norwegian Bokmål” and “Norwegian Nynorsk” in English. Two other written forms without official status also exist, the major one being Riksmål (“national language”), which is somewhat closer to the Danish language but today is to a large extent the same language as Bokmål. It is regulated by the Norwegian Academy, which translates the name as “Standard Norwegian”. The other being Høgnorsk (“High Norwegian”) that is a more purist form of Nynorsk, which maintains the language in an original form as given by Ivar Aasen and rejects most of the reforms from the 20th century. This form of Nynorsk has very limited use.

  162. Fernanlee said

    Where is the Valencian language in that network? This is one big mistake. A second mistake: I don’t see a link between French and Catalan… why?

    • Dídac Busquets said

      @Fernanlee: well, I know this is more of a political issue than a linguistic one, but IMHO (and in the opinion of most linguists) Valencian and Catalan are the same language. The differences between what is spoken in Catalonia and what is spoken in Valencia are so minimal that they are (or should be) considered dialects (whether the “overall” language should be called Catalan, Valencian, or CatValan is also a political thing). Not to mention the Balearic dialects…

  163. Marko said

    Bok – bokmal? Srd – Sardinian?

    Any comments on Albanian – Slovenian (?) connection? That seems *very* odd.

  164. Jester said

    Spamming ‘what about Basque’ won’t add it to the diagram as data is just not there. If you want to make a new one (with Basque language), please do so.

  165. niklosz said

    What about kashubian language?

  166. Kefalo said

    I believe Sr is Sorbian. SRD-Sardinian? No idea what BOk is.

  167. Cédric said

    I’d be interested in any reference describing the methodology (how — and with what data — the inter-languages distances have been calculated). Is everything only described in the original publication, in Russian? Any translation out there? Is it a book or a paper? Cheers!

  168. Elise said

    Where is Basque?

  169. CgX said

    SRD is Sardinian :)

  170. Maury Incen said

    @ Laura Blumenthal: BOK = Bokmål, the oldest of the two languages spoken in Norway (the other one is called Nynorsk = New Norwegian). SRD = Sardinian, a language spoken in Sardinia (one of Italy’s biggest islands). :) Hope that helps!

  171. Paul said

    Bok = Bokmål (Norwegian); NN = Nynorsk (Norwegian). But I’m having trouble with Fri.

  172. Paul said

    …SRD has to be Sardinian…?

  173. Sergi Monreal said

    Dear Ms Elms,
    Respectfully requesting a review of the links of CATalan. It is very similar to PROvençal; there should be a black continuous line. I would like to point out that they were considered almost the same language in the upper medieval age. However, no line joins them in the graphic today. Similarly, there should be a line between CATalan and FREnch.
    Regards,
    Sergi Monreal

  174. kata-ana said

    To Piotrek:

    SR refers to Sorbian

    To Laura Blumenthal:

    BOK refers to Norwegian Bokmål – SRD refers to Sardinian

  175. […] Per the accompanying article: […]

  176. Paul said

    …Ah, Fri must be Frisian.

  177. Stuart said

    @Chris – At a guess, BOK is probably Bokmål and SRD Sardinian.

  178. Jonah Shepp said

    Bok = Bokmål. SRD = Sardinian. I think Sr = Sorbian.

  179. Seb said

    I would like a legend too, but I think BOK is book Norwegian as opposed to Neo Norsk which is the spoken tongue, SRD is Sardinian? Rom is Romanian, but what is Rm?

  180. liburni said

    Albanian,the language of gods

  181. Philip said

    Interesting diagram! Why doesn’t it show ROMANI / ROMANES? There are at least 6-8 million Roma people living in Europe, so I don’t think you can exclude them from your considerations. I mean, Iceland has 320 thousand inhabitants, which is 25 times fewer than the Roma people all over Europe.

  182. This is wrong…

  183. John Doe said

    It would be nice to see the Turkish language

  184. tom said

    I believe BOK refers to bokmal, or book tongue, the written language of Norway, as opposed to NN or Nynorsk

  185. Chris said

    Wracking my brains but I can’t come up with a Germanic language spoken by 3 million plus people that could be signified by BOK. Surely not Afrikaans, given its lexical distance from Dutch and proximity to Danish and the other Norse languages.

  186. NN and BOK are presumably Nynorsk and Bokmål, respectively.

  187. Roslyn Raney said

    BOK = bokmaal, a version of Norwegian. SRD is probably Sardinian

  188. Why is Irish connected to Portuguese instead of Galician? Both modern Galician and Portuguese descend from the same medieval Galician-Portuguese language; in that sense, why isn’t Galician also connected to Irish?

  189. cristina said

    ” As a result, English (a Germanic language) and French (a Romance language) are actually closer to each other in lexical terms than Romanian (a Romance language) and French”.
    What sources did you use for this information? My native language is Romanian and I’m fluent in both English and French and I can assure you that this information is totally inaccurate, you can’t even compare English and Romanian in terms of Latin terms

  190. Séa said

    Where is Klingon? Where is glossolalia? Where is Esperanto?
    Oh that’s right, like Basque, they are not from the Indo-European family, and so are not connected to major European languages.
    It is not part of some campaign to ignore all things Basque. Basque just doesn’t happen to be connected to other languages. There are no dots in the graph all on their own.
    Imagine a dot with BAS, not connected to any other dot…adds nothing to the graph.

    Clearly this piece of research was meant to politically define Europe, and if you are not present you obviously don’t count…lol

    Do you honestly expect someone that does research to cover absolutely every possibility, or is it not acceptable to do some research which covers the ‘major’ languages of Europe?

    Also FRI is likely Frisian (Netherlands / German direction)

  191. catbert836 said

    A legend for the chart (I’m pretty knowledgeable about languages, however of course any mistakes are mine)

    Germanic: Eng = English, Ger = German, Dut = Dutch, Swe = Swedish, Dsh = Danish, Ice = Icelandic (obviously). Less obviously: Fri = Frisian, Bok = Norwegian (Bokmal), NN = Norwegian (Nynorsk), Fa = Faroese
    Romance: Por = Portuguese, Spa = Spanish, Ita = Italian, Fre = French (obviously). Less obvious: Rm = Romanian, Cat = Catalan, Srd = Sardinian, Pro = Provencal, Glc = Galician
    Celtic: Bre = Breton, We = Welsh, Ir = Irish Gaelic and Ga = Scots Gaelic
    Baltic: Lat = Latvian, Lit = Lithuanian
    Finno-Ugric: Fin = Finnish, Est = Estonian, Hun = Hungarian
    Slavic: Rus = Russian, Ukr = Ukrainian, Pol = Polish, Bul = Bulgarian, Cro = Croatian, Srb = Serbian, Cze = Czech (obviously). Less obvious: Slo = Slovene, Svk = Slovak, Ma = Macedonian, Blr = Byelorussian/White Russian, Sr = Sorbian.

    • Arturo Malvestito said

      It’s Belarusian, not Byelorussian or White Russian. The latter is a political movement against Bolsheviks in Russia, the former is totally offensive and at least 20 years outdated.

  192. Kalle said

    BOK=Bokmål=Norwegian (NN=Nynorsk), SRD=Sardinian

  193. Soulios Christos said

    I find it very interesting! Where is Turkish language? I think that also belongs to the Finno-Ugric arow!

  194. Vuperi said

    This is Tishchenko’s original in Ukrainian:

    http://nado.znate.ru/%D0%A2%D0%B8%D1%89%D0%B5%D0%BD%D0%BA%D0%BE_%D0%9A%D0%BE%D0%BD%D1%81%D1%82%D0%B0%D0%BD%D1%82%D0%B8%D0%BD_%D0%9D%D0%B8%D0%BA%D0%BE%D0%BB%D0%B0%D0%B5%D0%B2%D0%B8%D1%87#link4

    “After defending his doctoral dissertation on the topic “Metatheory of Linguistics” (1992), he accepted the rectorate’s offer to head the Department of the Theory and Practice of Oriental Languages, later reorganized into the Department of Oriental Philology, from which the Department of the Middle East split off in 1995. He headed the department for 9 years. Since 2001 K. M. Tishchenko has been the head and leading researcher of the Linguistic Educational Museum, which he founded in 1992.

    2. Scholarly activity

    2.1. Metatheory of Linguistics” (Ukrainian: movoznastva) 1992 (sic!)

  195. Ian said

    I suspect NN and BOK are the two variants of Norwegian (Nynorsk and Bokmål) but I’m at a loss with SRD…

  196. Ian said

    Sardinian maybe?

  197. Davide said

    Most of the languages spoken in Italy are missing… And it’s a pity because, for example, the Gallo-Italic languages are a bridge between Italian and the languages spoken in France and Spain… And moreover the border between West and East Romania runs along the Gothic line La Spezia-Rimini (or more exactly Massa-Senigallia)…

  198. ogodon said

    Is Basque just off-chart for difference?
    Laura, SRD must be Sardinian, I can’t figure out what BOK is…

  199. anne mcd said

    BOK = Bokmål Norwegian; SRD = Sardinian.

  200. The relationship between English and French is similar to the relationship between Persian and Arabic. Persian (an Indo-European Iranian language) has absorbed a lot of Arabic vocabulary thanks to the Islamic conquest. Around 40-50% of Persian vocabulary comes from Arabic. Persian syntax, though, still retains its original Iranian features.

  201. To Laura – yeah, I can’t find the legend either. “Srd” I will guess is Sardinian. FA is spoken on the Faroe Islands. Maybe this for BOK: “As established by law and governmental policy, there are two official forms of written Norwegian – Bokmål (literally “book tongue”) and Nynorsk (literally “new Norwegian”)”.

    Not complete, but a list of abbreviations here: http://www.mathguide.de/info/tools/languagecode.html

  202. Simply amazing that Afrikaans with four million speakers, in a very important country, is somehow forgotten as a Germanic language……………………. and the Boers are being slowly murdered! But hey, who cares? http://www.democratic-republicans.us/white-south-african-tragedy

  203. Charlie Alpers said

    I’d be interested to see Yiddish and Hebrew represented

  204. Carlo Persiani said

    SRD should be Sardinian (and Corse) isn’t it? And Basque must be quite outdoor, here.

    • Carlo Persiani said

      Replying to myself, for the basque friends. Not being a linguist, nevertheless I think that the most ancient european language (basque) has no real ties with the Indoeuropean group and for this reason the original author did not put it in this diagram.

  205. Emelie said

    Basque isn’t in because it’s not considered part of any linguistic group, due to its origins.

  206. Sebastià Giralt said

    It is not correct regarding Catalan, which is the closest language to Occitan (Pro) and closer to French than Spanish and probably Italian. It should be placed among Spanish, Occitan (Pro) – French, and Italian. Therefore, its central situation makes it the most suitable to learn all the Romance languages.

  207. Emelie said

    @Laura
    BOK is Bokmål (one version of Norwegian) and SRD is Sardinian language (spoken on the Isle Sardinia)

  208. kate butling said

    SRD is Sardinia, but I also don’t know what BOK is except that it is probably a Belgian area dialect….

  209. Jacobo said

    Very cool! Some odd things here… Don’t more people speak Slovak than Slovenian? Aren’t Slovak and Czech very close? And I have read that Catalan and Provencal are closely related. Is there an analogous technique for measuring grammatical distance?

  210. Roger GS said

    I’m guessing that Sr is Sorbian and Norwegian is split up into Nynorsk (NN) and Bokmal (Bok).

  211. Robert said

    And where are the other languages? For example, from the Slavic group: Moravian, Silesian, Kashubian, Ruthenian, Resian, Polesian, Siberian, Lachian, Polabian, Lower Sorbian (Lower Lusatian)?

  212. CsendesMark said

    The author must have felt awkward about Hungarian, because those lines don’t really represent the actual facts!
    It’s Finno-Ugric, but isn’t related to any Baltic languages.
    Hungarian is more closely related to the West Slavic languages (and less to the East Slavic Ukrainian), and was also influenced by German, Greek and Latin, not to mention the Turkic languages.

  213. Pam said

    But the French added the grammar for the comparative and superlative of adjectives with two or more syllables that don’t end in y (for example, more intelligent/most intelligent), didn’t they?

  214. Matthew said

    @Laura Blumenthal: Bokmål and Sardinian

  215. piggee said

    It is a little bit bullshit. Where is the connection of the Czech and Slovak languages with German?

  216. piggee said

    And the connection of Slovak language with Swedish? (have the same grammar)

  217. F said

    BOK = Norwegian Bokmal, one of the two standards along with NN=Nynorsk.

    SRD = Sardinian

    Sr = Sorbian, Slavic minority language(s, there’s an Upper and a Lower) of eastern Germany

    I want to see the distance between Romanian and Slavic languages — Romanian has a lot of Slavic vocabulary.

  218. Jon Andersson said

    I’m guessing BOK = Bokmål (Norwegian) and SRD = Sardinian? And I agree, there should have been a legend with the map.

  219. xabier said

    where is basque??????

  220. Janusz said

    Piotrek, no, I suppose it’s Sorbian (Łużycki)

  221. Jaume S. said

    Very interesting post and graphic, but there are two small mistakes. First of all, where’s Basque? (already noted by Jonathon). And second, I’m a native Catalan speaker and I miss a connecting line between my language and French, since both are extremely similar. In fact, Catalan is kind of equidistant between Spanish, Italian and French.

  222. Luljeta Koshi said

    Would you mind adding Bosnian language to the Slavic group of languages? Thank you very much.

  223. CoolKoon said

    BTW I have to dissent on Hungarian. Sure, it’s pretty far from everything, but vocabulary-wise it has tons of words adopted from German (just like Czech and Slovak BTW) and then Latin, so vocabulary-wise I’d place it much closer to German and the Romance languages. I’d also place it closer to Czech, Slovak and Serbo-Croatian too (and wouldn’t place it anywhere near Ukrainian, to which it’s just as distant as to Russian for instance). I’d also place Slovak closer to Czech, because nothing’s closer to Czech than Slovak (the two languages are mutually pretty much intelligible).

  224. xabier said

    Teresa Elms, you should read this:

    http://en.wikipedia.org/wiki/Basque_language

  225. Jim Nail said

    Nice, but there’s room for quibbling. For instance, on this graph Albanian is as close to Slovenian as English is to Dutch. WTF?

  226. berta said

    Where’s Armenian,also ^_^ ? (though looks nice)…

  227. And where is the ARMENIAN language?

  228. BOK is a form of Norwegian (‘Bokmal’). NN is the other form (‘Ny Norsk’). SRD, I think is Sardinian.

  229. Fabio said

    The link ITA – GRK is missing!! About 30% of Italian words have a Greek origin, and for sure there’s much more in common between Italian and Greek (Magna Graecia was basically southern Italy) than between Dutch & Greek (????), Lithuanian & Greek (???) and French & Greek!

  230. […] * Lexical Distance Among the Languages of Europe. […]

  231. Marcus Graly said

    Could you provide a table for the abbreviations you used for the languages? Alternatively, using ISO codes would make it easier to look up, at least. Thanks!

    http://en.wikipedia.org/wiki/List_of_ISO_639-1_codes

    Also, Bosnian is now considered a separate language.
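
A rough sketch of what such a lookup table could look like, in Python. The chart abbreviations and their readings are the guesses offered by commenters in this thread, not an official legend, so treat them as assumptions; in particular, reading PRO as Occitan and Fri as West Frisian is a judgment call. `CHART_TO_ISO` and `iso_code` are hypothetical names made up for this illustration. Sorbian has no two-letter ISO 639-1 code, so it maps to `None`.

```python
# Chart abbreviation -> (language as read by commenters, ISO 639-1 code).
# The readings are assumptions taken from this thread, not from the chart's author.
CHART_TO_ISO = {
    # Germanic
    "ENG": ("English", "en"), "GER": ("German", "de"), "DUT": ("Dutch", "nl"),
    "SWE": ("Swedish", "sv"), "DSH": ("Danish", "da"), "ICE": ("Icelandic", "is"),
    "FA": ("Faroese", "fo"), "BOK": ("Norwegian Bokmål", "nb"),
    "NN": ("Norwegian Nynorsk", "nn"),
    "FRI": ("Frisian (assumed West Frisian)", "fy"),
    # Romance
    "FRE": ("French", "fr"), "ITA": ("Italian", "it"), "SPA": ("Spanish", "es"),
    "POR": ("Portuguese", "pt"), "CAT": ("Catalan", "ca"), "GLC": ("Galician", "gl"),
    "ROM": ("Romanian", "ro"), "RM": ("Romansh", "rm"), "SRD": ("Sardinian", "sc"),
    "PRO": ("Provençal (assumed Occitan)", "oc"),
    # Celtic
    "WE": ("Welsh", "cy"), "BRE": ("Breton", "br"), "IR": ("Irish", "ga"),
    "GA": ("Scottish Gaelic", "gd"),
    # Baltic
    "LAT": ("Latvian", "lv"), "LIT": ("Lithuanian", "lt"),
    # Finno-Ugric
    "FIN": ("Finnish", "fi"), "EST": ("Estonian", "et"), "HUN": ("Hungarian", "hu"),
    # Slavic
    "POL": ("Polish", "pl"), "CZE": ("Czech", "cs"), "SVK": ("Slovak", "sk"),
    "SLO": ("Slovenian", "sl"), "CRO": ("Croatian", "hr"), "SRB": ("Serbian", "sr"),
    "MA": ("Macedonian", "mk"), "BUL": ("Bulgarian", "bg"), "BLR": ("Belarusian", "be"),
    "UKR": ("Ukrainian", "uk"), "RUS": ("Russian", "ru"),
    "SR": ("Sorbian", None),  # no ISO 639-1 code exists for Sorbian
    # Isolated branches
    "ALB": ("Albanian", "sq"), "GRK": ("Greek", "el"),
}


def iso_code(abbrev):
    """Return the ISO 639-1 code for a chart abbreviation (case-insensitive).

    Returns None when the language has no two-letter code.
    Raises KeyError for abbreviations that are not in the table.
    """
    _name, code = CHART_TO_ISO[abbrev.upper()]
    return code


if __name__ == "__main__":
    for abbrev in ("BOK", "NN", "SRD", "Sr"):
        name, code = CHART_TO_ISO[abbrev.upper()]
        print(f"{abbrev}: {name} -> {code}")
```

Upper-casing the key is safe here because the mixed-case labels on the chart (Sr vs SRB, Rm vs ROM) stay distinct after normalization.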

  232. Stephan said

    As pointed out by a number of other people, it would be good to use a more standardized set of language codes/abbreviations.

    One possible source is below; in the language translation and localization industries, we use these codes, and even though they may be new to folks outside of the industry, at the very least we will be able to base our discussions on the same nomenclature:

    http://www.science.co.il/Language/Locale-codes.asp

  233. VK said

    To Piotrek and Laura: Sr is not Silesian, but Sorbian (spoken in Lusatia, Germany), SRD means Sardinian and BOK Bokmål, one of the two official standards of Norwegian (the second one is Nynorsk, here NN).

  234. ocschwar said

    BOK: Bokmal a dialect of Norwegian (the other main one is Nynorsk – NN.)
    Srd: Sardinian, which IIRC is the Italian dialect closest to Imperial Latin.

    Is Rm Romansch?? Does it really have that many speakers?

  235. Christophe Sims said

    English did not ADD French grammar: thankfully, French grammar being particularly weird, even given its Latin origin.

  236. Matti Virtanen said

    BOK is bokmål, the standard Norwegian, different from NN or nynorsk, spoken upcountry. SRD is most likely sardinian. Sr would be sorbian, the slavic minority language in Germany. But what’s PRO? Besides Basque, others missed are Letzeburgish (Luxemburg), Moselfranken and Karelian.

  237. BrD said

    What about Sami languages? Some other small languages are missing too.

  238. Tim said

    BOK is Bokmål (one of two written standards for Norwegian, the other being Nynorsk [NN]). I would guess SRD is Sardinian.

  239. Huibdos said

    Bok = Norsk (Bokmål); Srd = Sardinian (Sardinia, an Italian island)

  240. Gentos said

    What does ALBANIAN have to do with the Slavic languages? We have more in common with Romance and Latin!

    • Aleks said

      Albanian has ties with NO ONE. IT IS A SEPARATE LANGUAGE. It is the mother language of all Europeans. Even ancient Greek philosophers spoke Albanian; even today the “ancient Greek language” has nothing to do with the modern one (KATHAREVOUSA). Even today Albanians can understand what was written 3000 years ago by Greeks (because it was pure Illyrian = Albanian language).

      • Aggelos said

        When the ancient Greeks were using their language, Albanians were just a small group of people living next to the Albanos river over Ukraine. How can their language be the origin of Greek?!

        THE 10 FIRST LINES OF ODYSSEIA IN ANCIENT GREEK

        ἄνδρα μοι ἔννεπε, μοῦσα, πολύτροπον, ὃς μάλα πολλὰ
        πλάγχθη, ἐπεὶ Τροίης ἱερὸν πτολίεθρον ἔπερσεν·
        πολλῶν δ᾽ ἀνθρώπων ἴδεν ἄστεα καὶ νόον ἔγνω,
        πολλὰ δ᾽ ὅ γ᾽ ἐν πόντῳ πάθεν ἄλγεα ὃν κατὰ θυμόν,
        ἀρνύμενος ἥν τε ψυχὴν καὶ νόστον ἑταίρων.
        ἀλλ᾽ οὐδ᾽ ὣς ἑτάρους ἐρρύσατο, ἱέμενός περ·
        αὐτῶν γὰρ σφετέρῃσιν ἀτασθαλίῃσιν ὄλοντο,
        νήπιοι, οἳ κατὰ βοῦς Ὑπερίονος Ἠελίοιο
        ἤσθιον· αὐτὰρ ὁ τοῖσιν ἀφείλετο νόστιμον ἦμαρ.
        τῶν ἁμόθεν γε, θεά, θύγατερ Διός, εἰπὲ καὶ ἡμῖν.

        AND THE MODERN GREEK TRANSLATION

        Τον άντρα τον πολύπραγο τραγούδησέ μου, ω Μούσα,
        που περισσά πλανήθηκε, σαν κούρσεψε της Τροίας
        το ιερό κάστρο, και πολλών ανθρώπων είδε χώρες
        κι έμαθε γνώμες, και πολλά στα πέλαα βρήκε πάθια,
        για μια ζωή παλεύοντας και γυρισμό συντρόφων.
        Μα πάλε δεν τους γλύτωσε, κι αν το ποθούσε, εκείνους,
        τι από δική τους χάθηκαν οι κούφιοι αμυαλωσύνη,
        του Ήλιου του Υπερίονα σαν εφαγαν τα βόδια,
        κι αυτός τους πήρε τη γλυκιά του γυρισμού τους μέρα.
        Απ΄ όπου αν τα ΄χεις, πες μας τα, ω θεά, του Δία κόρη.

        Where is the Albanian in the Ancient text?
        Where is the total difference between the Ancient and the Modern Greek text that you believe in?

        Stop confusing the others with lies.

  241. Il CAPO said

    That’s not representative. Catalan is much more similar to French than Spanish.

  242. Don Baragiano said

    SRD = Sardinia ?

  243. Irhad Babic said

    And what about Bosnian in Slavic group?

  244. George said

    Greek: there is something mystical and powerful in seeing my language stand alone in the center! One of the most ancient languages still spoken!

  245. Salvo said

    http://www.loc.gov/standards/iso639-2/php/code_list.php

  246. Pierre-Olivier said

    SRD stands for Sardinian. BOK could be Bornholmsk: a mixed Danish-Swedish language spoken on a small island, Bornholm.

  247. Allan Lachlan said

    Where’s Turkish, Azeri, Maltese, Armenian and Georgian?

  248. erhan said

    no turkish?

  249. reosarevok said

    BOK would be Norwegian Bokmal I assume, and SRD is probably Sardinian

  250. Martin said

    Basque is a language isolate (not related, or not demonstrated to be related, to any other language), so it wouldn’t figure in this representation.

  251. Ilkka Salo said

    Where are the other Finno-Ugric languages (Moksha, Mari, Komi and so on…)? They are printed, have newspapers, are widely spoken, and even have universities.

  252. Altin said

    Thank you, Tim. Lonely but proud to be Albanian.

  253. Me and a couple of friends of mine have been digging around this chart for a bit after one of us wondered what “Ir” in the language codes might mean. Right now, even though one of us reads Ukrainian (which seems to be the language the work is written in, instead of Russian), we haven’t been able to find the source data for the graph. A version of it does seem to appear in the background of the cover of the print edition of the book, but nothing else has surfaced. More than a few people who know something about their linguistics have also noted highly suspicious omissions in lexical overlap, and a weird absence of certain languages overall, like Russian, minority Finno-Ugric languages and Turkish.

    Thus, could somebody help us trace down where the chart *precisely* came from, what its underlying data sources are, and how precisely it came to be associated with a work in linguistic *meta*theory which doesn’t appear to deal in this kind of lexical minutiae at all?

  254. Sr is Sorbian – two Slavic minority languages spoken in Eastern Germany

  255. Carlos Bravo Villalba said

    Where’s Basque?

  256. @Laura Blumenthal:
    SRD – sardinian
    BOK – bokmal

  257. @Laura Blumenthal, BOK is Norwegian (apparently there are two different written forms of Norwegian, one of which is called Bokmål, who knew?) and I’m assuming SRD is Sardinian. Because yes, I am a nerd.

  258. […] Writer and self proclaimed “etymologikonoclast” Teresa Elms of Eytmologikon answers this question in a blog post. […]

  259. armand said

    Albania the root

  260. For Piotrek and Laura Blumenthal:

    BOK – Bokmål (a variant of Norwegian)
    SRD – Sardinian
    SR – Sorbian

    • ALBANIAN LANGUAGES
      Alb – Albanian

      BALTIC LANGUAGES
      Lat – Latvian
      Lit – Lithuanian

      CELTIC LANGUAGES
      Bre – Breton
      Ga – Gaelic
      Ir – Irish
      We – Welsh

      FINNO-UGRIC LANGUAGES
      Est – Estonian
      Fin – Finnish
      Hun – Hungarian

      GERMANIC LANGUAGES
      Bok – Bokmål
      Dsh – Danish
      Dut – Dutch
      Eng – English
      Fa – Faeroese
      Fri – Frisian
      Ger – German
      Ice – Icelandic
      NN – Nynorsk Norwegian
      Swe – Swedish

      GREEK LANGUAGES
      Grk – Greek

      ROMANCE LANGUAGES
      Cat – Catalan
      Fre – French
      Glc – Galician
      Ita – Italian
      Por – Portuguese
      Pro – Provençal
      Rm – Romansh
      Rom – Romanian
      Spa – Spanish
      Srd – Sardinian

      SLAVIC LANGUAGES
      Bul – Bulgarian
      Bur – Belorussian
      Cro – Croatian
      Cze – Czech
      Ma – Macedonian
      Pol – Polish
      Rus – Russian
      Slo – Slovenian
      Sr – Sorbian
      Srb – Serbian
      Svk – Slovakian
      Ukr – Ukrainian

      There are a lot of comments asking about Basque. Since it isn’t related to any known language, I’m sure that’s why it hasn’t been included. I would guess, however, that there may still be a measurable lexical distance between Basque and its geographic neighbors (such as Spanish and Galician).
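
      For those who asked for standard codes, the legend above can be sketched as a lookup table from the chart's abbreviations to ISO 639-1 codes. This is only a rough mapping under a few assumptions that do not come from the chart's author: "Ga" is read as Scottish Gaelic (since Irish is listed separately), "Fri" as West Frisian, and "Pro" (Provençal) is filed under Occitan. Sorbian has no two-letter code, so it maps to None. The names CHART_TO_ISO and iso_code are made up for this illustration.

```python
# Chart abbreviation -> ISO 639-1 code (None where no two-letter code exists).
# Assumptions (not from the chart's author): "Ga" = Scottish Gaelic,
# "Fri" = West Frisian, "Pro" (Provençal) filed under Occitan.
CHART_TO_ISO = {
    # Albanian, Baltic
    "Alb": "sq", "Lat": "lv", "Lit": "lt",
    # Celtic
    "Bre": "br", "Ga": "gd", "Ir": "ga", "We": "cy",
    # Finno-Ugric
    "Est": "et", "Fin": "fi", "Hun": "hu",
    # Germanic
    "Bok": "nb", "Dsh": "da", "Dut": "nl", "Eng": "en", "Fa": "fo",
    "Fri": "fy", "Ger": "de", "Ice": "is", "NN": "nn", "Swe": "sv",
    # Greek
    "Grk": "el",
    # Romance
    "Cat": "ca", "Fre": "fr", "Glc": "gl", "Ita": "it", "Por": "pt",
    "Pro": "oc", "Rm": "rm", "Rom": "ro", "Spa": "es", "Srd": "sc",
    # Slavic (Sorbian has only three-letter codes, so no ISO 639-1 entry)
    "Bul": "bg", "Bur": "be", "Cro": "hr", "Cze": "cs", "Ma": "mk",
    "Pol": "pl", "Rus": "ru", "Slo": "sl", "Sr": None, "Srb": "sr",
    "Svk": "sk", "Ukr": "uk",
}

# Case-insensitive index, since the chart mixes "BOK", "Bok", "SRD", "Srd".
_LOOKUP = {abbrev.lower(): code for abbrev, code in CHART_TO_ISO.items()}


def iso_code(abbrev):
    """Return the ISO 639-1 code for a chart abbreviation.

    Raises KeyError for abbreviations that are not in the legend;
    returns None for legend entries without a two-letter code.
    """
    return _LOOKUP[abbrev.lower()]


print(iso_code("BOK"), iso_code("Srd"), iso_code("Sr"))
```

      Unknown abbreviations raise an error rather than returning None, so a missing legend entry is not confused with a language that simply lacks a two-letter code.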

  261. Star said

    I love how Hun is way off in the corner, farther from the other Finno-Ugric languages than Celtic is from English or Romance, but somehow people keep lumping them together.

  262. […] Distance Among the Languages of Europe” http://elms.wordpress.com/2008/03/04/ … via […]

  263. Patrick said

    “but it did not ADD French grammar”

    It did in my brain. Living in France for a few years will do that to you.

  264. Laura Blumenthal – SRD is Sardo, the Sardinian tongue, which my brother-in-law speaks – he describes it as a cross between Italian and Catalan, and when we were in Barcelona he had no problem being understood or understanding, despite never having been to Spain or spoken Spanish / Catalan before. BOK is Bokmål, not a type of Danish as you might think from the map, but the official written form of Norwegian (NN is Nynorsk, the purest or purist form of Norwegian). As in any post-colonial situation there are as many arguments around as you care to find! A really interesting depiction, I must say. Noted the lack of Basque, but again it is unconnected to any other language, so maybe it would have felt a bit lonely sitting there with no lines connecting it to anyone else. And I notice the entire Scots language family is missing from the Germanic family too, which would doubtless infuriate our own Ulster-Scots Agency were any of them actually interested enough in language or linguistics to check this diagram out…

  265. Piotrek said

    How did you prepare this chart? Is there a tool for it?

  266. BOK should be bokmal (a Norwegian dialect), SRD is Sard, the native language of Sardegna.

  267. tolkarover said

    Anyone know how to get the raw data for this chart? I would love to run different clustering and dimensionality reduction algorithms on it.

  268. Wayne Johnson said

    Laura B, I’m guessing bokmal (Norwegian) and Sardinian.

  269. Lee Friedman said

    Basque, Maltese, and Turkish certainly qualify as European languages as much as some of the others on the graph. However, they are outliers with regard to lexical distance, which seems to be the primary characteristic depicted. They could be included as circles on the outer edges. The graph is also not restricted to Indo-European languages, nor to those written in the Roman alphabet.
    There is another omission that hasn’t yet been mentioned: Yiddish is a European fusion language closest to the Germanic group but with significant Slavic and Romance components as well as Semitic components and it is usually written in a Semitic alphabet. There is even empty space available in the graph between Germanic and Slavic (closer to Germanic) with room for links to Italian and French.

  270. Didier Suomi said

    Bokmål, a standardised Norwegian, and Sardu, as spoken in Sardinia

  271. hecate said

    SRD probably stands for Sardinian. NN is Nynorsk Norwegian and BOK is Bokmål Norwegian.

  272. I think it’s also important to note what the contribution of a given language can be to the Overall European Babel… If one would like to find as many words as possible for sports, England will be the best supplier. Likewise for winds, sailing, rail travel, politics, pop music… The word power of English words must be recognised walking along the streets of London, the largest village of the globe, reading the names of some streets. Oxford Street, Haymarket, Threadneedle Street (in the very City)… These names and plenty would sound funny if translated into other languages… And let’s stop here. In some languages the team of best players playing versus the team of a specific country is called the World’s Selection… The English simply call it The Rest of the World… Dignity.

  273. These names and plenty more…

  274. Yes, too bad that there is no legend. I think Sr must be Sorbian, Srd Sardinian. What is Bok and Dut though? And where is Sami?

  275. Elenor said

    BOK and NN are Bokmål and Nynorsk, the two varieties of the Norwegian language. The former is basically Danish with some Norwegian localisms. The latter is the standardized version of the old dialects which maintained a more thorough West Norse character.

    SRD is presumably Sardinian, well regarded for its conservative features preserving aspects of Roman Empire era vulgar Latin.

  276. I can’t find turkish…?

  277. andreas said

    basque does not have ties to any of those languages and probably that is why it is not charted

  278. What about Euskera (Basque), one of the oldest languages in Europe and one of the few that doesn’t come from the Indo-European languages?

  279. Tonje Folkestad said

    BOK is Norwegian bokmål, (lit. “Book language”), i.e. The variety write by the majority of Norwegians. NN then refers to Norwegian nynorsk (“new norwegian”), a written form developed In the 1800s based on rural dialects.

  280. Tonje Folkestad said

    *written* by the majority of Norwegians… :-)

  281. Hanna said

    Please add a legend, there are plenty of abbreviations I cannot interpret without one..

  282. Jessica said

    BOK – bokmal Norwegian
    NN – Nynorsk (new norwegian)

  283. Steve said

    I could just add one thing to this.
    There is no such thing as Finno-Ugric languages. There has never existed such a people as the Ugric; no remains and no archaeological or genetic evidence has ever come to light that would support this. It is merely a theory that was invented by the Hapsburgs in the 18th and 19th centuries for political purposes (to discredit Hungarian history).
    The truth is, as recent genetic experiments reveal, that the Hungarians are the original population of Europe, which survived in the very heart of the continent. Hungarian is so different from all other languages because of its age: it’s a neo-paleolithic language, a living fossil, that pre-dates all the Indo-European languages around it. Ancient Sumerian (Mesopotamian) texts and Etruscan scriptures can still be understood, in traces, even in modern Hungarian. Moreover, the first human writing ever, the so-called “Tartaria Tablets” from 5200 BC, is written in Hungarian runic script. They were found in the valley of the River Maros by the female archeologist Torma in Transylvania (then Hungary, now under Romanian occupation). Texts written in Hungarian runic script have allegedly also been found in the Bosnian pyramids in Visoko, Bosnia.

    • bulibashescu said

      “Under Romanian occupation” :))))) Yeah, right, Pista, since 75% of the population is Romanian, and the historical percentage of the Romanian population ranged from 62% to 53.8% (source: http://en.wikipedia.org/wiki/Transylvania#Population). Get your facts straight, Pista, before you start posting your nationalistic crap here

      • Gabriela said

        And think of the millions of Romanians killed by the Austro-Hungarian Empire and those before them. So, the Hungarians lived here thousands of years BC and, after going we don’t know where, they came back in about the 11th century. BUT hey, no Hungarian was still living there… HMM … Logic… Yeah…

  284. John Smith said

    BOK == Bokmål, one of the two written variants of Norwegian.
    SRD == Sardinian, the language of Sardinia.

    Or so I think, based on the correlation to other stuff.

  285. Leif Pareli said

    The Sami languages (some ten or so) should have been up there near Finnish somewhere. Three of them are official languages in Norway.

  286. Ferko Mrkvicka said

    (My guesses)

    SRD – Sardinian
    BOK – Bokmal (a variant of Norwegian)
    NN – Nynorsk (another variant of Norwegian)
    Sr – Sorbian (languages of the Sorbs)

    Basque is not present because it is not even remotely related to any other known language :)

  287. Franz said

    Yeah… what about basque?

  288. Marc Costa said

    Catalan has 7 million speakers and it is quite closely bound to French and Portuguese, especially Portuguese… So Catalan is quite wrong there

  289. Fran said

    The graphic should be three-dimensional: horizontal distance should show the common origin of a word (for example, agua in Spanish and eau in French both come from aqua in Latin), and vertical distance how far the words are from the original (close for Spanish, far for French). In that case, on the horizontal axis, French will be closer to Spanish (both Gallo-Iberian languages) than Italian, but Italian will be closer in total. Another example: camino in Spanish and chemin in French come from Celtic (camminus), while via in Italian comes from Latin via.

  290. Jarmo L said

    Sr probably means Sorbian, Srd means Sardinian, I believe. BOK means Bokmål, one of the two official languages of Norway, and NN means Nynorsk, the other one. And yes, where is Basque? It would be interesting to see what methods have been used. JL

  291. Fran said

    Probably a lot of English words are of Latin origin (“to continue”, for example), but the most commonly used ones are Germanic (“to go on”, for example). If you count the number of different entries in a dictionary, English looks Latin, but when you count all the words, including repetitions (like the word “the” or “of”), in most English texts (other than scientific ones), it is definitely a Germanic language.
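
    The difference between counting dictionary entries and counting running words can be sketched in a few lines. This is a toy illustration only: the ORIGIN table is a small hand-tagged vocabulary and the sample sentence is made up, so the numbers say nothing about English as a whole. The function name latinate_shares is likewise hypothetical.

```python
from collections import Counter

# Hand-tagged origins for a toy vocabulary (illustration only, not real data).
ORIGIN = {
    "the": "Germanic", "of": "Germanic", "to": "Germanic", "and": "Germanic",
    "go": "Germanic", "on": "Germanic", "land": "Germanic",
    "people": "Latinate", "continue": "Latinate", "use": "Latinate",
    "language": "Latinate",
}


def latinate_shares(text):
    """Return (share of distinct words, share of running words) tagged Latinate."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    # "Dictionary" view: every distinct word counts once.
    type_share = sum(1 for w in counts if ORIGIN[w] == "Latinate") / len(counts)
    # "Running text" view: every occurrence counts, so "the" weighs heavily.
    token_share = (
        sum(n for w, n in counts.items() if ORIGIN[w] == "Latinate") / len(tokens)
    )
    return type_share, token_share


sample = (
    "the people of the land go on and on and "
    "the people continue to use the language of the land"
)
type_share, token_share = latinate_shares(sample)
print(f"Latinate share of distinct words: {type_share:.2f}")
print(f"Latinate share of running words:  {token_share:.2f}")
```

    Even in this tiny sample the Latinate share drops once repetitions are counted, because the short Germanic function words recur far more often than the Latinate content words.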

  292. Luljeta Koshi said

    Would you mind adding Bosnian language to the Slavic group of languages next to Croatian and Serbian? Thank you very much.

  293. BN said

    The statement “All the groups except for Finno-Ugric (in yellow) are in turn members of the Indo-European language family” is a bit of a myth, isn’t it? I attended a national history seminar in Finland last year where this “fact” was at least contested. Certainly Finnish is incomprehensible to speakers of the other Scandinavian languages (unlike Swedish, Danish and Norwegian, whose speakers can communicate with each other quite easily with only a little training), but when you analyze it word by word, you find lots of bits in common. Finnish is also a so-called case language (kasus), unlike any other Scandinavian language, but it has this in common with Russian, German and Italian, to name a few other, indeed Indo-European, languages. Finnish grammar isn’t alien in the way that, for instance, East Asian languages are. Geographically it doesn’t make sense that one language group, Finno-Ugric, should remain as a vertical island in almost mid-Europe, unaffected by the huge movement of people from east to west over the last few thousand years. For Basque, it makes a little more sense that it could remain a language island, as it lies on the outskirts. As in other “young” countries, like Norway, there was a process in Finland starting in the second half of the 1800s in which the nation’s own language was constituted, and in which it was indeed important to stress the difference from others. This can explain the very different spellings of some words that have the same origin as the corresponding terms used in, for instance, Sweden, the arch enemy. This constructed difference / national myth isn’t as important now.

  294. Altay said

    This chart is totally meaningless! The author should have considered the issue from the perspective of origin and reality rather than political considerations! Example: in Russia alone there are speakers of so many different languages; take the various Ugric languages. Altaic family: Azerbaijani, Tatar, Bashkir, Volga Bulgarian, Karachay, Balkar, Kumyk, Nogai. If he thinks that the large space of Russia is only about the Russian language, then his paper loses its value.

    What about Chechen, Ingush, Adyghe, Avar and other languages?

    What about the South Caucasian languages?

  295. Neil said

    SRD is probably Sardinian. And BOK would be bokmal, one of the varieties of Norwegian.

  296. Luisa said

    “BOK” must be Bokmal, one of the two versions of Norwegian. And “SRD” – Sardinian?
    (Just hypotheses)

  297. Søren Bentzen said

    BOK is Bokmål, which is one of the two official languages in Norway. The other one is Nynorsk (NN). http://en.wikipedia.org/wiki/Bokm%C3%A5l

  298. Jan said

    Laura Blumenthal, I assume BOK is Norwegian Bokmål and SRD is Sardinian.

  299. Tina said

    SRD – Sardinian
    BOK – Bokmål (one of the official Norwegian languages) (NN – nynorsk/new Norwegian)

  300. Brian Kane said

    I’m guessing we’re “Germanic” due to the vast hordes of furrinners who were “Viking” around the area way back when.

  301. Natalie said

    I’m surprised. Ukrainian and Hungarian somehow connected? They don’t have a single common word! And yes, where is Basque? Even if it is not connected to any language, it should be on the map.

  302. BOK is standard Norwegian, which is based on Danish. NN is “Nynorsk”, New Norwegian, based on old west-coast Norwegian dialects. Plattdeutsch (Low German) is missing. Modern German is not like Low German, which can often be understood by a Swede if spoken. It provides a link between the Scandinavian languages and Frisian, Dutch and German. The three existing Lapponian (Sami) languages are missing as well. SRD must be Sardinian.

  303. Sarah said

    Would be helpful to have a broader definition of “lexical distance”? What’s the math behind this calculation?

    A legend wld help me as well…

    Thx a lot, if available…

  304. Jo Gessner said

    BOK is for Bokmål, which is an official written standard for the Norwegian language and spoken by ca 90% of the population. SRD is for Sardu (Sardinian language). SR maybe for Sorbian?

  305. Frode said

    Laura: My guess is that BOK is Bokmål Norwegian and NN is Nynorsk Norwegian. Also guessing that SRD is Sardinian.

  306. Bulgarian and Macedonian must be in the same circle. English and Scottish English are more different, than Bulgarian and Macedonian ;-)

    • Aggelos said

      What they baptised as Macedonian is Slavic. The true Macedonians spoke, speak and will keep speaking Greek.

  307. Web Owl said

    Strange that Polish would have greater lexical distance to German than to Lithuanian – doesn’t seem right

  308. Anne said

    SRD is sardinian and I guess BOK is a kind of norwegian. http://en.wikipedia.org/wiki/ISO_639_macrolanguage

  309. Why is Basque not shown? Is it because it is an ‘Out-On-Its-Own’ language unrelated to anything in Europe and even in the rest of the world? I do like this chart though.

  310. It’s nonsense that Albanian is closer to Slovenian than to Romanian. It is much closer to the Romance languages and has no connection at all with Slovenian(?!), because Slovenians never came into contact with Albanians at any point in history. On the other hand, “modern” Albanian borrowed many words from Serbo-Croatian, so I think this is where Albanian should be connected to the Slavic language group.

    • James said

      Goran, for the sake of the truth, neither “modern” Albanian nor the old one borrowed any words from the Slavic language group (Albanians didn’t come from the Carpathian mountains like Serbs), so please do some research and read more before you make any comment.

    • Cunnilinguist said

      I’m from Slovenia and I couldn’t agree with you more!

    • Antonio said

      Scholars think that Albanian must be closer to Romanian due to the Thraco-Dacian-Illyrian connection, but never ever to the Slavic language group. Are you kidding?! Just because of some words borrowed due to the Slavic invasion?!
      I really don’t know on what basis the connections here are assumed, but if there is any scientific meaning I would say that maybe the Illyrian substrate makes Slovenian and Albanian closer. But I really don’t think there is any real research behind this map.

  311. I have trouble with the node “ROM” not being connected with the Slavic group.
    Romania has been under Soviet/Russian influence for several centuries.

  312. SRD stands for Sarda, language of Sardegna, I believe

  313. Wim Peters said

    Interesting!
    Since the original source was published in Russian, a short description of the methodology, in particular the computation of the lexical distance score would be helpful.

  314. petrudamsa said

    SRD – Sard (or Sardinian); some people regard it as a dialect of Italian, although it seems it is not.

  315. Balazs Kiss said

    Does the length of the different lines (“lexical distance”) also count? I do not quite understand why some lines (of the same dotting style) are shorter or longer…

  316. Martha said

    BOK would be Bokmål (spoken in Norway) I guess, and SRD Sardinian?

  317. damndyd said

    There are about 2 million Slovenian speakers, not over 3.1 million! And more importantly, in what way is Slovenian connected to Albanian?! I don’t think so!

  318. Sven Pin said

    Sr = Sorbian
    BOK = Bokmål (a form of Norwegian)
    SRD = Sardinian

  319. adriano said

    SRD might be Sardinian (which is a language, not an Italian dialect)

  320. Farid Belkhatir said

    Jonathon, Basque is not an Indo-european language, therefore it is not related to any of the cited languages. Laura Blumenthal, BOK should be Bokmal, the educated dialect of Norwegian, and SRD should be Sardinian.

  321. Tessa said

    I think SRD would be Sardinian; about BOK, no idea. Does anyone know “Pro”?

  322. Dearan said

    SRD – Sardinian I would think

  323. Piotr said

    @Piotrek Lower and Upper Sorbian from Lusatia. But how is Hungarian related to Lithuanian or Latvian? Ukrainian maybe, thanks to Carpathian Ruthenia, but it’s weak. It may rather have some connections with Turkish or German thanks to the historical domination of the Sublime Porte and the Habsburg Empire.

  324. Mats Sjöblom said

    BOK would be Norwegian Bokmål (an Eastern Scandinavian language actually based on Danish rather than spoken Norwegian, as opposed to Nynorsk, a Western Scandinavian language reconstructed from spoken Norwegian dialects in the 19th century), SRD would be Sardinian.

  325. curious bro said

    What is pro? :(

  326. Jeff said

    BOK = Norwegian Bokmal… NN = New Norwegian… What’s FRI?

  327. Viktorie said

    I really dig your concept, well done!!!! But please tell me, why is there no Bosnian? Don’t get me wrong, I’m neither angry nor Bosnian, but I just wonder, because I’ve been studying South Slavic languages and I know for sure that Croatian, Serbian and Bosnian may all be very similar languages in general principle, but since this should be the lexical distance, and there are a lot of words of Turkish descent that Bosnians use and the other two groups don’t, I’d assume Bosnian would have its place. Thank you for the answer!

  328. Emanuele said

    There should definitely be a direct link between Italian and Greek, even displaying little or medium distance…!

  329. eva said

    Srd is probably Sardinian, older language than Italian.

  330. MathieuB said

    Interesting!
    Yep, a legend would be handy.
    Also, knowing the official “major languages of Europe” chart, knowing where the considered languages start.
    best regards

  331. Yes, where *is* Euskara? I was surprised that Albanian, long touted as another isolate, is closer to Romance than Greek is to, say, Lithuanian. But it appears that Basque is literally “off the chart.”

  332. Fernando said

    I categorically disagree with the apparent lack of common vocabulary between PRO and CAT. They are so close we used to study their joint medieval literature together back at high-school.
    It is only through the Occitan linguistic continuum that CAT and FRE find themselves linked.

  333. Aggelos said

    You use words as “etymologikon”, “monosyllables”, etc. but you do not refer to the Hellenic (Greek is absolutely wrong term) language that is the base of all the others. Very “good” ANALYSIS (another Hellenic word in the English language)!!!!!!!!!!!!!!!!!!!

  334. Jens said

    @ Laura: I’m pretty sure BOK stands for the Norwegian variety Bokmål (as opposed to Nynorsk which was apparently abbreviated as NN). SRD is probably Sardinian.
    But a legend would certainly help, I agree! I’m wondering what PRO is supposed to be.

  335. […] this via Tumblr yesterday. It’s basically a map that depicts the vocabulary commonality across Indo-European […]

  336. Elisa Peresbarbosa said

    Do you know which software was used to build the graphical representation?

  337. What about the Sami languages? Seem to be all forgotten, just like the Basque.

  338. David Person said

    SRD = Sardinian, guessing that SR = Sorbian, and BOK I have no idea…

  339. David Graber said

    Why would Slovenian be lexically closer to Albanian than any other Slavic language? This is very curious.

  340. Antti Tarvainen said

    I would guess that SRD = Sardinian and BOK = Bokmål (Norwegian).

  341. N Kibre said

    Can we see the underlying data?

  342. Salvatore said

    SRD = Sardinian language

  343. vasdecabeza2 said

    Where are the asian languages? xD

  344. vasdecabeza2 said

    Where are the Asian and Arabic languages? xD

  345. I’m assuming SRD is Sardo, the language of Sardinia??

  346. cameron said

    @Laura Blumenthal
    I am assuming BOK= Bokmål –> NN= Nynorsk
    and SRD = Sardinian

  347. Tamas Mohai said

    BOK is Bokmal (Norwegian) perhaps, while SRD is Sardinian I guess

  348. I believe SRD is Sardinian. (Wikipedia: Sardinian (Logudorese: sardu/saldu, limba sarda, Campidanese: sardu/sadru, lingua sarda) is a Romance language spoken on most of the island of Sardinia (Italy). It is the most conservative of the Romance languages in terms of phonology and is noted for a Paleosardinian substratum.)

  349. orjanvilen said

    BOK must be Norwegian Bokmål and SRD must be Sardinian. But what is PRO and Rm?

  350. andrea said

    Sardinian is a (useful) language?

  351. Mario said

    Why is there
    1. no link between CZE and POL?
    2. no continuous line between RUS and BLR? (the Whiterussian language varies by just <10% from Russian)
    3. some more strange things in the Slavic corner :-o

    • Arturo Malvestito said

      Mario, are you sure about these 10%? Try speaking Belarusian to a Russian – no way they understand it. And do you really want to call Belarusians Whiterussians? It is offensive and historically not justifiable (no matter how Russians want you to believe in it)

  352. Tony Chapman said

    I noticed someone said “where is Basque on the chart”. Well, Basque is a “remnant” language and doesn’t really seem to fit anywhere

  353. Richard Wassell said

    Sr = Sorbian (not the same as Serbian). I think. Srd = Sard(inian). Bok = (Norwegian) Bokmål. Apologies if not correct!

  354. Albert said

    SRD refers to sardinian language. BOK, I don’t know.

  355. Paul said

    BOK – Norwegian Bokmal
    SRD – Sardinian, I guess?

  356. Jiles said

    Where is Latin? Arguably–German, English, French, Italian all get quite a bit of vocabulary from Latin!

  357. Dominika said

    Fri is Frisian. Srd (next to Italian) is Sardinian. Sr (between Polish and Czech) is Sorbian, which is spoken by a small Slavic group in East Germany. Bok (in Scandinavia) might be Bokmal, which is a dialect of Norwegian/Swedish? Not sure about that one.

    • Martin Ž. said

      Bokmål is the official Norwegian language, and it is practically the same as Danish because it was derived from it around the beginning of the 20th century, as far as I remember. Nynorsk is “New Norwegian”, which was developed by nationalists who didn’t like the feeling that their official language was based on that of a country that used to rule over their motherland. I guess it is official but sparsely spoken.

  358. If I had to guess, BOK and NN are dialects of Norwegian, Bokmal and NyNorsk

  359. tanja caric said

    SRD is Sardo, language on Sardinia, Italian island. BOK is Bokmal, Danish language.

  360. That’s Bokmål and Nynorsk … SRD must be Sardinian, Wikipedia estimates at ~1mil speakers in 2007.

  361. slobodan said

    Slavic Bosnia and Herzegovina

  362. Ali said

    Albania unical

    • Sorin Pop said

      Not only Albanian, Greek as well. But you only wanna see Albanian as unique, since I assume you are Albanian… By the way, I am neither Albanian nor Greek.

  363. […] leave you with an insanely exciting surprise gift: the lexical distance between the languages of Europe. If I knew how to say totally mind-blowing in Swedish, I would say it. So, click […]

  364. David Collins said

    Romance family: SRD would be Sardinian, wouldn’t it? ROM = Romanian, as you might expect, and Rm = Romansch. PRO = Provençal.
    Germanic family: BOK = Bokmål (“book tongue”), the standard classical Norwegian, whereas NN = Nynorsk.

  365. Georgios m said

    It is kind of mystical and very powerful that Greek, my language, is in the center of this. Alone, connecting to every group. It makes me feel proud to speak one of the most ancient languages in the world.