Universiteit Leiden


Research project

The Tocharian Trek

A linguistic reconstruction of the migration of the Tocharians from Europe to China

2018 - 2023
Michaël Peyrot

The migration of the Tocharians from Europe to China is one of the most disputed issues in the migration history of Eurasia. The key objective of this project is to provide an integrated assessment of the crucial but often neglected linguistic evidence. To this end, the project brings together a highly qualified team of doctoral and postdoctoral researchers along with world-eminent experts. Due to the contacts of Tocharian with other language families, The Tocharian Trek will be of essential importance for the linguistic and migrational prehistory of Eurasia as a whole.

The Tocharian Migration Hypothesis

Project description

The long trek of the Tocharians from Europe to China is one of the most disputed issues in the migration history of Eurasia. Tocharian is an extinct branch of the Indo-European language family, which includes, among others, English, Latin, Greek and Sanskrit. The Indo-European languages stretch in one uninterrupted belt from Ireland to the Bay of Bengal, but Tocharian, discovered in manuscripts from the Tarim Basin in Northwest China dating from c. 500–1000 CE, is a notorious exception to this geographic distribution.

The common ancestor of the Indo-European languages, Proto-Indo-European, can be hypothetically reconstructed and is often located in the east of present-day Ukraine. Therefore, speakers of early Tocharian must have made a long trek eastward before they settled in the Tarim Basin. Archaeological and genetic evidence suggests that they first moved east to southern Siberia around 3500 BCE, where they are to be identified with the archaeological Afanas’evo Culture, and then south to the Tarim Basin in China, where they may have arrived as early as 2000 BCE. The arrival of the Tocharians in the Tarim Basin is possibly linked to ancient corpses found there: the so-called Tarim Mummies.

Curiously, linguistic evidence has mostly been neglected. Therefore, The Tocharian Trek aims to provide an integrated linguistic assessment of the hypothesised migration route of the Tocharians. Languages preserve precious information about their prehistory through the effects of language contact. Through close scrutiny and periodisation of the different layers of contact of Tocharian and its prehistoric neighbours, the project will reconstruct the migration route of the Tocharians from the Proto-Indo-European homeland all the way to China.

Peyrot, Michaël. 2019. “The deviant typological profile of the Tocharian branch of Indo-European may be due to Uralic substrate influence”. Indo-European Linguistics 7: 72–121.
Peyrot, Michaël, Georges-Jean Pinault & Jens Wilkens. 2019. “Vernaculars of the Silk Road – A Tocharian B–Old Uyghur bilingual”. Journal Asiatique 307: 65–90.
Sikora, Martin et al. 2019. “The population history of northeastern Siberia since the Pleistocene”. Nature 570: 182–188.
Peyrot, Michaël & Guus Kroonen. 2019. “The formation of the Siberian linguistic landscape”. Section 9 (p. 132–148) of the Supplementary Information to “The population history of northeastern Siberia since the Pleistocene”. Nature 570: 182–188. 
Dragoni, Federico, Niels Schoubben & Michaël Peyrot. 2020. “The Formal Kharoṣṭhī script from the Northern Tarim Basin in Northwest China may write an Iranian language”. Acta Orientalia Academiae Scientiarum Hungaricae 73: 335–373.

The older phase of Proto-Indo-European is often dated to ca. 4500–3500 BCE, and the later phase to ca. 3500–2500 BCE. If early Tocharian is to be identified with the Afanas’evo Culture in southern Siberia, dated ca. 3500–2500 BCE, the Tocharian branch must have left the protolanguage relatively early. Indeed, it is often claimed that, after Anatolian, Tocharian was the second branch to split off from the protolanguage.

In this subproject, we investigate the position of Tocharian in the Indo-European language family: when did the Tocharian branch split off from the protolanguage, and are there any closer connections to other branches?

Compared to the reconstructed Indo-European protolanguage, Tocharian has undergone heavy structural change: the vowel system has changed considerably, the stop system has been reduced to only a voiceless series, and the noun has acquired agglutinative case inflexion.

In this subproject, we test the hypothesis that the deviant typology of Tocharian is due to substrate influence from Uralic. Contacts with Uralic appear to have taken place in southern Siberia, originally home to the Samoyedic branch of the Uralic language family, which provides the crucial link between the Tocharian language and the Afanas’evo Culture of southern Siberia.

Tocharian literary culture exerted a strong influence on the early phase of Old Uyghur (Old Turkic) Buddhism. Thus, it is not surprising that there are Tocharian loanwords in Old Uyghur. There are also indications of earlier contacts that predate this literary layer, but it is unclear where and when these earlier contacts took place.

In this subproject, we investigate prehistoric contacts between Tocharian and early Turkic on the basis of lexical borrowings. We also attempt to outline a scenario for the time and place of these contacts.

In the Tarim Basin, Tocharian has been claimed to have influenced the Middle Indian language Niya Prakrit, also called Niya Gāndhārī. However, this claim has never been widely accepted, and Niya Prakrit is known to have been influenced by other languages, notably Iranian.

In this subproject, we test the hypothesis that Niya Prakrit has been influenced by Tocharian. We also examine foreign influence on Niya Prakrit, in particular from Iranian.

Tracking the Tocharians

Tocharian has also been in heavy contact with Iranian languages. These contacts are not the topic of this project, but of a related research project called Tracking the Tocharians. In this NWO-funded project, Michaël Peyrot (Principal Investigator), Chams Bernard and Federico Dragoni investigate the contacts between Tocharian and Old Iranian and Middle Iranian languages.
