Research project

Tracking the Tocharians from Europe to China: a linguistic reconstruction

This project intends to provide an integrated linguistic assessment of the hypothesised migration route of the Tocharians.

2016–2021
Michaël Peyrot

Tocharian is known through manuscripts found in Northwest China that date from 500–1000 CE. It is an Indo-European language, related to, among others, Latin, Greek, English and Dutch. Since the Indo-European languages spread from a homeland in Europe, the speakers of Tocharian must have made a long trek from Europe to China. On the basis of their contacts with other languages, we aim to establish their migration route.

The Tocharian Migration Hypothesis

Tocharian is an extinct branch of the Indo-European language family, which includes, among others, English, Latin, Greek, Persian and Sanskrit. Tocharian was discovered in manuscripts from the Tarim Basin in Northwest China, which date from ca. 500–1000 CE. The Indo-European languages stretch from Ireland to the Bay of Bengal in one uninterrupted belt, but Tocharian, found in an isolated region of China, is a notorious exception to this geographic distribution. This project aims to find out how Tocharian ended up there.

The common ancestor of the Indo-European languages, Proto-Indo-European, can be hypothetically reconstructed, and its homeland is often located in the east of present-day Ukraine. Consequently, the ancestors of the Tocharian speakers must have made a long trek eastward before they settled in the Tarim Basin. Archaeological and genetic evidence suggests that they first moved east to southern Siberia in the 3rd millennium BCE and then south to the Tarim Basin, where they may have arrived as early as the 2nd millennium BCE. Their arrival in the Tarim Basin may be linked to the ancient human remains found there, the so-called Tarim Mummies.

This project intends to provide an integrated linguistic assessment of the hypothesised migration route of the Tocharians. If the Tocharians had already moved east to southern Siberia in the 3rd millennium BCE, they must have left the Proto-Indo-European homeland early. We will clarify the starting point of the Tocharian migration through a systematic analysis of the affinities of Tocharian with the other Indo-European languages.

Through the effects of language contact, languages preserve valuable information about their prehistory. By closely scrutinising and dating the successive layers of contact between Tocharian and its prehistoric neighbours, we aim to reconstruct the migration route of the Tocharians from the Proto-Indo-European homeland all the way to China.

Tocharian has been in contact with various Iranian languages. Some of these are attested, such as Bactrian, Sogdian and Khotanese, while others are not directly attested, so that Tocharian is an important, if not the only, source for these varieties. While the Middle Iranian languages Bactrian and Sogdian appear to have had a clear but smaller impact on Tocharian, an otherwise unattested Old Iranian variety exerted a more profound influence. This Old Iranian variety seems to have belonged to the Iranian steppe dialects that largely make up “Scythian”.

In this subproject, we examine the contacts between Tocharian and Old and Middle Iranian, with special focus on the otherwise unattested Old Iranian variety. A major goal is to establish criteria for carefully distinguishing between the different Iranian sources.

In the Tarim Basin, the oldest known linguistic neighbours of Tocharian are Khotanese and the closely related but only fragmentarily known Tumshuqese. Surprisingly, only a few instances of linguistic contact between Tocharian and Khotanese have been proposed, and many of these have not been widely accepted.

In this subproject, we examine the contacts between Khotanese and Tocharian, and investigate selected topics of Khotanese historical grammar.

Although Chinese influence in the Tarim Basin was not always strong, the greater Chinese presence, especially in the Hàn and Táng periods, has clearly left its traces in the Tocharian language. Because the phonetic interpretation of older Chinese is uncertain and Tocharian and Chinese phonology differ markedly in typology, contacts between the two languages are difficult to establish and to evaluate.

In this subproject, we examine contacts between Tocharian and older Chinese.

Alexander Lubotsky has attributed a body of vocabulary shared within Indo-Iranian to an otherwise lost language of the ancient Bactria-Margiana Archaeological Complex (BMAC) in present-day Turkmenistan and Afghanistan. These words cannot be reconstructed for Proto-Indo-Iranian because of irregular phonological correspondences and morphology. Tocharian, too, seems to have preserved some lexical items from this prehistoric source.

In this subproject, we examine contacts between Tocharian and the prehistoric BMAC language.

The Tocharian Trek

We originally planned to investigate a number of further topics within this project, but these have since become part of the related project The Tocharian Trek. In this ERC project, Principal Investigator Michaël Peyrot and collaborators Louise Friis, Stefan Norbruis, Niels Schoubben and Abel Warries investigate, among other topics, the phylogenetic position of Tocharian and the contacts between Tocharian and Uralic, Turkic and Niya Prakrit.

