The Iranian Language Family

The Iranian Language Family is part of the Indo-Iranian (or Aryan[1]) language group, itself the major eastern branch of the Indo-European languages. Comparative linguistics suggests a common ancestor for the Indo-Aryan languages of Northern India[2] and the Iranian languages. This common ancestor, known scientifically as Indo-Iranian, is not attested in any form, oral or inscribed. However, a comparative study of the earliest forms of the Indo-Aryan and Iranian languages, Vedic Sanskrit and Gathic Avestan respectively, allows conclusions to be drawn about its possible form.

Following a grand migration from their Central Asian homeland, large groups of Iranian tribes settled in various regions of the Iranian Plateau[3], while others continued to follow a nomadic lifestyle in the Eurasian Steppe. Our reconstruction of their shared characteristics (largely in the form of their languages) shows that each of these tribes shaped its own dialect of the original Iranian language and, in time, created the distinct languages grouped together as the Iranian Language Family (ILF). In this process, the languages of the earliest tribes to settle in the Iranian Plateau (Medes, Parthians, and Persians) came to form the “Western Iranian” branch of this family, while the languages of most of the nomadic tribes and a few who settled in Central Asia are categorised as “Eastern Iranian”. The latter branch is often characterised by more conservative features in both grammar and phonology. Linguists have further divided the Iranian languages into three stages: Old Iranian, Middle Iranian, and New Iranian[4].

Old Iranian Languages

This stage includes all of the Iranian languages from their earliest forms until roughly 300 BCE. The only language of this stage from which contemporary written examples are available is Old Persian. Compositions in Avestan, another language of this stage, were transmitted orally for many years and written down only during the Sasanian period (224-651 AD) or possibly earlier, although most likely in a fragmentary manner. A major language of the period, *Old Median, has so far yielded no written evidence, and this is unlikely to change. We know of the existence of *Old Sogdian and *Old Parthian only by inference from their known Middle Iranian forms.

Old Persian was the native language of the Achaemenid Emperors and at least one of the official languages of their empire. It can be studied from the inscriptions of the Achaemenid kings, such as the grand inscription of Darius the Great at Behistun, where the name of the language is given as Aria-, an obvious choice given the common use of the designation among both the Indic and Iranian branches. Like other Iranian languages, Old Persian did not initially possess a writing system. After the organisation of the Empire under the leadership of Darius the Great in the late 6th century BCE, the need for a native writing system became apparent. At this time, Old Persian cuneiform, based graphically on Elamite cuneiform but incorporating the alphabetic system of Aramaic[5], was created. Old Persian cuneiform was used exclusively for writing royal inscriptions, and no example of its everyday use has been found other than a few seal inscriptions. For administrative purposes, the Achaemenid emperors first used Neo-Elamite, soon replacing it with the Aramaic script and language and making Aramaic in effect the lingua franca of their vast empire, at least for the first half of its existence[6]. The few words we know from Old Median and Old Saka (mostly names and titles) are available through Old Persian inscriptions.

Avestan is the other Old Iranian language from which sufficient evidence is available. The language is called Avestan because the Avesta, the sacred book of Zoroastrianism, was composed in it; its native name is unknown. Avestan belongs to the Eastern branch of the Iranian languages and is the only language of this stage of Eastern Iranian known to us, apart from a few Saka words.

Avestan is often divided into two (or sometimes three) stages, based on the philological study of the various texts. The earliest stage, often called Old Avestan or Gathic, consists of the 17 poems of the Gathas and the devotional piece known as Yasna Haptanghaiti, both currently forming the central part of the Avestan hymn corpus known as the Yasna (“devotional hymns”). It is commonly believed that Zarathushtra (Zoroaster)[7] himself composed the Gathas, while the later parts were composed over a span of time that might have lasted until the 4th century AD, reflecting the changes in the language. Old Avestan is closely related to Vedic Sanskrit, to the point that the grammar and even the stylistic features of the two languages often match. It has been dated, mainly through comparative philology, to the period between 1300 and 1000 BCE.

The second stage of Avestan is known as Younger Avestan (YAv.), which represents a later development of the language and might have been a living language down to the seventh century BC. The bulk of the Avesta is written in this language, including the rest of the Yasna, the Yashts, and parts of the lesser-known book of Visperad. Despite their late date, parts of the Younger Avestan texts, including the Hom Yasht (“the hymn to Haoma”), are thought to be based on pre-Zoroastrian compositions in honour of various deities, and thus reflect a stage of Iranian religion prior to Zarathushtra.

Scholars sometimes suggest the existence of another stage of the Avestan language, often labelled “pseudo-Young Avestan”. This is assumed to be the language used for the composition of the Vidaevdat (“Anti-Demonic Law”), an important part of the Avestan corpus and the basis of much of Zoroastrian ritual practice. The corrupted grammar and morphology of the text, as well as its obvious attempts at archaism, seem to betray it as a very late text, maybe even later than the fourth century BC, when Avestan was effectively dead as a spoken language.

It is widely accepted that, in keeping with the Indo-Iranian oral tradition, the contents of the Avesta were carried orally through the ages. This was done by the priests, who spent long periods of time memorising every verse of their holy book using special mnemonic techniques. From the testimony of the Avesta itself, we know that at least parts of it were written down in various scripts during the later Parthian period (probably the first century AD), and possibly even earlier. However, a final compilation of the sacred texts took place under the Sasanians (224-651 AD), using an alphabet based on the Pahlavi script but carefully constructed for a very accurate representation of the sounds of the Avestan language.

Middle Iranian Languages

These include many languages which were spoken from the borders of China to the heart of the Fertile Crescent from the 3rd century BC to almost the 10th century AD. The dominant language of this stage was initially Parthian (Pahlawanig) and later Middle Persian, a dialect of which was commonly known as “Pahlavi”. The stage also included important languages of Central Asia, such as Sogdian, Khwarazmian, Bactrian, and Saka, which in time were replaced by either Persian or the languages of conquering Turkic tribes.

Parthian rose in importance with the founding of the Parthian/Arsacid dynasty in the second century BC, which carved out a kingdom, and later an empire, for itself from the former Achaemenid lands conquered by Alexander. Parthian was the easternmost of the Western Iranian languages, a position that in many cases resulted in the preservation of archaisms, particularly in morphology. It was written in a script based on Aramaic and incorporated heterograms[8]. Our main contemporary sources for studying the Parthian language are the few remaining inscriptions and ostraca from Nisa[9] and Hecatompylos[10]. However, a great portion of our available Parthian material comes from the early Sasanian period in the form of Manichaean compositions and pieces such as the Parthian version of Shapur’s inscription, and as such represents a very late stage of this language. Manichaean texts used a special script, reportedly created by Mani himself, which did not use heterograms.


Middle Persian was initially the language of the province of Pars (Persis), the heart of the Achaemenid (and later Sasanian) empires, and a semi-autonomous kingdom during the Parthian period. It represents a development of the Old Persian of the Achaemenid royal inscriptions or one of its close dialects. In 224 AD, Ardashir, the local ruler of Pars, deposed and replaced Artabanus IV (Parth. Artawān), the last Parthian Emperor, and founded the Sasanian Empire. The language of the new rulers, Middle Persian, thus became the dominant language of the empire. Much of our knowledge of Middle Persian comes from Zoroastrian exegesis written on the Avestan texts, as well as a few secular literary and sometimes historical-legendary pieces. These tended to preserve a highly stylised and conservative version of the language and, as such, show little change throughout Sasanian times. However, through grammatical and literary mistakes, we can get a glimpse into how the language was changing from Middle Persian to Classical New Persian. It seems that at least from the beginning of the sixth century AD, the spoken version of the language had already become what we know as the most archaic form of New Persian. At the same time, Middle Persian and all its archaisms survived as a literary language even after the fall of the Sasanians to the Muslims, well into the ninth century and possibly even later.

Middle Persian was written mainly in the Pahlavi alphabet, which was derived from Aramaic much like the Parthian script and also integrated heterograms. Middle Persian documents written in the purpose-made Manichaean script are of utmost importance for the study of the language. Because it avoids heterograms, Manichaean Middle Persian helps us understand much of the actual language, while its insistence on recording common speech helps us learn much about how the language was used and pronounced.

Sogdian, among the less dominant Middle Iranian languages, was surely the most influential because of its important role as a mediating language of the “Silk Road”. Sogdian merchants, who dominated the trade between China and Central Asia, and even between China and India, steadfastly preserved their language, even when they settled permanently in other lands such as China. Structurally, Sogdian is notable for its preservation of Old Iranian archaisms, including much of the morphology and many of the grammatical structures of Old Iranian. Most of our sources for studying Sogdian are the inscriptions and administrative correspondence discovered at Mt. Mugh near Panjikent in Tajikistan, as well as some Manichaean and Christian texts found during the discoveries at the great monastery of Turfan (now in Chinese Turkistan). Among these are fragments of Iranian legends known better in their New Persian forms, as well as fascinating fables and stories which provide us with a window onto the rich culture of Sogdiana and its language.

After Islam, either Persian or various Turkic dialects slowly replaced Sogdian as the dominant languages of Central Asia. There are, however, attestations of Sogdian words in the works of post-Islamic authors and poets, as well as in dictionaries, showing the survival of the language for a few centuries after Islam. Today, a local language of the Zarafshan River Valley, called Yaghnobi, is thought to be the modern development of a Sogdian rural dialect. Sogdian was also written in a script derived from Aramaic which used heterograms, while the Manichaean script was occasionally used for recording religious texts in the language.

Khwarazmian was a close relative of Sogdian and an important trade and scientific language, spoken primarily in the lower Oxus River region known as Khwarazmia (alt. Choresmia). Evidence of Khwarazmian can be found in the works of the astrologer Biruni[11], and also in the Khwarazmian-Arabic dictionary of Zamakhshari[12], which testify to its continuing importance in Central Asia after the conversion of that region to Islam. We know very little of Khwarazmian prior to Islam, but a few inscriptions lead us to believe that it was an important language written in another script derived from Aramaic. After Islam, Khwarazmian developed a script based on the Arabic one, in which special signs were provided to represent peculiar Khwarazmian sounds. This seems to have given the language a newfound energy that guaranteed its survival until the end of the first millennium AD and even beyond, although like Sogdian it seems to have eventually lost its position to Persian and then Turkic.

Bactrian, another member of the eastern branch of the Iranian languages, was the language of the province of Bactria in eastern Iran (the present-day province of Balkh, Afghanistan). Following the conquest of the Achaemenid Empire by Alexander, a Greek kingdom was established in Bactria, deeply influencing the culture of the region. In subsequent centuries, Tocharian and Saka tribes caused the demise of the Greco-Bactrian kingdom and established the powerful kingdom of Kushan. In turn, the Kushans adopted the local Bactrian language for their kingdom and took it all around their area of influence, down to northern India (Taxila). After Islam, New Persian gradually replaced Bactrian, a fate it shared with other Central Asian languages. Bactrian is known both from inscriptions of Kushan kings such as Kanishka I and from texts such as contracts and land deeds that have been recently discovered. As such, it is fast becoming one of the best-known languages of the Eastern Iranian branch. It was written in an alphabet derived from Greek, and even after the invasion of the Kushan lands by the Sasanian emperors, the Greek alphabet continued to be used for writing the Bactrian language.

Saka is the general name for a group of closely related dialects of an Eastern Iranian language. No written evidence of *Old Saka, save a few names and terms in Old Persian and Greek, has so far been discovered, but its various Middle Iranian dialects suggest a common ancestor. Since different Saka tribes occupied vast territories from Chinese Turkistan to Eastern Europe, we always have to keep in mind the existence of the various dialects of Saka. For example, in the northern Black Sea region and the Caucasus, later descendants of the Saka tribes such as the Alans[13] probably spoke a language we call Alano-Saka. In other places, like the eastern Caspian region or present-day Chinese Turkistan, they spoke various dialects such as Khotanese Saka and Tumshuq Saka. Nevertheless, it is assumed that all of these dialects were more or less mutually intelligible, probably due to the nomadic lifestyle of their speakers, which increased contact among the various tribes. Eastern dialects of Saka, most prominently Khotanese, are known to us via various religious texts, often Buddhist in nature, reflecting the fact that the Sakas had largely converted to that religion.

Today, a modern development of Alano-Saka, called Ossetic, is spoken in two dialects by a small group of people in Southern Russia[14] and is written in the Cyrillic alphabet. At least one language of the Pamir region, called Wakhi, can be credited with a Saka substratum, as can another known as Ishkashimi. Saka was originally written in different scripts, including the Kharoshthi script of northern India and eventually a script derived from Brahmi. Modern descendants of the Saka languages often lack a writing system or are written in the Perso-Arabic and Cyrillic alphabets; some pieces have also been recorded by modern philologists in a modified Latin script.

New Iranian Languages

This period is marked by the rise of Islam in the former Sasanian lands and the influence of foreign languages such as Arabic and Turkish on the Iranian languages. In this period, New Persian developed into the most widely spoken language of the Iranian family, followed by Kurdish, Pashto, and Baluchi.

New Persian, in its earliest forms, was probably the spoken language of the majority of the population of the Sasanian Empire from the sixth century onwards. After the demise of the Sasanian Empire, great numbers of Arabic-speaking tribes migrated to the newly conquered Sasanian lands and influenced the local languages, including Persian. In turn, the Arabic-speaking bureaucracy was itself influenced by the Sasanian administrative system: it used Persian as its primary language in its earlier stages and adopted a great many Persian administrative terms.

Persian itself became the common language of the converted Muslim population of Iran, who, by moving into the areas on the Sasanian periphery, or even those outside its immediate control such as Sogdiana, eventually came to dominate them and establish New Persian as the native language of these regions. Early uses of Persian in literature and everyday speech are known from accounts such as Chahar Maqale and the geography of Ibn Khurdadhbeh, as well as from short pieces written in the Hebrew script, known as Judeo-Persian. New Persian, called “Classical” New Persian in this period, thus began a Golden Age under the patronage of the Samanid dynasty of Transoxiana in the 10th century AD. Adopted as the language of the administration, it was also supported alongside Arabic as the language of science and literature in the Samanid court and produced some of its earliest masterpieces.

In the 10th century AD, the poet Ferdowsi composed a monumental epic, named Shahnameh, which told the half-legendary history of Iran in more than 120 thousand verses of poetry. This work, today regarded as the most important literary corpus of the Persian language, further established Persian as the language of culture and learning in the Iranian territories. The fact that the earliest blooming of New Persian happened in lands where it was not the native tongue (namely the Sogdian- and Khwarazmian-speaking Transoxiana) might also have given it its peculiar features in preserving phonological archaisms that even Middle Persian had already lost.

In its most important period, from the 15th to the 19th century AD, Persian was the administrative language of the Iranian government as well as that of Mughal India (which made its own important contributions to the corpus of Persian literature). It was also adopted as the language of artistic expression by the aristocratic classes of Ottoman Turkey. Since the 10th century AD, with the entry of Mongolian, Turkish, and most recently, French and English words into New Persian, the language has continued to evolve. Today, Persian continues to grow and develop as one of the prominent languages of Central and West Asia and is the administrative and vernacular language of Iran, Tajikistan, and Afghanistan.

New Persian is most commonly written in the Arabic or Perso-Arabic script (so named because of some unique additions to the script in order to represent particular Persian sounds). This script, much like the Pahlavi script, was also an Aramaic-derived writing system, with varieties that were used to record the language of many north Arabian tribes before Islam. After Islam, the script, in its earlier form called Kufic, was adopted to replace the Pahlavi script in order to record New Persian. Apart from Perso-Arabic, early Persian texts are known to have been recorded in the Hebrew script, while the conquest of Central Asia by the Russian Empire in the 19th century also prompted the adoption of Cyrillic to record the Tajik dialect of Persian. Dari (the name commonly given to the Persian spoken in Afghanistan) is also written in Perso-Arabic.

Kurdish is the most widely spoken Iranian language after Persian and belongs to the Western branch of the family. It is today spoken in two major dialects (Sorani and Kurmanji), with minor dialects, mostly Persian- or Arabic-influenced ones, spoken in major metropolitan areas such as Kermanshah, Sulaymaniyah, and Erzurum. Its territory stretches across western and north-western Iran, eastern Turkey, and northern Iraq, as well as various pockets in Armenia, Syria, and eastern Iran. Speculation on the origins of Kurdish has dominated much of the scholarship on the language. The belief commonly held by its speakers, that the language is a development of the unattested *Old Median, rests mainly on geographic grounds and is probably not valid. Instead, various features of Kurdish make it possible that the language is more closely related to Parthian or one of its related dialects. Kurdish possesses a rich literature, consisting of works of verse and prose, and is currently used as an administrative language in parts of Iraqi Kurdistan. Kurdish is written in various scripts: the Perso-Arabic script in Iran and Iraq, the Latin alphabet in Turkey and Syria, and Cyrillic in Armenia and Azerbaijan.

Pashto is another major New Iranian language of the Eastern branch, whose speakers live mainly in southern Afghanistan and northern Pakistan, where it is called Pakhto. Pashto might be a modern development of a Middle Iranian language closely related to Bactrian, although no certain written evidence is available from the direct ancestor of this language. There has been a great rivalry between Persian and Pashto speakers in Afghanistan, resulting in Pashto being declared a national language of that nation. Pashto is among the most conservative and archaic of the Iranian languages in its morphology and grammar and is an important tool in the study of Iranian dialects. Pashto is mainly written in a modified Perso-Arabic script.

Baluchi is another New Iranian language with many speakers, chiefly living in south-eastern Iran and western Pakistan, with scattered tribes in Afghanistan and even Tajikistan. Despite its current geographic position, Baluchi is thought to have originated in north-western Iran and to belong to the Western Iranian languages, closely related to Zazaki and more distantly to Kurdish. The speakers probably migrated from their original homeland in western Iran to their current location sometime in the 10th-12th centuries and in recent times have highly marginalised the Sistani population, who are probably descendants of the migrating Saka tribes and currently speak Persian. Baluchi is greatly influenced by Persian, but has kept many characteristics of its Middle Iranian ancestor. Baluchi has two main dialects, eastern and western, of which the eastern has more speakers. Baluchi literature is quite rich, although it has remained largely oral and has only recently been committed to writing. It has no established written tradition, but uses the Perso-Arabic script for everyday writing, although it has sometimes been written in the Devanagari script as well. Many recordings of Baluchi literature have been made by modern Orientalists and are consequently in various forms of modified Latin alphabet.

Luri is the general name for a varied set of dialects spoken by nomadic and settled populations of south-western and parts of western Iran. Two main dialects of this language are known, often referred to as Posht-Kuhi and Pish-Kuhi (“from behind/front of the mountain”), the former thought to be more archaic and less influenced by New Persian. Luri literature is commonly in the form of folk songs and poems, although its influence can be seen in the poetry of the prominent poet Baba Taher. However, Luri does not possess an established written tradition, and the language is haphazardly recorded in the Perso-Arabic alphabet and sometimes in the Latin alphabet by philologists. Speculation about the origins of the language has been common in scholarly circles, with some designating it an independent development of an unattested dialect of early Middle Persian. Close grammatical and phonological similarities between Persian and Luri, as well as the preservation of archaisms that can be observed in Middle Persian, have strengthened this argument.

There are many other New Iranian languages, whose speakers range in number from a few thousand to hundreds of thousands, living in Iran, Transoxiana, the Caucasus, Afghanistan, Iraq, the Pamir Mountains, and even the southern coasts of the Persian Gulf and the Sea of Oman. A few others worthy of mention are the Caspian languages[15] and Sivandi in Iran, Ossetic (a descendant of Alano-Saka) in the Caucasus, Yaghnobi (a descendant of Sogdian) in Tajikistan, Tati (another candidate for a descendant of Old Median) in northern and north-western Iran, and Kumzari in Oman. Almost all of the speakers of these minor languages are bilingual in the official languages of their respective countries; such is also the case with the speakers of Kurdish, Baluchi, and Pashto. Today, a major concern of linguists is the gradual disappearance of these languages in the face of the dominant languages that surround them. Economic conditions, lack of proper education, and social demands continually discourage the younger generations from learning these languages, raising the grim prospect of extinction for many of them.

[1] Aryan, from Ariia-, the name by which the author(s) of the Rig Veda and the Avesta called themselves. Early Indo-Europeanists mistakenly used the term for the entire Indo-European language family. Today, Aryan is used only as an alternative to the scientific term Indo-Iranian.

[2] Non-Dravidian languages of the Indian subcontinent.

[3] The date of this movement is not universally agreed upon. Archaeological evidence suggests a period between 1600 and 1000 BCE. Some of the Iranian tribes, such as the Sakas, continued to live a nomadic life and eventually controlled all of the steppes of southern Siberia and the northern Black Sea. Others settled in close proximity to the original Iranian homeland in Transoxiana.

[4] With a subdivision of early or Classical New Persian.

[5] A Semitic language of the Aramaean people, whose writing system was itself developed from the Phoenician alphabet.

[6] The efficiency and ease of writing Aramaic and its use of ink and parchment, as opposed to clay for cuneiform, encouraged the Achaemenid administration to use it as the administrative language of the empire. This choice was also rooted in the already established position of Aramaic (in the form of Imperial Aramaic) in the Near East as the administrative language of previous empires such as that of the Assyrians. Usually, an Achaemenid administrator dictated a letter in his own language to a scribe, who would then translate it and write it down in Aramaic. In turn, the addressee would also employ the services of a scribe, who read the letter and translated it from Aramaic into the addressee's native language.

[7] Zarathushtra is often considered to have lived in the second half of the second millennium BC among the nomadic Iranian tribes of Central Asia.

[8] Heterograms, or hozwarish, are Middle Iranian ideograms, written in their original Aramaic form but read and pronounced in their Parthian (or Middle Persian or Sogdian) forms. For example, the logogram BRA was read as pus in Middle Persian, while BRY was read as puhr in Parthian, both meaning “son”.

[9] The first Arsacid capital, near Ashgabat in present-day Turkmenistan. It might have originally been called “Mithradatokrta”.

[10] The second capital of the Arsacids, located near the modern city of Damghan in Iran.

[11] Athar al-Baqiyya and/or Mal ul-Hind.

[12] Muqaddimat al-Adab, by Zamakhshari.

[13] Westerners knew the Sakas of the northern Black Sea region as Scythians.

[14] The republics of Northern and Southern Ossetia, located to the north of Georgia and in the south of Russia.

[15] Caspian languages generally include Taleshi, Gilaki, and Mazandarani.


