The story of the English language
Kathleen Hubbard’s answer to the question “How do we know what we know about Proto-Indo-European and other languages that died out before they were written down?” [Kathleen is assistant professor of linguistics at the University of California, San Diego. She describes herself as a “recovering Indo-Europeanist.”] I have also appended some bibliography at the end.
The hard-core Indo-Europeanist may be interested in the TITUS Indo-European Resources project in Frankfurt (eventually in many languages, but currently only German and Spanish).
Okay, in 1786 Sir William Jones announced to the Asiatick Society of
Calcutta that Sanskrit had to be related to Greek and Latin, touching
off what would come to be known as the Neogrammarian move from
philology (the comparison of texts) to what we now consider
linguistics.
If you were to see a whole huge raft of cognates like the following,
you might come to the same conclusion (Avestan is an ancient Iranian
language, a close relative of Old Persian; it’s the language of the
Zoroastrian texts):
Sanskrit  Avestan  Greek     Latin   Gothic      English
pita               pater     pater   fadar       father
padam              poda      pedem   fotu        foot
bhratar            phrater   frater  brothar     brother
bharami   barami   phero     fero    baira       bear
jivah     jivo               wiwos   qius        quick (‘living’)
sanah     hano     henee     senex   sinista     senile
virah     viro               wir     wair        were(wolf) (‘man’)
tris                         tres    thri        three
                   deka      decem   taihun      ten
          satem    he-katon  centum  hund(rath)  hundred
Now, “cognates” means a pair or set of words descended from a common
ancestor, not just words that happen to look like each other. For
example, “coffee” is not a cognate of kaffe, kahawa, cafe, etc.; that’s
an instance of lots of borrowing of the same word by various languages.
What we’re talking about here are historically related words. When we
know we’ve got cognates, we can talk about reconstruction.
Reconstruction revolves around the notion that sound change is
mechanical and exceptionless. If a proto-/p/ becomes /f/ in a
daughter language, it does so in regular fashion (that’s the
heuristic you have to use). If there are exceptions, there must be
some other conditioning factor. Using this assumption, we can
conclude that some common ancestor produced Sanskrit /bh/, Avestan
/b/, Greek /ph/ (which is NOT /f/, it’s aspirated /p/ at the stage
we’re talking about), Latin /f/, and Germanic /b/. Now the question
is, what was that common ancestor?
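The correspondence just described can be tabulated mechanically. The following is a minimal Python sketch, not a real reconstruction tool: it takes the “brother” and “bear” rows from the table above, peels off the first sound of each form, and checks that both cognate sets show the same pattern. Treating bh and ph as single aspirated segments is an assumption of the sketch, and the names COGNATES, DIGRAPHS, initial_segment and correspondence are made up for illustration. Avestan is left out because the table gives no Avestan form for “brother”.

```python
# Two cognate sets from the table above, keyed by language.
COGNATES = {
    "brother": {"Sanskrit": "bhratar", "Greek": "phrater",
                "Latin": "frater", "Gothic": "brothar"},
    "bear":    {"Sanskrit": "bharami", "Greek": "phero",
                "Latin": "fero", "Gothic": "baira"},
}

# Assumption: each of these two-letter spellings stands for one sound.
DIGRAPHS = ("bh", "ph", "th", "dh", "gh", "kh")


def initial_segment(word):
    """Return the first sound of a romanized word, keeping digraphs whole."""
    for digraph in DIGRAPHS:
        if word.startswith(digraph):
            return digraph
    return word[0]


def correspondence(cognate_set):
    """Map each language to the initial segment of its form."""
    return {lang: initial_segment(form) for lang, form in cognate_set.items()}


brother = correspondence(COGNATES["brother"])
bear = correspondence(COGNATES["bear"])

print(brother)          # the correspondence set for 'brother'
print(brother == bear)  # do both cognate sets line up the same way?
```

When two unrelated-looking words line up on the same set of sounds like this, that recurring set (here Sanskrit bh, Greek ph, Latin f, Gothic b) is the thing a single proto-segment has to account for.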
The way we decide what segment must have been there in the proto-
language involves things we know independently about how sounds
behave, based partly on how sounds alternate synchronically in
languages (i.e. rules that operate to change one sound to another in
different contexts during a single stage of a language), partly on
what we know about acoustics and articulation of speech sounds (which
tells us what directionality is more or less likely), and partly on
experience. Pure gold for the historical linguist is ATTESTED
(written) ancient forms.
For instance, we know that the modern Romance languages (French,
Italian, Spanish, Portuguese, Romansch, Rumanian, etc.) are descended
from Latin. And we have lots of attested Latin to work with — so we
have clear, unambiguous examples of how some sound changes have
worked. Likewise in other language families where ancient texts are
preserved (e.g. ancient religious texts in Semitic). So we have
some real-life models on which to build our guesses.
So anyway, you reconstruct Proto-Indo-Iranian, and Proto-Germanic,
and Proto-Balto-Slavic, and Proto-Celtic, and ultimately you have a
pretty good idea of what — on the basis of very rigorous analysis —
must have been the forms of certain words/roots in
Proto-Indo-European, before it split up. Now, this method does NOT
yield reliable results further back than about 10,000 years, because
beyond that, too much change has occurred for there to be any
recognizable remnants (that we can be sure about anyway) in attested
languages. (Pace Greenberg et al., who get lots of popular press.)
One real triumph of this method of reconstruction was the Laryngeal
Hypothesis: it was known that there were some troublesome places in
Indo-European where the sound changes seemed not to be behaving in
their usual regular way; things were happening to vowels and
sometimes consonants that couldn’t be easily explained based on what
we saw in the attested languages. Ferdinand de Saussure in the late
19th century said that there had to be a set of segments in the
proto-language that had not survived in any of the daughter languages
known at the time. He was fairly conservative about claiming what
they must have been (the name “laryngeals” and the now-standard count
of three came from later scholars), but he pointed out the precise
locations where they must have occurred. Many years later, when a
bunch of texts in Turkey were finally decoded and we knew we were
looking at the ancient Anatolian language Hittite, the oldest
attested Indo-European language — voila: there were the laryngeals,
exactly where Saussure had predicted they must be just on the basis
of careful reconstruction.
There are other wrinkles, like you can do internal reconstruction
under some circumstances, and there are things other than sounds that
point to common ancestry (morphology, syntax, etc.). And semantic
change is a really neat thing to trace, though much slipperier than
sound change. But the general answer to your question is, we know
what we know about Proto-Indo-European because of the Comparative
Method, which arose in the 19th century and gives us a rigorous way
to compare sounds in daughter languages and determine what the
antecedent sounds must have been.
Oh, and the PIE reconstructions for the above words are (always
preceded by a star to show they’re unattested, followed by a hyphen
if they’re roots that get suffixed, and with hedges if a vowel or
something is uncertain — consonants are much easier to reconstruct
than vowels — oh yes and @ stands for schwa here):
*p@ter-    father
*ped-      foot
*bhrater-  brother
*bher-     carry
*gwei-     live
*sen-      old
*wi-ro-    man (derived from *wei@- vital force)
*trei-     three
*dekm-     ten
*dkm-tom-  hundred (derived from *dekm- ten)
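As a rough check on the idea that sound change is regular, here is a small Python sketch that applies part of the standard Germanic consonant shift (usually called Grimm’s Law, a name the text above does not use) to the first consonant of some of these reconstructions and compares the prediction with the Gothic forms in the cognate table. The rule table GERMANIC_SHIFT and the function name shift_initial are illustrative assumptions, only initial consonants are handled, and th and q are simply the romanizations used above.

```python
# Initial-consonant shifts from PIE to Germanic (a subset of Grimm's Law),
# written in the romanization used in this document. Illustrative only.
GERMANIC_SHIFT = {"bh": "b", "gw": "q", "p": "f", "t": "th", "d": "t"}

# (PIE reconstruction, Gothic form) pairs taken from the lists above.
PAIRS = [
    ("p@ter-", "fadar"),
    ("ped-", "fotu"),
    ("bhrater-", "brothar"),
    ("bher-", "baira"),
    ("gwei-", "qius"),
    ("sen-", "sinista"),
    ("trei-", "thri"),
    ("dekm-", "taihun"),
]


def shift_initial(pie_form):
    """Return the Germanic outcome predicted for the first consonant."""
    # Try two-letter segments first so that "bh" is not read as "b" + "h".
    for source in sorted(GERMANIC_SHIFT, key=len, reverse=True):
        if pie_form.startswith(source):
            return GERMANIC_SHIFT[source]
    return pie_form[0]  # segments outside the table are left unchanged


results = [(pie, gothic, shift_initial(pie)) for pie, gothic in PAIRS]
for pie, gothic, predicted in results:
    status = "ok" if gothic.startswith(predicted) else "MISMATCH"
    print(f"*{pie:<10} -> {predicted:<3} Gothic {gothic:<8} {status}")
```

*dkm-tom- is left out on purpose: Gothic hund reflects the *k (shifted to h) rather than the initial *d, and a one-segment lookup like this cannot model that. *wi-ro- is omitted only because its initial w does not change.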