This is the notion that instances of actual language use emanate from Platonic representations (abstract models of the language) and go through a process of systematic decay, distortion, or realisation when they are produced.
Running speech is a de/re-generated product of idealised speech. Colloquial grammars are de/re-generated from formal models of expression. Hence it’s difficult for learners to faithfully reproduce authentic expressions through mere imitation, because the output itself is the product of a process of decay — they haven’t got the original, and they cannot let it decay the right way.
An analogy is apple juice: you get it from apples, and that’s fairly simple. But concocting an artificial flavour that imitates apple juice is hard.
I always thought the Germans say ‘den’ like ‘din’; now I know I’m not alone.
Harding, S., & Meyer, G. (2003). Changes in the perception of synthetic nasal consonants as a result of vowel formant manipulations. Speech Communication, 39(3), 173-189.
"The nasal prototypes /m/ and /n/ were used in all experiments, together with a range of preceding vowels differing only in the frequency and transitions of their second formant (F2). When no explicit transitions were present between the vowel and nasal, the perception of each nasal changed from /m/ to /n/ as the vowel F2 increased. Introducing explicit formant transitions removed this effect, and listeners heard the appropriate percepts for each nasal prototype. However, if the transition and the nasal prototype were inconsistent, the percept was determined by the transition alone. In each experiment, therefore, the target frequency of the vowel F2 transition into the nasal consonant determined the percept, taking precedence over the formant structure of the nasal prototype.”
The genesis of language: a summary based on Arbib’s talk
1.1 Extracting meaning: example of visual processing, from edge detection to thematic analysis — feature extraction and contextual probabilities — snapped onto a schema of recognition.
1.2 Central coherence: from features to themes, with flexibility and tolerance for variations and noise => robust reduction.
1.3 Abstract representations: ability to generalise => robust induction.
2. The repertoire of manual operations: “reach -> grip -> retrieve” => a mental store of available options: sequential actions towards proximal and ultimate goals. See: Alstermark et al. (1981).
3. Mirror neurons: registering operations without performing them, i.e. a mental representation of actions/movement/gestures in others.
4. Implications for fitness: imitation, transmission of skills; competitive advantage in anticipating others’ moves; empathy or theory of mind.
5. Ritualisation: the evolution and emergence of bodily signals — the ability to achieve a function (e.g. determine hierarchy) without performing the full sequence of available actions (e.g. fighting to death).
6. Now the picture is almost complete:
- Linking actions to meaning -> performable actions serving a goal.
- Registering actions (gestures) -> mirrored recognition.
- From meaning to gesture -> ritualisation.
- Robustness in recognition -> allows abstraction.
7. Now the gesture or symbol referring to a meaning or idea can be far removed from the original sequence.
For example, consider when you pull out your smartphone and “dial” a number by touching the screen. The gestures with which you communicate with the computer are really many steps away from the etymology: there’s no dial and you are not really dialing anything, except that you are performing an action signified by such a word.
And that is essentially what a lexis allows you to do: representing ideas using abstract symbols that are far removed from the original action sequence, quality, or thing, or even its associated pantomimes.
Actually the above only goes to the level of bonobos on lexigrams; that’s only about one third of the story. The second step is to explain how speech is basically “audible gestures”, and how a combinatorial encoding system takes over along with the expansion of the lexicon (Acredolo & Goodwyn, 1985; Capirci et al., 1996; Butcher, 2000; Iverson & Goldin-Meadow, 2005), where it goes from one-word to one-word-plus (gesture) to two-word. See also: Anisfeld, M., Rosenberg, E. S., Hoberman, M. J., & Gasparini, D. (1998). Lexical acceleration coincides with the onset of combinatorial speech. First Language, 18(53), 165-184.
Then the third part is explaining the emergence of generative grammar… a rule-based system for planning and executing sequences. Perhaps see: Fitch, W. T. (2011). The evolution of syntax: an exaptationist perspective. Frontiers in evolutionary neuroscience, 3.
[Video]: in slow-motion, you can see the cat modifying the “tactical positioning” of its footholds as well as various “action components” with high precision in executing a well-coordinated leap sequence.
Saying “Ahhh” can be just another gesture, it’s no more “removed” or abstract than clapping hands (which happens to be an audible gesture) — only that you are “clapping” your vocal folds to make the sound.
Consider these “units of meaning” with no sonorant components and seemingly non-conformative to how English phonology would define a word:
- "Tsk tsk…"
They are closer to “audible gestures” than to lexical items with a re-combinatory encoding scheme (that is, made up by combining and re-arranging phonemes).
This difference, or threshold, is what I alluded to as the “switch” from referential gestures to linguistic phonology. I have two speculations about this:
1. This “phonology module” emerged at some point of the evolutionary course, though the module may be psycholinguistically but not neurologically real, i.e. it’s actually an interplay of various exapted (rather than de novo) sub-systems, as Arbib would say. And it gave its bearers (our common ancestors) an advantage because of the vastly expanded lexical capacity of a combinatorial system.
2. This “module” matures along some point of the developmental course, roughly corresponding to the sharp “kink” or inflection point you see in the vocabulary curve. The child would move from controlled gestures and gesture-like utterances to multiple gestures and expanded one-word vocabulary and coordinated word-plus-gesture uses, and eventually to a switch onto a phonologically based model.
Thoughts of the Day:
- Why do diphthongs move along with monophthongs when sound shifts happen? It must be that, the “vowel targets” underlying both categories are actually doing the moving, that is, the sign posts defining the vowel space are shifting, rather than exemplar positions or definitions of individual sounds.
- Transcription is a model. It should be useful but need not be true. Approaching what is true is the work of theories vetted by empirical investigation. The (potential) danger of phonetic realism is that it conflates interpretation with documentation, and applicability with validity.
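The first thought above, that shared “vowel targets” do the moving, can be sketched as a small data model. This is only an illustration under stated assumptions: the target names and formant values are hypothetical, and treating a diphthong as an ordered pair of references to shared targets is one possible reading of the idea, not an established analysis.

```python
from dataclasses import dataclass


@dataclass
class Target:
    """A shared reference point in the vowel space (F1, F2 in Hz)."""
    f1: float
    f2: float


# Hypothetical targets; the formant values are illustrative only.
targets = {
    "A": Target(750.0, 1300.0),
    "I": Target(300.0, 2300.0),
    "U": Target(320.0, 900.0),
}

# Monophthongs point at one target; diphthongs at an ordered pair of targets.
monophthongs = {"a": ("A",), "i": ("I",), "u": ("U",)}
diphthongs = {"ai": ("A", "I"), "au": ("A", "U")}


def realise(sound):
    """Return the (F1, F2) points a sound passes through, read off the shared targets."""
    names = {**monophthongs, **diphthongs}[sound]
    return [(targets[n].f1, targets[n].f2) for n in names]


def shift_target(name, d_f1, d_f2):
    """Move one sign post; every sound that refers to it moves with it."""
    targets[name].f1 += d_f1
    targets[name].f2 += d_f2


if __name__ == "__main__":
    shift_target("A", -100.0, 0.0)  # raise the low vowel target
    print(realise("a"))   # the monophthong has moved
    print(realise("ai"))  # so has the onset of the diphthong
```

Because the sounds store references to targets rather than their own coordinates, a single shift moves the monophthong and the diphthong onsets together, which is the pattern the question asks about.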
September 30, 2014 at 3:55pm
Questions of the Day:
- Assimilation has often been described in terms of how neighbouring segments affect each other, with the outputs thought of as segments with changed features. What if the products are something else altogether?
- To what extent are segments real? We often assume they have abstract mental representations, each holding a bundle of features together, and they are seen as units on which phonological rules operate. Are these units an illusion?
September 17, 2014 at 12:46am
Topics of potential interest:
- Multi-word Units on the Frequency List
- Effect of Mass Exposure to Citations
- Typology: Permutation of Constituents
- Measuring Comprehension with Eyetracking
- Efficient Sample Test of Vocabulary Size
- Memory, Delusions, and Hypnosis
- Dance Dance Revolution for Prosody
- Platonic Models of Generative Register
- Visual Feedback for Vowel Targeting
- Neat, Informative Graphs with ggplot2
So far it’s only Tuesday…
September 12, 2014 at 5:44pm
The hiddenness of abstraction
From: a discussion about what we can extract from spectrograms.
What makes the analysis so confounding and “hidden” is the many layers of conversion from abstraction to realisation. What we can observe and collect is only at the surface, many steps removed from its “top-down” origins. And as Anderson said quite cryptically: “Physical events are notoriously neutral.”
Phonologists tend to think (and phoneticians and researchers working on speech synthesis have come to realise this too) of speech articulation as a continuous stream of “gestures”. It’s kind of like interpretive dance, or choreography, where you are trying to convey contrastive meaning using perceivable motion. It’s not what was conventionally thought of as a string of idealised “targets” stitched together.
Now imagine what you can capture are images, then you have to solve vision, starting from edge detection and all that, and then the anatomy and stick figures, and then step-by-step you get to the system of movement, and then perhaps to meaning. I would speculate it’s notoriously difficult for computers to “understand” interpretive dance.
Phonological features and classes (the elements out of which we make speech, and the rules for doing so) are abstract. There used to be an impression that they must have some articulatory and then acoustic targets (as in, “this is the sound to produce”); but apparently no.
- People who lost front teeth or just had dental anaesthetics can still speak, and we can still understand them to an extent.
- The phoneme /r/ varies so greatly in manner and place that it’s very hard to explain how its variants are related through acoustics.
- Sign languages have concepts and processes analogous to phonemes, co-articulation, rhyming and so on; only they use hand shapes, position, movements, and facial gestures instead.
Phonology is realised through anatomy, but not bound by it. You could imagine an alien species with a completely different set of organs as articulators, or imagine sci-fi implants giving us the ability to use very novel gestures; it just so happened that our ancestors went down the path of utilising the vocal tract. And if you look at cross-linguistic data, even just the vocal tract can have a very diverse range of expressive possibilities, some less obvious than others, from clicks to tones to labial protrusion to breathiness to ingressives and more.
The spectrogram gives us spectral and temporal resolution. Depending on the maths applied to it, we can see the individual beats of the vocal folds, harmonics, resonance characteristics, and the patterns of acoustic energy. But after all, physical events are just pointers to the real thing, like moving shadows cast by hand puppets.
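To make “depending on the maths applied to it” a bit more concrete, here is a minimal sketch of the trade-off set by the analysis window length. The function name and the example settings (16 kHz sampling, 5 ms and 30 ms windows) are assumptions for illustration. The bin spacing it reports is only a rough proxy for frequency resolution, which also depends on the window shape.

```python
def window_resolution(window_ms, sample_rate_hz):
    """Return (samples per window, time span in ms, frequency bin spacing in Hz)."""
    n_samples = round(window_ms / 1000 * sample_rate_hz)
    time_span_ms = 1000 * n_samples / sample_rate_hz
    bin_spacing_hz = sample_rate_hz / n_samples
    return n_samples, time_span_ms, bin_spacing_hz


if __name__ == "__main__":
    # A short window is fine in time but coarse in frequency; a long one is the reverse.
    for window_ms in (5, 30):
        n, t, f = window_resolution(window_ms, 16000)
        print(f"{window_ms} ms window: {n} samples, {t:.1f} ms, {f:.1f} Hz per bin")
```

With a short window, each analysis frame is brief enough to follow individual pulses of the vocal folds, but the bins are too wide to separate harmonics. With a long window the harmonics separate and the individual pulses blur together.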
And naturally when we look at these, we interpret them as both physical phenomena and “linguistically interesting cues” that we are searching for. When coding in Praat, for example, you would hear people talk about: voicing, formants, turbulence, closures, “energy dropping off”, movements, glottalisation, periodicity, and so on: multiple levels of representation and interpretation (acoustic, anatomical, phonetic) all mixed together.
What we want to extract is not intrinsic to the sound data, and therefore it cannot be analysed in isolation. It must be interpreted with reference to a whole range of constraints and predictions, from phonetics to phonology, from anatomy to sociolinguistics to pragmatics and beyond.
At this point, I have pretty much started to ramble, so I’ll just leave it there with some blog posts on the topic:
September 10, 2014 at 11:00am
When a noun becomes a verb, the logical structure or construction of its semantic value is not always predictable:
to paint (vt.)
to water (vt.)
to paint (vi.)
September 4, 2014 at 1:35pm
Re-published from an answer to:
Why are the French “throat R”, the Scottish / Italian “trilled R”, and the spectrum of English / Irish / American “rolled R” usually considered as instances of one “R”, not three independent sounds?
Sometimes it may not be meaningful to consider such categories across languages, especially if they are historically far apart.
The choices of symbols when inventing an orthography are partly convenient (using approximations of what’s already in practice), e.g. the Japanese ‘r’ (which could have been ‘l’), partly arbitrary, cf. Cherokee, and partly a product of various legacies (cf. Cyrillics).
With that caveat out of the way, consider the /r/ phoneme in just one language, say German, where it exhibits a large range of variants, from an alveolar trill, to a central vowel, to some flap in the back.
You would think, there must be some mental representation for the “underlying R”, because when you get used to the quirkiness of some other accent (where the R is rendered very differently in a systematic way), you can still understand that speaker, maybe even forget they are saying the R differently.
There are two possible hypotheses:
1. We ignore other people’s “accented” variations because of phonetic similarity. It’s just like listening to someone who has just had dental work under anaesthetics, or someone with a lisp, or someone who says a lot of things with the retroflex… Do you ignore these differences because these sounds are acoustically similar?
2. We have “abstract” representations of phonemes that often correlate with but are not bound by phonetic primitives. We think of them as distinct or non-distinct not because of how close they are acoustically, but because they belong to different mental categories.
For now, I would say Hypothesis Two seems to win out:
- 'S' and 'SH' can be very similar acoustically (in fact, it's hard for non-natives and Siri to tell them apart in isolation), but English speakers all know 'sit' and 'shit' are different.
- Vowel height varies across different accents, sometimes a person’s ‘bEd’ may be so low that it’s lower than another person’s very high ‘bAd’, but it’s possible for these two people to have a smooth conversation.
What are phonemes for?
In brief, they have a “phonological function” in a given language, that is, to form units of meaning (morphemes).
Yet their distribution and combination are subject to the phonological rules of that language (phonotactics).
For example, in English, we would have a class called “nasals”, which includes /m/, /n/, and /ŋ/. But there are restrictions on how these phonemes can be used in forming words: only /m/ and /n/ can occur word-initially.
Thus, what we have is a “phonological class” that can be used in a certain way in word formation and that is defined by its members’ shared features. In this case the features are [+nasal] [-dorsal]. This is the necessary set of features to distinguish them from the rest of the phonemes, which are subject to different sets of rules for their use in forming words.
But /m/ and /n/ share a lot of other features, e.g. [+consonant] [+voicing] and so on. These are considered “redundant” features that wouldn’t affect whether the resulting product is inside or outside the phonological class, which is already filtered/selected by the “defining features”.
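The idea of a class picked out by defining features, with everything else redundant, can be sketched as a filter over a feature table. The inventory below is a toy with simplified, assumed feature values (only five phonemes and four binary features), and `natural_class` is a hypothetical helper name.

```python
# Toy feature table; the feature values are simplified assumptions for illustration.
INVENTORY = {
    "m": {"nasal": True, "dorsal": False, "voiced": True, "labial": True},
    "n": {"nasal": True, "dorsal": False, "voiced": True, "labial": False},
    "ŋ": {"nasal": True, "dorsal": True, "voiced": True, "labial": False},
    "b": {"nasal": False, "dorsal": False, "voiced": True, "labial": True},
    "g": {"nasal": False, "dorsal": True, "voiced": True, "labial": False},
}


def natural_class(inventory, **defining):
    """Return the phonemes whose features match every defining feature value."""
    return sorted(
        phoneme
        for phoneme, feats in inventory.items()
        if all(feats.get(name) == value for name, value in defining.items())
    )


if __name__ == "__main__":
    # [+nasal] alone selects all three nasals.
    print(natural_class(INVENTORY, nasal=True))
    # [+nasal] [-dorsal] selects the class allowed word-initially.
    print(natural_class(INVENTORY, nasal=True, dorsal=False))

    # Changing a redundant feature of /m/ does not change its class membership.
    variant = dict(INVENTORY)
    variant["m"] = {**INVENTORY["m"], "labial": False}
    print(natural_class(variant, nasal=True, dorsal=False))
```

Note that the negative requirement (`dorsal=False`) is treated exactly like a positive one, so a class can be defined by what its members are not.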
Therefore, in the case of ‘R’:
- It belongs to a specific sub-subclass of /r/, that can only be used in a certain way to form words — these are the rules of that phonological class.
- This specific class is selected by the defining features. The features can describe what is required in terms of place, manner, voicing and so on.
- The requirement can be in plus or minus forms, e.g. “it must be at this place” or “it must not be at this place”. That is, a phoneme can be defined by what it is not in terms of features.
- For example, the phonological class /m/ and /n/ belong to is defined as “must be nasal” AND “must not be dorsal”.
- As long as the defining features are satisfied, the rest of the features they also have are considered redundant and somewhat fungible. If you say the /m/ biting your lip, it’s still /m/.
- Therefore, the specific class to which /r/ belongs has some defining features, but these do not restrict the place, manner, and the range of features observed in the many varieties of /r/.
- And some of these defining features are in the negative form. You can say the /r/ as a tap or as a trill, as long as it is not /l/, that is, as long as it has the [-lateral] feature.
Also from Michael Proctor
It depends what you mean by ‘compare’ across time and space. A similar sound change is currently in progress in parts of Quebec (http://ycmorin.net/wp-content/uploads/2012/11/2012ms-Apical_to_uvular_R.pdf), so it’s informative to compare how young speakers differ from their parents, and how the drivers of the sound change differ from other Canadians and contemporary continental French speakers. Direct comparisons of phonetic and phonological forms with older and more removed varieties would obviously be more difficult, and in many cases more speculative - we don’t really know how speakers of the earlier prestige French dialects (who originally drove the spread of uvular variants throughout Europe) spoke.
One reason why it’s been difficult to account for this type of sound change is that there’s no consensus about the goals of production of rhotics, their phonological representation, or even what unifies rhotics as a class. Ohala, Stevens, Lindau proposed that a lowered F3 is the common characteristic, and others (Delattre & Freeman 1968) suggest that there’s more articulatory commonality amongst different types of ‘r’-sounds.
Since tongue tip trills are also produced with a tongue body gesture (Zawadzki & Kuehn 1980), it’s possible to imagine a sound change where the coronal gesture is lenited, leaving only a dorsal constriction, which could be realized as [x], [R], etc. Felicity [Cox]’s recent work (http://dx.doi.org/10.1017/S0025100314000036) shows how rhoticity is variably perceived in different phonological environments, which has important implications for this type of sound change too.
Re-published from a Quora answer to:
"Is it possible to quantify the number of words in a language?"
I would say the practical answer is yes.
Bear in mind that there are qualitative differences embedded in the quantity, so it’s not just linear addition. This is like asking how many apples you have. There may be larger ones and smaller ones, different varieties, some ripening, some decaying to the point of bordering on becoming something else…
But to think of this question as an engineering problem, we can simplify “languages” as shared sets of vocabularies. Naturally there’s a specialising effect: printers know more about fonts, zoologists are familiar with animals, and Finns may have more words for snow. But then again, if you think about learning a second language like French or Spanish or German — you would aim for a set of vocabulary that will allow you to “use that language”.
Then you can imagine the job of lexicographers, making dictionaries:
- Why does someone refer to a dictionary? Because he encountered some word that he doesn’t know…
- And the dictionary contains an entry that gives him that information.
- There is a frequency effect: if there’s a word you don’t know, it’s probably a rarer word.
- Then we can have dictionaries of different sizes, a pocket-sized one or a very big one; how many words to include is prioritised by frequency.
And this progression from a 20,000-word dictionary to a 100k dictionary is not linear. For a beginner, a thin dictionary is a good and practical goal. It’s less likely that you would encounter a word that you couldn’t find in there. Then as you become more advanced, you might encounter rarer words where you need a bigger dictionary. Until at some point, you might be standing at the frontier of a very niche specialised sub-field, and you need to invent some new words.
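The non-linear pay-off from a bigger dictionary can be sketched by asking what share of running text the N most frequent words cover. This is a minimal illustration on a made-up toy sentence; `coverage_by_size` is a hypothetical helper, and a real estimate would need a large corpus and proper lemmatisation.

```python
from collections import Counter


def coverage_by_size(tokens, sizes):
    """For each dictionary size N, return the share of tokens covered by the N most frequent words."""
    counts = Counter(tokens)
    ranked = [count for _, count in counts.most_common()]
    total = sum(ranked)
    return {n: sum(ranked[:n]) / total for n in sizes}


if __name__ == "__main__":
    # Toy corpus, invented for illustration only.
    tokens = "the cat sat on the mat and the dog sat on the cat".split()
    for size, share in coverage_by_size(tokens, [1, 4, 7]).items():
        print(f"top {size} words cover {share:.0%} of the text")
```

Each extra entry covers less running text than the one before it, which is why a thin dictionary already goes a long way for a beginner.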
The level of competency in a second language learning setting is well studied — I suppose the same standard would just move on a continuum for native speakers as well (say, developmentally).
When you look at this: Common European Framework of Reference for Languages, the question as an engineering problem becomes simpler:
To reach an equivalent level of C2, what is the size of the dictionary one needs to master?
It’s quite viable to quantify the “C2-Equivalent Vocabulary Size”.
A corpus-centric sociolinguistic approach to this question is only helpful in surfacing a lot of the issues raised by this question, but it does not lead to an answer.
I still think this question is most relevant to the field of lexicography: the practical art of making a dictionary. It is all that is concerned with the variety and variability of words, minus morphology, syntax, orthography, and all those other fields of linguistic competence or technology.
Again, let’s consider a dictionary for an adult native speaker who already has the linguistic competence:
- Why would he refer to a dictionary?
- Because he encountered a ‘word’ the meaning of which cannot be derived from morphology or syntax or compositional analysis or logical screening, e.g. “Ah, that’s an onomatopoeic sequence”, or “That’s a typo”, or “That’s just some gibberish”…
Let’s consider the “principle of compositionality”:
- If I say: “There’s a pink dog in the living room.”
- You would know what I mean and be able to imagine the situation.
- Even if I say: “There’s a pink-ish doggie in the living room.” It doesn’t really change much, you still get it. Because you have those faculties for morphological and syntactic processing.
But if someone says: “There’s a DALEK in the TARDIS.” Then, a competent but naïve speaker will have to refer to a dictionary, because the meanings cannot be deduced from other linguistic processes.
After those lexical entries are resolved, you can easily move to “There’s a dalekishness in his tardismanship.” Right? Because they are not new lexical entries, they are just new lexical compositions.
As shown in the example of “naikimlyiia”, a verb can have half a million forms. In English, there’s “Antidisestablishmentarianism”. It’s not sensible to cover infinity with a morphological table (look-up data tables).
There must be an algorithmic solution: you put a string through, and the function breaks it up into sensible components. Then for speed, we can cache the common look-ups in a table.
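As a rough sketch of that idea, the function below splits a string into known morphemes by recursive longest-match search, and `functools.lru_cache` serves as the look-up table for repeated queries. The morpheme list is a tiny hypothetical inventory chosen to fit the examples in this post. Real morphological analysis has to deal with spelling changes and ambiguity, which this ignores.

```python
from functools import lru_cache

# Hypothetical morpheme inventory, just enough for the examples above.
MORPHEMES = frozenset(
    {"anti", "dis", "establish", "ment", "arian", "ism", "dalek", "ish", "ness"}
)


@lru_cache(maxsize=4096)
def segment(word):
    """Return a tuple of known morphemes that spell `word`, or None if no parse exists."""
    if not word:
        return ()
    # Try longer candidate prefixes first, backtracking if the remainder fails.
    for end in range(len(word), 0, -1):
        head = word[:end]
        if head in MORPHEMES:
            rest = segment(word[end:])
            if rest is not None:
                return (head,) + rest
    return None


if __name__ == "__main__":
    print(segment("antidisestablishmentarianism"))
    print(segment("dalekishness"))
    # No parse: this string would need its own lexical entry.
    print(segment("tardis"))
    # Repeated look-ups are served from the cache.
    segment("dalekishness")
    print(segment.cache_info())
```

A string with no parse is exactly the case where the speaker has to go to the dictionary. Once “dalek” is in the inventory, its derived forms come for free.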
Also, in learning, words should be used as examples (or Bayesian evidence) for rules (lexical hypotheses), rather than each as a new separate rule.
In some languages, a whole sentence can be just a single word:
Polysynthetic languages typically have long “sentence-words” such as the Yupik
which means “He had not yet said again that he was going to hunt reindeer.”
See: Polysynthetic language
What cannot be deduced compositionally, becomes lexicalised, e.g. “He kicked the bucket.” It’s no longer the composition of ‘kick’ and ‘bucket’.
Before man wrote the first tokeniser, there was already vocabulary. Just like in a dictionary, its size can be quantified: the number of lexical entries can be counted, and the number of senses listed under each entry is finite.
If we could do a frequency distribution on not just the text string and lemmatisation, but also accounting for the senses — then we’ll have a good picture of the cumulative curve — and at some point it must approach a limit.
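The cumulative curve described here can be sketched by counting distinct (lemma, sense) pairs as tokens accumulate. The sense-tagged tokens below are invented for illustration, and sense tagging itself is assumed to be already done. Whether the curve for a real corpus levels off is the claim under discussion, not something this toy demonstrates.

```python
def cumulative_entries(tagged_tokens):
    """Return the running count of distinct (lemma, sense) pairs after each token."""
    seen = set()
    curve = []
    for lemma, sense in tagged_tokens:
        seen.add((lemma, sense))
        curve.append(len(seen))
    return curve


if __name__ == "__main__":
    # Hypothetical sense-tagged tokens: the same string counts twice if its senses differ.
    tagged = [
        ("bank", "river"),
        ("bank", "finance"),
        ("run", "move"),
        ("bank", "finance"),
        ("run", "move"),
        ("run", "operate"),
        ("bank", "river"),
    ]
    print(cumulative_entries(tagged))
```

Plotting this list against token position gives the curve. The slope at any point is the rate at which new entries are still turning up.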
Indeed, there are more variables to be accounted for, such as how transparent or opaque a derivation is, even modulated by their frequency effect, e.g. if you knew ‘ridonculous’ is from ‘ridiculous’ and ‘donkey’ and you heard it used a number of times in the last year…
These variables can still be modelled, and quantified.