“The fruit fly has a meager 100,000 neurons in its nervous system. A computer can trivially solve calculus problems, search 100 billion web-pages, and calculate pi to thousands of decimal places. A fly brain can’t do this. However a fly brain can fly around in unknown terrain, avoid bumping into things, and is miraculously able to find flies of the opposite sex “out in the wild”, dock, and mate with them. Flies can adapt to changing conditions, find hidden food, walk around on uneven surfaces, and land upside down on ceilings. Computers can’t do any of these things reliably. And the fly brain is the size of a grain of sand.” - Paul King
A word is not just yet another unique string; it is better conceptualised as a multi-dimensional organic compound. The most obvious group would be verbs: many of them have a specific “shape”, like a puzzle piece with its particularities — it affects how other words can be fitted together with it, sometimes even many words down the line.
He told John to wash himself.
He promised John to wash himself.
He told John to look after himself.
He promised John to look after him.
He promised John to look after himself.
A verb can dictate how it should be used and what kinds of words should follow or surround it, or at least co-occur in the same phrase. Each verb comes with an instruction manual.
Now, consider this cross-linguistically:
Es funktioniert. => It works.
Es gilt. => It applies.
Es gilt. => It is true / It is valid.
You may say ‘funktionieren’ and ‘gelten’ are both verbs. But depending on context (semantics), they may or may not have a word-for-word equivalent in English.
gelten => to be valid
Its equivalent is not another word of the same lexical class, but a syntactic construction => v. + adj. that maintains the same semantic value.
In Mandarin, there are a large number of ‘complements’ or ‘effect particles’, to the extent that they are often considered a separate lexical class or “phrasal component”.
kan => stem for ‘look’, ‘see’, ‘watch’, etc.
kanźjen => to see
kanźjen’lə => having become able to see
In fact, a lot of verbs (semantically equivalent to their English counterparts) are formed this way, by extending a verb root with “effect particles” that modify the meaning of the stem.
The construction is then lexicalised (by virtue of frequent use) and used as a whole — natives rarely ponder the etymology or the derivation of a lexical item. Yet, lexicalised doesn’t mean “fixed”: it’s still a dynamic multi-dimensional template, with shapes and notches and holes. For example, you can insert infixes into them:
kanbźjen => not able to see
kanbźjen’lə => having become not able to see
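This template behaviour can be sketched as a small function. A toy illustration only: the simplified pinyin-style spellings (‘kan’, ‘bu’, ‘jian’, ‘le’) and the function name are my own, not the transcription used above:

```python
def potential_form(stem, complement, able=True, completed=False):
    """Compose a verb phrase from a stem and an effect particle.

    A toy model of the 'multi-dimensional template': the potential
    infix slot sits *inside* the lexicalised stem + complement unit,
    and the aspect particle attaches at the end.
    """
    infix = "" if able else "bu"        # potential infix: 'bu' = not able
    aspect = "le" if completed else ""  # change-of-state particle
    return stem + infix + complement + aspect

print(potential_form("kan", "jian"))                              # kanjian: to see
print(potential_form("kan", "jian", able=False))                  # kanbujian: not able to see
print(potential_form("kan", "jian", able=False, completed=True))  # kanbujianle
```

The infix slot is the point: the “word” is stored whole, yet remains open in the middle.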
Thus, back to the topic: these words cannot be learned as “paired equivalents”, as a semantic memory item on a flashcard, or through some string-for-a-string association task. They should be learned as “operables” that can transform, expand, and click together with other nuance-modifying components.
What we hear does not represent a hi-fi rendering of the real sounds. When the sound reaches our perception, it has already been processed and distorted. It is the same kinds of ‘distortion’ and ‘optimising’ that give us the amazing ability to extract meaning in a noisy, cocktail party environment.
When novice learners try to ‘imitate’ and reproduce the L2 phonemes, the input signal must be squeezed through a special, convenient filter, removing details and adding some things not really there, essentially to facilitate the production process. This task therefore adds pressure to make the input more heavily “processed”.
By contrast, imagine a mindfulness exercise or some meditation-like task, where you sit in a forest or a busy market, trying to discern the dozens of different sound sources, from wind to trees to multiple kinds of birds, without having to do anything about them — in that situation, the input is probably left more in its original, raw form.
And because of the selective nature of processing, stimuli can often be attenuated through de-sensitisation, narrowing the range of signals that would be “perceivable” to begin with. It is the observer’s paradox in the introspective sense.
The lexicon is more than Zipfian. The top words are not just more frequent, but fundamentally “special”. We should consider some of them, namely “function words” to be entirely different categories, which are not learned the same way as general vocabulary.
I have long pondered about the “central coherence paradox” (the phenomenon of having a feeling of knowing each word within a sentence, yet still unable to comprehend the integrated meaning), and perhaps it comes down to two points:
The most useful words often have multiple senses;
Function words require much more practice to master.
Missing Senses from Mental Lexicon
The “most useful” words feature about 2,500 entries (some of which have multiple lexical classes collapsed into one). If you have seen the General Service List or some similar list of roughly that size, you’ll get the idea. My version of this list is based on the 1K, 2K, and Coxhead Academic lists, hand-combed by Cobb.
And the theory is: if you “know” all these words, the vocabulary coverage in general reading (or even non-specialised academic) contexts will be so high that you are pushed above the “surfing threshold”: approximately 95% of all the types, and nearly all the facilitating tokens you encounter. At this point, you can always get through the text and extract most of the meaning, guessing and figuring out some of the unknown words contextually here and there. It’s like a proficient reader visiting a Wikipedia page about some plant: there is a Latin name of the plant that he surely doesn’t know, but at least he knows it refers to that plant.
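The coverage arithmetic itself is simple to sketch. A minimal illustration with a made-up token list and known-word set (the figures here are toy numbers, not real corpus statistics):

```python
from collections import Counter

def coverage(tokens, known_words):
    """Fraction of running tokens covered by a known-word list."""
    counts = Counter(w.lower() for w in tokens)
    total = sum(counts.values())
    covered = sum(n for w, n in counts.items() if w in known_words)
    return covered / total

# Toy example: knowing only four function words already covers
# more than half of the running tokens in this tiny 'text'.
tokens = "the fly can land on the ceiling and the fly can adapt".split()
known = {"the", "can", "on", "and"}
print(round(coverage(tokens, known), 2))  # 0.58 (7 of 12 tokens)
```

The same calculation over a real corpus and a 2,500-word list is what the 95% figure refers to.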
The problem is, these 2,500 words have more than 7,000 major senses or meaning groups. And for most novice learners, many of these are neglected. Thus when deciphering a sentence, the “translation item” they have stored (which gives them the false impression of knowing that word) does not match the nuanced context of that particular usage — the meaning does not “click” — leaving the “puzzle” unsolved.
(Erroneous) Semantic Reduction
Just a side note on another problem: when learners memorise these “words” as translation items or paired semantic memory items, they need to de-compile, or unpack, the lexical meaning each time, on the spot of “solving a sentence”, causing a bottleneck in processing.
This is like memorising each lexical item as a one-to-one translation pair, discarding all the scaffolding information in the process:
instruct => anweisen
Whereas, a full ‘scaffolded’ explanation (rather than just ‘translation’) would be, with examples from COBUILD:
If you instruct someone to do something, you formally tell them to do it. Someone who instructs people in a subject or skill teaches it to them.
This approach lists the multiple senses, and gives a constructed “episodic-like” context for each.
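One way to picture the difference is as a data structure. A hypothetical schema of my own: the sense wordings paraphrase the COBUILD-style entries above, and the field names and examples are invented for illustration:

```python
# Reduced pairing: one string for another, no scaffolding.
reduced = {"instruct": "anweisen"}

# Scaffolded entry: multiple senses, each with a usage frame and
# a constructed, episodic-like context.
scaffolded = {
    "instruct": [
        {
            "sense": "formally tell someone to do something",
            "frame": "instruct someone to do something",
            "example": "She instructed the staff to lock the doors.",
        },
        {
            "sense": "teach someone a subject or skill",
            "frame": "instruct people in a subject or skill",
            "example": "He instructs beginners in basic carpentry.",
        },
    ]
}

print(len(scaffolded["instruct"]))  # 2 major senses stored, not one string
```

The reduced pair throws away exactly the frame and context fields that make the entry usable.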
The next problem is that for words like basic prepositions and functional pronouns, the usage is even harder to demonstrate through “reduced” semantic pairing schemes. The learner would benefit from a large battery of “transformative” and “generative” exercises to get a more solid grasp of the grammatical behaviours and meanings-in-usage of these “stop words”.
Basically, I’m imagining hand-picked example sentences, with accurate translations (or explanations of meaning) that demonstrate these function words in different configurations. Then we could also replace and alter the filled-in words (like the actors for the various thematic roles) to highlight the fungibility of those parts, and the essential structural functions of these “grammar words”.
Not All Subsets of the Lexicon are Created Equal
For a proficient user of a language, additions to the lexicon are something very shallow. It’s like the brand name of some new canned tomato soup: it could be this or it could be that, and it wouldn’t matter either way. Acquisition is very rapid, and it doesn’t matter much if you remembered it wrong.
Obviously, this is the kind of model assumed by various methods of building vocabulary: that they are treated as semantic memory items, like two lines on a flashcard. This may be okay for a subset of the vocabulary, that is, those words that are actually “shallow” in their lexical entanglement.
For L2 learners to produce a new set of allophones, it may be helpful for them to hear what they are producing in real time.
The setup is similar to that of a voice-over studio, with two key elements: a mic feeding into the computer, and closed-back headphones.
First, the demonstration should include two parts:
Videos and 3D diagrams from multiple angles, perhaps with slow motion, to explain the anatomical construction of the phone.
High-quality audio samples from different actors, preferably matched to the vocal characteristics (sex, base pitch) of the learner.
Each cycle will go through these steps:
A sample phone is played, along with visual cueing by waveforms.
The learner tries to produce the same sound.
The production is fed in and played back through the headphones instantaneously.
The waveform of the learner’s production is also drawn on the screen, with analytics and colour cues.
With further speech recognition modules, more detailed analyses and feedback can be given about the formants, timing, and other acoustic features. These can also be summarised as a simple indication of accuracy (that is, “whether the learner has hit the target”).
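As a crude stand-in for that kind of analysis, one could compare a rough pitch estimate of the learner’s production against the target. A minimal sketch of my own using zero-crossing counting on synthetic tones; real formant tracking would need far more than this:

```python
import math

def estimate_freq(samples, sample_rate):
    """Crude pitch estimate: count zero crossings of a mono signal."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    return crossings / 2 / (len(samples) / sample_rate)

def hit_target(production, target, sample_rate, tolerance_hz=5.0):
    """Toy accuracy flag: is the learner's pitch near the target's?"""
    diff = abs(estimate_freq(production, sample_rate)
               - estimate_freq(target, sample_rate))
    return diff <= tolerance_hz

def tone(freq, sample_rate=16000, seconds=1.0):
    """Synthesise a pure sine tone standing in for a recorded phone."""
    n = int(sample_rate * seconds)
    return [math.sin(2 * math.pi * freq * i / sample_rate) for i in range(n)]

sr = 16000
print(hit_target(tone(222), tone(220), sr))  # True: 2 Hz off, within tolerance
print(hit_target(tone(300), tone(220), sr))  # False: 80 Hz off the target
```

In a real system, the same pass/fail summary would come from formant distances and timing, not a single pitch number.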
Each phone will be repeated through many cycles, continuing until, and even after, reaching consecutive sessions of success, and over many days.
The end result should be accurate production of the phone that can be elicited easily, and rapid self-correction when deviation occurs (“allophonic awareness”).
The training will probably start with monophthongs, since they are less ambiguous. As the McGurk effect demonstrates, the difference between stops (e.g. ‘da’ vs. ‘ga’) can be very minute. When played as isolated sounds, there is the closure, and then there is the release; it is difficult for both machines and novice learners to discern the acoustic differences (if any are perceivable). Therefore, these phones should be trained in conjunction with their co-articulatory neighbours, e.g. ‘down’ vs. ‘gown’.
After the individual sounds, we can move on to clusters, going through a probabilistic list of phonotactically plausible onset and coda clusters, and rhymes.
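Such a probabilistic list could be derived by counting onsets over a word list. A toy sketch over a handful of spelled words (a real inventory would use phonemic transcriptions and a frequency-weighted lexicon, not orthography):

```python
from collections import Counter

VOWELS = set("aeiou")

def onset(word):
    """Consonant cluster before the first vowel (orthographic proxy)."""
    for i, ch in enumerate(word):
        if ch in VOWELS:
            return word[:i]
    return word

# Toy word list standing in for a real lexicon.
words = ["street", "spring", "down", "gown", "play", "plan", "tree"]
ranked = Counter(onset(w) for w in words).most_common()
print(ranked)  # [('pl', 2), ('str', 1), ('spr', 1), ('d', 1), ('g', 1), ('tr', 1)]
```

Ordering the clusters by frequency like this is what would let the training schedule start from the most probable ones.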