Linguistic Curiosities

July 24, 2014 at 7:55pm
0 notes

Troubles with vocabulary in Mandarin

A word is not just yet another unique string; it is better conceptualised as a multi-dimensional organic compound. The most obvious group is verbs: many of them have a specific “shape”, like a puzzle piece with its particularities, which affects how other words can be fitted together with it, sometimes even many words down the line.

He told John to wash himself.
He promised John to wash himself.
He told John to look after himself.
He promised John to look after him.
He promised John to look after himself.

A verb can dictate how it should be used and what kinds of words can follow or surround it, or at least co-occur in the same phrase. Each verb comes with its own instruction manual.

Now, consider this cross-linguistically:

Es funktioniert. => It works.
Es gilt. => It applies.
Es gilt. => It is true / It is valid.

You may say ‘funktionieren’ and ‘gelten’ are both verbs. But depending on context (semantics), a verb like ‘gelten’ may or may not have a word-for-word equivalent in English.

gelten => to be valid

Its equivalent is not another word of the same lexical class, but a syntactic construction => v. + adj. that maintains the same semantic value.

In Mandarin, there is a large number of ‘complements’ or ‘effect particles’, to the extent that they are often considered a separate lexical class or “phrasal component”.

kan => stem for ‘look’, ‘see’, ‘watch’, etc.
kanźjen => to see
kanźjen’lə => having become able to see

In fact, a lot of verbs (semantically equivalent to their English counterparts) are formed this way, by extending a verb root with “effect particles” that modify the meaning of the stem.

The construction is then lexicalised (by virtue of frequent use) and used as a whole — natives rarely ponder the etymology or derivation of a lexical item. Yet lexicalised doesn’t mean “fixed”: it is still a dynamic multi-dimensional template, with shapes and notches and holes. For example, you can insert in-fixes into it:

kanbźjen => not able to see
kanbźjen’lə => having become not able to see

The result can even look a bit agglutinative:

kandəukanbźjen (adj.) => not even visible

Ẑəśje śinpjän həənśjaau, ŕəujään kandəukanbźjen.
= These chips (are) tiny, to-the-naked-eye not-even-visible.

Thus, back to the topic: these words cannot be learned as “paired equivalents”, as semantic memory items on flashcards, or through some string-for-a-string association task. They should be learned as “operables” that can transform, expand, and click together with other nuance-modifying components.
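The template-like behaviour of these verbs can be sketched in code. A toy sketch in Python, using the post’s ad-hoc romanisation; the function name and the simplified rules are my own, and they cover only the forms shown above (an infixed ‘b’ for the negative-potential form, a final ‘’lə’ for the perfective):

```python
def verb_form(stem, complement, negated=False, perfective=False):
    """Build a Mandarin-style verb from a stem plus an effect particle.

    Only the patterns from the examples above are modelled:
    - an infixed 'b' between stem and complement for "not able to ..."
    - a final '’lə' marking the resulting state as having come about
    """
    form = stem + ("b" if negated else "") + complement
    if perfective:
        form += "’lə"
    return form

# reproducing the example forms from the post
print(verb_form("kan", "źjen"))                                # kanźjen
print(verb_form("kan", "źjen", negated=True))                  # kanbźjen
print(verb_form("kan", "źjen", negated=True, perfective=True)) # kanbźjen’lə
```

The point of the sketch is simply that a single lexicalised item is really a slot-bearing template, not a fixed string.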


The problem with L2 imitation tasks

What we hear is not a hi-fi rendering of the real sounds. By the time a sound reaches our perception, it has already been processed and distorted. It is the same kind of ‘distortion’ and ‘optimising’ that gives us the amazing ability to extract meaning in a noisy, cocktail-party environment.

When novice learners try to ‘imitate’ and re-produce L2 phonemes, the input signal must be squeezed through a special, convenient filter, removing details and adding some things that are not really there, essentially to facilitate the production process. The task itself therefore adds pressure to make the input more heavily “processed”.

Whereas, if you imagine a mindfulness exercise or some meditation-like task, where you sit in a forest or a busy market, trying to discern the dozens of different sound sources, from wind to trees to multiple kinds of birds, without having to do anything about them — in that situation, the input is probably left more in its original, raw form.

And because of the selective nature of processing, stimuli can be attenuated through de-sensitisation, narrowing the range of signals that would be “perceivable” to begin with. It is the observer’s paradox in an introspective sense.

Children’s resistance to corrections may have a different cause…


Auditory awareness exercise

We should also go to very busy places (malls, train stations, etc.) and record hi-fi binaural samples, then ask the listener: how many different sounds can you hear in the environment?

By the way, I just found some really weird stuff on binaural recording:


I’m thinking about making a diagramming tool.


When you load a sentence, it will be automatically tokenised, with some auto-tagging too… then you can drag-and-drop the various components into a hierarchical topology:

                sie sich
                        den Anordnungen
                                    ihrer Eltern.


  • Assignment of thematic “stage” for a clause, i.e. templates of who’s doing what to whom with what…
  • "Entanglement" of separable verbs, reflexive verbs, verbs that require certain cases or other words…
  • Reference connection for pronouns…
  • Drag-and-drop assignment of the target of qualifying adj. or adv.
  • The diagram will automatically re-adjust itself as a force-directed tree graph. Demo:
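The data model behind such drag-and-drop re-parenting could be minimal. A sketch in Python; all names are hypothetical, and the tagging and force-directed layout are left out:

```python
class Node:
    """One token in the diagram, with an optional part-of-speech tag."""

    def __init__(self, token, tag=None):
        self.token = token
        self.tag = tag
        self.parent = None
        self.children = []

    def attach(self, child):
        """Drag-and-drop: detach `child` from its old parent (if any)
        and re-attach it under this node."""
        if child.parent is not None:
            child.parent.children.remove(child)
        child.parent = self
        self.children.append(child)

# build a tiny hierarchy, then re-parent a node by "dragging" it
verb = Node("waited", "V")
until = Node("until", "P")
midnight = Node("midnight", "N")
verb.attach(until)
until.attach(midnight)
verb.attach(midnight)  # user drags 'midnight' directly under the verb
```

A force-directed layout would then treat each parent-child link as a spring and re-settle the whole tree after every attach.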

July 23, 2014 at 8:13pm

Iterative construction examples:

She waited.

She waited until midnight.

She waited until after midnight.

She waited until well after midnight.
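This drill could even be generated mechanically: after the bare clause, each step inserts one more modifier right after the adjunct’s first word. A sketch in Python; the function name and the insertion rule are my own:

```python
def iterative_construction(core, adjunct, modifiers):
    """Yield the bare clause, the clause with its adjunct, and then
    progressively expanded versions, inserting one modifier at a time
    after the adjunct's first word."""
    yield core + "."
    words = adjunct.split()
    yield f"{core} {' '.join(words)}."
    for mod in modifiers:
        words.insert(1, mod)
        yield f"{core} {' '.join(words)}."

steps = list(iterative_construction("She waited", "until midnight",
                                    ["after", "well"]))
# reproduces the four sentences above, one expansion per step
```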


The lexicon is more than Zipfian. The top words are not just more frequent, but fundamentally “special”. We should consider some of them, namely the “function words”, to be an entirely different category, one that is not learned the same way as general vocabulary.

I have long pondered the “central coherence paradox” (the phenomenon of feeling that you know each word within a sentence, yet still being unable to comprehend the integrated meaning), and perhaps it comes down to two points:

  • The most useful words often have multiple senses;
  • Function words require much more practice to master.

Missing Senses from Mental Lexicon

The “most useful” words comprise about 2,500 entries (some of which have multiple lexical classes collapsed into one). If you have seen the General Service List or a similar list of roughly that size, you’ll get the idea. My version is based on the 1K, 2K, and Coxhead Academic lists hand-combed by Cobb.

And the theory is: if you “know” all these words, vocabulary coverage in general reading (or even non-specialised academic) contexts will be so high that you are pushed above the “surfing threshold”, covering approximately 95% of all the types and nearly all the facilitating tokens you encounter. At that point you can always get through a text and get most of the meaning, guessing and figuring out some of the unknown words contextually here and there. It’s like a proficient reader visiting a Wikipedia page about some plant: he surely doesn’t know the Latin name, but at least he knows it refers to that plant.
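The coverage idea is easy to operationalise: count what fraction of running tokens in a text falls inside the known-word list. A rough sketch in Python; the function name is mine, and real coverage studies count word families or lemmas rather than raw lowercased forms:

```python
from collections import Counter

def token_coverage(text, known_words):
    """Fraction of running tokens covered by a known-word list.
    Crude tokenisation: split on whitespace, strip punctuation, lowercase."""
    tokens = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    tokens = [t for t in tokens if t]
    counts = Counter(tokens)
    covered = sum(n for word, n in counts.items() if word in known_words)
    return covered / sum(counts.values())

known = {"she", "waited", "until", "midnight"}
print(token_coverage("She waited until dawn.", known))  # 0.75
```

The “surfing threshold” claim is then just: once `token_coverage` stays around 0.95 or above for typical texts, contextual guessing of the remainder becomes feasible.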

The problem is, these 2,500 words have more than 7,000 major senses or meaning groups, and for most novice learners, many of these are neglected. Thus when deciphering a sentence, the “translation item” they have stored (which gives them the false impression of knowing that word) does not match the nuanced context of that particular usage; the meaning does not “click”, leaving the “puzzle” unsolved.

(Erroneous) Semantic Reduction

Just a side note: there is another problem. When learners memorise these “words” as translation items, as paired semantic memory items, they need to de-compile, or unpack, the lexical meaning each time, on the spot of “solving a sentence”, causing a bottleneck in processing.

This is like memorising each lexical item as a one-to-one translation pair, discarding all the scaffolding information in the process:

instruct => anweisen

Whereas, a full ‘scaffolded’ explanation (rather than just ‘translation’) would be, with examples from COBUILD:

If you instruct someone to do something, you formally tell them to do it.
Someone who instructs people in a subject or skill teaches it to them.

This approach lists the multiple senses, and gives a constructed “episodic-like” context for each.
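The contrast between the two storage models can be made explicit as data structures. A sketch in Python; the field names are mine, and the senses are the COBUILD-style ones quoted above:

```python
# the "reduced" model the post argues against: a one-to-one translation pair
reduced = {"instruct": "anweisen"}

# the "scaffolded" model: one word, several senses, each carrying a
# usage frame that supplies an episodic-like context
scaffolded = {
    "word": "instruct",
    "senses": [
        {"frame": "instruct someone to do something",
         "gloss": "formally tell them to do it"},
        {"frame": "instruct people in a subject or skill",
         "gloss": "teach it to them"},
    ],
}
```

The point is that the reduced entry has nowhere to put a second sense or a usage frame, so the scaffolding information is discarded at storage time.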

Function Words

The next problem is that for words like basic prepositions and functional pronouns, the usage is even harder to demonstrate through “reduced” semantic pairing schemes. The learner would benefit from a large battery of “transformative” and “generative” exercises to get a more solid grasp of the grammatical behaviours and meanings-in-usage of these “stop words”.

Basically, I’m imagining hand-picked example sentences, with accurate translations (or explanations of meaning), that demonstrate these function words in different configurations. Then we could replace and alter the filled-in words (like the actors for the various thematic roles) to highlight the fungibility of those parts and the essential structural functions of these “grammar words”.

Not All Subsets of the Lexicon are Created Equal

For a proficient user of a language, an addition to the lexicon is often something very shallow. It’s like the brand name of some new canned tomato soup: it could be this or it could be that, and it wouldn’t matter either way. Acquisition is very rapid, and it doesn’t matter much if you remember it wrong.


Obviously, this is the model assumed by various methods of building vocabulary: words are treated as semantic memory items, like two lines on a flashcard. This may be okay for a subset of the vocabulary, namely those words that are actually “shallow” in their lexical entanglement.

But not for the highly frequent words.

And definitely not for the function words.

July 19, 2014 at 3:35pm

Looped Feedback Experiment

For L2 subjects learning to produce a new set of allophones, it may be helpful to hear what they are producing in real time.

The setup is similar to that of a voice-over studio, with two key elements: a mic feeding into the computer, and closed-back headphones.


First, the demonstration should include two parts:

  • Videos and 3D diagrams from multiple angles, perhaps with slow motion, to explain the anatomical construction of the phone.
  • High-quality audio samples from different actors, preferably matched to the vocal characteristics (sex, base pitch) of the learner.


Each cycle will go through these steps:

  • A sample phone is played, along with visual cueing by waveforms.
  • The learner tries to produce the same sound.
  • The production is fed in and played back through the headphones instantaneously.
  • The waveform of the learner’s production is also drawn on the screen, with analytics and colour cues.
  • With further speech-recognition modules, more detailed analyses and feedback can be given about formants, timing, and other acoustic features. These can also be summarised as a simple indication of accuracy (that is, whether the learner has “hit the target”).
  • Each phone is repeated through many cycles, until (and even after) reaching contiguous sessions of success, over many days.
  • The end result should be easily solicited, accurate production of the phone, and rapid self-correction when deviation occurs (“allophonic awareness”).
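The cycle-control logic (“until and even after reaching contiguous sessions of success”) can be sketched independently of the audio layer. A sketch in Python; the names, threshold, and streak criterion are my own assumptions:

```python
def cycles_to_streak(scores, pass_threshold=0.8, streak_needed=5):
    """Given per-cycle accuracy scores in [0, 1], return the 0-based
    cycle at which the learner first completes `streak_needed`
    consecutive passing attempts, or None if that never happens."""
    streak = 0
    for i, score in enumerate(scores):
        streak = streak + 1 if score >= pass_threshold else 0
        if streak == streak_needed:
            return i
    return None

# a miss on cycle 0 resets nothing yet; five passes in a row end at cycle 5
print(cycles_to_streak([0.5, 0.9, 0.85, 0.9, 0.95, 0.9]))  # 5
```

Training would then continue past this point (“even after”), with the same counter used to trigger overlearning cycles on later days.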


The training will probably start with monophthongs, since they are less ambiguous. As the McGurk effect demonstrates, the difference between stops (e.g. ‘da’ vs. ‘ga’) can be very minute. When played as isolated sounds, there is the closure and then the release; it is difficult for both machines and novice learners to discern the acoustic differences (if any are perceivable). Therefore, these phones should be trained in conjunction with their co-articulatory neighbours, e.g. ‘down’ vs. ‘gown’.

After the individual sounds, we might as well move on to clusters, going through a probabilistic list of phonotactically plausible onset and coda clusters, and rhymes.

July 18, 2014 at 12:39am

Awareness and Sensitivity

Auditory Awareness:

Which one sounds like a bear / bird?

Phonological Awareness:

Which words rhyme with ‘bird’?

Segmentation, Blending, Insertion, Substitution, Reversal

(Parallel to Production Errors)

Boundary Detection:

Words - Syllables - Idealised Components

Quality Sensitivity:

‘s’ vs. ‘sh’ vs. ‘tch’

Quantity Sensitivity:

‘Stadt’ vs. ‘Staat’

Minimal Pairs

Multi-Foil, Cycle-Until-Pass Leitner Boxes
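A minimal version of such a cycle-until-pass Leitner queue, leaving out the multi-foil presentation layer, could look like this (Python; all names are mine, and `grade` stands in for the actual quiz interaction):

```python
def leitner_pass(items, grade, n_boxes=3):
    """One session over a Leitner queue: a correct answer promotes the
    item one box, a miss demotes it to box 0 and re-queues it, so every
    item cycles until it has passed at least once this session.
    `grade(item)` must eventually return True or the session never ends."""
    box = {item: 0 for item in items}
    queue = list(items)
    while queue:
        item = queue.pop(0)
        if grade(item):
            box[item] = min(box[item] + 1, n_boxes - 1)
        else:
            box[item] = 0
            queue.append(item)  # cycle until pass
    return box

# minimal-pair contrasts as items; here every answer is correct,
# so each item is promoted once and the session ends
result = leitner_pass(["s/sh", "Stadt/Staat"], lambda item: True)
```

In a real drill, `grade` would present the target among several foils and score the learner’s choice.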

Dolby Demo:

July 10, 2014 at 2:04pm

When we study acoustic evidence, we are looking for corresponding traces of a multi-layered process, a chain reaction from mental lexical representation, to articulatory implementation (gestures), to acoustic features.

Every vocal tract differs somewhat in its details, yet people manage to produce, around an acoustic target, an acceptable range of what can be categorised as “the same sound”. It is also possible to achieve the same acoustic target with different “articulatory strategies”, just as a buzz generator can substitute for the vocal cords.

The question is, how much conscious control do we have over each step of this chain process? When we try to imitate a sound, what exactly happens?


Selective attention is necessary because cognitive processing power is costly. This “distortion”, starting from intake, is not at all a “defect”, but an advantage, optimised for what it is meant to do.

Humans are very good at focusing on one speaker amid a sea of noise and other voices at a cocktail party, or at following the face of a person while ignoring the face printed on that person’s t-shirt. Machines are not so good at these tasks yet, because they were built to perform a different kind of computation. Our perception of the world is not a “true representation” of reality, but a constructed, convenient, and useful one.

Why do we have trichromatic vision and not a broader or narrower spectrum? Well, arguably individuals are distributed along a varied scale… Let’s say the average man is built like so because his long line of ancestors was shaped by selection pressure over evolutionary history to occupy a particular “niche”, to thrive in their environment using certain specialties, including the various faculties of cognition.