September 12, 2014 at 5:44pm
The hiddenness of abstraction
From: a discussion about what we can extract from spectrograms.
What makes the analysis so confounding and “hidden” is the many layers of conversion from abstraction to realisation. What we can observe and collect is only the surface, many steps removed from its “top-down” origins. And as Anderson said, quite cryptically: “Physical events are notoriously neutral.”
Phonologists tend to think of speech articulation as a continuous stream of “gestures” (phoneticians and researchers working on speech synthesis have come to realise this too). It’s like interpretive dance, or choreography, where you are trying to convey contrastive meaning through perceivable motion. It’s not, as was conventionally thought, a string of idealised “targets” stitched together.
Now imagine that all you can capture is images: you would have to solve vision, starting from edge detection and all that, then the anatomy and stick figures, and then step by step you would get to the system of movement, and then perhaps to meaning. I would speculate it’s notoriously difficult for computers to “understand” interpretive dance.
Phonological features and classes (the elements out of which we make speech, and the rules for doing so) are abstract. There used to be an impression that they must have some articulatory and then acoustic targets (as in, “this is the sound to produce”); but apparently no.
- People who have lost their front teeth, or who are under dental anaesthesia, can still speak, and we can still understand them to an extent.
- The phoneme /r/ varies so greatly in manner and place that it’s very hard to explain the relations among its variants through acoustics alone.
- Sign languages have concepts and processes analogous to phonemes, co-articulation, rhyming and so on; only they use hand shapes, positions, movements, and facial gestures instead.
Phonology is realised through anatomy, but not bound by it. You could imagine an alien species with a completely different set of organs as articulators, or imagine sci-fi implants giving us the ability to use very novel gestures — it just so happened that our ancestors went down the path of utilising the vocal tract. And if you look at cross-linguistic data, even just the vocal tract offers a very diverse range of expressive possibilities, some less obvious than others, from clicks to tones to labial protrusion to breathiness to ingressives and more.
The spectrogram gives us spectral and temporal resolution. Depending on the maths applied, we can see the individual beats of the vocal folds, the harmonics, the resonance characteristics, and the patterns of acoustic energy. But after all, physical events are just pointers to the real thing, like moving shadows cast by hand puppets.
And naturally, when we look at these, we interpret them as both physical phenomena and the “linguistically interesting cues” we are searching for. When coding in Praat, for example, you would hear people talk about voicing, formants, turbulence, closures, “energy dropping off”, movements, glottalisation, periodicity, and so on — multiple levels of representation and interpretation (acoustic, anatomical, phonetic) mixed together.
What we want to extract is not intrinsic in the sound data, and therefore cannot be analysed in isolation. It must be interpreted with reference to a whole range of constraints and predictions, from phonetics to phonology, from anatomy to sociolinguistics to pragmatics and beyond.
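To make the resolution trade-off in the spectrogram concrete: the window length of the analysis determines whether you resolve the harmonics or the individual glottal pulses. Here is a minimal sketch with a naive DFT on a synthetic two-harmonic “voice” (the sample rate, frequencies, and window length are all made up for illustration):

```python
import math

def dft_magnitudes(frame):
    """Naive DFT: magnitude of each frequency bin of one analysis frame."""
    n = len(frame)
    return [abs(sum(frame[t] * complex(math.cos(2 * math.pi * k * t / n),
                                       -math.sin(2 * math.pi * k * t / n))
                    for t in range(n)))
            for k in range(n // 2)]

# Synthetic "voiced" signal: a 100 Hz fundamental plus its second harmonic
# at 200 Hz, sampled at 8000 Hz (values chosen purely for illustration).
sr = 8000
signal = [math.sin(2 * math.pi * 100 * t / sr)
          + 0.5 * math.sin(2 * math.pi * 200 * t / sr)
          for t in range(sr)]

# A long (narrowband) window: 400 samples = 50 ms, i.e. 20 Hz bin spacing,
# fine enough to separate the two harmonics.
frame = signal[:400]
mags = dft_magnitudes(frame)
peak_bins = sorted(range(len(mags)), key=mags.__getitem__)[-2:]
peak_hz = sorted(b * sr / 400 for b in peak_bins)
print(peak_hz)  # [100.0, 200.0]
```

A shorter window would smear these two peaks together, trading frequency resolution for the ability to see each vocal-fold beat in time.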
At this point, I have pretty much started to ramble, so I’ll just leave it there with some blog posts on the topic:
September 10, 2014 at 11:00am
When a noun becomes a verb, the logical structure or construction of its semantic value is not always predictable:
to paint (vt.)
to water (vt.)
to paint (vi.)
September 4, 2014 at 1:35pm
Re-published from an answer to:
Why are the French “throat R”, the Scottish / Italian “trilled R”, and the spectrum of English / Irish / American “rolled R” usually considered as instances of one “R”, not three independent sounds?
Sometimes it may not be meaningful to consider such categories across languages, especially if they are historically far apart.
The choices of symbols when inventing an orthography are partly a matter of convenience (using approximations of what’s already in practice), e.g. the Japanese ‘r’ (which could have been ‘l’), partly arbitrary (cf. Cherokee), and partly a product of various legacies (cf. Cyrillic).
With that caveat out of the way, consider the /r/ phoneme in just one language, say German, where it exhibits a large variety of variants, from an alveolar trill, to a central vowel, to a flap in the back.
You would think there must be some mental representation for the “underlying R”, because when you get used to the quirks of some other accent (where the R is rendered very differently, but systematically), you can still understand that speaker, and maybe even forget they are saying the R differently.
There are two possible hypotheses:
We ignore other people’s “accented” variations because of phonetic similarity. It’s just like listening to someone who has just had dental work under anaesthesia, or someone with a lisp, or someone who says a lot of things with a retroflex… Do you ignore these differences because the sounds are acoustically similar?
We have “abstract” representations of phonemes that often correlate with, but are not bound by, phonetic primitives — we treat them as distinct or non-distinct not because of how close they are acoustically, but because they belong to different mental categories.
For now, I would say Hypothesis Two seems to win out:
- 'S' and 'SH' can be very similar acoustically (in fact, it's hard for non-natives and Siri to tell them apart in isolation), but English speakers all know 'sit' and 'shit' are different.
- Vowel height varies across different accents, sometimes a person’s ‘bEd’ may be so low that it’s lower than another person’s very high ‘bAd’, but it’s possible for these two people to have a smooth conversation.
What are phonemes for?
In brief, they have a “phonological function” in a given language, that is, to form units of meaning (morphemes).
Yet their distribution and combination are subject to the phonological rules of that language (phonotactics).
For example, in English we have a class called “nasals”, which includes /m/, /n/, and /ŋ/. But there are restrictions on how these phonemes can be used in forming words: only /m/ and /n/ can occur word-initially.
Thus what we have is a “phonological class” that can be used in a certain way in word formation, and that is defined by shared features. In this case the features are [+nasal] [-dorsal]: the necessary set of features to distinguish these phonemes from all the rest, which are subject to different sets of rules for their use in forming words.
But /m/ and /n/ share a lot of other features, e.g. [+consonant] [+voice] and so on. These are considered “redundant” features that don’t affect whether the resulting product is inside or outside the phonological class — which is already filtered/selected by the “defining features”.
Therefore, in the case of ‘R’:
- It belongs to a specific sub-subclass of /r/, that can only be used in a certain way to form words — these are the rules of that phonological class.
- This specific class is selected by the defining features. The features can describe what is required in terms of place, manner, voicing and so on.
- The requirement can be in plus or minus forms, e.g. “it must be at this place” or “it must not be at this place”. That is, a phoneme can be defined by what it is not in terms of features.
- For example, the phonological class /m/ and /n/ belong to is defined as “must be nasal” AND “must not be dorsal”.
- As long as the defining features are satisfied, the rest of the features they also have are considered redundant and somewhat fungible. If you say the /m/ while biting your lip, it’s still /m/.
- Therefore, the specific class to which /r/ belongs has some defining features — but these do not restrict the place, manner, and range of features observed in the many varieties of /r/.
- And some of these defining features are in the negative form. You can say the /r/ as a tap or as a trill, as long as it is not /l/, that is, as long as it has the [-lateral] feature.
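The machinery above, of defining versus redundant features, can be sketched as simple set selection. This is a toy model; the phoneme inventory and feature values are illustrative, not a serious feature system for English:

```python
# Toy phoneme inventory with illustrative feature bundles (not a full
# or accurate feature system for English).
PHONEMES = {
    "m":  {"nasal": True,  "dorsal": False, "consonantal": True, "voice": True},
    "n":  {"nasal": True,  "dorsal": False, "consonantal": True, "voice": True},
    "ng": {"nasal": True,  "dorsal": True,  "consonantal": True, "voice": True},
    "s":  {"nasal": False, "dorsal": False, "consonantal": True, "voice": False},
}

def select_class(defining):
    """Return the phonemes that match every defining feature value.

    `defining` mixes plus and minus specifications: {"nasal": True,
    "dorsal": False} reads as [+nasal][-dorsal].  Features not mentioned
    are redundant -- they never affect class membership.
    """
    return {symbol for symbol, feats in PHONEMES.items()
            if all(feats[f] == value for f, value in defining.items())}

# [+nasal][-dorsal]: the class allowed word-initially in English.
print(sorted(select_class({"nasal": True, "dorsal": False})))  # ['m', 'n']
```

Biting your lip while saying /m/ changes only redundant features, so in this model the segment still lands in the same class.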
Also from Michael Proctor
It depends what you mean by ‘compare’ across time and space. A similar sound change is currently in progress in parts of Quebec (http://ycmorin.net/wp-content/uploads/2012/11/2012ms-Apical_to_uvular_R.pdf), so it’s informative to compare how young speakers differ from their parents, and how the drivers of the sound change differ from other Canadians and contemporary continental French speakers. Direct comparisons of phonetic and phonological forms with older and more removed varieties would obviously be more difficult, and in many cases more speculative — we don’t really know how speakers of the earlier prestige French dialects (who originally drove the spread of uvular variants throughout Europe) spoke.
One reason why it’s been difficult to account for this type of sound change is that there’s no consensus about the goals of production of rhotics, their phonological representation, or even what unifies rhotics as a class. Ohala, Stevens, Lindau proposed that a lowered F3 is the common characteristic, and others (Delattre & Freeman 1968) suggest that there’s more articulatory commonality amongst different types of ‘r’-sounds.
Since tongue tip trills are also produced with a tongue body gesture (Zawadzki & Kuehn 1980), it’s possible to imagine a sound change where the coronal gesture is lenited, leaving only a dorsal constriction, which could be realized as [x], [R], etc. Felicity [Cox]’s recent work (http://dx.doi.org/10.1017/S0025100314000036) shows how rhoticity is variably perceived in different phonological environments, which has important implications for this type of sound change too.
Re-published from a Quora answer to:
"Is it possible to quantify the number of words in a language?"
I would say the practical answer is yes.
Bear in mind that there are qualitative differences embedded in the quantity; it’s not just linear addition. It’s like asking how many apples you have: there may be larger ones and smaller ones, different varieties, some ripening, some decaying to the point of bordering on becoming something else…
But to think of this question as an engineering problem, we can simplify “languages” as shared sets of vocabularies. Naturally there’s a specialising effect: printers know more about fonts, zoologists are familiar with animals, and Finns may have more words for snow. But then again, if you think about learning a second language like French or Spanish or German — you would aim for a set of vocabulary that will allow you to “use that language”.
Then you can imagine the job of lexicographers: making dictionaries:
- Why does someone refer to a dictionary? Because he encountered some word that he doesn’t know…
- And the dictionary contains an entry that gives him that information.
- There is a frequency effect: if there’s a word you don’t know, it’s probably a rarer word.
- Then we can have dictionaries of different sizes: a pocket-sized one, or a very big one — how many words to include is prioritised by frequency.
And this progression from a 20,000-word dictionary to a 100,000-word dictionary is not linear. For a beginner, a thin dictionary is a good and practical goal: it’s unlikely that you would encounter a word you couldn’t find in there. Then, as you become more advanced, you might encounter rarer words for which you need a bigger dictionary. Until at some point you might be standing at the frontier of a very niche specialised sub-field, and you need to invent some new words.
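The non-linearity of that progression falls out of the skewed frequency distribution of words. Under a rough Zipf-like model (the vocabulary size and the 1/rank weighting are illustrative assumptions), each step up in dictionary size buys less additional text coverage:

```python
# Zipf-like model: the r-th most frequent word has weight 1/r.
V = 100_000                                  # assumed total vocabulary size
weights = [1 / rank for rank in range(1, V + 1)]
total = sum(weights)

def coverage(dictionary_size):
    """Fraction of running text covered by the top most frequent words."""
    return sum(weights[:dictionary_size]) / total

for size in (1_000, 20_000, 100_000):
    print(f"{size:>7} words -> {coverage(size):.0%} of text covered")
```

Under these assumptions the first thousand entries already cover well over half of running text, while the jump from 20,000 to 100,000 entries adds comparatively little: the thin dictionary really is the better bargain per page.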
The level of competency in a second language learning setting is well studied — I suppose the same standard would just move on a continuum for native speakers as well (say, developmentally).
When you look at this: Common European Framework of Reference for Languages, the question as an engineering problem becomes simpler:
To reach an equivalent level of C2, what is the size of the dictionary one needs to master?
It’s quite viable to quantify the “C2-Equivalent Vocabulary Size”.
A corpus-centric sociolinguistic approach is helpful in surfacing a lot of the issues this question faces, but it does not lead to an answer.
I still think this question is most relevant to the field of lexicography: the practical art of making a dictionary. Lexicography concerns itself with the variety and variability of words, minus morphology, syntax, orthography, and all those other fields of linguistic competence or technology.
Again, let’s consider a dictionary for an adult native speaker who already has the linguistic competence:
- Why would he refer to a dictionary?
- Because he encountered a “word” the meaning of which cannot be derived from morphology or syntax or compositional analysis or logical screening, e.g. “Ah, that’s an onomatopoeic sequence”, or “That’s a typo”, or “That’s just some gibberish”…
Let’s consider the “principle of compositionality”:
- If I say: “There’s a pink dog in the living room.”
- You would know what I mean and be able to imagine the situation.
- Even if I say: “There’s a pink-ish doggie in the living room.” It doesn’t really change much, you still get it. Because you have those faculties for morphological and syntactic processing.
But if someone says: “There’s a DALEK in the TARDIS.” Then, a competent but naïve speaker will have to refer to a dictionary, because the meanings cannot be deduced from other linguistic processes.
After those lexical entries are resolved, you can easily move to “There’s a dalekishness in his tardismanship.” Right? Because they are not new lexical entries, they are just new lexical compositions.
As shown in the example of “naikimlyiia”, a verb can have half a million forms. In English, there’s “Antidisestablishmentarianism”. It’s not sensible to cover infinity with a morphological look-up table.
There must be an algorithmic solution: you put a string through, and the function breaks it up into sensible components. Then, for speed, we can cache the common look-ups in a table.
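A minimal sketch of such an algorithmic solution, assuming a hypothetical toy lexicon and affix list; `functools.lru_cache` plays the role of the table caching common look-ups:

```python
from functools import lru_cache

# Hypothetical toy lexicon and affix inventory, just for illustration.
LEXICON = {"establish", "ridiculous", "tomato"}
PREFIXES = ("anti", "dis")
SUFFIXES = ("ment", "arian", "ism")

@lru_cache(maxsize=4096)            # the "table" caching common look-ups
def segment(word):
    """Break a word into (prefixes, stem, suffixes); None if unanalysable."""
    if word in LEXICON:
        return ((), word, ())
    for prefix in PREFIXES:
        if word.startswith(prefix):
            rest = segment(word[len(prefix):])
            if rest is not None:
                return ((prefix,) + rest[0], rest[1], rest[2])
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            rest = segment(word[:-len(suffix)])
            if rest is not None:
                return (rest[0], rest[1], rest[2] + (suffix,))
    return None

print(segment("antidisestablishmentarianism"))
# (('anti', 'dis'), 'establish', ('ment', 'arian', 'ism'))
```

A real analyser would need ordering constraints and allomorphy, but the shape is the same: one recursive function, one cache in front of it.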
Also, in learning, words should be used as examples (or Bayesian evidence) for rules (lexical hypotheses), rather than each as a new separate rule.
In some languages, a whole sentence can be just a single word:
Polysynthetic languages typically have long “sentence-words”, such as the Yupik word meaning “He had not yet said again that he was going to hunt reindeer.”
See: Polysynthetic language
What cannot be deduced compositionally, becomes lexicalised, e.g. “He kicked the bucket.” It’s no longer the composition of ‘kick’ and ‘bucket’.
Before man wrote the first tokeniser, there was already vocabulary. Just as in a dictionary, its size can be quantified: the number of lexical entries can be counted, and the number of senses listed under each entry is finite.
If we could do a frequency distribution on not just the text string and lemmatisation, but also accounting for the senses — then we’ll have a good picture of the cumulative curve — and at some point it must approach a limit.
Indeed, there are more variables to be accounted for, such as how transparent or opaque a derivation is, modulated even by frequency effects, e.g. if you knew ‘ridonculous’ is from ‘ridiculous’ and ‘donkey’ and you heard it used a number of times in the last year…
These variables can still be modelled, and quantified.
"We’ll burn the bridge when we come to it, and pull a rabbit out of the fire.”
How much does the human process of “caching” phrases resemble the NLTK co-location based generator?
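For comparison, NLTK’s collocation finders rank adjacent word pairs by association measures such as pointwise mutual information (PMI); here is a stdlib-only sketch of the same idea on a made-up toy corpus:

```python
import math
from collections import Counter

# Made-up toy corpus in which "strong tea" recurs as a unit.
text = ("i drink strong tea . she drinks strong tea . "
        "he likes weak coffee . they like strong tea .")
words = text.split()

unigrams = Counter(words)
bigrams = Counter(zip(words, words[1:]))
n = len(words)

def pmi(pair):
    """Pointwise mutual information: how much more often the pair occurs
    together than the individual word frequencies would predict."""
    w1, w2 = pair
    return math.log2((bigrams[pair] / (n - 1))
                     / ((unigrams[w1] / n) * (unigrams[w2] / n)))

# Like NLTK's frequency filter: ignore pairs seen only once.
candidates = [pair for pair, count in bigrams.items() if count >= 2]
best = max(candidates, key=pmi)
print(best)  # ('strong', 'tea')
```

The frequency filter matters: a pair seen once between two rare words gets a huge PMI, much as a phrase heard once hardly counts as “cached” for a human speaker.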
Two possible cases making a distinction between genitive and attributive:
"Juho is a photographer."
”His photos are on display at the gallery.”
"Juho is a model."
"Photos of him are found in this magazine.”
"Marjut is a mother."
"She takes her children to school.”
"Marjut is a kindergarten teacher."
"She takes her children on a school trip.”
“Whose skull is this?”
"It belonged to the second fire victim."
"It belongs to the collection of Prof. Päävinen."
August 19, 2014 at 1:35pm
Sentences are, in theory, infinitely generative. Yet are they infinitely differentiated? If we think of sentences as made up of concatenative and nested sets of smaller component phrases, I, for one, speculate that the categories of phrasal types may be finite.
Some differences are shallower than others. Proper nouns, for example, may be completely fungible. It’s not linguistically interesting whether the tomato soup is branded “Juho’s Tomato Soup” or “Marjut’s Tomato Soup”. The permutation is purely on the surface.
Then we go deeper, and look at adjectives:
Juho took the blue pill.
Marjut took the red pill.
Neither combination offers more or less of a ‘step-up’ in terms of “linguistic interestingness”. Pepper sauce or ketchup, a swim or a jog, to be or not to be, they are about the same.
I was trying to enumerate the “intrinsically different” phrasal types the other day, not as a systematic effort to capture a comprehensive list of them, but just doodling, giving some examples:
your tomatoes are red
are your tomatoes red?
where are your tomatoes?
where are they?
they are here
they are red
I have two tomatoes
a red one and a green one
give me the red tomato
you want a tomato
do you want a tomato?
do you want the red one?
I can give her a tomato
I should give her a tomato
I will give her a tomato
If I give Marjut a tomato, she will be happy.
Marjut is happy. <— this is essentially the same type as “The tomato is red.” or “Juho is tall.”
At some point, we might be pushing towards the edge of exhaustiveness, and any new permutations we come up with would closely resemble at least one of these types or prototypes.
Here we only covered the finiteness of categories on the axis of syntactic construction. There are at least two other possible axes: thematic frame and functional classification.
Thematic frame follows closely on the discussion about verbs. The phrase or sentence is like a stage, and there are various “actors” doing this or that to one another. For more, see:
McRae, K., Ferretti, T. R., & Amyote, L. (1997). Thematic roles as verb-specific concepts. Language and Cognitive Processes, 12(2–3), 137–176.
Here I would again speculate, that the types of plots or scripts would eventually approach finiteness. It’s just like Hollywood movies, after a while they inevitably remake the same stories with new faces or props.
Picture from doi: 10.3389
By the way, here’s a cool interactive infographic of story types: http://designthroughstorytelling.net/periodic/
And the third axis was inspired by the Aarne-Thompson Index… it’s basically an effort to classify all sentences that have a purpose based on the function they intend to serve. It might be cool to even have a decimal coding system.
If we look through a corpus (say, a long list of sentences produced in conversational contexts), conceivably the instances can be categorised into functional groups and sub-groups: “phatic greeting”, “polite request (sub-type such and such)”, “compliment”, “reproach”, “offering”, “scoffing” and so on. Well, there may be only a few hundred types after all…
"[What we see as complex sentences] may in fact be [rather simple]…"
… when we put them through the category filters along these axes of analysis and reduction:
Functional Type: Contradiction, Assertion, …
Grammatical Structure: “X may be Y.”
Thematic Setting: Subject - Mood - Descriptive Predicate
Central coherence templating
Research note: find out who claimed “word comprehension saves processing capacity for sentence comprehension”…
Robert DeKeyser (1998)
Richard Schmidt (2001)
- Central coherence, or sense-making at the sentence level, is like puzzle solving on the fly: matching a matrix of possible configurations to find at least one that clicks (when multiple ones click, you get puns).
- When an instance is successfully solved, i.e. the result makes ample sense with high clarity, that pattern of configuration or that “solution frame” may be saved as a solution primer or template, which makes future induction faster.
- When a sufficient number of these templates have been internalised, the speaker has built up an internal grammar that allows fast sentence comprehension. At this level, one may be able to start understanding conversations, even with some unknown words.
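The template-caching idea in these notes can be sketched mechanically: try previously saved “solution frames” first, fall back to a slower full search, and save whatever clicks. The frames and example sentences below are toy stand-ins:

```python
import re

# Previously "clicked" solution frames, tried first on the fast path.
template_cache = []

# The full space of frames the slow search would have to work through;
# the patterns and labels are toy stand-ins.
ALL_FRAMES = [
    (r"^(\w+) is (\w+)\.$",            "Subject - Descriptive Predicate"),
    (r"^(\w+) gives (\w+) a (\w+)\.$", "Agent - Recipient - Theme"),
    (r"^is (\w+) (\w+)\?$",            "Polar Question"),
]

def comprehend(sentence):
    """Match a sentence to a frame, preferring cached frames; cache new hits."""
    for pattern, label in template_cache:        # fast path: primed templates
        if re.match(pattern, sentence, re.IGNORECASE):
            return label
    for pattern, label in ALL_FRAMES:            # slow path: full search
        if re.match(pattern, sentence, re.IGNORECASE):
            template_cache.append((pattern, label))  # save the solution frame
            return label
    return None

print(comprehend("Marjut is happy."))   # slow path; the frame gets cached
print(comprehend("Juho is tall."))      # fast path, reusing the cached frame
```

Both calls print the same frame label; the second one never touches the slow search, which is the whole point of internalising templates.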
Anyway, I used to play Minecraft while listening to recordings of German phrases which sandwiched their English translation (German sentence - English translation - German sentence). After a while, I subjectively felt that I understood more German, even with novel sentences…
Another anecdote: there was a “threshold point” where I started to understand English conversations (“Studio Classroom”) with simpler vocabulary (perhaps after becoming acquainted with the basic words…)
"Physical events are notoriously neutral."
Stephen R. Anderson (1974)
- Phoneticians have long dreamt of charting a comprehensive inventory of “all the sounds humans can possibly make”.
- Yet only a subset of these sounds would be linguistically relevant — that is, products of language. The rest would range from snoring to smacking one’s lips.
- With advances in technology and research, phoneticians are able to observe in close detail more and more of the physical events that are happening, such as formants, voicing, air flow, and other physical properties.
- And indeed, phonetic events (e.g. the production of allophones) can be described in terms of these physical properties. One could specify that this particular sub-category of sound shall have such-and-such characteristics (e.g. pre-voicing of 50±5 ms).
- Again, only some of these physical properties are linguistically interesting, some of them are just physics (mechanics) or products of someone having a cold.
- It’s not obvious from the physical properties themselves which ones should be included in the categories of linguistic products. “Tsk-tsk” could be a word, or it could be just the result of a non-linguistic tic.
- How do you know when smacking one’s lips is just something a chimp could do, and when it arises from a process of trying to convey meaning? The two events look awfully similar on the surface, both involving the physical event of smacking lips.
- The finer the descriptions get (where is the tongue raised, how much is it raised), the more neutral they seem in terms of linguistic relevance.
- Had we a God’s-eye view into the top-down process inside someone’s mind, how that carries over to production, to articulation, to motor control, then to the motion of the actual organs, and then to the physical events and physical properties that emanate from them — then we would have an a priori view of what’s going on.
- As of now, there’s no way of bootstrapping and back-pedalling our way up in reverse, to figure out a posteriori, from the physical observations alone, what is happening.
Just to amuse you, here’s a list of idiosyncratic sayings I collected from German-speaking people (while in Finland I was surrounded by many of them). Whenever they said something funny, I would (figuratively) lick the corner of my grin and pull out my notebook or phone: “That one’s going in!”
I do that. (‘Do’ is said like ‘du’)
You do that.
I go now.
We meet us in the city.
We see us.
I just stood up.
Make a photo.
We make sport.
I shower myself.
I brush me the teeth.
My tooth pasta is frozen.
I learn you a song.
I can borrow you the broom.
What do we now?
Did you throw the USB out correctly?
I will overthink it.
What for a table is it?
What is for aftertable?
Guys, what is now with the ferry.
Should we go whereother?
He explained me something.
Are you becoming a flu?
I pull myself on.
I hurry myself.