Linguistic Curiosities

August 21, 2014 at 6:19pm
“We’ll burn the bridge when we come to it, and pull a rabbit out of the fire.”

How much does the human process of “caching” phrases resemble the collocation-based text generator in NLTK?

Two possible cases making a distinction between genitive and attributive:

“Juho is a photographer.”
“His photos are on display at the gallery.”

“Juho is a model.”
“Photos of him are found in this magazine.”

“Marjut is a mother.”
“She takes her children to school.”

“Marjut is a kindergarten teacher.”
“She takes her children on a school trip.”

“Whose skull is this?”
“It belonged to the second fire victim.”
“It belongs to the collection of Prof. Päävinen.”

August 19, 2014 at 1:35pm
Syntactic reductionism

Sentences are in theory infinitely generative. Yet are they infinitely differentiated? If we think of sentences as made up of concatenated and nested sets of smaller component phrases, I, for one, speculate that the categories of phrasal types may be finite.

Some differences are shallower than others. Proper nouns, for example, may be completely fungible. It’s not linguistically interesting whether the tomato soup is branded “Juho’s Tomato Soup” or “Marjut’s Tomato Soup”. The permutation is solely on the surface.

Then we go deeper, and look at adjectives:

Juho took the blue pill.
Marjut took the red pill.

Neither combination offers more or less of a ‘step-up’ in terms of “linguistic interestingness”. Pepper sauce or ketchup, a swim or a jog, to be or not to be, they are about the same.

I was trying to enumerate the “intrinsically different” phrasal types the other day, not as a systematic effort to capture a comprehensive list of them, but just doodling to come up with some examples:

one tomato
two tomatoes
your tomatoes
your tomatoes are red
are your tomatoes red?
where are your tomatoes?
where are they?
they are here
they are red
I have two tomatoes
a red one and a green one
give me the red tomato
you want a tomato
do you want a tomato?
do you want the red one?
I can give her a tomato
I should give her a tomato
I will give her a tomato
If I give Marjut a tomato, she will be happy.

Marjut is happy. <— this is essentially the same type as “The tomato is red.” or “Juho is tall.”

At some point, we might be pushing towards the edge of exhaustiveness, and any new permutations we come up with would closely resemble at least one of these types or prototypes.
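
To make the idea of surface permutations concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the tiny lexicon, the category labels (NAME, ADJ, NOUN) and the function names are all hypothetical, not an established tagging scheme. Fungible words are swapped for their category, and sentences that collapse to the same template count as one type.

```python
from collections import defaultdict

# Hypothetical toy lexicon: surface words mapped to coarse categories.
LEXICON = {
    "juho": "NAME", "marjut": "NAME",
    "blue": "ADJ", "red": "ADJ", "green": "ADJ", "tall": "ADJ", "happy": "ADJ",
    "pill": "NOUN", "tomato": "NOUN",
}

def template(sentence):
    """Replace every word found in the lexicon with its category label."""
    words = sentence.lower().rstrip(".?!").split()
    return " ".join(LEXICON.get(word, word) for word in words)

def group_by_template(sentences):
    """Collect sentences that reduce to the same template."""
    groups = defaultdict(list)
    for sentence in sentences:
        groups[template(sentence)].append(sentence)
    return dict(groups)

if __name__ == "__main__":
    examples = [
        "Juho took the blue pill.",
        "Marjut took the red pill.",
        "Marjut is happy.",
        "Juho is tall.",
        "The tomato is red.",
    ]
    for key, members in group_by_template(examples).items():
        print(key, "<-", members)
```

In this toy version the two pill sentences fall into one group, and so do “Marjut is happy.” and “Juho is tall.” The sentence “The tomato is red.” stays separate, because the sketch has no notion of a noun phrase. Merging it with the others would need a coarser category than single words.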

Here we only covered the finiteness of categories on the axis of syntactic construction. There are at least two other possible axes: thematic frame and functional classification.

Thematic frame follows closely on the discussion about verbs. The phrase or sentence is like a stage, and there are various “actors” doing this or that to one another. For more, see:

McRae, K., Ferretti, T. R., & Amyote, L. (1997). Thematic roles as verb-specific concepts. Language and Cognitive Processes, 12(2-3), 137-176.

Here I would again speculate that the types of plots or scripts would eventually approach finiteness. It’s just like Hollywood movies: after a while they inevitably remake the same stories with new faces or props.

[Image: picture from doi: 10.3389]

By the way, here’s a cool interactive infographic of story types:


And the third axis was inspired by the Aarne-Thompson Index… it’s basically an effort to classify all sentences that have a purpose based on the function they intend to serve. It might be cool to even have a decimal coding system.

If we look through a corpus (say, a long list of sentences produced in conversational contexts), conceivably the instances can be categorised into functional groups and sub-groups: “phatic greeting”, “polite request (sub type such and such)”, “compliment”, “reassurance”, “reproach”, “offering”, “scoffing” and so on. Well, there may be only a few hundred types after all…


"[What we see as complex sentences] may in fact be [rather simple]…"

… when we put them through the category filters along these axes of analysis and reduction:

Functional Type: Contradiction, Assertion, …
Grammatical Structure: “X may be Y.”
Thematic Setting: Subject - Mood - Descriptive Predicate

August 17, 2014 at 1:16am
Central coherence templating

Research note: find out who claimed “word comprehension saves processing capacity for sentence comprehension”…


Robert DeKeyser (1998)
Richard Schmidt (2001)


  1. Central coherence, or sense making at the sentence level, is like puzzle solving on the run: matching a matrix of possible configurations to find at least one that clicks (when multiple ones click, you get puns).
  2. When an instance is successfully solved, i.e. the result makes ample sense with high clarity, that pattern of configuration or that “solution frame” may be saved as a solution primer or template, which makes future induction faster.
  3. When a sufficient number of these templates have been internalised, the speaker has built up an internal grammar that allows fast sentence comprehension. At this level, one may be able to start understanding conversations, even with some unknown words.

Anyway, I used to play Minecraft while listening to recordings of German phrases which sandwiched their translation in English (German sentence - English translation - German sentence). After a while, I subjectively felt that I understood more German, even with novel sentences…

Another anecdote: there was a “threshold point” where I started to understand English conversations (“Studio Classroom”) with simpler vocabulary (perhaps after becoming acquainted with the basic words…)

August 16, 2014 at 3:00pm
"Physical events are notoriously neutral."
Stephen R. Anderson (1974)

  • Phoneticians have long dreamt of charting a comprehensive inventory of “all the sounds humans can possibly make”.
  • Yet only a subset of these sounds would be linguistically relevant — that is, being products of language. The rest would range from snoring to smacking one’s lips.
  • With advances in technology and research, phoneticians are able to observe in close detail more and more of the physical events that are happening, such as formants, voicing, air flow, and other physical properties.
    • And indeed, phonetic events (e.g. production of allophones) can be described in terms of these physical properties. One could specify that a particular sub-category of sound shall have such and such characteristics (e.g. pre-voicing of 50±5ms).
  • Again, only some of these physical properties are linguistically interesting; others are just physics (mechanics) or products of someone having a cold.
    • It’s not obvious from the physical properties themselves which ones should be included in the categories of linguistic products. “Tsk-tsk” could be a word, or it could be just the result of a non-linguistic tic.
    • How do you know when smacking one’s lips is something a chimp could do, and when it arises through a process of trying to convey meaning? The two events look awfully similar on the surface, both involving the physical events of smacking lips.
  • The finer the descriptions go (where is the tongue raised, how much is it raised), the more neutral they seem in terms of linguistic relevance.
  • If we had a God’s-eye view into the top-down process inside someone’s mind (how it carries over to production, to articulation, to motor control, then to the motion of the actual organs, and then to the physical events and the physical properties that emanate from these events), we would have an a posteriori view of what’s going on.
  • As of now, there’s no way of bootstrapping and back-pedalling our way up in reverse, to figure out a priori what’s happening from the physical observations.

August 11, 2014 at 5:48pm
Just to amuse you, here’s a list of idiosyncratic sayings I collected from German-speaking people (while in Finland I was surrounded by many of them). Whenever they said something funny, I would (figuratively) lick the corner of my grin and pull out my notebook or phone: “That one’s going in!”

I do that. (‘Do’ is said like ‘du’)
You do that.
I go now.
We meet us in the city.
We see us.
I just stood up.
Make a photo.
We make sport.
I shower myself.
I brush me the teeth.
My tooth pasta is frozen.
I learn you a song.
I can borrow you the broom.
What do we now?
Did you throw the USB out correctly?
I will overthink it.
What for a table is it?
What is for aftertable?
Guys, what is now with the ferry.
Should we go whereother?
He explained me something.
Are you becoming a flu?
I pull myself on.
I hurry myself.

August 9, 2014 at 2:14am
Linear Mandarin: An Experiment

Chapter 11 Strong Verbs

Here’s a list of some basic verbs:

(be acquainted with)   => ŕnnŝ
(have info about)       => ẑŕdá
(have skills in)            => hui

be                      => ŝŕ
be at                  => zai
make, do           => zɔ                 
have                  => joó
be able to          => nɜɜŋ, hui

sleep                    => ŝui (goes with ‘-źjá’)
eat                        => ĉŕ
drink                     => hɜ
taste (transitive)    => ĉaaŋ
touch                    => mɔ, pɜŋ

give                      => geé
take                      => naa

doubt                   => huæjí
hope                    => śivaŋ
assume                => jívéi
like                       => śíhuan
(dislike)                 => b’śíhuan
find (look for)        => ẑaá
desire                   => śjaaŋjá

come                  => læi
go                       => tśü
meet                   => źjän

start                     => kæŝŝ
stop (=not)           => bú, b’ + v. + ‘la

see                      => kan
hear                    => tiŋ

say                    => ŝɔ
ask                    => wɜn
feel (opine)        => źüedd

buy         =>  maæ
sell          => mæ
help         => baŋ, baŋẑɜ
be called (German: heißen)     => źjá

Níhaá. Vuɔ źjau Piitəŕ.  =>  Hallo. Ich heiße Peter. (Hello. My name is Peter.)

Linear Mandarin: An Experiment

Chapter 10 Affirmation, Negation

Affirmation Phrases

Mnn.  =  “Eh-hmm.”

Dui.  = “Correct.”

Ŝŕ. = “It is.”

Ŝŕdd. = “That’s the case.”

Haá. = “Good/okay.”

Óké. = “Okay.”

NOTE: if the verb is given in a question, it’s more common to reply using that verb or its verb stem. See below.

Negation Particle

This particle can be transcribed in various ways:

bú, bu, bə, bb, b’

It can be added to modal verbs or common verbs:

hui jóujúŋ ma?  =  Can you swim?
Bu’hui. =  No (I cannot).


Similar to answering a verb-led question in Finnish, the answer in Mandarin can be produced by repeating the verb, in either the affirmative or the negative form.

hui jóujúŋ ma?  =  Can you swim?

Hui.  = Yes (I can).
Bu’hui. =  No (I cannot).

Ŝŕ ma?

Ŝŕ'a. OR Ŝŕ'ja.  =  It is. OR It is, yeah.
Bŝŕ.  =  No (it’s not).

Sometimes the verb may be shortened (stripped down to the stem) in an answer or subsequent references:

ĉójän ma?  =  Do you smoke?
B’ĉó, śjeśe.  =  No (I don’t), thanks.
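
The echo pattern above is regular enough to sketch as a small function. This is only a hedged illustration in Python: the function name, its parameters and the default choice of “Bu’” as the negation spelling are assumptions made for the example, and it covers only the plain verb-echo answers shown so far.

```python
# One of the transcriptions of the negation particle listed above
# (bú, bu, bə, bb, b’); picking "Bu’" as the default is an assumption.
NEGATION = "Bu\u2019"

def echo_answer(verb, affirmative, stem=None, negation=NEGATION):
    """Answer a verb-led question by echoing the verb (or its stem).

    verb        -- the verb used in the question, e.g. "hui"
    affirmative -- True for a yes answer, False for a no answer
    stem        -- optional shortened form of the verb, e.g. "ĉó" for "ĉójän"
    negation    -- spelling of the negation particle to prefix
    """
    core = stem if stem is not None else verb
    if affirmative:
        # Affirmative: just repeat the verb, capitalised as a sentence.
        return core[0].upper() + core[1:] + "."
    # Negative: prefix the negation particle to the verb.
    return negation + core + "."

if __name__ == "__main__":
    print(echo_answer("hui", True))    # yes answer to "hui jóujúŋ ma?"
    print(echo_answer("hui", False))   # no answer to the same question
    print(echo_answer("ĉójän", False, stem="ĉó", negation="B\u2019"))
```

Which spelling of the particle goes with which verb is left to the caller here, since the notes above list several transcriptions without a rule for choosing between them.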

"Non-Effect" Negation

Other than the “to not”, there is also a “not yet”:

Ta húilæ’la ma?  =  She/he has-come’back yet?
Méjó.  =  Not-yet.

It also functions as a verb in the negative form:

ŝŕŕźen ma?  =  (Do) you have time?
Méjó.  =  No (I haven’t).

August 6, 2014 at 1:50pm
Linear Mandarin: An Experiment

Chapter 9: Modal Verbs

These are also called modal auxiliaries. They are typically very strong verbs that express some degree or extent of what is functionally the mood, which can be thought of along several axes:

Intent: planning, prediction, counter-factuals
Desire: volition, liking, appreciation
Freedom: likelihood, ability, permission

In each language, the nuance and specificity provided by off-the-shelf modals vary. Sometimes the categories are collapsed.

be bound to -> “We must/have to do that.”
benefit from -> “I need some chocolate.”
be pressed to -> “I must/should go now.”
be emphatically -> “It IS!” “I WILL.”

have a strong desire for -> “I want some chocolate!”
have the willingness to -> "I could/would do that."
expect enjoyment from -> "I could have some coffee."
request politely -> "Would you …?" "Could you…?"

have the ability to -> “Can you swim?”
have the possibility to -> "It might rain today."
present likelihood of -> “That must be it.” “It will rain.”
be allowed to -> “You may eat the chocolate now.”

For now, I’ll use the German modals as a guideline:

Axis 1: müssen <—> sollen

Jídiŋ źi’ẑu…
Lit. = Very-importantly (you) must remember…

Zæẑaŕ kæĉз biśü ká zuɔbiŕ.
Lit. = Around-here to-drive (you) must stay on-the-left-side.

jiŋgæ źinkuæ tśü śüeśjá.
Lit. = I should ASAP go-to the-school.

deé śjaaŋ jíśa.
Lit. = I have to think (about it) for-a-bit.

Axis 2: wollen <—> mögen

śjaaŋ ŝui jíhuŕ.
Lit. = I want to sleep for-a-while.

Méiŕзn jüänji tśü.
Lit. = Nobody is-willing to-go.

Axis 3: können <—> dürfen

Nззŋ kan’źen ma?
Lit. = Can (one) look’and-see?

hui jóujúŋ ma?
Lit. = You can swim?

Źeig jízz kзjí ẑзз’tśílæ.
Lit. = This chair can be-folded’up.

Źaŕ bŕaŋ tíŋĉз.
Lit. = Here (one) is-not-allowed to-park.

Linear Mandarin: Sweeping Changes in Orthography


Trailing schwa in the clitic tense marker is now spelled with ‘a’:

Old: 'lə  =>  New: 'la

Double consonants indicate syllabicity or gemination, sometimes used to replace vowels like /ɜ/ or /ɻ/ that are no longer necessarily there.

ŝŝ, ẑẑ

Labial approximant is now spelled with ‘v’. It generally doesn’t affect the meaning if you say it as a labiodental.

Old: u  =>  New: v

Character Set

Affricates are now consistently marked by circumflex:

Ŝ  ŝ
Ĉ  ĉ
Ẑ  ẑ

The central vowel is now spelled with ‘з’:

Old: ə  =>  New: З з


Palatal context:

ji+V  now uniformly spelled as j+V

Some diphthongs are now spelled with single characters:

Old: ai   =>  New: Æ æ

Old: ei   =>  New: É é

Old: au  =>  New: Á á

Old: əu  =>  New: Ó ó

Old: uo  =>  New: Ɔ ɔ

Qualified Vowels

Two of them have changed:

Old: ii   =>  New: Í í

Old: uu  =>  New: Ú ú
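
Most of the spelling changes above are mechanical, so they can be sketched as a small converter. This is a rough Python illustration, not a definitive tool, and it rests on several assumptions: the order in which the rules are applied is a guess, only lower-case input is handled, and the function and constant names are made up for the example. The u => v change is left out on purpose, because it applies only where ‘u’ stands for the approximant, which cannot be decided from the spelling alone. The affricate and double-consonant conventions are not handled either.

```python
import re

# Ordered (old, new) pairs. Longer or more specific spellings come first,
# so that "əu" is rewritten before the bare central vowel "ə".
REPLACEMENTS = [
    ("əu", "ó"),
    ("ai", "æ"),
    ("ei", "é"),
    ("au", "á"),
    ("uo", "ɔ"),
    ("ii", "í"),
    ("uu", "ú"),
    ("ə", "з"),
]

# Vowel letters that may follow "ji" once the replacements above are done.
NEW_VOWELS = "aeouæéáóɔúз"

def to_new_orthography(text):
    """Rewrite old-style lower-case spellings using the rules listed above."""
    # Clitic tense marker: 'lə -> 'la (straight or curly apostrophe).
    text = re.sub(r"(['’])lə", r"\1la", text)
    # Diphthongs, qualified vowels and the central vowel.
    for old, new in REPLACEMENTS:
        text = text.replace(old, new)
    # Palatal context: ji + vowel -> j + vowel.
    return re.sub(r"ji(?=[" + NEW_VOWELS + "])", "j", text)

if __name__ == "__main__":
    # Made-up old-style spellings, used only to exercise the rules.
    for word in ["ta huilai'lə", "nəŋ", "źjiau", "jəu"]:
        print(word, "=>", to_new_orthography(word))
```

With these made-up inputs, “źjiau” comes out as “źjá”: the diphthong rule fires first, and the palatal rule then drops the ‘i’.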