Connected speech: why 'what are you doing' sounds like 'whatcha doin'

26 June 2026 · 7 min read

There is a moment most English learners recognise: you have studied hard, you can read a newspaper, you can write a decent email — and then a fluent speaker opens their mouth and it sounds like a completely different language. Connected speech in English is the main reason this happens. Understanding it will change both how you listen and how you speak.

The written form of English is a poor map of the spoken form. Words on a page sit in neat, separate boxes. In the mouth, at normal conversational speed, those boxes dissolve. Sounds borrow from neighbouring sounds, some disappear entirely, and whole clusters of words fuse into something that looks almost unrecognisable when written down phonetically. None of this is careless or uneducated. It is simply how fluency works.

The four main processes

Connected speech is not one single phenomenon. It is a family of related processes, each doing something slightly different to the sounds at word boundaries.

Linking (catenation)

When one word ends in a consonant and the next begins with a vowel, speakers typically link the two sounds together, moving the consonant across the boundary. The phrase turn it off becomes something closer to tur-ni-toff — the n and the t are carried forward. Pick it up sounds like pi-ki-tup.

This is not sloppiness. It is the natural result of the mouth preparing for the next sound before it has finished the current one. If you try to insert a tiny pause between each word to keep them separate, the result sounds clipped and effortful, like someone reading a list of items rather than speaking.
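The linking rule can be sketched in a few lines of Python. This is a rough illustration only: it works on spelling rather than on actual sounds, and the function name link and the VOWELS set are made up for this example. Silent letters and digraphs will fool it, so pick it up comes out as pic-ki-tup rather than pi-ki-tup.

```python
VOWELS = set("aeiou")  # letters, not sounds: a deliberate simplification


def link(phrase: str) -> str:
    """Move a word-final consonant letter onto a following vowel-initial word."""
    words = phrase.lower().split()
    if not words:
        return ""
    chunks = [words[0]]
    for word in words[1:]:
        prev = chunks[-1]
        if prev[-1] not in VOWELS and word[0] in VOWELS:
            # Carry the final consonant across the word boundary.
            chunks[-1] = prev[:-1]
            chunks.append(prev[-1] + word)
        else:
            chunks.append(word)
    # Drop any chunk emptied by the move (a one-letter consonant word).
    return "-".join(chunk for chunk in chunks if chunk)


print(link("turn it off"))  # tur-ni-toff
print(link("pick it up"))   # pic-ki-tup
```

The hyphens mark where the syllables now seem to begin, which is a useful way to write down what you hear when you practise.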

Elision

Elision is the disappearance of a sound, usually at the end of a word or in a consonant cluster that is difficult to produce at speed. The t in next door is almost never pronounced; the phrase sounds like nex' door. Handbag loses its d in natural speech, and the n then shifts towards the b that follows, giving something closer to hambag. The t in mostly is routinely dropped: mos'ly.

Common candidates for elision:

  • Final t and d before a consonant: last night → las' night, old man → ol' man
  • h at the start of unstressed pronouns: tell him → tell 'im, give her → give 'er
  • An unstressed middle syllable in longer words: comfortable → comf'table, interesting → int'resting
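The first two patterns in the list above are regular enough to sketch as code. The Python below is a minimal illustration, not a phonetic model: it works on spelling, the names elide and H_PRONOUNS are invented for the example, and it narrows the first rule to a t or d that follows another consonant letter, which is an assumption added to keep the sketch conservative. The third pattern depends on stress, which spelling does not show, so it is left out.

```python
VOWELS = set("aeiou")               # letters, not sounds
H_PRONOUNS = {"him", "her", "his"}  # assumed list of h-dropping pronouns


def elide(phrase: str) -> str:
    """Mark likely elisions with an apostrophe, based on spelling alone."""
    words = phrase.lower().split()
    out = []
    for i, word in enumerate(words):
        nxt = words[i + 1] if i + 1 < len(words) else ""
        if i > 0 and word in H_PRONOUNS:
            # Unstressed pronoun after another word: drop the h.
            word = "'" + word[1:]
        elif (
            len(word) > 2
            and word[-1] in "td"
            and word[-2] not in VOWELS
            and nxt
            and nxt[0] not in VOWELS
        ):
            # Final t or d squeezed between two consonants: drop it.
            word = word[:-1] + "'"
        out.append(word)
    return " ".join(out)


print(elide("last night"))  # las' night
print(elide("tell him"))    # tell 'im
```

A phrase such as last orders is left alone, because the next word starts with a vowel and the t has somewhere to go.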

Assimilation

This is where one sound changes to become more like a neighbouring sound. The phrase that person illustrates it well: the t at the end of that moves towards the p that follows, and many speakers produce something close to thap person. The mouth is already forming the p shape before it finishes the t, and the result is a compromise sound.

A particularly common assimilation in English: n before p or b often shifts to m. So one pound sounds like wum pound, and ten pence becomes tem pence. The boundary between the words blurs.
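As a rough sketch, the two assimilations described here can be written as a small lookup table in Python. The names RULES and assimilate are invented for this example, and the code looks only at letters, so a phrase like one pound, where the spelling ends in a silent e, slips through untouched.

```python
# (final letter, first letter of the next word) -> new final letter
RULES = {
    ("n", "p"): "m",
    ("n", "b"): "m",
    ("t", "p"): "p",
}


def assimilate(phrase: str) -> str:
    """Shift a word-final n or t towards the lip sound that follows it."""
    words = phrase.lower().split()
    out = []
    for i, word in enumerate(words):
        nxt = words[i + 1] if i + 1 < len(words) else ""
        key = (word[-1], nxt[:1])
        if key in RULES:
            word = word[:-1] + RULES[key]
        out.append(word)
    return " ".join(out)


print(assimilate("ten pence"))    # tem pence
print(assimilate("that person"))  # thap person
```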

Reduction (weak forms)

English has a large set of function words (and, but, of, to, for, at, them, can, have) that carry little meaning by themselves and are almost always unstressed in natural speech. Unstressed syllables in English are heavily compressed. Their vowel usually shrinks to the schwa sound (the short, neutral uh) rather than the full vowel the spelling suggests.

And becomes ən or even just n. For becomes fə. Of becomes əv or ə. Can becomes kən. Have (as an auxiliary) becomes əv or even ə.

This is why fish and chips sounds like fish 'n' chips, and a cup of tea sounds like a cup ə tea. The function words are there — a careful listener can perceive them — but they are not given their full dictionary pronunciation.
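Reduction is close to a simple substitution, which makes it easy to sketch. The Python below is illustrative only: WEAK_FORMS and weaken are names made up for this example, the table holds just three of the weak forms mentioned above, and it swaps every occurrence without checking stress or grammar. That is why have is left out: it reduces only as an auxiliary, and a lookup table cannot tell the difference.

```python
# A few weak forms from the text; a real list would be much longer.
WEAK_FORMS = {
    "and": "ən",
    "of": "əv",
    "can": "kən",
}


def weaken(phrase: str) -> str:
    """Replace listed function words with their weak forms."""
    return " ".join(WEAK_FORMS.get(word, word) for word in phrase.lower().split())


print(weaken("fish and chips"))  # fish ən chips
print(weaken("a cup of tea"))    # a cup əv tea
```

Notice that the content words pass through unchanged. Only the small grammatical words are touched.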

The phrase that started this

What are you doing? is the example in the title, and it rewards close analysis.

Written out, the phrase has four words. In fluent conversational speech, this is what actually happens to them:

  1. What keeps its final t for now, though the t often weakens (dropping it entirely is optional).
  2. Are reduces to ə, barely a syllable on its own, and at speed it often vanishes, leaving the t of what next to the y of you.
  3. You reduces to yə or even further, and the t of what (if retained) and the y of you merge into an affricate: ch.
  4. Doing swaps its final ng sound for a plain n (in most accents there is no hard g to lose, whatever the spelling suggests) and the unstressed second syllable shrinks.

The result: whatcha doin'. Every sound change is explicable. Nothing is random. A phonetician could trace each step from the full form to the reduced one.
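To show that each step really can be traced, here is the derivation as a chain of plain text substitutions in Python. It is a sketch under stated assumptions, not a phonetic analysis: the STEPS table and the derive function are invented for this example, the intermediate spellings such as yə are informal, and dropping the schwa of are entirely is one possible route among several.

```python
# Each step: (description, text to find, replacement)
STEPS = [
    ("reduce 'are' to a schwa", " are ", " ə "),
    ("reduce 'you' to 'yə'", " you ", " yə "),
    ("drop the schwa so t meets y", "t ə y", "t y"),
    ("merge t + y into the affricate", "t yə", "tcha"),
    ("replace final ng with n", "doing", "doin'"),
]


def derive(phrase: str) -> list[str]:
    """Return every intermediate form, from the full phrase to the reduced one."""
    forms = [phrase.lower()]
    for _description, old, new in STEPS:
        forms.append(forms[-1].replace(old, new))
    return forms


for form in derive("what are you doing"):
    print(form)
# what are you doing
# what ə you doing
# what ə yə doing
# what yə doing
# whatcha doing
# whatcha doin'
```

Reading the output from top to bottom is a reasonable way to practise: say each line aloud in turn and feel the phrase tighten.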

Say this sentence aloud, slowly first, then at a natural conversational pace:

What are you doing this weekend?

At speed, aim for: Whatcha doin' this weekend? Notice that this weekend carries the meaning of the question, so both words keep their full pronunciation. Connected speech is not about swallowing everything. It is about compressing what is unimportant and leaving what matters clear.

Why this matters for listening

When you know these processes exist, the rapid speech of a fluent speaker becomes less mysterious. You stop hearing a blur and start hearing a pattern. Didja is did you. Gonna is going to. Wanna is want to. Coulda is could have. Innit is isn't it. None of these are corruptions. They are the natural output of a mouth moving efficiently through familiar phrases.

Knowing the rules also tells you where to direct your attention. In connected speech, the content words — nouns, main verbs, adjectives, adverbs — are usually given their full pronunciation and carry primary stress. Function words are reduced. So when you are struggling to parse a sentence, listen for the stressed, unreduced syllables. They are the load-bearing walls. Everything around them is mortar.

Why this matters for speaking

If you speak English word by word, placing equal weight and separation on each one, you will be understood — but you will sound effortful and unnatural, and listeners will unconsciously find it harder to process. Rhythm in English is built on the contrast between stressed and unstressed syllables, and connected speech is what creates that rhythm. Without linking, without reduction, without elision, the music disappears.

This does not mean you must perform a perfect Cockney accent or adopt any particular regional variety. It means you should let sounds flow across word boundaries rather than artificially separating them, allow function words to reduce, and not feel obliged to pronounce every letter that the spelling implies. Spelling and speech are different systems in English, and they always have been.

If you want to understand more about how practising spoken English actually works, the key insight is that awareness must come before habit. You cannot change what you cannot hear. Spending time listening to the same short phrases spoken at natural speed, then at slowed speed, and then attempting to reproduce them, is not a shortcut — it is the work.

A practical starting point

Take three very common phrases you use daily. Find a recording of a fluent speaker saying them — a podcast, a film, a conversation. Listen for what actually happens to the sounds at word boundaries. Write down what you hear phonetically, not what the spelling would suggest. Then practise saying each phrase at three speeds: slow and careful, moderate, and conversational.

The goal is not to sound like someone you are not. The goal is to have the full range of spoken English available to you — so that you can choose how you speak rather than being limited by gaps in what you know.

Connected speech is the difference between English as a foreign language and English as a living one. Once you hear it clearly, you will find it everywhere — and your own speech will start to move differently, more freely, and with the natural grain of the language rather than against it.