Make Pragmatics Labels Usable in Text Annotation
Text annotation projects often struggle with pragmatic labels that seem abstract or difficult to apply consistently. This article presents practical strategies for making these labels more usable, drawing on insights from experienced annotation specialists and computational linguists. Learn how to anchor decisions in concrete textual evidence and use rewritable intent techniques to achieve better neutrality in your annotations.
Anchor Decisions to Textual Evidence
When we need stable labels for things like politeness, sarcasm, or stance across languages, we set boundaries around what is visible in the text, not what an annotator feels the writer probably meant.
One rule that stopped confusion was:
Label only what the text gives you evidence for. Do not infer intention unless the wording clearly supports it.
For sarcasm, that made a big difference. Instead of saying "sarcasm is when someone means the opposite," we trained annotators to look for a clear mismatch between the literal words and the surrounding cues.
For example:
"Great, another delay. Exactly what we needed."
That is sarcasm because the positive wording clashes with the negative situation.
But "Thanks a lot" on its own is not enough. Depending on context, it could be sincere or annoyed, so we would not force a sarcasm label.
The same thinking applied to politeness and stance. For politeness, we looked for visible markers like "please," hedges, apologies, or respectful phrasing. For stance, we told annotators to label the writer's position toward the specific target, not just the overall tone.
That one shift—from interpretation to evidence—was what kept the project consistent and reduced drift.
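To make the evidence rule concrete, here is a minimal sketch of a cue-based pre-check. The phrase lists are illustrative placeholders, not a real lexicon; the point is that a sarcasm candidate needs both a positive surface phrase and a clashing negative cue visible in the text itself.

```python
# Minimal sketch of an evidence-first sarcasm pre-check. The cue lists are
# hypothetical placeholders, not a real lexicon; the rule encoded here is
# that a sarcasm label requires BOTH a positive surface phrase AND a
# clashing negative cue in the text itself.

POSITIVE_PHRASES = {"great", "exactly what we needed", "wonderful", "perfect"}
NEGATIVE_CUES = {"delay", "outage", "broken", "again", "cancelled"}

def sarcasm_evidence(text: str) -> bool:
    """Return True only when the text contains a visible literal/cue clash."""
    lowered = text.lower()
    has_positive = any(p in lowered for p in POSITIVE_PHRASES)
    has_negative = any(n in lowered for n in NEGATIVE_CUES)
    return has_positive and has_negative

print(sarcasm_evidence("Great, another delay. Exactly what we needed."))  # True
print(sarcasm_evidence("Thanks a lot"))  # False: no visible clash, do not force a label
```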

Apply Rewritable Intent for Neutrality
Drift is a common annotator error when a statement carries both an emotional tone and a business purpose: raters slide between labeling how something was said and labeling what it was said for. The 'Rewritable Intent' guideline resolves that drift. If an annotator can rephrase a sarcastic remark as a plain, professional request while preserving the underlying action and purpose of the original text, the item is labeled 'neutral intent' rather than 'sarcastic' or 'hostile intent'. This simple definition sharply reduces drift because it forces annotators to look past the speaker's mental state and toward the actionable elements of the statement itself: the verbs and nouns that describe what must happen.
We train on these rules with paired examples: the same business request phrased once professionally and once sarcastically, with the requirement that both receive identical labels. If a pair maps to different labels, we revise the definitions until it does not, before processing any real-world batches. Separating how the action is expressed from what needs to happen because of it lets us apply labels consistently across dialects, styles, and variations of language.
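One way to operationalize that paired-example training is a small consistency test. The pairs below and the label_intent() stub are hypothetical stand-ins for an annotation pass; the check encodes only the rule that both phrasings must receive the same label.

```python
# Sketch of the paired-example consistency check described above. The pairs
# and the label_intent() stub are hypothetical; in practice label_intent
# would be a human annotation pass (or a model being audited).

PAIRS = [
    ("Please send the Q3 report by Friday.",
     "Oh sure, take your time with the Q3 report. Friday would be a miracle."),
    ("Could you restart the build server?",
     "Wow, the build server is down AGAIN. Thrilling. Someone restart it."),
]

def label_intent(text: str) -> str:
    """Placeholder for a judgment under the Rewritable Intent rule."""
    # If the sarcastic version can be rewritten as the professional one while
    # keeping the same action and purpose, both sides must come back neutral.
    return "neutral"

for professional, sarcastic in PAIRS:
    assert label_intent(professional) == label_intent(sarcastic), (
        "Definition drift: revise the guidelines until the pair agrees."
    )
```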

Build a Lean Label Hierarchy
A small, layered set of pragmatics labels makes annotation faster and clearer. Start with broad parent labels that capture core speech functions, then add child labels only when needed. Each label should have a plain meaning, a short rule, and one brief example so that raters decide more quickly.
Overlaps should be handled by clear parent rules that direct which branch to follow. Rare or niche labels can be mapped to the closest parent to prevent label sprawl. Draft a small, layered label tree and pilot it on real data now.
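As a sketch of what "small and layered" can look like in practice, here is one possible label-tree structure. The label names, rules, and examples are invented for illustration; the shape (plain meaning, short rule, one example, closest-parent fallback) is the point.

```python
# Illustrative sketch of a lean, layered label tree. All names, rules, and
# examples here are hypothetical.

from dataclasses import dataclass, field

@dataclass
class Label:
    name: str
    rule: str          # one short decision rule
    example: str       # one brief example
    children: list["Label"] = field(default_factory=list)

TREE = Label(
    name="request",
    rule="The text asks someone to do something.",
    example="Can you resend the invoice?",
    children=[
        Label("direct_request", "Uses an imperative or explicit ask.",
              "Resend the invoice."),
        Label("hedged_request", "Softens the ask with 'could', 'maybe', 'please'.",
              "Could you maybe resend the invoice, please?"),
    ],
)

def resolve(label: Label, fits_child) -> Label:
    """Map rare or unclear cases to the closest parent to prevent sprawl."""
    for child in label.children:
        if fits_child(child):
            return child
    return label  # fall back to the parent

chosen = resolve(TREE, lambda c: c.name == "hedged_request")
print(chosen.name)  # hedged_request
```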
Guide Choices with Checklists and Flows
Built-in checklists and simple decision paths help raters choose the right pragmatics label at the point of work. The tool can ask one clear question at a time and guide the next step based on the answer. Short hints can appear when doubt is likely so that choices stay consistent.
When a case does not fit the path, the flow should allow marking it as unclear and asking for help. Over time, slow or confusing branches can be shortened based on real use. Embed these guided steps in the tool and run a short pilot this week.
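A guided flow like this can be encoded as a simple question graph. The questions and labels below are hypothetical; note the built-in escape to an "unclear" outcome when a case does not fit the path.

```python
# Sketch of a one-question-at-a-time decision flow with an 'unclear' escape
# hatch. All node content is illustrative.

FLOW = {
    "start": ("Does the text ask someone to do something?",
              {"yes": "polite?", "no": "stance?"}),
    "polite?": ("Are there visible politeness markers (please, hedges, apologies)?",
                {"yes": "LABEL:polite_request", "no": "LABEL:plain_request"}),
    "stance?": ("Does the writer take a position toward a specific target?",
                {"yes": "LABEL:stance", "no": "LABEL:other"}),
}

def run_flow(answers: dict[str, str]) -> str:
    node = "start"
    while not node.startswith("LABEL:"):
        question, branches = FLOW[node]
        answer = answers.get(node)
        if answer not in branches:
            return "unclear"  # escape hatch: flag the item and ask for help
        node = branches[answer]
    return node.removeprefix("LABEL:")

print(run_flow({"start": "yes", "polite?": "yes"}))  # polite_request
```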
Track Agreement and Fix Hotspots
Strong labels lead different raters to the same choice on the same text. Track agreement rates, look for places where choices clash, and fix the rules until those hotspots fade. Short calibration rounds can reveal vague words and unclear tests that slow people down.
Every change should be logged with a short note so progress can be seen across rounds. When results stop improving, pause the edits and confirm strength on new data. Launch a steady review loop and begin tracking agreement now.
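A minimal sketch of that tracking loop might compute pairwise Cohen's kappa and list the most frequent disagreeing label pairs. The rater data here is invented; the hotspot report is where rule fixes should focus.

```python
# Sketch of an agreement tracker: pairwise Cohen's kappa plus a simple
# 'hotspot' report of the label pairs raters confuse most. Data is invented.

from collections import Counter

def cohens_kappa(a: list[str], b: list[str]) -> float:
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = Counter(a), Counter(b)
    expected = sum((pa[l] / n) * (pb[l] / n) for l in set(a) | set(b))
    return 1.0 if expected == 1.0 else (observed - expected) / (1 - expected)

def hotspots(a: list[str], b: list[str], top: int = 3):
    """Most frequent disagreeing label pairs: where the rules need fixing."""
    clashes = Counter(tuple(sorted((x, y))) for x, y in zip(a, b) if x != y)
    return clashes.most_common(top)

rater1 = ["sarcastic", "neutral", "neutral", "polite", "sarcastic"]
rater2 = ["neutral",   "neutral", "neutral", "polite", "neutral"]
print(round(cohens_kappa(rater1, rater2), 2))   # 0.38
print(hotspots(rater1, rater2))                 # [(('neutral', 'sarcastic'), 2)]
```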
Use Assistive Hints with Human Control
Machine help can speed annotation when suggestions act as hints, not commands. The system can show a top label, a brief reason, and a confidence score so raters know when to accept or double check. Easy items can be accepted with one action, while tough items are flagged for human judgment.
Active learning can pick the most useful texts for people, which improves both the model and the labels. Every step should be tracked so audits can see how a label was made. Activate assistive suggestions with human control and evaluate the gains this month.
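One plausible shape for hint-not-command assistance is a confidence triage like the sketch below. The suggestion format follows the description above, but the thresholds and field names are assumptions, not a prescribed design.

```python
# Sketch of hint-not-command triage. Thresholds and the Suggestion fields
# (label, reason, confidence) are hypothetical choices.

from dataclasses import dataclass

@dataclass
class Suggestion:
    label: str
    reason: str
    confidence: float

ACCEPT_THRESHOLD = 0.90   # easy items: accept with one action
REVIEW_THRESHOLD = 0.60   # tough items: flag for human judgment

def triage(s: Suggestion) -> str:
    if s.confidence >= ACCEPT_THRESHOLD:
        return "accept_with_one_action"
    if s.confidence < REVIEW_THRESHOLD:
        return "flag_for_human"
    return "show_hint_and_let_rater_decide"

print(triage(Suggestion("polite_request", "contains 'please' + hedge", 0.94)))
```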
Show Context and Speaker Details
Pragmatic meaning often depends on nearby context and who is speaking. An annotation tool should show a small window of surrounding turns so that cues like tone and timing are visible. It should also record simple details about the speaker, such as role and relation to others, so later reviews can explain hard calls.
When space is tight, a quick preview can hide distant lines but reveal them on demand. Linking each label to its context view makes checks fair and repeatable. Enable a default context window and simple speaker fields, then test their impact today.
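A context view can be as simple as a windowing function over dialog turns with light speaker metadata attached. The field names and window radius below are illustrative.

```python
# Sketch of a context window with simple speaker fields. Field names and the
# default radius are hypothetical.

from dataclasses import dataclass

@dataclass
class Turn:
    speaker: str
    role: str          # e.g. "customer", "agent", "manager"
    text: str

def context_window(turns: list[Turn], index: int, radius: int = 2) -> list[Turn]:
    """Return the labeled turn plus up to `radius` turns on each side."""
    lo = max(0, index - radius)
    return turns[lo:index + radius + 1]

dialog = [
    Turn("A", "customer", "The shipment is late again."),
    Turn("B", "agent", "I'm sorry about that, let me check."),
    Turn("A", "customer", "Great, another delay. Exactly what we needed."),
]
for turn in context_window(dialog, index=2):
    print(turn.role, ":", turn.text)
```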
